Digging around the Github v3 API
Series: worklog July 09, 2011
I’ve had a little project idea brewing for a while that would require
which eliminates the ability to use existing libraries like
A few of the libraries have
Node.js bindings, but that’s not exactly what I wanted. There is a cool project
git.js that looked like it would be perfect, but it doesn’t (yet) support
any write operations.
Instead, I’ll have to use the Github HTTP API.
The real problem was that, prior to v3, the Github API was pretty useless for working with repositories. You could get the stats (forks, watchers, title, etc) but the file data wasn’t accessible. I guess you could just scrape the website to get the data but that’s a little hacky and wouldn’t work for making commits or pushing changes (theoretically possible now that you can Edit files online and commit the changes through the web interface?).
(Addendum: the v2 did have read access to git objects, but not write access, as technoweenie from Github points out here)
Anyways, the Github v3 API exposes all the raw git data now, so that seems like a much more robust way to go.
Unfortunately, there is a lack of useful examples on the new API methods; instead of showing you how to get file at a given commit from the API, you get pointed to the ProGit ‘Git Internals’ chapter. So I began to dig into the git internal plumbing with the goal of getting the latest version of a file from one of my repos.
I first started looking at the
API route since that’s where a file
is stored in a git repo. Before I went digging around with the command line
to find the SHA hash for the file, I tried to see if I could get it from the
Github website — however most of the SHA identifiers are for commits, not files.
So that was a dead-end, but a bit of googling later, I found that you can get the
SHA for a file by running
git hash-object FILEPATH (aside:
this StackOverflow question
was really insightful too). So I picked a random file from my blog repo and found
the SHA (
Then I pinged the
/blobs/:sha route with hurl.it and got
content field looks promising,
but I think it is the
git loose object,
which is the content deflated with zlib.
but it is base64 encoded. Not exactly ideal for showing the file contents to the user.
A bit more digging and I found that the API has some
custom MIME types. Adding the
Accept: application/vnd.github-blob.raw to the HTTP Header, I got this response:
Bingo! Access to the file data and it’s human readable.
I suspect that using
that MIME type causes the API to call
git cat-file or something similar on
Now, I need to be able to get the data for any file in the repo and I would
prefer to not have to use the command line to get the SHA every time. So it looks
API method will be useful — it displays the directories and files
present at a given commit. And since you can use branch names (like
of SHA (read more here), I figured
I could try plugging that into the Github API.
Cool, it worked. And better yet, the
url field gives me the API URL to call for to
get the file contents (or the tree if it’s a directory).
So given a Github username and repo name, I can pull down the file structure
/tree/master and then drill down on the individual files with
/blobs/ and the
special MIME type.
More to come later as I try to figure out how to make an edit to a file and commit it to the repo and more details on the project this is all for once I make more progress.