Digging around the Github v3 API
Series: worklog July 09, 2011
I’ve had a little project idea brewing for a while that would require
read and write access to a Github repository with client side Javascript,
which rules out existing libraries like grit and libgit2.
A few of the libraries have
Node.js bindings, but that’s not exactly what I wanted. There is a cool project
called git.js
that looked like it would be perfect, but it doesn’t (yet) support
any write operations.
Instead, I’ll have to use the Github HTTP API.
The real problem was that, prior to v3, the Github API was pretty useless for working with repositories. You could get the stats (forks, watchers, title, etc.) but the file data wasn’t accessible. I guess you could just scrape the website to get the data, but that’s a little hacky and wouldn’t work for making commits or pushing changes (theoretically possible now that you can edit files online and commit the changes through the web interface?).
(Addendum: v2 did have read access to git objects, but not write access, as technoweenie from Github points out here.)
Anyways, the Github v3 API exposes all the raw git data now, so that seems like a much more robust way to go.
Unfortunately, there is a lack of useful examples for the new API methods; instead of showing you how to get a file at a given commit from the API, the docs point you to the ProGit ‘Git Internals’ chapter. So I began to dig into the git internal plumbing with the goal of getting the latest version of a file from one of my repos.
I first started looking at the /blobs/ API route since that’s where a file
is stored in a git repo. Before I went digging around with the command line
to find the SHA hash for the file, I tried to see if I could get it from the
Github website; however, most of the SHA identifiers there are for commits, not files.
So that was a dead end, but a bit of googling later, I found that you can get the
SHA for a file by running git hash-object FILEPATH (aside: this StackOverflow question
was really insightful too). So I picked a random file from my blog repo and found
the SHA (957e6b4efb22fa921d0e6b17a1fbf46788c97ed3).
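What git hash-object computes can also be reproduced by hand, which helps demystify where a file's SHA comes from. The following is a minimal sketch in Python, assuming the standard blob format described in the ProGit internals chapter (SHA-1 over a "blob <size>" header, a NUL byte, then the file bytes). The function name git_blob_sha is made up for illustration.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Return the SHA-1 that `git hash-object` reports for a blob with these bytes."""
    # Git hashes a header ("blob", a space, the byte length, a NUL) plus the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same input as the ProGit example: echo 'test content' | git hash-object --stdin
    print(git_blob_sha(b"test content\n"))
```

Since the hash depends only on the file's bytes, the same file content always maps to the same blob SHA, no matter which commit or repo it lives in.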
Then I pinged the /blobs/:sha route with hurl.it and got this response:
Interesting, the content field looks promising, but it is base64 encoded. (My first guess was that it was the git loose object, which is the content deflated with zlib, but it is just the base64-encoded file content.) Not exactly ideal for showing the file contents to the user.
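Decoding that field is not much work, for what it's worth. Here is a rough sketch, assuming the JSON response carries an encoding field alongside content (only content is mentioned above, so treat the exact shape as an assumption). The sample body is invented for illustration and is not the output of a real request; decode_blob is a hypothetical helper name.

```python
import base64
import json


def decode_blob(response_body: str) -> bytes:
    """Extract the raw file bytes from a /blobs/:sha JSON response."""
    blob = json.loads(response_body)
    if blob.get("encoding") == "base64":
        # b64decode ignores the newlines that wrap the base64 payload.
        return base64.b64decode(blob["content"])
    # Assumed fallback: any other encoding is treated as plain UTF-8 text.
    return blob["content"].encode("utf-8")


# Made-up response body, standing in for what the API might return.
sample = json.dumps({"encoding": "base64", "content": "aGVsbG8g\nd29ybGQK\n"})
print(decode_blob(sample).decode("utf-8"))
```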
A bit more digging and I found that the API has some custom MIME types. After adding the
Accept: application/vnd.github-blob.raw header to the HTTP request, I got this response:
Bingo! Access to the file data and it’s human readable. I suspect that using
that MIME type causes the API to call git cat-file or something similar on
the server.
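For anyone scripting this instead of using hurl.it, the request might be built along these lines. This is only a sketch: the full route under api.github.com/repos/ is an assumption (only the short form /blobs/:sha appears above), "someuser" and "blog" are placeholders, and the MIME type is the one quoted in this post, which later API versions may not accept.

```python
import urllib.request

API_ROOT = "https://api.github.com"
RAW_BLOB_TYPE = "application/vnd.github-blob.raw"  # MIME type quoted in the post


def raw_blob_request(user: str, repo: str, sha: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a blob's raw contents."""
    # Assumed full route; the post only shows the short form /blobs/:sha.
    url = f"{API_ROOT}/repos/{user}/{repo}/git/blobs/{sha}"
    return urllib.request.Request(url, headers={"Accept": RAW_BLOB_TYPE})


# "someuser" and "blog" are placeholders, not a real account or repo.
req = raw_blob_request("someuser", "blog", "957e6b4efb22fa921d0e6b17a1fbf46788c97ed3")
print(req.full_url, req.get_header("Accept"))
# Sending it would look like: urllib.request.urlopen(req).read()
```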
Now, I need to be able to get the data for any file in the repo and I would
prefer to not have to use the command line to get the SHA every time. So it looks
like the /trees/ API method will be useful: it displays the directories and files
present at a given commit. And since you can use branch names (like master) instead
of a SHA (read more here), I figured I could try plugging that into the Github API.
Cool, it worked. And better yet, the url field gives me the API URL to call to
get the file contents (or the tree if it’s a directory).
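To make the shape of that response concrete, here is a small sketch that picks the file entries out of a tree listing. It assumes each entry carries path, type ("blob" for files, "tree" for directories) and url fields; the sample data, its SHAs, and its URLs are invented, and blob_urls is a hypothetical helper name.

```python
import json


def blob_urls(tree_body: str) -> dict:
    """Map each file path in a /trees/ response to the API URL for its blob."""
    tree = json.loads(tree_body)
    return {
        entry["path"]: entry["url"]
        for entry in tree.get("tree", [])
        if entry.get("type") == "blob"  # skip sub-directories (type "tree")
    }


# Invented listing for illustration; real entries would hold real SHAs and URLs.
sample_tree = json.dumps({
    "tree": [
        {"path": "README", "type": "blob", "sha": "aaa111", "url": "https://example.invalid/blobs/aaa111"},
        {"path": "_posts", "type": "tree", "sha": "bbb222", "url": "https://example.invalid/trees/bbb222"},
    ]
})
print(blob_urls(sample_tree))
```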
So given a Github username and repo name, I can pull down the file structure
with /trees/master and then drill down on the individual files with /blobs/ and the
special MIME type.
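Put together, the two-step lookup could be sketched roughly as below. The HTTP call is passed in as a function so the flow can be tried without touching the network; read_file and http_fetch are hypothetical names, the full api.github.com route is an assumption, and only top-level files are handled (no descent into sub-trees).

```python
import json
import urllib.request
from typing import Callable, Optional

# A fetcher takes (url, accept header or None) and returns the response body.
Fetch = Callable[[str, Optional[str]], bytes]


def http_fetch(url: str, accept: Optional[str]) -> bytes:
    """Plain urllib fetcher; defined here but not called in this sketch."""
    headers = {"Accept": accept} if accept else {}
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        return resp.read()


def read_file(fetch: Fetch, user: str, repo: str, path: str, ref: str = "master") -> bytes:
    """Look up `path` in the tree for `ref`, then fetch its blob as raw bytes."""
    # Assumed full route; the post refers to it simply as /trees/master.
    tree_url = f"https://api.github.com/repos/{user}/{repo}/git/trees/{ref}"
    tree = json.loads(fetch(tree_url, None))
    for entry in tree["tree"]:
        if entry["path"] == path and entry["type"] == "blob":
            # The entry's url field points at the blob; ask for the raw form.
            return fetch(entry["url"], "application/vnd.github-blob.raw")
    raise FileNotFoundError(path)
```

With a real network connection, read_file(http_fetch, user, repo, "README") would be the call to try, with user and repo filled in.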
More to come later as I try to figure out how to make an edit to a file and commit it to the repo. I’ll also share more details on the project this is all for once I make more progress.