Returning to our regularly scheduled technical blogging, I’m going to give a quick overview of one of the software libraries I know best: Kenneth Reitz’s Requests library for the Python programming language. Since I started making minor (and I mean really minor) contributions to the library, I’ve become increasingly familiar with its use and utility, and I now firmly believe that it is the best third-party library available for Python.
The documentation is generally good: the bad parts tend to be the parts I wrote. Nevertheless, it’s worth having an informal introduction to the library somewhere on the internet. I’m actually pretty sure other sources exist, but this is one of the very few things I can write about with even a modicum of legitimate authority, so damnit I’m going to write it anyway.
Be warned: Requests provides a great many features that I’m not going to go into in this post. The library has developed from being a useful wrapper over some of Python’s less Pythonic libraries (oh, hello there urllib2 and httplib) into the most intuitive method of interacting with HTTP and HTTPS available in any programming language I have ever seen. The intention of this blog post is to quickly highlight the utility of Requests, not its depth. As always, the best place to look is the official docs.
What is Requests?
tl;dr: Requests is a Python library intended to make it as easy as possible to work with HTTP.
More fully, Requests is intended to provide a simple and intuitive API that allows the creation of complex workflows from Python code. This library is, in my opinion, the holy grail of API-Oriented Design. To its maintainer, any addition to Requests that makes the API even slightly less clear is a bad change.
This means that Requests has the ability to scale naturally to the complexity of your problem. If all you want to do is grab some JSON from a single web URL, Requests will make that a one-line program. If you want to manage a complicated workflow using OAuth to grab data from multiple sources, Requests provides you with all the tools you need to handle the data exchange.
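To make that concrete, here’s a minimal sketch of the one-liner case. The httpbin.org URL is just a stand-in of mine for whatever JSON-returning endpoint you actually care about:

>>> import requests
>>> data = requests.get('http://httpbin.org/get').json()  # fetch and decode JSON in one line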
Any Python developer who has to interact with the web as a client will eventually find their way to Requests, and when they do they’ll have stumbled onto the single best Python library around.
Let’s Start!
Ok, let’s get going. Firstly, you want to create a sandbox in which you can play. It is rarely a good idea to pollute your system Python installation with any package installation, so you should probably create a virtualenv. If there were any package worth installing into your site-packages, Requests would be it, but even then it becomes something you need to maintain. As a result, I recommend you create a virtualenv for following along with this little tour. Do this like so:
$ virtualenv venv
$ source venv/bin/activate
You should get a little (venv) at the start of your prompt. This lets you know that you’re using the virtualenv. When that’s done, install Requests into the virtualenv:
$ pip install requests
Requests is present. Grab yourself an interactive Python shell (python at the command prompt). Now you can begin to see some of the glory of Requests. Let’s see what we need to do to get Google’s homepage.
>>> import requests
>>> r = requests.get('http://www.google.com/')
And we’re done.
Seriously, that’s it. The variable r now contains a Response object, which contains everything associated with the response from the web server. We can take a look at what we got.
>>> print r.status_code
200
>>> print r.reason
OK
>>> print r.text
<!doctype html><html>...
...snip...
...</html>
As you can see, you can find out the HTTP status code returned by the server, and also the human-readable response code. In this case, because the URL I used points to a real resource, the server has returned a 200 OK response.
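For contrast, here’s a quick sketch of what a failure looks like. httpbin.org’s /status endpoint (my choice of test resource, not part of the Google example) echoes back whatever status code you ask for:

>>> r = requests.get('http://httpbin.org/status/404')
>>> print r.status_code
404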
We can make other forms of HTTP requests too. POST, DELETE, PUT, OPTIONS, HEAD and PATCH can all be made using the exact same syntax: requests.post(), requests.delete(), requests.put(), requests.options(), requests.head(), requests.patch().
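To illustrate one of those, here’s a sketch of a simple POST with form-encoded data, again pointed at httpbin.org, which conveniently echoes back whatever you send it:

>>> payload = {'key': 'value'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print r.status_code
200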
Knowing that the resource I wanted has been returned, I can access it using the text attribute of the Response object.
Getting Data From a Response
The text attribute is one of a few ways of getting at the content of a Response. To demonstrate the others, the most effective approach is to use a resource that returns JSON, for which Requests has built-in support.
A good resource of this type is Twitter. Let’s find the top 20 trending topics for each hour of today. To do that, run:
>>> r = requests.get('http://api.twitter.com/1/trends/daily.json')
>>> print r.status_code
200
In Requests you can get the content of a Response in four ways. For now, we’ll focus on three. The first is to get the content as a stream of bytes, which in Python is represented by a standard string (Python 2.x) or a bytestring (Python 3.x). This is the content attribute:
>>> print r.content
...snip...
The next option, and the preferred one for non-JSON responses, is to use the text attribute. This is because encoding text is hard.
It’s highly likely that your web server has returned data that is not encoded
in ASCII. This means the text needs to be decoded first. Requests will look at
the headers of the response and use them to determine the encoding used. We
can see this clearly in our response from Twitter. First, take a look at the
response headers.
>>> print r.headers['Content-Type']
application/json; charset=utf-8
Requests looks for that Content-Type header, in particular for the charset field. If that is present, Requests assumes that the encoding specified is the one being used. We can see that:
>>> print r.encoding
utf-8
If the header isn’t present (and a certain other header isn’t there either), Requests uses chardet to try to guess the encoding. If it turns out that Requests gets it wrong, you can set the encoding attribute to the correct value yourself.
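To sketch what that override looks like, suppose you know (say, from an API’s documentation) that the body is really UTF-8 even though the server failed to say so:

>>> r = requests.get('http://example.com/')
>>> r.encoding = 'utf-8'  # override whatever Requests guessed
>>> text = r.text         # the body is now decoded as UTF-8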
Regardless, when you access Response.text, you will get the content of the response decoded using that codec. This happens behind the scenes, so in 99.99% of cases everything will be seamless.
>>> print r.text
...unicode snip...
The final option works well when you have JSON, and that is to use the json() method. This method returns a Python object (usually a dict) containing the JSON from the response, exactly like you’d get if you called json.loads() on the content of the response. That dict can then be immediately interacted with like any other Python dictionary.
>>> print r.json()
...dict snip...
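The structure of the Twitter response is more than we need here, so here’s a small sketch against httpbin.org instead (my choice of endpoint): the decoded object is an ordinary dict you can index straight away.

>>> r = requests.get('http://httpbin.org/get')
>>> data = r.json()
>>> print data['url']
http://httpbin.org/get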
Cookies and Headers and Such!
Requests makes it really easy to get at the headers and cookies sent in a Response. The headers are stored in objects that behave almost exactly like Python dictionaries. This means you can print the entire thing, like this:
>>> print r.headers
...dict snip...
Both the headers and cookies support access like Python dictionaries. This means you can do this:
>>> print r.headers['cache-control']
max-age=300, must-revalidate
>>> print r.cookies['_twitter_sess']
...long alphanumeric string here...
You can also send custom headers and cookies by using Python dictionaries. For instance, you can do this:
>>> headers = {'User-Agent': 'not-a-real-browser (like Gecko)'}
>>> cookies = {'Tasty': 'Chocolate_chip_cookies'}
>>> t = requests.get('http://httpbin.org/get', headers=headers, cookies=cookies)
Of course, cookies aren’t really of much use when you only send them on one request. The whole point is that cookies enable you to maintain state, yet Requests’ API appears to be stateless. Fortunately, Requests makes it easy to maintain state.
State in Requests
To maintain state, Requests provides you with the Session class. Once you create an instance of the class, you can make all of your requests using methods on the Session instance. This looks like this:
>>> s = requests.Session()
>>> s.get('http://www.google.com/')
>>> s.close()
There are two main reasons to use sessions. The first is to maintain state over a series of requests. If you make subsequent GETs or POSTs or what-have-you, Requests will continue to pass the cookies around, which is exactly what you want to have happen.
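You can watch the cookies being carried over with httpbin.org (again, my choice of endpoints, not part of the Twitter example): set a cookie with one request, and the session sends it back on the next.

>>> s = requests.Session()
>>> s.get('http://httpbin.org/cookies/set?flavour=oatmeal')  # sets the cookie
>>> r = s.get('http://httpbin.org/cookies')                  # the cookie comes along automatically
>>> print r.json()['cookies']
{u'flavour': u'oatmeal'}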
The next reason is that the Session class allows you to set some parameters that will be used for all subsequent requests. This includes headers and cookies, as well as some of the other options I won’t be going into this time around.
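Setting those defaults looks something like this (a sketch; the User-Agent string is just an example of mine):

>>> s = requests.Session()
>>> s.headers.update({'User-Agent': 'my-little-script/1.0'})
>>> r = s.get('http://httpbin.org/get')  # the custom User-Agent rides along on every request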
As you can see in the code example above, sessions need to be instantiated and cleaned up. It’s good form, then, to use sessions as context managers, like so:
>>> with requests.Session() as s:
...     s.headers.update(headers)
...     s.get('http://www.google.com/')
...     s.get('http://www.example.com/')
>>> do_some_other_stuff()
This saves you having to call Session.close() on your session, and ensures that you don’t leak sockets. It’s also very Pythonic, and that’s always a good thing.
Query Parameters
A common feature of web APIs and URLs in general is the use of query parameters. These are the portions of the URL that follow the leading ‘?’ character. Requests makes it easy to add these into the URL without having to form it correctly yourself:
>>> params = {'hi': 'there'}
>>> r = requests.get('http://httpbin.org/get', params=params)
>>> print r.url
http://httpbin.org/get?hi=there
These parameters are placed into the URL and correctly encoded. Passing parameters in this way ensures that there will not be difficulties passing them to APIs upstream.
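That encoding matters as soon as a value contains anything that isn’t URL-safe, like a space; a small sketch:

>>> params = {'q': 'python requests'}
>>> r = requests.get('http://httpbin.org/get', params=params)
>>> print r.url
http://httpbin.org/get?q=python+requests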
So much more!
This was the briefest of looks at the features available in Requests. There are so many other features: proxies, automatic HTTP authentication, automatic redirect handling, multipart file upload, SSL certificate verification and more. If I can find a good example, I’ll walk through some of the more advanced features of the library in the future.
In the meantime, I thoroughly suggest that if you have any need to programmatically access web services, whether hitting an API or doing some web scraping, you try out Python’s Requests library. It’s truly an exemplar in the world of programming libraries.