frickenate


Protocol wrappers – fetch a URL as an HTTP 1.1 client

Protocol wrappers – an alternative to cURL

When the need arises to fetch the contents of a URL in PHP, it can often be convenient to skip over the hassle of using cURL and instead use PHP’s built-in support for protocol wrappers. This feature lets us use convenient functions like file_get_contents and fopen to interact with data streams sourced from somewhere other than the local filesystem – in this case from a remote server over the network in the form of a URL. Please note that the allow_url_fopen php.ini configuration parameter must be enabled.
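Because everything that follows depends on allow_url_fopen, a script can check it up front. The helper below, `url_fopen_enabled()`, is a hypothetical name used only for illustration. It relies on `ini_get()` and `filter_var()` with `FILTER_VALIDATE_BOOLEAN`, which accepts the usual php.ini spellings such as "1" or "On".

```php
<?php
// Hypothetical helper: true when the allow_url_fopen php.ini setting is on.
function url_fopen_enabled(): bool
{
    // ini_get() returns the raw setting as a string such as "1", "On" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}
```

Note that allow_url_fopen can only be changed in php.ini (or the server configuration), not with `ini_set()` at runtime, so the best a script can do is detect the problem and report it.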

The first attempt at using file_get_contents to pull down the contents of a URL looks like this:

<?php
$html = file_get_contents('http://example.com/');
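One thing the one-liner glosses over is failure: `file_get_contents()` returns `false` and emits a warning when the stream cannot be opened (DNS failure, refused connection, and so on). A minimal sketch of a wrapper that turns that into an exception follows; `fetch_url()` is a hypothetical name, not a PHP function.

```php
<?php
// Hypothetical helper: like file_get_contents(), but throws instead of
// returning false when the stream cannot be opened or read.
function fetch_url(string $url): string
{
    $body = @file_get_contents($url); // @ hides the warning; it is reported below
    if ($body === false) {
        $error = error_get_last();
        throw new RuntimeException(
            "Could not fetch $url: " . ($error['message'] ?? 'unknown error')
        );
    }
    return $body;
}
```

Usage stays a one-liner, `$html = fetch_url('http://example.com/');`, and it works just as well on a local file path, since the same function handles both.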

That worked well

In fact, that’s technically all that’s needed. The request made to the server looks like this:

GET / HTTP/1.0
Host: example.com

Wait a second, does that really say HTTP/1.0? Version 1.0 of the HTTP protocol is so ancient! It turns out that the protocol version matters very little, if at all. What matters is that we send a Host header, something PHP does automatically. The Host header lets one web server serve content for multiple domains, a concept introduced with HTTP 1.1, which made the header mandatory. Once the Host header gained traction, web servers made sure to accept it, even from HTTP 1.0 clients.
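To make the point concrete, here is a sketch that assembles the raw request text by hand. The `build_request()` helper is hypothetical and only illustrates the wire format: the version lives in the request line, while the site being asked for is named solely by the Host line, so a server hosting many domains can route the request either way.

```php
<?php
// Hypothetical helper: assemble the raw text of an HTTP request so you can
// see exactly what goes over the wire. PHP does not expose this itself.
function build_request(
    string $method,
    string $host,
    string $path,
    string $version,
    array $headers = []
): string {
    // Request line first, then Host, then any extra header lines.
    $lines = array_merge(["$method $path HTTP/$version", "Host: $host"], $headers);

    // Every line ends in CRLF, and an empty line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}
```

Calling `build_request('GET', 'example.com', '/', '1.0')` produces the same two-line request shown earlier, terminated by a blank line.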

What to do if that 1.0 version is nagging at you? Really, it shouldn’t, unless you’re like me and prefer everything neat and tidy. No problem: PHP allows you to upgrade the protocol version to 1.1 with a feature called stream contexts:

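<?php
// Upgrade the request to HTTP 1.1 (and switch to POST) via a stream context.
// The second argument is use_include_path, a bool, so pass false rather than null.
$html = file_get_contents('http://example.com', false, stream_context_create([
    'http' => [
        'method'           => 'POST',
        'protocol_version' => 1.1,
    ],
]));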

There, our requests are now being formatted like a good HTTP 1.1 client’s. I changed the HTTP method from the default GET to POST simply to illustrate that the context options can be used to manipulate the request in other ways. Check out the list of available context options to see how to pass POST data, set additional HTTP headers, and control socket timeouts and redirect following. The server is now receiving this:
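As a sketch of those other options, the context below sends form data as a POST body. The option names (`method`, `header`, `content`, `timeout`, `follow_location`, `max_redirects`, `ignore_errors`) are from PHP's documented http context options; the form fields and URL are placeholders.

```php
<?php
// A sketch of a POST request built with more http context options.
// The form fields and values below are placeholders for illustration.
$postData = http_build_query(['title' => 'Hello World', 'draft' => 1]);

$context = stream_context_create([
    'http' => [
        'method'          => 'POST',
        'header'          => [
            'Content-Type: application/x-www-form-urlencoded',
        ],
        'content'         => $postData, // the request body
        'timeout'         => 10,        // read timeout in seconds
        'follow_location' => 1,         // follow Location: redirects...
        'max_redirects'   => 3,         // ...but at most three of them
        'ignore_errors'   => true,      // return the body even on 4xx/5xx
    ],
]);

// Nothing is sent until the context is used, e.g.:
// $html = file_get_contents('http://example.com/', false, $context);
```

Building the context does not touch the network, which also makes it easy to inspect with `stream_context_get_options()` before any request goes out.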

POST / HTTP/1.1
Host: example.com

Almost there

There’s a catch to upgrading the protocol_version. In HTTP 1.1, connections are persistent (keep-alive) by default, a feature designed to reduce TCP overhead by letting clients issue subsequent requests over the same network socket. By announcing itself as an HTTP 1.1 client, PHP tells the server it may leave the socket open after sending the response and expects us to manage the keep-alive session, something PHP’s stream context is not prepared to handle. The side effect is a slow script: the PHP engine will not return control to user-land code until the socket is closed, which can take as long as the server’s keep-alive timeout. There is a simple solution in the form of a Connection: close request header, which informs the web server that we do not support keep-alive connections and that it should therefore close the socket the moment the response transfer is complete.
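It helps to see what a keep-alive-aware client has to do instead: since it cannot wait for the socket to close, it must work out where each response ends from framing such as the Content-Length header. The sketch below is a hypothetical `read_one_response()` helper, not part of PHP, that reads a single response and leaves anything after it in the stream. It assumes Content-Length framing only; chunked transfer encoding, which HTTP 1.1 servers also use, is not handled.

```php
<?php
// Hypothetical helper, for illustration only: read exactly one HTTP response
// from an open stream, using Content-Length to find where the body ends.
// Assumes Content-Length framing; chunked transfer encoding is not handled.
function read_one_response($stream): array
{
    $statusLine = fgets($stream);
    if ($statusLine === false) {
        throw new RuntimeException('Stream ended before a response arrived');
    }

    $headers = [];
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break; // an empty line ends the header block
        }
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    // Read only as many body bytes as announced, so a following response
    // on the same connection stays untouched in the stream.
    $length = (int) ($headers['content-length'] ?? 0);
    $body = $length > 0 ? stream_get_contents($stream, $length) : '';

    return [rtrim($statusLine, "\r\n"), $headers, $body];
}
```

Fed two back-to-back responses from a `php://memory` stream (standing in for a kept-alive socket), it returns the first body without consuming the second. `file_get_contents()` has no such per-response logic; it reads the stream to its end.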

The final code snippet looks like this:

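<?php
// HTTP 1.1 request that asks the server to close the socket after responding.
// The second argument is use_include_path, a bool, so pass false rather than null.
$html = file_get_contents('http://example.com', false, stream_context_create([
    'http' => [
        'protocol_version' => 1.1,
        'header'           => [
            'Connection: close',
        ],
    ],
]));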
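If you make these requests in several places, the same settings can be wrapped in a small helper. The sketch below uses a hypothetical `http11_context()` function that forces HTTP 1.1, adds Connection: close unless the caller already supplied a Connection header, and passes any other http options through. Depending on your PHP version, the wrapper may already default to HTTP 1.1 and send this header on its own; setting both explicitly keeps the behaviour predictable either way.

```php
<?php
// Hypothetical helper: build an HTTP 1.1 stream context that asks the server
// to close the connection, while still accepting caller-supplied options.
function http11_context(array $httpOptions = [])
{
    $headers = $httpOptions['header'] ?? [];
    if (is_string($headers)) {
        // The 'header' option may also be given as one CRLF-separated string.
        $headers = preg_split('/\r\n/', $headers, -1, PREG_SPLIT_NO_EMPTY);
    }

    // Respect an explicit Connection header; otherwise add "Connection: close".
    $hasConnection = false;
    foreach ($headers as $header) {
        if (stripos($header, 'connection:') === 0) {
            $hasConnection = true;
            break;
        }
    }
    if (!$hasConnection) {
        $headers[] = 'Connection: close';
    }

    $httpOptions['header'] = $headers;
    $httpOptions['protocol_version'] = 1.1;

    return stream_context_create(['http' => $httpOptions]);
}
```

Usage would look like `$html = file_get_contents('http://example.com/', false, http11_context(['timeout' => 5]));`.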

Finally, here’s the request being received by the server:

GET / HTTP/1.1
Host: example.com
Connection: close

Done! Of course, you probably should have simply stuck with an HTTP/1.0 request unless you happen to have run into a server that won’t cooperate. ;)