frickenate

Tag Archive: php

Custom JSON data type for Doctrine

Existing support for JSON in Doctrine

It turns out Doctrine does ship with a built-in JSON data type. Its name, json_array, hints at potential problems should one attempt to store values that are valid JSON without being an array. Inspecting the code for Doctrine’s \Doctrine\DBAL\Types\JsonArrayType class, the first thing we note is that Doctrine doesn’t poke its nose into the values you store through this data type. It simply calls json_encode() to convert a PHP value to a column’s storage value, and json_decode() when hydrating the value from the database back into userland code. So far so good: it seems we can probably get away with using json_array for all JSON data.
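To make the "valid JSON without being an array" point concrete, here is a quick sketch that uses only PHP's own json_encode() and json_decode(), with no Doctrine involved. The sample values are my own picks for illustration:

```php
<?php
// Values that are all valid JSON; only the last two are arrays.
$samples = ['a string', 42, 1.5, true, ['a', 'b'], ['key' => 'value']];

$roundTripped = [];
foreach ($samples as $sample) {
    $stored = json_encode($sample);                // what would land in the column
    $roundTripped[] = json_decode($stored, true);  // what would come back out
}

// Scalars survive the round trip just as well as arrays do.
var_dump($roundTripped === $samples);
```

Since the encode and decode calls don't care whether the value is an array, nothing here stops a type built on them from storing scalars.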

The null problem

Sadly there is one problem: a column defined with the json_array type cannot properly handle the null value. When this data type was conceived, it was apparently imagined that it would only be useful to have support for JSON arrays rather than the JSON format in general — and thus the name json_array.

When converting an Entity’s value to the column’s storage value, Doctrine does in fact respect null values. As can be seen in the definition of \Doctrine\DBAL\Types\JsonArrayType::convertToDatabaseValue(), if your Entity object contains the null value for a column defined with json_array, Doctrine will happily insert or update your database with null.

Unfortunately, the companion \Doctrine\DBAL\Types\JsonArrayType::convertToPHPValue() method does not follow the same logic. When reading the column’s value out from the database, the null value is converted to an empty array! To be honest I’m a little baffled that this slipped through with the original release of the data type — how can convertToDatabaseValue() and convertToPHPValue() have different logic for null values? It makes very little sense for conversions between userland code and database to differ in this regard.
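The asymmetry is easier to see in isolation. The two functions below are simplified stand-ins that I wrote to mirror the null handling described above. They are not Doctrine's actual code, and the function names are hypothetical:

```php
<?php
// Stand-in for convertToDatabaseValue(): null is passed through untouched.
function toDatabaseValue($value)
{
    return $value === null ? null : json_encode($value);
}

// Stand-in for convertToPHPValue(): null is turned into an empty array.
function toPhpValue($value)
{
    if ($value === null) {
        return [];  // the surprising part: null does not stay null
    }
    return json_decode($value, true);
}

$stored   = toDatabaseValue(null);  // null goes into the column...
$hydrated = toPhpValue($stored);    // ...but an empty array comes back out

var_dump($stored, $hydrated);
```

An entity saved with null and then reloaded would therefore no longer compare equal to what was saved.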

Will Doctrine be fixing this?

Not anytime soon. Bug report DBAL-446 is still open and unresolved, but a pull request to fix the bug was rejected and closed. Fixes that break backwards compatibility are not permitted before the next major release, which in this case would be 3.0 or later. This is a case of “once a bug, always a bug” due to overly strict versioning requirements.

A custom data type to the rescue

So we’re left implementing a custom data type in Doctrine that does nothing more than replace the return of an empty array with null. I’ve gone ahead and made the data type compatible with Doctrine 2.4 as well as 2.5, where platforms that have a native JSON column type will use it as expected.

<?php

namespace Path\To\Custom\Type;

/**
 * Custom Doctrine data type for JSON.
 *
 * Doctrine has a json_array type but, as its name suggests, it was designed with
 * only arrays in mind. This extending type fixes a bug with the json_array type
 * wherein a null value in database gets converted to an empty array.
 *
 * IMPORTANT NOTE: you must register custom types with Doctrine:
 *      \Doctrine\DBAL\Types\Type::addType('json', '\Path\To\Custom\Type\Json');
 *
 * @link https://github.com/doctrine/dbal/issues/1643
 * @link https://github.com/doctrine/dbal/pull/655
 */
class Json extends \Doctrine\DBAL\Types\JsonArrayType
{
    /**
     * Made to be compatible with Doctrine 2.4 and 2.5; 2.5 added getJsonTypeDeclarationSQL().
     *
     * {@inheritdoc}
     */
    public function getSQLDeclaration(array $fieldDeclaration, \Doctrine\DBAL\Platforms\AbstractPlatform $platform)
    {
        return method_exists($platform, 'getJsonTypeDeclarationSQL') ? (
            $platform->getJsonTypeDeclarationSQL($fieldDeclaration)
        ) : $platform->getClobTypeDeclarationSQL($fieldDeclaration);
    }

    /**
     * When database value is null, we return null instead of empty array like our parent does.
     *
     * {@inheritdoc}
     */
    public function convertToPHPValue($value, \Doctrine\DBAL\Platforms\AbstractPlatform $platform)
    {
        return $value === null ? null : parent::convertToPHPValue($value, $platform);
    }

    /**
     * {@inheritdoc}
     */
    public function getName()
    {
        return 'json';
    }
} 

Protocol wrappers – fetch a URL as an HTTP 1.1 client

Protocol wrappers – an alternative to cURL

When the need arises to fetch the contents of a URL in PHP, it can often be convenient to skip over the hassle of using cURL and instead use PHP’s built-in support for protocol wrappers. This feature lets us use convenient functions like file_get_contents and fopen to interact with data streams sourced from somewhere other than the local filesystem – in this case from a remote server over the network in the form of a URL. Please note that the allow_url_fopen php.ini configuration parameter must be enabled.

The first attempt at using file_get_contents to pull down the contents of a URL looks like this:

<?php
$html = file_get_contents('http://example.com/');

That worked well

In fact that’s technically all that’s needed. The request made to the server looks like this:

GET / HTTP/1.0
Host: example.com

Wait a second, does that really say HTTP/1.0? Version 1.0 of the HTTP protocol is so ancient! It turns out that the version of the HTTP protocol used matters very little, if at all. What matters is that we pass a Host header, something that PHP does automatically. The Host header allows for one web server to serve content for multiple domains, a concept introduced with HTTP 1.1. When the Host header gained traction, web servers made sure to accept it – even from HTTP 1.0 clients.

What to do if that 1.0 version is nagging at you? Really, it shouldn’t. That is, unless you’re like me and like everything to be neat and tidy. No problem, PHP allows you to upgrade the protocol version to 1.1 with a feature called stream contexts:

<?php
$html = file_get_contents('http://example.com', false, stream_context_create([
    'http' => [
        'method'           => 'POST',
        'protocol_version' => 1.1,
    ],
]));

There, our requests are now being formatted like a good HTTP 1.1 client. I included a change of the HTTP method from the default GET to a POST simply to illustrate that the context options can be used to manipulate the request in other ways. Check out the list of available context options to see how to pass POST data, set additional HTTP headers, and control socket timeouts and server redirects. The server is now receiving this:

POST / HTTP/1.1
Host: example.com
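As a rough sketch of what those other context options look like in practice, here is a helper that builds a context for a form-style POST with a timeout and a redirect limit. The helper name, the field values, and the chosen limits are my own assumptions; the option keys themselves ('content', 'timeout', 'follow_location', 'max_redirects') come from PHP's list of HTTP context options:

```php
<?php
// Hypothetical helper: builds a stream context for a form-style POST.
function buildPostContext(array $fields, $timeoutSeconds = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'          => 'POST',
            'header'          => [
                'Content-Type: application/x-www-form-urlencoded',
            ],
            'content'         => http_build_query($fields), // the POST body
            'timeout'         => $timeoutSeconds,           // read timeout, in seconds
            'follow_location' => 1,                         // follow redirects...
            'max_redirects'   => 3,                         // ...but only a few
        ],
    ]);
}

$context = buildPostContext(['q' => 'doctrine json']);

// The actual request would look like this (URL is a placeholder):
// $html = file_get_contents('http://example.com/search', false, $context);

// Inspect what was configured without touching the network.
print_r(stream_context_get_options($context));
```

Building the context in one place keeps the request options out of the call site, which helps once more than one URL needs fetching.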

Almost there

There’s a catch to upgrading the protocol_version. HTTP 1.1 clients are supposed to support keep-alive connections, a feature designed to reduce TCP overhead by letting clients issue subsequent requests over the same network socket. By using HTTP 1.1, the server expects us to manage the keep-alive session, something PHP’s stream context is not prepared to handle. The side effect is a slow script, as the PHP engine will not return control to userland code until the socket is closed, which can take a long time. There is a simple solution in the form of a Connection: close request header, which informs the web server that we do not support keep-alive connections and that it should therefore terminate the socket connection the moment the response transfer is complete.

The final code snippet looks like this:

<?php
$html = file_get_contents('http://example.com', false, stream_context_create([
    'http' => [
        'protocol_version' => 1.1,
        'header'           => [
            'Connection: close',
        ],
    ],
]));

Finally, here’s the request being received by the server:

GET / HTTP/1.1
Host: example.com
Connection: close

Done! Of course, you probably should have simply stuck with an HTTP/1.0 request unless you happen to have run into a server that won’t cooperate. ;)

IDE code completion for ArrayAccess via object properties

The method that used to work — and it did work

It turns out I don’t much like ArrayAccess in PHP, or at least the way it is being used in some projects. While it’s an interesting technique that lets you treat an object like an array, it has a negative effect on IDE tooling. Developers enduring the magic methods at least have the ability to use PHPDoc @property tags on classes, which not only document the dynamic properties but also make the IDE aware of their existence so that code completion is available.

With the magic methods, the approach looks like this:

<?php
/**
 * @property Inbox $inbox
 */
class User {
    public function __get($name) {
        switch ($name) {
            case 'inbox':
                return $this->$name;
        }
    }

    // also handle __set, __isset, __unset
}

$user = new User();
$numMessages = $user->inbox->getNewMessageCount();

In a modern IDE, the @property PHPDoc tag is parsed and even though $inbox is not explicitly declared by the class, its presence is inferred and code completion works as expected. When entering $user->, the $inbox property is shown as available. Better yet, when entering $user->inbox->, the IDE knows that $inbox is typed to class Inbox, and so all of its properties and methods are available to choose from.
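The snippet above only stubs out __get. For reference, a minimal runnable version with all four magic methods might look like the following. The private $data array as backing store and the placeholder Inbox class are my own assumptions, not something the original class is known to use:

```php
<?php
// Placeholder class so the example runs on its own.
class Inbox
{
    public function getNewMessageCount() { return 0; } // dummy value
}

/**
 * @property Inbox $inbox
 */
class User
{
    /** @var array backing store for the dynamic properties */
    private $data = [];

    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }

    public function __set($name, $value) { $this->data[$name] = $value; }
    public function __isset($name) { return isset($this->data[$name]); }
    public function __unset($name) { unset($this->data[$name]); }
}

$user = new User();
$user->inbox = new Inbox();                         // routed through __set
$numMessages = $user->inbox->getNewMessageCount();  // routed through __get
```

The @property tag is what the IDE reads; the magic methods are only there to make the property actually work at runtime.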

How ArrayAccess has broken the old method

ArrayAccess breaks this convention. There is no way to tell the IDE that $user['inbox'] is an instance of the Inbox class, at least not without adding /** @var Inbox $inbox */ in a dozen places throughout the codebase. Suddenly every value accessed through an ArrayAccess container has no type context. This is disastrous to the workflow of coding with a modern IDE.

The first full-fledged use of ArrayAccess I came across was in the Silex framework for PHP. The single most important object of the framework is an instance of \Silex\Application, created to house an entire application. This class, through its superclass, implements ArrayAccess. The application object performs the duty of a dependency injection container for the framework, which basically means that as an application grows and begins to use more features, the number of objects and configuration values held within it is likely to increase dramatically.

Even if a developer can remember the indexes of all the objects held within the container (e.g. that $app['orm.em'] contains the Doctrine Entity Manager), the fact that the IDE cannot offer code completion after entering $app['orm.em']-> to show the availability of beginTransaction() and getRepository() makes coding much more time-consuming. Why would anyone want to spend time digging through code and documentation to find out what methods are available on an object rather than having the IDE instantly offer up the list? This is a step backwards to a time when the IDE was not well-versed in the ways of code analysis. I wanted to use this framework, but turning my powerful IDE into a glorified Notepad is a deal-breaker. What to do? It was time to do some tinkering in an effort to find a way to make code completion work again.
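To show the problem on a small scale, here is a sketch with a tiny ArrayAccess container and the inline @var workaround. Both classes are hypothetical stand-ins rather than Silex code, and the return type declarations assume PHP 7.1 or later:

```php
<?php
// Hypothetical service class, only here to have a method to complete.
class Mailer
{
    public function send($to) { return "sent to {$to}"; }
}

// Hypothetical minimal container standing in for the framework's application object.
class Container implements \ArrayAccess
{
    private $values = [];

    public function offsetExists($id): bool { return isset($this->values[$id]); }
    #[\ReturnTypeWillChange]
    public function offsetGet($id) { return $this->values[$id]; }
    public function offsetSet($id, $value): void { $this->values[$id] = $value; }
    public function offsetUnset($id): void { unset($this->values[$id]); }
}

$app = new Container();
$app['mailer'] = new Mailer();

// Works at runtime, but the IDE cannot know what $app['mailer'] is,
// so there is no completion for send() here.
$result = $app['mailer']->send('someone@example.com');

// The workaround: pull the value into a local variable and annotate it,
// every single time it is needed.
/** @var Mailer $mailer */
$mailer = $app['mailer'];
$result = $mailer->send('someone@example.com');
```

Both calls do exactly the same thing at runtime; only the second gives the IDE something to work with, and it costs two extra lines per use.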

A workaround to bring back the old method

The solution? We’re going back to the magic methods: __set, __get, __isset, __unset. They may not be pretty, but they offer similar functionality to ArrayAccess, allow for dynamic object properties that can be documented and made available to the IDE, and best of all: I managed to come up with a solution that allows extending Silex to do what I want. Below is a reusable trait for use with classes implementing ArrayAccess. It re-introduces the existing solution of @property declarations for keeping the IDE informed, and does a little magic to proxy requests between object properties and ArrayAccess indexes.

<?php
trait ArrayAccessPropertyAliases {
    protected $arrayAccessAliases = [];
 
    public function processArrayAccessPropertyAliases() {
        $this->arrayAccessAliases = preg_match_all(
            '/^\s*+\*\s*+@property[^$]++\$(\S++)\s++array-access\s*+=\s*+(["\'])((?:(?!\2).)++)\2/m',
            (new \ReflectionClass($this))->getDocComment(), $aliases
        ) ? array_combine($aliases[1], $aliases[3]) : [];
    }
 
    public function __set($id, $value) { $this->doPropertyAccess('offsetSet', $id, $value); }
    public function __unset($id) { $this->doPropertyAccess('offsetUnset', $id); }
    public function __isset($id) { return $this->doPropertyAccess('offsetExists', $id); }
    public function __get($id) { return $this->doPropertyAccess('offsetGet', $id); }
 
    public function offsetSet($id, $value) { $this->doArrayAccess('offsetSet', $id, $value); }
    public function offsetUnset($id) { $this->doArrayAccess('offsetUnset', $id); }
    public function offsetExists($id) { return $this->doArrayAccess('offsetExists', $id); }
    public function offsetGet($id) { return $this->doArrayAccess('offsetGet', $id); }
 
    protected function doPropertyAccess($method, $name, $value = null) {
        $actualId = isset($this->arrayAccessAliases[$name]) ? $this->arrayAccessAliases[$name] : $name;
        if ($actualId !== $name && parent::offsetExists($name)) {
            throw new \Exception("Property '{$name}' conflicts with contents of ArrayAccess - please rename it");
        }
        return parent::$method($actualId, $value);
    }
 
    protected function doArrayAccess($method, $name, $value = null) {
        if (isset($this->arrayAccessAliases[$name]) && $this->arrayAccessAliases[$name] !== $name) {
            throw new \Exception("Property '{$name}' conflicts with contents of ArrayAccess - please rename it");
        }
        return parent::$method($name, $value);
    }
}

The fully documented version is available on GitHub. If you plan on using this, please read the documentation at the top of the file to understand the potential risk of using this solution. The important thing to note is that this trait is not safe to use with any class that defines any of the magic methods (__set, __get, __isset, __unset). It should be possible to refactor the trait to work with such classes, but I didn’t factor such requirements into this solution as I didn’t need it.

So how does it work? The workaround starts with proxying access of magic object properties to the ArrayAccess index of the same name. However, the most interesting and useful aspect of this solution is its ability to alias object properties to indexes of a different name within the container. This is necessary in the case of indexes that are not valid PHP identifiers, such as '!foo.bar!'. The trait makes it possible to alias the object property $foo_bar to the underlying '!foo.bar!' index. As a preventative measure, the possibility of conflicting names between property and index names is also considered. If the ArrayAccess were to contain indexes for both 'foo.bar' and 'foo_bar', a friendly reminder in the form of an exception will be provided if an attempt is made to use 'foo_bar' as an alias for 'foo.bar'. This will prevent hard-to-debug problems caused by confusion over which index was desired.
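The alias lookup hinges on the regular expression in processArrayAccessPropertyAliases(). The sketch below runs that same pattern against a plain string instead of a reflected doc comment, to show what it extracts. The wrapper function name is mine and is not part of the trait:

```php
<?php
// Same pattern as in the trait; the function wrapper is only for illustration.
function parseArrayAccessAliases($docComment)
{
    $pattern = '/^\s*+\*\s*+@property[^$]++\$(\S++)\s++array-access\s*+=\s*+(["\'])((?:(?!\2).)++)\2/m';

    // Group 1 is the property name, group 3 the quoted ArrayAccess index.
    return preg_match_all($pattern, $docComment, $aliases)
        ? array_combine($aliases[1], $aliases[3])
        : [];
}

$docComment = <<<'DOC'
/**
 * @property bool $debug Toggle app debug mode
 * @property \Doctrine\DBAL\Connection $db
 * @property \Doctrine\ORM\EntityManager $orm_em array-access='orm.em'
 */
DOC;

// Only @property lines carrying an array-access='...' suffix produce an alias.
print_r(parseArrayAccessAliases($docComment));
```

Properties without the array-access suffix, such as $debug and $db here, are simply proxied to the index of the same name, so they never need to appear in the alias map.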

The end result

Back to Silex for a moment. Here’s the “fix” to the framework to bring back IDE code completion:

<?php
namespace Project;

/**
 * @property bool $debug Toggle app debug mode
 * @property \Doctrine\DBAL\Connection $db
 * @property \Doctrine\ORM\EntityManager $orm_em array-access='orm.em'
 */
class Application extends \Silex\Application {
    use \ArrayAccessPropertyAliases;

    public function __construct(array $values = []) {
        $this->processArrayAccessPropertyAliases();
        parent::__construct($values);
    }
}

$app = new \Project\Application();
$app->debug = true; // instead of $app['debug'] = true;

$app->register(new \Silex\Provider\DoctrineServiceProvider(), [
    // this sets up $app['db']
]);

$app->register(new \Dflydev\Silex\Provider\DoctrineOrm\DoctrineOrmServiceProvider(), [
    // this sets up $app['orm.em']. the dot makes the name invalid for an object property,
    // however we mapped a @property declaration to use 'orm_em' in place of 'orm.em'
]);

$app->get('/', function(\Project\Application $app) {
    $user = $app->orm_em                                // IDE knows about $orm_em
                ->getRepository('Project:User')         // IDE knows about getRepository()
                ->findOneBy(['username' => 'foobar']);  // IDE knows about findOneBy()
});

Seems like overkill — there is a simpler alternative

If this seems like a lot of overhead just to get code completion for ArrayAccess members, there is a much simpler alternative: define accessor functions for the ArrayAccess indexes that need to be made available:

<?php
namespace Project;

class Application extends \Silex\Application {
    /**
     * @param bool $debug
     */
    public function setDebug($debug) {
        $this['debug'] = $debug;
    }

    /**
     * @return bool
     */
    public function getDebug() {
        return $this['debug'];
    }

    /**
     * @return \Doctrine\ORM\EntityManager
     */
    public function getOrmEm() {
        return $this['orm.em'];
    }
}

$app->get('/', function(\Project\Application $app) {
    $user = $app->getOrmEm()                            // explicit method with typed return
                ->getRepository('Project:User')         // works normally
                ->findOneBy(['username' => 'foobar']);  // works normally
});

Ultimately it comes down to preference — declare a lot of methods, or declare a lot of @property tags. Take your pick!