
Rsync – deleting a remote directory

Emptying a Remote Directory with Rsync

The fairly common task of emptying a remote directory is quite straightforward with rsync:

rsync -vr --delete $(mktemp -d)/ user@example.com:/path/to/dir/

If we only have rsync daemon access, we have two options for the syntax:

rsync -vr --delete $(mktemp -d)/ user@example.com::/module/path/to/dir/
rsync -vr --delete $(mktemp -d)/ rsync://user@example.com/module/path/to/dir/

The use of mktemp -d is simply to avoid having to separately create an empty directory which we will sync to the remote server. It creates an empty directory in your system’s temporary directory, typically /tmp/.

Deleting a Remote Directory Entirely

What if we want to delete both the contents of a remote directory, as well as the directory itself? What we want is the rsync equivalent to the following command:

ssh user@example.com rm -rf /path/to/dir/

If we have ssh shell access to the server, the command to execute is listed right above. But what do we do in the case wherein we only have rsync access? It takes a close reading of the man page to determine the proper arguments, or – if you happen to be me – a trip to the #linux IRC channel on freenode.net. The first iteration of the command you might discover looks like this:

rsync -vr --delete --include '/dir/**' --include '/dir/' --exclude='*' $(mktemp -d)/ user@example.com::/module/path/to/

We first specify that we wish to delete all contents of the directory (--include '/dir/**'), along with the directory itself (--include '/dir/'), whilst not affecting other subdirectories within /path/to/ (via --exclude '*'). Note that we must use the parent directory of the subdirectory we intend to delete as the root remote sync location: to delete the remote /path/to/dir/ directory, we sync against /path/to/ while specifying that the dir/ subdirectory and its contents are included in the deletion, and everything else is excluded.

The Proper Command

A thorough reading of the rsync(1) man page reveals the following note:

INCLUDE/EXCLUDE PATTERN RULES

  • a trailing “dir_name/***” will match both the directory (as if “dir_name/” had been specified) and everything in the directory (as if “dir_name/**” had been specified). This behavior was added in version 2.6.7.

This means we can simplify having to specify both the contents of a remote directory as well as the directory itself down to a single --include filter:

rsync -vr --delete --include '/dir/***' --exclude='*' $(mktemp -d)/ user@example.com::/module/path/to/

The three asterisks, as indicated by the man page quote above, mean we sync both the directory itself and its contents with a single filter. As we are syncing an empty directory, this results in the directory being removed entirely, just as if we had executed rm -rf /path/to/dir/.

Custom JSON data type for Doctrine

Existing support for JSON in Doctrine

It turns out Doctrine does ship with a built-in JSON data type. The name of the data type, json_array, hints at the potential for problems should one attempt to store values that are valid JSON without being an array. Inspecting the code for Doctrine’s \Doctrine\DBAL\Types\JsonArrayType class, the first thing we note is that Doctrine doesn’t put its nose in the values you store through this data type. It simply calls json_encode() to convert a PHP value to a column’s storage value, and json_decode() when hydrating your value from the database back to userland code. So far so good; it seems we can probably get away with using json_array for all JSON data.

The null problem

Sadly there is one problem: a column defined with the json_array type cannot properly handle the null value. When this data type was conceived, it was apparently imagined that it would only be useful to have support for JSON arrays rather than the JSON format in general — and thus the name json_array.

When converting an Entity’s value to the column’s storage value, Doctrine does in fact abide by null values. As can be seen by the definition of \Doctrine\DBAL\Types\JsonArrayType::convertToDatabaseValue(), if your Entity object contains the null value for a column defined with json_array, Doctrine will happily insert or update your database with null.

Unfortunately, the companion \Doctrine\DBAL\Types\JsonArrayType::convertToPHPValue() method does not follow the same logic. When reading the column’s value out from the database, the null value is converted to an empty array! To be honest I’m a little baffled that this slipped through with the original release of the data type — how can convertToDatabaseValue() and convertToPHPValue() have different logic for null values? It makes very little sense for conversions between userland code and database to differ in this regard.
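
To make the asymmetry concrete, here is a minimal sketch of the behaviour described above (assuming Doctrine DBAL 2.x with its built-in json_array type; MySQL is chosen purely as an example platform):

<?php
use Doctrine\DBAL\Types\Type;
use Doctrine\DBAL\Platforms\MySqlPlatform;

$type     = Type::getType('json_array');
$platform = new MySqlPlatform();

// null is preserved on the way into the database...
var_dump($type->convertToDatabaseValue(null, $platform)); // NULL

// ...but comes back out as an empty array when hydrating
var_dump($type->convertToPHPValue(null, $platform));      // array(0) { }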

Will Doctrine be fixing this?

Not anytime soon. Bug report DBAL-446 is still open and unresolved, but a pull request to fix the bug was rejected and closed. Fixes that break backwards compatibility are not permitted before the next major release, which in this case would be 3.0 or later. This is a case of “once a bug, always a bug” due to overly strict versioning requirements.

A custom data type to the rescue

So we’re left implementing a custom data type in Doctrine that does nothing more than return null instead of an empty array. I’ve gone ahead and made the data type compatible with Doctrine 2.4 as well as 2.5, where platforms with a native JSON column type will use it as expected.

<?php

namespace Path\To\Custom\Type;

/**
 * Custom Doctrine data type for JSON.
 *
 * Doctrine has a json_array type but, as its name suggests, it was designed with
 * only arrays in mind. This extending type fixes a bug with the json_array type
 * wherein a null value in database gets converted to an empty array.
 *
 * IMPORTANT NOTE: you must register custom types with Doctrine:
 *      \Doctrine\DBAL\Types\Type::addType('json', '\Path\To\Custom\Type\Json');
 *
 * @link https://github.com/doctrine/dbal/issues/1643
 * @link https://github.com/doctrine/dbal/pull/655
 */
class Json extends \Doctrine\DBAL\Types\JsonArrayType
{
    /**
     * Made to be compatible with Doctrine 2.4 and 2.5; 2.5 added getJsonTypeDeclarationSQL().
     *
     * {@inheritdoc}
     */
    public function getSQLDeclaration(array $fieldDeclaration, \Doctrine\DBAL\Platforms\AbstractPlatform $platform)
    {
        return method_exists($platform, 'getJsonTypeDeclarationSQL') ? (
            $platform->getJsonTypeDeclarationSQL($fieldDeclaration)
        ) : $platform->getClobTypeDeclarationSQL($fieldDeclaration);
    }

    /**
     * When database value is null, we return null instead of empty array like our parent does.
     *
     * {@inheritdoc}
     */
    public function convertToPHPValue($value, \Doctrine\DBAL\Platforms\AbstractPlatform $platform)
    {
        return $value === null ? null : parent::convertToPHPValue($value, $platform);
    }

    /**
     * {@inheritdoc}
     */
    public function getName()
    {
        return 'json';
    }
} 
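
As a usage sketch (the entity and column names below are illustrative, not part of the original code), register the type once during bootstrap and then map columns to it:

<?php
use Doctrine\ORM\Mapping as ORM;

// register the custom type with Doctrine before the EntityManager is built
\Doctrine\DBAL\Types\Type::addType('json', '\Path\To\Custom\Type\Json');

/**
 * @ORM\Entity
 */
class UserPreferences
{
    /**
     * A nullable JSON column: with the custom type above, a database NULL now
     * hydrates to PHP null instead of an empty array.
     *
     * @ORM\Column(type="json", nullable=true)
     */
    protected $settings;
}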

Amazon Glacier – the important lesson of backups vs. archiving

Peeking into the world of backups

I recently spent some time poring over Amazon Glacier as a candidate for use as a generic backup service for data I’d prefer not to lose anytime soon. The low cost of storage coupled with the anticipated durability typical of other AWS offerings had me believing I had probably found the perfect service to trust with my backups. What I learned during the course of my introduction to the service turned out to be interesting and – more importantly – eye-opening.

I mention Amazon Glacier throughout this post as it is the service I looked into. It is possible – maybe even likely – that the revelations I encountered apply to various other services known for performing data archiving.

Amazon Glacier as a generic backup solution?

Anyone looking for a backup service for personal use or within a small business might be surprised to discover a solution like Glacier may not be for them. This isn’t necessarily the impression one gets by reading through the overview page. Right from the beginning, the first line of text on the page seems to encourage readers to think of the service as a backup solution:

Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup.

To the uninitiated, we seem to be off to a good start. The rest of the page gives some situations in which the service is worth using: offsite enterprise information archiving, archiving media assets, archiving research and scientific data, digital preservation, and magnetic tape replacement. While the given list of scenarios is clearly targeted at enterprise needs, there is nothing written to preclude the use of Glacier as a more generic backup solution for the simpler needs of an individual or small business. Herein lies a possible trap: with no up-front indication to the contrary, one can make the mistake of believing the service is perfectly capable of serving the needs of a generic backup solution even though the marketing is aimed at the enterprise.

So what’s the catch?

Data backup is not data archival

Glacier, and other services offering data archival, are designed for use by businesses with data which is critical to hold on to. Such a service is not meant to serve as storage for data that would be missed if lost, but rather for critical data that must not be lost, at all costs. The difference between the words backup and archive may not be immediately apparent to the everyday citizen, but it turns out that in the world of IT there is a widely understood distinction.

A backup is nothing more than a copy of data, kept for the purpose of being able to recover the original data in the event it is modified to an undesired state, accidentally deleted, or lost. A backup might be stored on a USB flash drive, a second hard disk on the same computer, a separate server on the network, or another easily accessible location. A backup provides some peace of mind knowing we can most likely recover from a localized user error, software problem, or hardware failure.

An archive is not quite so different; in fact, it is little more than a backup with a pessimistic outlook on potential risk. Unlike a “normal” backup which is merely desirable to have, data should be archived when the modification, deletion, or loss of data is unacceptable. Archival is useful for the protection of essential data from disasters a business may face such as burglary, hurricanes, or fire. Many companies keep backups in the same building where the data was created as a first line of defense – but what happens if the office building burns to the ground? Having an archive of all mission-critical data means the business can be reassembled rather than dismantled. Another case is businesses where data retention is required to comply with regulatory requirements – often for a period of many years. It is not uncommon to archive every piece of email correspondence such that nothing can be tampered with or lost, in the event of future legal proceedings years after the communication took place.

So why not use a service like Glacier for generic backups?

Cost

Data archival performed seriously is not cheap. Before the dawn of archival solutions provided by services like Glacier, enterprises had little choice but to build up their own immense infrastructure to ensure data could not possibly be permanently lost. Assembling a system and process designed to practically guarantee the safeguarding of data is an astronomical undertaking, typically costing thousands upon thousands of dollars. The burden of this financial cost had to be paid up front and then maintained indefinitely, whether or not the safety net provided by the archive would ever be used – even once – over the course of the business’s entire lifetime.

Enter hosted archival services. At least in the case of Glacier, a business does not pay the astronomical up-front costs to store data that may never need to be retrieved. Glacier’s price for storing the data is a small fraction of the financial burden which would otherwise be taken on by having to build out the full infrastructure. The foundation of the service relies upon the premise that most customers will rarely – if ever – need to download data they have archived. Should data retrieval become necessary due to a catastrophic event, the cost of doing so may be expensive but not burdensome if it means the difference between the resurrection of the business and its demise.

It’s time for a car analogy. Archiving data with a service like Glacier is like buying car insurance with total loss protection. A small insurance premium is paid monthly to protect against the possibility of needing to replace the car; likewise, a small storage fee is paid to the company archiving the data to protect against the possibility of needing to retrieve that data. If the car is destroyed or stolen, a larger deductible is paid for the replacement of the vehicle; likewise, in the event the archived data must be retrieved, a larger fee is paid to the archival company for delivering a copy of the data back. Now imagine car insurance didn’t exist, but you were required to have replacement cars ready to take over at a moment’s notice in the event of an accident. Multiple new cars would have to be purchased up front, even though the original car may never actually need to be replaced. This is akin to the previous era of data archiving, in which the full financial burden of the infrastructure had to be paid up front to protect against the possibility of disaster. Which insurance policy sounds better?

The danger of Glacier for the uninitiated

Back to Amazon Glacier. Before writing this article, I was not aware of the distinction between data archiving and data backups. I initially believed I was going to be using Glacier for my non-critical backups. I had read the service’s overview page and it seemed to be the right service for me. I wasn’t an enterprise customer, but the description of how my data would be safe sounded just great. I then thoroughly read the entire pricing page to make sure I understood the costs associated with the service. Every other AWS service I have dealt with has straightforward pricing, and so it seemed for Glacier. The pricing tables held reasonable breakdowns, and here’s a screenshot showing the supposed cost of data retrievals (note the image is blown up here – on the page it is in the smaller fine print):

Misleading Amazon Glacier retrieval pricing

I’ve highlighted in yellow the parts that made me believe the costs were minimal. The first 5% is free, with further retrieval billed at $0.01 per GB – perfect. Alas! If it sounds too good to be true, it probably is. I initially failed to notice the dodgy marketing words “starting at” prefixing the “$0.01 per GB” figure. Worse than that, I neglected to click the little “Learn more” link – which leads to this section buried in the FAQ. This single FAQ answer lays out the true costs of Glacier, and it’s not easy to understand… at all. The formula used to determine the cost of data retrievals is complicated, to the point that the speed at which you retrieve data matters greatly. Frankly, I’ve read the formula description multiple times and still do not understand how one would calculate the cost of a retrieval.

In the end, my credit card came out the winner, as I had not yet begun to use Glacier when a friend of mine beat me to it. He had the same understanding of the pricing structure as I did, and performed his first retrieval for 45 GB. The cost came to just under $85 – nowhere remotely close to the $5.85 we would have expected based on the details on the pricing page [45 GB × $0.01 = $0.45 retrieval fee, plus 45 GB × $0.12 = $5.40 outbound bandwidth].

Apparently the formula used to calculate retrieval fees – using the speed of the retrieval and all – is fairly standard across data archiving services. Now that I understand the design principle behind long-term, hands-off data archiving as opposed to simple backups, it makes sense to me that retrievals would be costly. For everyone who is not an IT expert with archiving experience, hopefully Amazon will include more information about the retrieval costs on the pricing page instead of leaving it hidden in the FAQ with only a link in the fine print to lead us there.

Final words

For brave souls: the retrieval cost FAQ entry explains how it is possible to greatly reduce the large retrieval cost by downloading your data over a period of days or weeks. This does not fit the “I need it now” consumer mentality of backup retrievals, but if you are a small business or patient individual you might get away with using Glacier or a similar competitor without the risk of incurring enormous fees – if you can figure out the retrieval cost formula.

For the rest of us: remember the difference between a backup and archiving. Steer clear of archiving services unless you understand their purpose and associated costs. Stick to providers offering cheap backup solutions targeted at the consumer market.

Linode custom service provider for dynamic DNS on Synology NAS

I recently purchased a Synology NAS. The machine’s software comes with a built-in dynamic DNS client, and I wanted to take advantage of this feature. The software comes preconfigured with a number of dynamic DNS providers, but the list is understandably short. I host my domain along with its DNS on Linode, and wanted to use a subdomain on my Linode DNS to point to a machine that operates on a dynamic IP address.

I dug into the dynamic DNS client’s configuration file, which contained some documentation on how to add a custom service provider. I did just that and created a service provider for Linode. The installation instructions are at the top of the file. Note that proper setup will require a certain amount of experience with the command line over an ssh shell. An exact copy of the code below can also be found on GitHub.

#!/usr/bin/env php
<?php

/*
 * installation
 *
 * please note that this requires a certain amount of knowledge with
 * the ssh shell (vi, chmod, etc.). this process will not be easy
 * for someone without existing experience using the command line.
 *
 * 1. check the "editable configuration" section below. defaults should work out of the box
 * 2. ssh as root to the NAS (enabled via web manager: Control Panel > Terminal & SNMP > Enable SSH service)
 * 3. place this file on the NAS at /etc.defaults/ddns_linode.php (hint: 1. cat > 2. paste 3. ctrl+d)
 * 4. set file permissions: chmod 755 /etc.defaults/ddns_linode.php
 * 5. add a "Linode" section at the bottom of /etc.defaults/ddns_provider.conf:
 *      [Linode]
 *          modulepath=/etc.defaults/ddns_linode.php
 *          queryurl=linode.com
 * 6. now configure your linode details in the web manager (example for home.example.com)
 *      a) navigate to: Control Panel > External Access, click the "Add" button
 *      b) Service provider: select "Linode"
 *      c) Hostname: enter the SUBDOMAIN of your primary domain you have created at Linode (ex: home)
 *      d) Username/Email: enter the PRIMARY domain you have hosted with Linode DNS (ex: example.com)
 *      e) Password: enter your Linode API key - NOT YOUR LINODE PASSWORD
 */

/* editable configuration */

// writable temporary directory for log file and tracking last ip
const TEMP_DIR = '/var/services/tmp';

// the NAS passes what the current IP address is. if you'd rather use an external service
// to determine the IP, set this to a URL that outputs the IP of the accessing client
const IP_ALTERNATIVE = null;
//const IP_ALTERNATIVE = 'http://ip.dnsexit.com/';

// TTL of the DNS record in seconds. 3600 makes for a good default.
// 300 = 5 mins, 3600 = 1 hr, 7200 = 2 hrs, 14400 = 4 hrs, 28800 = 8 hrs, 86400 = 24 hrs
const LINODE_DNS_TTL = 3600;

/* end of editable configuration */

// map of cURL errors to NAS DDNS errors
$CURL_ERRORS = [
    CURLE_COULDNT_RESOLVE_HOST => 'badresolv',
    CURLE_OPERATION_TIMEOUTED  => 'badconn',
    CURLE_URL_MALFORMAT        => 'badagent',
];

// map of linode errors to NAS DDNS errors
$LINODE_ERRORS = [4 => 'badauth', 5 => 'nohost'];

class DnsException extends Exception {
    public function __construct($message, $code) { parent::__construct($message); $this->code = $code; }
}

try {
    // parse script arguments
    if (!preg_match('/^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$/', (string)@$argv[1], $match)) {
        throw new DnsException('synology ddns service provided invalid script arguments', '911');
    }
    list($domain, $apiKey, $subdomain, $ip) = array_slice($match, 1);

    // override current ip reported by NAS with an external service
    if (is_string(IP_ALTERNATIVE)) {
        $ip = curlPost(IP_ALTERNATIVE);
    }

    // extract the current ip from the source text
    if (($newIp = preg_match('/(?:\d{1,3}\.){3}\d{1,3}/', $ip, $match) ? $match[0] : false) === false) {
        throw new DnsException("new ip address not valid: '{$ip}'", '911');
    }

    // compare against last known ip
    if (($lastIp = (string)@file_get_contents($lastIpFile = TEMP_DIR . '/ddns_linode.lastip')) === $newIp) {
        throw new DnsException("ip address unchanged: '{$newIp}'", 'nochg');
    }

    // find linode primary domain record
    if (($domainId = array_reduce(linode($apiKey, 'domain.list'), function ($ret, $item) use ($domain) {
        return $item['DOMAIN'] === $domain ? $item['DOMAINID'] : $ret;
    })) === null) {
        throw new DnsException("linode domain zone not found: '{$domain}'", 'nohost');
    }

    // find linode subdomain record
    if (($subdomainId = array_reduce(linode(
        $apiKey, 'domain.resource.list', ['DomainID' => $domainId]
    ), function ($ret, $item) use ($subdomain) {
        return $item['NAME'] === $subdomain ? $item['RESOURCEID'] : $ret;
    })) === null) {
        throw new DnsException("linode subdomain not found: '{$subdomain}.{$domain}'", 'nohost');
    }

    // update the dns record
    linode($apiKey, 'domain.resource.update', [
        'DomainId' => $domainId, 'ResourceId' => $subdomainId, 'Target' => $newIp, 'TTL_sec' => LINODE_DNS_TTL
    ]);

    // store the new current ip as the last known ip
    @file_put_contents($lastIpFile, $newIp);
    throw new DnsException("ip successfully updated: '{$newIp}'", 'good');
} catch (DnsException $e) {
    // log entry
    @file_put_contents(TEMP_DIR . '/ddns_linode.log', sprintf(
        "%s : %s : %s\n", date('Y-m-d H:i T'), $e->getCode(), $e->getMessage()
    ), FILE_APPEND);

    echo $e->getCode() . "\n";
}

function linode($apiKey, $action, array $params = []) {
    if (!is_array($data = @json_decode($response = curlPost(
        'https://api.linode.com/', ['api_key' => $apiKey, 'api_action' => $action] + $params
    ), true))) {
        throw new DnsException("bad linode response: '" . trim(preg_replace('/\s+/', ' ', $response)) . "'", '911');
    }

    if (($linodeCode = @$data['ERRORARRAY'][0]['ERRORCODE']) !== null) {
        throw new DnsException("linode error: '{$data['ERRORARRAY'][0]['ERRORMESSAGE']}'", array_reduce(
            array_keys($GLOBALS['LINODE_ERRORS']), function ($ret, $item) use ($linodeCode) {
                return (int)$linodeCode === (int)$item ? $GLOBALS['LINODE_ERRORS'][$item] : $ret;
            }, '911'
        ));
    }

    return $data['DATA'];
}

function curlPost($url, array $params = []) {
    if (!($ch = @curl_init($url)) || !@curl_setopt_array($ch, [
        CURLOPT_POSTFIELDS     => http_build_query($params),
        CURLOPT_POST           => true,                      CURLOPT_CONNECTTIMEOUT => 8,
        CURLOPT_RETURNTRANSFER => true,                      CURLOPT_TIMEOUT        => 10,
    ]) || ($response = @curl_exec($ch)) === false) {
        $code = @$GLOBALS['CURL_ERRORS'][$ch ? @curl_errno($ch) : ''] ? : '911';
        throw new DnsException("failed curl request: '" . ($ch ? @curl_error($ch) : 'init error') .  "'", $code);
    }

    return $response;
}

Protocol wrappers – fetch a URL as an HTTP 1.1 client

Protocol wrappers – an alternative to cURL

When the need arises to fetch the contents of a URL in PHP, it can often be convenient to skip over the hassle of using cURL and instead use PHP’s built-in support for protocol wrappers. This feature lets us use convenient functions like file_get_contents and fopen to interact with data streams sourced from somewhere other than the local filesystem – in this case from a remote server over the network in the form of a URL. Please note that the allow_url_fopen php.ini configuration parameter must be enabled.

The first attempt at using file_get_contents to pull down the contents of a URL looks like this:

<?php
$html = file_get_contents('http://example.com/');

That worked well

In fact that’s technically all that’s needed. The request made to the server looks like this:

GET / HTTP/1.0
Host: example.com

Wait a second, does that really say HTTP/1.0? Version 1.0 of the HTTP protocol is so ancient! It turns out that the version of the HTTP protocol used matters very little, if at all. What matters is that we pass a Host header, something that PHP does automatically. The Host header allows for one web server to serve content for multiple domains, a concept introduced with HTTP 1.1. When the Host header gained traction, web servers made sure to accept it – even from HTTP 1.0 clients.

What to do if that 1.0 version is nagging at you? Really, it shouldn’t. That is, unless you’re like me and like everything to be neat and tidy. No problem, PHP allows you to upgrade the protocol version to 1.1 with a feature called stream contexts:

<?php
$html = file_get_contents('http://example.com', null, stream_context_create([
    'http' => [
        'method'           => 'POST',
        'protocol_version' => 1.1,
    ],
]));

There, our requests are now formatted like those of a good HTTP 1.1 client. I included a change of the HTTP method from the default GET to POST simply to illustrate that the context options can be used to manipulate the request in other ways. Check out the list of available context options to see how to pass POST data, set additional HTTP headers, and control socket timeouts and server redirects. The server is now receiving this:

POST / HTTP/1.1
Host: example.com
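
As an aside, here is a rough sketch of how the same context options can carry POST data, extra headers, and timeout/redirect behaviour (the URL and form field are made up for illustration; the option names are the documented http wrapper options):

<?php
$context = stream_context_create([
    'http' => [
        'method'           => 'POST',
        'protocol_version' => 1.1,
        'header'           => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content'          => http_build_query(['search' => 'example']),
        'timeout'          => 10,  // socket timeout in seconds
        'follow_location'  => 1,   // follow Location: redirects from the server
        'max_redirects'    => 3,
    ],
]);
$html = file_get_contents('http://example.com/', false, $context);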

Almost there

There’s a catch to upgrading the protocol_version. HTTP 1.1 clients are supposed to support keep-alive connections, a feature designed to reduce TCP overhead by letting clients issue subsequent requests over the same network socket. By using HTTP 1.1, the server expects us to manage the keep-alive session – something PHP’s stream context is not prepared to handle. The side effect is a slow script, as the PHP engine will not return control to user-land code until the socket is closed, which can take a long time. There is a simple solution in the form of a Connection: close request header that informs the web server we do not support keep-alive connections, and it should therefore terminate the socket connection the moment the response transfer is complete.

The final code snippet looks like this:

<?php
$html = file_get_contents('http://example.com', null, stream_context_create([
    'http' => [
        'protocol_version' => 1.1,
        'header'           => [
            'Connection: close',
        ],
    ],
]));

Finally, here’s the request being received by the server:

GET / HTTP/1.1
Host: example.com
Connection: close

Done! Of course, you probably should have simply stuck with an HTTP/1.0 request unless you happen to have run into a server that won’t cooperate. ;)

IDE code completion for ArrayAccess via object properties

The method that used to work — and it did work

It turns out I don’t much like ArrayAccess in PHP, or at least the way it is being used in some projects. While it’s an interesting technique that allows treating an object like an array, it has a negative effect on tooling in the IDE. Developers enduring the use of the magic methods at least have the ability to use PHPDoc @property tags on classes, not only to document the use of dynamic properties but also to make the IDE aware of their existence so that code completion is available.

With the magic methods, the approach looks like this:

<?php
/**
 * @property Inbox $inbox
 */
class User {
    /** @var array backing store for the dynamically exposed properties */
    private $values = [];

    public function __get($name) {
        switch ($name) {
            case 'inbox':
                return isset($this->values[$name]) ? $this->values[$name] : null;
        }
    }

    // also handle __set, __isset, __unset
}

$user = new User();
$numMessages = $user->inbox->getNewMessageCount();

In a modern IDE, the @property PHPDoc tag is parsed and even though $inbox is not explicitly declared by the class, its presence is inferred and code completion works as expected. When entering $user->, the $inbox property is shown as available. Better yet, when entering $user->inbox->, the IDE knows that $inbox is typed to class Inbox, and so all of its properties and methods are available to choose from.

How ArrayAccess has broken the old method

ArrayAccess breaks this convention. There is no way to indicate to the IDE that $user['inbox'] is an instance of the Inbox class, at least not without adding /** @var Inbox $inbox */ in a dozen places throughout the codebase. Suddenly every property accessed through an ArrayAccess container has no context. This is disastrous to the workflow of coding with a modern IDE. The first full-fledged use of ArrayAccess I have come across is in the Silex framework for PHP. The single most important object of the framework is an instance of \Silex\Application created to house an entire application. This class, through its superclass, implements ArrayAccess. The application object performs the duty of a dependency injection container for the framework, which basically means that as an application grows and begins to use more features, the number of objects and configurations held within it is likely to increase dramatically.
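
For reference, the per-call-site workaround mentioned above looks like this (a hypothetical fragment; the Inbox class and getNewMessageCount() come from the earlier example, and User is assumed here to implement ArrayAccess):

<?php
/** @var Inbox $inbox */
$inbox = $user['inbox']; // assumes a User class that implements ArrayAccess
$numMessages = $inbox->getNewMessageCount(); // completion works, but the @var must be repeated everywhere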

Even if a developer can remember the indexes of all the objects held within the container (ex: that $app['orm.em'] contains the Doctrine Entity Manager), the fact that it is not possible for the IDE to offer code completion after entering $app['orm.em']-> to show the availability of beginTransaction() and getRepository() makes coding much more time-consuming. Why would anyone want to spend time digging through code and documentation to find out what methods are available on an object rather than having the IDE instantly offer up the list? This is a step backwards to a time when the IDE was not well-versed in the ways of code analysis. I wanted to use this framework, but turning my powerful IDE into a glorified Notepad is a deal-breaker. What to do? It was time to do some tinkering in an effort to find a way to make code completion work again.

A workaround to bring back the old method

The solution? We’re going back to the magic methods: __set, __get, __isset, __unset. They may not be pretty, but they offer similar functionality to ArrayAccess, allow for dynamic object properties that can be documented and made available to the IDE, and best of all: I managed to come up with a solution that allows extending Silex to do what I want. Below is a reusable trait for use with classes implementing ArrayAccess. It re-introduces the existing solution of @property declarations for keeping the IDE informed, and does a little magic to proxy requests between object properties and ArrayAccess indexes.

<?php
trait ArrayAccessPropertyAliases {
    protected $arrayAccessAliases = [];
 
    public function processArrayAccessPropertyAliases() {
        $this->arrayAccessAliases = preg_match_all(
            '/^\s*+\*\s*+@property[^$]++\$(\S++)\s++array-access\s*+=\s*+(["\'])((?:(?!\2).)++)\2/m',
            (new \ReflectionClass($this))->getDocComment(), $aliases
        ) ? array_combine($aliases[1], $aliases[3]) : [];
    }
 
    public function __set($id, $value) { $this->doPropertyAccess('offsetSet', $id, $value); }
    public function __unset($id) { $this->doPropertyAccess('offsetUnset', $id); }
    public function __isset($id) { return $this->doPropertyAccess('offsetExists', $id); }
    public function __get($id) { return $this->doPropertyAccess('offsetGet', $id); }
 
    public function offsetSet($id, $value) { $this->doArrayAccess('offsetSet', $id, $value); }
    public function offsetUnset($id) { $this->doArrayAccess('offsetUnset', $id); }
    public function offsetExists($id) { return $this->doArrayAccess('offsetExists', $id); }
    public function offsetGet($id) { return $this->doArrayAccess('offsetGet', $id); }
 
    protected function doPropertyAccess($method, $name, $value = null) {
        $actualId = isset($this->arrayAccessAliases[$name]) ? $this->arrayAccessAliases[$name] : $name;
        if ($actualId !== $name && parent::offsetExists($name)) {
            throw new \Exception("Property '{$name}' conflicts with contents of ArrayAccess - please rename it");
        }
        return parent::$method($actualId, $value);
    }
 
    protected function doArrayAccess($method, $name, $value = null) {
        if (isset($this->arrayAccessAliases[$name]) && $this->arrayAccessAliases[$name] !== $name) {
            throw new \Exception("Property '{$name}' conflicts with contents of ArrayAccess - please rename it");
        }
        return parent::$method($name, $value);
    }
}

The fully documented version is available on GitHub. If you plan on using this, please read the documentation at the top of the file to understand the potential risks of this solution. The important thing to note is that this trait is not safe to use with any class that defines any of the magic methods (__set, __get, __isset, __unset). It should be possible to refactor the trait to work with such classes, but I didn’t factor such requirements into this solution as I didn’t need them.

So how does it work? The workaround starts with proxying access of magic object properties to the ArrayAccess index of the same name. However, the most interesting and useful aspect of this solution is its ability to alias object properties to indexes of a different name within the container. This is necessary in the case of indexes that are not valid PHP identifiers, such as '!foo.bar!'. The trait makes it possible to alias the object property $foo_bar to the underlying '!foo.bar!' index. As a preventative measure, the possibility of conflicts between property and index names is also considered. If the ArrayAccess were to contain indexes for both 'foo.bar' and 'foo_bar', a friendly reminder in the form of an exception will be provided if an attempt is made to use 'foo_bar' as an alias for 'foo.bar'. This prevents hard-to-debug problems caused by confusion over which index was intended.
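
Here is a small, self-contained sketch of the aliasing in action. The BaseContainer class below is only a stand-in for a real container such as Pimple/Silex, and the index names are illustrative; the trait is the one defined above.

<?php
// minimal stand-in for an ArrayAccess container such as Pimple
class BaseContainer implements ArrayAccess {
    private $values = [];
    public function __construct(array $values = []) { $this->values = $values; }
    public function offsetSet($id, $value) { $this->values[$id] = $value; }
    public function offsetGet($id) { return $this->values[$id]; }
    public function offsetExists($id) { return array_key_exists($id, $this->values); }
    public function offsetUnset($id) { unset($this->values[$id]); }
}

/**
 * @property string $foo_bar array-access='!foo.bar!'
 */
class AliasedContainer extends BaseContainer {
    use ArrayAccessPropertyAliases;

    public function __construct(array $values = []) {
        $this->processArrayAccessPropertyAliases();
        parent::__construct($values);
    }
}

$container = new AliasedContainer(['!foo.bar!' => 'hello']);
echo $container->foo_bar, "\n";     // "hello" - the property is proxied to the '!foo.bar!' index
echo $container['!foo.bar!'], "\n"; // the original index remains accessible as before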

The end result

Back to Silex for a moment. Here’s the “fix” to the framework to bring back IDE code completion:

<?php
namespace Project;

/**
 * @property bool $debug Toggle app debug mode
 * @property \Doctrine\DBAL\Connection $db
 * @property \Doctrine\ORM\EntityManager $orm_em array-access='orm.em'
 */
class Application extends \Silex\Application {
    use \ArrayAccessPropertyAliases;

    public function __construct(array $values = []) {
        $this->processArrayAccessPropertyAliases();
        parent::__construct($values);
    }
}

$app = new \Project\Application();
$app->debug = true; // instead of $app['debug'] = true;

$app->register(new \Silex\Provider\DoctrineServiceProvider(), [
    // this sets up $app['db']
]);

$app->register(new \Dflydev\Silex\Provider\DoctrineOrm\DoctrineOrmServiceProvider(), [
    // this sets up $app['orm.em']. the dot makes the name invalid for an object property,
    // however we mapped a @property declaration to use 'orm_em' in place of 'orm.em'
]);

$app->get('/', function(\Project\Application $app) {
    $user = $app->orm_em                                // IDE knows about $orm_em
                ->getRepository('Project:User')         // IDE knows about getRepository()
                ->findOneBy(['username' => 'foobar']);  // IDE knows about findOneBy()
});

Seems like overkill — there is a simpler alternative

If this seems like a lot of overhead just to get code completion for ArrayAccess members, there is a much simpler alternative: define accessor functions for the ArrayAccess indexes that need to be made available:

<?php
namespace Project;

class Application extends \Silex\Application {
    /**
     * @param bool $debug
     */
    public function setDebug($debug) {
        $this['debug'] = $debug;
    }

    /**
     * @return bool
     */
    public function getDebug() {
        return $this['debug'];
    }

    /**
     * @return \Doctrine\ORM\EntityManager
     */
    public function getOrmEm() {
        return $this['orm.em'];
    }
}

$app = new \Project\Application();

$app->get('/', function(\Project\Application $app) {
    $user = $app->getOrmEm()                            // explicit method with typed return
                ->getRepository('Project:User')         // works normally
                ->findOneBy(['username' => 'foobar']);  // works normally
});

Ultimately it comes down to preference — declare a lot of methods, or declare a lot of @property tags. Take your pick!