Socket.io: ur doing it wrong

I hate it when documentation doesn’t advocate best practices. Even worse, that means that virtually every tutorial on the bleeding internet copypastas those same bad practices. I’m looking at you here, socket.io!

What’s the matter? All official examples – and, hence, the entire interwebz – use the following structure with anonymous functions:

io.on('connection', function (socket) {
    socket.on('some-event', function () {
        //...do something...
    });
});

Seems reasonable, right? I used this same approach in our dating app, since that’s what everyone recommended. Who wouldn’t?

Well, woe on me.

A few days ago, as the app started gaining some traction, I was surprised by weird out-of-memory errors on the webserver. “That’s odd”, I thought to myself, “we have some traction, but nowhere near enough to cause these errors yet. Isn’t socket.io supposed to be, like, super-efficient?”

Well, yeah, but not when you’re Doing It Wrong.

See, using anonymous functions comes at a price: NodeJS creates a new closure for every handler on every connection. In other words, each client was consuming memory for every single event handler (and FlirtTracker has about 50 of them). Once I realised that, the memory usage started making sense.

The solution? Use named functions declared just once (or, as I did, ES6-style: anonymous functions assigned to a constant). Now NodeJS passes references to the same function objects and the whole memory problem goes away. (Well, for now – I guess it’ll pop up again when we really get some traction, but it would be premature optimization to worry about that now.)
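
Concretely, the fix looks something like this (the event and handler names are made up for illustration):

const onSomeEvent = function (data, callback) {
    // ...do something...
};

io.on('connection', function (socket) {
    // Every client now shares the same function object,
    // instead of getting a fresh closure per connection.
    socket.on('some-event', onSomeEvent);
});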

Something else that’s not very obvious from the official docs (at least, I didn’t spot it) is that event handlers are actually bound to the current socket. In my original code I was passing the socket object around to utility methods that then returned the actual handler. Something like this:

module.exports = socket => (data, callback) => {
    // ...do something...
};

…and then in the main function run on connection:

socket.on('someEvent', require('./my-event')(socket));

But, because the handler is bound to the socket, you can simply refer to it as this inside the handler:

socket.on('someEvent', function () {
    this.emit('someOtherEvent');
});

Actually, this solved a whole lot of other issues where I was passing all sorts of stuff around – I simply planted everything on the socket in the main file, and it was now available via this! Add in some smart helper methods on the socket and I was able to reduce memory usage even more. For example, we also use a remote object pointing to a DNode server. I optimized that to this:

// remote is the module-scoped connection to our DNode server.
const getRemote = () => remote;

io.on('connection', function (socket) {
    // Every socket shares the same getter function.
    socket.remote = getRemote;
});

(I used a method since I no longer trusted NodeJS to store a reference to the object instead of a copy – assigning an object never copies it in Javascript, so this was probably paranoia, but better safe than sorry.)

It even simplified our unit tests, for we could now use call() or apply() to bind a fake socket!
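
A quick sketch of what such a test might look like (assuming my-event.js now exports a handler that emits someOtherEvent, as in the snippet above; the fake socket and assertion are invented for the example):

const assert = require('assert');
const onSomeEvent = require('./my-event');

// A fake socket that records whatever gets emitted.
const fakeSocket = {
    emitted: [],
    emit(event, data) {
        this.emitted.push({ event, data });
    },
};

// call() binds the handler's this to our fake socket.
onSomeEvent.call(fakeSocket, { some: 'data' });

assert.strictEqual(fakeSocket.emitted[0].event, 'someOtherEvent');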

Last point to note: make sure the functions handling the events are actual functions, not arrow functions. I love arrow functions, but one of their defining features is that they don’t have their own this – they inherit it from the enclosing scope – so they can’t be bound to the socket. In that case this will simply be the module’s this, which is typically an empty object.
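
To illustrate (event names are made up):

// Works: a regular function, so socket.io can bind this to the socket.
socket.on('ping', function () {
    this.emit('pong');
});

// Broken: an arrow function inherits this from the module scope,
// so this.emit is undefined here.
socket.on('ping', () => {
    this.emit('pong'); // TypeError
});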

The poor man’s PHP daemon

We have this project lying around from a while back that’s based on PHP/AngularJS and also sports a socket.io server for (among other things) a real-time chat. Pretty nifty (no, we didn’t do the design :)). The socket part is powered by a NodeJS process, and since we didn’t feel like rewriting all our PHP code to Javascript (nor did the client have budget for that), we used dNode-PHP (well, actually a fork with a few small project-specific adjustments) to let the Javascript code in the NodeJS process communicate with our existing PHP libraries. So now we had two processes, which was suboptimal but worked at the time.
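
For context, the NodeJS side of that bridge looks roughly like this (the port and method name are invented; the actual methods are whatever the PHP side exposes through dNode-PHP):

const dnode = require('dnode');

// Connect to the dNode-PHP process.
const d = dnode.connect(7070);

d.on('remote', function (remote) {
    // remote proxies the methods our PHP libraries expose.
    remote.getUser(42, function (user) {
        console.log(user);
    });
});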

As a poor man’s solution (time pressure, limited budget, etc.) these processes were simply kept running in a permanent screen session on the server. In theory, that worked well enough for the time being – the idea was that once the client found new budget, we’d properly daemonize them. And our server doesn’t reboot that often anyway, so remembering to restart the screen and the processes on those occasions wasn’t a big deal.

Of course, that day never came.

There was, however, one annoying issue: PHP’s database resource would go stale after a while. According to the docs it should automatically reconnect; only it didn’t. This meant manually restarting those processes every once in a while. (We put the MySQL timeout at a week to alleviate the burden, but the exact moment remained random, depending on the last moment of activity. Of course, one could argue that a site that regularly sees no activity for over seven days isn’t worth the effort, but a) that wasn’t our problem and b) the client was still working on his marketing plan. Fair enough.)
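
(The setting in question is presumably wait_timeout; one week in seconds would look like this in my.cnf:)

[mysqld]
# Drop idle connections after a week instead of the 8-hour default.
wait_timeout = 604800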

Today I got fed up with it, had an hour to spare, and the marketing plan was ready as well. Time to bite the bullet; here’s what I came up with.

1. The NodeJS process

That part was easy; essentially I followed the instructions here. We use Debian rather than Ubuntu, but it should be similar on most *nix systems.
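
If your distribution uses systemd, the gist is a unit file along these lines (paths, names and user are made up):

[Unit]
Description=Socket.io server
After=network.target

[Service]
ExecStart=/usr/bin/node /var/www/app/socket-server.js
Restart=always
User=www-data

[Install]
WantedBy=multi-user.target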

2. The dNode-PHP process

This is where it got interesting; this was the actual bullet I’d been putting off biting. PHP isn’t very well suited to running as a daemon. It’s possible, but that doesn’t mean it’s desirable, in the same sense that writing out all your CSS in <script>document.write('<style>...<' + '/style>')</script> tags is only a theoretical option. But in this case it was still better than duplicating code in Javascript.

Now, there are ways to turn a PHP script into an actual daemon, but that was still overkill for our purposes, so I simply went with a cronjob that kills any existing process and restarts it every hour. If the client ever needs an actual daemon, we’ll get to that then 🙂
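
The crontab entry would look something like this (script name and paths are invented for the example):

# Restart the dNode-PHP bridge at the top of every hour.
0 * * * * pkill -f dnode-server.php; sleep 2; nohup php /var/www/app/dnode-server.php >> /var/log/dnode-php.log 2>&1 &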

Yes, this “daemon” is a beggar, but as they say: beggars can’t be choosers…