The first thing we do every night on KnpBundles.com is look for new Symfony2 bundles on Twitter, GitHub, Google… and add them to the database. Once this is done, the update process can actually start.
- First, the bundle is cloned on the server (or only updated if the bundle isn’t new)
- Once this is done, some information is fetched from the GitHub API (number of forks, latest description, number of watchers, homepage…).
- Then we fetch the new README, composer file, latest commits, tags
- Finally, we update the score
Before today, this entire process was performed in a single Symfony2 command. This worked well until it didn’t: a 5-hour-long PHP process running out of memory.
This was before RabbitMQ.
According to its website, RabbitMQ is a complete and highly reliable enterprise messaging system based on the emerging AMQP standard.
Cool, heh? Let’s see how we used that to improve our process.
The update process remains the same, RabbitMQ or not. What actually changed is the way it is performed. It’s no longer a single command which updates one bundle at a time, but a command that basically tells the RabbitMQ server “hey, this RabbitMQBundle needs to be updated”, “this KnpPaginatorBundle needs to be updated”, “this YetAnotherBootstrapBundle needs… wait, YET ANOTHER BOOTSTRAP BUNDLE?!”, etc.
Meanwhile, a few consumers are running. They will stay in standby mode, ready to update a bundle as soon as one needs to be updated.
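To make the producer/consumer split concrete, here is a minimal sketch of the pattern, not our actual code. It uses Python's standard-library `queue.Queue` and threads as a stand-in for the RabbitMQ server and the consumer processes; `update_bundle`, `consumer` and `run` are hypothetical names invented for the illustration.

```python
import queue
import threading


def update_bundle(name):
    # Hypothetical placeholder for the real work: clone or pull the
    # repository, query the GitHub API, refresh the README, update the score.
    return f"updated {name}"


def consumer(jobs, results):
    # A consumer blocks on the queue (standby) until a job shows up.
    while True:
        name = jobs.get()
        if name is None:  # sentinel value: no more work, shut down
            break
        results.append(update_bundle(name))


def run(bundle_names, consumer_count=3):
    jobs = queue.Queue()  # stand-in for the message broker
    results = []
    workers = [
        threading.Thread(target=consumer, args=(jobs, results))
        for _ in range(consumer_count)
    ]
    for worker in workers:
        worker.start()
    # The "producer" only says which bundles need an update.
    for name in bundle_names:
        jobs.put(name)
    # One sentinel per consumer so that each of them stops.
    for _ in workers:
        jobs.put(None)
    for worker in workers:
        worker.join()
    return results
```

The producer never does the heavy lifting itself: it only publishes names, and whichever consumer is free picks up the next one. That means the order in which bundles get updated is not guaranteed.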
So what are the benefits for KnpBundles.com? Well, it’s now possible to update multiple bundles at a time, and we no longer have a single huge PHP process hurting our beloved server. Last but not least, a bundle can now be updated as soon as it has been manually added to the website, without hurting anybody.
There were a few things we needed to change in order to make this work. For example, if you have had a look at KnpBundles.com’s internals, you may have seen some kind of array-based caching that prevented a bundle or a user from being created twice in the database. This worked as it was supposed to until we added some RabbitMQ into the mix, with many consumers running at the same time: each consumer is a separate process with its own in-memory array, so one consumer cannot see what another has just created. In this very particular case, we removed the array-based caching and went for asking the database instead. This results in a lot more queries but I’m pretty sure the server can handle it.
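Here is a rough sketch of what "asking the database instead" can look like. It is an illustration under assumptions, not our real schema: SQLite (via Python's standard `sqlite3` module) stands in for the production database, the `bundle` table and the `get_or_create_bundle` helper are made up, and the `UNIQUE` constraint is an extra safety net that the paragraph above does not mention.

```python
import sqlite3


def connect():
    # In-memory SQLite database, used here only for the illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bundle ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL UNIQUE)"  # assumption: uniqueness enforced by the database
    )
    return conn


def get_or_create_bundle(conn, name):
    # No local array cache: the database is the single source of truth.
    # INSERT OR IGNORE does nothing if a row with this name already exists.
    conn.execute("INSERT OR IGNORE INTO bundle (name) VALUES (?)", (name,))
    conn.commit()
    row = conn.execute(
        "SELECT id FROM bundle WHERE name = ?", (name,)
    ).fetchone()
    return row[0]
```

Calling `get_or_create_bundle` twice with the same name returns the same id and leaves a single row in the table, whichever consumer gets there first.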
There is one problem, once you’ve started toying with RabbitMQ, which is that you will want to use it everywhere. Seriously, don’t implement it thoughtlessly.