Feedback on static website development
As you can see, KNP has a new blog, yay! If you’re interested in learning what happened behind the scenes, this article is for you. We will talk about the technical aspects, the choices we made and how they evolved, and the place of internal projects at KNP.
A brief history
As you may know, we have the habit of giving ourselves a little break every six months with our famous hackathons. This year though, a freaking worldwide pandemic decided otherwise (maybe you heard about it).
Tough luck for the covid: as far as hackathons are concerned, we’re quite a stubborn bunch, and we insisted on hackathonizing ourselves about three months ago, despite being stuck in our respective cities.
Our people manager Eve wrote a very good article summarizing the different projects we worked on, which you can read here. On this occasion, we formed a little team and started grinding away at what would become the new blog you’re browsing.
Gatsby is a static site generator: you can think of it as an executable process that consumes whatever data source you want (a GitHub repository, static markdown files, any database, etc.) and turns the fetched resources into static files (HTML, CSS, etc.). If your source is a MySQL database with n articles in an article table, you’ll end up with as many HTML files as there are records in that table.
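The core mechanism can be sketched in a few lines of plain JavaScript. This is not Gatsby’s actual API, just the principle — one record in, one page out — and the `articles` array stands in for the rows of a hypothetical `article` table:

```javascript
// Records fetched from some data source (here hardcoded for the sketch).
const articles = [
  { slug: 'hello-world', title: 'Hello world', body: 'First post.' },
  { slug: 'static-sites', title: 'Static sites', body: 'Why we like them.' },
];

// The generator's whole job at build time: map every record to a static page.
function generatePages(records) {
  return records.map((a) => ({
    path: `/blog/${a.slug}.html`,
    html: `<article><h1>${a.title}</h1><p>${a.body}</p></article>`,
  }));
}

const pages = generatePages(articles);
console.log(pages.length); // as many pages as records: 2
console.log(pages[0].path); // /blog/hello-world.html
```

Once those files are written to disk, any dumb file server can serve the whole site — which is where all the benefits below come from.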
Here are some of the benefits that convinced us to use Gatsby. First, although we had no experience with Gatsby itself, the framework is built upon React, for which we have a strong background, as we use it on multiple client projects of various sizes. We know that those projects scale pretty well and can be properly tested, and we were confident in their long-term maintainability. The “experimental” aspect of the chosen technology was therefore controlled and reasonable, which is important even for internal projects (one may argue: even more so for internal projects).
Second, as the final generated files are plain old HTML and CSS, we gain on both the security and performance fronts. Security, because there is no longer a database to interact with, nor some server-side scripting layer to fetch and transform data, send mails or handle complex interactions with third-party services. SQL injection threats and the like simply vanish along with these layers, as the final exposed website no longer needs them. Performance, for much the same reasons, is deeply improved by removing the most common bottlenecks: multiple services interacting through virtual or physical networks. HTTP requests are limited to fetching the resources used by the HTML pages, period. With a proper caching strategy, the result can be blazing fast.
Of course, in our case as in most situations, a backend is still needed for article creation, editing, and every operation required to properly feed the blog. This backend may have its own database, which could fail, be unreachable, or slow down, but this highlights the best advantage of working with a static website: the frontend and backend are totally decoupled, which means one can fail while the other stays up. As long as the public files are served and reachable, you can sleep tight.
Last but not least, as performance is mandatory to ensure a good ranking in search engines, SEO with static websites is a breeze. Since each file embeds its own data statically, without waiting for any backend request to complete, crawling bots will index your website in the blink of an eye. This may sound silly, but after struggling for more than two years with SEO problems on SPAs, it is worth considering, not to mention that the blog plays a huge part in KNP’s visibility and in attracting prospective clients.
To be fair, one must also mention some of the drawbacks this technology implies. Because the frontend and the backend are no longer coupled, the immediate consequence is that the entire website's static files must be regenerated every time something changes on the backend. Fortunately, continuous integration and containerization help orchestrate and automate these tasks, as long as you’re ready to get your hands dirty with more devops-related tools. Things can also become a bit messy when it comes to integrating more dynamic components into your static pages, like a comment section for example.
Using Gatsby, you may have to rely on plugins, or build your own, which are not always well documented nor well maintained. Once more, the scope of your project will point you in the right direction. Static websites are, by definition, static, and this is a very restrictive constraint. Although it suits some projects very well, it’s also a highway to bloat if you try to use it for what it’s not meant to do.
Phew, time to sum things up a little bit. This is the final setup we want to achieve:
- Static frontend using Gatsby.
- Backend to write and publish content on the frontend. This part will be discussed in the next section.
- Continuous integration pipeline to synchronize changes made on the backend to the frontend. We will be using CircleCI for that part.
Choices and evolutions
During the hackathon, another team took care of migrating our knowledge base, formerly a GitHub repository, to a dedicated, more UX-friendly tool, and used Wiki.js to do so. When we first bootstrapped the new blog, we cherished the hope of directly using this wiki as a data source and, by doing so, avoiding multiplying internal tools. Wiki.js embeds its own editor, supports OAuth and other cool features that led us to think we could use it both for our internal knowledge base and as a back office for the blog.
The initial workflow we thought about looked like the following:
- An article is created or edited in the Wiki.js back office. Wiki.js offers the possibility to regularly back up articles as markdown files in a GitHub repository, which is convenient as a data source for Gatsby.
- A CircleCI workflow is triggered every time something changes in that GitHub repository.
- This workflow checks out the frontend project code, which contains the Gatsby configuration to generate the static files. A Docker production image is then built and deployed so the changes become visible on the public website.
Quickly though, problems with this workflow started to pile up dangerously, in a way we could not ignore. First, the limited job filtering options available on CircleCI (branch or tag) left us with a choice. We could run the workflow on every commit to the master branch of the articles repository, which would result in tremendous amounts of Docker images being built, pushed, and so on. Or we could arbitrarily define draft and published subfolders in the repository and use a tool to only run the CI when a commit touched the published folder. We were not pleased with this solution either, as it required the end user to open a pull request in order to move articles from draft to published. No big deal from a developer’s point of view, as it also offers a last chance to review what’s going to be published, but it was not compliant with our will to offer the simplest possible workflow for non-technical users.
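For the record, the filtering we would have needed boils down to something like this (a sketch only; the `draft/` and `published/` folder names are the arbitrary convention described above):

```javascript
// Given the list of file paths touched by a commit, decide whether the
// CI should rebuild and redeploy the site. Only changes under the
// published/ folder are worth a full build.
function shouldTriggerBuild(changedFiles) {
  return changedFiles.some((f) => f.startsWith('published/'));
}

console.log(shouldTriggerBuild(['draft/wip-article.md'])); // false
console.log(shouldTriggerBuild(['published/new-article.md'])); // true
```

Simple enough as code — the problem was never the logic, but the pull-request dance it forced on non-technical editors.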
Second, the data structure of the markdown files exported by Wiki.js lacked some crucial information we needed to build a beautiful frontend (publication date, author, category…). The way we were twisting both projects was starting to look like grounds for an entire stack of dismissal letters. The idea of having a single tool to feed both the blog and the knowledge base was attractive, but not realistic given the tool’s limitations.
After some digging, we decided to use Strapi as a dedicated back office for the blog, which came with everything we needed: simple collections and relationships, a handy markdown editor, and ready-to-consume endpoints for every collection. Icing on the cake, Strapi ships with webhooks we could rely on to trigger our deploy workflow. Some wandering and cleaning later, things finally started to shine again.
Here is the final workflow we aimed at:
- Editors write and publish articles from a Strapi back office. This BO exposes a REST API.
- We hook into Strapi collection events to fire the deploy workflow on CircleCI, using Strapi webhooks and the CircleCI pipeline trigger API.
- In that workflow, Gatsby connects to the Strapi API to pull sources, build static files, etc., just the same as before.
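The glue between the two services can be sketched as follows. This is an illustration, not our exact handler: the project slug `gh/knplabs/blog` and the token are placeholders, the event names follow Strapi’s webhook payload format, and the endpoint is CircleCI’s v2 “trigger a pipeline” API:

```javascript
// Turn a Strapi webhook event into a CircleCI v2 pipeline-trigger request.
// Returns null for events that should not rebuild the site.
function buildTriggerRequest(strapiEvent, circleToken) {
  const relevant = ['entry.create', 'entry.update', 'entry.delete', 'entry.publish'];
  if (!relevant.includes(strapiEvent.event)) return null;
  return {
    url: 'https://circleci.com/api/v2/project/gh/knplabs/blog/pipeline',
    method: 'POST',
    headers: { 'Circle-Token': circleToken, 'Content-Type': 'application/json' },
    body: JSON.stringify({ branch: 'master' }),
  };
}

const req = buildTriggerRequest({ event: 'entry.publish' }, 'xxx');
console.log(req && req.method); // POST
```

Filtering on the event type matters: without it, every media upload or admin tweak would cost a full rebuild and deploy.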
Here is a non-exhaustive list of things we learned during the development of this project, especially when the time came to recover production data from the former blog.
HTTPS break: we use the Traefik reverse proxy on both projects to generate certificates and force HTTPS requests. Some old articles embedded hardcoded absolute image links over HTTP, resulting in mixed-content warnings. We decided to rewrite those absolute URLs to systematically use HTTPS: either the targeted external source migrated to HTTPS and we’re good to go, or it didn’t and the image link will be broken. This is the expected behavior, as those images should be replaced to avoid security issues.
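The rewrite itself is small; here is a sketch of what it does to legacy markdown image links:

```javascript
// Force every hardcoded http:// image URL in a markdown body to https://.
// Sources that never migrated to HTTPS will show up broken, by design.
function forceHttpsImages(markdown) {
  // Matches markdown image syntax "![alt](" followed by an absolute http:// URL.
  return markdown.replace(/(!\[[^\]]*\]\()http:\/\//g, '$1https://');
}

const before = 'Look: ![logo](http://example.com/logo.png)';
console.log(forceHttpsImages(before));
// Look: ![logo](https://example.com/logo.png)
```

Note that only image links are touched; plain `http://` links in the article text are left alone, since they don’t trigger mixed-content warnings.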
Switch between the former and new blog: some research showed there is a debate on whether using a subdomain is a good practice when it comes to SEO. Concretely, is blog.knplabs.com/my-article crawled the same way as knplabs.com/blog/my-article? Well, there is no consensus on that matter. We decided to keep the URLs the same and, to do so, used a proxy pass.
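The routing the proxy pass performs can be sketched as a simple mapping: the historical public paths stay unchanged for visitors and crawlers, while requests are forwarded to different upstreams internally. The hostnames here are purely illustrative:

```javascript
// Map a public request path to the internal service that should answer it.
// Visitors keep seeing knplabs.com/blog/..., never the upstream hosts.
function upstreamFor(requestPath) {
  if (requestPath === '/blog' || requestPath.startsWith('/blog/')) {
    return 'http://blog-internal' + requestPath; // the new static blog
  }
  return 'http://www-internal' + requestPath; // everything else
}

console.log(upstreamFor('/blog/my-article')); // http://blog-internal/blog/my-article
```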
In the end, we’re quite happy with the result. This project required between 30 and 40 days to complete. At first sight, that’s a huge amount of time for such a small project, but it includes research, the discovery of several new tools, problems, workarounds, bad decisions, constant adjustments, etc.
We had the chance to work on this project in very good conditions, with a dedicated person almost fully committed to it, and this made a great difference compared to other internal projects. 40 days spread over one day per week for 40 weeks (!) is not the same as 40 days straight, especially in the build phase. Planning and clients do not always allow it, but we realized that internal projects have a much better chance of succeeding when you fully commit to them.