Hugo and IPFS: How this blog works (and instantly scales to handle 5,000% traffic spikes!)

Fun fact: if you are reading this article, you are using the distributed web. Since mid-February 2019, this blog, With Blue Ink, has been served through IPFS and the Cloudflare Distributed Web Gateway.

Last November, I published a blog post about how to run a static website from IPFS. Since then I have run several apps that way, ones that I use myself and with my family, and I decided it was time to migrate my blog too. Because I had to deal with some issues (some of which are explained below), it took longer than expected, but I flipped the (DNS) switch about a month ago and eventually shut down the single-instance virtual machine that had been hosting the blog.

This decision made me anxious, but for a month things have worked almost perfectly.

The Hacker News + Reddit effect

Something has happened since the migration to IPFS.

A little over a week ago, I published a blog post about the importance of normalizing Unicode strings. It went almost viral, peaking at 4th place on the Hacker News homepage, reaching the top of r/programming, and being included in some popular newsletters. (Thank you for the love and the wonderful discussions!)

Then, on Monday, I released a new open source project, Hereditas, which also got good exposure on Reddit.

For a blog that used to average fewer than 3,000 page views per month, this is what happened:

On Wednesday, March 13, traffic was 5,060% higher than on the same day a week earlier. In a single day, With Blue Ink got almost twice as many page views as in the entire previous month.

Despite the significant increase in traffic, this is the CPU usage of the primary IPFS node serving the website:

No.

With the content distributed over IPFS and served through the Cloudflare CDN, With Blue Ink saw barely any impact on performance or availability from a 5,000% traffic spike.

Not only that: tests have shown that the site has stayed very fast for users around the world.

Meet Hugo

With Blue Ink is a static website. I write the content in a bunch of Markdown files and then use Hugo to generate the HTML pages. The entire site (content, theme, scripts) is open source and published on GitHub at ItalyPaleAle/WithBlueInk.

Three years ago, when I started this blog, I initially chose another popular static site generator, Jekyll. However, while trying to migrate to IPFS, I had to replace Jekyll with Hugo because Jekyll does not support relative URLs. With Jekyll, all generated URLs start with a fixed base URL, which doesn't work when the content is browsed under a base URL that changes dynamically, as it does over IPFS (for more on why this matters, see my previous IPFS guide).
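To make the base URL problem concrete, here is a minimal Python sketch. It is not how Hugo or Jekyll work internally; the helper name relative_link, the example page paths, and the gateway.example host are all assumptions made up for illustration. It computes a link relative to the page that contains it and then shows that the same link resolves correctly under two different base URLs.

```python
import posixpath
from urllib.parse import urljoin


def relative_link(from_page: str, to_page: str) -> str:
    """Return the path to `to_page` relative to the directory of `from_page`.

    Both arguments are site-root-relative paths such as "/css/style.css".
    """
    from_dir = posixpath.dirname(from_page)
    return posixpath.relpath(to_page, start=from_dir)


# Hypothetical pages on a site.
link = relative_link("/2019/03/post/index.html", "/css/style.css")
print(link)  # ../../../css/style.css

# The same relative link resolves correctly under different base URLs.
for base in (
    "https://withblue.ink/",
    "https://gateway.example/ipns/withblue.ink/",
):
    page_url = urljoin(base, "2019/03/post/index.html")
    print(urljoin(page_url, link))
```

Because the link never mentions the site root, the browser resolves it against whatever URL the page was loaded from, which is what makes the site portable across gateways.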

Moving to Hugo brought some other big benefits. Hugo is a small application written in Go, and it is much faster than Jekyll, which is a Ruby gem. Not only is Hugo faster at building the website (it actually feels almost instantaneous), but because it is a single, self-contained binary, it also installs faster in a CI environment. My CI builds went from five minutes to less than a minute. In addition, Hugo has many powerful and interesting features, and it is actively maintained.

Meet IPFS

The InterPlanetary File System, or IPFS, is a protocol and network designed to distribute immutable content over a peer-to-peer network.

If you are not familiar with IPFS, think of it as a distributed CDN. Once you start an IPFS node, you can use it to publish documents on the IPFS network, and others around the world can request them directly from you. The best part is that as soon as someone requests a file, they immediately start seeding it to others. This means that with IPFS, the more popular a document is, the more it is replicated, and so the faster others can download it.

Distributing files over IPFS can be very fast and very resilient. Because it is distributed and peer-to-peer, the IPFS network is resistant to censorship and DDoS attacks.

In addition, all documents on IPFS are addressed by the hash of their content, so they are also tamper-proof: if someone changes even a single bit in a file, the entire hash changes, and so the address is different.
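The principle of content addressing can be illustrated with a short Python sketch. This is a simplification: real IPFS addresses are content identifiers built from a multihash over chunked data, not a plain SHA-256 hex digest, and the function name content_address and the sample bytes are assumptions chosen for the example.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in for an IPFS address."""
    return hashlib.sha256(data).hexdigest()


original = b"Hello from With Blue Ink"
# Flip the lowest bit of the first byte to simulate tampering.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

addr_original = content_address(original)
addr_tampered = content_address(tampered)

print(addr_original)
print(addr_tampered)
print(addr_original != addr_tampered)  # True
```

Anyone who fetches content by its address can recompute the hash and detect a mismatch, which is why a tampered file cannot be served under the original address.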

The catch with IPFS is that it is just a content distribution protocol, not a storage service. It is more like a CDN than a NAS. I still need some servers: I currently have three, configured as a cluster with IPFS Cluster. They are small, inexpensive Azure B1ms VMs (1 vCPU, 2 GB RAM) running in three different regions around the world. You can read how I set them up in the previous article.

Thanks to IPFS, this simple and relatively inexpensive solution is able to provide "100%" uptime and DDoS resistance. The site is automatically replicated to all nodes in the cluster, which start seeding it immediately, and because the virtual machines are geographically dispersed, the site is very fast for users worldwide.

Let's take a look at this architecture.

The blog's architecture is relatively simple:

Pushing new code to the main branch on GitHub triggers an automatic build in Azure Pipelines, which clones the source code and runs Hugo to build the site (and it's all free!). You can see the configuration in the azure-pipelines.yaml file in the repo.

Once the build is complete, Azure Pipelines automatically triggers a release job. The release pipeline has two stages (you can read how to configure them in the other IPFS article):

  1. Copy the files to one of the IPFS VMs, then invoke the ipfs-cluster-ctl pin add command over SSH to add the documents to the cluster and replicate them on all nodes.
  2. Make a REST call to the Cloudflare API to update the DNSLink, which is a TXT DNS record for _dnslink.withblue.ink containing the IPFS hash of the website.

While the first stage runs automatically, the second requires manual approval from an administrator (me!) before it can run. This lets me test and make sure that the site loads correctly via IPFS (using its full hash) before it is served to every visitor of withblue.ink.
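As a rough sketch of what the second stage has to produce, the Python below builds the DNSLink TXT record as a JSON body. It is not the author's release script. The field names (type, name, content, ttl) follow my understanding of Cloudflare's DNS record objects, the TTL of 120 seconds is an arbitrary assumption, "QmExampleHash" is a placeholder rather than a real hash, and the HTTP call itself (endpoint, zone and record IDs, authentication) is deliberately left out.

```python
import json


def dnslink_value(ipfs_hash: str) -> str:
    """Build the TXT record content that points a domain at an IPFS hash."""
    return f"dnslink=/ipfs/{ipfs_hash}"


def txt_record_payload(domain: str, ipfs_hash: str, ttl: int = 120) -> str:
    """Return a JSON body describing the _dnslink TXT record for `domain`."""
    record = {
        "type": "TXT",
        "name": f"_dnslink.{domain}",
        "content": dnslink_value(ipfs_hash),
        "ttl": ttl,  # assumed value, not taken from the article
    }
    return json.dumps(record, sort_keys=True)


# "QmExampleHash" is a placeholder, not a real IPFS hash.
print(txt_record_payload("withblue.ink", "QmExampleHash"))
```

Keeping the record update as a separate, manually approved step means the new hash can be checked through a gateway before the domain starts pointing at it.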

After the release pipeline is complete, anyone running the IPFS daemon can access the website with this IPFS address:

  /ipns/withblue.ink 

It's simple and easy to remember. But it only works for people who run the IPFS daemon or know how to use a gateway (for example, try it through gateway.ipfs.io).

If you want to try IPFS, the ipfs-companion extension for Firefox and Chrome lets you easily browse the IPFS network, using either an external gateway or a built-in one.

Most users, however, still use HTTP and a regular web browser, and this is where Cloudflare helps. Through its (free) Distributed Web Gateway, edge nodes in the Cloudflare network can act as IPFS gateways and serve documents published over the IPFS network. The setup is very simple, and if Cloudflare manages your DNS, you can even use the root domain (e.g. withblue.ink without www) thanks to CNAME flattening!

Lessons from real-world experience

I have been serving web applications over IPFS for six months, and this blog for over a month. Overall, my experience has been positive, but I have learned a few things worth sharing if you are considering using IPFS yourself.

What is going well?

In general, relying on IPFS has brought some interesting benefits.

  • "100%" uptime for documents on the IPFS network. As long as at least one peer is serving the content, either because it recently viewed the site (any kind of client) or because it has pinned it (my three servers), the blog can be accessed over IPFS.
  • Speed: the more users access the site over IPFS, the faster it gets for everyone else.
  • The site should also be naturally resistant to DDoS attacks.

In reality, though, most users don't access this blog over IPFS; they access it over HTTP(S) through the Cloudflare gateway. This still works quite well:

  • Because every document in IPFS is immutable, Cloudflare caches the website extensively on its edge nodes around the world. As long as the DNSLink stays the same, the CDN has no need to contact an upstream server to check for new content. Latency tests from multiple locations around the world show consistent, fast page load times. It is quite impressive to see the front page of the blog load fully (images included) in about 3 seconds with a clean cache, more or less consistently from every corner of the globe.
  • The setup is very simple. Things just work: all I had to do was point a CNAME at the Cloudflare gateway and ask them to enable a TLS certificate for my domain. There is no need to configure high availability, load balancing, content replication across multiple servers, and so on.
  • The Cloudflare CDN also does great things for you, including support for HTTPS and HTTP/2 (the successor to SPDY), gzip compression of responses, and more.
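The caching benefit described above can be sketched as a toy Python cache keyed by content hash. This is not how Cloudflare implements its gateway; the class name ImmutableContentCache and the placeholder hashes are assumptions for illustration. The point is that an entry keyed by an immutable hash never needs revalidation, and only a change to the mutable pointer (the DNSLink) causes a new upstream fetch.

```python
from typing import Callable, Dict


class ImmutableContentCache:
    """Toy cache keyed by content hash; entries never need revalidation."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # called only on a cache miss
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, content_hash: str) -> bytes:
        if content_hash not in self._store:
            self.upstream_calls += 1
            self._store[content_hash] = self._fetch(content_hash)
        return self._store[content_hash]


# Hypothetical upstream: maps placeholder hashes to site content.
upstream = {"hash-v1": b"<h1>old</h1>", "hash-v2": b"<h1>new</h1>"}
cache = ImmutableContentCache(upstream.__getitem__)

dnslink = "hash-v1"          # the mutable pointer
cache.get(dnslink)
cache.get(dnslink)           # served from cache, no upstream call
dnslink = "hash-v2"          # publishing a new version changes the pointer
cache.get(dnslink)
print(cache.upstream_calls)  # 2
```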

What I learned / what could be better

HTTP turned 30 this month, while IPFS is still a new technology. With IPFS, some things work differently from what we are used to, and others don't work at all.

  • IPFS is not serverless, and it is definitely not free. You need at least one server to seed your data. The good news is that it doesn't need to be large: a burstable 1-core VM provides enough CPU; however, if you are running IPFS Cluster, you need 2 GB of memory. Running three nodes like I do may be overkill (but it has been a great learning experience, and a lot of fun).
  • All URLs on your site must be relative. I explained this in detail in the previous article on IPFS. In short, because users can reach your website through multiple base URLs (in my case https://withblue.ink/, https://<gateway>/ipns/withblue.ink, or https://<gateway>/ipfs/<ipfs-hash>), you can't use absolute URLs in your HTML pages. This is also the main reason why I had to switch from Jekyll to Hugo.
  • As I wrote above, most users don't browse the website directly over IPFS but through Cloudflare. This means that our actual uptime depends on them. Although Cloudflare has worked very well for me so far, they don't offer an SLA for their free service, and it's even less clear whether the IPFS gateway has one. Sadly, I don't currently have data on how many visitors come in over IPFS, but I suspect it's only a few.
  • Some things are not available when using the Cloudflare IPFS gateway, including:
    • Custom HTTP headers can't be set. This can be a problem in two cases: when you want to enable HSTS (there is no way at all), and when you want to set the Content-Type manually (the IPFS gateway determines the content type from the file extension and some heuristics; see this issue).
    • There are no custom 404 pages.
    • No server-side analytics, not even through Cloudflare. Your only option is to use a hosted solution such as Google Analytics.
  • Another issue I noticed is that when you change the value of the DNSLink, the Cloudflare IPFS gateway does not always purge its cache reliably. It can take a few hours for everyone to see the latest content. This is the biggest problem I have encountered so far.
  • After updating the DNSLink value, there can be some cold-start delay. The first page takes a few extra seconds to load, but in my experience it's not too bad. This happens because the IPFS client in the Cloudflare gateway needs to traverse the DHT to find the nodes that serve your content. Once the content is replicated, it gets faster and faster, to the point where it is no longer a problem.
  • Finally, one problem I ran into when running an IPFS node is that it can use quite a bit of bandwidth just to keep the network working (not even to serve your content!). IPFS 0.4.19 greatly alleviated this, but my Azure virtual machines are still measuring outbound traffic of approximately 160 GB/month (it was more than 400 GB with IPFS 0.4.18).
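To put the bandwidth figures from the last point in perspective, this small Python calculation converts a monthly transfer volume into an average sustained rate. It assumes a 30-day month and decimal gigabytes, and it treats 400 GB as a lower bound since the measured figure was "more than" that; the function name average_mbps is made up for the example.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def average_mbps(gigabytes_per_month: float) -> float:
    """Convert a monthly transfer volume (decimal GB) to average Mbit/s."""
    bits = gigabytes_per_month * 1e9 * 8
    return bits / SECONDS_PER_MONTH / 1e6


print(round(average_mbps(160), 2))  # 0.49
print(round(average_mbps(400), 2))  # 1.23
```

In other words, roughly half a megabit per second of constant outbound traffic, which is small in throughput terms but can still matter when outbound data is billed by volume.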

Many of the issues above, including caching, cold-start time, server-side analytics, custom HTTP headers, and 404 pages, can be mitigated by running a custom IPFS gateway instead of relying on Cloudflare. The official ipfs.io website does the same; it is something I am considering if the caching issues with Cloudflare don't improve.

Original author: Alessandro Segala

Original link: https://withblue.ink/2019/03/20/hugo-and-ipfs-how-this-blog-works-and-scales.html