How This Site Works

#tech#frontend

I tend to over-engineer everything... including this website. I'm really proud of everything I've put into it, and today I'd like to take you through how it works.

Since When Are Websites "Engineered"? ¶

If you're not a computer scientist you might not think of websites as "engineered" or "programmed", but under the hood, a website is just a combination of scripts and code that controls how users view and interact with it. There are many aspects to consider, such as the server, the client, browser compatibility, performance, and accessibility, to name a few. I've invested way too much time and effort into each line of code I wrote to make this website work.

That's really all the qualifications I care about to call this website "engineering".

Here is my website repository if you're interested in viewing my code:

https://git.sr.ht/~bossley9/website

The Server ¶

First, we'll start with the server. In a previous post I wrote about how I switched to sourcehut for hosting. I used to host this website on a VPS (virtual private server), which is just a fancy phrase for "someone else's computer", but the constant maintenance and resource scaling were tedious. I don't even have anything special or dynamic on my website. It's just text with some images and videos. Sourcehut is a fantastic cheap alternative if you decide to host your own static site.

With sourcehut I use an aggressive cache-control strategy: all static assets (images, styles, fonts) are marked immutable with a max age of one year:

cache-control: max-age=31536000, public, immutable

This basically means that once you've viewed my site, your browser doesn't need to re-download the fonts, styles, images, and videos every time you revisit - it remembers them all. This makes my website feel extremely lightweight and makes page navigation feel immediate. Try navigating to my homepage and back. It should (hopefully) take only a few milliseconds.

How do you have the space to host all those big files on your website? ¶

I have gigabytes and gigabytes of images, streams, and audio files that I host on a NAS (network attached storage) using the same caching strategy via a Caddy webserver. This means I can add whatever videos and images I want to my website without ever having to worry about storage space (I still have terabytes free). I thought about hosting my streams and files on Youtube or a paid CDN (content delivery network), but I don't want to pay a monthly fee for ultimately less control.
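The Caddy side of this takes only a few lines. Here's a minimal sketch of the kind of Caddyfile I mean - the domain and path are placeholders, not my actual config:

cdn.example.com {
  root * /srv/media
  file_server
  header Cache-Control "max-age=31536000, public, immutable"
}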

Hosting my own content also means I can post whatever I want without the worry of someone copyright-claiming my video because it played an obscure song for 3 seconds (this happened every time I streamed on Youtube in the past). I really don't want various music corporations making money off of my content, and I try my best to give due credit to the external content I use on my website.

An old video of mine that was copyright-claimed and partially blocked in some countries

If you feel like I've used your music or content without providing credit, I'd be more than happy to credit you.

What about gzip compression? ¶

Most websites use gzip compression to reduce file sizes and speed up page loads. I used to do this when I paid for a VPS, but sourcehut doesn't compress responses on their server. I don't see a downside to serving uncompressed files - in my opinion, a website shouldn't need compression to be reasonably performant.

robots.txt ¶

robots.txt controls how crawlers interact with your website and is key to effective SEO. I've allowed most of the bigger search engines to index my site, but I block crawlers like OpenAI's ChatGPT because I don't want an AI to steal my blog content. I should remove Google too at some point in the future.

User-agent: *
Disallow: /

User-agent: Bingbot
User-agent: Googlebot
User-agent: DuckDuckBot
User-agent: archive.org_bot
Allow: /

The Framework ¶

Although the entire website is static HTML and CSS, I don't write raw HTML by hand because that would be too much duplicated work. I use a static site framework called Lume which allows me to leverage two of my favorite technologies, Deno and JSX. Deno lets me write TypeScript with a fast built-in compiler, and JSX lets me write HTML-like markup in JavaScript. I use my own JSX package called sjsx, which I wrote about in a previous post, and which allows me to write HTML using proper semantics while deduplicating markup. For example, I group all header meta tags into a layout to boost SEO consistently across all my blog posts. You can steal this snippet and include it on your own website:

function Meta({ title, site_title, author, desc, tags, image, currentUrl }) {
  return (
    <>
    <meta name="og:title" content={title} />
    <meta name="twitter:title" content={title} />

    <meta property="og:site_name" content={site_title} />
    <meta name="twitter:site" content={site_title} />
    <meta name="application-name" content={site_title} />
    <meta name="apple-mobile-web-app-title" content={site_title} />

    <meta name="author" content={author} />
    <meta name="twitter:creator" content={author} />

    <meta name="description" content={desc} />
    <meta name="og:description" content={desc} />
    <meta name="twitter:description" content={desc} />

    <meta name="keywords" content={tags.join(",")} />

    {image && (
      <>
        <meta name="og:image" content={image} />
        <meta name="twitter:image" content={image} />
      </>
    )}

    <meta name="og:url" content={currentUrl} />
    <link rel="canonical" href={currentUrl} />

    <meta name="twitter:card" content="summary" />
    <meta name="og:type" content="website" />
    </>
  )
}

Of course, some of these meta properties will likely change once X decides to rename them.

With this layout template I can automate SEO without having to even think about it.
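For context, the component just gets dropped into the <head> of my base layout. Here's a rough sketch of how that looks - the surrounding layout and prop plumbing are simplified assumptions, not my exact code:

function BaseLayout({ title, site_title, author, desc, tags, image, currentUrl, children }) {
  return (
    <html lang="en">
      <head>
        <meta charset="utf-8" />
        <title>{title}</title>
        <Meta
          title={title}
          site_title={site_title}
          author={author}
          desc={desc}
          tags={tags}
          image={image}
          currentUrl={currentUrl}
        />
      </head>
      <body>{children}</body>
    </html>
  );
}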

Lume ¶

Lume elegantly generates static pages using JavaScript generators and frontmatter. It makes it really easy to generate large quantities of categorized pages. For example, all these blog posts I write are generated from markdown like so:

export default function* ({ search }: Lume.Data) {
  const pages = search.pages("thought", "date=desc");
  for (const page of pages) {
    yield {
      layout: Layouts.BaseLayout,
      url: page.url,
      content: ( /* blog contents formatted here */ ),
    };
  }
}

Lume also follows the pages-router pattern popularized by Next.js, which makes it very easy to organize my project according to how the generated site will be structured. It greatly reduces my cognitive load.
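As a rough illustration (the file names here are made up, not my exact tree), the source directory maps one-to-one onto the generated URLs:

src/
  index.tsx          ->  /
  thoughts.tsx       ->  /thoughts/
  recs/
    books.tsx        ->  /recs/books/
    music.tsx        ->  /recs/music/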

Lume also has fantastic composability with plugins. Missing functionality? Just write your own plugin for it! For example, I wrote my own plugin to cache-bust my CSS so it can be cached indefinitely but is reloaded whenever the styles actually change:

import { encodeHex } from "jsr:@std/encoding/hex";

export function cacheBustCSS() {
  return (site: Lume.Site) => {
    site.process([".css"], async function (files: Lume.Page[]) {
      for (const file of files) {
        if (file.sourcePath === "/styles/main.scss") {
          // hash the stylesheet source so the URL changes whenever the styles do
          const hash = encodeHex(
            await crypto.subtle.digest(
              "SHA-256",
              new TextEncoder().encode(
                Deno.readTextFileSync("./src/styles/main.scss"),
              ),
            ),
          ).substring(0, 10);
          file.data.url = `/styles/${hash}.min.css`;
          // then update the hash on each page/layout
        }
      }
    });
  };
}
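Registering the plugin is then a single site.use() call in the Lume config. A minimal sketch, with the import path being an assumption:

// _config.ts
import lume from "lume/mod.ts";
import { cacheBustCSS } from "./plugins/cache_bust_css.ts"; // hypothetical path

const site = lume();

site.use(cacheBustCSS());

export default site;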

I've also written a simple plugin to add relative links to each heading. It allows me to link to specific sections on a page within my website. Try hovering with a mouse over any of the headings on this page and you'll see a ¶ symbol which allows you to copy the relative link.

const HEADING_ELEMENTS = [
  // do not apply anchors to h1s
  "h2",
  "h3",
  "h4",
  "h5",
  "h6",
];

function processPage(page: Lume.Page) {
  for (const headingTag of HEADING_ELEMENTS) {
    page.document?.querySelectorAll(headingTag)?.forEach((headingEl) => {
      // slugify (defined elsewhere) turns the heading text into a URL-friendly id
      const headingID = slugify(headingEl.textContent || "");
      headingEl.setAttribute("id", headingID);

      const anchorEl = page.document?.createElement("a");
      if (anchorEl) {
        anchorEl.innerHTML = "¶";
        anchorEl.setAttribute("href", "#" + headingID);
        anchorEl.setAttribute("class", "anchor");
        headingEl.appendChild(anchorEl);
      }
    });
  }
}
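processPage then gets wired up the same way as the CSS plugin. This isn't my exact wrapper, just a sketch of the shape:

export function headingAnchors() {
  return (site: Lume.Site) => {
    // run over every generated HTML page after rendering
    site.process([".html"], (pages: Lume.Page[]) => {
      for (const page of pages) {
        processPage(page);
      }
    });
  };
}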

I'm super happy with Lume as a framework, as well as the assortment of plugins I've accumulated over time.

The Content ¶

I put way too much content on my website and have it organized into different pages and sections.

Thoughts, Recipes ¶

Thoughts are the "bread and butter" of my website and I absolutely love writing these posts. The templating for both thoughts and recipes is very straightforward. Everything is rendered as-is from markdown and wrapped in an <article> tag. Nothing special. If I need to include custom HTML, I write the post in MDX.

Tabs and Poems ¶

Tabs and poems are rendered similarly. Since both rely heavily on exact spacing and indentation, I render both in <pre> tags to preserve the text formatting. I've also made sure guitar tabs can be printed out easily in case you want to play them yourself. Try pressing ctrl-p or cmd-p on one of my tabs and see that the print preview removes all the extra website header/footer noise from the printed page.
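The print-friendly part is just a few lines of CSS in a print media query. A sketch of the idea - the selectors are assumptions, not my exact class names:

@media print {
  /* hide the site chrome so only the tab itself gets printed */
  header,
  footer,
  nav {
    display: none;
  }

  /* preserve the tab's monospace alignment */
  pre {
    white-space: pre;
    font-family: monospace;
  }
}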

Recs ¶

All my recommendations are stored as data in JSON files. I originally began storing these as .bib files to reference in LaTeX but eventually converted it all to JSON. It makes parsing all of my recs and ratings effortless.

One feature I missed when I transitioned from Astro to Lume a while back was its ability to validate data against schemas during development. To make up for this, I wrote TypeScript assertion functions for each data object I record to safeguard all data structures at build time. This ensures I add the right properties every time I update my website. I'm aware Zod does the same thing, but my manual assertions give me more control and reduce the number of external libraries I need in my code. All of my rec pages look something like this:

import data from "@/__data/recs/books.json" with { type: "json" };

export default function () {
  assertBookList(data);
  const bookList: Book[] = data; // data must be valid
  // ...
}

Where assertBookList looks like this:

export function assertBookList(list: unknown): asserts list is Book[] {
  if (!Array.isArray(list)) {
    throw Error("Book list is invalid");
  }
  for (const item of list) {
    assertBookData(item);
  }
}
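assertBookData then validates the individual fields. The exact properties are specific to my data files, so treat the following as a hypothetical sketch of the shape rather than my real code:

interface Book {
  title: string;
  author: string;
  rating: number;
}

export function assertBookData(item: unknown): asserts item is Book {
  const book = item as Partial<Book>;
  if (
    typeof book.title !== "string" ||
    typeof book.author !== "string" ||
    typeof book.rating !== "number"
  ) {
    throw Error("Book data is invalid: " + JSON.stringify(item));
  }
}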

Streams ¶

My streams are a recent addition to my website. They required more careful thinking about how to load video files properly. I wanted the viewing experience to be independent of JavaScript, but still structured similarly to Youtube. The hardest part was formatting the <video> tag:

<video
  controls
  crossorigin
  id="player"
  poster="https://cdn.com/file.jpg"
  preload="auto"
  width="100%"
>
  <source
    src="https://cdn.com/video.mp4"
    type="video/mp4"
  />
  <track
    default
    kind="captions"
    label="English"
    src="https://cdn.com/video-captions.vtt"
    srclang="en"
  />
  <p class="err">Your browser does not support video. <a download href="https://cdn.com/video.mp4">Download this video</a> instead.</p>
</video>

We have been conditioned to expect videos to load as soon as a page loads, so I use preload="auto" and apply a poster of the first frame of the video to make playback feel seamless. I also wrote a script that uses OpenAI's Whisper to auto-caption my videos. Regrettably, I haven't been using it recently because the captions are currently hit or miss and I need to refine the script (and my enunciation).
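The captioning script is essentially a thin wrapper around the whisper CLI. Here's a simplified sketch, assuming whisper is installed locally - the flags and file handling are illustrative, not my exact script:

// caption.ts - run with: deno run --allow-run caption.ts video.mp4
const [videoFile] = Deno.args;

// whisper writes the generated .vtt captions into the current directory
const command = new Deno.Command("whisper", {
  args: [videoFile, "--model", "small", "--output_format", "vtt"],
});

const { code } = await command.output();
if (code !== 0) {
  console.error("whisper failed to caption", videoFile);
}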

I've also created a timestamp system that allows me to reference specific points in a video via URL. I love how Youtube allows you to add a ?t=xxx query param to reference video timestamps and I wanted to do the same. The logic behind it is much simpler than I expected:

function playTime(id, time, load) {
  const vid = document.getElementById(id);
  if (!vid) return;
  vid.currentTime = Number(time); // time may arrive as a string from the URL
  if (!load) {
    vid.play();
    const url = new URL(window.location.href);
    url.searchParams.set("t", String(time));
    window.history.pushState({}, "", url);
  }
}
window.addEventListener("load", function () {
  const params = new URLSearchParams(window.location.search);
  if (params.has("t")) playTime("player", params.get("t"), true);
});

// in HTML:
// <a href="#player" onclick="playTime('player',400)">time 1</a>

A Note on JavaScript ¶

I should mention explicitly that every page on this site functions without JavaScript. There's no magic rendering behind the scenes and I want to keep it that way. I hate when websites gatekeep users by moving functionality behind JavaScript - not only does it hurt page performance and SEO, it also makes curl, RSS readers, and web scraping tools nearly impossible to use.

Lately I've been adding small pieces of JavaScript functionality here and there to enhance the website (such as the video timestamps I mentioned earlier) but I am averse to adding critical JS functionality or tracking scripts of any kind. I prefer anyone and everyone to be able to access my website free of charge.

Static pages ¶

I host an increasing number of distinct pages dedicated to a single purpose. For these I write a simple JSX function containing the entire page contents. I'm planning on moving all of my lesser-known pages like these into a separate "miscellaneous" section of my website (if I haven't already).
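A sketch of what one of these one-off pages looks like in Lume - the URL, layout path, and contents are made up for illustration:

export const url = "/misc/example/";
export const layout = "layouts/BaseLayout.tsx"; // illustrative layout path

export default function () {
  return (
    <article>
      <h1>An example one-off page</h1>
      <p>All of the content lives right here in the component.</p>
    </article>
  );
}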

Performance ¶

I've already alluded to this multiple times but website performance is one of my top priorities. Google graciously provides Lighthouse as a means of capturing the overall performance of a website into concise numbers and feedback, which I monitor frequently when I add new features. It's an extremely neat tool for breaking down CLS, TBT, FCP, and all those other fancy acronyms.

Huge static assets are the primary killers of performance on most websites (other than large scripts, of course). As I mentioned earlier, I get around this with an aggressive caching strategy that caches everything. I also make sure to compress as much as I can without reducing quality. Most images can be chroma-quartered with their metadata removed. Some videos can use smaller resolutions and newer codecs.
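Concretely, "chroma-quartering" means 4:2:0 chroma subsampling. An ImageMagick one-liner like the following covers both the subsampling and the metadata stripping (shown only as an illustration of the idea, not my actual pipeline):

magick input.jpg -strip -sampling-factor 4:2:0 -quality 85 output.jpg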

I'm still waiting for universal JXL and H.265 support. It'll probably be another 10 years before Google and Apple finally agree on supporting the standards, unfortunately.

Loading all the images, videos, and audio clips on a page as soon as it loads can be detrimental to performance. To make pages load faster, I lazy load images "below the fold" so they aren't fetched until the user actually scrolls down to them. Lazy loading could be done with scripts, but I'd rather use native HTML attributes since they're getting better browser support over time.

<img
  src="https://cdn.com/image.jpg"
  alt="alt text"
  loading="lazy"
/>

The same can apply to videos. If they're not the main focus, don't preload them!

<video preload="none">
  <!-- ... -->
</video>

<audio preload="none">
  <!-- ... -->
</audio>

The Easter Eggs ¶

I love adding features to my website just for the fun of adding them. I've added plenty of small quality-of-life features over time, such as dark mode, footnotes, and timestamps. I've also added a memes page and a Magic page. All of these are for my own satisfaction. (I'll leave it up to you to find all of them.)

Conclusion ¶

This website may seem empty and simple but a lot of effort has been poured into it behind the scenes. I love the way my website continually improves and grows over time and I'm proud of how far it has come.

Future ¶

I'm always looking for ways to add more content and improve my website. For example, I've lately been researching Media Source Extensions to see if I can emulate the great work of Youtube's engineers and make my streams buffer faster.

Edit: I'm unable to recreate video chunk streaming due to implementation difficulty and a lack of storage space. It would require at least 3x the storage space for each video, not to mention an overly complex script to automatically chunk videos and read the chunks. Instead, I chose to use Bunny CDN, which automates chunking in 5MB increments on the server for me. It's absolutely amazing and drops a video's buffer time from 10 seconds down to 500ms.