Bootloaders & The Wilderness of The Firmware

2023-10-06

Between “pressing the power button” on your device and actually using it, there are many pieces of software from different vendors that must execute & work together for a system to Gracefully Boot™.

Most modern devices are capable of booting in a few seconds & being fully ready to be used. There are even (safety-critical) areas where a sub-1-second boot time is not only possible but required! These timings are fascinating when you learn how much “software switching” happens & the important role each piece plays.

The overall process I just described is called the boot process. It’s a rather “big” topic that covers things like: Firmware, BIOS, [U]EFI, First-Stage & Second-Stage Bootloaders, ACPI, Device Trees and a little bit of x86, ARM & Hardware. But worry not! I’m here to explain all of these (and more) in a nicely formatted & easily digestible way.

Bootloader-Related Terms & Logos

If you are ready, grab something to drink for yourself because this is going to be a “long” one >.<

Quick Note: Initially I intended this to be one giant article. However, it turned out to be a REALLY long one, so I divided it into 3 parts.

Bootloaders & The Wilderness of The Firmware* (this one)
First-Stage Loaders: BIOS, [U]EFI & ARM Counterparts (coming soon)
Second-Stage Loaders: GRUB2, iBoot, BOOTMGR & More (W.I.P.)

WARNING: This is a beginner-to-intermediate level writing with relatively limited info on technical stuff like how to build a bootloader or implement one in C. It is written with the sole purpose of education and bringing more people into the topic of bootloaders, in the hopes of ‘nurturing’ their curiosity.

Problem

There are basically two types of memory/storage: volatile and non-volatile. When a device is not running, all the data & software are stored in non-volatile storage like eMMC, HDD, SSD, USB Flash Drive, CD-ROM and so on. In contrast, when a device is running, [some of] that data & software is also held in volatile memory like SRAM and DRAM.

Volatile vs. Non-Volatile Memory — Table Source: diyanaha.wordpress.com

See the Von Neumann architecture for more information on the question “why?”.

As with anything in life, your device/CPU has no idea what to do when it’s first started. It can’t just load your operating system and applications because it doesn’t even know what/where your storage device is, let alone your specific “operating system”.

Your device is [almost] completely blind at this point. It somehow must recover itself from this situation and become fully functional. This process is called bootstrapping: a computer pulling itself up by its own bootstraps.

I assume you’ve got the problem defined in your mind. Just in case you haven’t, here’s the problem “we” are trying to solve & explain:

How does a computer start itself when it doesn’t know what is what and where is where?

Btw, the question above is a little bit naive. There are way more questions that need to be asked. Of course, I will answer most of them here, but I didn’t explicitly list them for simplicity’s sake >.<

Warming Up

Every software program out there has some form of requirements from its working environment. Let’s give some examples.

Python Script: Requires python
Java Program: Requires jre
Most C / C++ Programs: Requires libc or libc++
Most PC Games: Requires DirectX

Of course there are WAY more things that a program can require, but it’s just a simplification.

An AR/VR Game Made /w Unity Has Many Layers of “Requirements” — Source: unity.com

Now, the above examples are just some high-level requirements. What about things like stdin, stdout, TCP/IP Stack, Software Exceptions, Stack Area and Heap Area? Most programs expect you to have these types of things because they are so “base/trivial”. Luckily we have this amazing thing called an operating system that does this type of stuff for us ;) But how do they come into existence anyway?

When you launch a software program, your operating system handles most of the “boring” stuff for you like (not in-order):

Loading dynamic libraries (e.g. libc)
Loading it into memory
Parsing the executable (e.g. ELF, PE)
Setting up the Stack
So on…

Low-level programs like kernels or generic firmware are not so different in this area. They, too, need to be loaded into memory and have some sort of stack set up for them. Unlike “normal” programs, however, they have additional requirements like “I want to run on CPU Core #1” or “Interrupts must be disabled on all cores”.

A [presumably incorrect] Comparison Between High Level Software & Low Level Software — Source: developer.qualcomm.com

As you can see these are not your “normal” requirements. These are more “hardware-related & environmental” requirements. Now comes the big question:

Who can satisfy these environmental requirements for low-level programs like kernel and firmware?

Introducing.. Bootloaders!

The software program(s) that a CPU executes to boot the system are called bootloaders. Now the term “bootloader” is very broad and generic. It could be a small assembly stub, a piece of firmware or a multi-purpose project like GNU GRUB.

You might know bootloaders as the program that you see on the screen before launching your favorite operating system.

According to Wikipedia, a bootloader is: “… a computer program that is responsible for booting a computer.” And throughout this writing I’ll use this as the base definition.

Two Different Bootloaders & Their Interface (Left: GRUB - Right: BOOTMGR)

As seen from the above examples, there are many forms that a bootloader can take and be written. For a better understanding, let’s list some of the core responsibilities of a bootloader:

Detect & Initialize Hardware
Load a Program Into Memory
Pass Control to The Loaded Program

When I say “program” I mean stuff like kernels, operating systems or other bootloaders. Since they are all technically programs, I’m going to call ANYTHING that a bootloader loads a program.

In other words: program === kernel/OS/bootloader throughout this writing.

Most modern bootloaders, of course, do more than what is listed above (e.g. Diagnostics, Basic CLI, Cool GUI). However, for a program to be properly called “a bootloader” it MUST handle the above 3 tasks.

Detect & Initialize Hardware

First things first: CPU, memory and storage. These are the hardware devices that a bootloader MUST detect and properly initialize before trying to “load” a program into memory.

Raspberry Pi 4 Board Hardware Overview — Source: seeedstudio.com

Naturally, this part is hardware-dependent. Meaning the bootloader must be designed, implemented and built for the target CPU ISA (e.g. x86, ARM64, RISC-V). This requires different booting methods & standards to be used, but more on this later.

After that, the memory must be set up. Here we are faced with a new problem. Since a bootloader is still a “program”, it itself needs to be loaded into memory. But how? Isn’t the bootloader responsible for that? Hmm… this seems similar to the well-known chicken or the egg problem.

Obviously we need yet another program for this. And guess what! It is also a type of “bootloader”. More specifically a “First-Stage Bootloader”. Again, more on this later ;)

Different Kind of Non-Volatile Storage Devices

But now, what about storage devices? The target program, as we know, is stored inside non-volatile memory. So a bootloader MUST implement the proper device drivers & use them to detect & read from storage devices like eMMC, HDD, SSD and USB Flash. For example, an AHCI driver is implemented in a bootloader so that it can operate on SATA drives.

Additionally, a bootloader can also initialize stuff like a serial port, keyboard, RAM buffer, display etc. These are not strictly necessary, but most modern bootloaders do include them for better UX & accessibility.

Load a Program Into Memory

A bootloader should be able to detect different programs from different storage devices & then launch them. And, optionally, provide some arguments to the target program. Now, the question is “How”?

Different Programs Detected by GRUB — Source: ubuntubuzz.com

First, a bootloader must identify the program it wants to load. This is generally done by scanning a storage device (or a drive) for files that have a specific Magic™ header. A magic header is a sequence-of-bytes that is unique to a particular type of program.

A bootloader can’t (and shouldn’t) just launch any type of program. It needs to know that “Okay, this program ‘can’ and ‘wants to’ be loaded by me.” This idea is implemented using Magic™ headers as previously said.

You can think of Magic™ headers as signatures for files. I don’t know why they are called ‘Magic’ :/ Personally tho, I find them really interesting and you should look into them too when you have a chance.

Magic Headers Are Used For Almost Any File Out There, Above is The ELF [Magic] Header - Source: nathanotterness.com
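To make the idea a bit more concrete, here is a minimal sketch of how a loader could recognize a file by its leading bytes. The function name identify and the table MAGICS are made up for this example, and only three well-known magic values are listed. A real bootloader would check far more than the first few bytes.

```python
# A tiny table of well-known magic values found at offset 0 of a file.
MAGICS = {
    b"\x7fELF": "ELF executable",
    b"MZ": "PE/DOS executable",
    b"\x1f\x8b": "gzip-compressed data",
}


def identify(data: bytes) -> str:
    """Return a human-readable type for `data` based on its leading bytes."""
    for magic, name in MAGICS.items():
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify(b"\x7fELF" + bytes(12)))
    print(identify(b"hello world"))
```

Anything that does not match a known signature is simply reported as unknown, which is the “I don’t load what I don’t recognize” behavior described above.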

These Magic™ headers have other important meanings apart from being useful for bootloaders when detecting programs. They tell others:

I am an X type of file.
I acknowledge what X represents.
I comply with the specifications of X.

It’s like an agreement between programs! (we’ve just discovered software standards & specifications)

Here we are faced with a new topic: boot specifications. They are not real programs but rather “specifications” that define what an X type of software program can expect and how it should behave. And we use Magic™ headers to say “I am an X type of file and I acknowledge everything in its specification.”

Boot specifications can also be known as boot protocols.

We have different boot specifications out there. Some are open-source and popular and some are custom-made & proprietary. Here are some of the most common ones:

Multiboot 1/2: GNU’s boot protocol. Mainly used by GRUB to load Multiboot-compliant kernels (e.g. GNU Hurd and many hobbyist OSes).
Limine: Modern boot protocol. Mainly used by hobbyist OS devs.
Linux: Linux’s own boot protocol. Used by loaders like GRUB and SYSLINUX on the x86 platform.
BIOS: Basic Input and Output System for PCs. More on this later.
[U]EFI: Unified Extensible Firmware Interface. More on this later.
MachO: Apple’s own [boot] protocol. Used by iBoot on Apple devices.
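As an example of a boot protocol “signature”, here is a rough sketch that looks for a Multiboot 1 header inside a kernel image. As far as I know from the Multiboot 1 specification, the header starts with the magic value 0x1BADB002, must be 4-byte aligned, must lie within the first 8192 bytes of the image, and its first three 32-bit fields must sum to zero. The function name find_multiboot1_header is my own, and the sketch ignores the optional fields that follow the checksum.

```python
import struct

MULTIBOOT1_MAGIC = 0x1BADB002  # header magic from the Multiboot 1 spec
SEARCH_LIMIT = 8192            # the header must sit in the first 8 KiB


def find_multiboot1_header(image: bytes):
    """Return (offset, flags) of a valid Multiboot 1 header, or None."""
    limit = min(len(image), SEARCH_LIMIT)
    # Step by 4 because the header is 32-bit aligned; stop while 12 bytes remain.
    for off in range(0, limit - 11, 4):
        magic, flags, checksum = struct.unpack_from("<III", image, off)
        # magic + flags + checksum must be zero modulo 2**32.
        if magic == MULTIBOOT1_MAGIC and (magic + flags + checksum) & 0xFFFFFFFF == 0:
            return off, flags
    return None


if __name__ == "__main__":
    flags = 0x00000003
    checksum = (-(MULTIBOOT1_MAGIC + flags)) & 0xFFFFFFFF
    fake_kernel = bytes(16) + struct.pack("<III", MULTIBOOT1_MAGIC, flags, checksum) + bytes(32)
    print(find_multiboot1_header(fake_kernel))
```

The checksum is what separates a real header from four bytes that just happen to look like the magic value.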

Windows BOOTMGR Asking Which OS to Load — Source: technewsinc.com

Second, a bootloader selects which one of the discovered programs will be loaded & launched. This step is either pre-configured via a config file so that the bootloader “knows” which one to boot OR it “asks” the user via a CLI or even a GUI. Most of the time the former method is used.

Different bootloaders have different config files, syntaxes and interfaces. This is purely implementation-defined and the design is up to the bootloader developer. However, as a user you can be sure that some form of boot configuration is possible (e.g. a config file).

GRUB Config File Found Under /boot/grub/grub.cfg - Source: linuxbabe.com
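The “configured default OR ask the user” logic can be sketched in a few lines. This is not how GRUB or any specific bootloader does it. select_entry, default_index and user_choice are names I made up just to show the idea.

```python
def select_entry(entries, default_index=0, user_choice=None):
    """Pick a boot entry: a valid user choice wins, else the configured default."""
    if not entries:
        raise ValueError("no bootable entries found")
    if user_choice is not None and 0 <= user_choice < len(entries):
        return entries[user_choice]
    # Fall back to the pre-configured default, clamped to a valid index.
    return entries[min(max(default_index, 0), len(entries) - 1)]


if __name__ == "__main__":
    menu = ["Ubuntu", "Windows Boot Manager"]
    print(select_entry(menu))                 # nobody pressed a key
    print(select_entry(menu, user_choice=1))  # the user picked the second entry
```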

Third, depending on the boot specification/protocol, a bootloader sets up the environment the program expects itself to be in when launched. This step is further explained in the <TODO: FIRMWARE_HISTORY> section.

Fourth, the program finally gets “parsed” & loaded into memory. This process depends entirely on the program format (type). It could be as “simple” as parsing an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) file and then loading its loadable segments into memory. Or it could be a Linux [bzImage](https://en.wikipedia.org/wiki/Vmlinux#bzImage) with an [initrd](https://en.wikipedia.org/wiki/Initial_ramdisk) that needs additional steps like decompression.

Loading ELF File Into Memory (Both Physical & Virtual Memory is Given) — Source: ourembeddeds.github.io
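Here is a simplified sketch of what “parse the ELF and load it” can look like. It only handles little-endian ELF64 files, uses a bytearray as a stand-in for physical RAM, and copies each PT_LOAD segment to its physical address (p_paddr). The function name load_elf64 is my own, and a real loader would also care about permissions, alignment, relocation and so on.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes


def load_elf64(image: bytes, memory: bytearray) -> int:
    """Copy the PT_LOAD segments of `image` into `memory`; return the entry point."""
    if len(image) < EHDR.size or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    hdr = EHDR.unpack_from(image, 0)
    e_entry, e_phoff, e_phentsize, e_phnum = hdr[4], hdr[5], hdr[9], hdr[10]

    for i in range(e_phnum):
        (p_type, _flags, p_offset, _vaddr, p_paddr,
         p_filesz, p_memsz, _align) = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if p_filesz > p_memsz or p_offset + p_filesz > len(image):
            raise ValueError("malformed segment")
        if p_paddr + p_memsz > len(memory):
            raise MemoryError("segment does not fit in memory")
        # Copy the bytes that exist in the file...
        memory[p_paddr:p_paddr + p_filesz] = image[p_offset:p_offset + p_filesz]
        # ...and zero-fill the remainder (e.g. .bss), since memsz may exceed filesz.
        memory[p_paddr + p_filesz:p_paddr + p_memsz] = bytes(p_memsz - p_filesz)
    return e_entry
```

The returned entry point is the address the bootloader would jump to once everything is in place.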

After the parsing, a bootloader basically copies the program from the storage drive (non-volatile) to the memory (volatile). Generally this is as simple as mirroring the contents of the executable into memory byte-by-byte. However, there are some edge cases that need handling. For example:

Memory conflicts: If the program’s desired load address conflicts with the bootloader’s memory or other occupied areas, the bootloader may need to perform relocation to get out of the way.
Platform constraints: Some platforms (especially x86) have additional constraints like the infamous A20 Line or Memory-Mapped I/O regions.

Generally, a [well-written] program is aware of these edge-cases and is designed to handle them by itself.
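The “memory conflicts” case boils down to a range-overlap check. Below is a small sketch of it. The helper names and the reserved regions in the demo (a bootloader at 0x7C00 and a stack right below it) are purely illustrative and not taken from any real memory layout.

```python
def overlaps(a_start, a_size, b_start, b_size):
    """True if the half-open ranges [a_start, a_start + a_size) and
    [b_start, b_start + b_size) share at least one byte."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def find_conflicts(load_addr, size, reserved):
    """Return the names of reserved regions that the load range would clobber."""
    return [name for name, (start, length) in reserved.items()
            if overlaps(load_addr, size, start, length)]


if __name__ == "__main__":
    reserved = {"bootloader": (0x7C00, 0x200), "stack": (0x7000, 0xC00)}
    print(find_conflicts(0x7C00, 0x100, reserved))
    print(find_conflicts(0x100000, 0x1000, reserved))
```

If the list comes back non-empty, the bootloader has to either relocate itself or refuse to load the program at that address.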

Pass Control to The Loaded Program

A bootloader’s job ends just before the loaded program gets executed. At this point in time, the program is loaded into memory and is ready to be executed.

Bootloader And The Program In the Memory (Visualized) — Source: embetronicx.com

You might be asking, “What happens to the bootloader in the memory after it’s done? Does it get deleted?” The answer depends on the bootloader and its configuration. It might:

Stay in the memory: Sometimes speed is key. So, no purging occurs.
Relocate to another place: Might want to keep it for later(?)
“Delete” itself from the memory: Purge everything!

Again, this behavior is implementation defined and there is no “right” answer.

If you are working on low-level stuff, I advise you to assume the “worst”, which is that the bootloader stays in memory, and handle that accordingly.

Finally, the bootloader jumps to the ENTRY point that the program’s linker defined at build time. From now on the flow of execution continues from the program and it is assumed to be in “total control” of everything! Sounds cool & dangerous…

An Example Assembly Entry Stub (The Entry Label is Defined as _start)

Take a pause here if you’d like to. The upcoming sections are going to be relatively more technical & practical stuff >.<

Firmware (Between Hard and Soft)

So far we have defined what a bootloader is and seen some of its main objectives (and how it achieves them). It is now time to get a bit more technical and talk about stuff like the x86/ARM boot process, BIOS, [U]EFI and firmware in general.

In the world of “computing” we have:

the hardware: physical IRL components that handle the actual computing (e.g. CPU, RAM) and
the software: tells the hardware what to do & how to behave (e.g. OS, Drivers, Applications)

The thing with software is that.. it’s just too broad of a term. I mean, look at the image below and compare each kind of software to the others. They’re all technically software, but is it “smart” to identify them all as “software” and call it a day?

Some [Unofficial] Types of Software, There Are ofc More Than What’s Above - Source: finoit.com

Can we really compare a low-level device driver to a Web application? Sure, they are both software, but the “level-of-abstraction” they work on is VASTLY different. Also, the stakes are VERY different. If low-level software fails, it can literally “cook” the hardware (imagine if your CPU cooler failed to work *R.I.P.*). On the other hand we have high-level software like Stardew Valley that we can “always” restart without much risk. The worst we can lose here is the game progress we made. (it still sucks tho)

Clearly we need some sort of way to distinguish between different types of software… Oh, wait. I found it! What’s the state between hard and soft? I know: it’s firm. Then let’s call this new type of software firmware. I’m such a genius and creative person(!)

Simplified View Showing The Relationship a Firmware Has With HW & SW — Source: inspirezone.tech

We are going to classify all really-low-level programs as “firmware” instead of “software”, since they are not really software in the same sense as video games or web browsers. They deal directly with the hardware and provide some sort of functionality for higher-level software like kernels or drivers. And, unlike software, they are designed/targeted for very specific purposes like dealing with the xHCI controller or an optical mouse.

According to Wikipedia, firmware is: “… class of computer software that provides the low-level control for a device’s specific hardware”.

In layman’s terms: firmware is simply a software program that talks directly with the hardware for very specific purposes.

There are of course more differences between software and firmware. Instead of just typing them out, let’s see the most important ones in a picture.

Some Comparisons Between SW & FW — Source: hardwarebee.com

Connection With Bootloaders

Now, the above section raises the question “Should bootloaders be considered firmware or software?” The answer is both. Let me explain.

We know that a bootloader can take any form (e.g. assembly stub, full-on project). This means that it can be considered firmware if it deals with low-level control of any kind of device. It can also be considered typical software when it provides services like a GUI and a CLI.

This distinction between a “firmware-level bootloader” and “software-level bootloader” is kinda confusing. To clarify it, the people-of-software world created two categories for bootloaders.

First-Stage Bootloaders: Like BIOS on x86 platforms
Second-Stage Bootloaders: Like GRUB2 or iBoot.

The Overly-Simplified Transitions Between Different Boot Stages

You may already have some idea as to what they are. But you will have to hold on to that thought just a little bit more. I’ll explain them in the coming sections ;)

Bootloaders [obviously] need to handle each platform differently. Especially the First-Stage bootloaders. The sections below give information on the boot process & characteristics of x86 & ARM.

x86 — Boot Characteristics

X86 [Simplified] Boot Overview

NOTE: This section is mainly referenced from intermezzOS’s wiki page.

One of the main “philosophies” behind x86 is for it to be as general-purpose as possible. It has maintained backwards compatibility throughout the years. But this made the overall boot process on x86 platforms a mess.

The x86 boot process is a whole pile of hacks held together with duct tape & glue. Each x86 generation added a feature that meant new steps for the process, further complicating it. The A20 Line is a good example of this :(

Here’s a fun fact: “when your fancy new computer starts up, it thinks it’s an 8086 from 1978. And then, through a succession of steps, we transition through more and more modern architectures until we end at the latest and greatest”

CPU

At first, the CPU is in a state called “real mode”. This is the 16-bit mode that the original x86 chips used.

The second one is the 32-bit “protected mode”. This mode adds new things on top of the 16-bit “real mode”. It is called protected because real mode sort of lets you do whatever you want (including bad ideas). Protected mode enables certain kinds of protections, such as when accessing the RAM.

The third & final mode is the 64-bit “long mode”. Naturally we want programs (like the Kernel) to be running in this state.

More information on this topic can be found at tortall.net, under “Execution Modes and Extensions”.

Most Important X86 CPU Modes — Author: Faheem Syed

The above states are important to us as the bootloader needs to handle transitions between them before it can pass control to the program.
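To give a feeling for what these transitions depend on, here is a small sketch that decides which mode a CPU is in from a few control bits. The bit positions (CR0.PE = bit 0, CR0.PG = bit 31, CR4.PAE = bit 5, EFER.LME = bit 8) come from the x86 architecture manuals as far as I know. The function cpu_mode is my own simplification and ignores sub-modes such as compatibility mode or virtual 8086 mode.

```python
CR0_PE = 1 << 0    # Protection Enable: leaves real mode when set
CR0_PG = 1 << 31   # Paging
CR4_PAE = 1 << 5   # Physical Address Extension (required for long mode)
EFER_LME = 1 << 8  # Long Mode Enable, a bit in the EFER model-specific register


def cpu_mode(cr0: int, cr4: int, efer: int) -> str:
    """Classify the CPU mode from control register values (simplified)."""
    if not cr0 & CR0_PE:
        return "real mode (16-bit)"
    if (cr0 & CR0_PG) and (cr4 & CR4_PAE) and (efer & EFER_LME):
        return "long mode (64-bit)"
    return "protected mode (32-bit)"


if __name__ == "__main__":
    print(cpu_mode(0, 0, 0))                               # right after reset
    print(cpu_mode(CR0_PE, 0, 0))                          # after setting CR0.PE
    print(cpu_mode(CR0_PE | CR0_PG, CR4_PAE, EFER_LME))    # paging + PAE + LME
```

A real bootloader flips these bits in assembly, in a specific order, with page tables and descriptor tables prepared beforehand.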

Memory & Device Discovery

The overall system on an x86 platform is described using a memory map. It is basically an array of entries that specify which areas of memory are used, reserved or free. The picture below is a good example of a memory map.

Physical Memory Map As Seen & Reported by BIOS — Source: rekall-forensic.com
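On BIOS systems such a map is classically obtained through the “E820” call, where each entry is a 64-bit base address, a 64-bit length and a 32-bit type. The sketch below parses a raw buffer of such 20-byte entries. The function names are my own, the type table lists only the commonly documented values, and the buffer in the demo is made up.

```python
import struct

E820_TYPES = {
    1: "usable",
    2: "reserved",
    3: "ACPI reclaimable",
    4: "ACPI NVS",
    5: "bad memory",
}
ENTRY = struct.Struct("<QQI")  # base, length, type = 20 bytes


def parse_e820(raw: bytes):
    """Turn a raw buffer of 20-byte entries into (base, length, kind) tuples."""
    entries = []
    for off in range(0, len(raw) - ENTRY.size + 1, ENTRY.size):
        base, length, kind = ENTRY.unpack_from(raw, off)
        entries.append((base, length, E820_TYPES.get(kind, f"unknown ({kind})")))
    return entries


def usable_bytes(entries) -> int:
    """Total amount of memory the loaded program is free to use."""
    return sum(length for _, length, kind in entries if kind == "usable")


if __name__ == "__main__":
    raw = (ENTRY.pack(0x0, 0x9FC00, 1)
           + ENTRY.pack(0x9FC00, 0x400, 2)
           + ENTRY.pack(0x100000, 0x7EE0000, 1))
    for base, length, kind in parse_e820(raw):
        print(f"{base:#012x} {length:#012x} {kind}")
    print(usable_bytes(parse_e820(raw)))
```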

Unlike ARM, x86 platforms are more “dynamic”. Meaning the devices (e.g. PCIe, USB, Ethernet) are not necessarily known to “exist” before the system boots up. This is where First-Stage bootloaders, like BIOS, come into play. They “detect/discover” the devices available on the system and then create a memory map.

Keep this “memory map” in mind. It will be important when we talk about Device Trees in ARM.

ARM — Boot Characteristics

ARM [Simplified] Boot Overview (*most*)

Unlike x86, ARM platforms are generally targeted at more “mobile” devices and therefore follow a specialist philosophy. Although Apple’s M-series SoCs are proving this to be false.

For this reason ARM platforms are more static in nature. Meaning all of the hardware is designed to work closely together and once it’s manufactured, it can’t be easily upgraded. In contrast, on x86 systems you can swap out your RAM modules and/or add more SATA ports.

Again, Apple’s new Mac Pro is proving this to be false w/ expandable PCIe slots.

Due to being “static” in terms of hardware peripherals, ARM platforms don’t “care” about backwards compatibility as much as x86. As a result, the boot process becomes clearer & more straightforward.

Hardware Parts Are (closely) Tied on This Raspberry Pi 5 (Popular ARM Board)

Unfortunately for ARM, having a “specialist philosophy” resulted in it being full of proprietary & vendor-dependent software/firmware solutions. Yes, the overall boot process is simpler and clearer. But the actual implementation of it is [most of the time] NOT STANDARD! This is a rather big problem, as we will see in the First-Stage Loader section.

Luckily, with recent [U]EFI and Arm® Base Boot Requirements we might see more standardization in the ARM world. Here’s hoping!

CPU

ARM does not suffer from the same mess of 16-bit real mode and 32-bit protected mode state transitions. Instead we have Exception Levels. Depending on the execution state of your ARM core, every instruction runs in the same X-bit mode: 32-bit for AArch32 and 64-bit for AArch64.

ARM Exception Levels — Source: developer.arm.com

The bootloader’s job is less cluttered here (but vendor-specific solutions can interfere). Its main focus is working closely with exception levels. More info on them can be found on developer.arm.com, under “Privilege and Exception levels”.
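On AArch64, software can find out which exception level it is running at by reading the CurrentEL system register, where the level sits in bits [3:2]. Python obviously can’t read that register, so the sketch below only decodes a value you would have obtained in assembly. The role descriptions are the typical usage, not a hard rule, and the names are my own.

```python
EL_ROLES = {
    0: "EL0 (applications)",
    1: "EL1 (OS kernel)",
    2: "EL2 (hypervisor)",
    3: "EL3 (secure monitor / firmware)",
}


def decode_current_el(reg_value: int) -> int:
    """Extract the exception level from a CurrentEL register value (bits [3:2])."""
    return (reg_value >> 2) & 0b11


if __name__ == "__main__":
    # 0b1100 is the value the register would hold while running at EL3.
    print(EL_ROLES[decode_current_el(0b1100)])
```

A bootloader usually starts at a high exception level and drops down to a lower one before handing control to the kernel.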

Memory

The overall system on ARM platforms is generally represented using Device Trees. They can be compared to memory maps on x86, as they are both used to describe the system hardware. Below is an example Device Tree source.

An Example [Part of An] ARM Device Tree Structure (Represents a QEMU ARM Virt Machine)

On the implementation level, device trees look “similar” to JSON files. They represent the system in a hierarchical view. For example, there can be a parent CPU cluster (NUMA Node) that has many cores defined as its children.

Device Trees can be in two different file formats.

Device Tree Source (.dts): Human-readable (similar to JSON)
Device Tree Blob (.dtb): Machine-readable (simple Binary Blob)

A .dts file can be compiled into .dtb & vice-versa. Check out my simple GitHub Gist to see how it is done. Shameless plug… ;)

The bootloader’s main job here is to “read & parse” the .dtb file, initialize the hardware accordingly and then pass the blob as-is to the loaded program.

Physically, the .dtb files are generally stored inside “ROM/Flash” on the board and/or, optionally, embedded inside the First-Stage loader.
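The first thing any .dtb parser does is validate the header. According to the Devicetree Specification, a blob starts with ten big-endian 32-bit fields, the first being the magic value 0xD00DFEED. Here is a minimal sketch of reading that header. parse_fdt_header is a name I made up, and the sketch stops at the header instead of walking the actual tree.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_HEADER = struct.Struct(">10I")  # ten big-endian 32-bit fields, 40 bytes
FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings", "off_mem_rsvmap",
    "version", "last_comp_version", "boot_cpuid_phys",
    "size_dt_strings", "size_dt_struct",
)


def parse_fdt_header(blob: bytes) -> dict:
    """Decode and sanity-check the header of a flattened device tree blob."""
    if len(blob) < FDT_HEADER.size:
        raise ValueError("blob is too small to hold a device tree header")
    header = dict(zip(FIELDS, FDT_HEADER.unpack_from(blob, 0)))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("bad magic: not a flattened device tree")
    return header


if __name__ == "__main__":
    # A hand-made header (not a complete, valid tree) just to exercise the parser.
    fake = struct.pack(">10I", FDT_MAGIC, 72, 56, 64, 40, 17, 16, 0, 8, 8) + bytes(32)
    hdr = parse_fdt_header(fake)
    print(hdr["version"], hdr["totalsize"])
```

Notice that this is the same Magic™ header trick from earlier, just applied to hardware descriptions instead of programs.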

Closing Words

Before we jump to First-Stage loaders, let’s take a break. We just saw & learned A LOT of new things. Just let your mind rest a bit and let it digest the information.

Bootloaders are not simple programs, as you can already see & feel. They cover a wide range of fields (hardware, firmware and software). You need to understand at least some parts of each to get a good idea of what bootloaders are AND what they do. Try to take a break for a few days and come back to this writing later. I’m sure you’ll understand everything a bit better!

Very soon I’ll publish the next part of this bootloader series, which is going to be called First-Stage Loaders: BIOS, [U]EFI & ARM Counterparts. I’m doing some final editing & preparations for the images & diagrams.

If you have spotted some errors and/or think that what I said was wrong, please DO tell me! It would mean a lot to me. And lastly, thanks for reading.

Enjoy Life ❤