My day job is writing embedded software, so I do a decent amount of Linux work. However, since the team that I work on was created long before I joined, there’s already a set of tools that builds our linux image. When I want to test a Linux change I run one build command and out pops a full linux image. But, if by some tragic accident, all our build code disappeared tomorrow, how would I go about building a linux image myself? How, exactly, do you go from a Linux source tree and some userspace code to a bootable binary? I wasn’t sure, so I decided to find out!
Goals
Before I start, I need to define the final product I’m looking for. I want a Linux kernel that I compiled, running with a device tree I compiled, booting off a file system I made, into a shell. No wifi, no GUI, just a terminal screen that I can type in. Basically, the minimum viable product of a Linux distro.
Now, if you go look up how to build a linux image, you’re going to come across two major tools: Buildroot and Yocto. And while I assume
that these tools are very powerful, I already don’t know how to build a linux image. And learning how to build a linux image at the same
time as I learn a tool that builds the linux image seems a bit much. So, for this exercise, I’m going to be doing everything by hand
(and by hand I mean using make plus whatever other miscellaneous utilities I need). But no pre-packaged Linux building system for me!
note: I don’t plan on having a system-wide C standard library either, which will come up later
There are only really three components that I need to build to boot into a shell.
- A Linux kernel
- A device tree for my target device
- A file system for my image to boot
So with all this in mind, onto step 1!
Building a kernel
Before I start building anything, I need to decide which hardware I’m going to be building for. I already have a Raspberry Pi 4 at home, so I decided to go with that (this also means that I could boot the image on real hardware in the future if I desired).
The Raspberry Pi foundation has a page on how to build a Linux kernel, so I started there.
note: I initially didn’t read the page closely enough and just assumed that the Pi was a build target for mainline Linux, but this is not the case. There’s actually a separate Linux repo for the Pi that’s distinct from the actual Linux repo. This matters since some of the make targets in the guide only exist in the Pi’s version of Linux.
The documentation said to build this for the 64-bit kernel
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image modules dtbs
and it said to build this for the 32-bit kernel
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage modules dtbs
I was a little confused to see two different targets for the kernel; do the 64 and 32-bit kernels not use the same target?
After a little googling it turns out that this can be answered via the help target. Running
make ARCH=arm64 help
Gives these build targets
Architecture-specific targets (arm64):
* Image.gz - Compressed kernel image (arch/arm64/boot/Image.gz)
Image - Uncompressed kernel image (arch/arm64/boot/Image)
and setting ARCH=arm gives these build targets
Architecture-specific targets (arm):
* zImage - Compressed kernel image (arch/arm/boot/zImage)
Image - Uncompressed kernel image (arch/arm/boot/Image)
* xipImage - XIP kernel image, if configured (arch/arm/boot/xipImage)
uImage - U-Boot wrapped zImage
bootpImage - Combined zImage and initial RAM disk
(supply initrd image via make variable INITRD=<path>)
So it seems that zImage and Image.gz are both compressed kernel images, just with different target names, depending on the architecture. Since I’m interested in 64-bit Linux, I’ll follow the guide and run the following commands.
KERNEL=kernel8
make -j $(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bcm2711_defconfig
make -j $(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image dtbs
note: the dtbs target will build the device tree for the Raspberry Pi, so this one line builds both the kernel image and the device tree
I get a file named ‘Image’ once Linux has finished building, and
running file on it reassures me that I’ve compiled the correct image.
file result/boot/Image
result/boot/Image: Linux kernel ARM64 boot executable Image, little-endian, 4K pages
I also get a device tree which seems valid
file result/dtbs/bcm2711-rpi-4-b.dtb
result/dtbs/bcm2711-rpi-4-b.dtb: Device Tree Blob version 17, size=56108, boot CPU=0, string block size=4872, DT structure block size=51164
Nice, my image built successfully!
So I now have a linux image and a device tree. But what about a file system? I need somewhere to store my shell (and the other utilities that I might want).
Building a file system
Initramfs
When you boot Linux, you need to tell it where the file system for the image lives. You can use a file system on a physical device (like a hard drive or SSD), or you can use a RAM-only file system called initramfs. I decided to try the initramfs option.
note: This file system is RAM only, so once you shut down your machine all the files are gone. This is fine for my use case here, but in reality you probably want a file system backed by a physical device.
After reading through the documentation on initramfs, it seems like what I need to do
is pretty minimal. I need to create a gzipped cpio archive that will be extracted into a root file system. That archive needs to contain a script called /init, which will
be what Linux runs after it’s unpacked my archive. Since the archive will be unpacked into a file system, it needs to contain all the programs I want access to, such as cp,
cd, and, given that I want a shell, sh. But there are a lot of tools that come bundled with a standard Linux system. Am I going to need to build all these from source?
As you may have guessed by my leading question, the answer is no, I don’t need to build these all from source. All I need is to use BusyBox.
BusyBox
To quote the busybox website
BusyBox combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the utilities you usually find in GNU fileutils, shellutils, etc. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts. BusyBox provides a fairly complete environment for any small or embedded system.
And luckily for me, those “many common” utilities happen to include all the programs I need for my shell to be useful! This means that I don’t need to ship a few hundred different binaries in my initramfs image, I can just ship BusyBox. But how does BusyBox provide all the functionality of the various tools that I want access to?
A brief tangent on argv[0]
Normally argv[0] isn’t used for anything; it’s just the name of your program, after all. And why would you ever care about
the name of the program you’re running?
This is some C code that just prints out argv[0]
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("%s\n", argv[0]);
    return 0;
}
when I run it I get
~/scratch$ ./a.out
./a.out
which isn’t very interesting. a.out is the name of the program. What else could it print?
However, an interesting thing happens when you create symlinks to a program.
~/scratch$ tree
.
├── a.out
└── foo -> a.out
Now, if I run foo (which is just a.out), I get this
~/scratch$ ./foo
./foo
Well isn’t that interesting. I only have one binary, but I can change argv[0] by invoking the same binary through a symlink. Imagine if I wanted to allow
one binary to do multiple things, depending on which symlink it’s invoked through. I could mimic having multiple binaries by checking the value of
argv[0] and taking the appropriate action depending on the value.
name = basename(argv[0])   // "./ls" and "/bin/ls" both become "ls"
if name == "ls":
    call_ls()
else if name == "cp":
    call_cp()
// repeat for all other utilities
note: programs that change behavior depending on argv[0] are referred to as multi-call binaries, should you want to search for more examples
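The same trick works in any language that exposes the invoked name. As a minimal sketch in shell, where $0 plays the role of argv[0] (the dispatch function and the messages it prints are hypothetical, made up purely for illustration):

```shell
#!/bin/sh
# Hypothetical multi-call dispatcher: picks an action based on the name
# the program was invoked as (the shell's $0 is the analogue of argv[0]).
dispatch() {
    # Strip any leading directory, so ./ls and /bin/ls both become "ls".
    name=$(basename "$1")
    case "$name" in
        ls) echo "would list files" ;;
        cp) echo "would copy files" ;;
        *)  echo "unknown applet: $name" ;;
    esac
}

# In a real script (say multicall.sh, with symlinks named ls and cp
# pointing at it) the last line would be:
#   dispatch "$0"
```

Stripping the directory part matters: as the ./foo example shows, the invoked name usually arrives with a path attached.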
This is how BusyBox works; BusyBox has a few hundred programs inside of it, and you symlink each program name to the BusyBox binary. When you invoke BusyBox
via the relevant symlink, BusyBox checks argv[0] and calls the appropriate sub program for you. This means that I can ship one binary – BusyBox – but
have access to all the tools that BusyBox contains internally.
Listing out the programs BusyBox includes shows almost every program I’ve ever used
Usage: busybox [function [arguments]...]
or: busybox --list[-full]
or: busybox --show SCRIPT
or: busybox --install [-s] [DIR]
or: function [arguments]...
BusyBox is a multi-call binary that combines many common Unix
utilities into a single executable. Most people will create a
link to busybox for each function they wish to use and BusyBox
will act like whatever it was invoked as.
Currently defined functions:
[, [[, acpid, add-shell, addgroup, adduser, adjtimex, arch, arp,
arping, ascii, ash, awk, base32, base64, basename, bc, beep,
blkdiscard, blkid, blockdev, bootchartd, brctl, bunzip2, bzcat, bzip2,
cal, cat, chat, chattr, chgrp, chmod, chown, chpasswd, chpst, chroot,
chrt, chvt, cksum, clear, cmp, comm, conspy, cp, cpio, crc32, crond,
crontab, cryptpw, cttyhack, cut, date, dc, dd, deallocvt, delgroup,
deluser, depmod, devmem, df, dhcprelay, diff, dirname, dmesg, dnsd,
dnsdomainname, dos2unix, dpkg, dpkg-deb, du, dumpkmap, dumpleases,
echo, ed, egrep, eject, env, envdir, envuidgid, ether-wake, expand,
expr, factor, fakeidentd, fallocate, false, fatattr, fbset, fbsplash,
fdflush, fdformat, fdisk, fgconsole, fgrep, find, findfs, flock, fold,
free, freeramdisk, fsck, fsck.minix, fsfreeze, fstrim, fsync, ftpd,
ftpget, ftpput, fuser, getopt, getty, grep, groups, gunzip, gzip, halt,
hd, hdparm, head, hexdump, hexedit, hostid, hostname, httpd, hush,
hwclock, i2cdetect, i2cdump, i2cget, i2cset, i2ctransfer, id, ifconfig,
ifdown, ifenslave, ifplugd, ifup, inetd, init, insmod, install, ionice,
iostat, ip, ipaddr, ipcalc, ipcrm, ipcs, iplink, ipneigh, iproute,
iprule, iptunnel, kbd_mode, kill, killall, killall5, klogd, last, less,
link, linux32, linux64, linuxrc, ln, loadfont, loadkmap, logger, login,
logname, logread, losetup, lpd, lpq, lpr, ls, lsattr, lsmod, lsof,
lspci, lsscsi, lsusb, lzcat, lzma, lzop, makedevs, makemime, man,
md5sum, mdev, mesg, microcom, mim, mkdir, mkdosfs, mke2fs, mkfifo,
mkfs.ext2, mkfs.minix, mkfs.vfat, mknod, mkpasswd, mkswap, mktemp,
modinfo, modprobe, more, mount, mountpoint, mpstat, mt, mv, nameif,
nanddump, nandwrite, nbd-client, nc, netstat, nice, nl, nmeter, nohup,
nologin, nproc, nsenter, nslookup, ntpd, od, openvt, partprobe, passwd,
paste, patch, pgrep, pidof, ping, ping6, pipe_progress, pivot_root,
pkill, pmap, popmaildir, poweroff, powertop, printenv, printf, ps,
pscan, pstree, pwd, pwdx, raidautorun, rdate, rdev, readahead,
readlink, readprofile, realpath, reboot, reformime, remove-shell,
renice, reset, resize, resume, rev, rm, rmdir, rmmod, route, rpm,
rpm2cpio, rtcwake, run-init, run-parts, runlevel, runsv, runsvdir, rx,
script, scriptreplay, sed, seedrng, sendmail, seq, setarch, setconsole,
setfattr, setfont, setkeycodes, setlogcons, setpriv, setserial, setsid,
setuidgid, sh, sha1sum, sha256sum, sha3sum, sha512sum, showkey, shred,
shuf, slattach, sleep, smemcap, softlimit, sort, split, ssl_client,
start-stop-daemon, stat, strings, stty, su, sulogin, sum, sv, svc,
svlogd, svok, swapoff, swapon, switch_root, sync, sysctl, syslogd, tac,
tail, tar, taskset, tc, tcpsvd, tee, telnet, telnetd, test, tftp,
tftpd, time, timeout, top, touch, tr, traceroute, traceroute6, tree,
true, truncate, ts, tsort, tty, ttysize, tunctl, ubiattach, ubidetach,
ubimkvol, ubirename, ubirmvol, ubirsvol, ubiupdatevol, udhcpc, udhcpc6,
udhcpd, udpsvd, uevent, umount, uname, unexpand, uniq, unix2dos,
unlink, unlzma, unshare, unxz, unzip, uptime, users, usleep, uudecode,
uuencode, vconfig, vi, vlock, volname, w, wall, watch, watchdog, wc,
wget, which, who, whoami, whois, xargs, xxd, xz, xzcat, yes, zcat,
zcip
All I need to do to get access to all of these tools is make sure that my /init script sets up the relevant symlinks before starting the
shell
Compiling BusyBox
I don’t plan on packaging a C standard library in my system, so I need to make sure that BusyBox is compiled statically (otherwise BusyBox will search for a non-existent system-wide C standard library).
Luckily this isn’t that hard. To compile BusyBox I first clone the repo and then run
make defconfig
which produces a .config file. When I then open up this .config I see this
#
# Build Options
#
# CONFIG_STATIC is not set
Changing this to
CONFIG_STATIC=y
means that my BusyBox image will now build statically.
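This edit could also be scripted instead of done by hand. A minimal sketch, assuming the option appears in .config as exactly the "is not set" line shown above; enable_static is a hypothetical helper name, not something BusyBox provides:

```shell
#!/bin/sh
# Hypothetical helper: turn CONFIG_STATIC on in a BusyBox .config.
# Assumes the option appears as the exact line "# CONFIG_STATIC is not set".
enable_static() {
    config=$1
    # Write to a temporary file and move it into place, which avoids
    # relying on the non-portable "sed -i".
    sed 's/^# CONFIG_STATIC is not set$/CONFIG_STATIC=y/' "$config" > "$config.tmp" &&
        mv "$config.tmp" "$config"
}

# Usage (from the BusyBox source directory, after "make defconfig"):
#   enable_static .config
```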
I then build BusyBox using this command
make CROSS_COMPILE=aarch64-unknown-linux-gnu- -j $(nproc)
After building, I can verify that it is indeed a static binary.
file result/busybox
result/busybox: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.10.0, stripped
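Since the check boils down to looking for one phrase in the file output, it is easy to fold into a build script. A minimal sketch; is_static is a hypothetical helper, and it inspects the text that file prints rather than the binary itself:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file(1) description reports static linking.
# $1 is the description printed by file, not a path.
is_static() {
    case "$1" in
        *"statically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_static "$(file result/busybox)" || echo "busybox is not static" >&2
```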
Creating an initramfs image
Now that I’ve gotten BusyBox, I can actually create my initramfs image, which will contain the following
- a /init file (called at startup)
- BusyBox
- BusyBox symlinks to whatever programs the /init script needs
Putting all that together results in a file system that looks like this
.
├── bin
│ ├── busybox
│ ├── ln -> busybox
│ ├── ls -> busybox
│ └── sh -> busybox
└── init
With this as the init script
#!/bin/sh
for command in $(busybox --list); do
    if [ ! -e "/bin/$command" ]; then
        ln -s busybox "/bin/$command"
    fi
done
mount -t proc none /proc
exec /bin/sh
This script creates a symlink to the BusyBox binary for every program BusyBox provides that doesn’t already have one.
It then mounts /proc and replaces itself with /bin/sh, which is the shell that I’ll interact with.
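Laying out that directory by hand is only a few commands. A minimal sketch of how it could be scripted, where make_rootfs and its three arguments are hypothetical names, and the proc directory is my assumption: it is not in the listing, but mount needs an existing directory to mount onto.

```shell
#!/bin/sh
# Hypothetical helper: lay out the initramfs staging directory.
# $1 = directory to create, $2 = path to the static busybox binary,
# $3 = path to the init script.
make_rootfs() {
    root=$1 busybox=$2 init=$3
    # proc/ is an assumption: it is not in the listing, but mount needs
    # an existing directory to mount onto.
    mkdir -p "$root/bin" "$root/proc" &&
    cp "$busybox" "$root/bin/busybox" &&
    cp "$init" "$root/init" &&
    chmod +x "$root/bin/busybox" "$root/init" &&
    # Relative links, so they still resolve once this directory becomes /.
    for applet in sh ln ls; do
        ln -sf busybox "$root/bin/$applet" || return 1
    done
}

# Usage:
#   make_rootfs scratch/rootfs result/busybox init
```

Only sh, ln and ls are linked ahead of time; the init script creates the rest at boot.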
Now that I’ve got the file system set up I need to package it in a way that Linux understands. Luckily the initramfs documentation provides a script for that!
#!/bin/sh
# Copyright 2006 Rob Landley <[email protected]> and TimeSys Corporation.
# Licensed under GPL version 2
if [ $# -ne 2 ]
then
    echo "usage: mkinitramfs directory imagename.cpio.gz"
    exit 1
fi
if [ -d "$1" ]
then
    echo "creating $2 from $1"
    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
else
    echo "First argument must be a directory"
    exit 1
fi
This script takes two arguments: a directory to convert to an initramfs image, and what you want the resulting compressed cpio file to be called.
note: a cpio file serves the same purpose as a tar file, just in a different format
Running this script on my file system gives me a compressed cpio archive, which seems to be valid (I called my initramfs file init.cpio)
file init.cpio
init.cpio: gzip compressed data, from Unix, original size modulo 2^32 3069952
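The file output only confirms the gzip wrapper, not what is inside it. To see the contents, the archive can be decompressed and listed. A minimal sketch, assuming gzip and cpio are installed on the build machine; list_initramfs is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the file names stored in a gzipped cpio archive.
list_initramfs() {
    # gzip -dc decompresses to stdout; cpio -t lists entries without
    # extracting. The block-count summary cpio prints on stderr is discarded.
    gzip -dc "$1" | cpio -t 2>/dev/null
}

# Usage:
#   list_initramfs init.cpio
```

Depending on the cpio implementation, entries may be listed with or without a leading "./".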
Booting the system
I now have all the pieces I need to actually boot the system. I have a kernel, a device tree, and an initramfs file system. Now it’s time to put it all together.
To test out the image, I need real hardware or an emulator. In this case I’m going to use QEMU, an emulator that can run a full kernel on a virtual Raspberry Pi without needing to set up any real hardware.
note: I need the system emulation version of QEMU, not the user space version. The system level version can emulate the hardware of a device (which is needed to emulate a kernel), whereas the user space one can only emulate user space programs.
I can start the kernel by running
qemu-system-aarch64 -nographic \
-machine raspi4b \
-cpu cortex-a72 \
-m 2G -smp 4 \
-kernel result/boot/Image \
-dtb result/dtbs/bcm2711-rpi-4-b.dtb \
--initrd scratch/init.cpio \
-serial null -chardev stdio,id=uart1 -serial chardev:uart1 -monitor none
The important points here are
- pointing QEMU at the Image file I generated earlier using the --kernel flag
- pointing QEMU at the device tree file using the --dtb flag
- pointing QEMU at the initramfs image using the --initrd flag
And after some waiting I see
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[ 0.000000] Linux version 6.6.74-v8 (nixbld@localhost) (aarch64-unknown-linux-gnu-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT Tue Apr 22 06:43:02 UTC 2025
[ 0.000000] KASLR disabled due to lack of seed
[ 0.000000] Machine model: Raspberry Pi 4 Model B
[ 0.000000] efi: UEFI not found.
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000002c000000, size 64 MiB
[ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[ 0.000000] OF: reserved mem: 0x000000002c000000..0x000000002fffffff (65536 KiB) map reusable linux,cma
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000003bffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x3bdd33c0-0x3bdd5fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x000000003bffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
// ignore most of the output
[ 1.410530] of_cfs_init
[ 1.413332] of_cfs_init: OK
[ 1.416512] clk: Disabling unused clocks
[ 1.474364] Freeing unused kernel memory: 4864K
[ 1.477876] Run /init as init process
/bin/sh: can't access tty; job control turned off
~ #
I have a shell!! And I can verify that everything is working with the time-honored tradition of hello world.
~ # echo hello world
hello world
And I’m done! I have a full – although limited – linux image that boots!
Future work
At this point I’ve achieved what I set out to do, but there’s lots of future work that could be done. Here’s a brief list of things that I know are out there and that my Linux “distro” is lacking.
- A system-wide C standard library
Most programs assume that a system-wide C standard library is available to them, and they will dynamically link against that. This means most “real” programs will fail on my system, since there is no C standard library available to them.
- Networking
A linux image without an internet connection of some kind isn’t terribly useful. I don’t particularly want to spend time trying to figure out how to set up a Linux network and figure out how QEMU handles networking, so onto the “TODO” list it goes.
- System configuration
Linux has various configuration files that control various parts of the system. The number and relevance of these files are beyond me, but the fact that they’re out there and I don’t understand what they all do means that it’s definitely an area of future work.
- Unknown unknowns
These are the main things I can think of right now, but I assume there are many more areas of Linux that I’m completely unaware of. Maybe someday I’ll go look at what various Linux distros set up and try and replicate some of the configuration they do.
But for now this is it. I set out to create a mostly minimal linux image that boots into a shell, and that’s what I got!