Almost every programmer has written some version of this:
std::cout << "Hello, World!\n";
It is the smallest possible signal that something works.
The compiler works. The runtime works. The terminal works. You can turn text into something visible.
That is why Hello, World! is usually treated as boring. It is the first step before the real work starts. You write it, run it, confirm the environment is alive, and move on.
But the program only feels simple because almost everything interesting is hidden underneath it.
When you write to std::cout on a normal machine, you are not really talking to the screen or the serial port. You are talking to a C++ stream abstraction, which eventually talks to a runtime, which eventually makes a syscall, which crosses into the kernel, which routes bytes through a file descriptor, a TTY layer, and a device driver.
By the time the bytes reach hardware, your tiny line of C++ has leaned on an entire operating system.
So I wanted to ask a different version of the beginner question:
What does Hello, World! look like when there is no operating system underneath it?
That is what led me to build hello-kernel, a tiny freestanding C++ kernel that boots, writes Hello, World! directly to a serial port, and then halts.
No userspace. No libc. No terminal. No operating system.
Just a bootloader, a kernel entry point, a linker script, a few CPU instructions, and one line of text coming out through COM1.
What std::cout normally hides
In a regular C++ program, std::cout << "Hello, World!\n"; feels direct.
It is not.
At the application level, the string starts in user-space memory. The C++ stream machinery buffers it. At some point, that buffer is flushed and the runtime asks the operating system to write those bytes somewhere.
On Linux, that usually means a write syscall. The program passes a file descriptor, a pointer to the buffer, and the number of bytes to write.
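Stripped of the stream layer, that request is roughly the following. This is a minimal sketch that assumes a POSIX host such as Linux or macOS. The helper name write_hello is hypothetical, and the real path through a C++ runtime adds buffering and varies by implementation.

```cpp
#include <cstring>    // std::strlen
#include <unistd.h>   // POSIX write(), ssize_t

// Hypothetical helper: roughly what the stream machinery eventually asks
// the OS to do. It hands the kernel a file descriptor, a pointer, and a
// byte count. fd 1 is standard output; the parameter allows redirection.
ssize_t write_hello(int fd = 1) {
    const char* msg = "Hello, World!\n";
    // 14 bytes: the terminating '\0' is not sent.
    return ::write(fd, msg, std::strlen(msg));
}
```

Calling write_hello() with no argument prints the line to standard output, and the return value is the number of bytes the kernel accepted.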
If file descriptor 1 is connected to a terminal, Linux routes that write through its terminal subsystem. If that terminal is backed by a serial device, the kernel eventually reaches a serial driver. For a classic x86 COM1 port, that driver writes bytes to I/O port 0x3F8, usually through an instruction like outb.
So the path is not:
program -> screen
It is more like:
C++ stream
-> runtime
-> syscall
-> kernel
-> file descriptor
-> TTY layer
-> serial driver
-> UART
That stack exists for good reasons.
Applications should not be allowed to issue privileged hardware instructions whenever they want. The operating system owns the hardware, schedules processes, enforces permissions, manages memory, and provides safer abstractions for programs to use.
Most of the time, that is exactly what you want.
But if the goal is to understand what sits underneath the abstraction, the OS gets in the way. It does so much for you that the simple act of printing text no longer teaches you much about the machine.
So in hello-kernel, I removed the layers.
Instead of asking an operating system to print text, the kernel talks to the serial port directly.
The smallest useful kernel
The kernel itself is almost disappointingly small.
The entry point looks like this:
extern "C" [[noreturn]] void kmain() {
    serial_init();
    serial_write("Hello, World!\n");
    hang();
}
That is the whole story at the top level.
Initialize the serial port. Write the string. Stop forever.
But each part of that tiny function is carrying more weight than it first appears.
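extern "C" matters because this is C++. Without it, the compiler would mangle the function name, and the object file would contain a decorated symbol instead of kmain. Limine does not look up symbols by name. It jumps to the entry address recorded in the ELF header, and the linker can only fill that in if the entry symbol named in the linker script, kmain, exists under that plain name.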
[[noreturn]] matters because there is no caller in the normal sense. This is not a program launched by a shell. There is no main, no C runtime startup, no process waiting for an exit code. Once the bootloader jumps into the kernel, the kernel owns the machine. Returning would be meaningless at best and broken at worst.
Even the absence of standard library code matters.
This is freestanding C++. There is no libc. No exceptions. No RTTI. No heap. No std::cout. No runtime standing behind the language to catch anything.
The code looks like C++, but it is C++ with most of the comfortable furniture removed.
That is part of what makes kernel work interesting. You can still use the language, but you cannot assume the environment.
Getting to kmain
Before the kernel can print anything, something has to load it.
When a machine starts, the CPU begins executing firmware code. On modern PCs that usually means UEFI. The firmware initializes enough of the system to find something bootable. It then loads a bootloader.
In this project, the bootloader is Limine.
Limine reads its configuration, loads the kernel ELF file, finds the entry point, prepares the environment, and jumps to kmain.
The boot path looks roughly like this:
UEFI firmware
-> BOOTX64.EFI
-> Limine
-> kernel.elf
-> kmain
-> serial_init
-> serial_write
-> hang
That is a lot of ceremony for one line of text.
But it is also the point.
A normal application starts life inside an environment that already exists. A kernel has to be placed into one. It needs to be loaded into memory. Its executable format has to be understood. Its entry point has to be found. Its sections have to be mapped with the right permissions. Its boot protocol requests have to be discovered and filled in.
That work is easy to miss because it happens before your code runs.
But in low-level programming, “before your code runs” is often where most of the real system lives.
The handshake with the bootloader
The kernel uses the Limine boot protocol. That means it embeds specific request structures inside the binary. Limine scans for those structures and fills in response pointers before transferring control to the kernel.
In the code, that looks like this:
__attribute__((used, section(".limine_requests")))
static volatile LIMINE_BASE_REVISION(3);
__attribute__((used, section(".limine_requests")))
static volatile struct limine_framebuffer_request framebuffer_request = {
    .id = LIMINE_FRAMEBUFFER_REQUEST,
    .revision = 0,
    .response = nullptr,
};
The kernel is not drawing to the framebuffer yet. The framebuffer request is there mostly as a simple end-to-end proof that Limine can see the request and write a response.
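The idea of a loader finding a request by its magic ID and writing a response into it can be modeled on a normal machine. This is a simplified sketch, not Limine's code. The magic values, the four-word request layout, and the name fill_requests are all hypothetical. Real Limine IDs and struct layouts come from limine.h.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical magic pair; NOT Limine's real request IDs.
constexpr std::uint64_t kMagic0 = 0x1111222233334444ULL;
constexpr std::uint64_t kMagic1 = 0x5555666677778888ULL;

// Hypothetical layout: [magic0][magic1][revision][response].
constexpr std::size_t kRequestWords = 4;

// Scan a region of 64-bit words for the magic pair and write `response`
// into each matching request's response slot. Returns how many were found.
std::size_t fill_requests(std::uint64_t* words, std::size_t n,
                          std::uint64_t response) {
    std::size_t found = 0;
    for (std::size_t i = 0; i + kRequestWords <= n; ++i) {
        if (words[i] == kMagic0 && words[i + 1] == kMagic1) {
            words[i + 3] = response;  // the "response pointer" slot
            ++found;
            i += kRequestWords - 1;   // skip the rest of this request
        }
    }
    return found;
}
```

The kernel side never calls anything. It only reads the response slot later, which is why the request has to survive compilation and linking even though no C++ code references it.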
What I found interesting was how much intent is packed into those attributes.
used tells the compiler not to throw the symbol away just because no C++ code references it.
section(".limine_requests") places the data into a specific linker section so the bootloader can find it.
The request structs exist for someone outside the C++ program. From the compiler’s point of view, they look unused. From the bootloader’s point of view, they are the contract.
That is a nice example of a systems programming tension: not all important relationships are visible in normal language-level references.
Sometimes the contract is in a section name. Sometimes it is in a magic number. Sometimes it is in the linker script.
The linker script is part of the program
In application development, the linker is usually background noise.
You might think about it when dependencies fail, symbols are missing, or a build breaks in some unpleasant way. But most of the time, you do not write the map of your program’s memory layout by hand.
In a kernel, the linker script becomes part of the design.
The hello-kernel linker script starts the kernel at a higher-half address:
. = 0xffffffff80000000;
That address is in the upper part of the 64-bit virtual address space. Kernels commonly live there while userspace lives lower down. Limine maps the kernel there for us, and the compiler is told about this with -mcmodel=kernel.
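The choice of 0xffffffff80000000 is not arbitrary. Read as a signed 64-bit value it is exactly -2 GiB, and the kernel code model assumes code and static data sit in that top 2 GiB, so any such address can be produced by sign-extending a 32-bit value. A small compile-time check of that arithmetic follows; the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kKernelBase = 0xffffffff80000000ULL;

// Sign-extend a 32-bit value to 64 bits, as x86-64 does for
// 32-bit immediates and displacements.
constexpr std::uint64_t sign_extend32(std::uint32_t v) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(v)));
}

// True if `addr` lies in the top 2 GiB that -mcmodel=kernel targets.
constexpr bool in_top_2gib(std::uint64_t addr) { return addr >= kKernelBase; }

// The base address is the sign extension of 0x80000000 ...
static_assert(sign_extend32(0x80000000u) == kKernelBase);
// ... and the last byte of the address space is the extension of 0xffffffff.
static_assert(sign_extend32(0xffffffffu) == 0xffffffffffffffffULL);
```

Any address in that window survives a round trip through its low 32 bits, which is what lets the compiler emit compact 32-bit address encodings for a higher-half kernel.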
The linker script also splits the kernel into separate loadable segments:
PHDRS
{
    text   PT_LOAD FLAGS((1 << 0) | (1 << 2)) ; /* PF_X | PF_R */
    rodata PT_LOAD FLAGS((1 << 2)) ;            /* PF_R */
    data   PT_LOAD FLAGS((1 << 1) | (1 << 2)) ; /* PF_W | PF_R */
}
Text is readable and executable. Read-only data is readable. Data and BSS are readable and writable.
That gives the loader enough information to map the kernel with sensible permissions. Code does not need to be writable. Data does not need to be executable.
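The bit values in those FLAGS expressions are the ELF program-header permission bits: execute is 1, write is 2, read is 4. A small sketch decodes them in the style of the Flg column printed by readelf -l. The constant and function names are hypothetical; only the bit values come from the ELF specification.

```cpp
#include <cstdint>
#include <string>

// ELF program-header permission bits (values from the ELF specification).
constexpr std::uint32_t kPfX = 1u << 0;  // execute
constexpr std::uint32_t kPfW = 1u << 1;  // write
constexpr std::uint32_t kPfR = 1u << 2;  // read

// The three segments from the linker script above.
constexpr std::uint32_t kTextFlags   = (1u << 0) | (1u << 2);
constexpr std::uint32_t kRodataFlags = (1u << 2);
constexpr std::uint32_t kDataFlags   = (1u << 1) | (1u << 2);

// Render flags as three columns: R, W, E (blank when the bit is clear).
std::string describe(std::uint32_t flags) {
    std::string s;
    s += (flags & kPfR) ? 'R' : ' ';
    s += (flags & kPfW) ? 'W' : ' ';
    s += (flags & kPfX) ? 'E' : ' ';
    return s;
}

static_assert(kTextFlags == 5 && kRodataFlags == 4 && kDataFlags == 6);
```

No segment is both writable and executable, which is the property the split is meant to guarantee.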
The linker script also has to preserve the Limine request sections:
.limine_requests : {
    KEEP(*(.limine_requests_start))
    KEEP(*(.limine_requests))
    KEEP(*(.limine_requests_end))
} :rodata
That KEEP is small but important.
The C++ code never calls those request objects directly. A linker doing garbage collection could decide they are unused and remove them. If that happens, the bootloader cannot find the protocol markers, and the kernel no longer boots correctly.
This is the kind of detail that makes low-level work feel different from normal application code.
The source file is not the whole program. The compiler flags are part of the program. The linker script is part of the program. The bootloader configuration is part of the program.
The boundary of “the code” gets wider.
Talking to COM1 directly
Once the kernel is running, the actual output path is beautifully primitive.
The project writes to COM1, the first legacy serial port. On x86, legacy devices like the UART are reached through a separate I/O address space. You do not write to them like normal memory. You use special CPU instructions.
In C++, there is no built-in function for this, so the kernel uses inline assembly:
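#include <cstdint>  // uint8_t, uint16_t (available in freestanding C++)
// Write one byte to an I/O port. "a" puts val in AL; "Nd" puts the port in DX,
// or encodes it as an immediate when it is a constant from 0 to 255.
inline void outb(uint16_t port, uint8_t val) {
    asm volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
// Read one byte from an I/O port into AL.
inline uint8_t inb(uint16_t port) {
    uint8_t ret;
    asm volatile ("inb %1, %0" : "=a"(ret) : "Nd"(port));
    return ret;
}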
outb writes one byte to an I/O port.
inb reads one byte from an I/O port.
That is the entire hardware interface for this project.
COM1 usually lives at port 0x3F8:
constexpr uint16_t COM1 = 0x3F8;
The UART exposes registers at offsets from that base address. To configure it, the kernel writes specific bit patterns to those registers:
void serial_init() {
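    outb(COM1 + 1, 0x00); // IER: disable all UART interrupts
    outb(COM1 + 3, 0x80); // LCR: set DLAB, so offsets 0 and 1 now address the divisor
    outb(COM1 + 0, 0x03); // DLL: divisor low byte = 3 (115200 / 3 = 38400 baud)
    outb(COM1 + 1, 0x00); // DLM: divisor high byte = 0
    outb(COM1 + 3, 0x03); // LCR: 8 data bits, no parity, 1 stop bit; clears DLAB
    outb(COM1 + 2, 0xC7); // FCR: enable and clear both FIFOs, 14-byte threshold
    outb(COM1 + 4, 0x0B); // MCR: assert DTR, RTS, and OUT2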
}
Those numbers are not random, although they look like it at first.
They disable UART interrupts, enable divisor-latch access, set the baud divisor, configure 8 data bits with no parity and one stop bit, enable FIFO buffers, and assert the modem control lines needed for normal operation.
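The divisor and line-control values can be derived rather than memorized. This sketch assumes a 16550-compatible UART, whose standard base rate is 115200 baud (a 1.8432 MHz clock divided by 16). The helper names are hypothetical, and make_lcr only sets the parity-enable bit, leaving out the even/odd selection bits.

```cpp
#include <cstdint>

// 16550-compatible UARTs divide a 115200 baud base rate.
constexpr std::uint32_t kBaseBaud = 115200;

struct Divisor {
    std::uint8_t low;   // written to COM1 + 0 while DLAB is set
    std::uint8_t high;  // written to COM1 + 1 while DLAB is set
};

constexpr Divisor divisor_for(std::uint32_t baud) {
    const auto d = static_cast<std::uint16_t>(kBaseBaud / baud);
    return {static_cast<std::uint8_t>(d & 0xFF), static_cast<std::uint8_t>(d >> 8)};
}

// Line control register: bits 0-1 = data bits - 5, bit 2 = extra stop bit,
// bit 3 = parity enable. Bit 7 (0x80) is DLAB, set separately.
constexpr std::uint8_t make_lcr(unsigned data_bits, bool two_stop_bits, bool parity) {
    return static_cast<std::uint8_t>(((data_bits - 5) & 0x3) |
                                     (two_stop_bits ? 0x04 : 0) |
                                     (parity ? 0x08 : 0));
}

// These reproduce the values serial_init writes.
static_assert(divisor_for(38400).low == 0x03 && divisor_for(38400).high == 0x00);
static_assert(make_lcr(8, false, false) == 0x03);
```

Slower rates need the high byte: 50 baud, for example, needs a divisor of 2304 (0x0900).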
In normal application code, this kind of thing is buried inside a driver.
Here, it is the driver.
To send a byte, the kernel waits until the UART says the transmit holding register is empty. Then it writes the byte:
void serial_putc(char c) {
    while ((inb(COM1 + 5) & 0x20) == 0) {} // LSR bit 5: transmit holding register empty
    outb(COM1, static_cast<uint8_t>(c));
}
That while loop is a busy wait.
In a normal operating system, busy-waiting like this would be wasteful. You would want interrupts, buffers, scheduling, and a driver model. But this kernel has no scheduler. There is no other work to do. Waiting is fine because the machine exists for exactly one purpose at this moment: print the line.
Then serial_write just walks the string:
void serial_write(const char* s) {
    for (; *s; ++s) serial_putc(*s);
}
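Because inb and outb need ring 0, the polling logic cannot run as-is in a normal program. It can, however, be exercised against a stand-in device. FakeUart, fake_putc, and fake_write below are hypothetical mocks and do not touch real hardware. They only mirror the handshake: poll the line status register until bit 5 is set, then write the byte.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for COM1; NOT real hardware access.
struct FakeUart {
    int busy_polls = 0;  // how many LSR reads report "still busy"
    std::string sent;    // bytes "transmitted" through the THR

    std::uint8_t read_lsr() {
        if (busy_polls > 0) { --busy_polls; return 0x00; }
        return 0x20;  // bit 5: transmit holding register empty
    }
    void write_thr(std::uint8_t b) { sent.push_back(static_cast<char>(b)); }
};

// Same shape as serial_putc: spin until THRE is set, then write one byte.
void fake_putc(FakeUart& uart, char c) {
    while ((uart.read_lsr() & 0x20) == 0) {}
    uart.write_thr(static_cast<std::uint8_t>(c));
}

// Same shape as serial_write: send bytes until the terminating '\0'.
void fake_write(FakeUart& uart, const char* s) {
    for (; *s; ++s) fake_putc(uart, *s);
}
```

With busy_polls set to a few reads, the first byte spins until the fake UART reports ready, and the rest of the string goes straight through.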
That is the kernel’s entire output system.
No formatting. No stream abstraction. No terminal. No Unicode. No buffering.
Just bytes.
The build is part of the lesson
The code is small enough that it is tempting to think the project is small.
But a kernel is not only its source code. It is also how that source code becomes something firmware can eventually run.
The compile step uses flags that are easy to ignore until you need them:
clang++ --target=x86_64-elf \
-ffreestanding -fno-exceptions -fno-rtti \
-fno-stack-protector -mno-red-zone -mcmodel=kernel \
-Wall -Wextra \
-c main.cpp -o build/main.o
Each flag removes an assumption.
--target=x86_64-elf says we are not building a normal macOS or Linux executable. We are building a bare-metal ELF object.
-ffreestanding says the program does not run in a hosted environment.
-fno-exceptions and -fno-rtti remove C++ features that would need runtime support we have not provided.
-fno-stack-protector avoids references to stack canary support from a runtime that does not exist.
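-mno-red-zone disables the 128-byte red zone below the stack pointer. In kernel mode, an interrupt that arrives without a stack switch has its frame pushed by the CPU directly below the current stack pointer, overwriting anything stored there, so kernels should not rely on the red zone.
-mcmodel=kernel tells the compiler that code and static data live in the top 2 GiB of the address space, which is exactly where 0xffffffff80000000 begins, so it can address them with sign-extended 32-bit values.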
Then the linker produces the kernel ELF:
ld.lld -nostdlib -static -T linker.ld -o build/kernel.elf build/main.o
Again, the flags are mostly about removing assumptions.
No standard library. No dynamic linker. Use this linker script. Produce a static kernel image.
After that, the project builds a bootable ISO with Limine’s bootloader files, the kernel ELF, and a limine.conf that points Limine at /boot/kernel.elf.
When it runs under QEMU with -serial stdio, every byte written to COM1 appears in the host terminal.
That closes the loop.
The kernel writes a byte to port 0x3F8.
QEMU emulates the serial port.
The host terminal receives the byte.
Hello, World! appears.
It feels simple when it works.
It only feels simple because every layer is finally lined up.
Halting is also a decision
After printing, the kernel calls hang():
[[noreturn]] void hang() {
    for (;;) {
        asm volatile ("cli; hlt");
    }
}
This is another detail that seems almost silly until you think about it.
A normal program exits.
It returns from main, passes control back to the runtime, and eventually the operating system reclaims its resources.
A kernel cannot do that.
There is nowhere to return to. The kernel is the thing that would normally receive control from everyone else.
So this kernel disables maskable interrupts with cli, halts the CPU with hlt, and loops forever in case something still wakes it up, such as a non-maskable interrupt, which cli does not block.
Even stopping has to be explicit.
Why this was worth building
This project is not trying to be a real operating system.
It has no memory allocator. No scheduler. No interrupt handling. No filesystem. No userspace. No real console.
It boots, prints one line, and stops.
But that is enough to make the familiar unfamiliar again.
Hello, World! is usually used to prove that a language or environment works. Here, it proves something different:
- the compiler produced freestanding code
- the linker placed sections where the bootloader could use them
- Limine recognized the kernel and jumped to the right entry point
- the CPU reached our code in long mode
- the serial port was configured correctly
- QEMU exposed COM1 to the host terminal
- the kernel did not need an operating system because, for this tiny moment, it was the operating system
That is a lot of machinery hiding behind one line of output.
And that is what I enjoyed about it.
It reminded me that abstractions are not lies. They are agreements.
std::cout is a useful agreement. So are syscalls, terminals, drivers, bootloaders, executable formats, and virtual memory. Most of the time, we should be grateful that we do not have to think about all of them just to print text.
But every now and then, it is worth taking the agreement apart.
Not because we should all write kernels. Not because low-level code is somehow more real than application code. Not because abstractions are bad.
But because understanding what sits underneath an abstraction changes how you use it.
It makes ordinary software feel less ordinary.
The next time I write:
std::cout << "Hello, World!\n";
I will still appreciate the simplicity.
But now I will also see the hidden path underneath it: the runtime, the syscall, the kernel, the driver, the UART, and the small mountain of assumptions that make a beginner program feel effortless.
That is the strange joy of building tiny systems projects.
You do not always build them because they are useful.
Sometimes you build them because they make the layers visible again.
The code for this experiment is here: hello-kernel.