AES on the STM32H5 with Zig

I recently came across Reticulum, and so far I’ve been blown away by how well thought through it seems, while still remaining, conceptually, incredibly simple. I won’t go too much into it in this article, but in essence, encryption forms the basis of the addressing, and addressing is entirely agnostic to the underlying transport. This means that it will work over just about any medium you can get bits across: LoRa, serial links, I2C, IP tunnels—it genuinely doesn’t care.

To understand Reticulum properly, I want to build the pieces from the ground up. The most titillating aspect of Reticulum, to me, is that it offers an alternative to the entire TCP/IP stack, meaning that it needs nothing, in theory, to send those bits out over the wire. To prove this point, and to build on my deep-dive into how UART works on the STM32H5, I want to get a connection working between the STM32H5 and my Linux host over the ST-LINK UART connection. In later posts, we’ll move to more interesting link tech, but since UART is simple and it’s very clear how each byte is sent, it’s a great place to start building on top of.

Since encryption forms the basis of addressing, we need to start by understanding the cryptographic primitives involved. Reticulum uses AES-128 in CBC mode as part of its packet encryption, so before we can do anything interesting, we need a working AES primitive.

The STM32H5 happens to have a hardware AES accelerator, so we don’t need to implement the algorithm ourselves. But we do need to learn how to drive the peripheral. This article covers exactly that: getting AES-128 ECB encryption running on the STM32H573II, sending the result over UART, and verifying it on the host.

Why Zig?

The earlier articles in this series used C and assembly. For this project I wanted to try using Zig, which seems particularly well suited to embedded work. It gives you complete control over memory layout and allocation, compiles to efficient machine code, and—perhaps most relevant here—has no hidden runtime. There’s no libc, no runtime initialisation surprises, and no garbage collector. What you write is (mostly) what runs.

MicroZig provides the bridge between Zig and bare-metal hardware. It handles the vector table, startup code, and peripheral register definitions. Critically though, it doesn’t provide a HAL for the H5. Instead, we get structured access to the raw peripheral registers and have to work through the reference manual ourselves, which is exactly what we want.

Project setup

The project structure is straightforward:

aes-test/
├── build.zig               # build script
├── build.zig.zon           # package manifest
└── src/
    ├── h5.zig              # firmware for the STM32H573II
    └── uart_receiver.zig   # host-side Linux receiver

I’ll be customizing the register-data files slightly, so build.zig.zon pulls in MicroZig as a local dependency:

// build.zig.zon
.dependencies = .{
    .microzig = .{
        .path = "../microzig/",
    },
},

And build.zig creates two targets: one for the firmware, and one for the Linux host utility:

// build.zig (excerpt)
const mz_builder = try microzig.MicroBuild.init(b, .{});
const firmware = mz_builder.add_firmware(b, .{
    .name = "aes-test",
    .target = mz_builder.add_chip(stm32h573ii),
    .optimize = .ReleaseSmall,
    .root_source_file = b.path("src/h5.zig"),
});
mz_builder.install_firmware(b, firmware, .{});

To build and flash the firmware:

zig build h5
STM32_Programmer_CLI -c port=SWD -w zig-out/firmware/aes-test.bin 0x8000000 -v -rst

A detour: the missing flag

Before diving into the AES code, it’s worth mentioning a small but instructive bump we hit.

MicroZig’s register definitions for the STM32 family are generated from the same data used by the embassy-rs project. This data lives in the stm32-data repository and describes every peripheral, register, and field for each chip in the STM32 family.

When working with AES, we need to clear the CCF (Computation Complete Flag) in the AES_ICR register once we’ve read the output. But the generated Zig code had no CCF field in AES_ICR at all.

The fix was to add it to the upstream stm32-data definitions. I submitted a small patch which the repo maintainers merged within 24 hours. Great stuff!

Another detour: fixing the key registers

While the missing CCF field was a straightforward upstream fix, the key registers required a different approach.

In the STM32H5 reference manual, the AES key registers are not laid out contiguously. There are eight 32-bit key registers (AES_KEYR0 to AES_KEYR7, enough for a 256-bit key; a 128-bit key only uses the first four), but they’re split across two 16-byte blocks with the IV registers (AES_IVR0 to AES_IVR3) sitting in between. The memory map looks roughly like this:

AES_KEYR0   0x010
AES_KEYR1   0x014
AES_KEYR2   0x018
AES_KEYR3   0x01C
AES_IVR0    0x020   ← IV registers interrupt the key block
AES_IVR1    0x024
AES_IVR2    0x028
AES_IVR3    0x02C
AES_KEYR4   0x030   ← second half of the key resumes here
AES_KEYR5   0x034
AES_KEYR6   0x038
AES_KEYR7   0x03C

The stm32-data source defines the full set of eight key registers as a single array, which is the natural way to represent them, but with non-contiguous offsets. MicroZig currently only supports register arrays with a uniform length and stride; arrays defined with per-element offsets are just ignored. This meant the key registers weren’t accessible at all.

The fix I took locally was to split the definition into two separate four-element arrays, one for each contiguous block, reflecting the actual memory layout. I rebuilt the stm32-data JSON files locally, then updated ../microzig/port/stm32/build.zig.zon to point at the local build:

// ../microzig/port/stm32/build.zig.zon
.stm32_data = .{
    .path = "../stm32-data/build/",
},

With that in place, the first block of key registers is accessible as AES.KEYR1[0] through AES.KEYR1[3], which is exactly what the loop in the key loading section uses:

while (i < 4) {
    AES.KEYR1[i] = std.mem.readInt(u32, key[(3 - i) * 4 ..][0..4], .big);
    i += 1;
}

I haven’t submitted this change upstream to embassy-rs. The Rust side may well handle arrays with non-uniform offsets correctly, and splitting the definition into two arrays could silently break something in the Rust HAL. Until I understand that better, the local patch stays local.

The AES peripheral

The STM32H5’s AES peripheral supports AES-128, AES-192, and AES-256, in a variety of chaining modes: ECB, CBC, CTR, GCM, CCM. For this first experiment, we’ll use ECB (Electronic Code Book) with a 128-bit key, since ECB encrypts each block independently, and a single block is all we need to verify the hardware is working correctly.

ECB is not safe for encrypting multiple blocks of related data, because identical plaintext blocks produce identical ciphertext blocks. We’re only using it here for testing purposes. Reticulum uses CBC, which we’ll move to once we’ve confirmed the fundamentals.
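To make the ECB weakness concrete, here is a minimal host-side sketch using Zig’s standard library AES rather than the peripheral. The helper name ecbEncrypt is my own invention for illustration; it simply encrypts each 16-byte block on its own, which is all ECB does.

```zig
const std = @import("std");
const Aes128 = std.crypto.core.aes.Aes128;

/// Hypothetical helper: encrypts every 16-byte block independently (ECB).
fn ecbEncrypt(key: [16]u8, plaintext: []const u8, out: []u8) void {
    std.debug.assert(plaintext.len % 16 == 0 and out.len == plaintext.len);
    const ctx = Aes128.initEnc(key);
    var off: usize = 0;
    while (off < plaintext.len) : (off += 16) {
        ctx.encrypt(out[off..][0..16], plaintext[off..][0..16]);
    }
}

pub fn main() void {
    const key = [16]u8{
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
    };
    const block = [_]u8{0xAB} ** 16;
    // Two identical plaintext blocks back to back.
    const plaintext = block ++ block;
    var ciphertext: [32]u8 = undefined;
    ecbEncrypt(key, &plaintext, &ciphertext);

    // In ECB, the repetition in the plaintext survives into the ciphertext.
    const same = std.mem.eql(u8, ciphertext[0..16], ciphertext[16..32]);
    std.debug.print("ciphertext blocks identical: {}\n", .{same});
}
```

An eavesdropper who sees two equal ciphertext blocks learns that the plaintext blocks were equal too, without ever touching the key. CBC avoids this by mixing each block with the previous ciphertext block before encrypting.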

The peripheral operation in ECB encryption mode is described in RM0481 section 33.4.9, “ECB and CBC encryption process”.

The procedure, stripped to its essentials, is:

  1. Enable the AES peripheral clock via RCC_AHB2ENR.AESEN
  2. Disable the peripheral (AES_CR.EN = 0) before configuration
  3. Set the operating mode, chaining mode, and key size in AES_CR
  4. Load the key into the AES_KEYR registers
  5. Wait for AES_SR.KEYVALID to confirm the key has been prepared
  6. Enable the peripheral (AES_CR.EN = 1)
  7. Write the four 32-bit words of plaintext into AES_DINR
  8. Poll AES_ISR.CCF until the computation is complete
  9. Read the four 32-bit words of ciphertext from AES_DOUTR
  10. Clear the CCF flag via AES_ICR.CCF

Let’s walk through each of these in our Zig code.

Clock and configuration

// src/h5.zig
fn aesEncrypt(out: []u8) !void {
    RCC.AHB2ENR.modify(.{ .AESEN = 1 });
    AES.CR.modify(.{ .EN = 0 });
    AES.CR.modify(.{
        .CHMOD = 0x0,
        .MODE = AES_MODE.Mode1,
        .KEYSIZE = 0,
    });

CHMOD = 0x0 selects ECB mode. MODE = Mode1 means encryption (as opposed to key derivation or decryption). KEYSIZE = 0 selects 128-bit keys.

Key loading

The key is written as 32-bit words into the AES_KEYR registers, in order from register 0 to register 3. The word order is little-endian: the lowest register, KEYR1[0], takes the least significant word (the last four bytes of the key) and the highest, KEYR1[3], takes the most significant word. Within each word, the bytes are read as a big-endian integer, so the first byte of each four-byte group ends up as the most significant byte of the register value:

// src/h5.zig
const key = [16]u8{
    0x00, 0x01, 0x02, 0x03,
    0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0A, 0x0B,
    0x0C, 0x0D, 0x0E, 0x0F,
};
var i: usize = 0;
while (i < 4) {
    AES.KEYR1[i] = std.mem.readInt(u32, key[(3 - i) * 4 ..][0..4], .big);
    i += 1;
}
while (AES.SR.read().KEYVALID == 0) continue;
AES.CR.modify(.{ .EN = 1 });

The KEYVALID wait ensures the hardware has finished processing the key schedule before we start feeding it plaintext.
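The word ordering is easy to get backwards, so it can help to compute the register values on the host first. The sketch below runs the same readInt expression as the firmware loop, but stores the results in a plain array instead of the peripheral; keyWords is a hypothetical helper name, not part of MicroZig.

```zig
const std = @import("std");

/// Hypothetical helper: the four values the firmware loop writes to
/// KEYR1[0] through KEYR1[3], computed without touching any hardware.
fn keyWords(key: *const [16]u8) [4]u32 {
    var words: [4]u32 = undefined;
    for (&words, 0..) |*w, i| {
        // Register i takes key bytes (3 - i) * 4 .. (3 - i) * 4 + 4,
        // interpreted as a big-endian 32-bit integer.
        w.* = std.mem.readInt(u32, key[(3 - i) * 4 ..][0..4], .big);
    }
    return words;
}

pub fn main() void {
    const key = [16]u8{
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
    };
    for (keyWords(&key), 0..) |w, i| {
        std.debug.print("KEYR1[{d}] = 0x{X:0>8}\n", .{ i, w });
    }
}
```

For the test key this gives 0x0C0D0E0F for KEYR1[0] and 0x00010203 for KEYR1[3], so the last four bytes of the key land in the lowest register.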

Writing plaintext and reading ciphertext

Plaintext is pushed word by word into AES_DINR. Unlike the key, the plaintext words are written most significant word first (plaintext[0..4] goes in first), but like the key, each word is read as a big-endian integer:

// src/h5.zig
const plaintext = [16]u8{
    0x00, 0x11, 0x22, 0x33,
    0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xAA, 0xBB,
    0xCC, 0xDD, 0xEE, 0xFF,
};

i = 0;
while (i < 4) {
    AES.DINR = std.mem.readInt(u32, plaintext[i * 4 ..][0..4], .big);
    i += 1;
}

while (AES.ISR.read().CCF == 0) continue;

i = 0;
while (i < 4) {
    std.mem.writeInt(u32, out[i * 4 ..][0..4], AES.DOUTR, .big);
    i += 1;
}

AES.ICR.modify(.{ .CCF = 1 });

Writing all four words to DINR triggers the hardware to begin encryption. We then spin on CCF in the interrupt status register until the hardware signals completion. The four output words can then be read from DOUTR, and finally we clear the flag in ICR to reset the peripheral for the next operation.
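The packing and unpacking around DINR and DOUTR can also be checked on the host. This is a small sketch under the assumption that the peripheral is replaced by plain arrays; blockToWords and wordsToBlock are hypothetical names that mirror the two firmware loops.

```zig
const std = @import("std");

/// Hypothetical helper: the four words written to DINR, in write order.
fn blockToWords(block: *const [16]u8) [4]u32 {
    var words: [4]u32 = undefined;
    for (&words, 0..) |*w, i| {
        w.* = std.mem.readInt(u32, block[i * 4 ..][0..4], .big);
    }
    return words;
}

/// Hypothetical helper: turns four words read from DOUTR back into bytes.
fn wordsToBlock(words: [4]u32) [16]u8 {
    var block: [16]u8 = undefined;
    for (words, 0..) |w, i| {
        std.mem.writeInt(u32, block[i * 4 ..][0..4], w, .big);
    }
    return block;
}

pub fn main() void {
    const plaintext = [16]u8{
        0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
        0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF,
    };
    const words = blockToWords(&plaintext);
    for (words, 0..) |w, i| {
        std.debug.print("DINR write {d}: 0x{X:0>8}\n", .{ i, w });
    }
    // Packing and unpacking with the same endianness is lossless.
    const back = wordsToBlock(words);
    std.debug.print("round trip ok: {}\n", .{std.mem.eql(u8, &back, &plaintext)});
}
```

The first word written is 0x00112233, so the byte order on the wire matches the byte order in the array, which is what makes the result comparable with a software AES implementation.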

Choosing a known test vector

To verify that we’ve wired everything up correctly, we use a key-plaintext-ciphertext triplet from the NIST AES example vectors published in FIPS 197 (Appendix C.1):

Field       Value
Key         000102030405060708090a0b0c0d0e0f
Plaintext   00112233445566778899aabbccddeeff
Ciphertext  69c4e0d86a7b0430d8cdb78070b4c55a

If the hardware gives us back 69c4e0d8..., we know the peripheral is working.
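Before trusting the hardware, it’s worth confirming the vector itself in software. The sketch below uses the standard library’s Aes128 on the host; referenceEncrypt is a hypothetical helper name, and nothing here touches the peripheral.

```zig
const std = @import("std");
const Aes128 = std.crypto.core.aes.Aes128;

/// Hypothetical helper: single-block AES-128 encryption in software.
fn referenceEncrypt(key: [16]u8, plaintext: [16]u8) [16]u8 {
    var out: [16]u8 = undefined;
    Aes128.initEnc(key).encrypt(&out, &plaintext);
    return out;
}

pub fn main() void {
    const key = [16]u8{
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
    };
    const plaintext = [16]u8{
        0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
        0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF,
    };
    // Expected ciphertext from the table above.
    const expected = [16]u8{
        0x69, 0xC4, 0xE0, 0xD8, 0x6A, 0x7B, 0x04, 0x30,
        0xD8, 0xCD, 0xB7, 0x80, 0x70, 0xB4, 0xC5, 0x5A,
    };
    const got = referenceEncrypt(key, plaintext);
    std.debug.print("matches test vector: {}\n", .{std.mem.eql(u8, &got, &expected)});
}
```

If this check passes on the host but the board sends something else, the problem is in how the key or data words are fed to the peripheral, not in the vector.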

Sending the result over UART

Once encryption is done, we send the 16-byte ciphertext over UART. We’ve covered UART initialisation in some depth in Poor Man’s printf; the same principles apply here, just written in Zig instead of assembly.

The initialisation sequence configures PA9 as USART1_TX in alternate function mode, enables the USART and sets the baud rate. The transmitter itself is enabled later, in the send function:

// src/h5.zig
fn uartInit() void {
    RCC.AHB2ENR.modify(.{ .GPIOAEN = 1 });
    GPIOA.MODER.modify(.{ .@"MODER[9]" = GPIO_TYPE.MODER.Alternate });
    GPIOA.AFR[1].modify(.{ .@"AFR[1]" = 7 }); // AF7 = USART1
    RCC.APB2ENR.modify(.{ .USART1EN = 1 });
    USART1.CR1.modify(.{ .UE = 1 });
    USART1.BRR.modify(.{ .BRR = 0x116 }); // 115200 baud
}
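The BRR constant is just a clock divider. As a rough sketch, assuming oversampling by 16 (the reset default) and no kernel clock prescaler, BRR is the USART kernel clock divided by the baud rate. The helper name and the 32 MHz figure below are assumptions of mine: 32 MHz is simply the clock that 0x116 works out to, not something read from the clock tree.

```zig
const std = @import("std");

/// Hypothetical helper: BRR for oversampling by 16, rounded to nearest.
fn brrOversample16(kernel_clock_hz: u32, baud: u32) u32 {
    return (kernel_clock_hz + baud / 2) / baud;
}

pub fn main() void {
    // Assumed kernel clock: 32 MHz (inferred from the 0x116 constant).
    const brr = brrOversample16(32_000_000, 115_200);
    std.debug.print("BRR = 0x{X}\n", .{brr});
}
```

If the USART1 kernel clock is changed later, the same division gives the new BRR value; at 64 MHz, for example, it would come out at 0x22C.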

And the raw transmit function polls TXE (transmit data register empty) before writing each byte:

// src/h5.zig
fn uartSendRaw(msg: []const u8) !void {
    USART1.CR1.modify(.{ .TE = 1 });
    for (msg) |char| {
        while (USART1.ISR.read().TXE != 1) continue;
        USART1.TDR.modify(.{ .DR = char });
    }
}

The TE bit (Transmitter Enable) is set when we actually begin transmitting, following the sequence described in RM0481 section 50.5.6. One thing to note with MicroZig’s abstraction: where in our previous assembly code we manually calculated and wrote register addresses, here the register accesses are fully type-checked at compile time. Typos in register or field names become build errors, not silent misbehaviours.

The main loop ties it all together: initialise the peripherals, wait one second using SysTick, encrypt a block, send it, repeat.

// src/h5.zig
pub fn main() !void {
    // SysTick: 64000 ticks @ 64MHz HSI = 1ms per tick
    systick.LOAD.modify(.{ .RELOAD = 64_000 - 1 });
    systick.VAL.modify(.{ .CURRENT = 0 });
    systick.CTRL.modify(.{
        .CLKSOURCE = 1,
        .TICKINT = 1,
        .ENABLE = 1,
    });

    uartInit();

    var start: u32 = get_ticks();
    while (true) {
        if (get_ticks() - start < 1000) {
            microzig.cpu.wfi();
            continue;
        }
        var cipher: [16]u8 = undefined;
        try aesEncrypt(&cipher);
        uartSendRaw(&cipher) catch {
            GPIOI.ODR.modify(.{ .@"ODR[8]" = .Low });
        };
        start = get_ticks();
    }
}

The wfi (wait for interrupt) between SysTick ticks keeps the core idle rather than burning cycles in a busy loop. The SysTick interrupt handler increments a counter atomically:

// src/h5.zig
var accumulated_ticks: u32 = 0;

fn SysTick_handler() callconv(.c) void {
    _ = microzig.cpu.atomic.add(u32, &accumulated_ticks, 1);
}

pub const microzig_options = microzig.Options{
    .interrupts = .{
        .SysTick = .{ .c = SysTick_handler },
    },
};

The microzig_options declaration is how MicroZig links our handler into the interrupt vector table at compile time. No linker magic or attribute annotations required.

The host side: a side-quest in tty

On the host, we need something to receive our 16-byte ciphertext and verify it by decrypting it back to the original plaintext. A quick Python script would do the job, but for the purposes of this series we want to be able to eventually run our Reticulum logic over any interface, including anything built on Linux, so let’s keep it in Zig.

The device appears on Linux as /dev/ttyACM0, the same virtual COM port we’ve been using throughout this series. To receive raw binary data (rather than printable text), we need to configure the terminal in what’s called “raw mode”, which disables all the line-discipline processing the OS normally applies:

// src/uart_receiver.zig
const fd = try posix.open(port, .{ .ACCMODE = .RDWR, .NOCTTY = true }, 0);
var tty = try posix.tcgetattr(fd);

// Raw mode: zero out all input/output/local processing flags
tty.iflag = .{};
tty.oflag = .{};
tty.lflag = .{};

// 8N1: CS8, enable receiver, ignore modem control lines
tty.cflag = @bitCast(@intFromEnum(std.os.linux.speed_t.B115200));
tty.cflag.CSIZE = .CS8;
tty.cflag.CREAD = true;
tty.cflag.CLOCAL = true;

// Block until 16 bytes arrive; no timeout
tty.cc[@intFromEnum(posix.V.MIN)] = 16;
tty.cc[@intFromEnum(posix.V.TIME)] = 0;

try posix.tcsetattr(fd, .NOW, tty);

There’s considerably more ceremony here than on the bare-metal side. On the firmware, we set a few registers and we’re done. Here, even just configuring a serial port involves input flags, output flags, local flags, control flags, baud rate fields and special character arrays—all inherited from the POSIX terminal model, which in turn traces its history back to physical Teletype machines. We zero out most of this to get to the raw mode we actually want.

With the port configured, reading and decrypting is straightforward:

// src/uart_receiver.zig
var cipher: [16]u8 = undefined;

while (true) {
    var total: usize = 0;
    while (total < 16) {
        // Read into the unfilled tail so a partial read is not overwritten.
        const n = try posix.read(fd, cipher[total..]);
        if (n == 0) break;
        total += n;
    }
    if (total < 16) break;
    try bw.print("cipher:\t\t {x}\n", .{cipher});

    const key = [16]u8{
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05,
        0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B,
        0x0C, 0x0D, 0x0E, 0x0F,
    };
    var cleartext: [16]u8 = undefined;
    std.crypto.core.aes.Aes128.initDec(key).decrypt(&cleartext, &cipher);
    try bw.print("cleartext:\t {x}\n", .{cleartext});
}

Note that read() is not guaranteed to return all 16 bytes in one call (the OS may split reads at buffer boundaries), so we accumulate into a total until we have a complete block.

Running the receiver while the firmware is running:

$ zig build linux && ./zig-out/bin/uart_receiver
Listening on /dev/ttyACM0 at 115200 8N1...
cipher:         69 c4 e0 d8 6a 7b 04 30 d8 cd b7 80 70 b4 c5 5a
cleartext:      00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
cipher:         69 c4 e0 d8 6a 7b 04 30 d8 cd b7 80 70 b4 c5 5a
cleartext:      00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff

The ciphertext matches the FIPS 197 test vector exactly. The hardware is working.

Wrapping up

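We’ve managed to:

  - patch the register definitions so that AES_ICR.CCF and the key registers are reachable from MicroZig
  - drive the STM32H5’s AES peripheral in AES-128 ECB mode from Zig
  - send the ciphertext over UART and verify it on the Linux host against the FIPS 197 test vector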

Next up, we’ll look at activating CBC mode and getting the SHA-256 hardware acceleration working so that we can eventually build Fernet tokens (with Reticulum’s customizations) on the path to building fully-functional Reticulum destinations that we can encrypt messages for.

Hope to see you there,

Cheers.