Dean's Notes

So, in the last post, we got the AES hardware working on the STM32H5 and used it to encrypt individual blocks exactly 16 bytes long. With the same key hardcoded on both the host and the H5, we repeatedly sent the same encrypted block to the host over UART and decrypted it there, verifying that we got our expected message. The nice thing about this setup is that it also tests that Zig’s software AES implementation plays nicely with what the H5’s hardware hands us.

In understanding Reticulum, this is perhaps the core primitive needed before we can even think about sending our message anywhere. However, there’s still some work to do for the remaining cryptographic infrastructure. We now turn to the remaining cryptographic primitives Reticulum uses:

AES-256 in CBC mode with PKCS#7 padding and cryptographically secure, randomly generated IVs.
Modified ~~Fernet~~ Tokens (sans timestamp and version fields) using HMAC-SHA-256 for authentication.
X25519 key exchange and HKDF for key derivation.
Ed25519 for identity and verification.

AES-CBC-256 w/ PKCS#7

AES is a block cipher: it operates on exactly one 128-bit (16-byte) block at a time. To encrypt variable-length data, you need to run it in a mode of operation.

Block cipher modes are agnostic to the underlying cipher. The same ideas apply whether you’re using AES, Blowfish, or anything else. We’ll stick to AES throughout.

The naive approach gives you the first mode, ECB (Electronic Code Book): slice the plaintext into 16-byte blocks and encrypt each one independently. The output blocks are then concatenated into the ciphertext. ECB is easy to understand, but it still leaves a lot to be desired in terms of security. Because each block is encrypted independently with the same key, identical plaintext blocks produce identical ciphertext blocks. This means larger patterns in the data survive encryption. The classic demonstration is encrypting a bitmap image — the outlines of the original picture remain clearly visible in the ciphertext.
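The leak is easy to reproduce in software. Here is a minimal sketch using the AES block cipher from Zig’s standard library (std.crypto.core.aes). ecbEncrypt is a hypothetical helper written for this illustration, not part of the project code, and the key is a throwaway demo value.

```zig
const std = @import("std");
const Aes256 = std.crypto.core.aes.Aes256;

/// Encrypts `plaintext` block by block with no chaining (ECB).
/// `plaintext.len` must be a multiple of 16 and `out` at least as long.
pub fn ecbEncrypt(key: [32]u8, plaintext: []const u8, out: []u8) void {
    std.debug.assert(plaintext.len % 16 == 0);
    std.debug.assert(out.len >= plaintext.len);
    const ctx = Aes256.initEnc(key);
    var i: usize = 0;
    while (i < plaintext.len) : (i += 16) {
        ctx.encrypt(out[i..][0..16], plaintext[i..][0..16]);
    }
}

pub fn main() void {
    const key = [_]u8{0x42} ** 32; // demo key only
    const plaintext = [_]u8{'A'} ** 32; // two identical 16-byte blocks
    var ciphertext: [32]u8 = undefined;
    ecbEncrypt(key, &plaintext, &ciphertext);
    // Identical plaintext blocks come out as identical ciphertext blocks.
    const same = std.mem.eql(u8, ciphertext[0..16], ciphertext[16..32]);
    std.debug.print("blocks identical: {}\n", .{same});
}
```

Running it should report that the two ciphertext blocks are identical, which is the same pattern leak that makes the encrypted bitmap recognisable.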

CBC (Cipher Block Chaining) is a direct answer to this problem. Instead of encrypting each block in isolation, CBC XORs each plaintext block with the previous ciphertext block before encrypting it. This chains the blocks together: each ciphertext block depends on all previous plaintext, and any change anywhere in the message cascades through everything after it. For the very first block, there’s no previous ciphertext to XOR with, so we supply a random 16-byte value called the Initialization Vector (IV). The IV itself is not secret; it’s sent along with the ciphertext. Its only job is to ensure that two encryptions of the same plaintext under the same key look completely different.

The “256” in AES-256-CBC just means we’re using a 256-bit key rather than a 128-bit one. The block size — and therefore the IV — is still 128 bits.
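To make the chaining concrete, here is CBC written out in software on top of the standard-library block cipher. This is a sketch for illustration only: cbcEncrypt is a hypothetical name, the key and IV are fixed demo values, and the input is assumed to be padded to a multiple of 16 bytes already.

```zig
const std = @import("std");
const Aes256 = std.crypto.core.aes.Aes256;

/// CBC encryption in software: XOR each plaintext block with the previous
/// ciphertext block (the IV for the first block), then encrypt the result.
pub fn cbcEncrypt(key: [32]u8, iv: [16]u8, plaintext: []const u8, out: []u8) void {
    std.debug.assert(plaintext.len % 16 == 0);
    std.debug.assert(out.len >= plaintext.len);
    const ctx = Aes256.initEnc(key);
    var prev = iv;
    var i: usize = 0;
    while (i < plaintext.len) : (i += 16) {
        var block: [16]u8 = undefined;
        for (0..16) |j| block[j] = plaintext[i + j] ^ prev[j];
        ctx.encrypt(out[i..][0..16], &block);
        prev = out[i..][0..16].*; // this ciphertext block feeds the next one
    }
}

pub fn main() void {
    const key = [_]u8{0x42} ** 32; // demo key only
    const iv = [_]u8{0x24} ** 16; // demo IV; a real one must be random
    const plaintext = [_]u8{'A'} ** 32; // two identical 16-byte blocks
    var ciphertext: [32]u8 = undefined;
    cbcEncrypt(key, iv, &plaintext, &ciphertext);
    const same = std.mem.eql(u8, ciphertext[0..16], ciphertext[16..32]);
    std.debug.print("blocks identical: {}\n", .{same});
}
```

With the same two identical plaintext blocks that give ECB away, the ciphertext blocks should now differ, because the second block is XORed with the first ciphertext block rather than with the IV.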

PKCS#7 is a padding scheme that handles messages whose length isn’t a multiple of 16. It appends between 1 and 16 bytes to the plaintext before encryption, each byte set to the count of bytes added. If the plaintext happens to already be a multiple of 16, a full extra block of padding is added. This ensures the receiver can always unambiguously strip the padding after decryption.
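The padding rule fits in a few lines. The helpers below are a sketch with hypothetical names (pkcs7Pad, pkcs7Unpad), fixed to AES’s 16-byte block size.

```zig
const std = @import("std");

/// Copies `data` into `buf` and appends PKCS#7 padding for a 16-byte block.
/// Returns the padded slice. `buf` needs room for up to 16 extra bytes.
pub fn pkcs7Pad(data: []const u8, buf: []u8) []u8 {
    const pad: u8 = @intCast(16 - (data.len % 16)); // always 1..16
    const total = data.len + pad;
    std.debug.assert(buf.len >= total);
    @memcpy(buf[0..data.len], data);
    @memset(buf[data.len..total], pad);
    return buf[0..total];
}

/// Validates and strips PKCS#7 padding, returning the original data.
pub fn pkcs7Unpad(padded: []const u8) error{InvalidPadding}![]const u8 {
    if (padded.len == 0 or padded.len % 16 != 0) return error.InvalidPadding;
    const pad = padded[padded.len - 1];
    if (pad == 0 or pad > 16) return error.InvalidPadding;
    for (padded[padded.len - pad ..]) |b| {
        if (b != pad) return error.InvalidPadding;
    }
    return padded[0 .. padded.len - pad];
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    const padded = pkcs7Pad("hello", &buf);
    const original = try pkcs7Unpad(padded);
    std.debug.print("padded to {d} bytes, recovered {s}\n", .{ padded.len, original });
}
```

A 5-byte message gains 11 bytes of value 11, and a 16-byte message gains a whole extra block of sixteen 16s. Note that pkcs7Unpad returns early on bad padding, so it should only ever run on data whose authenticity has already been checked.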

Cryptographically secure pseudorandom number generators

Generating a random IV for each encryption requires a cryptographically-secure pseudorandom number generator (CSPRNG). A regular PRNG is not suitable here: if an attacker can predict your IV, the security guarantees of CBC break down. CSPRNGs typically rely on specialized hardware (or OS system calls backed by hardware) to gather true entropy that can’t be predicted from outside.

Fortunately, the STM32H5 has an on-chip RNG peripheral. Enabling it requires turning on the HSI48 clock (the RNG’s clock source) and setting the RNGEN bit. From there, you poll the status register until a fresh 32-bit random word is ready:

// src/h5.zig
fn rngInit() void {
    RCC.AHB2ENR.modify(.{ .RNGEN = 1 });
    RCC.CR.modify(.{ .HSI48ON = 1 });
    RNG.CR.modify(.{ .RNGEN = 1 });
}

fn rngReady() bool {
    const sr = RNG.SR.read();
    return sr.CEIS == 0 and sr.SEIS == 0 and sr.DRDY == 1;
}

fn rngRead() !u32 {
    var timeout: u32 = 1000;
    while (!rngReady()) {
        timeout -= 1;
        if (timeout == 0) return error.Timeout;
    }
    return RNG.DR;
}

pub fn randomBytes(out: []u8) !void {
    rngInit();
    std.debug.assert(out.len % 4 == 0);
    var i: usize = 0;
    while (i < out.len / 4) : (i += 1) {
        std.mem.writeInt(u32, out[i * 4 ..][0..4], try rngRead(), .big);
    }
}

CEIS and SEIS are the clock and seed error interrupt flags. If either is set, the RNG has detected a fault condition and the output is unreliable. We check for both before reading DR.

Moving from AES-ECB-128 to AES-CBC-256 on the hardware

With the RNG in hand, let’s update the AES peripheral for CBC mode with a 256-bit key. The key size moves from 128 to 256 bits (8 key registers instead of 4), but the output block size and IV are still 128 bits — that’s fixed by the AES specification.

The configuration changes are minimal: CHMOD moves from 0x0 (ECB) to 0x1 (CBC), and KEYSIZE changes from 0 (128-bit) to 1 (256-bit):

// src/h5.zig
fn aesInitCBC(iv: *const [16]u8, key: *const [32]u8) void {
    RCC.AHB2ENR.modify(.{ .AESEN = 1 });
    AES.CR.modify(.{ .EN = 0 });
    AES.CR.modify(.{
        .CHMOD = 0x1, // CBC mode
        .MODE = AES_MODE.Mode1, // encryption
        .KEYSIZE = 1, // 256-bit key
    });

    // Load key. STM32 convention: the highest-numbered key register holds the
    // most significant word (key bytes 0-3); KEYR0 holds the least significant
    // (bytes 28-31 of a 256-bit key).
    var i: u8 = 0;
    while (i < 8) : (i += 1) {
        const word = std.mem.readInt(u32, key[(7 - i) * 4 ..][0..4], .big);
        if (i < 4) AES.KEYR1[i] = word else AES.KEYR2[i - 4] = word;
    }
    while (AES.SR.read().KEYVALID == 0) continue;

    // Load IV with the same word-reversed convention.
    i = 0;
    while (i < 4) : (i += 1) {
        AES.IVR[i] = std.mem.readInt(u32, iv[(3 - i) * 4 ..][0..4], .big);
    }

    AES.CR.modify(.{ .EN = 1 });
}

The IV comes from our randomBytes call above. One nice property of the H5’s CBC implementation: after encrypting each block, the peripheral automatically updates its internal IV register with the ciphertext it just produced. This means we don’t need to manually carry the previous ciphertext block between iterations — the hardware does it for us.

// src/h5.zig
pub fn aesCbcEncrypt(encryption_key: *const [32]u8, iv: *const [16]u8, plaintext: []const u8, cipher_out: []u8) void {
    std.debug.assert(plaintext.len % 16 == 0);
    std.debug.assert(cipher_out.len >= plaintext.len);
    aesInitCBC(iv, encryption_key);
    var i: usize = 0;
    while (i < plaintext.len) : (i += 16) {
        aesEncryptBlock(cipher_out[i..][0..16], plaintext[i..][0..16]);
    }
}

Tokens and message authentication

CBC is considered by many to be obsolete, even dangerously broken. The key issue is that CBC provides confidentiality but not authentication: it encrypts your message, but it doesn’t prevent an attacker from tampering with it, and a receiver that decrypts tampered ciphertext can end up leaking information (padding-oracle attacks are the classic example). GCM (Galois/Counter Mode), the more popular choice today, provides authenticated encryption natively.

However, Reticulum’s Token format offers an alternative: rather than relying on an authenticated mode, it constructs authentication on top of CBC using HMAC. Before we can understand how the token works, we need to understand HMAC.

Hash-based Message Authentication Codes

HMAC, defined in RFC 2104, is a construction that turns any cryptographic hash function into a message authentication code. The formula is:

MAC = H(K XOR opad, H(K XOR ipad, text))

where H is the hash function, the comma denotes concatenation, K is the key (zero-padded to B bytes, and hashed first if it is longer than B), ipad is the byte 0x36 repeated B times, opad is the byte 0x5C repeated B times, and B is the block size of the hash function (64 bytes for SHA-256).

To recall what a hash function does: it takes variable-length input and produces a fixed-length “digest” as output. Good hash functions are effectively impossible to reverse, produce wildly different digests for slightly different inputs, and make it infeasible to find two inputs with the same digest.

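The double-hash construction in HMAC is not accidental. A single keyed hash H(K, text) is vulnerable to length extension attacks with hashes like SHA-256: given the digest, an attacker can compute a valid digest for the text plus an appended suffix without knowing K. Wrapping the inner digest in a second keyed hash, with the key masked differently at each level (ipad/opad), closes that hole and makes HMAC a robust way to authenticate a message.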

HMAC is agnostic to the underlying hash function. Here we use SHA-256, as per the Reticulum specification.
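Before handing the job to hardware, it can help to see the formula written out. The sketch below builds HMAC-SHA-256 by hand from the standard library’s SHA-256 and compares it with the library’s own HMAC. hmacSha256Manual is a hypothetical name used only here.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// HMAC-SHA-256 written out from the RFC 2104 formula.
pub fn hmacSha256Manual(key: []const u8, text: []const u8) [32]u8 {
    const B = Sha256.block_length; // 64 bytes for SHA-256

    // Keys longer than B are hashed first; shorter keys are zero-padded to B.
    var k = [_]u8{0} ** B;
    if (key.len > B) {
        Sha256.hash(key, k[0..32], .{});
    } else {
        @memcpy(k[0..key.len], key);
    }

    var ipad: [B]u8 = undefined;
    var opad: [B]u8 = undefined;
    for (k, 0..) |byte, i| {
        ipad[i] = byte ^ 0x36;
        opad[i] = byte ^ 0x5c;
    }

    // inner = H(K XOR ipad, text)
    var inner: [32]u8 = undefined;
    var h = Sha256.init(.{});
    h.update(&ipad);
    h.update(text);
    h.final(&inner);

    // MAC = H(K XOR opad, inner)
    var mac: [32]u8 = undefined;
    h = Sha256.init(.{});
    h.update(&opad);
    h.update(&inner);
    h.final(&mac);
    return mac;
}

pub fn main() void {
    const key = "demo key";
    const text = "demo message";
    const manual = hmacSha256Manual(key, text);
    var reference: [32]u8 = undefined;
    std.crypto.auth.hmac.sha2.HmacSha256.create(&reference, text, key);
    std.debug.print("matches std: {}\n", .{std.mem.eql(u8, &manual, &reference)});
}
```

A software version like this is also a handy reference when checking the output of a hardware HMAC implementation on the host.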

Fortunately, the H5’s HASH peripheral supports HMAC-SHA-256 directly (RM0481 §35.4.7). The peripheral operates in three phases — inner key, message, outer key — each triggered by writing data and setting the DCAL (digest calculation) bit. The hardware handles the ipad/opad padding automatically. DINIS (data input status) going high signals readiness for the next phase; DCIS (digest computation complete) signals that the final output is ready to read.

// src/h5.zig

// Configure the HASH peripheral and assert INIT to reset the core.
// INIT also latches ALGO, MODE, DATATYPE, and LKEY, so all must be set first.
// DATATYPE=0b10 selects byte-swapping: data is written little-endian and the
// hardware reverses byte order within each word before the SHA core sees it,
// matching the big-endian bit-string convention of FIPS 180-4.
fn hashInit(mode: u1, lkey: u1) void {
    RCC.AHB2ENR.modify(.{ .HASHEN = 1 });
    HASH.CR.modify(.{
        .ALGO = HASH_ALGO_SHA256,
        .DATATYPE = 0b10,
        .MODE = mode,
        .LKEY = lkey,
        .INIT = 1,
    });
}

// Write a byte slice into HASH_DIN word by word.
// Complete 32-bit words are written little-endian; the hardware byte-swap corrects
// them into big-endian order for the SHA core.
// A partial final word is packed into the LSBs of the last DIN write; the
// valid bit count is recorded in NBLW before DCAL triggers digest computation.
fn hashFeedData(data: []const u8) void {
    var i: usize = 0;
    while (i + 4 <= data.len) : (i += 4) {
        HASH.DIN = std.mem.readInt(u32, data[i..][0..4], .little);
    }
    if (i < data.len) {
        var last: u32 = 0;
        var b: usize = 0;
        while (b < data.len - i) : (b += 1) {
            last |= @as(u32, data[i + b]) << @as(u5, @intCast(b * 8));
        }
        HASH.DIN = last;
    }
    const nblw: u5 = @intCast((data.len * 8) % 32);
    // NOTE: the two writes must be separate — setting DCAL while writing NBLW
    // causes the hardware to ignore NBLW (RM0481 §35.4.7).
    HASH.STR.modify(.{ .NBLW = nblw });
    HASH.STR.modify(.{ .DCAL = 1 });
}

// Poll until the peripheral is ready to accept the next phase's data (DINIS = 1).
fn hashWaitDinis() void {
    while (HASH.SR.read().DINIS == 0) continue;
}

// Poll until the final digest is ready (DCIS = 1).
fn hashWaitDcis() void {
    while (HASH.SR.read().DCIS == 0) continue;
}

// Read 8 output words from HR (SHA-256 = 256 bits) into a 32-byte buffer.
fn hashReadDigest256(out: *[32]u8) void {
    var i: u8 = 0;
    while (i < 8) : (i += 1) {
        std.mem.writeInt(u32, out[i * 4 ..][0..4], HASH.HR[i], .big);
    }
}

// Compute HMAC-SHA-256 over `data` authenticated with `key`.
// The hardware executes all three HMAC phases (inner key, message, outer key)
// automatically once each phase is triggered by DCAL.
// Keys longer than 64 bytes are pre-hashed by the hardware when LKEY=1.
pub fn hmacSha256(key: []const u8, data: []const u8, out: *[32]u8) void {
    const lkey: u1 = if (key.len > 64) 1 else 0;
    hashInit(1, lkey); // HMAC mode

    // Phase 1: inner key
    hashFeedData(key);
    hashWaitDinis(); // hardware ready for message

    // Phase 2: message
    hashFeedData(data);
    hashWaitDinis(); // hardware ready for outer key

    // Phase 3: outer key
    hashFeedData(key);
    hashWaitDcis(); // final HMAC digest is ready

    hashReadDigest256(out);
}

Putting the token together

With HMAC working, the token format follows naturally. The idea is Encrypt-then-MAC: encrypt the message first, then compute an HMAC over the ciphertext. The receiver verifies the HMAC before attempting decryption, which means tampered messages are rejected before the cipher or the padding check ever sees them.

The wire format is:

IV (16) || ciphertext (N, PKCS#7 padded) || HMAC-SHA256 (32)

The IV can be treated as something special, or simply as the first block of the ciphertext for purposes of the HMAC. Reticulum’s Token covers the IV in the MAC (i.e. the HMAC is computed over IV || ciphertext), which prevents IV tampering.
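The receiving side of this layout can be sketched as follows. This is not the project’s decrypt routine: verifyToken and its error names are hypothetical, and the software HMAC from Zig’s standard library stands in for the Primitives type. It only shows the order of operations, with the MAC check first and decryption left to run on whatever it returns.

```zig
const std = @import("std");
const HmacSha256 = std.crypto.auth.hmac.sha2.HmacSha256;

const iv_len = 16;
const mac_len = 32;

/// Checks the trailing HMAC of a token laid out as IV || ciphertext || MAC.
/// Returns the IV || ciphertext part on success, so decryption only ever
/// runs on authenticated bytes.
pub fn verifyToken(signing_key: *const [32]u8, token: []const u8) error{ Malformed, BadMac }![]const u8 {
    // Need at least an IV, one ciphertext block and a MAC.
    if (token.len < iv_len + 16 + mac_len) return error.Malformed;
    const signed = token[0 .. token.len - mac_len];
    if ((signed.len - iv_len) % 16 != 0) return error.Malformed;
    const received = token[token.len - mac_len ..];

    var expected: [mac_len]u8 = undefined;
    HmacSha256.create(&expected, signed, signing_key);

    // Compare without an early exit so timing does not reveal
    // how many leading bytes matched.
    var diff: u8 = 0;
    for (expected, received) |e, r| diff |= e ^ r;
    if (diff != 0) return error.BadMac;
    return signed;
}

pub fn main() !void {
    const key = [_]u8{0x11} ** 32; // demo signing key
    var token: [iv_len + 16 + mac_len]u8 = undefined;
    @memset(token[0..32], 0xAB); // stand-in for IV || ciphertext
    HmacSha256.create(token[32..64], token[0..32], &key);
    const signed = try verifyToken(&key, &token);
    std.debug.print("authenticated {d} bytes\n", .{signed.len});
}
```

Flipping any bit of the IV, the ciphertext or the MAC should make verifyToken return error.BadMac, so the padding is never inspected for a forged token.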

The Token implementation (Token.zig) is generic over a Primitives type, so the same code runs on the H5 (using the hardware AES and HASH peripherals) and on the Linux host (using Zig’s software implementations):

// src/Token.zig (encrypt)
pub fn encrypt(key: *const [64]u8, plaintext: []const u8, out: []u8) !void {
    const signing_key: *const [32]u8 = key[0..32];
    const encryption_key: *const [32]u8 = key[32..64];

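    const padding: u8 = @intCast(16 - (plaintext.len % 16));
    const padded_len = plaintext.len + padding;

    // The stack buffer and the caller's output buffer must both be large enough.
    std.debug.assert(plaintext.len <= max_plaintext);
    std.debug.assert(out.len >= iv_len + padded_len + mac_len);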

    // Generate a random IV and write it as the first 16 bytes of the token.
    var iv: [iv_len]u8 = undefined;
    try Primitives.randomBytes(&iv);
    @memcpy(out[0..iv_len], &iv);

    // PKCS#7-pad the plaintext into a stack buffer.
    var padded: [max_plaintext + 16]u8 = undefined;
    @memcpy(padded[0..plaintext.len], plaintext);
    @memset(padded[plaintext.len..padded_len], padding);

    // Encrypt; ciphertext is written directly after the IV in out.
    Primitives.aesCbcEncrypt(encryption_key, &iv, padded[0..padded_len], out[iv_len..][0..padded_len]);

    // HMAC-SHA256 over IV || ciphertext (Encrypt-then-MAC).
    var mac: [mac_len]u8 = undefined;
    Primitives.hmacSha256(signing_key, out[0 .. iv_len + padded_len], &mac);
    @memcpy(out[iv_len + padded_len ..][0..mac_len], &mac);
}

The 64-byte key is split in half: the first 32 bytes are the signing key for HMAC, and the second 32 bytes are the encryption key for AES-256-CBC. Where does this key come from? That’s next.

So why not GCM?

I hinted above that many consider CBC obsolete and that GCM is more popular today. So why might Reticulum use CBC?

I can only speculate. One likely reason: GCM is expensive in software, and many of Reticulum’s target devices don’t have it hardware-accelerated. CBC is simpler and more widely supported on constrained hardware.

One other interesting side-effect — I’m not sure if intended — is that CBC’s 16-byte block size makes it harder for a passive observer to extract the exact length of the plaintext from the ciphertext. The padding always rounds up to the next 16-byte boundary. GCM and other stream-like modes can expose length more directly, though padding can of course be built on top.

X25519 key exchange

Moving up the stack, we need a way for two parties to agree on a shared key without ever sending that key over the wire. This is the Diffie-Hellman key exchange, here using elliptic curves as defined by X25519.

The idea, stripped to its essentials: each party generates a keypair. The public keys are exchanged openly. Each party then performs a scalar multiplication of their own private key with the other party’s public key. The mathematical structure of the elliptic curve guarantees that both sides arrive at the same shared secret, while an eavesdropper who only sees the public keys cannot recover it.

Diffie-Hellman predates elliptic-curve cryptography, and it works on group-theoretic principles: the underlying math doesn’t care what the group is, as long as the group operation is easy to compute but hard to reverse. X25519 just happens to use a particularly efficient elliptic curve group.
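To see that the exchange really is group-agnostic, here is the classic version over integers modulo a prime. The numbers are toy values chosen so the arithmetic can be followed by hand, and powMod is a hypothetical helper. Nothing this small is remotely secure.

```zig
const std = @import("std");

/// Square-and-multiply modular exponentiation: base^exp mod m.
pub fn powMod(base: u64, exp: u64, m: u64) u64 {
    std.debug.assert(m > 1 and m < (1 << 32)); // keeps the products inside u64
    var result: u64 = 1;
    var b = base % m;
    var e = exp;
    while (e > 0) : (e >>= 1) {
        if ((e & 1) == 1) result = (result * b) % m;
        b = (b * b) % m;
    }
    return result;
}

pub fn main() void {
    const p: u64 = 23; // public prime modulus (toy size)
    const g: u64 = 5; // public generator
    const a: u64 = 6; // first party's private key
    const b: u64 = 15; // second party's private key

    const pub_a = powMod(g, a, p); // sent over the wire
    const pub_b = powMod(g, b, p); // sent over the wire

    // Each side combines its own private key with the other's public key.
    const shared_a = powMod(pub_b, a, p);
    const shared_b = powMod(pub_a, b, p);
    std.debug.print("A={d} B={d} shared={d}/{d}\n", .{ pub_a, pub_b, shared_a, shared_b });
}
```

Both sides land on the same value because (g^a)^b and (g^b)^a are the same group element. X25519 swaps modular exponentiation for scalar multiplication on the curve, but the shape of the protocol is unchanged.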

For X25519, there is no hardware support on the H5 — and that’s fine. In Reticulum, key exchange only needs to happen once per Link session, so the software overhead isn’t a concern. Zig’s standard library includes a complete X25519 implementation that works identically on bare metal and on Linux:

// src/h5.zig
// Generate an ephemeral X25519 keypair from the hardware RNG.
var enc_seed: [32]u8 = undefined;
try Primitives.randomBytes(&enc_seed);
const enc_kp = try X25519.KeyPair.generateDeterministic(enc_seed);

The “seed” here is the 32 bytes of entropy from the RNG. It’s fed into generateDeterministic to produce the actual keypair, where the private key is computed from (and in this case, is essentially) the seed.

X25519 private keys are supposed to have certain bits clamped to prevent so-called “small-subgroup” attacks. That clamping doesn’t happen when the keypair is generated; as far as I can tell, Zig’s standard library applies it inside the scalar multiplication itself (the X25519 functions go through Curve25519’s clampedMul), which is why the stored secret key can simply be the raw seed.
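For reference, the clamping itself is three bit operations, as specified in RFC 7748. The clamp function below is written for this illustration. The check in main rests on an assumption worth verifying against your Zig version: that the standard library clamps inside its multiplication routine, in which case clamping the seed beforehand should change nothing.

```zig
const std = @import("std");
const X25519 = std.crypto.dh.X25519;

/// RFC 7748 clamping of a 32-byte X25519 scalar.
pub fn clamp(scalar: [32]u8) [32]u8 {
    var s = scalar;
    s[0] &= 248; // clear the low 3 bits: the scalar becomes a multiple of 8
    s[31] &= 127; // clear bit 255
    s[31] |= 64; // set bit 254
    return s;
}

pub fn main() !void {
    const seed = [_]u8{0xFF} ** 32; // deliberately unclamped demo scalar
    const pk_raw = try X25519.recoverPublicKey(seed);
    const pk_clamped = try X25519.recoverPublicKey(clamp(seed));
    // Assumption: std clamps internally, so both keys should be equal.
    std.debug.print("same public key: {}\n", .{std.mem.eql(u8, &pk_raw, &pk_clamped)});
}
```

Making the scalar a multiple of 8 (the curve’s cofactor) is what neutralises points of small order, and the fixed top bit keeps the ladder’s running time independent of the key.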

HMAC-based key derivation function

We now have a shared secret from the X25519 exchange. Can we use it directly as our AES key? Not safely. From RFC 5869:

In many applications, the input keying material is not necessarily distributed uniformly, and the attacker may have some partial knowledge about it (for example, a Diffie-Hellman value computed by a key exchange protocol) or even partial control of it (as in some entropy-gathering applications). Thus, the goal of the “extract” stage is to “concentrate” the possibly dispersed entropy of the input keying material into a short, but cryptographically strong, pseudorandom key.

This is where the HKDF (HMAC-based Key Derivation Function) comes in. HKDF has two stages. First, “extract”: it runs HMAC over the shared secret (the input key material) using a salt as the HMAC key, producing a fixed-size pseudorandom key (PRK). Then, “expand”: it runs HMAC repeatedly, keyed with the PRK, over the previous output block concatenated with an optional context string and a one-byte counter, to produce as many bytes of output key material as needed.

The salt and context string are important. They bind the derived key to your application and callsite, so that the same Diffie-Hellman secret can be used in different parts of the system without ever deriving the same key twice:

// src/HKDF.zig
pub fn derive(
    comptime out_len: usize,
    ikm: []const u8,
    salt: []const u8,
    context: []const u8,
    out: *[out_len]u8,
) void {
    // --- Extract ---
    // PRK = HMAC-SHA256(salt, IKM)
    const zero_salt = [_]u8{0} ** 32;
    const effective_salt: []const u8 = if (salt.len > 0) salt else &zero_salt;
    var prk: [32]u8 = undefined;
    Primitives.hmacSha256(effective_salt, ikm, &prk);

    // --- Expand ---
    // T(i) = HMAC-SHA256(PRK, T(i-1) || context || i)
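    // The scratch buffer holds T(i-1) || context || counter, and the counter is one byte.
    std.debug.assert(context.len <= 128);
    comptime std.debug.assert(out_len <= 255 * 32);
    var input_buf: [32 + 128 + 1]u8 = undefined;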
    var t: [32]u8 = undefined;
    var t_len: usize = 0;
    const n_blocks = (out_len + 31) / 32;
    var written: usize = 0;

    for (0..n_blocks) |i| {
        var pos: usize = 0;
        @memcpy(input_buf[pos..][0..t_len], t[0..t_len]);
        pos += t_len;
        @memcpy(input_buf[pos..][0..context.len], context);
        pos += context.len;
        input_buf[pos] = @intCast(i + 1);
        pos += 1;
        Primitives.hmacSha256(&prk, input_buf[0..pos], &t);
        t_len = 32;
        const to_copy = @min(32, out_len - written);
        @memcpy(out[written..][0..to_copy], t[0..to_copy]);
        written += to_copy;
    }
}

This is ported directly from the Reticulum reference implementation. We call it with out_len = 64 to get our 64-byte Token key — 32 bytes for the HMAC signing key, 32 bytes for the AES encryption key.
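On the host, where hardware HMAC isn’t a concern, Zig’s standard library already ships HKDF, which makes a convenient cross-check for the hand-rolled version. The sketch below is an assumption-laden illustration: deriveTokenKey is a hypothetical name, and the salt and context strings are placeholders rather than the values Reticulum uses.

```zig
const std = @import("std");
const HkdfSha256 = std.crypto.kdf.hkdf.HkdfSha256;

/// Derives a 64-byte Token key from an X25519 shared secret.
pub fn deriveTokenKey(shared: [32]u8, salt: []const u8, context: []const u8) [64]u8 {
    const prk = HkdfSha256.extract(salt, &shared); // extract stage
    var okm: [64]u8 = undefined;
    HkdfSha256.expand(&okm, context, prk); // expand stage, two 32-byte blocks
    return okm;
}

pub fn main() void {
    const shared = [_]u8{0x33} ** 32; // stand-in for the X25519 output
    const key = deriveTokenKey(shared, "demo-salt", "demo-context");
    const signing_key = key[0..32]; // HMAC half
    const encryption_key = key[32..64]; // AES-256 half
    std.debug.print("signing {x:0>2}.. encryption {x:0>2}..\n", .{ signing_key[0], encryption_key[0] });
}
```

Feeding both implementations the same inputs and comparing all 64 output bytes is a cheap way to catch mistakes in the counter or the T(i-1) chaining.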

Ed25519: identity and signing

We have encryption and authentication, but there’s still nothing stopping a man-in-the-middle attack. An attacker sitting between the H5 and the host could substitute their own public key during the X25519 exchange, and both sides would happily derive a shared secret with the attacker rather than with each other.

To prevent this, we need a way for each party to prove their identity: something fixed, something that can be verified cryptographically. This is done with digital signatures using Ed25519.

Ed25519 provides two things: a persistent signing keypair (distinct from the ephemeral encryption keypair from X25519), and a signing operation that produces a 64-byte signature cryptographically bound to both the message and the signer’s private key. Anyone with the signer’s public key can verify the signature; no one without the private key can forge one.

In Reticulum, each node generates its signing keys at startup and saves the private key to disk (typically in ~/.reticulum/identities/). For our H5, we hardcode a seed (base64-encoded) in the firmware and generate the keypair deterministically from it. This means the H5 presents the same identity across reboots, and the Linux host can verify against the known public key:

// src/h5.zig
const sig_seed_base64 = "u7VjTycOnosga1r8r-ENxjoy2rq5VsdBW1gb56Lq7D4=";

// Ed25519 uses a software SHA-512 internally. Infrequent enough not to matter.
var buf: [64]u8 = undefined;
try base64_decoder.decode(&buf, sig_seed_base64);
const sig_kp = try Ed25519.KeyPair.generateDeterministic(buf[0..32].*);

Signing is then a single call. We pass in the content to sign and 32 bytes of randomness (the “noise” parameter makes Ed25519 signatures non-deterministic, which is a hardening measure against certain fault attacks):

// src/h5.zig
var noise: [32]u8 = undefined;
try Primitives.randomBytes(&noise);
const sig = try sig_kp.sign(&pub_key, noise);
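The effect of the noise parameter is easy to observe on the host. In this sketch the seed and noise values are fixed demo bytes standing in for the hardware RNG, and isAuthentic is a hypothetical helper.

```zig
const std = @import("std");
const Ed25519 = std.crypto.sign.Ed25519;

/// Returns true only if `sig_bytes` is a valid signature of `msg` under `pk_bytes`.
pub fn isAuthentic(msg: []const u8, sig_bytes: [64]u8, pk_bytes: [32]u8) bool {
    const pk = Ed25519.PublicKey.fromBytes(pk_bytes) catch return false;
    const sig = Ed25519.Signature.fromBytes(sig_bytes);
    sig.verify(msg, pk) catch return false;
    return true;
}

pub fn main() !void {
    const kp = try Ed25519.KeyPair.generateDeterministic([_]u8{0x01} ** 32); // demo seed
    const pk = kp.public_key.toBytes();
    const msg = "hello from the H5";
    const noise_a = [_]u8{0xAA} ** 32; // stand-ins for hardware RNG output
    const noise_b = [_]u8{0xBB} ** 32;
    const sig_a = (try kp.sign(msg, noise_a)).toBytes();
    const sig_b = (try kp.sign(msg, noise_b)).toBytes();
    std.debug.print("signatures differ: {}\n", .{!std.mem.eql(u8, &sig_a, &sig_b)});
    std.debug.print("both verify: {}\n", .{isAuthentic(msg, sig_a, pk) and isAuthentic(msg, sig_b, pk)});
    std.debug.print("tampered message verifies: {}\n", .{isAuthentic("hello from the H6", sig_a, pk)});
}
```

The same message signed with different noise should give two different signatures that both verify, while passing null for the noise falls back to the classic deterministic Ed25519 signature.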

Putting it all together

Reticulum’s packet types are out of scope for this article, but we still need something like an “announce” mechanism to initiate key exchange when both devices come online. The announce carries the public parts of both the encryption and signing keys, plus a signature over both. This lets the receiver verify that the announce actually came from the claimed identity before trusting any derived keys. For that check to mean anything, the receiver has to compare the Ed25519 public key against one it already trusts; on its own, the signature only shows that the announce is self-consistent.

The announce format is:

X25519 pub (32) || Ed25519 pub (32) || Signature (64)

The X25519 key is ephemeral — fresh for each session. The Ed25519 key is persistent — the same across reboots on the H5.
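As a host-side sketch of this layout, the functions below build a 128-byte announce and check one against an identity key the receiver already knows. The names (Announce, buildAnnounce, checkAnnounce, error.UnknownIdentity) are hypothetical, and the X25519 public key in main is a placeholder byte pattern rather than a real key.

```zig
const std = @import("std");
const Ed25519 = std.crypto.sign.Ed25519;

pub const Announce = [128]u8;

/// Builds X25519 pub || Ed25519 pub || signature over the first 64 bytes.
pub fn buildAnnounce(x25519_pub: [32]u8, identity: Ed25519.KeyPair) !Announce {
    var out: Announce = undefined;
    @memcpy(out[0..32], &x25519_pub);
    @memcpy(out[32..64], &identity.public_key.bytes);
    const sig = try identity.sign(out[0..64], null);
    const sig_bytes = sig.toBytes();
    @memcpy(out[64..128], &sig_bytes);
    return out;
}

/// Accepts an announce only if it is signed by the identity we expect.
/// Returns the peer's X25519 public key on success.
pub fn checkAnnounce(announce: *const Announce, expected_identity: [32]u8) ![32]u8 {
    if (!std.mem.eql(u8, announce[32..64], &expected_identity)) return error.UnknownIdentity;
    const pk = try Ed25519.PublicKey.fromBytes(expected_identity);
    const sig = Ed25519.Signature.fromBytes(announce[64..128].*);
    try sig.verify(announce[0..64], pk);
    return announce[0..32].*;
}

pub fn main() !void {
    const identity = try Ed25519.KeyPair.generateDeterministic([_]u8{0x05} ** 32); // demo seed
    const x_pub = [_]u8{0x09} ** 32; // placeholder for an ephemeral X25519 public key
    const announce = try buildAnnounce(x_pub, identity);
    const peer_x = try checkAnnounce(&announce, identity.public_key.toBytes());
    std.debug.print("announce accepted: {}\n", .{std.mem.eql(u8, &peer_x, &x_pub)});
}
```

Because the signature covers the ephemeral X25519 key, an attacker who swaps that key in transit invalidates the signature, and one who re-signs with their own identity fails the comparison against the expected key.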

The full exchange between H5 and host goes like this:

The host starts up and waits for an announce message.
On boot, the H5 sends its announce (pressing the button repeats it if the host misses it).
The host answers with its own announce, and each side verifies the other’s signature.
Both sides perform X25519.scalarmult to derive the shared secret, then run HKDF to get the 64-byte Token key.
The H5 starts sending encrypted, authenticated Token messages.

The relevant code on the H5 side:

// src/h5.zig
var pub_key: [64]u8 = undefined;
@memcpy(pub_key[0..32], &enc_kp.public_key);
@memcpy(pub_key[32..], &sig_kp.public_key.bytes);
var noise: [32]u8 = undefined;
try Primitives.randomBytes(&noise);
const sig = try sig_kp.sign(&pub_key, noise);

try sendAnnounce(pub_key, sig.toBytes());
const peer_announce = receiveAnnounce();
const peer_sig = Ed25519.Signature.fromBytes(peer_announce[64..128].*);
const peer_key = try Ed25519.PublicKey.fromBytes(peer_announce[32..64].*);
peer_sig.verify(peer_announce[0..64], peer_key) catch {
    try uartSendRaw("Computer says no.");
    while (true) {}
};

const shared = try X25519.scalarmult(enc_kp.secret_key, peer_announce[0..32].*);
const fernet_key = deriveKey(shared);

And then the main loop simply encrypts and sends a Token every second:

// src/h5.zig
while (true) {
    if (get_ticks() - start < 1000) {
        microzig.cpu.wfi();
        continue;
    }
    start = get_ticks();
    try sendTokenEncrypted(&fernet_key);
}

And that’s it for today. The full code, including the Linux-host code that listens for and decrypts the messages over UART, can be found here.

What we’ve built here is akin to Reticulum’s Link establishment process, where ephemeral keys are produced for each Link session. (Single Destination types still use the same process, but repeat the entire key exchange per message.) Hopefully it’s now clear why these primitives are necessary. It’s interesting to think about what properties each primitive gives our system, and then to see that the exact implementation only matters insofar as it upholds that property. Message authentication, key exchange and identity are all properties that can be satisfied by a range of algorithms.

Reticulum makes opinionated choices about these primitives, but understanding them should help us build our own cryptographic infrastructure should we need networks with similar properties. Imagine a multi-node embedded or robotics scenario where we want to reuse the identity and encryption mechanisms, but swap in AES in GCM mode to improve performance when we know all of our hardware supports it.

Stay tuned for the next article in the series, where we’ll try and introduce similar principles on a new chip type and work towards a small cluster of nodes talking to one another.

Cheers.