WiFi Packets with Rust|ESP32

In our last post, we completed our quest to provide cryptographic primitives using hardware implementations on the STM32. There, we used UART and communicated with a simple serial program on a Linux host. On our path to understanding Reticulum from scratch, however, we want to be able to communicate on more interesting media, in particular using radio waves. Given the strict regulations on the use of radio waves, that leaves us essentially with WiFi and Bluetooth in the higher-frequency bands and LoRa (and perhaps one day WiFi HaLow) in the sub-GHz range.

In this article, we’ll explore the wireless landscape for embedded devices and work towards an implementation that can send raw packets over the air. As we’ll see, asynchronous programming will play an important role here, as we’ll want to start taking advantage of DMA and upgrade our simple blocking sending application to one that can simultaneously send and receive. This does unfortunately mean we’ll have to leave Zig behind (for now) while the team still works towards landing the new async features. Sending packets over radio waves is also much more complex than single bytes over a UART connection, and so switching to Rust unlocks a mature ecosystem of external libraries, debugging tools and open-source projects that we’ll be able to take advantage of.

The hardware I’ll use for this article is a pair of LILYGO TTGO LoRa32 V2.1_1.6 (868MHz) boards that I purchased for less than 20€ each on AliExpress. This also means we’ll be taking a little break from the usual STM32H5 deep dives to look at how the underlying ESP32 MCU works. We won’t be using the LoRa capabilities yet, so any old ESP32 device would also suffice.

Why do all this ourselves?

Before we move on much further, I want to address my motivation for this series, which I’ve only touched on briefly with the phrase: “to understand Reticulum better”. This is definitely the main motivation, but I think it also goes a bit deeper. Indeed, we could probably just install the reference Reticulum implementation as an RNode (which is where I came across the LilyGo devices to begin with) and poke and prod at it until we get something working. I do hope to do this in later articles as a comparison.

However, Reticulum is — as we’ve seen — agnostic to the underlying medium. So simply following the path well taken would give us something that works, but not give us much insight into its greatest strength. I can imagine Reticulum (or some variant of it) being excellent for multi-processor, embedded scenarios, where its cryptographic primitives provide a secure and efficient means for different nodes to coordinate amongst themselves.

Similarly, imagine a mesh of resilient RNodes that provides crisis-resilient communication. One might want to set and forget a few of them running on solar panels and be sure that when they’re needed, they work efficiently. So once again, power consumption becomes a main concern.

As we’ll see, an understanding of the software concepts we want to build, together with knowledge of what a piece of hardware can do, provides the most fertile ground for genuinely novel applications. And with AI marching on the way it is, it’s never been easier to program the hardware itself and remove our reliance on others’ code.

Let’s just send a packet, right?

As I alluded to at the beginning, sending packets over radio waves is hard. For this article, we’ll stick with WiFi for now — it’s the one wireless technology we can be almost certain any device supports, and the ESP32 has it built in.

WiFi as a standard sits at the bottom of the network stack, spanning the physical (PHY) and data link (MAC) layers of the OSI model. The PHY layer handles the actual modulation and demodulation of radio signals; the MAC layer sits above it and manages who gets to speak on the shared medium. Some embedded WiFi stacks even bundle higher-layer IP and TCP implementations on top of this.

WiFi has accumulated eight generations since its introduction — from the original 802.11 ratified in 1997 through to the emerging 802.11bn (WiFi 8). Each generation layers on more complexity: new modulation and demodulation schemes (DSSS, OFDM, OFDMA), new ways to share the medium (CSMA/CA for collision avoidance, or OFDMA for true multi-user access), new network topologies (the familiar AP/client infrastructure mode, but also ad-hoc and mesh modes), Quality of Service mechanisms for prioritizing certain traffic, and successive generations of security protocols (WEP, WPA, WPA2, WPA3).

For our purposes, the ideal would be to bypass most of this machinery, and send raw, custom-formatted packets directly over the air. We really “just” need modulation and multiplexing.

The catch is that most of this is implemented in dedicated hardware or in a separate chip entirely. The STM32H5 development kit, for example, connects to an external WiFi module via SPI — the WiFi chip has its own firmware stack, and we interact with it through AT commands or a vendor API. Even on the ESP32, where the WiFi hardware is on the same die as the main processor, the radio firmware lives in ROM and is not open source (the hardware itself is also barely mentioned in the ESP32 reference manual).

The other issue is that vendor WiFi stacks almost universally assume a few narrow use cases: we provide an SSID and credentials and the stack tries to connect to an Access Point. It should be possible to send raw packets, but we’ll be fighting the stacks provided to us.

We might try this out later with the STM32H5, but fortunately for us, a clever duo from Germany and Belgium have put in the yards to reverse engineer the ESP32’s WiFi hardware. The ESP32 Open MAC project has produced a Rust crate (esp-wifi-hal) that gives us direct access to the 802.11 MAC layer, (almost) completely bypassing the Espressif firmware stack. We can transmit and receive raw 802.11 frames, set our own MAC addresses, configure the RX filters, and choose the channel — all from safe Rust.

In their talk they also mention a “long-range” WiFi mode that can be enabled somewhere. I haven’t figured this out yet, but it’s something to come back to in a future article.

Project setup

The full code for this article lives here.

Getting started with Rust on the ESP32 requires a custom toolchain. The Xtensa architecture used by the original ESP32 is not supported by upstream LLVM, so Espressif maintains their own compiler fork. Installing it is straightforward with espup:

espup install

This creates an ~/.espressif/ directory with the custom compiler and libraries, and produces an export-esp.sh to add everything to your PATH. You’ll also want espflash for flashing and serial monitoring:

cargo install espflash

On the LilyGo TTGO LoRa32, there’s no SWD or JTAG header — only USB serial. This means probe-rs is not an option. espflash’s built-in monitor is all we get for logging and debugging.

I tried esp-generate, the scaffolding tool for ESP32 Rust projects, and found it largely unnecessary. All it does is add the relevant crates and config to a fresh project, which is just as easy to do by hand — and you understand what you’re adding.

The dependencies for this stage are:

# Cargo.toml
[package]
name = "reticulum-rust-esp32"
version = "0.1.0"
edition = "2024"

[dependencies]
esp-hal = { version = "~1.0", features = ["esp32", "unstable", "defmt"] }
esp-bootloader-esp-idf = { version = "0.4.0", features = ["esp32"] }
esp-backtrace = { version = "0.18.1", features = [
    "esp32",
    "panic-handler",
    "defmt",
] }
esp-println = { version = "0.16.1", features = ["esp32", "log-04", "defmt-espflash"] }
ssd1306 = "0.10.0"
embedded-graphics = "0.8.2"
defmt = "1.0.1"
critical-section = "1.2"

esp-wifi-hal = { git = "https://github.com/esp32-open-mac/esp-wifi-hal", features = ["esp32", "defmt"] }
static_cell = "2.1.1"

defmt (deferred formatting) is the standard logging solution for embedded Rust. Rather than formatting strings on the device — expensive in both CPU and memory — it transmits compact binary frames over a transport (here, the serial connection via espflash) and formats them on the host. The LilyGo board also comes with a small 128×64 OLED display connected over I2C, so we pull in ssd1306 and embedded-graphics for a bit of showmanship. The esp-wifi-hal crate — the Open MAC project — is what we’re ultimately here for; we’ll add it once the basics are working.
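To make the idea concrete, here is a toy host-side model of deferred formatting (our own illustration; defmt’s real wire format and string-interning machinery are more involved): the device ships only a format-string index plus raw argument bytes, and the host owns the string table and does the formatting.

```rust
// Toy model of deferred formatting (NOT defmt's real protocol).

fn encode(fmt_id: u8, arg: u32) -> [u8; 5] {
    // "Device" side: no string formatting, just a table index and raw bytes.
    let mut frame = [0u8; 5];
    frame[0] = fmt_id;
    frame[1..5].copy_from_slice(&arg.to_le_bytes());
    frame
}

fn decode(table: &[&str], frame: &[u8; 5]) -> String {
    // "Host" side: look up the format string and substitute the argument.
    let arg = u32::from_le_bytes(frame[1..5].try_into().unwrap());
    table[frame[0] as usize].replace("{}", &arg.to_string())
}
```

Only five bytes cross the wire, and the device never touches core::fmt. The real defmt interns the format strings into the binary at compile time and the host tool recovers them from the ELF.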

Two other files matter a lot but are easy to overlook. .cargo/config.toml tells Cargo which target to compile for and how to run (flash) the resulting binary:

# .cargo/config.toml
[target.xtensa-esp32-none-elf]
runner = "espflash flash --monitor --chip esp32"

[build]
rustflags = [
  "-C", "link-arg=-nostartfiles",
  "-C", "link-arg=-Tdefmt.x",
]

target = "xtensa-esp32-none-elf"

[unstable]
build-std = ["core"]

And build.rs wires in the linker scripts that esp-hal depends on:

// build.rs
fn main() {
    linker_be_nice();
    println!("cargo:rustc-link-arg=-Tlinkall.x");
}

fn linker_be_nice() {
...
}

linkall.x is the ESP-IDF memory layout script; defmt.x (added via rustflags) provides the section definitions for defmt’s binary transport. The linker_be_nice helper is boilerplate that catches common missing symbols and replaces the linker’s cryptic errors with human-readable hints — a small quality-of-life detail that esp-generate would have added silently. It’s worth knowing it’s there.

With everything in place, cargo run builds, flashes, and opens a serial monitor in one step. Here’s a first proof of life — text rendered to the OLED and defmt logging from the main loop:

#[esp_hal::main]
fn main() -> ! {
    let peripherals = esp_hal::init(esp_hal::Config::default());
    let delay = Delay::new();

    esp_println::logger::init_logger_from_env();

    let i2c = I2c::new(peripherals.I2C0, Config::default())
        .unwrap()
        .with_scl(peripherals.GPIO22)
        .with_sda(peripherals.GPIO21);

    let interface = I2CDisplayInterface::new(i2c);
    let mut display = Ssd1306::new(interface, DisplaySize128x64, DisplayRotation::Rotate0)
        .into_buffered_graphics_mode();
    display.init().unwrap();

    let text_style = MonoTextStyleBuilder::new()
        .font(&FONT_6X10)
        .text_color(BinaryColor::On)
        .build();

    Text::with_baseline("Hello world!", Point::zero(), text_style, Baseline::Top)
        .draw(&mut display)
        .unwrap();

    Text::with_baseline("Hello Rust!", Point::new(0, 16), text_style, Baseline::Top)
        .draw(&mut display)
        .unwrap();

    display.flush().unwrap();
    defmt::info!("Hello world!");
    loop {
        delay.delay(Duration::from_millis(500));
    }
}

Hello Rust!

espflash’s monitor also decodes defmt stack traces on panics. It’s not perfect, but at least we can see the offending line of code. Notice below that I put a panic! call at main.rs:409:

[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR] ====================== PANIC ====================== (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR] panicked at src/main.rs:409:5 (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR] Backtrace: (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR] 0x400d7d91 (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
0x400d7d91 - core::panicking::panic_fmt
    at /home/dean/.rustup/toolchains/esp/lib/rustlib/src/rust/library/core/src/panicking.rs:80
[ERROR] 0x400d1180 (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
0x400d1180 - reticulum_rust_esp32::__xtensa_lx_rt_main
    at /home/dean/repos/reticulum-rust-esp32/src/main.rs:409
[ERROR] 0x40080b92 (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)
[ERROR]  (esp_backtrace esp-backtrace-0.18.1/src/lib.rs:78)

Let’s build an RTOS

Our goal for this article is an echo program: one device sends an arbitrary message, the other receives it and sends it straight back. Simple enough in concept, but it immediately surfaces a scheduling problem. Both devices need to take turns sending and receiving — which means we need a way to switch between those two activities and to wait efficiently for the hardware to signal that a packet has arrived.

In the embedded world, an RTOS would normally fill this role. It lets us define tasks and interrupt handlers, and manages switching between them. That said, it’s a bit reductive to think of an RTOS as just a scheduler — they provide a lot more (memory protection being perhaps the most important one). But for understanding how bare-metal scheduling actually works, starting without one is the most instructive path. We’ll know exactly what we’re doing, and we’ll have a clear picture of what an RTOS would buy us later.

Fortunately, Rust has an elegant starting point here. async fns are compiled into stackless coroutines (a.k.a. generators): state machines that implement the Future trait. Each .await point becomes a state transition. The machine can be polled to advance, and returns Poll::Pending if it has nothing to do yet, or Poll::Ready(val) when it’s done. However, something still has to drive that polling, which is where an executor comes into play.
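To make the desugaring concrete, here is a hand-written approximation of what the compiler generates for a tiny async fn with a single await point (the enum, its states, and all names here are illustrative; the real generated state machine differs in detail):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hand-rolled stand-in for: `async fn two_steps() -> u32 { inner.await; 42 }`
enum TwoSteps<F: Future<Output = ()>> {
    AwaitingInner(F), // state while the .await point is still pending
    Done,             // state after the await completed
}

impl<F: Future<Output = ()> + Unpin> Future for TwoSteps<F> {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                TwoSteps::AwaitingInner(inner) => {
                    // The .await compiles to: poll the sub-future; if it is
                    // Pending, remain in this state and propagate Pending.
                    match Pin::new(inner).poll(cx) {
                        Poll::Ready(()) => *self = TwoSteps::Done,
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // Once past the await, the rest of the body runs to completion.
                TwoSteps::Done => return Poll::Ready(42),
            }
        }
    }
}
```

We add an Unpin bound to keep the sketch simple; the real transform works without it by pinning the sub-future structurally, which is exactly why futures end up address-sensitive.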

embassy-rs is the standard choice for embedded async Rust, but as usual, let’s dive into writing the simplest possible executor ourselves to understand what’s actually happening.

We’ll start with a spinning executor — essentially just a block_on function that polls a single future in a busy loop, along with a few static variables we use to track state:

static TICKS: AtomicU32 = AtomicU32::new(0);
static TIMER0: critical_section::Mutex<RefCell<Option<Timer0>>> =
    critical_section::Mutex::new(RefCell::new(None));
static WAKER: critical_section::Mutex<RefCell<Option<Waker>>> =
    critical_section::Mutex::new(RefCell::new(None));

static VTABLE: RawWakerVTable = RawWakerVTable::new(
    |p| RawWaker::new(p, &VTABLE), // clone  — keep the same data pointer
    |_| {},                        // wake (consuming)  — nothing to do
    |_| {},                        // wake_by_ref       — nothing to do
    |_| {},                        // drop              — nothing to free
);

fn block_on<F: Future>(future: F) -> F::Output {
    // Safety: the vtable functions are all no-ops so the null data pointer
    // is never dereferenced.
    let waker = unsafe { Waker::new(core::ptr::null(), &VTABLE) };
    let mut cx = Context::from_waker(&waker);
    // pin! fixes the future in place on the stack so its address is stable
    // across polls (futures may contain self-referential pointers).
    let mut future = pin!(future);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            Poll::Pending => {} // spin: just try again immediately
        }
    }
}

We also need some code to set up the Timer peripheral and have it call our ISR:

#[esp_hal::main]
fn main() -> ! {
    ...
    let tg0 = TimerGroup::new(peripherals.TIMG0);
    let mut timer0 = PeriodicTimer::new(tg0.timer0);
    timer0.set_interrupt_handler(tg0_t0_handler);
    timer0.start(Duration::from_millis(100)).unwrap();
    timer0.listen();
    critical_section::with(|cs| {
        TIMER0.borrow_ref_mut(cs).replace(timer0);
    });
    ...
}

The Waker is built from a vtable of function pointers that tell the executor how to schedule a re-poll when a future signals it’s ready to make progress. Here, those functions are all no-ops: we’re going to poll unconditionally anyway, so there’s nothing to do. The pin! macro is required because futures may contain self-referential pointers between their internal state fields, and the compiler needs a guarantee that the future won’t be moved in memory between polls.

Even with a no-op waker, a future that returns Pending is still required to call cx.waker().wake_by_ref() before returning. The contract is: a future that returns Pending must arrange for the waker to be called, or the executor may never poll it again. Our spin executor ignores this — but a real executor depends on it.
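To see what a real executor does with that contract, here is a minimal host-side (std) sketch using the classic park/unpark pattern; names and structure are our own:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waking unparks the thread that is blocked inside block_on.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            // Sleep until some wake() call unparks us. A future that returned
            // Pending without arranging a wake would leave us parked forever.
            Poll::Pending => thread::park(),
        }
    }
}
```

Because park/unpark carries a token, a wake that lands between poll returning Pending and the call to park is not lost: the next park returns immediately. This is the host-side analogue of the WAKE_SIGNAL flag we’ll need on the ESP32.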

In the docs for Future, the word “task” is used repeatedly even though there is no explicit implementation of whatever a task might be. Tasks themselves are an implementation detail of whatever executor we use: they store information about a running Future and allow the executor to find and resume any that are waiting to be progressed. In our simple, single-future executor, the task is implicit. Since our executor is a single function that only accepts one future, it’s trivial to keep track of that future. We might think of the pointer to that future as the task here. In more advanced executors, we’d expect to see structs with bookkeeping data, themselves organized in a way that makes the executor’s job easier.

Moving on: to make the pattern concrete, here are two simple futures. First, YieldNow — a future that returns Pending exactly once, then Ready:

struct YieldNow(bool /* already_yielded */);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

And WaitForTick — a future that suspends until a hardware timer ISR has fired a given number of times. Rather than spinning itself, it stores its Waker into a global so the ISR can call wake():

struct WaitForTick {
    target: u32,
}

impl WaitForTick {
    fn after(ticks: u32) -> Self {
        Self {
            target: TICKS.load(Ordering::Relaxed).wrapping_add(ticks),
        }
    }
}

impl Future for WaitForTick {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if TICKS.load(Ordering::Acquire) >= self.target {
            return Poll::Ready(());
        }
        // Park the waker so the ISR can wake us on the next tick.
        critical_section::with(|cs| {
            *WAKER.borrow_ref_mut(cs) = Some(cx.waker().clone());
        });
        // Check again after storing the waker to close the race window:
        // the ISR might have fired between the first check and the store.
        if TICKS.load(Ordering::Acquire) >= self.target {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

The double-check around the waker store is important. If the ISR fires between the first TICKS read and storing the waker, the wake call would be lost — so we check again after storing to close the race.

The ISR itself is straightforward. The #[handler] attribute sets the correct calling convention for Xtensa, and #[ram] ensures it runs from fast on-chip SRAM rather than slower flash:

#[handler]
fn tg0_t0_handler() {
    critical_section::with(|cs| {
        if let Some(t) = TIMER0.borrow_ref_mut(cs).as_mut() {
            t.clear_interrupt();
        }
    });

    TICKS.fetch_add(1, Ordering::Release);

    // Take the waker out of the Option — avoids double-waking.
    // The future will re-store it on the next Pending return.
    let waker = critical_section::with(|cs| WAKER.borrow_ref_mut(cs).take());
    if let Some(w) = waker {
        w.wake();
    }
}

The TIMER0 and WAKER globals use the pattern Mutex<RefCell<Option<T>>>. Option<T> because the value may not be set yet at startup. RefCell for interior mutability — we need &mut access but the static reference only hands out shared & references. Mutex (from critical_section) to ensure the ISR and main code don’t race when accessing the shared state.

With all of this in place, the async entry point looks like this:

async fn run() -> ! {
    defmt::info!("poll 1: about to yield");
    YieldNow(false).await;
    defmt::info!("poll 2: resumed after yield");

    let mut count: u32 = 0;
    loop {
        defmt::info!("tick {}", count);
        count += 1;
        WaitForTick::after(1).await;
    }
}

#[esp_hal::main]
fn main() -> ! {
    // ... peripheral setup ...
    block_on(run())
}

With our spin executor, this works — but it burns CPU cycles between ticks. The architecture is already correct, though. Swap in a real executor (embassy) and the CPU could sleep between waker calls, waking only when the ISR fires. Which is exactly where we’re headed next.

Waiting and sleeping

Our Futures already have the right structure: they report back to the executor when there’s work to do, and park themselves otherwise. What remains is making the executor act on that information — instead of burning CPU in a busy loop, we want to pause the processor and save precious energy, resuming only when the hardware tells us something has happened.

On Xtensa, the instruction for this is waiti. waiti 0 atomically lowers the interrupt level to 0, enabling all interrupts, and halts the CPU until any interrupt fires. When the timer interrupt fires, the processor wakes, runs the handler, and returns control to the executor. The executor can then check for work and, if the future is still pending, go back to sleep.

The tricky part is the gap between a future returning Pending and the executor executing waiti. If the ISR fires in that window, the wake signal would be lost and we’d sleep until the next interrupt unnecessarily. A single atomic flag — WAKE_SIGNAL — closes the gap: the vtable’s wake function sets it, and the executor checks and clears it before deciding whether to sleep.

static WAKE_SIGNAL: AtomicBool = AtomicBool::new(false);

static VTABLE: RawWakerVTable = RawWakerVTable::new(
    |p| RawWaker::new(p, &VTABLE), // clone
    |_| {
        WAKE_SIGNAL.store(true, Ordering::Release);
    }, // wake (consuming)
    |_| {
        WAKE_SIGNAL.store(true, Ordering::Release);
    }, // wake_by_ref
    |_| {},                        // drop — nothing to free
);

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = unsafe { Waker::new(core::ptr::null(), &VTABLE) };
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(val) => return val,
            Poll::Pending => {
                // If wake() was already called between the future returning
                // Pending and now, skip sleep and re-poll immediately.
                // Otherwise, halt the CPU until the next interrupt.
                if !WAKE_SIGNAL.swap(false, Ordering::AcqRel) {
                    unsafe { core::arch::asm!("waiti 0") };
                }
            }
        }
    }
}

There is still a narrow race here: the ISR could fire between WAKE_SIGNAL.swap(false, ...) and the waiti instruction. In that case the ISR sets WAKE_SIGNAL to true, but we then execute waiti and sleep until the next interrupt. For a periodic timer this means at worst one extra timer period of latency — entirely acceptable at the intervals we’re using here. Closing it fully would require disabling interrupts and issuing waiti as a single atomic sequence in inline assembly.

Notice, however, that we’ve upgraded our executor, making it more energy efficient, without having to touch the futures themselves.

The ESP32 also has two proper sleep modes: light and deep, where peripherals—and in the case of deep sleep, even memory—are powered down and unavailable. Creating an executor that uses light sleep is more involved, as we lose debugging capability once the UART is powered down. In deep sleep, we also have to carefully shuttle any application state into the special RTC memory that remains active. Our executor wouldn’t even survive the transition! We’ll explore these possibilities in another article.

Initializing the WiFi peripheral

With our executor capable of waiting between interrupts, we’re ready to bring up the WiFi hardware. The setup is compact but there’s a fair amount happening under the hood.

WiFiResources is the struct that holds everything the DMA controller needs to move received frames from the radio hardware into memory. Concretely, it contains a fixed array of receive buffers and a matching array of DmaDescriptors. The descriptors form a linked list in memory: each node records the address and size of its buffer, and a pointer to the next descriptor in the chain. The hardware walks this list autonomously as frames arrive, writing each frame into the next available buffer and advancing its internal pointer — no CPU involvement required until we come to read the result.
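In spirit, the chain looks something like this (field names, layout, and the 1600-byte buffer size are our own simplifications; the real esp-hal DmaDescriptor has a hardware-defined layout):

```rust
// Simplified sketch of a receive descriptor chain.
struct RxDescriptor {
    buf_addr: usize,     // where this descriptor's receive buffer lives
    buf_len: usize,      // how many bytes that buffer can hold
    next: Option<usize>, // index of the next descriptor; None ends the chain
}

// Link N buffers into a chain the hardware could walk front to back.
fn build_chain<const N: usize>(buffers: &[[u8; 1600]; N]) -> [RxDescriptor; N] {
    core::array::from_fn(|i| RxDescriptor {
        buf_addr: buffers[i].as_ptr() as usize,
        buf_len: buffers[i].len(),
        next: if i + 1 < N { Some(i + 1) } else { None },
    })
}
```

The real descriptors also carry ownership and length-written flags that the hardware updates as it fills each buffer; the linked-list shape is the essential idea.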

We hold the resources in a StaticCell:

static WIFI_RESOURCES: StaticCell<WiFiResources<10>> = StaticCell::new();

StaticCell gives us a safe way to initialize a static variable at runtime exactly once. The DMA descriptors and buffers must be 'static — the hardware can write into them at any moment after initialization, so they cannot live on any stack that might be unwound. StaticCell enforces the single-initialization contract at runtime (panicking on a second call) and hands back a &'static mut reference, which is exactly what WiFi::new() needs.

WiFi::new() takes ownership of those resources and brings up the peripheral:

let wifi = WiFi::new(
    peripherals.WIFI,
    peripherals.ADC2,
    WIFI_RESOURCES.init(WiFiResources::new()),
);

Internally it does several things worth knowing about. First, it enables the WiFi power domain in the RTC controller — clearing the wifi_force_pd and wifi_force_iso bits to restore power and remove electrical isolation from the WiFi subsystem. Then it enables the modem clock, resets and initializes the MAC, and sets up the interrupt handler for WIFI_MAC. The MAC interrupt fires on both RX (a frame has arrived) and TX (a slot has finished transmitting); there is also a WIFI_PWR interrupt that the project does not yet handle.

The most interesting step is the PHY calibration, performed via register_chipv7_phy — one of the last remaining precompiled C blobs from Espressif’s firmware that the Open MAC project has not yet replaced. This is a candidate for future power savings: in a deep-sleep / wake cycle, the calibration data could be saved to RTC memory and reused. It’s not yet clear how expensive calibration is, but it could draw a fair bit of current on each reboot.

Once the peripheral is running, we set the channel and configure the RX filters:

wifi.set_channel(WIFI_CHANNEL).ok();
wifi.set_filter(RxFilterBank::ReceiverAddress, 0, MY_MAC, [0xff; 6])
    .ok();
wifi.set_filter_status(RxFilterBank::ReceiverAddress, 0, true)
    .ok();
wifi.set_filter_bssid_check(0, false).ok();
wifi.clear_rx_queue();

Changing the channel briefly deinits the MAC, calls chip_v7_set_chan() (another FFI shim), toggles the AGC (Automatic Gain Control), then reinits.

The filter configuration is what determines which frames actually make it through the hardware to our DMA buffers — without it, the hardware drops everything and receive() waits forever. The ESP32’s MAC supports two filter banks per virtual interface: ReceiverAddress and BSSID. Here we register our own MAC address as the expected receiver on interface 0 with a full mask (all six bytes must match), enable that filter, and disable the BSSID check. Disabling the BSSID check is necessary because we’re not in an infrastructure BSS — there’s no access point, so there’s no BSSID to check against.
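Conceptually, the address-plus-mask filter behaves like a per-byte masked comparison: a byte participates in the match only where the mask is set. A sketch (our own model, not the hardware’s implementation):

```rust
// Masked address comparison: with a full 0xff mask every byte must match;
// zeroed mask bytes are "don't care".
fn filter_matches(addr: &[u8; 6], filter: &[u8; 6], mask: &[u8; 6]) -> bool {
    addr.iter()
        .zip(filter)
        .zip(mask)
        .all(|((a, f), m)| a & m == f & m)
}
```

With the all-ones mask we pass above, only frames addressed exactly to MY_MAC make it into the DMA buffers.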

The BSSID filter is interesting for a future Reticulum-specific build. We could assign a fixed BSSID value to identify our network and have the hardware filter out all non-Reticulum traffic before it even reaches the driver.

Send and receive

With the WiFi peripheral running, we can finally implement the send and receive tasks. Before we do, though, we need to extend our executor to handle more than one future at a time — the sender and receiver each need to run independently, and we’d like to keep a heartbeat task running alongside them to confirm everything is alive. We’ll also split OLED display updates into their own task and have it show the current message at a fixed interval. This could really be done as a function call after sending or receiving, but a task gives our executor another future to juggle.

The single-task block_on is replaced by a round-robin run_tasks. Instead of one global WAKE_SIGNAL, we now keep a per-slot TASK_READY flag array. The trick is to encode the slot index into the waker’s data pointer, so the vtable’s wake functions can set exactly the right flag without a mutex or any indirection:

const MAX_TASKS: usize = 4;

static TASK_READY: [AtomicBool; MAX_TASKS] = [const { AtomicBool::new(true) }; MAX_TASKS];

static TASK_VTABLE: RawWakerVTable = RawWakerVTable::new(
    |p| RawWaker::new(p, &TASK_VTABLE),
    |p| TASK_READY[p as usize].store(true, Ordering::Release), // wake (consuming)
    |p| TASK_READY[p as usize].store(true, Ordering::Release), // wake_by_ref
    |_| {},
);

fn run_tasks(tasks: &mut [Pin<&mut dyn Future<Output = ()>>]) -> ! {
    loop {
        let mut any_polled = false;
        for (i, task) in tasks.iter_mut().enumerate() {
            if TASK_READY[i].swap(false, Ordering::AcqRel) {
                any_polled = true;
                let waker = unsafe { Waker::new(i as *const (), &TASK_VTABLE) };
                let mut cx = Context::from_waker(&waker);
                let _ = task.as_mut().poll(&mut cx);
            }
        }
        if !any_polled {
            unsafe { core::arch::asm!("waiti 0") };
        }
    }
}

All tasks start with their TASK_READY flag set to true so each gets an initial poll on boot. The timer ISR is similarly simplified — it no longer needs to store or call a waker at all; it just sets every slot’s flag and lets each future re-check its condition on the next poll:

#[handler]
fn tg0_t0_handler() {
    critical_section::with(|cs| {
        if let Some(t) = TIMER0.borrow_ref_mut(cs).as_mut() {
            t.clear_interrupt();
        }
    });
    TICKS.fetch_add(1, Ordering::Release);
    for slot in &TASK_READY {
        slot.store(true, Ordering::Release);
    }
}

We use Cargo features to select which role each device plays at compile time, so the same codebase can be flashed to either board with a single flag:

# Cargo.toml
[features]
sender = []
receiver = []

Each feature sets a different pair of MAC addresses:

#[cfg(feature = "sender")]
const MY_MAC:   [u8; 6] = [0x02, 0x00, 0x00, 0x00, 0x00, 0x01];
#[cfg(feature = "sender")]
const PEER_MAC: [u8; 6] = [0x02, 0x00, 0x00, 0x00, 0x00, 0x02];

#[cfg(feature = "receiver")]
const MY_MAC:   [u8; 6] = [0x02, 0x00, 0x00, 0x00, 0x00, 0x02];
#[cfg(feature = "receiver")]
const PEER_MAC: [u8; 6] = [0x02, 0x00, 0x00, 0x00, 0x00, 0x01];

The 0x02 prefix marks these as locally-administered addresses, so they won’t collide with any real hardware.
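Concretely, “locally administered” is bit 1 (the 0x02 bit) of the first octet, and bit 0 is the individual/group (multicast) flag; our chosen addresses set the former and clear the latter. Two tiny helpers (our own, purely illustrative) make the checks explicit:

```rust
fn is_locally_administered(mac: &[u8; 6]) -> bool {
    mac[0] & 0x02 != 0 // U/L bit set: not a globally unique vendor address
}

fn is_multicast(mac: &[u8; 6]) -> bool {
    mac[0] & 0x01 != 0 // I/G bit set: group (multicast/broadcast) address
}
```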

A heartbeat task rounds out the task list — it simply logs a running count every ten timer ticks, confirming the executor and timer are working even when no packets are flying:

async fn heartbeat_task() {
    let mut count: u32 = 0;
    loop {
        defmt::info!("tick {}", count);
        count += 1;
        WaitForTick::after(10).await;
    }
}

For the actual packet format, we build a minimal 802.11 data frame by hand. The MAC header is 24 bytes; we populate the Destination Address, Source Address, and BSSID fields ourselves. For now, we simply reuse the destination address as the BSSID — we’re not in an infrastructure network, so there’s no real BSSID to use. We tell the hardware to override the sequence control field before transmission:

fn build_frame(dst: &[u8; 6], src: &[u8; 6], payload: &[u8], buf: &mut [u8]) -> usize {
    const HDR: usize = 24;
    let total = HDR + payload.len();
    buf[0] = 0x08; buf[1] = 0x00;           // Frame Control: Data
    buf[2] = 0x00; buf[3] = 0x00;           // Duration/ID
    buf[4..10].copy_from_slice(dst);         // Address 1 — DA
    buf[10..16].copy_from_slice(src);        // Address 2 — SA
    buf[16..22].copy_from_slice(dst);        // Address 3 — BSSID (reuse DA)
    buf[22] = 0x00; buf[23] = 0x00;         // Sequence Control (overridden by HW)
    buf[HDR..total].copy_from_slice(payload);
    total
}
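The receive paths below index into this same layout by hand (bytes 10..16 for the source address, for instance). To make those magic offsets explicit, here are small illustrative accessors (not from the repo) for the address fields of a 24-byte data-frame header:

```rust
/// Offsets of the three address fields in a basic 802.11 MAC header.
const ADDR1: core::ops::Range<usize> = 4..10;  // DA
const ADDR2: core::ops::Range<usize> = 10..16; // SA
const ADDR3: core::ops::Range<usize> = 16..22; // BSSID

/// Destination address of a frame, if the header is long enough.
fn dest_addr(frame: &[u8]) -> Option<&[u8]> {
    frame.get(ADDR1)
}

/// Source address of a frame, if the header is long enough.
fn source_addr(frame: &[u8]) -> Option<&[u8]> {
    frame.get(ADDR2)
}

/// BSSID field of a frame, if the header is long enough.
fn bssid(frame: &[u8]) -> Option<&[u8]> {
    frame.get(ADDR3)
}
```

Using `slice::get` with a range also gives us the length check for free: a truncated frame yields `None` instead of a panic.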

Rather than sending arbitrary-length messages, we’ll keep things simple for now: both devices compile in the same list of short phrases, and the payload is just a single index byte identifying which message to show. That’s enough to prove the link end-to-end and keep both OLED displays in sync. The shared MSG_INDEX starts out at u32::MAX, a sentinel the display task can treat as “nothing to show yet”:

const MESSAGES: &[&str] = &[
    "Hello!", "How are you?", "Rust on ESP32",
    "No OS needed", "WiFi works!", "Open MAC ftw",
    "Ping!", "Reticulum?",
];

static MSG_INDEX: AtomicU32 = AtomicU32::new(u32::MAX);

The sender cycles through the list, transmits the index, then waits for the echo before advancing:

#[cfg(feature = "sender")]
async fn sender_task(wifi: &WiFi<'_>) {
    let mut counter: u32 = 0;
    loop {
        let idx = (counter as usize) % MESSAGES.len();
        MSG_INDEX.store(idx as u32, Ordering::Release);

        let payload = [idx as u8];
        let mut frame = [0u8; 256];
        let len = build_frame(&PEER_MAC, &MY_MAC, &payload, &mut frame);

        wifi.transmit(&mut frame[..len], &TxParameters {
            rate: WiFiRate::PhyRate1ML,
            override_seq_num: true,
            ..Default::default()
        }, None).await.ok();

        // Wait for the echo, discarding any frames not from our peer.
        loop {
            let reply = wifi.receive().await;
            let from_peer = reply.mpdu_buffer().len() >= 16
                && &reply.mpdu_buffer()[10..16] == &PEER_MAC;
            drop(reply);
            if from_peer { break; }
        }
        counter = counter.wrapping_add(1);
        WaitForTick::after(10).await;
    }
}

The receiver reads the index out of the payload, updates its display, then echoes the frame back with the source and destination addresses swapped:

#[cfg(feature = "receiver")]
async fn echo_task(wifi: &WiFi<'_>) {
    loop {
        let frame = wifi.receive().await;
        let mpdu = frame.mpdu_buffer();
        if mpdu.len() < 16 || &mpdu[10..16] != &PEER_MAC { drop(frame); continue; }

        if mpdu.len() > 24 {
            let idx = (mpdu[24] as usize) % MESSAGES.len();
            MSG_INDEX.store(idx as u32, Ordering::Release);
        }

        let mut buf = [0u8; 256];
        let len = mpdu.len().min(buf.len());
        buf[..len].copy_from_slice(&mpdu[..len]);
        drop(frame);

        buf[4..10].copy_from_slice(&PEER_MAC);
        buf[10..16].copy_from_slice(&MY_MAC);
        wifi.transmit(&mut buf[..len], &TxParameters {
            rate: WiFiRate::PhyRate1ML,
            override_seq_num: true,
            ..Default::default()
        }, None).await.ok();
    }
}
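The re-addressing step before retransmission is easy to get wrong when done with raw offsets; factored into a helper (again illustrative, not the repo's code), it reads like this:

```rust
/// Rewrite the DA and SA fields of a data frame held in `buf`, as the
/// echo path does before retransmitting. Returns false if the buffer
/// is too short to hold a 24-byte MAC header.
fn readdress(buf: &mut [u8], da: &[u8; 6], sa: &[u8; 6]) -> bool {
    if buf.len() < 24 {
        return false;
    }
    buf[4..10].copy_from_slice(da);  // Address 1 -- destination
    buf[10..16].copy_from_slice(sa); // Address 2 -- source
    true
}
```

Note that the BSSID field (bytes 16..22) is left untouched, just as in echo_task above.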

All three tasks are then pinned on the stack and handed to run_tasks:

fn main() {
    ...
    let mut heartbeat = pin!(heartbeat_task());
    let mut disp      = pin!(display_task(display));

    #[cfg(feature = "sender")]
    let mut role = pin!(sender_task(&wifi));
    #[cfg(feature = "receiver")]
    let mut role = pin!(echo_task(&wifi));

    let mut tasks: [Pin<&mut dyn Future<Output = ()>>; 3] =
        [heartbeat.as_mut(), role.as_mut(), disp.as_mut()];

    run_tasks(&mut tasks)
    ...
}

For brevity, the display_task has been left out here; see the repo for the full code.

Flash one board with --features sender and the other with --features receiver, and both screens should start cycling through the message list as the echo loop runs:

Echo

Wrap up

And that’s it for today. We’ve now got a feel for a different microcontroller, in a different language, and Rust’s maturity and vast ecosystem have proven invaluable along the way. For the next steps, there are two main directions we might take: getting Reticulum-rs working with embassy instead of tokio, and/or diving deeper into sleep modes, calibration reuse and power measurement to see whether we can make the system vastly more efficient.

I hope you’ll follow along to the next post to see where it goes!

Cheers.

Links