Add ChaChaPoly AEAD-4 encryption with nonce persistence #1677

Open
weebl2000 wants to merge 17 commits into meshcore-dev:dev from weebl2000:feature/aead-4-encryption

@weebl2000
Contributor

@weebl2000 weebl2000 commented Feb 12, 2026

Build firmware: Build from this branch

Testing

  • Currently testing on a Heltec v4 companion and a Heltec v4 repeater. It's working so far. It would be great if others could try different devices, especially more constrained ones with less RAM/flash storage.

Summary

Adds ChaCha20-Poly1305 (AEAD-4) encryption alongside the existing AES-128-ECB + HMAC-2 scheme, plus session key negotiation for Perfect Forward Secrecy. Updated nodes send AEAD-4 to peers that advertise support and fall back to ECB for legacy peers. All nodes can decode both formats. Old nodes continue to work unchanged.

Nonces are persisted to flash so they survive reboots without risk of reuse. Session keys are negotiated via ephemeral X25519 Diffie-Hellman and persisted immediately on establishment.

Relates to #259.

What This Means in Practical Terms

The current encryption has a few weaknesses that this PR addresses:

  • Message tampering is too easy to attempt. The existing 2-byte authentication code means an attacker only needs about 65,000 guesses to forge a valid-looking message. At LoRa speeds that's roughly 9 hours of continuous attempts. The new 4-byte tag raises this to over 4 billion guesses — at LoRa rates, that would take over a century.

  • Identical messages look identical on the air. The current block cipher (ECB mode) produces the same ciphertext for the same plaintext, which can reveal patterns — for example, you could tell when someone sends the same message twice. The new scheme produces completely different ciphertext every time, even for identical messages.

  • Addressing fields are now protected. Currently, only the message body is authenticated. With AEAD, the payload type and addressing hashes (which identify sender and recipient) are included in the authentication check, so an attacker cannot swap or modify them without detection. Outer routing fields like TTL and hop path are intentionally left unauthenticated so repeaters can still forward packets through the mesh.

  • Messages get slightly smaller. ECB pads every message up to a 16-byte boundary, wasting airtime. The new scheme has no padding, so most messages shrink by a few bytes on the wire.

  • Compromise of a node doesn't reveal past messages. Session key negotiation establishes fresh shared secrets via ephemeral key exchange. Even if a node's long-term private key is later compromised, previously recorded traffic cannot be decrypted (Perfect Forward Secrecy).

  • Nothing breaks. Updated nodes send AEAD-4 to peers that advertise support, and fall back to ECB for legacy peers. Old nodes are completely unaffected — they never receive AEAD-4 messages because the sender checks their capability first.

  • Nodes advertise their capabilities. Updated nodes include a flag in their advertisements saying "I understand the new encryption." When two updated nodes discover each other, they automatically start using AEAD-4 for their communication.

  • Nonces survive reboots. Per-peer nonce counters are saved to flash periodically and before clean reboots. After a dirty reset (power loss, watchdog, brownout), nonces are bumped forward by a safety margin to guarantee no reuse.

Wire Format

Current ECB:

[HMAC:2] [ECB_ciphertext:N×16]     (padded to block boundary)

New AEAD-4 (same position in payload):

[nonce:2] [ciphertext:M] [tag:4]    (exact plaintext length, no padding)

Average overhead: ~6 bytes (AEAD) vs ~9.5 bytes (ECB). Most messages get smaller.
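
The overhead figures above can be checked with a small size calculation. This is a minimal sketch, not code from the PR: the helper names are hypothetical, and it assumes ECB adds no padding when the plaintext is already a multiple of 16 bytes (consistent with the "0-15 bytes" padding figure in the comparison table).

```cpp
#include <cstddef>

// Legacy format: 2-byte HMAC, then plaintext padded up to a 16-byte block boundary.
// Assumption: a plaintext that is already block-aligned gets no extra padding.
inline std::size_t ecbWireLen(std::size_t plaintext_len) {
  return 2 + ((plaintext_len + 15) / 16) * 16;
}

// AEAD-4 format: 2-byte nonce, unpadded ciphertext, 4-byte truncated tag.
inline std::size_t aeadWireLen(std::size_t plaintext_len) {
  return 2 + plaintext_len + 4;
}
```

With these definitions a 17-byte plaintext costs 34 bytes under ECB and 23 bytes under AEAD-4. AEAD-4 is larger only when the ECB padding would have been under 4 bytes, and the two are equal at exactly 4.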

Cryptographic Design

Per-message key derivation (eliminates nonce-reuse catastrophe):

msg_key[32] = HMAC-SHA256(shared_secret, nonce || dest_hash || src_hash)

The shared_secret is either the static ECDH secret or a session key (see Session Key Negotiation below).

Including dest_hash || src_hash makes keys direction-dependent — Alice→Bob and Bob→Alice derive different keys even with the same nonce value (for 255/256 peer pairs; the 1/256 where dest_hash == src_hash is a residual limitation of 1-byte hashes).

IV construction (12 bytes, from on-wire fields):

iv[12] = { nonce_hi, nonce_lo, dest_hash, src_hash, 0, 0, 0, 0, 0, 0, 0, 0 }
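
As an illustration of the layout above, the IV can be assembled from the on-wire fields like this. The function name is hypothetical, and the big-endian nonce order is taken from the nonce_hi, nonce_lo ordering in the formula.

```cpp
#include <array>
#include <cstdint>

// Builds the 12-byte IV: nonce (high byte first), dest_hash, src_hash, then 8 zero bytes.
inline std::array<uint8_t, 12> buildAeadIv(uint16_t nonce, uint8_t dest_hash, uint8_t src_hash) {
  std::array<uint8_t, 12> iv{};  // value-initialised, so the trailing bytes are zero
  iv[0] = static_cast<uint8_t>(nonce >> 8);
  iv[1] = static_cast<uint8_t>(nonce & 0xFF);
  iv[2] = dest_hash;
  iv[3] = src_hash;
  return iv;
}
```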

Associated data (authenticated but not encrypted):

  • Peer messages: header || dest_hash || src_hash
  • Anonymous requests: header || dest_hash
  • Group messages: header || channel_hash

Route type bits are masked out of the header in associated data (header & ~PH_ROUTE_MASK), since routing mode changes per hop as repeaters forward packets.
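
A minimal sketch of the peer-message associated data, assuming PH_ROUTE_MASK covers the low two header bits (0x03). The helper name and output-buffer convention are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint8_t PH_ROUTE_MASK = 0x03;  // assumption: route type lives in the low two bits

// Writes header-with-route-bits-cleared, dest_hash, src_hash into out and returns the length.
inline std::size_t buildPeerAad(uint8_t header, uint8_t dest_hash, uint8_t src_hash, uint8_t out[3]) {
  out[0] = static_cast<uint8_t>(header & ~PH_ROUTE_MASK);
  out[1] = dest_hash;
  out[2] = src_hash;
  return 3;
}
```

Because the route bits are cleared before authentication, a repeater that rewrites the routing mode does not invalidate the tag.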

Nonce management: 16-bit counter per peer, persisted to flash. See "Nonce Persistence" section below.

Session Key Negotiation (Perfect Forward Secrecy)

Session keys provide Perfect Forward Secrecy by establishing fresh shared secrets via ephemeral X25519 Diffie-Hellman. Compromise of either node's long-term private key cannot recover traffic encrypted with a session key.

Protocol (2 messages + implicit confirmation)

Initiator                                   Responder
    |  1. REQ [REQ_TYPE_SESSION_KEY_INIT]        |
    |     [ephemeral_pub_A:32] (AEAD-4)          |
    | -----------------------------------------> |  derive session_key, persist, dual-decode
    |  2. RESPONSE [RESP_TYPE_SESSION_KEY_ACCEPT] |
    |     [ephemeral_pub_B:32] (static ECDH)     |
    | <----------------------------------------- |
    |  derive session_key, persist, nonce=1       |
    |  3. Any normal message (session key)       |
    | -----------------------------------------> |  confirm: drop old key

The INIT is encrypted with AEAD-4 (static ECDH or existing session key). The ACCEPT is always encrypted with the static ECDH secret, because the initiator hasn't derived the session key yet.

Key Derivation

ephemeral_secret = X25519(their_ephemeral_pub, my_ephemeral_prv)
session_key[32]  = HMAC-SHA256(static_shared_secret, ephemeral_secret)

Uses existing ed25519_key_exchange() (X25519 Montgomery ladder) from lib/ed25519. No new dependencies.

Who Initiates

  • Companion ↔ Repeater/Room/Sensor: Companion initiates, server responds
  • Companion ↔ Companion: Either side can initiate, both can respond

Repeaters, room servers, and sensors only implement the responder role — they never initiate session key negotiation.

Automatic Triggers

Session key negotiation is triggered automatically based on message count. The trigger check runs inside getEncryptionNonceFor() — the single funnel all encrypted sends pass through — so no send path can silently skip it. Negotiation is deferred to the next loop() tick to avoid re-entrancy.

| Hop count | Current key | Trigger | Retry after failure |
|---|---|---|---|
| 0 (direct) | Static ECDH | Every 100 msgs | 100 msgs |
| 0 (direct) | Session key | nonce > 60000, then every 100 msgs | 100 msgs |
| 1–9 | Static ECDH | Every 500 msgs | 500 msgs |
| 1–9 | Session key | nonce > 60000, then every 300 msgs | 300 msgs |
| 10+ | Static ECDH | Every 1000 msgs | 1000 msgs |
| 10+ | Session key | nonce > 60000, then every 300 msgs | 300 msgs |

3 INIT attempts per negotiation (3-minute timeout each).
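
One way to read the trigger table as code is sketched below. The function names and the msgs_since_last_attempt counter are hypothetical, and the value 60000 for NONCE_REKEY_THRESHOLD is inferred from the table rather than taken from the source.

```cpp
#include <cstdint>

constexpr uint16_t NONCE_REKEY_THRESHOLD = 60000;  // assumption: matches "nonce > 60000"

// Messages between negotiation attempts; the table uses the same value for retries.
inline uint16_t rekeyInterval(uint8_t hops, bool has_session_key) {
  if (hops == 0) return 100;        // direct peers, either key type
  if (has_session_key) return 300;  // 1+ hops with a session key
  return hops < 10 ? 500 : 1000;    // 1+ hops on static ECDH
}

// True when a negotiation should be scheduled for the next loop() tick.
inline bool shouldRekey(uint8_t hops, bool has_session_key, uint16_t nonce,
                        uint16_t msgs_since_last_attempt) {
  // A session key is only renegotiated once its nonce budget is nearly used up.
  if (has_session_key && nonce <= NONCE_REKEY_THRESHOLD) return false;
  return msgs_since_last_attempt >= rekeyInterval(hops, has_session_key);
}
```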

Nonce Lifecycle

  • New contacts: Static ECDH nonce seeded from RNG in range 1000–50000
  • Session key nonce: Starts at 1 on establishment, full 65535 budget per session
  • Nonce exhaustion: Fall back to static ECDH, keep retrying negotiation at tier intervals

Encryption Key Selection

All node types use paired getEncryptionKey() / getEncryptionNonce() functions that return the correct key and nonce based on current session state:

has_session_key && sends_since_last_recv < 50  → AEAD with session key
has_session_key && sends_since_last_recv >= 50 → AEAD with static ECDH (stale probe)
CONTACT_FLAG_AEAD && nonce OK                  → AEAD with static ECDH
CONTACT_FLAG_AEAD && nonce exhausted           → ECB (pending renegotiation)
else                                           → ECB (legacy peer)
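
The selection rules above map directly onto a small decision function. This sketch uses a hypothetical PeerState struct and SendMode enum; only the conditions and the threshold of 50 come from the table.

```cpp
#include <cstdint>

enum class SendMode { AeadSessionKey, AeadStaticEcdh, Ecb };

// Hypothetical snapshot of the per-peer state the selection depends on.
struct PeerState {
  bool has_session_key;
  bool aead_capable;            // CONTACT_FLAG_AEAD is set
  bool static_nonce_exhausted;  // static ECDH nonce counter has run out
  uint8_t sends_since_last_recv;
};

// Mirrors the five rows of the selection table, top to bottom.
inline SendMode selectSendMode(const PeerState& p) {
  if (p.has_session_key) {
    return p.sends_since_last_recv < 50 ? SendMode::AeadSessionKey
                                        : SendMode::AeadStaticEcdh;  // stale probe
  }
  if (p.aead_capable) {
    return p.static_nonce_exhausted ? SendMode::Ecb : SendMode::AeadStaticEcdh;
  }
  return SendMode::Ecb;  // legacy peer
}
```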

Decode Order

has_session_key: session_key → prev_session_key (dual-decode) → static ECDH → ECB
CONTACT_FLAG_AEAD: static ECDH → ECB
else: ECB → static ECDH

Dual-Decode Window

When the responder accepts a session key INIT, it enters DUAL_DECODE state: the new session key is active for sending, but both old and new keys are accepted for decoding. Once the initiator sends a message encrypted with the new session key (message 3), the responder confirms the transition and drops the old key.

This makes ACCEPT packet loss safe — the responder stays in dual-decode, the initiator times out and retries, and no messages are lost.

Stale Session Detection

If a node sends 50 consecutive messages without receiving any session-key-encrypted reply, it falls back to static ECDH for sending (the peer may have lost the session key). At 100 unanswered sends, falls back to ECB. At 255, clears the AEAD capability flag and removes the session key entirely. The counter resets to 0 on any successful session-key-encrypted message from the peer.
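
The escalation described above can be sketched as a saturating counter plus a threshold lookup. The names are hypothetical; the thresholds 50, 100 and 255 are the ones stated in the paragraph.

```cpp
#include <cstdint>

enum class StaleAction { UseSessionKey, UseStaticEcdh, UseEcb, ClearAeadAndSessionKey };

// Maps the count of unanswered sends to the fallback stage.
inline StaleAction staleAction(uint8_t unanswered_sends) {
  if (unanswered_sends == 255) return StaleAction::ClearAeadAndSessionKey;
  if (unanswered_sends >= 100) return StaleAction::UseEcb;
  if (unanswered_sends >= 50) return StaleAction::UseStaticEcdh;
  return StaleAction::UseSessionKey;
}

// Counter update on send: saturates at 255 instead of wrapping back to 0.
inline uint8_t countSend(uint8_t unanswered_sends) {
  return unanswered_sends < 255 ? static_cast<uint8_t>(unanswered_sends + 1) : 255;
}

// Counter update on any successfully decoded session-key message from the peer.
inline uint8_t countSessionKeyRecv(uint8_t) { return 0; }
```

Saturating the counter matters here: a wrapping 8-bit counter would silently return a dead peer to the session-key stage.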

Session Key Persistence

Session keys use a two-tier storage model: a small RAM pool for active sessions and a larger flash-backed store for less recently used entries.

RAM pool: 8 slots (MAX_SESSION_KEYS_RAM), managed as an LRU cache. Each access touches a counter so the least-recently-used entry can be evicted when the pool is full. Entries in INIT_SENT state (ephemeral keys only) are never evicted — they must complete or time out.
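
A minimal sketch of the eviction rule, assuming a monotonically increasing last_used counter per slot. The Slot struct and pickSlot() are hypothetical stand-ins for the PR's SessionKeyEntry and SessionKeyPool.

```cpp
#include <array>
#include <cstdint>

constexpr int MAX_SESSION_KEYS_RAM = 8;

enum class SessionState : uint8_t { Empty, InitSent, Active };

struct Slot {
  SessionState state = SessionState::Empty;
  uint32_t last_used = 0;  // bumped on every access; smaller means older
};

// Returns the slot to (re)use: the first empty slot if there is one, otherwise the
// least-recently-used slot that is not in INIT_SENT. Returns -1 if nothing is evictable.
inline int pickSlot(const std::array<Slot, MAX_SESSION_KEYS_RAM>& pool) {
  int victim = -1;
  for (int i = 0; i < MAX_SESSION_KEYS_RAM; ++i) {
    if (pool[i].state == SessionState::Empty) return i;
    if (pool[i].state == SessionState::InitSent) continue;  // never evicted
    if (victim < 0 || pool[i].last_used < pool[victim].last_used) victim = i;
  }
  return victim;
}
```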

Flash store: Up to 48 entries (MAX_SESSION_KEYS_FLASH), persisted to /sess_keys (companion) or /s_sess_keys (server firmware).

Variable-length records: Entries without a previous session key (no dual-decode) use 39 bytes (SESSION_KEY_RECORD_MIN_SIZE); entries with a previous key use 71 bytes (SESSION_KEY_RECORD_SIZE). The SESSION_FLAG_PREV_VALID flag bit distinguishes the two.

Without prev_key: [pub_key_prefix:4] [flags:1] [nonce:2] [session_key:32]         = 39 bytes
With prev_key:    [pub_key_prefix:4] [flags:1] [nonce:2] [session_key:32] [prev_session_key:32] = 71 bytes
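
The two record layouts can be written and read as sketched below. The struct, the function names, the bit value chosen for SESSION_FLAG_PREV_VALID and the big-endian nonce order are all assumptions for illustration; only the field order and the 39/71-byte sizes come from the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t SESSION_KEY_RECORD_MIN_SIZE = 39;
constexpr std::size_t SESSION_KEY_RECORD_SIZE = 71;
constexpr uint8_t SESSION_FLAG_PREV_VALID = 0x01;  // assumption: actual bit not given

struct SessionRecord {
  uint8_t prefix[4];
  uint8_t flags;
  uint16_t nonce;
  uint8_t session_key[32];
  uint8_t prev_session_key[32];
};

// Serialises one record; out must hold at least 71 bytes. Returns bytes written.
inline std::size_t writeRecord(const SessionRecord& r, uint8_t* out) {
  std::size_t n = 0;
  std::memcpy(out + n, r.prefix, 4); n += 4;
  out[n++] = r.flags;
  out[n++] = static_cast<uint8_t>(r.nonce >> 8);  // byte order is an assumption
  out[n++] = static_cast<uint8_t>(r.nonce & 0xFF);
  std::memcpy(out + n, r.session_key, 32); n += 32;
  if (r.flags & SESSION_FLAG_PREV_VALID) {
    std::memcpy(out + n, r.prev_session_key, 32); n += 32;
  }
  return n;
}

// Parses one record; returns bytes consumed, or 0 if the buffer is too short.
inline std::size_t readRecord(const uint8_t* in, std::size_t avail, SessionRecord& r) {
  if (avail < SESSION_KEY_RECORD_MIN_SIZE) return 0;
  const bool has_prev = (in[4] & SESSION_FLAG_PREV_VALID) != 0;
  const std::size_t size = has_prev ? SESSION_KEY_RECORD_SIZE : SESSION_KEY_RECORD_MIN_SIZE;
  if (avail < size) return 0;
  std::memcpy(r.prefix, in, 4);
  r.flags = in[4];
  r.nonce = static_cast<uint16_t>((in[5] << 8) | in[6]);
  std::memcpy(r.session_key, in + 7, 32);
  if (has_prev) std::memcpy(r.prev_session_key, in + 39, 32);
  return size;
}
```

Because the flags byte comes before the variable part, a reader can decide the record length after the first five bytes and step through a file of mixed-size records.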

On-demand flash lookup: When findSessionKey() misses the RAM pool, it reads the flash file to look for a matching entry. If found, the entry is loaded into RAM (evicting LRU if needed) and returned.

Merge-save strategy: When persisting, the code reads existing flash entries, filters out any that are already in the RAM pool or have been explicitly removed, then writes the merged result (RAM entries + surviving flash-only entries). This prevents flash from resurrecting deleted entries while preserving entries that were evicted from RAM.

Removed-entry tracking: When a session key is explicitly removed (e.g., invalidation after static ECDH fallback), its prefix is recorded in a small tracking array. The merge-save step skips these prefixes so the deleted entry doesn't reappear from stale flash data. The tracking array is cleared after each successful save.
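
The merge-save and removed-entry rules can be sketched with in-memory vectors standing in for the RAM pool, the flash file and the tracking array. All type and function names here are hypothetical, and the sketch leaves out the 48-entry cap and the actual file I/O.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using Prefix = std::array<uint8_t, 4>;

struct Entry {
  Prefix prefix;
  uint16_t nonce;
};

// Result to write back: every RAM entry, plus flash-only entries that were not removed.
inline std::vector<Entry> mergeForSave(const std::vector<Entry>& ram,
                                       const std::vector<Entry>& flash,
                                       const std::vector<Prefix>& removed) {
  std::vector<Entry> out = ram;  // RAM copies are the freshest, so they always win
  for (const Entry& f : flash) {
    const bool in_ram = std::any_of(ram.begin(), ram.end(),
                                    [&](const Entry& r) { return r.prefix == f.prefix; });
    const bool was_removed =
        std::find(removed.begin(), removed.end(), f.prefix) != removed.end();
    if (in_ram || was_removed) continue;  // stale duplicate or explicitly deleted
    out.push_back(f);
  }
  return out;
}
```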

Nonce Persistence

Nonces are persisted to a dedicated file on flash (/nonces for companion radios, /s_nonces for server firmware).

Periodic saves: After every NONCE_PERSIST_INTERVAL (50) messages to a given peer, the nonce file is written. A dirty flag tracks whether any nonce has advanced since the last save.

Clean reboot: Software restarts and deep sleep wakes load the persisted nonces as-is. A onBeforeReboot() callback in CommonCLI flushes any dirty nonces before the restart.

Dirty reboot: Power-on, watchdog, and brownout resets are detected via wasDirtyReset() (platform-specific: esp_reset_reason() on ESP32, RESETREAS register on NRF52). After a dirty reset, all loaded nonces are bumped forward by NONCE_BOOT_BUMP (50), which is at least the persist interval, guaranteeing that even the worst-case unpersisted nonce is safely skipped. Session key nonces also receive the boot bump; if the bump causes a wrap, the nonce is forced to 65535 to trigger renegotiation.

Format: Simple array of {pub_key_prefix[6], nonce[2]} entries, matched to in-memory contacts/clients on load.
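
The dirty-reset bump for session key nonces can be sketched as a clamped addition. The function name is hypothetical; the constants are the values quoted above.

```cpp
#include <cstdint>

constexpr uint16_t NONCE_PERSIST_INTERVAL = 50;
constexpr uint16_t NONCE_BOOT_BUMP = 50;
static_assert(NONCE_BOOT_BUMP >= NONCE_PERSIST_INTERVAL,
              "the bump must cover every nonce that could have been used but not saved");

// Clean reboot: use the persisted value as-is. Dirty reset: skip ahead by the bump,
// clamping at 65535 so that a would-be wrap triggers renegotiation instead of reuse.
inline uint16_t applyBootBump(uint16_t persisted, bool dirty_reset) {
  if (!dirty_reset) return persisted;
  const uint32_t bumped = static_cast<uint32_t>(persisted) + NONCE_BOOT_BUMP;
  return bumped > 0xFFFF ? static_cast<uint16_t>(0xFFFF) : static_cast<uint16_t>(bumped);
}
```

Doing the addition in 32 bits is what makes the wrap detectable; a 16-bit addition would overflow silently back into already-used nonce values.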

Security Comparison

| Property | ECB + HMAC-2 (current) | AEAD-4 (new) | AEAD-4 + Session Key |
|---|---|---|---|
| Confidentiality | Identical blocks → identical ciphertext | Unique keystream per message | Same |
| Forgery resistance | 1/65K (~9 hours at LoRa rates) | 1/4.3B (~136 years) | Same |
| Key usage | 16 of 32 bytes (AES-128) | Full 32 bytes (ChaCha20-256) | Same |
| Addressing authentication | None | Payload type & address hashes via AAD | Same |
| MAC timing | memcmp (timing side-channel) | secure_compare (constant-time) | Same |
| Padding waste | 0-15 bytes per message | None | None |
| Perfect Forward Secrecy | No | No | Yes |
| Nonce reuse on reboot | N/A (no nonces) | Mitigated by persistence + boot bump | Same |

Scope

| Payload type | AEAD-4 decode | AEAD-4 send | Session keys | Notes |
|---|---|---|---|---|
| TXT_MSG, REQ, RESPONSE, PATH | Yes | Yes (if peer advertises AEAD) | Yes | Per-peer secret, no collision risk |
| ANON_REQ | Yes | No (no prior capability exchange) | No | Ephemeral ECDH secret |
| GRP_TXT, GRP_DATA | Yes | No (see group considerations) | No | Shared channel key |

All node types (companion radio, repeater, room server, sensor) support AEAD-4 decode, AEAD-4 send, and session key negotiation (companion initiates or responds; server firmware responds only).

Group Message Considerations

Group channels share a single key among all members. With a 2-byte nonce and multiple senders, cross-sender nonce collisions follow the birthday bound (~300 messages for 50% probability on an active channel). A collision leaks P1 ⊕ P2 for that specific message pair via crib-dragging, but:

  • No key recovery — per-message key derivation via HMAC-SHA256 is one-way
  • No cascade — each collision is isolated, doesn't affect other messages
  • Bounded threat model — the attacker must not have the channel PSK (if they do, they can already read everything)
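
The ~300 figure follows from the standard birthday approximation n ≈ sqrt(2 · N · ln 2) for a 50% collision chance among N equally likely values. A small sketch (the function name is hypothetical) that reproduces it:

```cpp
#include <cmath>

// Approximate number of random draws from a space of the given size at which the
// probability of at least one collision reaches 50%.
inline double birthdayDrawsFor50Percent(double space_size) {
  return std::sqrt(2.0 * space_size * std::log(2.0));
}
```

For a 16-bit nonce, birthdayDrawsFor50Percent(65536.0) evaluates to about 301. The approximation assumes nonces are effectively uniform across senders, which only roughly holds for independent per-sender counters.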

This is mainly beneficial for public/hashtag channels where the PSK is already widely known and the ECB pattern leakage and weak MAC are a greater concern than the bounded nonce collision risk.

Potential future mitigations explored and deferred:

  • Per-sender derived keys (HMAC(channel_secret, sender_pub_key)) — eliminates cross-sender collisions but requires receivers to know all senders' public keys, changing the group security model from "know the PSK = full access" to "know the PSK + sender discovery = access." Ruled out as a usability regression.
  • Expanded nonce (4 bytes instead of 2) — pushes birthday bound to ~65,000 messages (~2 years at 100 msgs/day). Costs 2 extra bytes of airtime and creates a different wire format for groups vs peers.
  • Sender hash byte on wire — differentiates senders for key derivation at 1 byte cost, but leaks sender identity metadata (traffic correlation, identification via adverts) that is currently hidden inside the encrypted payload.

Decode Order

Adaptive per-peer: for peers with CONTACT_FLAG_AEAD set, try AEAD-4 first then ECB fallback. For unknown/legacy peers, try ECB first then AEAD-4 fallback. When a session key exists, decode order is: session key → prev session key (dual-decode window) → static ECDH → ECB. This avoids the 1/65536 ECB false-positive rate on AEAD packets (nonce bytes matching truncated HMAC) for known AEAD peers, while minimizing wasted CPU for legacy peers.
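
The adaptive ordering can be expressed as a function that returns the list of decode attempts for a peer. The enum and function names are hypothetical; the orderings are the ones described above.

```cpp
#include <vector>

enum class DecodeStep { SessionKey, PrevSessionKey, StaticEcdhAead, Ecb };

// Returns the decode attempts to make, in order, for one incoming packet.
inline std::vector<DecodeStep> decodeOrder(bool has_session_key, bool has_prev_key,
                                           bool peer_has_aead_flag) {
  if (has_session_key) {
    std::vector<DecodeStep> order{DecodeStep::SessionKey};
    if (has_prev_key) order.push_back(DecodeStep::PrevSessionKey);  // dual-decode window
    order.push_back(DecodeStep::StaticEcdhAead);
    order.push_back(DecodeStep::Ecb);
    return order;
  }
  if (peer_has_aead_flag) return {DecodeStep::StaticEcdhAead, DecodeStep::Ecb};
  return {DecodeStep::Ecb, DecodeStep::StaticEcdhAead};  // unknown or legacy peer
}
```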

Capability Advertisement

  • feat1 bit 0 (FEAT1_AEAD_SUPPORT) is set in adverts for all node types (chat, repeater, room, sensor)
  • Receivers record peer capability in ContactInfo.flags bit 1 (CONTACT_FLAG_AEAD)
  • Old nodes parse feat1 but ignore the value (forward-compatible via existing AdvertDataParser)
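
A minimal sketch of how a receiver might fold an advert's feat1 field into its contact flags. The bit positions are the ones listed above; the 8-bit field widths, the function name and the choice to clear the flag when an advert arrives without the bit are assumptions.

```cpp
#include <cstdint>

constexpr uint8_t FEAT1_AEAD_SUPPORT = 1u << 0;  // feat1 bit 0
constexpr uint8_t CONTACT_FLAG_AEAD = 1u << 1;   // ContactInfo.flags bit 1

// Returns the updated contact flags after processing an advert's feat1 value.
inline uint8_t updateContactFlags(uint8_t flags, uint8_t feat1) {
  if (feat1 & FEAT1_AEAD_SUPPORT) {
    return static_cast<uint8_t>(flags | CONTACT_FLAG_AEAD);
  }
  // Assumption: an advert without the bit downgrades the peer to legacy ECB.
  return static_cast<uint8_t>(flags & ~CONTACT_FLAG_AEAD);
}
```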

Files Changed

Core Library

  • src/MeshCore.h — AEAD constants, session key constants (SESSION_KEY_SIZE, REQ_TYPE_SESSION_KEY_INIT, RESP_TYPE_SESSION_KEY_ACCEPT, NONCE_REKEY_THRESHOLD, SESSION_KEY_* thresholds and limits), two-tier pool sizing (MAX_SESSION_KEYS_RAM=8, MAX_SESSION_KEYS_FLASH=48), variable-length record sizes (SESSION_KEY_RECORD_SIZE, SESSION_KEY_RECORD_MIN_SIZE), SESSION_FLAG_PREV_VALID
  • src/Utils.h / src/Utils.cpp - aeadEncrypt() and aeadDecrypt() using ChaChaPoly
  • src/Mesh.h - getPeerFlags(), getPeerNextAeadNonce(), getPeerSessionKey(), getPeerPrevSessionKey(), onSessionKeyDecryptSuccess(), getPeerEncryptionKey(), getPeerEncryptionNonce() virtuals; aead_nonce param on createDatagram/createPathReturn
  • src/Mesh.cpp — AEAD send path in createDatagram/createPathReturn; session key → prev session key → static ECDH → ECB adaptive decode order
  • src/helpers/ContactInfo.h - uint16_t aead_nonce field, nextAeadNonce() helper
  • src/helpers/SessionKeyPool.h - SessionKeyEntry struct and SessionKeyPool class (LRU-managed RAM pool with last_used tracking, eviction that skips INIT_SENT entries, removed-entry tracking for merge-save safety)

Companion Radio (BaseChatMesh)

  • src/helpers/BaseChatMesh.h / BaseChatMesh.cpp — Advertise AEAD, track peer capability, AEAD send for all peer message types, nonce persistence, session key negotiation (both initiator and responder roles), encryption key/nonce funnel (getEncryptionKeyFor/getEncryptionNonceFor), deferred rekey trigger via _pending_rekey_idx

Server-Side (ClientACL + examples)

  • src/helpers/ClientACL.h / ClientACL.cpp — Server-side AEAD nonce tracking and persistence, session key responder (handleSessionKeyInit), paired encryption key/nonce selection (getEncryptionKey/getEncryptionNonce), flash-backed session key wrappers with merge-save, peer-index forwarding helpers
  • src/helpers/CommonCLI.h / CommonCLI.cpp — Advertise AEAD for repeaters/rooms/sensors; onBeforeReboot() callback for nonce/session key flush
  • examples/simple_repeater/MyMesh.h / MyMesh.cpp — AEAD + session key support, nonce persistence, session key INIT handling in onPeerDataRecv
  • examples/simple_room_server/MyMesh.h / MyMesh.cpp — Same
  • examples/simple_sensor/SensorMesh.h / SensorMesh.cpp — Same

Platform Support

  • src/helpers/ArduinoHelpers.h - wasDirtyReset() helper (ESP32/NRF52 reset reason detection)
  • examples/companion_radio/DataStore.h / DataStore.cpp — Nonce and session key file I/O, variable-length session key records, merge-save with flash-backed lookup (loadSessionKeyByPrefix)
  • examples/companion_radio/MyMesh.h / MyMesh.cpp — Wire up nonce/session key persistence and reboot callback, flash-backed session key overrides (loadSessionKeyRecordFromFlash, mergeAndSaveSessionKeys)

Build Verification

  • ESP32 (Heltec_v3_companion_radio_ble): builds successfully
  • ESP32 (Heltec_v3_repeater): builds successfully
  • ESP32 (Heltec_v3_room_server): builds successfully
  • NRF52 (Xiao_nrf52_companion_radio_ble): builds successfully

Future Work

  • Group messages: send AEAD-4 (all updated nodes can already decode it)
  • ANON_REQ: remain ECB (no prior capability exchange possible)
  • rekey <peer> CLI command for manual session key renegotiation

@weebl2000 weebl2000 changed the base branch from main to dev February 12, 2026 00:08
@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch 4 times, most recently from 881d18d to 7637e64 Compare February 12, 2026 01:19
@jimdigriz

jimdigriz commented Feb 12, 2026

> Per-message key derivation (eliminates nonce-reuse catastrophe):
>
> msg_key[32] = HMAC-SHA256(shared_secret, nonce || dest_hash || src_hash)

I do not understand how this prevents nonce re-use. After 65k messages from A->B the nonce looks like it will be reused.

I do not understand why concatenation with src/dst would change this.

The concatenation means you are partitioning the nonce value per (uni-directional) flow, in effect running different counters for A->B, B->A and C->A. Right?

> Nonce management: 16-bit counter per peer, seeded from hardware RNG on boot and on contact load. Not persisted to flash — always fresh on each boot cycle.

What happens for devices without access to a good early boot entropy source?

What if two different reboots generate the same nonce?

What happens for A->B if:

  • reboot initialises nonce=20
  • 3 messages are sent from A->B
  • reboot initialises nonce=15
  • 10 messages are sent from A->B

What does this method improve over a plain incremental counter?

Why not persist the nonce once every 100 messages, and on reboot increment by 200 (rounded down to nearest 100)? When the nonce wraps, regenerate the key.

@weebl2000
Contributor Author

Yeah, it doesn't stop nonce re-use. I think in the end we might need more bytes for nonces.

@jimdigriz

jimdigriz commented Feb 12, 2026

> But in the end maybe we need more bytes for nonces.

You do not, you can also change the key.

Just negotiate a dedicated key for this. It is a lot easier to understand and make safe.

It would require a round trip but then only need to be done every 65k messages; you could then also share that key for both directions (ie. A->B and B->A).

Then when nonce=0 negotiate a new key, which allows you to pick if you want to persist the nonce or reset to zero on boot.

@weebl2000
Contributor Author

weebl2000 commented Feb 12, 2026

> It would require a round trip but then only need to be done every 65k messages; you could then also share that key for both directions (ie. A->B and B->A).

Might be a good option. But the protocol will become a bit more complex and brittle. Then again, we can always fallback to ECB if nothing was negotiated.

@jcjones jcjones left a comment

Not a casual review, but I like the design, and the directionality of the KDF. Good doc comments, too.

@weebl2000
Contributor Author

weebl2000 commented Feb 12, 2026

Thanks for all the comments so far. I will look into them. Just tested this branch with a Heltec v4 repeater and Heltec v4 companion client, and I can confirm communicating between them works using AEAD-4.

It's a request for status from the repeater and the repeater response is understood correctly by the client.

AEAD-4 Packet Decode Verification

Wire Format

[header:1] [path_len:1] [path:N] [dest_hash:1] [src_hash:1] [nonce:2] [ciphertext:M] [tag:4]

Sent Packet — REQ (23 bytes)

Raw: 0200DD130E659F0B0C02D86AC2508DF6B7B3B671F6638A

| Field | Hex | Value |
|---|---|---|
| Header | 02 | Route=DIRECT(2), Type=REQ(0), Ver=0 |
| Path length | 00 | 0 (no path) |
| dest_hash | DD | Destination peer |
| src_hash | 13 | Source |
| AEAD nonce | 0E 65 | 3685 |
| Ciphertext | 9F 0B 0C 02 D8 6A C2 50 8D F6 B7 B3 B6 | 13 bytes plaintext |
| Tag | 71 F6 63 8A | Poly1305 (truncated to 4 bytes) |

Format confirmed AEAD-4: under legacy ECB the 19 bytes after the hashes would be a 2-byte HMAC plus 17 bytes of ciphertext, and 17 is not a multiple of 16, which rules out ECB.
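
For readers who want to reproduce the field split, here is an illustrative parser for the wire format above. It is a sketch, not code from the branch: the struct and function names are hypothetical, and the header bit layout (2 route bits, 4 type bits, 2 version bits) and the reading of path_len as a plain byte count are assumptions that happen to match the two packets shown in this comment.

```cpp
#include <cstddef>
#include <cstdint>

struct AeadPacketView {
  uint8_t route, type, version;
  uint8_t dest_hash, src_hash;
  uint16_t nonce;
  const uint8_t* ciphertext;
  std::size_t ciphertext_len;
  const uint8_t* tag;  // 4 bytes
};

// Splits [header][path_len][path][dest][src][nonce:2][ciphertext][tag:4].
// Returns false if the buffer is too short for the fixed fields.
inline bool parseAeadPacket(const uint8_t* buf, std::size_t len, AeadPacketView& out) {
  if (len < 2) return false;
  const std::size_t path_len = buf[1];
  const std::size_t fixed = 2 + path_len + 2 + 2 + 4;  // everything except ciphertext
  if (len < fixed) return false;
  out.route = buf[0] & 0x03;
  out.type = (buf[0] >> 2) & 0x0F;
  out.version = buf[0] >> 6;
  const std::size_t p = 2 + path_len;  // offset of dest_hash
  out.dest_hash = buf[p];
  out.src_hash = buf[p + 1];
  out.nonce = static_cast<uint16_t>((buf[p + 2] << 8) | buf[p + 3]);
  out.ciphertext = buf + p + 4;
  out.ciphertext_len = len - fixed;
  out.tag = buf + len - 4;
  return true;
}
```

Fed the 23-byte REQ above, this yields route 2, type 0, nonce 3685, 13 bytes of ciphertext and a tag starting with 0x71.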

Received Packet — RESPONSE (70 bytes)

Raw: 060013DD830B84757DB841545969BA39A62BDD0D6AD9E2CD70B25208219F964F51E8AFB0E800130BBAFC23C9C0712B7E28CE72DE17508E30A3359222A2A7DD4B2375E5AE33AC

| Field | Hex | Value |
|---|---|---|
| Header | 06 | Route=DIRECT(2), Type=RESPONSE(1), Ver=0 |
| Path length | 00 | 0 (no path) |
| dest_hash | 13 | Receiver |
| src_hash | DD | Responding peer |
| AEAD nonce | 83 0B | 33547 |
| Ciphertext | 84 75 ... 4B 23 75 | 60 bytes plaintext |
| Tag | E5 AE 33 AC | Poly1305 (truncated to 4 bytes) |

Note: legacy ECB is structurally possible here (the 66 bytes after the hashes could be a 2-byte HMAC plus 64 bytes of ciphertext, and 64 is a multiple of 16), but context confirms AEAD-4.

Associated Data

Per the route-mask fix, assoc data masks out route type bits:

| Packet | assoc bytes |
|---|---|
| REQ | {0x00, 0xDD, 0x13}, i.e. (0x02 & ~0x03)=0x00, dest, src |
| RESPONSE | {0x04, 0x13, 0xDD}, i.e. (0x06 & ~0x03)=0x04, dest, src |

Observations

  • Both packets use AEAD-4 wire format: [nonce:2] [ciphertext:N] [tag:4]
  • dest/src hashes (0xDD, 0x13) correctly swapped between REQ and RESPONSE
  • Both routed DIRECT with empty path (single hop, no relaying)
  • Nonce values (3685, 33547) are non-zero, consistent with independent per-peer counters seeded from HW RNG

@weebl2000 weebl2000 changed the title Add ChaChaPoly AEAD-4 decryption support (Phase 1) Add ChaChaPoly AEAD-4 encryption with nonce persistence Feb 13, 2026
@ignisf

ignisf commented Feb 14, 2026

@weebl2000 thank you for your contributions. Have you considered jumping straight to something proven like the double ratchet instead? Used in Signal.

@weebl2000
Contributor Author

> @weebl2000 thank you for your contributions. Have you considered jumping straight to something proven like the double ratchet instead? Used in Signal.

I don't think double ratchet is practical, we would need to send a new key every message and rely on strict ordering of messages. With LoRa packet limits and out-of-order delivery it would be a disaster.

I'm working on session key negotiation though. That will fix the nonce problem, but requires an exchange first.

Contributor

@3dpgg 3dpgg left a comment

Thanks for putting together a PR to address AES-ECB! I had a cursory look, so not all files yet.


@jcjones jcjones left a comment

Probably worth noting that it looks like if a peer goes from supporting AEADs to not, then the sender needs to forget the peer and re-discover it.

Looking real good here!

Comment thread src/helpers/ContactInfo.h
Comment on lines +30 to +33
if (aead_nonce < NONCE_INITIAL_MIN) {
  aead_nonce = 1; // stay stuck in exhaustion zone, always return ECB
  return 0;
}

When the nonce wraps from 65535 => 0 => 1, this check happens on every subsequent call. Since 1 < NONCE_INITIAL_MIN, it repeatedly resets aead_nonce = 1 and returns 0, permanently locking that peer to ECB encryption.

Contributor Author

@weebl2000 weebl2000 Mar 2, 2026

Yeah, I did this intentionally. I reserved nonces 1-999, because ideally it shouldn't happen that a peer rolls over 65535 without renegotiation of a session key. The only thing missing is trying to renegotiate a session key, but I think it's a theoretical case currently.

Most realistic scenario for this happening is that our contact no longer supports AEAD-4, so they only support ECB anyway. If they support AEAD-4 in future, I would expect them to trigger a session key initiation.

If someone comes up with a good solution in the future nonces 1-999 are still safe.
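
To make the wrap behaviour discussed in this thread concrete, here is a self-contained sketch. Only the guard at the top of nextAeadNonce() appears in the diff excerpt; the post-increment return, the wrapper struct and the value 1000 for NONCE_INITIAL_MIN (inferred from "reserved nonces 1-999") are assumptions.

```cpp
#include <cstdint>

constexpr uint16_t NONCE_INITIAL_MIN = 1000;  // assumption: nonces 1-999 are reserved

struct ContactNonce {
  uint16_t aead_nonce;

  // Returns the nonce for the next AEAD send, or 0 meaning "send with ECB instead".
  uint16_t nextAeadNonce() {
    if (aead_nonce < NONCE_INITIAL_MIN) {
      aead_nonce = 1;  // stay stuck in exhaustion zone, always return ECB
      return 0;
    }
    return aead_nonce++;  // assumed: use the current value, then advance (wraps to 0)
  }
};
```

Starting from 65534, successive calls return 65534, 65535, 0, 0, and so on: the counter wraps to 0, the guard pins it at 1, and the peer stays on ECB until something resets the nonce.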

Comment thread src/MeshCore.h
Comment on lines +1069 to +1077
const uint8_t* BaseChatMesh::getEncryptionKeyFor(const ContactInfo& contact) {
  auto entry = findSessionKey(contact.id.pub_key);
  if (canUseSessionKey(entry)) {
    return entry->session_key;
  }
  return contact.getSharedSecret(self_id);
}

uint16_t BaseChatMesh::getEncryptionNonceFor(const ContactInfo& contact) {

I don't think having two separate methods here take ContactInfo as an arg is a good idea, because this is a race waiting to happen. Both of these call findSessionKey but there's nothing ensuring that they get the same result, it certainly looks like the key could change in between the calls.

In reality how often do keys change? Well, it's going to be a real fun bug during a rollover event. And since this hardware isn't fast, I think this is too risky to do this way.

IMO these two methods should become one that returns both the key and the nonce.
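
A sketch of what such a combined accessor could look like, using simplified stand-in types rather than the PR's ContactInfo and SessionKeyEntry. Every name here is hypothetical; the point is only that the session lookup happens once, so the key and nonce always come from the same source.

```cpp
#include <cstdint>

struct EncryptionParams {
  const uint8_t* key;  // 32-byte key to encrypt with
  uint16_t nonce;      // nonce drawn from the counter that belongs to that key
};

// Simplified stand-ins for the real session entry and contact record.
struct SessionEntry {
  uint8_t session_key[32];
  uint16_t nonce;
  bool usable;
};

struct Contact {
  uint8_t static_secret[32];
  uint16_t aead_nonce;
  SessionEntry* session;  // nullptr when no session key exists
};

// Single lookup, single decision: the returned pair can never mix a session key
// with a static-ECDH nonce or the reverse.
inline EncryptionParams getEncryptionParamsFor(Contact& c) {
  SessionEntry* e = c.session;
  if (e != nullptr && e->usable) {
    return {e->session_key, e->nonce++};
  }
  return {c.static_secret, c.aead_nonce++};
}
```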

Contributor Author

I think your concern is legitimate but the actual risk is close to zero on this single-threaded system. It touches a lot of call sites and virtual wrappers. Addressing this isn't trivial. I was thinking about just using session keys only and doing away with static ECDH - but the code overhead to keep ECDH isn't that large either.

Decisions, decisions. Thoughts?

Comment thread src/MeshCore.h
Comment on lines +31 to +32
#define REQ_TYPE_SESSION_KEY_INIT 0x08
#define RESP_TYPE_SESSION_KEY_ACCEPT 0x08 // response type byte in PAYLOAD_TYPE_RESPONSE

Nit: While they're in different payload types, sharing the literal 0x08 is IMO confusing and can cause a non-session-key response to be misinterpreted as a session key acceptance... if we have the bits to spare, maybe make them different?

Contributor Author

@weebl2000 weebl2000 Mar 2, 2026

Let's make the accept become 0x09 if maintainers decide this PR is worth merging. I don't want to break dev builds in the meantime. We can break them if this PR is selected for merging.

I haven't heard much from maintainers about this PR so far.

@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch from 8ae955c to 7397837 Compare February 28, 2026 18:07
@weebl2000
Contributor Author

weebl2000 commented Mar 2, 2026

> Probably worth noting that it looks like if a peer goes from supporting AEADs to not, then the sender needs to forget the peer and re-discover it.
>
> Looking real good here!

Thanks for your review once more. Addressed your remarks. And yes, if a peer goes from AEAD to not supporting it, an advert will instantly stop other contacts from trying AEAD-4. If an advert never happens, it takes a max of 50 messages before we fall back to ECB.

@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch from 81bad32 to b8147e8 Compare March 3, 2026 14:40
@darconeous

Pardon me if I missed it, but why use ChaCha20-256 when AES encryption is supported on pretty much all hardware?

@weebl2000
Contributor Author

> Pardon me if I missed it, but why use ChaCha20-256 when AES encryption is supported on pretty much all hardware?

Main reasons for this are:

  • easy to use full 256-bit key since it natively uses all 32 bytes from the ECDH shared secret, current AES-128 only uses half
  • fast in software on all platforms (ESP32, nRF52, Raspberry Pi); not all have AES hardware acceleration, and those that do don't always support AEAD modes in hardware. The ESP32-S3, for example, doesn't support AES-GCM or AES-CCM at the hardware level
  • better nonce-reuse resilience — per-message key derivation via HMAC means a nonce collision only leaks P1 XOR P2 for that one message pair, no key recovery. AES-GCM catastrophically leaks the authentication key on nonce reuse
  • already supported by the existing crypto library rweather/Crypto

@jcjones

jcjones commented Mar 11, 2026

Regarding @nextgens' comment on #259 (comment) about authenticated DH - This is a legitimate gap that concerns me, too, and I think we should have a concrete plan around it before merging.

This PR uses bare X25519 ECDH for session key negotiation -- the AEAD protects integrity and confidentiality, but an active attacker could MITM the ephemeral exchange and have valid session keys for both sides. Poly1305 can't protect us if the key is negotiated with the wrong peer in the first place.

I'd like to suggest that we don't need to solve this in this PR but we should have a top-level plan, in the description, containing a forward-compatible path before merging. I think such a plan should hit these three points:

1: Opportunistic upgrade to authenticated DH.

If both sides advertise support (e.g., via a capability flag in the session key request/response), they perform a (Noise_IK-like?) handshake using their existing long-term identity keys to authenticate the ephemeral exchange. If one side doesn't support it, fall back to the current bare ECDH -- so we avoid another wire break, and can show degraded auth in the UI or whatever.

2: Out-of-Band safety number verification.

Once a session key is established, supporting clients can derive a (short) verification code (like Signal's safety numbers) from the session transcript and, if users care about MITM resistance, they can compare these OOB. This way we get trust-on-first-use with verification. Again, like Signal.

3: Wire compatibility.

The session key request/response already has a type byte and there's space available in the AD. Whatever's chosen for the authenticated handshake should be able to fit into the existing message flow by extending those payloads without changing the AEAD wire format itself.

Of course rweather/Crypto doesn't have any Noise implementations (but hey, it has Blake2b!) so there's a whole bunch of work to do to approach authenticated DH. I don't want the perfect to be the enemy of the good, so I am quite supportive of proceeding with this PR without waiting on having everything we could want ready to go. However, we should spend the little bit of research time to confirm, update this PR with a plan, and maybe toss some code comments in to make clear that the grounds here are prepared for the follow-on work.

@nextgens
Contributor

nextgens commented Mar 11, 2026

I don't particularly care about the choice of primitives but I want to point out that there's also rweather/CryptoLW if we wanted to: https://rweather.github.io/lightweight-crypto/algorithms.html

The key schedule of Ascon is significantly faster... and it has a nice XOF.

@jcjones if you want to sleep at night I recommend you refrain from looking at https://github.com/meshcore-dev/MeshCore/blob/main/lib/ed25519/key_exchange.c ;)

@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch from b8147e8 to 6c0a87e Compare March 23, 2026 13:46
@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch from 6c0a87e to b5c6c20 Compare April 4, 2026 11:18
weebl2000 and others added 17 commits May 5, 2026 10:23
Add ChaCha20-Poly1305 AEAD decryption with 4-byte auth tag for peer
messages and group channels, falling back to ECB for backward
compatibility. Sending remains ECB-only in this phase.

- Per-message key derivation: HMAC-SHA256(secret, nonce||dest||src)
- Direction-dependent keys prevent bidirectional keystream reuse
- 12-byte IV from nonce + dest_hash + src_hash
- Advertise AEAD capability via feat1 bit 0 in adverts
- Track peer AEAD support in ContactInfo.flags
- Seed aead_nonce from HW RNG on contact creation and load

Send ChaChaPoly-encrypted messages to peers with CONTACT_FLAG_AEAD set,
and try AEAD decode first for those peers (avoiding 1/65536 ECB
false-positive). Legacy peers continue to use ECB in both directions.

- Add aead_nonce parameter to createDatagram/createPathReturn (default 0 = ECB)
- Add getPeerFlags/getPeerNextAeadNonce virtual methods for decode-order selection
- Add ContactInfo::nextAeadNonce() helper (returns nonce++ if AEAD, 0 otherwise)
- Update all BaseChatMesh send paths to pass nonce for AEAD-capable peers
- Adaptive decode order: AEAD-first for known AEAD peers, ECB-first for others
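The nonce helper described in this commit could look roughly like the sketch below. The 16-bit nonce width, the flag value, the `ContactSketch` struct, and the skip-zero behaviour on wraparound are all assumptions for illustration; the thread only states that the helper returns `nonce++` for AEAD peers and 0 otherwise, and that 0 selects ECB.

```cpp
#include <cstdint>

// Hypothetical flag value; the real CONTACT_FLAG_AEAD bit is not shown here.
constexpr uint8_t CONTACT_FLAG_AEAD = 0x01;

// Stand-in for ContactInfo with only the fields this sketch needs.
struct ContactSketch {
  uint8_t flags = 0;
  uint16_t aead_nonce = 1;  // assumed 16-bit counter

  // Returns the next nonce for an AEAD-capable peer, or 0 to request ECB.
  uint16_t nextAeadNonce() {
    if (!(flags & CONTACT_FLAG_AEAD)) return 0;  // legacy peer: use ECB
    // Assumption: 0 is reserved as "ECB", so a wrapped counter skips it.
    if (aead_nonce == 0) aead_nonce = 1;
    return aead_nonce++;
  }
};
```

Callers can then pass the result straight into the send path: a non-zero value means "encrypt with AEAD using this nonce", and 0 keeps the legacy behaviour.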
The header's route type bits (PH_ROUTE_MASK) are zero when
createDatagram/createPathReturn encrypt with AEAD, but get changed to
ROUTE_TYPE_FLOOD (1) or ROUTE_TYPE_DIRECT (2) by sendFlood/sendDirect
afterwards. The receiver builds assoc from the received header (with
route bits set), so the tag check always fails and every AEAD packet
is silently dropped.

Mask out route type bits in assoc data on all 5 encrypt/decrypt sites.
Also track AEAD decode success to enable peer capability auto-detection.
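A minimal sketch of the masking fix, assuming the route type occupies the low two bits of the header byte. The values for `ROUTE_TYPE_FLOOD` and `ROUTE_TYPE_DIRECT` come from the commit text; the mask value and the helper name `headerForAssoc` are assumptions.

```cpp
#include <cstdint>

constexpr uint8_t PH_ROUTE_MASK = 0x03;     // assumed: low two bits
constexpr uint8_t ROUTE_TYPE_FLOOD = 1;     // from the commit message
constexpr uint8_t ROUTE_TYPE_DIRECT = 2;    // from the commit message

// Header byte as it should enter the associated data: route bits cleared,
// so the value is the same before and after sendFlood/sendDirect set them.
uint8_t headerForAssoc(uint8_t header) {
  return static_cast<uint8_t>(header & ~PH_ROUTE_MASK);
}
```

Because sender and receiver both feed the masked byte into the tag computation, changing the route type in transit no longer invalidates the tag.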
- Fix potential unsigned overflow in createDatagram size check by
subtracting constants from MAX_PACKET_PAYLOAD instead of adding to
data_len
- Add upper-bound validation on src_len and assoc_len in aeadEncrypt and
aeadDecrypt
- Log peer name on AEAD nonce wraparound for debug builds
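The overflow fix in the first bullet follows a common pattern, sketched below. The payload limit of 184, the `OVERHEAD` constant, the use of `size_t`, and both function names are assumptions for illustration; the actual types and constants in `createDatagram` are not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t MAX_PACKET_PAYLOAD = 184;  // assumed value
constexpr size_t OVERHEAD = 8;              // hypothetical header + tag bytes
static_assert(OVERHEAD <= MAX_PACKET_PAYLOAD, "subtraction must not underflow");

// Risky form: data_len + OVERHEAD can wrap around for a huge data_len,
// making an oversized request look small.
bool fitsAdding(size_t data_len) {
  return data_len + OVERHEAD <= MAX_PACKET_PAYLOAD;
}

// Safer form: the subtraction involves only compile-time constants,
// so no attacker-influenced value takes part in wrapping arithmetic.
bool fitsSubtracting(size_t data_len) {
  return data_len <= MAX_PACKET_PAYLOAD - OVERHEAD;
}
```

Both checks agree for ordinary lengths; they differ only when `data_len` is close to the maximum of its type, which is exactly the case the fix targets.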
Prevent nonce reuse after reboots by persisting per-peer nonce counters
to a dedicated /nonces (companion) or /s_nonces (server) file. On dirty
reset (power-on, watchdog, brownout), nonces are bumped by NONCE_BOOT_BUMP
(100) to cover any unpersisted messages. Clean wakes (deep sleep, software
restart) load nonces as-is.

- Add nonce persistence to BaseChatMesh (companion) and ClientACL (server)
- Add wasDirtyReset() helper to ArduinoHelpers.h for platform-specific
  reset reason detection (ESP32/NRF52)
- Add onBeforeReboot() callback to CommonCLI for pre-reboot nonce flush
- Wire nonce persistence into all firmware variants: companion radio,
  repeater, room server, and sensor
- Only clear dirty flag on successful file write
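The boot-time rule from this commit can be sketched as follows. `NONCE_BOOT_BUMP` is 100 per the commit message; the 16-bit nonce width and the function name `nonceAfterBoot` are assumptions, and the sketch simply lets the counter wrap modulo 2^16 rather than modelling how the firmware treats wraparound.

```cpp
#include <cstdint>

constexpr uint16_t NONCE_BOOT_BUMP = 100;  // from the commit message

// Nonce to resume from after loading the persisted counter.
// dirty_reset is true for power-on, watchdog, or brownout resets, where
// messages may have been sent since the last successful flush to flash.
uint16_t nonceAfterBoot(uint16_t persisted, bool dirty_reset) {
  if (!dirty_reset) return persisted;  // clean wake: stored value is current
  // Skip ahead so any nonce used but not persisted is never reissued.
  return static_cast<uint16_t>(persisted + NONCE_BOOT_BUMP);
}
```

The bump only protects against reuse if fewer than `NONCE_BOOT_BUMP` messages per peer can go unpersisted between flushes, which is presumably why the flush-before-reboot callback and the dirty flag handling are part of the same change.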
@weebl2000 weebl2000 force-pushed the feature/aead-4-encryption branch from b5c6c20 to 0bcf4d6 Compare May 5, 2026 08:46

7 participants