feat - refactor nand flash as ring buffer#7216
Conversation
Greptile SummaryThis PR replaces the LittleFS file-based offline audio storage with a raw-sector ring buffer written directly to NAND flash sectors. Audio packets are batched into 16 KB aligned sectors, metadata is wear-leveled across 64 dedicated sectors using a generation counter, and the BLE storage service is updated to speak the new sequence-number-based protocol (
Confidence Score: 3/5The core ring-buffer mechanics are well-structured, but silent audio loss and a service starvation path on timeout remain unresolved. The previously-flagged silent discard of all pre-RTC-sync audio is unchanged. A timeout in any sd_ring_* blocking call leaves the in-flight flag set until the SD worker drains the request, during which all callers receive -EBUSY with no log visibility. The get_oldest_packet_name TOCTOU between sd_ring_get_info and sd_ring_read can produce spurious errors on the BLE file-list path. These issues sit on the core data-path of a newly rewritten storage layer. omi/firmware/omi/src/sd_card.c needs the most attention: the in-flight timeout handling, the get_oldest_packet_name TOCTOU, and the no-RTC discard path. Important Files Changed
Reviews (2): Last reviewed commit: "improv - the audio sync speed increases ..." | Re-trigger Greptile |
| done_pending = true; | ||
| remaining_packets = 0; |
There was a problem hiding this comment.
|
@greptile-apps review |
| static int get_oldest_packet_name(char *buf, size_t buf_size) | ||
| { | ||
| if (!filenames || !count || max_files <= 0) | ||
| if (!buf || buf_size == 0U) { | ||
| return -EINVAL; | ||
| if (!is_mounted) | ||
| return -ENODEV; | ||
| } | ||
|
|
||
| int cache_res = ensure_file_cache(); | ||
| if (cache_res < 0) { | ||
| return cache_res; | ||
| sd_ring_info_t info; | ||
| int ret = sd_ring_get_info(&info); | ||
| if (ret < 0) { | ||
| return ret; | ||
| } | ||
|
|
||
| int file_count = cached_file_list_count; | ||
| if (file_count > max_files) { | ||
| file_count = max_files; | ||
| if (info.read_seq == info.write_seq) { | ||
| buf[0] = '\0'; | ||
| return 0; | ||
| } | ||
|
|
||
| for (int i = 0; i < file_count; i++) { | ||
| strncpy(filenames[i], cached_file_names[i], MAX_FILENAME_LEN - 1); | ||
| filenames[i][MAX_FILENAME_LEN - 1] = '\0'; | ||
| if (sizes) { | ||
| sizes[i] = cached_file_sizes[i]; | ||
| } | ||
| uint8_t packet[RAW_AUDIO_PACKET_BYTES]; | ||
| uint32_t bytes_read = 0; | ||
| uint32_t packets_read = 0; | ||
| ret = sd_ring_read(info.read_seq, packet, sizeof(packet), &bytes_read, &packets_read); | ||
| if (ret < 0 || packets_read == 0U) { | ||
| return (ret < 0) ? ret : -ENOENT; | ||
| } | ||
|
|
||
| *count = file_count; | ||
| format_timestamp_name(sys_get_be32(packet), buf, buf_size); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Stale
info.read_seq used in subsequent sd_ring_read call
get_oldest_packet_name snapshots info.read_seq via sd_ring_get_info, then issues a separate sd_ring_read(info.read_seq, ...). Between these two blocking worker round-trips, an external caller can issue CMD_RING_ADVANCE which updates ring_state.read_seq. If that happens, info.read_seq is now behind the current ring_state.read_seq, and read_packets_internal returns -ERANGE. The caller (get_audio_file_list_with_sizes) propagates this error, causing the file-list operation to fail silently from the BLE client's perspective. The fix is to either use a single atomic worker operation that returns both the oldest seq and its packet, or to handle -ERANGE by retrying with a fresh sd_ring_get_info.
| if (k_sem_take(&resp.sem, K_MSEC(5000)) != 0) { | ||
| return -ETIMEDOUT; | ||
| } | ||
|
|
||
| *info = resp.info; | ||
| return resp.res; |
There was a problem hiding this comment.
sd_ring_get_info leaves info_in_flight set after timeout, silently starving all callers
When k_sem_take times out, the function returns -ETIMEDOUT without clearing info_in_flight. info_in_flight is only cleared by the SD worker via release_resp_busy when it eventually processes the enqueued request. If the SD worker is stalled, every subsequent call to sd_ring_get_info will immediately return -EBUSY for the full duration — potentially freezing the entire storage service. The same pattern applies to sd_ring_read, sd_ring_advance, and sd_ring_clear. At minimum, the timeout branch should log a warning so operators can detect this condition.
Core Idea
444 bytes:0..3: UTC timestamp,uint32 BE4..443: audio payload,440 bytesCMD_RING_ADVANCEpersists consumer progress.GATT
30295780-4301-EABD-2904-2849ADFEAE4330295781-4301-EABD-2904-2849ADFEAE43Write + Notify30295782-4301-EABD-2904-2849ADFEAE43Read + NotifySubscribe to notifications on the Control characteristic.
Status Char Read
Returns
16 bytes=4 x uint32 LE:0..3:used_bytes4..7:unread_packets8..11:free_bytes12..15:rtc_validCommands
Write commands to the Control characteristic.
0x10CMD_RING_INFO[0x10]0x11CMD_RING_READ[0x11][start_seq: u64 BE][0x11][start_seq: u64 BE][packet_count: u32 BE]packet_countis omitted or0, firmware streams everything available fromstart_seq0x12CMD_RING_ADVANCE[0x12][new_read_seq: u64 BE]0x13CMD_RING_CLEAR[0x13]0x03CMD_STOP_SYNC[0x03]Notifications
All notifications are sent on the Control characteristic.
0x01NOTIFY_ACK[0x01][status]0x02NOTIFY_INFO[0x02][read_seq: u64 BE][write_seq: u64 BE][capacity_packets: u32 BE][dropped_packets: u64 BE][packet_size: u16 BE]packet_size = 4440x05NOTIFY_READ_BEGIN[0x05][transfer_start_seq: u64 BE][packet_count: u32 BE]0x03NOTIFY_DATA[0x03][raw_bytes...]444-byterecords0x04NOTIFY_DONE[0x04][status][next_seq: u64 BE]next_seqinCMD_RING_ADVANCEStatus Codes
0: success6: invalid command9: storage not ready10: sequence out of rangeImportant Semantics
read_seq= oldest unread packetwrite_seq= next sequence after newest committed packetread_seqdropped_packetsstart_seqolder than currentread_seq, firmware returns10Recommended App Flow
CMD_RING_INFO.NOTIFY_INFO.CMD_RING_READ(read_seq).NOTIFY_READ_BEGIN.NOTIFY_DATAchunks and rebuild444-byterecords.NOTIFY_DONE.CMD_RING_ADVANCE(next_seq).read_seq == write_seq.Migration Note
Old flow was file-based:
New flow is ring-based: