Address streaming writes and p-value correction conflicts

# Challenge
When p-value correction is requested (the default: `correct.p.value.terms = c("fdr")`), all results must be held in memory. With millions of elements and dozens of statistics columns, this data frame can be very large. The correction itself, via `stats::p.adjust()`, requires the complete vector of p-values across all elements:
```
df_out[[tempstr.corrected]] <- stats::p.adjust(df_out[[tempstr.raw]], method = methodstr)
```
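
To illustrate why the complete vector is needed, here is a minimal sketch with four made-up p-values. It compares correcting the whole vector at once against correcting two chunks independently, as a naive streaming writer might. The values and the chunking are illustrative assumptions, not taken from the package.

```r
# Four hypothetical p-values, split into two chunks of two elements each.
p <- c(0.01, 0.02, 0.03, 0.04)
chunk_id <- c(1, 1, 2, 2)

# Correct the complete vector at once (what the current code does).
p_full <- stats::p.adjust(p, method = "fdr")

# Correct each chunk on its own, then concatenate in the original order.
p_chunked <- unlist(
  lapply(split(p, chunk_id), stats::p.adjust, method = "fdr"),
  use.names = FALSE
)

# The first chunk is corrected with n = 2 instead of n = 4, so its
# adjusted values come out smaller than in the full-vector correction.
print(rbind(full = p_full, chunked = p_chunked))
```

Because the Benjamini-Hochberg adjustment depends on the total number of tests and on each p-value's rank among all of them, per-chunk correction gives different (here, anti-conservative) values.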

This means a fully streaming, low-memory run is not possible if you want FDR correction: the entire results data frame must exist in memory at once. As a result, when streaming writes and p-value correction are both active, results are written twice:

```
# First: uncorrected chunks streamed during the loop
writer <- .results_stream_write_block(writer, chunk_df)

# Then: full corrected results overwritten after the loop
if (!is.null(writer) && (need_term_correction || need_model_correction)) {
  writeResults(
    fn.output = write_results_file,
    df.output = df_out,
    analysis_name = write_results_name,
    overwrite = TRUE
  )
}
```
This roughly doubles the write I/O cost, because every element is written once uncorrected and then again after correction.

Consider a two-pass approach to FDR correction:
1. During the main element-wise loop, stream full results to HDF5 as is currently done, and additionally collect only the p-value columns in memory in a pre-allocated matrix.
2. After the model-fitting loop, correct the p-values by calling `stats::p.adjust()` on the in-memory p-value matrix, then patch the HDF5 file with the corrected columns.
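
The two passes could look roughly like the sketch below. It is not the package's implementation: `fit_chunk()` is a hypothetical stand-in for the model fitting, the `store` list stands in for the on-disk HDF5 file so the example runs with base R only, and the `.fdr` column suffix is an assumed naming convention.

```r
set.seed(42)
n_elements <- 1000L
chunk_size <- 250L
p_cols <- c("age.p.value", "sex.p.value", "model.p.value")

# Hypothetical model fit: one row per element with simulated statistics.
fit_chunk <- function(idx) {
  data.frame(
    element_id = idx,
    age.estimate = rnorm(length(idx)),
    age.p.value = runif(length(idx)),
    sex.p.value = runif(length(idx)),
    model.p.value = runif(length(idx))
  )
}

# Pass 1: stream every chunk out and keep only the p-value columns in memory.
p_mat <- matrix(NA_real_, nrow = n_elements, ncol = length(p_cols),
                dimnames = list(NULL, p_cols))
store <- list()  # stand-in for the HDF5 file; a real run would write to disk
for (s in seq(1L, n_elements, by = chunk_size)) {
  idx <- s:min(s + chunk_size - 1L, n_elements)
  chunk_df <- fit_chunk(idx)
  store[[length(store) + 1L]] <- chunk_df         # "write" the full chunk
  p_mat[idx, ] <- as.matrix(chunk_df[, p_cols])   # retain p-values only
}

# Pass 2: correct each p-value column across all elements, then patch the
# stored chunks with the corrected columns.
p_fdr <- apply(p_mat, 2, stats::p.adjust, method = "fdr")
colnames(p_fdr) <- paste0(p_cols, ".fdr")
for (k in seq_along(store)) {
  idx <- store[[k]]$element_id
  store[[k]] <- cbind(store[[k]], p_fdr[idx, , drop = FALSE])
}
```

Only `p_mat` and `p_fdr` span all elements in memory; the full per-chunk data frames can be released as soon as they are written. In a real HDF5-backed version, the patch step would append or overwrite only the corrected columns rather than rewriting the whole file.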

# Theoretical benefit
Consider an example analysis with 1 million elements. The formula `FD ~ age + sex + site` produces an intercept and 3 predictors (4 terms), with `var.terms = c("estimate", "statistic", "p.value")` and `var.model = c("adj.r.squared", "p.value")`. The resulting columns are:

| Column type | Count | Description |
|---|---|---|
| `element_id` | 1 | Element index |
| `<term>.estimate` | 4 | Per-term estimates |
| `<term>.statistic` | 4 | Per-term t-statistics |
| `<term>.p.value` | 4 | Per-term p-values |
| `model.adj.r.squared` | 1 | Model adjusted R² |
| `model.p.value` | 1 | Model F-test p-value |
| **Total columns** | **15** | |
| **P-value columns only** | **5** | 4 term + 1 model |

| Approach | In-memory footprint at 1M elements |
|---|---|
| Current (full `df_out`) | 1M × 15 × 8 bytes = **120 MB** |
| Two-pass (p-values only) | 1M × 5 × 8 bytes = **40 MB** |
| Savings | **67%** |

With `full.outputs = TRUE` the total column count rises to roughly 30 or more while the number of p-value columns stays the same, so the savings are even larger.
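
The arithmetic behind these figures can be reproduced with a small helper. `footprint_mb()` is a hypothetical function written for this issue; it assumes every value is stored as an 8-byte double and ignores data frame overhead. The 30-column case is an assumed round number for `full.outputs = TRUE`.

```r
# Approximate footprint of a dense block of doubles, in MB (1 MB = 1e6 bytes).
footprint_mb <- function(n_elements, n_cols, bytes_per_value = 8) {
  n_elements * n_cols * bytes_per_value / 1e6
}

full_mb  <- footprint_mb(1e6, 15)  # current approach: all 15 columns
pvals_mb <- footprint_mb(1e6, 5)   # two-pass approach: 5 p-value columns
savings  <- 1 - pvals_mb / full_mb

# Assumed full.outputs = TRUE case: 30 total columns, still 5 p-value columns.
savings_full <- 1 - footprint_mb(1e6, 5) / footprint_mb(1e6, 30)

cat(sprintf("full: %.0f MB, p-values only: %.0f MB, savings: %.0f%%\n",
            full_mb, pvals_mb, 100 * savings))
cat(sprintf("savings with 30 columns: %.0f%%\n", 100 * savings_full))
```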



