Performance optimization opportunity: combinator.c file I/O bottleneck #97

Description

@fireonthemountain

The current combinator.c implementation has an O(n*m) file I/O bottleneck:

  • File2 is completely re-read for every line in file1 (line 217: rewind(fd2))
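  • For a 10,000 × 1,000 line combination, file2 is re-read once per file1 line, which adds up to roughly 10 million line reads instead of a single pass over file2
  • Read cost grows with the product of the two file sizes, so every additional line in file1 adds another full pass over file2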

Performance Impact

Stress Test Results (10,000 × 1,000 lines = 10M combinations):

  • Current implementation: Timeout (>60 seconds)
  • Optimized implementation: <1 second
  • Speedup: >60x (a lower bound, since the current implementation hit the 60-second timeout)

Memory usage remains similar (~33MB for both versions)

Proposed Solution

I've created combinator_optimized.c that implements:

  1. File2 Memory Caching: Load entire file2 once into memory, eliminating all rewind operations
  2. Enhanced Error Handling: Added malloc() null pointer checks to prevent segfaults
  3. Combined Line Processing: Single-pass carriage return stripping
  4. Improved Buffer Management: More efficient I/O batching
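
To illustrate items 1 to 3, here is a minimal sketch of the caching idea. It is not the actual combinator_optimized.c code: the names line_cache, cache_load, cache_free and combine are hypothetical, and the 4096-byte line buffer is an assumption made for this example (longer lines would be split into several entries).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the cached lines of file2. */
typedef struct {
    char **lines;  /* NUL-terminated lines with line endings stripped */
    size_t count;  /* number of lines stored */
    size_t cap;    /* allocated slots in lines */
} line_cache;

/* Release everything owned by the cache. */
void cache_free(line_cache *c)
{
    for (size_t i = 0; i < c->count; i++)
        free(c->lines[i]);
    free(c->lines);
    c->lines = NULL;
    c->count = 0;
    c->cap = 0;
}

/* Read fp once into memory. Returns 0 on success, -1 if an allocation fails. */
int cache_load(FILE *fp, line_cache *c)
{
    char buf[4096]; /* assumed maximum line length for this sketch */

    assert(fp != NULL && c != NULL);
    c->lines = NULL;
    c->count = 0;
    c->cap = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* One scan finds the first '\r' or '\n', covering both LF and CRLF. */
        size_t len = strcspn(buf, "\r\n");

        if (c->count == c->cap) {
            size_t ncap = c->cap ? c->cap * 2 : 64;
            char **tmp = realloc(c->lines, ncap * sizeof *tmp);
            if (tmp == NULL) { cache_free(c); return -1; }
            c->lines = tmp;
            c->cap = ncap;
        }

        char *copy = malloc(len + 1);
        if (copy == NULL) { cache_free(c); return -1; }
        memcpy(copy, buf, len);
        copy[len] = '\0';
        c->lines[c->count++] = copy;
    }
    return 0;
}

/* Emit every file1 line joined with every cached file2 line.
 * file2 is never touched here, so no rewind() is needed. */
int combine(FILE *f1, const line_cache *c2, FILE *out)
{
    char buf[4096];

    while (fgets(buf, sizeof buf, f1) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';
        for (size_t i = 0; i < c2->count; i++) {
            if (fprintf(out, "%s%s\n", buf, c2->lines[i]) < 0)
                return -1;
        }
    }
    return 0;
}
```

A caller would open both files, call cache_load() on file2 once, pass the cache to combine() together with file1 and stdout, and finish with cache_free(). The trade-off is that file2 must fit in memory, which is consistent with the similar memory usage reported above only if file2 is modest in size.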