Currently, memory usage of the prime sieve increases linearly with the size of the interval [lo, hi], while the number of primes only grows asymptotically as (hi) / log (hi). Furthermore, L1 memory caching is not exploited.
By implementing a segmented sieve both problems can be addressed at once: lowering memory consumption and increasing speed (in C++ I noticed up to 10x 5x better performance).
The current algorithm is basically:
- Discover prime numbers
p less than √hi
- Cross off multiples of
p starting at p * p.
- Collect all prime numbers in
[√hi, hi]
The suggested procedure is:
- Discover all prime numbers
p less than √hi
- Divide
[√hi, hi] into fixed size segments / smaller intervals (fitting L1 cache)
- Outer loop: iterate over each segment
- Inner loop: cross off all multiples of all primes in the current segment
- Collect all primes in the current segment
You only need to store one segment (constant memory allocation) in step 2/3 and if the segment fits in L1 cache it is extremely fast, since you're iterating over it multiple times when sieving.
For reference see http://primesieve.org/
Currently, memory usage of the prime sieve increases linearly with the size of the interval
[lo, hi], while the number of primes only grows asymptotically as(hi) / log (hi). Furthermore, L1 memory caching is not exploited.By implementing a segmented sieve both problems can be addressed at once: lowering memory consumption and increasing speed (in C++ I noticed up to
10x5x better performance).The current algorithm is basically:
pless than√hipstarting atp * p.[√hi, hi]The suggested procedure is:
pless than√hi[√hi, hi]into fixed size segments / smaller intervals (fitting L1 cache)You only need to store one segment (constant memory allocation) in step 2/3 and if the segment fits in L1 cache it is extremely fast, since you're iterating over it multiple times when sieving.
For reference see http://primesieve.org/