Segmented Sieve of Eratosthenes

Problem

Given two integers $A$ and $B$, find number of primes inside the range of $A$ and $B$ inclusive. Here, $1 \leq A \leq B \leq 10^{12}$ and $B – A \leq 10^5$.

For example, $A = 11$ and $B = 19$, then answer is $4$ since there are $4$ primes within that range ($11$,$13$,$17$,$19$).

If limits of $A$ and $B$ were small enough ( $\leq 10^8$ ), then we could solve this problem using the ordinary sieve. But here limits are huge, so we don’t have enough memory or time to run normal sieve. But note that, $B – A \leq 10^5$. So even though we don’t have memory/time to run sieve from $1$ to $N$, we have enough memory/time to cover $A$ to $B$.

$A$ to $B$ is a segment, and we are going to modify our algorithm for Sieve of Eratosthenes to cover this segment. Hence, the modified algorithm is called Segmented Sieve of Eratosthenes.

Make sure you fully understand how “Normal” Sieve of Eratosthenes works.

How Normal Sieve Works

First, let us see the steps for “Normal” sieve. In order to make things simpler, we will be looking into a somewhat unoptimized sieve. You can implement your own optimizations later.

Suppose we want to find all primes between $1$ to $N$.

  1. First we define a new variable $sqrtn = \sqrt{N}$.
  2. We take all primes less than $sqrtn$.
  3. For each prime $p$, we repeat the following steps.
    1. We start from $j = p \times p$.
    2. If $j \leq N$, we mark sieve at position $j$ to be not prime.
      Else, we break out of the loop.
    3. We increase the value of $j$ by $p$. And go back to step $2$ of this loop.
  4. All positions in the sieve that are not marked are prime.

This is how the basic sieve works. We will now modify it to work on segments.

How Segmented Sieve Works

We will perform the same steps as normal sieve but just slightly modified.

Generate Primes Less Than $\sqrt{N}$

In the segmented sieve, what is the largest limit possible? $10^{12}$. So let $N = 10^{12}$

First of all, in the normal sieve, we worked with primes less than $\sqrt{N}$ only. So, if we had to run sieve from $1$ to $N$, we would have required only primes less than $\sqrt{N} = 10^6$. So in order to run a sieve on a segment between $1$ to $N$, we won’t require primes greater than $\sqrt{N}$.

So, using normal sieve we will first generate all primes less than $\sqrt{N} = 10^6$.

Run on Segment

Okay, now we can start our “Segmented” Sieve. We want to find primes between $A$ and $B$.

  1. If $A$ is equal to $1$, then increase $A$ by $1$. That is, make $A = 2$. Since $1$ is not a prime, this does not change our answer.
  2. Define a new variable $sqrtn = \sqrt{B}$.
  3. Declare a new array of size $dif = \text{maximum difference of }(B – A) + 1$. Since it is given in our problem that $B-A \leq 10^5$, $dif = 10^5 + 1$ for this problem.

    Let the array be called $arr$. This array has index from $0$ to $dif-1$. Here $arr[0]$ represents the number $A$, $arr[1]$ represents $A+1$ and so on.

  4. Now, we will be working with all primes less than $sqrtn$. These primes are already generated using the normal sieve.
  5. For each prime $p$, we will repeat the following steps:

    1. We start from $j = p \times p$.
    2. But initial value of $j = p^2$ might be less than $A$. We want to mark positions between $A$ and $B$ only. So we will need to shift $j$ inside the segment.

      So, if $j < A$, then $j = ceil ( \frac{A}{p} ) \times p = \frac{A+p-1}{p} \times p$. This line makes $j$ the smallest multiple of $p$ which is bigger than $A$.

    3. If $j \leq B$, we mark sieve at position $j$ to be not prime.
      Else, we break out of the loop.

      But when marking, remember that our array $arr$ is shifted by $A$ positions. $arr[0]$ indicates position $A$ of normal sieve. So, we will mark position $j – A$ of $arr$ as not prime.

    4. Increase the value of $j$ by $p$. Repeat loop from step $3$.
  6. All positions in $arr$ which has not been marked are prime.

Step $1$ is important. Since we only mark multiples of prime as not prime in the pseudocode above, $1$ which has no prime factor never gets marked. So we handle it by increasing the value of $A$ by $1$ when $A = 1$.

Code

If we convert the above pseducode into C++, then it becomes something like this:

int arr[SIZE];

// Returns number of primes between segment [a,b]
int segmentedSieve ( int a, int b ) {
    if ( a == 1 ) a++;

    int sqrtn = sqrt ( b );

    memset ( arr, 0, sizeof arr ); // Make all index of arr 0.

    for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
        int p = prime[i];
        int j = p * p;

        // If j is smaller than a, then shift it inside of segment [a,b]
        if ( j < a ) j = ( ( a + p - 1 ) / p ) * p;

        for ( ; j <= b; j += p ) {
            arr[j-a] = 1; // mark them as not prime
        }
    }

    int res = 0;
    for ( int i = a; i <= b; i++ ) {
        // If it is not marked, then it is a prime
        if ( arr[i-a] == 0 ) res++;
    }
    return res;
}

In line $1$ we declare an array of $SIZE$. This array needs to be as large as the maximum difference between $B-A+1$. Next in line $4$ we declare a function that finds the number of primes between $a$ and $b$. Rest of the code is the same as the pseudocode above.

It is possible to optimize it further ( both for speed and memory ) but in the expense of clarity. I am sure if readers understand the core concept behind this algorithm, then they will have no problem tweaking the code to suit their needs.

Conclusion

I first learned about Segmented Sieve from blog of +Zobayer Hasan. You can have a look at that post here. I wasn't really good at bit manipulation back then, so it looked really scary. Later I realized it's not as hard as it looks. Hopefully, you guys feel the same.

Leave a comment if you face any difficulty in understanding the post.

Reference

  1. forthright48 - Sieve of Eratosthenes
  2. zobayer - Segmented Sieve

Related Problems

  1. SPOJ PRIME1 - Prime Generator
  2. LOJ 1197 - Help Hanzo

Euler Totient or Phi Function

I have been meaning to write a post on Euler Phi for a while now, but I have been struggling with its proof. I heard it required the Chinese Remainder Theorem, so I have been pushing this until I covered CRT. But recently, I found that CRT is not required and it can be proved much more easily. In fact, the proof is so simple and elegant that after reading it I went ahead and played Minecraft for 5 hours to celebrate.


Problem

Given an integer $N$, how many numbers less than or equal $N$ are there such that they are coprime to $N$? A number $X$ is coprime to $N$ if $gcd(X,N)=1$.

For example, if $N = 10$, then there are $4$ numbers, namely ${1,3,7,9}$, which are coprime to $10$.

This problem can be solved using Euler Phi Function, $phi()$. Here is the definition from Wiki:

In number theory, Euler’s totient function (or Euler’s phi function), denoted as $\phi(n)$, is an arithmetic function that counts the positive integers less than or equal to n that are relatively prime to n. – Wiki

That’s exactly what we need to find in order to solve the problem above. So, how does Euler Phi work?

Euler Phi Function

Before we go into its proof, let us first see the end result. Here is the formula using which we can find the value of the $phi()$ function. If we are finding Euler Phi of $N = p_1^{a_1}p_2^{a_2}…p_k^{a_k}$, then:

$$\phi(n) = n \times \frac{p_1-1}{p_1} \times \frac{p_2-1}{p_2}… \times \frac{p_k-1}{p_k}$$

If you want you can skip the proof and just use the formula above to solve problems. That’s what I have been doing all these years. But I highly recommend that you read and try to understand the proof. It’s simple and I am sure someday the proof will help you out in an unexpected way.

Read More

Bit Manipulation

On this post, we will look into how to use “Bitwise Operators” to manipulate bits of a number. Bit manipulation is a useful tool that can sometimes perform complicated tasks in a \simple manner.

We will work on integer values and assume each number has $32$ bits in them. If a number has only $0$ or $1$ as its digit, then assume that it is in binary form. All position of bits is $0$ indexed.

Since binary values can only be $0$ or $1$, we sometimes refer them like light bulbs. A bit being “Off” means it has value $0$ and being “On” means it has value $1$. Bit $1$ is also referred as “Set” and bit $0$ as “Reset”.

We need to have a strong grasp of how AND, OR, Negation, XOR and Shift Operators work to understand how bit manipulation works. If you have forgotten about them, then make sure you revise.

Checking Bit at Position X

Given a number $N$, find the value of its bit at position $X$. For example, if $N=12=(1100)_2$ and $X=2$, then value of bit is $1$.

So how do we find it? We can find this value in three steps:

  1. Let, $mask = 1 << X$
  2. Let, $res = N \ \& \ mask$
  3. If $res$ is $0$, then bit is $0$ at that position, else bit is $1$.

Let us see it in action with the example above, $N = 12$ and $X = 2$.

$1100$
$\underline{0100} \&$ ( $1 << 2$ is just $1$ shifted to left twice )
$0100$

Now, what happened here? In step $1$, when we left shifted $1$ $X$ times, we created a number $mask$ which has bit $1$ only at position $X$. This is a useful application of left shift operator. Using this, we can move $1$ bit anywhere we want.

Next, at step $2$, we find the result of performing AND operator between $N$ and $mask$. Since $mask$ has $0$ in all positions except $X$, the result will have $0$ in all places other than $X$. That’s because performing AND with $0$ will always give $0$. Now, what about position $X$. Will the result at position $X$ have $0$ too? That depends on the bit of $N$ at position $X$. Since, the $X$ bit in the mask is $1$, the only way result will have $0$ at that position if $N$ has $0$ in that position.

So, when all position of $res$ is $0$, that is $res$ has value $0$, we can say that $N$ has $0$ bit in position $X$, otherwise it doesn’t.

Set Bit at Position $X$

Given a value $N$, turn the bit at position $X$ of $N$ to $1$. For example, $N=12=1100$ and $X=0$, then $N$ will become $13=1101$.

This can be done in two steps:

  1. Let, $mask = 1 << X$
  2. $N = N \ | \ mask$

$1100$
$\underline{0001} \ |$
$1101$

In step $1$, we shift a $1$ bit to position $X$. Then we perform OR between $N$ and $mask$. Since $mask$ has 0 in all places except $X$, in the result, all those places remain unchanged. But $mask$ has $1$ in position $X$, so in result position $X$ will be $1$.

Reset Bit at Position $X$

Given a value $N$, turn the bit at position $X$ of $N$ to $0$. For example, $N=12=1100$ and $X=3$, then $N$ will become $4=0100$.

This cane be done in three steps:

  1. Let, $mask = 1 << X$
  2. $mask = \sim mask$
  3. $N = N \ \& \ mask$

$1100$
$\underline{0111} \ \&$ ( We first got $1000$ from step $1$. Then used negation from step 2.)
$0100$

In step $1$ we move $1$ bit to position $X$. This gives us a number of $1$ in position $X$ and $0$ in all other places. Next, we negate the $mask$. This flips all bits of $mask$. So now we have $0$ in position $X$ and $1$ in all other places. Now when we perform AND between $mask$ and $N$, it forces the bit at the $X$ position to $0$ and all other bits stay intact.

Toggle Bit at Position X

Given a value $N$, toggle the bit at position $X$, i.e if the bit at $X$ is $0$ then make it $1$ and if it is already $1$, make it $0$.

This can be done in two steps:

  1. Let, $mask = 1 << X$
  2. $N = N \ \wedge \ mask$

First, we shift bit $1$ to our desired position $X$ in step $1$. When XOR is performed between a bit and $0$, the bit remains unchanged. But when XOR performed between a bit and $1$, it toggles. So all position except $X$ toggles during step 2, since all position in mask except $X$ is $0$.

Coding Tips

When coding these in C++, make sure you use lots of brackets. Bitwise operators have lower precedence than $==$ and $!=$ operator which often causes trouble. See here.

What would be the output of this code:

if ( 0 & 6 != 7 ) printf ( "True\n" );
else printf ( "False\n" );

You might expect things to follow this order: $0 \ \& \ 6 = 0$ and $0 \ \neq 7$, so “True” will be output. But due to precedence of operators, the output comes out “False”. First $6 \ != 7$ is checked. This is true, so $1$ is returned. Next $0 \ \& \ 1$ is performed which is $0$.

The best way to avoid these kind of mishaps is to use brackets whenever you are using bitwise operators. The correct way to write the code above is:

if ( (0 & 6) != 7 ) printf ( "True\n" );
else printf ( "False\n" );

This prints “True” which is our desired result.

Conclusion

These are the basics of bit manipulation and by no means the end of it. There are lots of other tricks that are yet to be seen, such as “Check Odd or Even”, “Check Power of Two”, “Right Most Bit”. We will look into them some other day.

Reference

  1. forthright48 – Bitwise Operators
  2. CPPReference – C++ Operator Precedence