## Repeated Squaring Method for Modular Exponentiation

Previously on Modular Exponentiation we learned about Divide and Conquer approach to finding the value of $B^P \ \% \ M$. In that article, which is recursive. I also mentioned about an iterative algorithm that finds the same value in same complexity, only faster due to the absence of recursion overhead. We will be looking into that faster algorithm on this post today.

Make sure you know about Bit Manipulation before proceeding.

# Problem

Given three positive integers $B, P$ and $M$, find the value of $B^P \ \% \ M$.

For example, $B=2$, $P=5$ and $M=7$, then $B^P \ \% \ M = 2^5 \ \% \ 7 = 32 \ \% \ 7 = 4$.

# Repeated Squaring Method

Repeated Squaring Method (RSM) takes the advantage of the fact that $A^x \times A^y = A^{x+y}$.

Now, we know that any number can be written as the sum of powers of $2$. Just convert the number to Binary Number System. Now for each position $i$ for which binary number has $1$ in it, add $2^i$ to the sum.

For example, $(19)_{10} = (10011)_2 = ( 2^4 + 2^1 + 2^0)_{10} = (16 + 2 + 1 )_{10}$

Therefore, in equation $B^P$, we can decompose $P$ to the sum of powers of $2$.

So let’s say, $P = 19$, then $B^{19} = B^{2^4 + 2^1 + 2^0} = B^{16+2+1} = B^{16} \times B^2 \times B^1$.

This is the main concept for repeated squaring method. We decompose the value $P$ to binary, and then for each position $i$ (we start from 0 and loop till the highest position at binary form of $P$) for which binary of $P$ has $1$ in $i_{th}$ position, we multiply $B^{2^i}$ to result.

# Code

Here is the code for RSM. The Explanation is below the code.

int bigmod ( int b, int p, int m ) {
int res = 1 % m, x = b % m;
while ( p ) {
if ( p & 1 ) res = ( res * x ) % m;
x = ( x * x ) % m;
p >>= 1;
}
return res;
}


## Explanation

At line $1$, we have the parameters. We simply send the value of $B$, $P$ and $M$ to this function, and it will return the required result.

At line $2$, we initiate some variables. $res$ is the variable that will hold our result. It contains the value $1$ initially. We will multiply $b^{2^i}$ with result to find $b^p$. $x$ is temporary variable that initially contains the value $b^{2^0} = b^1 = b$.

Now, from line $3$ the loop starts. This loop runs until $p$ becomes $0$. Huh? Why is that? Keep reading.

At line $4$ we first check whether the first bit of $p$ is on or not. If it is on, then it means that we have to multiply $b^{2^i}$ to our result. $x$ contains that value, so we multiply $x$ to $res$.

Now line $5$ and $6$ are crucial to the algorithm. Right now, $x$ contains the value of $b^{2^0}$ and we are just checking the $0_{th}$ position of $p$ at each step. We need to update our variables such that they keep working for positions other than $0$.

First, we update the value of $x$. $x$ contains the value of $b^{2^i}$. On next iteration, we will be working with position $i+1$. So we need to update $x$ to hold $b^{2^{i+1}}$.

$b^{2^{i+1}} = b^{2^i \times 2^1} = b ^ {2^i \times 2} = b^{2^i + 2^i} = b^{2^i} \times b^{2^i} = x \times x$.

Hence, new value of $x$ is $x \times x$. We make this update at line $5$.

Now, at each step we are checking the $0_{th}$ position of $p$. But next we need to check the $1_{st}$ position of $p$ in binary. Instead of checking $1_{st}$ position of $p$, what we can do is shift $p$ $1$ time towards right. Now, checking $0_{th}$ position of $p$ is same as checking $1_{st}$ position. We do this update at line $6$.

These two lines ensures that our algorithm works on each iteration. When $p$ becomes $0$, it means there is no more bit to check, so the loop ends.

## Complexity

Since there cannot be more than $log_{2}(P)$ bits in $P$, the loop at line $3$ runs at most $log_{2}(P)$ times. So the complexity is $log_{2}(P)$.

# Conclusion

RSM is significantly faster than D&C approach due to lack of recursion overhead. Hence, I always use this method when I have to find Modular Exponentiation.

The code may seem a little confusing, so feel free to ask questions.

When I first got my hands on this code, I had no idea how it worked. I found it in a forum with a title, “Faster Approach to Modular Exponentiation”. Since then I have been using this code.

# Resources

1. forthrigth48 – Modular Exponentiation
2. forthright48 – Bit Manipulation
3. algorithmist – Repeated Squaring
4. Wiki – Exponentiation by Squaring

## Euler’s Theorem and Fermat’s Little Theorem

We will be looking into two theorems at the same time today, Fermat’s Little Theorem and Euler’s Theorem. Euler’s Theorem is just a generalized version of Fermat’s Little Theorem, so they are quite similar to each other. We will focus on Euler’s Theorem and its proof. Later we will use Euler’s Theorem to prove Fermat’s Little Theorem.

# Euler’s Theorem

Theorem – Euler’s Theorem states that, if $a$ and $n$ are coprime, then $a^{\phi(n)} \equiv 1 (\text{mod n})$ – Wikipedia

Here $\phi(n)$ is Euler Phi Function. Read more about Phi Function on this post – Euler Totient or Phi Function.

## Proof

Let us consider a set $A = {b_1, b_2, b_3…,b_{\phi(n)} }\ (\text{mod n})$, where $b_i$ is coprime to $n$ and distinct. Since there are $\phi(n)$ elements which are coprime to $n$, $A$ contains $\phi(n)$ integers.

Now, consider the set $B = { ab_1, ab_2, ab_3….ab_{\phi(n)} } (\text{mod n})$. That is, $B$ is simply set $A$ where we multiplied $a$ with each element. Let $a$ be coprime to $n$.

Lemma – Set $A$ and set $B$ contains the same integers.

We can prove the above lemma in three steps.

1. $A$ and $B$ has the same number of elements

Since $B$ is simply every element of $A$ multiplied with $a$, it contains the same number of elements as $A$. This is obvious.

2. Every integer in $B$ is coprime to $n$

An integer in $B$ is of form $a \times b_i$. We know that both $b_i$ and $a$ are coprime to $n$, so $ab_i$ is also coprime to $n$.

3. $B$ contains distinct integers only

Suppose $B$ does not contain distinct integers, then it would mean that there is such a $b_i$ and $b_j$ such that:

$ab_i \equiv ab_j \ (\text{mod n})$
$b_i \equiv b_j \ (\text{mod n})$

But this is not possible since all elements of $A$ are distinct, that is, $b_i$ is never equal to $b_j$. Hence, $B$ contains distinct elements.

With these three steps, we claim that, since $B$ has the same number of elements as $A$ which are distinct and coprime to $n$, it has the same elements as $A$.

Now, we can easily prove Euler’s Theorem.

$ab_1 \times ab_2 \times ab_3… \times ab_{\phi(n)} \equiv b_1 \times b_2 \times b_3… \times b_{\phi(n)} \ (\text{mod n})$
$a^{\phi(n)} \times b_1 \times b_2 \times b_3… \times b_{\phi(n)} \equiv b_1 \times b_2 \times b_3… \times b_{\phi(n)} \ (\text{mod n})$
$$\therefore a^{\phi(n)} \equiv 1 \ (\text{mod n})$$

# Fermat’s Little Theorem

Fermat’s Little Theorem is just a special case of Euler’s Theorem.

Theorem – Fermat’s Little Theorem states that, if $a$ and $p$ are coprime and $p$ is a prime, then $a^{p-1} \equiv 1 \ (\text{mod p})$ – Wikipedia

As you can see, Fermat’s Little Theorem is just a special case of Euler’s Theorem. In Euler’s Theorem, we worked with any pair of value for $a$ and $n$ where they are coprime, here $n$ just needs to be prime.

We can use Euler’s Theorem to prove Fermat’s Little Theorem.

Let $a$ and $p$ be coprime and $p$ be prime, then using Euler’s Theorem we can say that:

$a^{\phi(p)} \equiv 1 \ (\text{mod p})$ (But we know that for any prime $p$, $\phi(p) = p-1$)
$a^{p-1} \equiv 1 \ (\text{mod p})$

# Conclusion

Both theorems have various applications. Finding Modular Inverse is a popular application of Euler’s Theorem. It can also be used to reduce the cost of modular exponentiation. Fermat’s Little Theorem is used in Fermat’s Primality Test.

There are more applications but I think it’s better to learn them as we go. Hopefully, I will be able to cover separate posts for each of the applications.

# Reference

1. Wiki – Euler’s Theorem
2. forthright48 – Euler Totient or Phi Function
3. Wiki – Fermat’s Little Theorem

# Problem

Given two integers $A$ and $B$, find number of primes inside the range of $A$ and $B$ inclusive. Here, $1 \leq A \leq B \leq 10^{12}$ and $B – A \leq 10^5$.

For example, $A = 11$ and $B = 19$, then answer is $4$ since there are $4$ primes within that range ($11$,$13$,$17$,$19$).

If limits of $A$ and $B$ were small enough ( $\leq 10^8$ ), then we could solve this problem using the ordinary sieve. But here limits are huge, so we don’t have enough memory or time to run normal sieve. But note that, $B – A \leq 10^5$. So even though we don’t have memory/time to run sieve from $1$ to $N$, we have enough memory/time to cover $A$ to $B$.

$A$ to $B$ is a segment, and we are going to modify our algorithm for Sieve of Eratosthenes to cover this segment. Hence, the modified algorithm is called Segmented Sieve of Eratosthenes.

Make sure you fully understand how “Normal” Sieve of Eratosthenes works.

# How Normal Sieve Works

First, let us see the steps for “Normal” sieve. In order to make things simpler, we will be looking into a somewhat unoptimized sieve. You can implement your own optimizations later.

Suppose we want to find all primes between $1$ to $N$.

1. First we define a new variable $sqrtn = \sqrt{N}$.
2. We take all primes less than $sqrtn$.
3. For each prime $p$, we repeat the following steps.
1. We start from $j = p \times p$.
2. If $j \leq N$, we mark sieve at position $j$ to be not prime.
Else, we break out of the loop.
3. We increase the value of $j$ by $p$. And go back to step $2$ of this loop.
4. All positions in the sieve that are not marked are prime.

This is how the basic sieve works. We will now modify it to work on segments.

# How Segmented Sieve Works

We will perform the same steps as normal sieve but just slightly modified.

## Generate Primes Less Than $\sqrt{N}$

In the segmented sieve, what is the largest limit possible? $10^{12}$. So let $N = 10^{12}$

First of all, in the normal sieve, we worked with primes less than $\sqrt{N}$ only. So, if we had to run sieve from $1$ to $N$, we would have required only primes less than $\sqrt{N} = 10^6$. So in order to run a sieve on a segment between $1$ to $N$, we won’t require primes greater than $\sqrt{N}$.

So, using normal sieve we will first generate all primes less than $\sqrt{N} = 10^6$.

## Run on Segment

Okay, now we can start our “Segmented” Sieve. We want to find primes between $A$ and $B$.

1. If $A$ is equal to $1$, then increase $A$ by $1$. That is, make $A = 2$. Since $1$ is not a prime, this does not change our answer.
2. Define a new variable $sqrtn = \sqrt{B}$.
3. Declare a new array of size $dif = \text{maximum difference of }(B – A) + 1$. Since it is given in our problem that $B-A \leq 10^5$, $dif = 10^5 + 1$ for this problem.

Let the array be called $arr$. This array has index from $0$ to $dif-1$. Here $arr[0]$ represents the number $A$, $arr[1]$ represents $A+1$ and so on.

4. Now, we will be working with all primes less than $sqrtn$. These primes are already generated using the normal sieve.
5. For each prime $p$, we will repeat the following steps:

1. We start from $j = p \times p$.
2. But initial value of $j = p^2$ might be less than $A$. We want to mark positions between $A$ and $B$ only. So we will need to shift $j$ inside the segment.

So, if $j < A$, then $j = ceil ( \frac{A}{p} ) \times p = \frac{A+p-1}{p} \times p$. This line makes $j$ the smallest multiple of $p$ which is bigger than $A$.

3. If $j \leq B$, we mark sieve at position $j$ to be not prime.
Else, we break out of the loop.

But when marking, remember that our array $arr$ is shifted by $A$ positions. $arr[0]$ indicates position $A$ of normal sieve. So, we will mark position $j – A$ of $arr$ as not prime.

4. Increase the value of $j$ by $p$. Repeat loop from step $3$.
6. All positions in $arr$ which has not been marked are prime.

Step $1$ is important. Since we only mark multiples of prime as not prime in the pseudocode above, $1$ which has no prime factor never gets marked. So we handle it by increasing the value of $A$ by $1$ when $A = 1$.

# Code

If we convert the above pseducode into C++, then it becomes something like this:

int arr[SIZE];

// Returns number of primes between segment [a,b]
int segmentedSieve ( int a, int b ) {
if ( a == 1 ) a++;

int sqrtn = sqrt ( b );

memset ( arr, 0, sizeof arr ); // Make all index of arr 0.

for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
int p = prime[i];
int j = p * p;

// If j is smaller than a, then shift it inside of segment [a,b]
if ( j < a ) j = ( ( a + p - 1 ) / p ) * p;

for ( ; j <= b; j += p ) {
arr[j-a] = 1; // mark them as not prime
}
}

int res = 0;
for ( int i = a; i <= b; i++ ) {
// If it is not marked, then it is a prime
if ( arr[i-a] == 0 ) res++;
}
return res;
}


In line $1$ we declare an array of $SIZE$. This array needs to be as large as the maximum difference between $B-A+1$. Next in line $4$ we declare a function that finds the number of primes between $a$ and $b$. Rest of the code is the same as the pseudocode above.

It is possible to optimize it further ( both for speed and memory ) but in the expense of clarity. I am sure if readers understand the core concept behind this algorithm, then they will have no problem tweaking the code to suit their needs.

# Conclusion

I first learned about Segmented Sieve from blog of +Zobayer Hasan. You can have a look at that post here. I wasn't really good at bit manipulation back then, so it looked really scary. Later I realized it's not as hard as it looks. Hopefully, you guys feel the same.