Prime Factorization of Factorial

Problem

Given a positive integer $N$, find the prime factorization of $N!$.

For example, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 = 2^3 \times 3 \times 5$.

Brute Force Solution

A possible solution is to calculate the value of $x = N!$ and then prime factorize $x$. But calculating $N!$ is tedious. We cannot fit $N!$ where $N > 20$ in a long long variable. We will need to use the Big Integer class and that would make things slow. I will soon write a blog post on Big Integer until then know that using Big Integer it would take more than $N^2$ steps to calculate $N!$.

Is there a better way?

Limits on Prime

Before we move on to the solution, let us first decide the limit on prime. In order to factorize $x = N!$, we have to generate prime numbers. But up to which value? Should we generate all primes less than $\sqrt{x}$?

Even for a small value of $N$ like $100$, $x$ can be huge with over $100$ digits in it, thus, $\sqrt{x}$ will also be huge. Generating so many primes is not feasible. Using Sieve of Eratosthenes we could generate primes around $10^8$, which is nowhere near $\sqrt{100!}$.

Note that $N! = N \times (N-1) \times (N-2) \times … \times 2 \times 1$. That is, $N!$ is a product of numbers less than $N$ only. Now, can there be any prime greater than $N$ that can divide $N!$?

Suppose there is a number $A$ and we factorized it. It is trivial to realize that all its prime factors will be less than or equal to $A$. So in $N!$, which is the product of numbers less than $N$, if we decompose all those numbers to their prime factors, then they will reduce to primes less than or equal to $N$.

For example, $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = (2 \times 3) \times 5 \times 2^2 \times 3 \times 2 = 2^4 \times 3^2 \times 5$.

So the prime factors of $N!$ will be less than or equal to $N$. Generating primes till $\sqrt{N!}$ is not necessary. We just need to generate all primes less than or equal to $N$.

Prime Factorization

Now that we know the limit for primes, we are ready to begin factorizing the factorial. There is more than one way to achieve this. We will see three of them and discuss which one is best.

First – Linear Loop From $1$ to $N$

We know that $N! = N \times (N-1) \times (N-2) \times … \times 2 \times 1$. So we could simply factorize every number from $1$ to $N$ and add to a global array that tracks the frequency of primes. Using the code for $factorize()$ from here, we could write a solution like below.

vector<int> prime;
int primeFactor[SIZE]; // Size should be as big as N

void factorize( int n ) {
    int sqrtn = sqrt ( n );
    for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
        if ( n % prime[i] == 0 ) {
            while ( n % prime[i] == 0 ) {
                n /= prime[i];
                primeFactor[ prime[i] ]++; // Increment global primeFactor array
            }
            sqrtn = sqrt ( n );
        }
    }
    if ( n != 1 ) {
        primeFactor[n]++;
    }
}

void factFactorize ( int n ) {
    for ( int i = 2; i <= n; i++ ) {
        factorize( i );
    }

    // Now We can print the factorization
    for ( int i = 0; i < prime.size(); i++ ) {
        printf ( "%d^%d\n", prime[i], primeFactor[ prime[i] ] );
    }
}

We pass the value of $N$ to $factFactorize()$ in line $20$, and it calculates the frequency of each prime in $N!$. It starts a loop from $2$ to $N$ and factorizes each of them. In line $4$ we have the $factorize()$ function modified a bit in line $10$ and $16$ to suit our need. When those lines find a prime factor, they increase the frequency of that prime in the $primeFactor$ array.

It is simple and straightforward but takes $O(N \times factorize() )$ amount of time. We can do better.

Second - Summation of Each Prime Frequency

Instead of factorizing from $1$ to $N$, how about we just find out how many times each prime occurs in $N!$ and list them. If $p_1$ occurs $a_1$ times, $p_2$ occurs $a_2$ times$...$ $p_x$ occurs $a_x$ times, then $N! = p_1^{a_1} \times p_2^{a_2} \times ... \times p_x^{a_x}$.

That sounds nice, but how do we find the frequency of prime factors in $N!$. Let us just focus on one prime factor, for example, $2$, and find out how many times it occurs in $N!$. We will extend this idea to other primes.

Let $N = 12$. How many times does $2$ occur in $12!$? We know that $12! = 12 \times 10 \times 9 \times ... \times 1$. How many numbers from $1$ to $12$ has $2$ as their prime factors? $\frac{12}{2} = 6$ numbers do and they are ${ 2, 4, 6, 8, 10, 12 }$. So we can say that at least $2^6$ is a factor of $12!$. But is there more?

Yes. Notice that $4 = 2^2$, so it has an extra $2$ in it that we did not count. That means for each number that has $2^2$ in them as a factor, we need to add $1$ to our result. How many numbers are there which has $2^2$ as their factor? $\frac{12}{4} = 3$ numbers, which are ${ 4, 8, 12 }$. So we increase our frequency to $6 + 3 = 9$ and say we have at least $2^9$ in $12!$. But is that it?

No. $8 = 2^3$ and for each number with $2^3$ as factor we add $1$ to result. So our result is $9 + \frac{12}{8} = 9 + 1 = 10$.

Do we try with $16 = 2^4$ now? No. $12!$ cannot have any number with factor $2^4$ since $\frac{12}{16} = 0$. So we conclude that $12!$ has $2^{10}$ as its factor and no more.

Now, we extend this idea to other primes. What is the frequency of prime factor $3$ in $12!$? $\frac{12}{3} + \frac{12}{9} + \frac{12}{27} = 4 + 1 + 0 = 5$. We repeat this for all primes less than equal to $12$.

Therefore, we can say that for a given prime $p$, $N!$ will have $p^x$ as its prime factor where $x = \frac{N}{p} + \frac{N}{p^2} + \frac{N}{p^3} + ... \text{ Until it becomes 0 }$.

So, using this idea our code will look as the following.

void factFactorize ( int n ) {
    for ( int i = 0; i < prime.size() && prime[i] <= n; i++ ) {
        int p = prime[i];
        int freq = 0;

        while ( n / p ) {
            freq += n / p;
            p *= prime[i];
        }

        printf ( "%d^%d\n", prime[i], freq ); // Printing prime^freq which is factor of N!
    }
}

This code factorizes $N!$ as long as we can generate all primes less than or equal to $N!$. The loop in line $6$ runs until $\frac{n}{p}$ becomes 0.

This code has 3 advantages over the "First" code.

  • We don't have to write $factorize()$ code.
  • Using this code, we can find how many \times a specific prime $p$ occurs in $N!$ in $O(log_p (N))$ time. In the "First" code, we will need to run $O(N)$ loop and add occurrences of $p$ in each number.
  • It has a better complexity for Factorization. Assuming the loop in line $6$ runs $log_2 (N)$ \times, this code has a complexity of $O(N log_2 (N))$. The code runs faster than this since we only loop over primes less than $N$ and at each prime the loop runs only $O(log_p (N))$ \times. The "First" code ran with $O(N \times factorize() )$ complexity, where $factorize()$ has complexity of $O(\frac{ \sqrt{N} }{ ln ( \sqrt{N} ) } + log_2(N) : )$.

This idea still has a small flaw. So the next one is better than this one.

Third - Better Code than Two

Suppose, we want to find out how many times $1009$ occurs in $9 \times 10^{18}!$. Let us modify the "Second" code to write another function that will count the result for us.

long long factorialPrimePower ( long long n, long long p ) {
    long long freq = 0;
    long long cur = p;

    while ( n / cur ) {
        freq += n / cur;
        cur *= p;
    }
    return freq;
}

If we pass $n = 9 \times 10^{18}$ and $p = 1009$, it will return us $8928571428571439$. But this is wrong. The line $7$ in the code overflows resulting in a wrong answer. Try it yourself. Print out the value of $cur$ in each step and see when it overflows.

We could change the condition in line $5$ into something like this to solve the situation:

while ( n / cur > 0 )

But this remedy won't work if $cur \times p$ overflows into a positive number. If we want we could use techniques that avoid multiplying two numbers if it crosses a limit, but there is a simpler way.

Note that $\frac{N}{p^3}$ is same as $\frac{ \frac{N}{p^2} }{p}$. So instead of saying that $res = \frac{N}{p} + \frac{N}{p^2}...$, we could rewrite it as

$x = N; \ res = res + \frac{x}{p}$.
$x = \frac{N}{p} = \frac{x}{p}; \ res = res + \frac{x}{p}$.
$x = \frac{N}{p^2} = \frac{ \frac{N}{p} } {p} = \frac{x}{p}; \ res = res + \frac{x}{p}$.
$...$
$x = 0$

Instead of raising the power of $p$, we divide the value of $N$ by $p$ at each step. This has the same effect.

So our code for finding the frequency of specific prime should look like the following:

long long factorialPrimePower ( long long n, long long p ) {
    long long freq = 0;
    long long x = n;

    while ( x ) {
        freq += x / p;
        x = x / p;
    }

    return freq;
}

There might still be inputs for which this code will overflow, but chances for that is now lower. Now if we send in $n = 9 \times 10^{18}$ and $p = 1009$, then this time we get $8928571428571425$ as our result.

If we apply this improvement in our $factFactorize()$ function, then it will become:

void factFactorize ( int n ) {
    for ( int i = 0; i < prime.size() && prime[i] <= n; i++ ) {
        int x = n;
        int freq = 0;

        while ( x / prime[i] ) {
            freq += x / prime[i];
            x = x / prime[i];
        }

        printf ( "%d^%d\n", prime[i], freq );
    }
}

This code has less chance to overflow so it is better.

Resource

Number of Digits of Factorial

Problem

Given an integer $N$, find number of digits in $N!$.

For example, for $N=3$, number of digits in $N! = 3! = 3 \times 2 \times 1 = 6$ is $1$. For $N=5$, number of digits in $5! = 120$ is $3$.

Brute Force Solution

The first solution that pops into mind is to calculate $N!$ and count how many digits it has. A possible solution will look like the following:

int factorialDigit ( int n ) {
    long long fact = 1;
    for ( int i = 2; i <= n; i++ ) {
        fact *= i;
    }

    int res = 0; // Number of digit of n!
    while ( fact ) { //  Loop until fact becomes 0
        res++;
        fact /= 10; // Remove last digit
    }

    return res;
}

This code works, but only for $N \leq 20$. Once $N$ crosses $20$, it no longer fits in a "long long" variable.

Since the factorial of $N > 20$ overflows long long, how about we use "Big Integer" to store the value?

Brute Force Solution with Big Integer

If you don't know what Big Integer is, then know that it is a class for arbitrary large integer. C++ does not support this class so we will have to manually implement it if we want to use C++ to solve problems involving large integers. Or you could use a programming language that supports this class, such as Java.

I will write a post on Big Integer someday, but for now you could just use the Big Int class written by +Anudeep Nekkanti from here.

Multiplication of a Big Integer and an integer takes $O(\text{number of digits of Big Integer})$. Assuming that when calculating $N!$, the number of digits of $N!$ increases by $1$ at each step, it will take $O(N^2)$ time to compute $N!$. But obviously $N!$ does not increase by $1$ digit at each step ( for e.g, multiply by $100$ increases it by $2$ digits ), so worst time complexity is worse than $O(N^2)$.

Solution Using Logarithm

Logarithm of a number is connected to its number of digits, which might not be apparent. What is logarithm? Logarithm of a number $x$, in base $b$, is a real number $y$ such that $x = b^y$. For example:

$$log_{10}1234 = 3.0913151597 \text{ and } 10^{3.0913151597} = 1234$$

In logarithms, the base of the number is important. Since we want the number of digits of $N!$ in decimal, we will work with base $10$.

Number of Digit of an Integer

First, we will apply the logarithm idea on an integer. So where is the connection of logarithm with digit number? Look at the following logarithm values:

$log_{10}(x) = y$
$log_{10}(1) = 0$
$log_{10}(10) = 1$
$log_{10}(100 )= 2$
$log_{10}(1000) = 3$
$log_{10}(10000) = 4$

As the value of $x$ increases, the value of $y$ also increases. Every time we multiple $x$ by $10$, value of $y$ increases by $1$. That is every time number of digit increases, the value of $y$ increases. From this table, we can infer a few things about the log of other values.

If $log_{10}(100)$ is 2, and $log_{10}(1000)$ is 3, then for all value of $x$ where $100 < x < 1000$ , value of $y$ will lie between $2 < y < 3$. Let us try this out.

$log_{10}(100) = 2$
$log_{10}(150) = 2.17609125906$
$log_{10}(500) = 2.69897000434$
$log_{10}(999) = 2.99956548823$

Now note that, for every $100 \leq x < 1000$, value of $y$ is $2 \leq y < 3$. Can you see some relation between value of $y$ and number of digits of $x$?

Yes. If the value of $y$ is of the form $2.XXX$, then $x$ has $2+1=3$ digits.

$$\therefore \text{number of digits of}: x = \lfloor log_{10} (x) \rfloor + 1$$

Also note that, even though theoretically we just need to add 1 to $log_{10}(x)$ before flooring, it is good practice to add eps (where eps is a small value: $eps = 10^{-9})$ while we are at it. Why? Due to precision error, sometimes our program may say that $log_{10}(100) = 1.9999999$ instead of $2$. In order to avoid such scenes, we add eps to it before flooring.

int numberDigit ( int n ) {
    int wrongAnswer = log10(n) + 1; // This may give wrong answer sometimes.
    int rightAnswer = log10(n) + 1 + eps; // This is right.
    return rightAnswer;
}

If you are wondering, where is the floor function, know that in line $3$, we are assigning $log10(n) + 1 + eps$ to an integer. This has same action as $floor()$ function. Also note that we used $log10()$ function instead of $log()$ function. Unlike our calculators, in C++ $log()$ has base $2$.

Extending To Factorial

So how do we extend this idea to $N!$?

Let $x = log_{10}(N!)$. Then our answer will be $res = \lfloor x \rfloor + 1$. So all we need to do is find value of $x$.

$x = log_{10}(N!)$
$x = log_{10}(1 \times 2 \times 3 \times ... \times N)$
$\therefore x = log_{10}(1) + log_{10}(2) + log_{10}(3) + ... + log_{10}(N)$ This is using the law $log_{10}(ab) = log_{10}(a)+log_{10}(b)$

So in order to calculate $x = log_{10}(N!)$, we don't have to calculate value of $N!$. We can simply add log value of all numbers from $1$ to $N$. This can be achieved in $O(N)$.

int factorialDigit ( int n ) {
    double x = 0;
    for ( int i = 1; i <= n; i++ ) {
        x += log10 ( i );
    }
    int res = x + 1 + eps;
    return res;
}

Digits of $N!$ in Different Base

Now what if we want to find how many digits $N!$ has if we convert $N!$ to some other base.

For example, how many digits $3!$ has in the binary number system with base $2$? We know that $(6)_{10} = (110)_2$. So $3!$ has $3$ digits in base $2$ number system.

Can we use logarithms to solve this problem too? Yes.

$$\text{number of digits of x in base B} = \lfloor log_B(x) \rfloor + 1$$

All we need to do is change the base of our $log$ and it will find the number of digits in that base.

But, how do we change base in our code? We can only use log with base $2$ and $10$ in C++. Fear not, we can use the following law to change base of logartihm from $B$ to $C$.
$$log_B(x) = \frac{log_C(x)}{log_C(B)}$$
So in C++, we will use $C = 2$ or $C=10$ to find value of $log_B(x)$.

int factorialDigitExtended ( int n, int base ) {
    double x = 0;
    for ( int i = 1; i <= n; i++ ) {
        x += log10 ( i ) / log10(base); // Base Conversion
    }
    int res = x + 1 + eps;
    return res;
}

UVa 10407 – Simple division

Problem

Problem Link – UVa 10407 – Simple division

Given an array of numbers, find the largest number $d$ such that, when elements of the array are divided by $d$, they leave the same remainder

Solution

We will be using our knowledge about Congruence Relation and GCD to solve this problem. But first, we need to make sure that we understand the problem properly.

Wrong Approach

At first, one might think that $d$ is the $gcd$ of array elements. Surely, dividing elements of the array with $d$ will leave same remainder $0$, but that doesn’t necessarily mean it is the largest possible answer.

For example, suppose the array is $A = {2,8,14,26}$. Then what will be the $gcd$ of array A? $gcd(2,8,14,26) = 2$. Dividing elements of $A$ with $2$ leaves us $0$. So is this our answer? No.

For array $A$, there exists a number greater than $2$ that leaves the same remainder. And that number is $3$. When we divide each element with $3$, it leaves us $2$ as the remainder.

The problem did not ask us to find the number that will leave $0$ as remainder. It asked us to find the largest number $d$ that leaves the same remainder. So for array $A$ the $gcd(A)$ is not the answer. Is $3$ our answer? No. The answer is given below in Example.

Using Congruence Relation

Let us rephrase the problem using congruence relation. Suppose there is an array with elements, $A={a,b,c}$. We need to find the largest number $d$ that leaves the same remainder for each of its element.

Let us consider only ${a,b}$ for now. We need to find $d$ such that it leaves the same remainder when it divides $a$ and $b$. Meaning

$$a \ \equiv \ b \ \text{(mod d)} \ a – b \ \equiv \ 0 \ \text{(mod d)}$$

What does this mean? It means, if $d$ leaves the same remainder when it divides $a$ and $b$, then it will leave no remainder when dividing $a-b$.

Using same logic, we can say that $b – c \ \equiv \ 0 \ \text{(mod d)}$. Therefore, if we find a number that divides $a-b$ and $b-c$, then that number will leave the same remainder when dividing ${a,b,c}$.

This idea can be extended for any number of elements in the array. So $d$ is such a number that leaves no remainder when it divides the difference of adjacent elements in the array.

But wait, there are multiple numbers that can divide the difference of adjacent elements. Which one should we take? Since we want the largest value of $d$, we will take the largest divisor that can divide all the difference of adjacent numbers. There is only one divisor that can divide all the adjacent differences, and that is $gcd( (a-b), (b-c) )$.

Therefore, if we have an array $A={a_1,a_2,a_3…a_n}$, then $d = gcd( a_2 – a_1, a_3- a_2,…a_n – a_{n_1})$.

Careful about negative value of $gcd()$. Make sure you take the absolute value of $gcd()$.

Example

Let us get back to the example we were looking into before. Finding $d$ for the array $A = {2,8,14,26}$. We know that $d$ will divide the difference of adjacent elements completely. So let us make another array that will contain the difference of adjacent elements., $B = {8-2, 14-8, 26-14} = {6,6,12}$. Therefore, $d = gcd ( 6,6,12 ) = 6$.

Summary

  • Let the array of elements be $A = {a,b,c,d…}$.
  • $res \neq gcd(a,b,c,d…)$.
  • $res = gcd ( a-b,b-c,c-d,…)$.

Code

#include <bits/stdc++.h>
using namespace std;

typedef long long vlong;

vlong gcd ( vlong a, vlong b ) {
    while ( b ) {
        a = a % b;
        swap ( a, b );
    }
    return a;
}

vlong arr[1010];

int main () {

    while ( scanf ( "%d", &arr[0] ) != EOF ) {
        if ( arr[0] == 0 ) break; // End of test case

        // A new test case has started
        int cur = 1;

        // Take input
        while ( 1 ) {
            scanf ( "%lld", &arr[cur] );
            if ( arr[cur] == 0 ) break;
            else cur++;
        }

        vlong g = 0; // Start with 0 since gcd(0,x) = x.
        for ( int i = 1; i < cur; i++ ) {
            int dif = arr[i] - arr[i-1]; // Calculate difference
            g = gcd ( g, dif ); // Find gcd() of differences
        }

        if ( g < 0 ) g *= -1; // In case gcd() comes out negative
        printf ( "%lld\n", g );
    }

    return 0;
}