## Prufer Code: Linear Representation of a Labeled Tree

I guess this is going to be my first post (apart from the contest analysis’) which is not about Number Theory! It’s not about graph either, even though the title has “Tree” in it. This post is actually about Combinatorics.

Prufer code, in my opinion, is one of the most underrated algorithms I have learned. It has a wide range of applications, yet, very few people seem to know about it. It’s not even a hard algorithm! It’s very easy and you should understand it intuitively.

In this post, we will be discussing what is Prufer Code and its conversion to and from a tree. On next post, we will look into some of its application.

# Prufer Code: What is it?

Every labeled tree of $N$ nodes can be uniquely represented as an array of $N-2$ integers and vice versa

The linear array representation of a labeled tree is called Prufer Code. In case you are wondering what’s a labeled tree, it’s just a tree with distinguishable nodes, i.e, each node is marked such that they can be distinguished from one another.

Okay, before we move on to how to covert a labeled tree into Prufer Code and vice versa, let me tell you how I came to know about Prufer Code. If you are a problem setter, the story will be helpful to you.

# How I came to know about Prufer Code

So a few years ago, I used to create challenges for HackerRank as a “Challenge Creator”. So one day, I created a challenge involving tree. Since the problem had a tree in it, I had to create some random trees for the data set. I used the following algorithm to create the random trees:

1. Start with node 1. Initially, there is only node 1 by itself. Totally independent.
2. Now, for each node x from 2 to N:
a. Node 1 to x-1 has already been processed. They are now connected as a tree.
b. Choose a node randomly from node 1 to x-1 and connect node x with it.


I thought this was a pretty reasonable process for getting myself a tree. So I submitted the challenge for testing. Back then, Wanbo used to test all my challenges. He no longer works at HR now. Anyways, as a tester, it was Wanbo’s duty to write bad solutions to test my challenge. He looked into the dataset and realized that my data set wasn’t truly random. He quickly figured out a pattern and wrote a heuristical solution that got AC. So he sent my challenge back to me and asked to improve the dataset.

I started to analyze my tree generator again. Why is it not random? What’s wrong with it? After some thinking, I realized that in the pseudocode above, node number 1 has a higher chance of getting a child. As a result, trees created by that generator had higher degrees for nodes with a lower index. This can’t be called a random tree generator.

Once I realized my mistake, I notified Wanbo about it. He then suggested me to google Prufer Code.

And that’s how I came to know about Prufer code. After learning the Prufer Code, creating random trees became trivial. All I had to do is create a random array of length $N-2$ and then convert it to a labeled tree. It was truly random without any pattern.

Creating random trees is just one of the application of Prufer Code. There is more.

Let us see how to find the Prufer Code of a given tree.

# Convert a Labeled Tree into Prufer Code

The process of converting a labeled tree into its Prufer Code is very simple. The pseudocode is given below:

1. Find the smallest leaf node of the given tree. Let it be x. Let the neighbor of node x be y.
2. Write down the value of y on a piece of paper.
3. Remove node x from the tree.
4. Repeat step 1 to 3 until there are only 2 nodes remaining.
5. Once you are done, there will be a sequence of numbers on the piece of paper. That's your Prufer Code.


Here, the white nodes are non-leaf nodes, the green nodes are leaf nodes and the blue nodes are the smallest leaf node for that tree.

So for the tree given on picture, the Prufer Code is $1, 7, 6, 6, 1$. It’s simple right?

Before we move on, we will also note down few properties of Prufer Code:

1. If a node has degree $d$, then that node will appear in prufer code exactly $d-1$ times.
2. Leaves never appear in Prufer Code.

If you are still having difficulty in understanding, have a look at this video.

## Code for Converting Tree into Prufer Code

We are just going to convert the idea we discussed above into code. Nothing fancy.

Complexity: $O(VlogV)$

I have tested the above code with some hand generated test cases. I would have felt much more confident if I could have gotten AC in some problem from an online judge. But unfortunately, I never found any such problem. The other way around is more common.

# Convert a Prufer Code into Labeled Tree

So given a Prufer Code, how do we convert it into a tree? The algorithm is just reverse of what we have done before.

1. Find the node numbers who are missing in the Prufer Code. They are the leaves of the tree since leaves never make it to Prufer Code. Let L be the set of leaf nodes.
2. Select the smallest leaf from L and connect it to the first element of Prufer Code.
3. Remove the first element from Prufer Code. Let the removed element be X.
4. Check if X is still present in the Prufer code or not. If it has disappeared, then X has become a leaf itself. Add it in the set L.
5. Repeat step 2 to 4 until the full sequence disappears.
6. Once the process ends, you will have a connected labeled tree.


I am not going to draw a picture again. Too much effort. Just watch the following video if you are still confused.

## Code for Converting Prufer Code to Tree

We will convert the process we discussed above into code.

Complexity: $O(VlogV)$

Converting Prufer Code to Tree is much more common than the other way around. There is even a problem on Timus on this conversion: Timus 1069. Prufer Code. I believe you now have enough knowledge to solve Timus 1069.

# Conclusion

We now know how to convert a labeled tree into Prufer Code and vice versa. Not only we know the process, we even know how to code them.

On next post (Application of Prufer Code: Random Tree Generation, Cayley’s Formula and Building Tree from Degree Count), we will look into some of its common applications and solve some interesting problems related to Prufer Code. Make sure you understand the process clearly before proceeding.

If you enjoyed reading the post, then don’t forget to share it with your friends.

# Next Posts

## Chinese Remainder Theorem Part 2 – Non Coprime Moduli

As promised on the last post, today we are going to discuss the “Strong Form” of Chinese Remainder Theorem, i.e, what do we do when the moduli in the congruence equations are not pairwise coprime. The solution is quite similar to the one we have already discussed in the previous post, so hopefully, it will be a lot easier to understand.

### Prerequisite

If you haven’t read my previous post, you can find it here: Chinese Remainder Theorem Part 1. I strongly suggest you read that one before proceeding onwards.

# Problem

Given two sequences of numbers $A = [a_1, a_2, \dots, a_n]$ and $M = [m_1, m_2, \dots, m_n]$, find the smallest solution to the following linear congruence equations if it exists:

$$x \equiv a_1 \text{(mod } m_1) \\ x \equiv a_2 \text{(mod } m_2) \\ \dots \\ x \equiv a_n \text{(mod } m_n)$$

Note: The elements of $M$ are not pairwise coprime

# Working With Two Equations

Before we go on to deal with $n$ equations, let us first see how to deal with two equations. Just like before, we will try to merge two equations into one.

So given the two equation $x \equiv a_1 \text{(mod }m_1)$ and $x \equiv a_2 \text{(mod }m_2)$, find the smallest solution to $x$ if it exists.

## Existance of Solution

Suppose, $gcd(m_1,m_2) = g$. In that case, the following constraint must be satisfied: $a_1 \equiv a_2 \text{(mod }g)$, otherwise there does not exist any solution to $x$. Why is that?

We know that, $x \equiv a_1 \text{(mod }m_1)$
or, $x – a_1 \equiv 0 \text{(mod }m_1)$
or, $m_1 | x – a_1$

Since $m_1$ divides $x – a_1$, any divisor of $m_1$ also divides $x – a_1$, including $g$.

$\therefore x – a_1 \equiv 0 \text{(mod }g)$.

Similarly, we can show that, $x – a_2 \equiv 0 \text{(mod }g)$ is also true.

$\therefore x – a_1 \equiv x – a_2 \text{(mod }g)$
or, $a_1 \equiv a_2 \text{(mod }g)$
or, $a_1 – a_2 \equiv \text{(mod }g)$
or $g | (a_1 – a_2)$

Therefore, $g$ must divide $a_1 – a_2$, otherwise, there is conflict and no solution exists. From this point, we assume that $g | a_1 – a_2$.

## Finding Solution

Just like last time, we use Bezout’s identity to find a solution to $x$. From Bezout’s identity, we know that there exists a solution to the following equations:

$m_1p + m_2q = g$

Since both side is divisible by $g$, we can rewrite it as:

$$\frac{m_1}{g}p + \frac{m_2}{g}q = 1$$

Using Extended Euclidean Algorithm, we can easily find values of $p$ and $q$. Simply call the function ext_gcd(m1/g, m2/g, p, q) and it should be calculated automatically.

Once we know the value of $p$ and $q$, we can claim that the solution to $x$ is the following:

$$x = a_1\frac{m_2}{g}q + a_2\frac{m_1}{g}p$$

Why is that? Look below.

$$x = a_1\frac{m_2}{g}q + a_2\frac{m_1}{g}p \\ \text{From Bezout’s Identity, we know that } \frac{m_1}{g}p = 1 – \frac{m_2}{g}q \\ \text{so, } x = a_1\frac{m_2}{g}q + a_2(1 – \frac{m_2}{g}q) \\ \text{or, } x = a_2 + (a_1 – a_2)\frac{m_2}{g}q \\ \text{Since, } g | (a_1 – a_2) \\ x = a_2 + \frac{a_1 – a_2}{g}m_2q \\ \therefore x \equiv a_2 \text{(mod }m_2) \\ \text{Similarly, } x \equiv a_1 \text{(mod }m_1)$$

Hence, $x = a_1\frac{m_1}{g}q + a_2\frac{m_2}{g}p$ is a valid solution. It may not be the smallest solution though.

## Finding Smallest Solution

Let $x_1$ and $x_2$ be adjacent two solutions. Then the following are true:

$x_1 \equiv a_1 \text{(mod }m_1)$
$x_2 \equiv a_1 \text{(mod }m_1)$
$x_1 \equiv x_2 \text{(mod }m_1)$
$x_1 – x_2 \equiv 0 \text{(mod }m_1)$
$m_1 | x_1 – x_2$

Similarly, $m_2 | x_1 – x_2$

Since both $m_1$ and $m_2$ divides $x_1 – x_2$, we can say that $LCM(m_1, m_2)$ also divides $x_1-x_2$.

Since any two adjacent solution of $x$ is divisible by $L = LCM(m_1,m_2)$, then there cannot be more than one solution that lies on the range $0$ to $L-1$. Hence, the following is the smallest solution to $x$:

$$x \equiv a_1\frac{m_2}{g}q + a_2\frac{m_1}{g}p \text{(mod }LCM(m_1,m_2))$$

# Working with $n$ Equations

When there are $n$ equations, we simply merge two equations at a time until there is only one left. The last remaining equation is our desired answer. If at any point, if we are unable to merge two equations due to conflict, i.e, $a_i \not\equiv a_j \text{ (mod }GCD(m_i, m_j))$, then there is no solution.

# Chinese Remainder Theorem: Strong Form

Given two sequences of numbers $A = [a_1, a_2, \dots, a_n]$ and $M = [m_1, m_2, \dots, m_n]$, a solution to $x$ exists for the following $n$ congrunce equations:

$$x \equiv a_1 \text{(mod } m_1) \\ x \equiv a_2 \text{(mod } m_2) \\ \dots \\ x \equiv a_n \text{(mod } m_n)$$

if, $a_i \equiv a_j \text{ (mod }GCD(m_i,m_j))$ and the solution will be unique modulo $L = LCM(m_1, m_2,\dots, m_n)$.

# Code

The code is almost same as weak form, with slight changes in few lines.

/** Works for non-coprime moduli.
Returns {-1,-1} if solution does not exist or input is invalid.
Otherwise, returns {x,L}, where x is the solution unique to mod L
*/
pair<int, int> chinese_remainder_theorem( vector<int> A, vector<int> M ) {
if(A.size() != M.size()) return {-1,-1}; /** Invalid input*/

int n = A.size();

int a1 = A[0];
int m1 = M[0];
/** Initially x = a_0 (mod m_0)*/

/** Merge the solution with remaining equations */
for ( int i = 1; i < n; i++ ) {
int a2 = A[i];
int m2 = M[i];

int g = __gcd(m1, m2);
if ( a1 % g != a2 % g ) return {-1,-1}; /** No solution exists*/

/** Merge the two equations*/
int p, q;
ext_gcd(m1/g, m2/g, &p, &q);

int mod = m1 / g * m2; /** LCM of m1 and m2*/

/** We need to be careful about overflow, but I did not bother about overflow here to keep the code simple.*/
int x = (a1*(m2/g)*q + a2*(m1/g)*p) % mod;

/** Merged equation*/
a1 = x;
if (a1 < 0) a1 += mod; /** Result is not suppose to be negative*/
m1 = mod;
}
return {a1, m1};
}


Once again, please note that if you use int data type, then there is a risk of overflow. In order to keep the code simple I used int, but you obviously need to modify it to suit your needs. You can have a look at my CRT template on github from here: github/forthright48 - chinese_remainder_theorem.cpp

Complexity: $O(n \times log(L))$.

# Conclusion

And we are done. We can now solve congruence equations even when the moduli are not pairwise coprime.

The first time I learned Chinese Remainder Theorem, I had to read a paper. The paper was well written, but I never actually managed to understand how it worked. Therefore, I always feared Chinese Remainder Theorem and used it as a black box. Until last week, I didn't even know that CRT worked with moduli that are not coprime. I really enjoyed learning how it works and all the related proof. Hopefully, you enjoyed it too.

# Resources

1. forthright48 - Chinese Remainder Theorem Part 1 https://forthright48.com/2017/11/15/chinese-remainder-theorem-part-1-coprime-moduli/
2. Wiki - Chinese Remainder Theorem - https://en.wikipedia.org/wiki/Chinese_remainder_theorem
3. Brilliant.org - Chinese Remainder Theorem - https://brilliant.org/wiki/chinese-remainder-theorem/
4. Cut The Knot - Chinese Remainder Theorem - https://www.cut-the-knot.org/blue/chinese.shtml

# Related Problems

## Chinese Remainder Theorem Part 1 – Coprime Moduli

Second part of the series can be found on: Chinese Remainder Theorem Part 2 – Non Coprime Moduli

Wow. It has been two years since I published my last post. Time sure flies by quickly. I have been thinking about resuming writing again for a while now. Took me long enough to get back into the mood.

I am really excited about today’s post. I have been meaning to write on Chinese Remainder Theorem (CRT for short) for over two years now. Hopefully, everyone will find it interesting and easy to understand.

Here are the pre-requisite that you need to complete for understanding the post properly:

# An Old Woman Goes to Market

I remember reading the following story the first time I tried learning about CRT. It’s quite interesting in my opinion. Kind of brings out the essence of CRT in a simple to understand fashion.

An old woman goes to market and horse steps on her basket and crashes the eggs. The rider offers to pay the damages and asks her how many eggs she had brought. She does not remember the exact number, but when she had taken them out two at a time, there was one egg left. The same happened when she picked 2,3,4,5,6 at a time, but when she took seven at a time they came out even. What is the smallest number of eggs she could have had?

Hopefully, everyone understood the story above. Let me now rephrase it in mathematical notations.

Suppose, the old woman had $x$ eggs in her basket. Now she claims that when she took out eggs from the basket $2$ at a time, there was only $1$ egg remaining, meaning, $x \equiv 1 \text{(mod 2)}$. So basically, she is giving us various congruence equations. From her statement, we get the following linear congruence equations:

$$x \equiv 1 \text{(mod 2)} \\ x \equiv 1 \text{(mod 3)} \\ x \equiv 1 \text{(mod 4)} \\ x \equiv 1 \text{(mod 5)} \\ x \equiv 1 \text{(mod 6)} \\ x \equiv 2 \text{(mod 7)}$$

Now, we need to find the smallest value of $x$ which satisfies all of the above congruence equations. Chinese Remainder Theorem helps us in finding the value of $x$.

Before we move on, let us redefine the problem formally.

# Problem: Formal Definition

Given two sequences of numbers $A = [a_1, a_2, \dots, a_n]$ and $M = [m_1, m_2, \dots, m_n]$, find the smallest solution to the following linear congruence equations if it exists:

$$x \equiv a_1 \text{(mod } m_1) \\ x \equiv a_2 \text{(mod } m_2) \\ \dots \\ x \equiv a_n \text{(mod } m_n)$$

# Chinese Remainder Theorem: Weak Form

As mentioned before, we can use Chinese Remainder Theorem to solve the above problem described. On this part of the CRT series, we will look into a weaker form of Chinese Remainder Theorem, which is easier to understand and occurs more frequently.

The weak Chinese Remainder Theorem states the following:

Given two sequences of numbers $A = [a_1, a_2, \dots, a_n]$ and $M = [m_1, m_2, \dots, m_n]$, where all elements of $M$ are pairwise coprime, there always exists an unique solution to $x$ mod $L$, where $L = m_1 \times m_2 \times \dots \times m_n$, such that $x$ satisfies the following linear congruence equations:

$$x \equiv a_1 \text{(mod } m_1) \\ x \equiv a_2 \text{(mod } m_2) \\ \dots \\ x \equiv a_n \text{(mod } m_n)$$

So the weak form of Chinese Remainder Theorem has a constraint: members of the array $M$ must be pairwise coprime. What do we mean by that? This means that $\text{GCD}(m_i,m_j) = 1$ when $i \neq j$.

As long as this condition is satisfied, the weak form of CRT state that a solution to $x$ always exists which is “unique mod $L$”. Let us see an example to understand what it means to be “unique mod $L$”.

## Weak Form of CRT: Example

Suppose we are given the following three congruence equations:

$$x \equiv 3 (\text{mod } 5) \\ x \equiv 2 (\text{mod } 7) \\ x \equiv 2 (\text{mod } 8)$$

Here we have $A=[3,2,2]$, $M=[5,7,8]$ and $L = 5 \times 7 \times 8 = 280$. Using Weak form of Chinese Remainder Theorem, we can find that $x \equiv 58 \text{(mod } 280)$. How did we find this? We will see that later.

Before you continue, please verify yourself that this indeed satisfies the given congruence. Also, $x = 58$ is the smallest solution and there exists more than one solution. $x = 58 + 280 \times k$, where $k$ is any non-negative integer, also satisfies the equations. But as we said before, $x$ is unique when you mod all the solutions with $L$. This is what it means to have a solution unique to mod $L$.

## Weak Form of CRT: Finding a Solution

We are first going to see how to solve when there are just two equations. Once we know how to solve for two equations, we will then generalize the method for $n$ equations.

### When there are just two equations

Let us first just consider two equation: $x \equiv a_1 \text{(mod }m_1)$ and $x \equiv a_2 \text{(mod }m_2)$. What we are going to do is merge these two equations into one equation. Here is how it works.

Since $gcd(m_1,m_2) = 1$, from Bezout’s Identity, we know that there exists a solution to the following equation:

$$m_1p + m_2q = 1 \tag{1} \label{1}$$

Using Extended Euclidean Algorithm, we can find the value of $p$ and $q$. If we know the value of $p$ and $q$, then we can say that:
$x = a_1 m_2 q + a_2 m_1 p \text{ (mod }m_1m_2) tag{2}$

See below to understand why so.

$x = a_1 m_2 q + a_2 m_1 p$
$x = a_1 (1 – m_1 p) + a_2 m_1 p$ (Using $\eqref{1}$)
$x = a_1 – a_1 m_1 p + a_2 m_1 p$
$x = a_1 + (a_2 – a_1) m_1 p$
$\therefore x \equiv a_1 \text{(mod }m_1)$

Similarly, $x = a_1 m_2 q + a_2 m_1 p \equiv a_2 \text{(mod } m_2)$

Great! We now have a possible solution for $x$, but why did we mod the solution with $m_1m_2$? The solution we found may or may not be the smallest. Like we have seen before, there could be infinite solutions, but there exists a solution which is unique to mod $m_1 \times m_2$. How so? I guess this is the perfect time to look into its proof.

#### Proof of Uniqueness

Since there could be infinite solutions, let $x_1$ and $x_2$ be two such solution. Hence we can say the following:

$x_1 \equiv a_1 \text{(mod } m_1)$
$x_2 \equiv a_1 \text{(mod } m_1)$
$\therefore x_1 \equiv x_2 \text{(mod } m_1)$
$x_1 – x_2 \equiv 0 \text{(mod } m_1)$
$\therefore m_1 | x_1 – x_2$

Similarly, we can show that $m_2 | x_1 – x_2$.

What can we do with all these information? The difference of any two solutions $x_1$ and $x_2$ is divisible by both $m_1$ and $m_2$. We also know that, $m_1$ and $m_2$ are coprime. Doesn’t that mean, the difference is also divisible by $m_1 \times m_2$? Yes. That’s exactly what we are going towards

$m_1m_2 | x_1 – x_2$
$x_1 – x_2 \equiv 0 \text{(mod } m_1m_2)$
$x_1 \equiv x_2 \text{(mod } m_1m_2)$

The next question I will ask is: if the difference between any two solutions, $x_1$ and $x_2$, is divisible by $m_1m_2$, then how many solution can there exist in the range $0$ to $m_1m_2 – 1$? Only one. And hence, the solution $x$ is unique to modulo $m_1m_2$.

Therefore we can say that the solution $x \equiv a_1 m_2 q + a_2 m_1 p \text{ (mod }m_1m_2)$ is unique and smallest. Hence, the two equation, $x \equiv a_1 \text{(mod }m_1)$ and $x \equiv a_2 \text{(mod }m_2)$, after merging becomes $x \equiv a_1 m_2 q + a_2 m_1 p \text{ (mod }m_1m_2)$

### When there are $n$ equations

Since we know how to merge two equations into one, all we need to do is take the first two equations from $n$ equations and merge them. Then we are left with $n-1$ equations. Since all the elements of $M$ were pairwise coprime, the new modulus is also coprime. Hence, we can continue to merge the equations until there is only one equation left. The equation left is our answer.

## Weak form of CRT: Code

The following code implements the idea we discussed above. Note that I used int data type, but in most of the problems you will most likely require long long. I am sure you can adjust the code accordingly yourself.


/** Return {-1,-1} if invalid input.
Otherwise, returns {x,L}, where x is the solution unique to mod L
*/
pair<int, int> chinese_remainder_theorem( vector<int> A, vector<int> M ) {
if(A.size() != M.size()) return {-1,-1}; /** Invalid input*/

int n = A.size();

int a1 = A[0];
int m1 = M[0];
/** Initially x = a_0 (mod m_0)*/

/** Merge the solution with remaining equations */
for ( int i = 1; i < n; i++ ) {
int a2 = A[i];
int m2 = M[i];

/** Merge the two equations*/
int p, q;
ext_gcd(m1, m2, &p, &q);

/** We need to be careful about overflow, but I did not bother about overflow here to keep the code simple.*/
int x = (a1*m2*q + a2*m1*p) % (m1*m2);

/** Merged equation*/
a1 = x;
m1 = m1 * m2;
}
if (a1 < 0) a1 += m1; /** Result is not suppose to be negative*/
return {a1, m1};
}


Once again, please be careful about overflow when solving a problem. From my experience, I have seen that the value of $p$ and $q$ becomes large quickly and intermediate calculations no longer fit into long long variables. I use __int128 data type to avoid overflow issues. So if you get “Wrong Answer”, it is most likely due to overflow.

Complexity: $O(n \times log(L))$

# Conclusion

With this, we are done with Weak Form of Chinese Remainder Theorem. We can now find a solution to congruence equations when the moduli are co-prime. On next post, we will see a stronger version of Chinese Remainder Theorem, where the moduli are not co-prime.

Edit: The next post can be found here: Chinese Remainder Theorem Part 2 – Non Coprime Moduli

# Resources

1. Wiki – Chinese Remainder Theorem – https://en.wikipedia.org/wiki/Chinese_remainder_theorem
2. Brilliant.org – Chinese Remainder Theorem – https://brilliant.org/wiki/chinese-remainder-theorem/
3. Cut The Knot – Chinese Remainder Theorem – https://www.cut-the-knot.org/blue/chinese.shtml

# Related Problems

That’s all I could find on CRT. If you know about any other related problem, please feel free to add it as a comment.