API and Client
A crucial first step in developing a data structure is to define its public API. For the Union-Find data type, we need to specify the operations it will support.
Public API
public class UF {
// Initializes a union-find data structure with N objects
// i.e (0 to N-1)
UF(int N)
// Adds a connection between p and q
void union(int p, int q)
// Returns the component identifier for p (0 to N-1)
int find(int p)
// Returns true if p and q are in the same component
boolean connected(int p, int q)
}
A 1-line implementation of connected() can be achieved by checking if find(p) is equal to find(q).
Example Client
Here is a sample client that uses the UF data type to solve a dynamic connectivity problem. It reads an integer N and a sequence of pairs of integers, and prints the pairs that are not yet connected.
public static void main(String[] args) {
int N = StdIn.readInt();
UF uf = new UF(N);
while (!StdIn.isEmpty()) {
int p = StdIn.readInt();
int q = StdIn.readInt();
if (!uf.connected(p, q)) {
uf.union(p, q);
StdOut.println(p + " " + q);
}
}
}
This client can be tested with a file like tinyUF.txt:
10
4 3
3 8
6 5
9 4
2 1
8 9
5 0
7 2
6 1
1 0
6 7
The Idea of Connected Components
The core idea behind the Union-Find data structure is to keep track of a set of objects partitioned into a number of disjoint (non-overlapping) sets, which we call connected components.
The “is connected to” relation has three key properties:
- Reflexive:
pis connected top. - Symmetric: If
pis connected toq, thenqis connected top. - Transitive: If
pis connected toqandqis connected tor, thenpis connected tor.
A connected component is a set of objects where every object in the set is connected to every other object in the set.
Overview
Union−Find is a data structure that supports dynamic connectivity problems. This implementation explores:
- Quick Find
- Quick Union
- Weighted Quick Union
- Weighted Quick Union with Path Compression
We’ll apply these concepts to solve the percolation problem from physical chemistry.
Quick Find (Eager Approach)
In this approach, p and q are considered connected if and only if id[p] is equal to id[q].
- Data Structure: An integer array
id[]of sizeN. - Interpretation:
pandqare in the same component if they have the same ID.
Quick Union (Lazy Approach)
This method is an improvement over Quick Find, designed to make the union operation faster.
- Data Structure: An integer array
id[]of sizeN. - Interpretation:
id[i]is the parent ofi. The root of a component is an elementiwhereid[i] == i. - Find: To find the root of
p, we chase the parent pointers until we reach a root. - Union: To merge components containing
pandq, we set theidofp’s root to theidofq’s root.
Improvement 1: Weighted Quick-Union
To avoid the problem of tall trees in Quick Union, we use a weighted approach.
- Strategy: Always link the root of the smaller tree to the root of the larger tree.
- Data Structure: We maintain an extra array
sz[i]to count the number of objects in the tree rooted ati. - Implementation:
int i = root(p); int j = root(q); if (i == j) return; if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } else { id[j] = i; sz[i] += sz[j]; } - Proposition: The depth of any node
xis at mostlg N.
Improvement 2: Path Compression
After finding the root of p, we can flatten the tree by making every other node in the path from p to the root point to its grandparent. This nearly flattens the tree.
Analysis of Tree Depth
The maximum depth of any node x is at most lg N (where lg is log base 2)
Proof
- When tree T1 containing x merges with T2:
- The size of tree containing x at least doubles
- This occurs because x’s tree is always the smaller one
- Depth only increases by 1 when merging with same-size tree
Depth Growth Pattern
| Tree Size | Maximum Depth |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 4 | 2 |
| 8 | 3 |
| 16 | 4 |
| 32 | 5 |
| N | lg(N) |
Performance
Weighted Quick Union with Path Compression is linearithmic: O(N + M lg N)
- N = number of objects
- M = number of union-find operations
For detailed proof, see: Princeton’s Algorithms Course
