Menu

Union-Find Data Structure

API and Client

A crucial first step in developing a data structure is to define its public API. For the Union-Find data type, we need to specify the operations it will support.

Public API

public class UF {
    // Initializes a union-find data structure with N objects
    // i.e (0 to N-1)
    UF(int N)
    
    // Adds a connection between p and q
    void union(int p, int q)
    
    // Returns the component identifier for p (0 to N-1)
    int find(int p)
    
    // Returns true if p and q are in the same component
    boolean connected(int p, int q)
}

A 1-line implementation of connected() can be achieved by checking if find(p) is equal to find(q).

Example Client

Here is a sample client that uses the UF data type to solve a dynamic connectivity problem. It reads an integer N and a sequence of pairs of integers, and prints the pairs that are not yet connected.

public static void main(String[] args) {
    int N = StdIn.readInt();
    UF uf = new UF(N);
    
    while (!StdIn.isEmpty()) {
        int p = StdIn.readInt();
        int q = StdIn.readInt();
        
        if (!uf.connected(p, q)) {
            uf.union(p, q);
            StdOut.println(p + " " + q);
        }
    }
}

This client can be tested with a file like tinyUF.txt:

10
4 3
3 8
6 5
9 4
2 1
8 9
5 0
7 2
6 1
1 0
6 7

The Idea of Connected Components

The core idea behind the Union-Find data structure is to keep track of a set of objects partitioned into a number of disjoint (non-overlapping) sets, which we call connected components.

The “is connected to” relation has three key properties:

  • Reflexive: p is connected to p.
  • Symmetric: If p is connected to q, then q is connected to p.
  • Transitive: If p is connected to q and q is connected to r, then p is connected to r.

A connected component is a set of objects where every object in the set is connected to every other object in the set.


Overview

Union−Find is a data structure that supports dynamic connectivity problems. This implementation explores:

  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • Weighted Quick Union with Path Compression

We’ll apply these concepts to solve the percolation problem from physical chemistry.

Quick Find (Eager Approach)

In this approach, p and q are considered connected if and only if id[p] is equal to id[q].

  • Data Structure: An integer array id[] of size N.
  • Interpretation: p and q are in the same component if they have the same ID.

Quick Union (Lazy Approach)

This method is an improvement over Quick Find, designed to make the union operation faster.

  • Data Structure: An integer array id[] of size N.
  • Interpretation: id[i] is the parent of i. The root of a component is an element i where id[i] == i.
  • Find: To find the root of p, we chase the parent pointers until we reach a root.
  • Union: To merge components containing p and q, we set the id of p’s root to the id of q’s root.

Improvement 1: Weighted Quick-Union

To avoid the problem of tall trees in Quick Union, we use a weighted approach.

  • Strategy: Always link the root of the smaller tree to the root of the larger tree.
  • Data Structure: We maintain an extra array sz[i] to count the number of objects in the tree rooted at i.
  • Implementation:
    int i = root(p);
    int j = root(q);
    if (i == j) return;
    if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } 
    else               { id[j] = i; sz[i] += sz[j]; }
    
  • Proposition: The depth of any node x is at most lg N.

Improvement 2: Path Compression

After finding the root of p, we can flatten the tree by making every other node in the path from p to the root point to its grandparent. This nearly flattens the tree.

Analysis of Tree Depth

The maximum depth of any node x is at most lg N (where lg is log base 2)

Proof

  1. When tree T1 containing x merges with T2:
    • The size of tree containing x at least doubles
    • This occurs because x’s tree is always the smaller one
  2. Depth only increases by 1 when merging with same-size tree

Depth Growth Pattern

Tree Size Maximum Depth
1 0
2 1
4 2
8 3
16 4
32 5
N lg(N)

Performance

Weighted Quick Union with Path Compression is linearithmic: O(N + M lg N)

  • N = number of objects
  • M = number of union-find operations

For detailed proof, see: Princeton’s Algorithms Course