Number of Islands

2024-06-15

Problem: Given a 2-dimensional array with a dimension m * n, where each cell is either 0 (water) or 1 (land), find the number of components of vertically and horizontally connected 1s (island).

For example, given the following array, the solution must return 3 because there are 3 ‘islands.’

[
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1]
]

Constraints:

1 <= m, n <= 500
A cell in the given array can either be ‘0’ or ‘1’.
You can assume that the edge of the given matrix is surrounded by 0.

Table of contents

Initial Thought Process

To count the number of islands, we need to be able to find islands. Therefore, let us start by thinking about how we can find an island. An island is defined by a connected component of 1s in vertical and horizontal directions. So, starting from a cell of 1, we can keep visiting its neighbors and the their neighbors recursively as long as they are 1. At the end, we will have traversed all connected 1s from the starting point, which forms an island by our definition. This process hints at a depth-first traversal of a graph.

While performing a depth-first search (DFS), we will need to remember the cells that we have already visited for two reasons. Firstly, the DFS will loop if it does not know it has already visited a cell. It will move to a neighbor, move back to the starting cell, move to the neighbor, and so on. Secondly, when counting the number of islands, we will want to avoid double-counting. For the DFS, we will fix a starting cell and walk the island that it belongs. If in the future, we fix another starting cell that is a part of the island we already discovered, we will double count the island.

This observation leads to the following starting point.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class Solution {
    public int numIslands(char[][] grid) {
        int m = grid.length;
        int n = grid[0].length;

        boolean[][] visited = new boolean[m][n];
        int ans = 0;

        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                // Fix a starting cell.
                char cell = grid[i][j];

                if (cell == '0' || visited[i][j]) {
                    continue;
                }

                // If the cell has not been visited, and is a land
                // then, walk the island and mark the lands as visited.
                dfs(grid, visited, i, j);
                ans++;
            }
        }

        return ans;
    }
}

Let’s not forget the validation. Below we implement validateGrid method and call it at the beginning of the entry point.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class Solution {
    public int numIslands(char[][] grid) {
        validateGrid(grid);

        // ...
    }

    private void validateGrid(char[][] grid) {
        int m = grid.length;
        if (m < 0 || m > 300) {
            throw new IllegalArgumentException("The length of the grid is not valid.");
        }

        int n = grid[0].length;

        for (int i = 0; i < m; i++) {
            int rowLength = grid[i].length;

            if (rowLength != n) {
                throw new IllegalArgumentException("The input has an inconsistent row length.");
            }

            for (int j = 0; j < rowLength; j++) {
                char cell = grid[i][j];
                if (cell != '0' && cell != '1') {
                    throw new IllegalArgumentException(String.format("A cell has an invalid character '%c'", cell));
                }
            }
        }
    }
}

In each recursive step of the depth-first search, we will mark the current land cell as visited, and walk its horizontal and vertical neighbors. The base cases to terminate the search would be when the cell is out of bound, the cell is water, or the cell has already been visited. In code, this logic looks like the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution {
    private void dfs(char[][] grid, boolean[][] visited, int i, int j) {
        int m = grid.length;
        int n = grid[0].length;

        // Out of bound
        if (i < 0 || i >= m || j < 0 || j >= n) {
            return;
        }

        char cell = grid[i][j];

        // If the cell is a water, or has already been visited
        // there is no need to continue.
        if (cell == '0' || visited[i][j]) {
            return;
        }

        // Mark as visited
        visited[i][j] = true;

        // Traverse the grid 4-ways. 
        dfs(grid, visited, i - 1, j);
        dfs(grid, visited, i + 1, j);
        dfs(grid, visited, i, j - 1);
        dfs(grid, visited, i, j + 1);
    }
}

Complexity

Let M be the number of rows in the matrix, and N be the number of columns in the matrix. Then, the time complexity of the algorithm is O(M * N) because we walk each cell in the matrix a constant number of times. The space complexity is O(M * N) because because we allocate an array whose size is the same as the input array, in order to keep track of the visited cells.

Alternative Solution (Dynamic Connectivity)

In a more general sense, determining the number of islands can be seen as a “dynamic connectivity problem” which is about designing a data structure that dynamically maintains information about connected components. And there exists a data structure called Union-Find to maintain the connected components in a graph as the connectivity information changes.

While implementing the entire Union-Find data structure might be unwieldy during an interview setting with a limited time, it could be helpful to describe your thought process in how you would apply such a data structure to solve the problem. Besides, once you understand the inner workings of Union-Find, it is entirely possible to derive it from the scratch.

Union-Find

The idea behind the Union-Find data structure is to maintain an array, id, to keep track of the component that each cell belongs to. id[i] points to its parent in a tree a root points to itself (id[i] == i). Each component is identified by the root. Suppose we have two cells, c1 and c2. If they have the same root, then they belong to the same component. When we connect two cells, we can update the id of one cell to the root of another cell.

This leads to the following skeleton for UnionFind class. It will support three main APIs to solve the problem. Firstly, union will connect two different cells. We will use this API to connect two lands together as we walk through the matrix. Secondly, there will be a find API to find the root of a cell, so that we can identify the component that a cell belongs to. This will come in handy when union is called. Lastly, we will keep track of the number of connected components and return it via count API.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
class Solution {
    private class UnionFind {
        private int n; // number of rows
        private int m; // number of columns
        private int[] id;
        private int count;

        public UnionFind(char[][] grid) {
            this.n = grid.length;
            this.m = grid[0].length;
            this.count = 0;

            // id[i] is a parent-link that points to the parent in a tree.
            // id[i] == i if i is the root.
            this.id = new int[n * m];

            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    char cell = grid[i][j];

                    if (cell == '1') {
                        count++;

                        int idx = toIndex(i, j);
                        id[idx] = idx;
                    }
                }
            }
        }

        private int toIndex(int i, int j) {
            return i * m + j;
        }

        public void union(int p1, int q1, int p2, int q2) {
            // TODO
        }

        public int find(int cell) {
            // TODO
        }

        public int count() {
            return count;
        }
    }
}

The find method can be implemented as follows. It walks the tree upwards by following the parent-link, and returns the root.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Solution {
    private class UnionFind {
        public int find(int cell) {
            int cur = cell;

            // Walk the tree upwards until reaching the root.
            while (cur != id[cur]) {
                cur = id[cur];
            }

            return cur;
        }
    }
}

The union method can be implemented in the following way. First of all, if two cells share the same root, there is nothing to do because they already belong to the same component. If not, we can connect two cells by updating the id of one cell’s root to the other cell’s root. This is equivalent of moving the tree rooted in one cell’s root under the tree rooted in the other cell’s root. Then, we decrease the number of count of components because two separate components have now become one.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
class Solution {
    private class UnionFind {
        public void union(int p1, int q1, int p2, int q2) {
            int cell1 = toIndex(p1, q1);
            int cell2 = toIndex(p2, q2);

            int pRoot = find(cell1);
            int qRoot = find(cell2);

            // If p and q belong to the same component, there is nothing to do.
            if (pRoot == qRoot) {
                return;
            }

            id[pRoot] = qRoot;

            count--;
        }
    }
}

Now that we implemented UnionFind, we are ready to implement the entry point for our solution. Here, we will first initialize the UnionFind data structure. Then, we will visit each cell in the matrix, and connect any neighboring lands if the current cell is a land. At the end, UnionFind instance will have the number of connected components, and we retrieve it by using the count API.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
class Solution {
    public int numIslands(char[][] grid) {
        // Validate the grid. Same as the validation from the DFS solution.
        validateGrid(grid);

        int n = grid.length;
        int m = grid[0].length;
    
        UnionFind uf = new UnionFind(grid);

        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                char c = grid[i][j];

                if (c == '0') {
                    continue;
                }

                unionIfConnected(uf, grid, i, j, i - 1, j);
                unionIfConnected(uf, grid, i, j, i + 1, j);
                unionIfConnected(uf, grid, i, j, i, j - 1);
                unionIfConnected(uf, grid, i, j, i, j + 1);
            }
        }

        return uf.count();
    }

    private void unionIfConnected(
        UnionFind uf, char[][] grid, int p1, int q1, int p2, int q2
    ) {
        if (isLand(grid, p1, q1) && isLand(grid, p2, q2)) {
            uf.union(p1, q1, p2, q2);
        }
    }

    private boolean isLand(char[][] grid, int i, int j) {
        int n = grid.length;
        int m = grid[0].length;

        if (i < 0 || i > n - 1 || j < 0 || j > m - 1) {
            return false;
        }

        return grid[i][j] == '1';
    }
}

Optimization

Two improvements are possible for our UnionFind data structure. First improvement is called “path compression.” Ideally, we would like all the nodes to link directly to their root. The reason is that the runtime of union operation could degenerate into a quadratic one, if the height of tree represented by the parent-link structure in id keeps growing by 1. In that case, find operation will touch the nodes in the tree 1 + 2 + ... + n times if we call union n number of times, leading to a quadratic number of operations.

The path compression can be implemented in one line of code in the find API. As we walk the tree, we can update the parent-link pointer to the parent of a parent.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
class Solution {
    private class UnionFind {
        public int find(int cell) {
            int cur = cell;

            // Walk the tree upwards until reaching the root.
            while (cur != id[cur]) {
                // Path compression: if we are not at the root, point to the parent.
                id[cur] = id[id[cur]];
                cur = id[cur];
            }

            return cur;
        }
    }
}

Another idea is called “union-by-rank.” The idea is to avoid growing the height of the tree during union, by bringing a tree with a smaller height under a tree with larger height. We will keep track of the height of each tree in rank array. In union, we can check the rank array for the height of a tree before moving a tree.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class Solution {
    private class UnionFind {
        private int[] rank;

        public void union(int p1, int q1, int p2, int q2) {
            int cell1 = toIndex(p1, q1);
            int cell2 = toIndex(p2, q2);

            int pRoot = find(cell1);
            int qRoot = find(cell2);

            // If p and q belong to the same component, there is nothing to do.
            if (pRoot == qRoot) {
                return;
            }

            // Move smaller tree under the larger tree.
            if (rank[pRoot] < rank[qRoot]) {
                id[pRoot] = qRoot;
            } else if (rank[pRoot] > rank[qRoot]) {
                id[qRoot] = pRoot;
            } else {   
                id[pRoot] = qRoot;
                rank[pRoot]++;
            }

            count--;
        }
    }
}

Complexity

Analyzing the time complexity of Union-Find data structure is beyond the scope of this writing. But it is nearly close to amortized constant time, with “path compression” and “union-by-rank” optimizations. The space complexity of the solution is O(M * N) where M is the number of rows and N is the number of columns, because UnionFind allocates space proportionally to M * N.

Lessons Learned

Try to visualize a problem in a graph even though the question is not explicitly stated in terms of a graph.
For questions related to connectivity, it might be helpful to walk the cells using graph traversal.