Number of Islands
Problem: Given a 2-dimensional array with a dimension m * n
, where each cell is either 0
(water) or 1
(land), find the number of components of vertically and horizontally connected 1
s (island).
For example, given the following array, the solution must return 3 because there are 3 ‘islands.’
[
[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]
]
Constraints:
- 1 <= m, n <= 500
- A cell in the given array can either be ‘0’ or ‘1’.
- You can assume that the edge of the given matrix is surrounded by
0
.
Table of contents
Initial Thought Process
To count the number of islands, we need to be able to find islands. Therefore, let us start by thinking about how we can find an island. An island is defined by a connected component of 1
s in vertical and horizontal directions. So, starting from a cell of 1
, we can keep visiting its neighbors and the their neighbors recursively as long as they are 1
. At the end, we will have traversed all connected 1
s from the starting point, which forms an island by our definition. This process hints at a depth-first traversal of a graph.
While performing a depth-first search (DFS), we will need to remember the cells that we have already visited for two reasons. Firstly, the DFS will loop if it does not know it has already visited a cell. It will move to a neighbor, move back to the starting cell, move to the neighbor, and so on. Secondly, when counting the number of islands, we will want to avoid double-counting. For the DFS, we will fix a starting cell and walk the island that it belongs. If in the future, we fix another starting cell that is a part of the island we already discovered, we will double count the island.
This observation leads to the following starting point.
|
|
Let’s not forget the validation. Below we implement validateGrid
method and call it at the beginning of the entry point.
|
|
In each recursive step of the depth-first search, we will mark the current land cell as visited, and walk its horizontal and vertical neighbors. The base cases to terminate the search would be when the cell is out of bound, the cell is water, or the cell has already been visited. In code, this logic looks like the following:
|
|
Complexity
Let M
be the number of rows in the matrix, and N
be the number of columns in the matrix. Then, the time complexity of the algorithm is O(M * N)
because we walk each cell in the matrix a constant number of times. The space complexity is O(M * N)
because because we allocate an array whose size is the same as the input array, in order to keep track of the visited cells.
Alternative Solution (Dynamic Connectivity)
In a more general sense, determining the number of islands can be seen as a “dynamic connectivity problem” which is about designing a data structure that dynamically maintains information about connected components. And there exists a data structure called Union-Find to maintain the connected components in a graph as the connectivity information changes.
While implementing the entire Union-Find data structure might be unwieldy during an interview setting with a limited time, it could be helpful to describe your thought process in how you would apply such a data structure to solve the problem. Besides, once you understand the inner workings of Union-Find, it is entirely possible to derive it from the scratch.
Union-Find
The idea behind the Union-Find data structure is to maintain an array, id
, to keep track of the component that each cell belongs to. id[i]
points to its parent in a tree a root points to itself (id[i] == i
). Each component is identified by the root. Suppose we have two cells, c1
and c2
. If they have the same root, then they belong to the same component. When we connect two cells, we can update the id of one cell to the root of another cell.
This leads to the following skeleton for UnionFind
class. It will support three main APIs to solve the problem. Firstly, union
will connect two different cells. We will use this API to connect two lands together as we walk through the matrix. Secondly, there will be a find
API to find the root of a cell, so that we can identify the component that a cell belongs to. This will come in handy when union
is called. Lastly, we will keep track of the number of connected components and return it via count
API.
|
|
The find
method can be implemented as follows. It walks the tree upwards by following the parent-link, and returns the root.
|
|
The union
method can be implemented in the following way. First of all, if two cells share the same root, there is nothing to do because they already belong to the same component. If not, we can connect two cells by updating the id
of one cell’s root to the other cell’s root. This is equivalent of moving the tree rooted in one cell’s root under the tree rooted in the other cell’s root. Then, we decrease the number of count of components because two separate components have now become one.
|
|
Now that we implemented UnionFind
, we are ready to implement the entry point for our solution. Here, we will first initialize the UnionFind
data structure. Then, we will visit each cell in the matrix, and connect any neighboring lands if the current cell is a land. At the end, UnionFind
instance will have the number of connected components, and we retrieve it by using the count
API.
|
|
Optimization
Two improvements are possible for our UnionFind
data structure. First improvement is called “path compression.” Ideally, we would like all the nodes to link directly to their root. The reason is that the runtime of union
operation could degenerate into a quadratic one, if the height of tree represented by the parent-link structure in id
keeps growing by 1. In that case, find
operation will touch the nodes in the tree 1 + 2 + ... + n
times if we call union
n number of times, leading to a quadratic number of operations.
The path compression can be implemented in one line of code in the find
API. As we walk the tree, we can update the parent-link pointer to the parent of a parent.
|
|
Another idea is called “union-by-rank.” The idea is to avoid growing the height of the tree during union
, by bringing a tree with a smaller height under a tree with larger height. We will keep track of the height of each tree in rank
array. In union
, we can check the rank
array for the height of a tree before moving a tree.
|
|
Complexity
Analyzing the time complexity of Union-Find data structure is beyond the scope of this writing. But it is nearly close to amortized constant time, with “path compression” and “union-by-rank” optimizations. The space complexity of the solution is O(M * N)
where M is the number of rows and N is the number of columns, because UnionFind
allocates space proportionally to M * N
.
Lessons Learned
- Try to visualize a problem in a graph even though the question is not explicitly stated in terms of a graph.
- For questions related to connectivity, it might be helpful to walk the cells using graph traversal.