The Lowest Common Ancestor in a Binary Tree
Problem: Given a binary tree, and two different nodes p
and q
in the tree, find the lowest node which has both p
and q
as descendants.
Constraints:
- A value of a node in a tree is between -10^5 and 10^5.
- All values in a tree are unique.
- The number of nodes in a tree is between 2 and 10^4
p
andq
are different nodes.- The tree is guaranteed to contain
p
andq
. - The tree,
p
andq
are not empty.
For example, the lowest common ancestor of 3 and 6 in the following tree is 0.
2
/ \
1 0
/ \ / \
5 8 9 6
/ \
4 3
Table of contents
Initial Thought Process
Let’s start with an observation. Since the question states that the given tree always contains the nodes p
and q
, the root of the tree is always a common denominator. Generalizing this observation, if a subtree contains both p
and q
, the root of that subtree is a common denominator. The question is about finding the deepest of such a subtree.
This observation leads us to try to walk the tree using depth-first search (DFS), because we are interested in finding the deepest subtree that contains both p
and q
. If we find such a subtree, we can keep bubbling up the result and return it eventually.
What kind of information do we need to return after walking the subtrees during a depth-first search? Firstly, we need to return a lowest common denominator node, if one is already found, so that it can be bubbled up to the top. Secondly, we need to know whether p
or q
has been seen in the subtree. The reason is that, if a lowest common denominator hasn’t been found in either of the children, we would want to check whether the node itself is a lowest common denominator.
Let us define a class to hold this information. Our DFS algorithm could return an instance of this class in each recursive step.
|
|
And, of course, we need to model the tree data structure. From the question and its constraints, we know that a tree is a binary tree in which a node holds an integer value. In an interview setting, you should be prepared to clarify the details such as this.
|
|
Lets define two methods, findLCA
and dfs
. The reason is that the question requires us to return a TreeNode
and our DFS algorithm will return Solution
. We can answer the question using findLCA
which will call the dfs
helper method.
|
|
Let’s implement our depth-first search in the dfs
method:
|
|
First of all, the base case is when the current node is null
. In that case we return Result(false, false, null)
, indicating that neither p
nor q
is seen, and the lowest common ancestor is not found. Then, we validate that the value of the node is between the constraint specified the question. We can do other kinds of validations by walking the tree, but let’s not go into the implementation here. In an interview setting, you can explain that you will revisit the idea later as an improvement.
Then, we walk the subtrees first, and examine the root of the tree (“post-order traversal”) because we need to know the results of the subtrees first before we can conclude whether the root is a lowest common ancestor. After walking the subtrees, we can determine whether p
or q
is seen in the subtree rooted in the current node (pSeen
, qSeen
). As for the lowest common ancestor, we bubble it up if one was already found in the left or right subtrees. Or, we make the current root the lowest common ancestor if one has not already been found, and p
and q
are seen in the subtree rooted in the current root.
Let’s call the dfs
helper method from the main method to return the lowest common ancestor.
|
|
Complexity
Let N
be the number of nodes in the tree.
- The time complexity is
O(N)
because the algorithm needs to visit all nodes in the tree once. - The space complexity is
O(N)
because the call stack of the DFS algorithm will grow proportionally toN
in case a tree is completely skewed.
Simplification (Without an Extra Class)
Since the given tree is guaranteed to have p
and q
, it is possible to remove the helper method. The intuition is as follows:
- If the current node is
null
, returnnull
(Base case). - If both left and right subtrees return a node, return the current node. It must be the lowest common ancestor.
- Otherwise, if the current node is either
p
orq
, return the current node. Condition (2) will eventually use this information. - Otherwise, if one of left or right subtrees return a node, return it. Condition (2) will eventually use this information.
Note that this strategy only works because the given tree is guaranteed to contain both p
and q
. Suppose that the tree only contained p
, but not q
. Then, it will return p
as the final answer and will be incorrect.
This strategy leads to the following implementation.
|
|
While the solution is indeed simpler without an extra helper method, a trade-off is that it is not practical to implement some validation. For example, if we wanted to validate that p
and q
are not equal, it would be awkward to do that in the findLCA
method itself because we will be doing that for every recursion. For this reason, in real life, it is more practical to have a separate entry point and a helper.
|
|
Complexity
The time and space complexity remain unchanged from the initial implementation.
- Time complexity:
O(N)
- Space complexity:
O(N)
Additional validation
The question specifies a number of constraints, and it is a important to handle invalid inputs. Let’s walk the tree once in the beginning to validate that the input tree satisfies the constraints.
|
|
Although we walk the tree an additional time, and allocate a new Set
data structure, the time and space complexity remain unchanged.
Follow-ups
What if the tree might not contain p or q?
If the tree is not guaranteed to contain p
or q
, we need to be more explicit about what we return in a DFS step. Specifically, we need to follow the original approach of defining an extra class Result
to keep track of pSeen
, qSeen
, and lca
from the subtrees. If we follow the approach in the “Simplification” section, the algorithm will return p
as an answer if q
does not exist in a tree.
What If the Tree May Contain Duplicates?
When the tree contains duplicates, both the original and simplified approaches work fine. In both approaches, we keep track of the first lowest common ancestor we find and bubble it up. If another common ancestor is found higher up in the tree, we will not consider it as the answer.
Lessons Learned
- Create a data class to keep track of multiple states in recursion.
- You might be able to simplify the solution by eliminating the need to define a data class based on some assumptions. Clarify the assumptions and define a strategy with some bullet points.
- From the beginning of the interview, clarify the constraints of the tree and model the data structure and validation accordingly.