The Lowest Common Ancestor in a Binary Tree

2024-05-25

Problem: Given a binary tree, and two different nodes p and q in the tree, find the lowest node which has both p and q as descendants.

Constraints:

A value of a node in a tree is between -10^5 and 10^5.
All values in a tree are unique.
The number of nodes in a tree is between 2 and 10^4
p and q are different nodes.
The tree is guaranteed to contain p and q.
The tree, p and q are not empty.

For example, the lowest common ancestor of 3 and 6 in the following tree is 0.

Table of contents

Initial Thought Process

Let’s start with an observation. Since the question states that the given tree always contains the nodes p and q, the root of the tree is always a common denominator. Generalizing this observation, if a subtree contains both p and q, the root of that subtree is a common denominator. The question is about finding the deepest of such a subtree.

This observation leads us to try to walk the tree using depth-first search (DFS), because we are interested in finding the deepest subtree that contains both p and q. If we find such a subtree, we can keep bubbling up the result and return it eventually.

What kind of information do we need to return after walking the subtrees during a depth-first search? Firstly, we need to return a lowest common denominator node, if one is already found, so that it can be bubbled up to the top. Secondly, we need to know whether p or q has been seen in the subtree. The reason is that, if a lowest common denominator hasn’t been found in either of the children, we would want to check whether the node itself is a lowest common denominator.

Let us define a class to hold this information. Our DFS algorithm could return an instance of this class in each recursive step.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class Solution {
    private class Result {
        public boolean pSeen;
        public boolean qSeen;
        public TreeNode lca; // Lowest common ancestor

        public Result(boolean pSeen, boolean qSeen, TreeNode lca) {
            this.pSeen = pSeen;
            this.qSeen = qSeen;
            this.lca = lca;
        }
    }
}

And, of course, we need to model the tree data structure. From the question and its constraints, we know that a tree is a binary tree in which a node holds an integer value. In an interview setting, you should be prepared to clarify the details such as this.

1
2
3
4
5
private class TreeNode {
    int val;
    TreeNode left;
    TreeNode right;
}

Lets define two methods, findLCA and dfs. The reason is that the question requires us to return a TreeNode and our DFS algorithm will return Solution. We can answer the question using findLCA which will call the dfs helper method.

1
2
3
4
5
6
7
8
9
class Solution {
    public TreeNode findLCA(TreeNode root, TreeNode p, TreeNode q) {
        // Call DFS
    }

    private Result dfs(TreeNode root, TreeNode p, TreeNode q) {
        // Recursively walk the tree
    }
}

Let’s implement our depth-first search in the dfs method:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class Solution {
    private Result dfs(TreeNode root, TreeNode p, TreeNode q) {
        // Base case
        if (root == null) {
            return new Result(false, false, null);
        }

        // Validation
        if (root.val >= 10e9 || root.val <= -10e9) {
            throw new IllegalArgumentException("Invalid tree value");
        }

        // Walk the subtrees
        Result leftResult = dfs(root.left, p, q);
        Result rightResult = dfs(root.right, p, q);

        // Bubble up the result
        boolean pSeen = root.val == p.val || leftResult.pSeen || rightResult.pSeen;
        boolean qSeen = root.val == q.val || leftResult.qSeen || rightResult.qSeen;

        TreeNode lca = null;
        if (leftResult.lca != null) {
            lca = leftResult.lca;
        } else if (rightResult.lca != null) {
            lca = rightResult.lca;
        } else if (pSeen && qSeen) {
            lca = root;
        }

        return new Result(
            pSeen, qSeen, lca
        );
    }
}

First of all, the base case is when the current node is null. In that case we return Result(false, false, null), indicating that neither p nor q is seen, and the lowest common ancestor is not found. Then, we validate that the value of the node is between the constraint specified the question. We can do other kinds of validations by walking the tree, but let’s not go into the implementation here. In an interview setting, you can explain that you will revisit the idea later as an improvement.

Then, we walk the subtrees first, and examine the root of the tree (“post-order traversal”) because we need to know the results of the subtrees first before we can conclude whether the root is a lowest common ancestor. After walking the subtrees, we can determine whether p or q is seen in the subtree rooted in the current node (pSeen, qSeen). As for the lowest common ancestor, we bubble it up if one was already found in the left or right subtrees. Or, we make the current root the lowest common ancestor if one has not already been found, and p and q are seen in the subtree rooted in the current root.

Let’s call the dfs helper method from the main method to return the lowest common ancestor.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
class Solution
    public TreeNode findLCA(TreeNode root, TreeNode p, TreeNode q) {
        // Validations
        if (root == null) {
            throw new IllegalArgumentException("The tree cannot be empty.");
        }
        if (p == q) {
            throw new IllegalArgumentException(
                "Nodes p and q cannot be the same."
            );
        }

        Result result = dfs(root, p, q);
        if (result.lca == null) {
            throw new IllegalArgumentException(
                "Unexpected: there must be a lowest common ancestor."
            );
        }

        return result.lca;
    }
}

Complexity

Let N be the number of nodes in the tree.

The time complexity is O(N) because the algorithm needs to visit all nodes in the tree once.
The space complexity is O(N) because the call stack of the DFS algorithm will grow proportionally to N in case a tree is completely skewed.

Simplification (Without an Extra Class)

Since the given tree is guaranteed to have p and q, it is possible to remove the helper method. The intuition is as follows:

If the current node is null, return null (Base case).
If both left and right subtrees return a node, return the current node. It must be the lowest common ancestor.
Otherwise, if the current node is either p or q, return the current node. Condition (2) will eventually use this information.
Otherwise, if one of left or right subtrees return a node, return it. Condition (2) will eventually use this information.

Note that this strategy only works because the given tree is guaranteed to contain both p and q. Suppose that the tree only contained p, but not q. Then, it will return p as the final answer and will be incorrect.

This strategy leads to the following implementation.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution {
    public TreeNode findLCA(TreeNode root, TreeNode p, TreeNode q) {
        // (Condition 1) Base case
        if (root == null) {
            return null;
        }

        TreeNode left = findLCA(root.left, p, q);
        TreeNode right = findLCA(root.right, p, q);

        // (Condition 2) Lowest common ancestor
        if (left != null && right != null) {
            return root;
        }

        // (Condition 3) The current root is p or q.
        if (root.val == p.val || root.val == q.val) {
            return root;
        }

        // (Condition 4) Bubble up any returned nodes.
        if (left != null) {
            return left;
        }

        return right;
    }
}

While the solution is indeed simpler without an extra helper method, a trade-off is that it is not practical to implement some validation. For example, if we wanted to validate that p and q are not equal, it would be awkward to do that in the findLCA method itself because we will be doing that for every recursion. For this reason, in real life, it is more practical to have a separate entry point and a helper.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
class Solution {
    public TreeNode findLCA(TreeNode root, TreeNode p, TreeNode q) {
        if (root == null) {
            throw new IllegalArgumentException("The tree cannot be empty.");
        }
        if (p == q) {
            throw new IllegalArgumentException(
                "Nodes p and q cannot be the same."
            );
        }

        TreeNode lca = dfs(root, p, q);
        if (lca == null) {
            throw new IllegalArgumentException(
                "Unexpected: there must be a lowest common ancestor."
            );
        }

        return lca;
    }

    private TreeNode dfs(TreeNode root, TreeNode p, TreeNode q) {
        // (Condition 1) Base case
        if (root == null) {
            return null;
        }

        TreeNode left = dfs(root.left, p, q);
        TreeNode right = dfs(root.right, p, q);

        // (Condition 2) Lowest common ancestor
        if (left != null && right != null) {
            return root;
        }

        // (Condition 3) The current root is p or q.
        if (root.val == p.val || root.val == q.val) {
            return root;
        }

        // (Condition 4) Bubble up any returned nodes.
        if (left != null) {
            return left;
        }

        return right;
    }
}

Complexity

The time and space complexity remain unchanged from the initial implementation.

Time complexity: O(N)
Space complexity: O(N)

Additional validation

The question specifies a number of constraints, and it is a important to handle invalid inputs. Let’s walk the tree once in the beginning to validate that the input tree satisfies the constraints.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class Solution {
    private void validateTree(TreeNode root) {
        if (root == null) {
            throw new IllegalArgumentException("The tree cannot be empty.");
        }

        Set<Integer> seen = new HashSet();
        validateTreeHelper(root, seen);

        if (seen.length() > 10e4 || seen.length() < 2) {
            throw new IllegalArgumentException(
                "The tree has an invalid number of nodes."
            );
        }
    }

    private void validateTreeHelper(TreeNode root, Set<Integer> seen) {
        if (root == null) {
            return;
        }

        if (root.val >= 10e9 || root.val <= -10e9) {
            throw new IllegalArgumentException("Invalid tree value");
        }

        if (seen.contains(root.val)) {
            throw new IllegalArgumentException(
                String.format(
                    "The value must be unique but a dulpicate exists: '%d'.",
                    root.val
                )
            );
        }

    }

    public TreeNode findLCA(TreeNode root, TreeNode p, TreeNode q) {
        validateTree(root);
        // ...
    }
}

Although we walk the tree an additional time, and allocate a new Set data structure, the time and space complexity remain unchanged.

Follow-ups

What if the tree might not contain p or q?

If the tree is not guaranteed to contain p or q, we need to be more explicit about what we return in a DFS step. Specifically, we need to follow the original approach of defining an extra class Result to keep track of pSeen, qSeen, and lca from the subtrees. If we follow the approach in the “Simplification” section, the algorithm will return p as an answer if q does not exist in a tree.

What If the Tree May Contain Duplicates?

When the tree contains duplicates, both the original and simplified approaches work fine. In both approaches, we keep track of the first lowest common ancestor we find and bubble it up. If another common ancestor is found higher up in the tree, we will not consider it as the answer.

Lessons Learned

Create a data class to keep track of multiple states in recursion.
You might be able to simplify the solution by eliminating the need to define a data class based on some assumptions. Clarify the assumptions and define a strategy with some bullet points.
From the beginning of the interview, clarify the constraints of the tree and model the data structure and validation accordingly.