Trees
Trees and tree traversals

- When you think about it, all data structures are variations of graphs or trees.
- Trees provide the basic structure for the implementation of essential software :
- A file system is a tree.
- The DOM is a tree.
- Language compilers use trees (AST) to parse the source code.
- A tree is a set of interconnected nodes with the following features :
- Each node holds a reference to a value.
- Each node holds a reference to its child nodes (for instance, an array of nodes to which it is connected).
- The connection between a node and its child nodes is called an edge.
- Each tree has a root node, ie. a node that is not referenced by any other node.
Note : The root node value is usually relevant to the organization of the tree (smallest or median value referenced by a node, etc).
-
- Internal node : a node that has child nodes.
- Leaf node : a node that has no child nodes.
- Branching factor : the number of child nodes for a node.
- Node depth : number of edges from the root node to a given node.
- Tree height : maximum possible number of edges from the root node to any leaf node.
-
Specific types of trees :
- Balanced tree : a tree in which all leaf nodes have the same depth.
- General tree : a tree in which each node has zero or more child nodes.
- Binary tree : a tree in which each node has at most 2 child nodes.
- Binary search tree : a binary tree in which nodes follow a specific order.
Notes :
- In the event the tree only has 1 node, the root node is a leaf node.
- If the tree branching factor is not equal for all nodes, an average branching factor can be calculated.
- When the maximum number of child nodes is known (ex. binary trees), a node can hold a dedicated references to each child node instead of using an array.
- Trees algorithms are usually more efficient when the input is a balanced tree, so keeping a tree balanced is a subject in itself.
- In some trees, nodes also maintain a parent reference (think of it as the tree equivalent of a doubly linked list)
- A tree traversal is the visit of every node in a tree and (optionally) the processing of node values.
- Traversals are the most basic operation that can be performed on trees and are usually done using recursion.
- Tree traversals have a complexity of
O(n)
.
- This is a type of traversal that uses recursion to explore the tree as deep as possible along each branch before backtracking.
- Different types of depth-first traversals exist, each changing the order in which the nodes are visited.
- PRE order traversal : visiting of the node is done before the recursive calls (the root node is visited first).
- IN order traversal : visiting of the node is done between the recursive calls (the root node is visited mid).
- POST order traversal : visiting of the node is done after the recursive calls (the root node is visited last).
Notes :
- Since a binary tree has a branching factor of at most 2, a traversal function will require 2 recursive calls at each step.
- As a result, the IN order traversal is only possible for binary trees and not for general trees.
- This is a type of traversal that uses successive iterations to explore the tree on a per-level basis, starting from the root node.
- This type of traversal relies on an additional data structure (FIFO queue) to achieve the desired results.
- Traversal of a tree level can be performed left to right or right to left.
- DFS preserves the SHAPE of the traversal.
- BFS DOES NOT preserve the SHAPE of the traversal.
- As a result, DFS is the preferred traversal algorithm to use for trees comparison.