Trees

Trees and tree traversals

view on github

Tree data structure

Table of contents

  1. Definition
  2. Tree traversal
  3. Special types of trees

Definition

  • When you think about it, all data structures are variations of graphs or trees.
  • Trees provide the basic structure for the implementation of essential software :
    • A file system is a tree.
    • The DOM is a tree.
    • Language compilers use trees (AST) to parse the source code.
  • A tree is a set of interconnected nodes with the following features :
    • Each node holds a reference to a value.
    • Each node holds a reference to its child nodes (for instance, an array of nodes to which it is connected).
    • The connection between a node and its child nodes is called an edge.
    • Each tree has a root node, ie. a node that is not referenced by any other node.

Note : The root node value is usually relevant to the organization of the tree (smallest or median value referenced by a node, etc).

  • Trees terminology :

    • Internal node : a node that has child nodes.
    • Leaf node : a node that has no child nodes.
    • Branching factor : the number of child nodes for a node.
    • Node depth : number of edges from the root node to a given node.
    • Tree height : maximum possible number of edges from the root node to any leaf node.
  • Specific types of trees :

    • Balanced tree : a tree in which all leaf nodes have the same depth.
    • General tree : a tree in which each node has zero or more child nodes.
    • Binary tree : a tree in which each node has at most 2 child nodes.
    • Binary search tree : a binary tree in which nodes follow a specific order.

Notes :

  • In the event the tree only has 1 node, the root node is a leaf node.
  • If the tree branching factor is not equal for all nodes, an average branching factor can be calculated.
  • When the maximum number of child nodes is known (ex. binary trees), a node can hold a dedicated references to each child node instead of using an array.
  • Trees algorithms are usually more efficient when the input is a balanced tree, so keeping a tree balanced is a subject in itself.
  • In some trees, nodes also maintain a parent reference (think of it as the tree equivalent of a doubly linked list)

Tree traversal

  • A tree traversal is the visit of every node in a tree and (optionally) the processing of node values.
  • Traversals are the most basic operation that can be performed on trees and are usually done using recursion.
  • Tree traversals have a complexity of O(n).

Depth-first traversal (DFS)

  • This is a type of traversal that uses recursion to explore the tree as deep as possible along each branch before backtracking.
  • Different types of depth-first traversals exist, each changing the order in which the nodes are visited.
  • PRE order traversal : visiting of the node is done before the recursive calls (the root node is visited first).
  • IN order traversal : visiting of the node is done between the recursive calls (the root node is visited mid).
  • POST order traversal : visiting of the node is done after the recursive calls (the root node is visited last).

Notes :

  • Since a binary tree has a branching factor of at most 2, a traversal function will require 2 recursive calls at each step.
  • As a result, the IN order traversal is only possible for binary trees and not for general trees.

Breadth-first traversal (BFS)

  • This is a type of traversal that uses successive iterations to explore the tree on a per-level basis, starting from the root node.
  • This type of traversal relies on an additional data structure (FIFO queue) to achieve the desired results.
  • Traversal of a tree level can be performed left to right or right to left.

Comparison of binary trees

  • DFS preserves the SHAPE of the traversal.
  • BFS DOES NOT preserve the SHAPE of the traversal.
  • As a result, DFS is the preferred traversal algorithm to use for trees comparison.

Special types of trees