About Nested Set Trees
When I learned programming, I was intensively taught how to deal with data structures in programming languages like C, C++, Ada, Modula, … When I tried to represent hierachical structures (tree structures) with databases, I realized that different representations and algorithms are more appropriate.
A web search reveiled an interesting approach to represent tree structures with databases which is based on nested sets. The basic idea is that each subtree is recursively represented as a subset of it’s ancestor node. Each set is represented by a pair of integer values: the left (lower) value and the right (upper) value. The correct numbering is achieved by incrementally tagging the nodes while traversion the tree in modified preorder. See the image at the right for an example, and Gijs Van Tulder’s article for a more detailed description.
|Advantages of Nested Set Model||Drawbacks of Nested Set Model|
- Author: Rolf Brugger (my web site)
- Contributors: Nick L�thi
- License: the nstrees library is available under the terms fo the GNU public license GPL.
- Download:Version 0.02 of April ’05 (10kB)
- new function nstBreadcrumbsString contributed by Nick L�thi
- new function nstIsRoot
Version 0.01 of March ’04 (9kB)
About this Implementation nstrees
How to use nstrees
There are two methods, how database records of a table T can be organized in trees using nstrees:
- Extending the table T with two colums of type integer for the left and right values. This approach is usually more effient, because it requires less database accesses.
- Creating a new table with 3 attributes: two integer columns for the left and right values, and one for the id that relates to the id which is a unique key in the table T. This approach is usually more flexible.
A tree is represented by a tree handle. This is an associative array where the necessary informations are stored to handle a tree: The attribute names for the left and right values, and the table name, where they are stored. The tree handle must be passed as first argument to all functions in the nstress library.
Here’s a typical initialization code fragment where a tree handle $thandle is created:
include "nstrees.php"; $dbh = mysql_connect("db-host", "user-name", "your-passwd"); $thandle['table'] = "tree1"; $thandle['lvalname'] = "lft"; $thandle['rvalname'] = "rgt";
A node is represented as an associative array with the left and right values of a tree node. In order to use the library, it is not necessary to know the internal representation of a node datastructure. ( If you insist to know it: it’s an associative array with the left (key:’l’) and right (key:’r’) value of the referenced node.)
The functions do not have hidden variables or side effects. You can simultaneously manage an arbitrary number of trees.
The names of the functions should fairly well describe their behaviour. To descibe the neigborhood relations of tree nodes, typical terms are used. The image at the right shows the relation terms from the perspective of the blue node.
Tree Constructors and Destructors
These functions create a new node. They all return a reference to the new node.
function nstNewRoot ($thandle, $othercols) function nstNewFirstChild ($thandle, $node, $othercols) function nstNewLastChild ($thandle, $node, $othercols) function nstNewPrevSibling ($thandle, $node, $othercols) function nstNewNextSibling ($thandle, $node, $othercols)
These functions delete a single node or an entire tree. They also remove the deleted node records from the database.
function nstDeleteTree ($thandle) function nstDelete ($thandle, $node)
$root = nstNewRoot($thandle, "id=1,name='root'"); $child = nstNewFirstChild($thandle, $root, "name='2'"); nstNewPrevSibling($thandle, $child2, "name='1'"); nstDelete($thandle, nstGetNodeWhere($thandle, "name='2'") );
Move the node ‘$src’ and all its children (the entire subtree) that it is the next sibling (or previous sibling, or first/last child) of ‘$dst’. All nstMove… functions return the new position of the moved subtree.
function nstMoveToNextSibling ($thandle, $src, $dst) function nstMoveToPrevSibling ($thandle, $src, $dst) function nstMoveToFirstChild ($thandle, $src, $dst) function nstMoveToLastChild ($thandle, $src, $dst)
The following functions return
- a valid node (L and R-value),
- or L=0,R=0 if the result doesn’t exist.
function nstGetNodeWhere ($thandle, $whereclause)
Returns the first node that matches the ‘$whereclause’. The WHERE-caluse can optionally contain ORDER BY or LIMIT clauses too.
function nstGetNodeWhereLeft ($thandle, $leftval) function nstGetNodeWhereRight ($thandle, $rightval) function nstRoot ($thandle) function nstFirstChild ($thandle, $node) function nstLastChild ($thandle, $node) function nstPrevSibling ($thandle, $node) function nstNextSibling ($thandle, $node) function nstAncestor ($thandle, $node)
The following functions return a boolean value:
function nstValidNode ($thandle, $node) function nstHasAncestor ($thandle, $node) function nstHasPrevSibling ($thandle, $node) function nstHasNextSibling ($thandle, $node) function nstHasChildren ($thandle, $node) function nstIsRoot ($thandle, $node) function nstIsLeaf ($thandle, $node) function nstIsChild ($node1, $node2) function nstIsChildOrEqual ($node1, $node2) function nstEqual ($node1, $node2)
The following functions return an integer value:
function nstNbChildren ($thandle, $node) function nstLevel ($thandle, $node)
Initialize preorder walk and returns a walk handle:
function nstWalkPreorder ($thandle, $node)
Move to next node and return attributes of that node
function nstWalkNext($thandle, &$walkhand) function nstWalkAttribute($thandle, $walkhand, $attribute) function nstWalkCurrent($thandle, $walkhand) function nstWalkLevel($thandle, $walkhand)
function nstNodeAttribute ($thandle, $node, $attribute) /* returns the attribute of the specified node */ function nstPrintSubtree ($thandle, $node, $attributes) function nstPrintTree ($thandle, $attributes) function nstBreadcrumbsString ($thandle, $node) /* returns a breadcrums string. Ex: "root > a-node > another-node > current-node" */
Important! Things to Remember
- In general, all node references become invalid after a tree reorganization (insertion/deletion of a node, moving subtrees etc.). The reason is, that tree nodes need to be globally renumbered, even if the tree is modified only locally. Using invalid node references in some functions can cause incoherent left and right values and hence damage an entire tree!As a general rule, all required node references should be refreshed after tree modifications, i.e. using the function nstGetNodeWhere, or using the node references that are returned by the tree modification functions.
SQL Lesson: nested set model by Joe Celko
Homepage of Joe Celko
Article: Storing Hierarchical Data in a Database by Gijs Van Tulder.