123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146 |
- LC-trie implementation notes.
- Node types
- ----------
- leaf
- An end node with data. This has a copy of the relevant key, along
- with 'hlist' with routing table entries sorted by prefix length.
- See struct leaf and struct leaf_info.
- trie node or tnode
- An internal node, holding an array of child (leaf or tnode) pointers,
- indexed through a subset of the key. See Level Compression.
- A few concepts explained
- ------------------------
- Bits (tnode)
- The number of bits in the key segment used for indexing into the
- child array - the "child index". See Level Compression.
- Pos (tnode)
- The position (in the key) of the key segment used for indexing into
- the child array. See Path Compression.
- Path Compression / skipped bits
- Any given tnode is linked to from the child array of its parent, using
- a segment of the key specified by the parent's "pos" and "bits"
- In certain cases, this tnode's own "pos" will not be immediately
- adjacent to the parent (pos+bits), but there will be some bits
- in the key skipped over because they represent a single path with no
- deviations. These "skipped bits" constitute Path Compression.
- Note that the search algorithm will simply skip over these bits when
- searching, making it necessary to save the keys in the leaves to
- verify that they actually do match the key we are searching for.
- Level Compression / child arrays
- the trie is kept level balanced moving, under certain conditions, the
- children of a full child (see "full_children") up one level, so that
- instead of a pure binary tree, each internal node ("tnode") may
- contain an arbitrarily large array of links to several children.
- Conversely, a tnode with a mostly empty child array (see empty_children)
- may be "halved", having some of its children moved downwards one level,
- in order to avoid ever-increasing child arrays.
- empty_children
- the number of positions in the child array of a given tnode that are
- NULL.
- full_children
- the number of children of a given tnode that aren't path compressed.
- (in other words, they aren't NULL or leaves and their "pos" is equal
- to this tnode's "pos"+"bits").
- (The word "full" here is used more in the sense of "complete" than
- as the opposite of "empty", which might be a tad confusing.)
- Comments
- ---------
- We have tried to keep the structure of the code as close to fib_hash as
- possible to allow verification and help up reviewing.
- fib_find_node()
- A good start for understanding this code. This function implements a
- straightforward trie lookup.
- fib_insert_node()
- Inserts a new leaf node in the trie. This is bit more complicated than
- fib_find_node(). Inserting a new node means we might have to run the
- level compression algorithm on part of the trie.
- trie_leaf_remove()
- Looks up a key, deletes it and runs the level compression algorithm.
- trie_rebalance()
- The key function for the dynamic trie after any change in the trie
- it is run to optimize and reorganize. It will walk the trie upwards
- towards the root from a given tnode, doing a resize() at each step
- to implement level compression.
- resize()
- Analyzes a tnode and optimizes the child array size by either inflating
- or shrinking it repeatedly until it fulfills the criteria for optimal
- level compression. This part follows the original paper pretty closely
- and there may be some room for experimentation here.
- inflate()
- Doubles the size of the child array within a tnode. Used by resize().
- halve()
- Halves the size of the child array within a tnode - the inverse of
- inflate(). Used by resize();
- fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
- The route manipulation functions. Should conform pretty closely to the
- corresponding functions in fib_hash.
- fn_trie_flush()
- This walks the full trie (using nextleaf()) and searches for empty
- leaves which have to be removed.
- fn_trie_dump()
- Dumps the routing table ordered by prefix length. This is somewhat
- slower than the corresponding fib_hash function, as we have to walk the
- entire trie for each prefix length. In comparison, fib_hash is organized
- as one "zone"/hash per prefix length.
- Locking
- -------
- fib_lock is used for an RW-lock in the same way that this is done in fib_hash.
- However, the functions are somewhat separated for other possible locking
- scenarios. It might conceivably be possible to run trie_rebalance via RCU
- to avoid read_lock in the fn_trie_lookup() function.
- Main lookup mechanism
- ---------------------
- fn_trie_lookup() is the main lookup function.
- The lookup is in its simplest form just like fib_find_node(). We descend the
- trie, key segment by key segment, until we find a leaf. check_leaf() does
- the fib_semantic_match in the leaf's sorted prefix hlist.
- If we find a match, we are done.
- If we don't find a match, we enter prefix matching mode. The prefix length,
- starting out at the same as the key length, is reduced one step at a time,
- and we backtrack upwards through the trie trying to find a longest matching
- prefix. The goal is always to reach a leaf and get a positive result from the
- fib_semantic_match mechanism.
- Inside each tnode, the search for longest matching prefix consists of searching
- through the child array, chopping off (zeroing) the least significant "1" of
- the child index until we find a match or the child index consists of nothing but
- zeros.
- At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
- chop off part of the key in order to find the longest matching prefix.
- At this point we will repeatedly descend subtries to look for a match, and there
- are some optimizations available that can provide us with "shortcuts" to avoid
- descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
- To alleviate any doubts about the correctness of the route selection process,
- a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
- gives userland access to fib_lookup().
|