NAME

XRCU is a library that provides efficient lock-less synchronization for read-mostly tasks and structures

ABOUT THIS DOCUMENT

This file is meant to document the XRCU library, including internal details and the rationale behind the decisions taken.

This document assumes that the reader is at least somewhat familiar with such concepts as multi-threading, lock-free algorithms and data structures, and read-copy-update (RCU).

ABOUT LOCK FREE ALGORITHMS AND STRUCTURES

The term lock free is used to refer to several different things depending on the author, but the basic idea is that an algorithm or structure is lock free if multiple threads can operate on it without the end result being dependent on said threads' scheduling. That is, it doesn't matter if the involved threads are suspended or preempted; progress is guaranteed.

The usually mentioned advantage of lock freedom is performance. When you are using lock free algorithms, you naturally avoid using things like mutexes, which can have a non-negligible performance impact. On the other hand, other people point out that lock free algorithms are typically harder to understand, that debugging is notoriously difficult, and that the benefits generally do not compensate for these complexities.

The reason why this library was developed is to give other programmers another option when designing high performant, low latency systems. We do not claim that lock freedom is a silver bullet when it comes to making programs faster. However, it's also hard to argue that lock contention tends to be a real bottleneck, and several projects have moved towards lock free structures given the performance benefits they bring (see for example, many kernels).

ABOUT RCU

There is a big problem with using lock free algorithms in programming languages that feature manual memory management, and it has to do with memory reclamation. Since lock freedom implies that multiple threads may operate on a given structure concurrently, it is possible to find ourselves in a situation in which a thread is reading some data, while another one is deleting that same piece of data, potentially freeing the memory associated to it.

As such, lock freedom needs an additional twist in languages like C and C++. We need an additional subsystem that allows us to synchronize reclamation without using heavyweight methods like mutexes (we are developing a lock-free library, after all).

This is where RCU comes in. RCU, short for read-copy-update, is a mechanism that allows multiple threads to read from memory, while postponing reclamations until it is safe to do so (i.e: until no readers are operating on said memory). RCU typically assumes that updates are performed using atomic instructions, and isn't concerned with them (At least, that's the case in this library).

There are several different ways to implement RCU, and the one chosen in this library will be detailed later. For now, it's important to point out that RCU typically makes reading from shared memory a very cheap operation, while imposing more overhead on updates. Thus, it's more suited for tasks in which reads are more frequent than modifications.

It should be noted that RCU is not the only way to manage memory reclamations in a multi-threaded environment, but as its name implies, XRCU chose it because it was deemed the best option. Since our lock-free structures rely on RCU to work properly, we'll describe the RCU interface that this library exposes first.

API design

All public interfaces (functions, types, methods, etc) reside in the namespace xrcu. It's the explicit intent of the authors to mantain compatibility with every interface that was ever exposed. Any breaking changes should be done in different interfaces (either by adding new functions, or by overloading).

The implementation of some template types require additional, internal details, and those usually reside in a namespace appropriately named detail (Said namespace is inside xrcu). It goes without saying that users should not rely on those internal details, and that the authors may freely break compatibility with them.

XRCU uses some thread-specific data in order to function properly. There may be some systems that have issues when dynamically loading libraries that use thread-specific data (I believe Windows Vista was among them). Consult with your working environment's manual to verify that if you notice any problem.

Finally, it should be mentioned that XRCU requires no initialization or cleanup at a library level to function properly. Once the library is installed, you can freely use its API without any further action needed.

Note that XRCU may expose more functions or types that are specified in this document. This is usually done to avoid code bloat. Anything that isn't included in this file should be treated as private and subject to change.

RCU API

  #include <xrcu/xrcu.hpp>

The following section documents the RCU API that is exposed in XRCU. It's used internally quite a bit by other parts of the library, yet it's very useful by itself when implementing other concurrent algorithms or data structures.

RCU critical section management

RCU is based on the concept of critical sections, code fragments that are executed under the guarantee that no reclamation can take place. This allows the user to safely read from shared memory, knowing that it will be valid during the critical section. Writes have to be synchronized independently of RCU, however.

Naturally, RCU critical sections don't prevent all memory reclamation, they only affect the destruction of objects that are derived from a type defined in XRCU, called finalizable. If you wish to use the RCU API, all you need to do is make your types derive from finalizable and use its API, and you'll be set. Any other memory management functions, like malloc and friends are unaffected by RCU as implemented in this library.

Critical sections are managed through the following API:

void enter_cs (void);: Makes the calling thread enter a critical section. Critical sections can be safely nested without problem. As long as we are in one, finalizable objects cannot be reclaimed.
void exit_cs (void);: Exits a critical section. This is generally called after the calling thread is done reading from shared memory, and wants to signal that reclamations should be re-allowed. The effects of calling exit_cs when the calling thread is not in a critical section are undefined.
bool in_cs (void);: Returns true if the calling thread is in a critical section; false otherwise.
bool sync (void);: Waits until all threads are outside pre-existing critical sections, and returns true afterwards. If a deadlock is detected (because the calling thread is in a critical section, for example), this function returns false immediately without waiting.
struct cs_guard: This type is defined such that its constructor calls enter_cs, and its destructor calls exit_cs; it has no internal state. As its name implies, it's useful as a guard to manage entering and exiting a critical section in a way that is exception safe. A few types in this library are derived from cs_guard, such as iterators, since they need to examine potentially many elements from a container without their memory being reclaimed.

RCU finalizable objects

As it was mentioned before, XRCU defines a type called finalizable that is specifically designed to make its destruction safe (i.e: Only once all threads are outside a critical section). This type defines the following interface:

  struct finalizable
    {
      virtual void safe_destroy ();
      virtual ~finalizable ();
    };

Under most circumstances, it's enough for a user-defined type to derive from finalizable and leave it at that. However, the above 2 methods are provided as virtual for customization's sake. When a finalizable object is reclaimed by the RCU subsystem, it will call the safe_destroy method. The default implementation simply calls the object's destructor and frees the memory associated to it. If, for whatever reason, a user wants to override this behaviour, they may do so by extending either of those methods.

In addition, the following API is available when dealing with finalizables:

void finalize (finalizable *F);

Adds the object F to the calling thread's list of pending finalizable objects. Each thread has a limit on the number of pending finalizables. Once that limit is reached, they are scheduled for reclamation, and will be collected once it's safe to do so.

Only a single call to finalize is allowed on a particular object. If this function is called more than once on the same object (Either by the same thread, or another), the behaviour is undefined.

bool flush_finalizers (void);

Schedule all the calling thread's accumulated finalizable objects for reclamation immediately. Returns true if successful, false if a deadlock was detected. Note that this call may block until all threads are outside a critical section (Much as a call to sync would).

Once a thread exits, an implicit call to this function is made.

Miscellaneous functions

These functions don't really belong anywhere else, but they are included in this file for convenience's sake:

    struct atfork
      {
        void (*prepare) (void);
        void (*parent) (void);
        void (*child) (void);
      };

atfork atfork_data ();: Returns a structure that implements the needed callbacks for XRCU to mantain a consistent state across calls to fork. The returned callbacks should be passed to pthread_atfork (or similar). XRCU itself does not install these callbacks, because most calls to fork are followed immediately after exec, and because multi-threaded programs that call fork are rare.
void library_version (int& major, int& minor);: Returns the library version as a pair of major, minor. Useful to assert that a program is using the correct version (at runtime).

Implementation details

There are many ways to implement RCU. In this library, it was fundamental that RCU be implemented in a portable, simple way, even if it meant slightly more overhead. As such, things like signals and OS-specific syscalls were out.

In the end, for XRCU, we decided to use a global registry that uses a stamp to monitor which threads are in critical sections. By using C++ thread_local objects, we avoid the need for explicit registration, so that when a thread first uses the RCU API, it is added automatically to the global registry.

In order to enter a critical section, all a thread has to do is read a global counter, usually bump it by some small value, and then store that value with release semantics in its thread-specific data. Exiting a critical section is almost entirely symmetric (We decrement the thread-specific value), but with a small caveat that will be explained below.

When an object is finalized, it's prepended to a singly-linked list that is also kept in thread-specific storage. Once a certain number of them have been accumulated (specified by the constant XRCU_MAX_FINS), they are scheduled to be reclaimed. However, if the calling thread is inside a critical section at that point, a special flag is set instead, that tells the thread to immediately flush its finalizable objects once it's outside the critical section.

In this implementation, the most expensive operation is undoubtedly sync. It works by locking the global registry, then checking if any thread is in a critical section, and sleeping for short periods of time in case there are. The overhead associated to sync is the main reason why critical sections should be short, and also why finalizable objects are accumulated instead of being reclaimed right away.

Interlude: optionals

    #include <xrcu/optional.hpp>

Before moving on to the topic of containers, we need to take a look at an auxiliary structure implemented to make things easier and more convenient for users. This is the optional type.

An optional is a template type that can hold either an object of type T, or be in an uninitialized state. In other words, the value may or may not be present. Unlike the usage of a pointer which may be null, an optional never performs dynamic memory allocation, as the space required to hold the value is always there, even if it's not being used.

Optional types have been standardized in C++17, and are present in many other libraries as well. However, since XRCU aims to work with the base minimum of C++11 compliant compilers, and because it's designed not to depend on other libraries or frameworks, it contains its own (lightweight) implementation.

Optional API

An optional is defined as such:

    template <class T>
    struct optional
      {
      };

And its public interface is the following:

optional ();: Default constructor. Initializes the optional without a value.
optional (const T& value);: Initializes the optional to contain value.
optional (const optional<T>& other);: Copy constructor. If other has a value, then it initializes the optional to contain that same value. Otherwise, the optional will contain no value.
optional (T&& value);: Move constructor. Initializes the optional by taking ownership of value.
optional (optional<T>&& other);: Move constructor. If other has a value, then the optional is initialized by taking ownership of it. Otherwise, the optional will have no value.
T& operator* ();
const T& operator* () const;: Returns a (possibly const) reference to the optional's value. The results are undefined if the optional does not contain one.
T* operator-> ();
const T* operator-> () const;: Returns a (possibly const) pointer to the optional's value. The results are undefined if the optional does not contain one.
bool has_value () const;: Returns true if the optional has a value.
void reset () const;: If the optional contains a value, call its destructor and make the optional contain no value afterwards. Otherwise, there are no effects.
optional& operator= (const optional<T>& other);: If other has a value, assigns it to the optional. Otherwise, calls reset on the optional and leaves it without a value. Returns *this.
optional& operator= (const T& value);: Assigns value to the optional. Returns *this.
optional& operator= (optional<T>&& other);: If other has a value, assigns it to the optional by moving it. Otherwise, calls reset on the optional and leaves it without a value. Returns *this.
optional& operator= (T&& value);: Assigns value to the optional by moving it. Returns *this.
~optional ();: Destroys the value associated to the optional, if it had any.

Implementation details

Optionals are rather simple: They are implemented by using a flat buffer on which placement new is called. Once an object has been constructed, a pointer is cached for the buffer. This pointer will be used when fetching the value, or destroying the object (it's null for empty optionals).

Optionals are very useful when performing lookups in mapped containers, since they allow us to bypass the need of additional output parameters to determine if a search was successful in an atomic way.

Another reason as to why optional values are used so extensively in XRCU is because we consider them to be the best way to signal both a valid object, as well as an error. When using lock-less data structures, it's impossible to determine beforehand whether an error will occur when calling a method (because multithreading makes things non deterministic). Throwing an exception was considered a bit too "punishing" - it's not precisely a programming error, after all.

Stacks

  #include <xrcu/stack.hpp>

Stacks are the simplest of the lock-free data structures: They represent a basic LIFO container on which you can push and pop items at the back of the stack. Although their functionality is a bit more limited than other structures, they are still very useful to implement things like atomic free lists.

In XRCU, stacks meet the requirements for the C++ concept of Container.

Stack API

Stacks are templated types that can be instantiated with a type T:

  template <class T>
  struct stack
    {
      typedef T value_type;
      typedef T& reference;
      typedef const T& const_reference;
      typedef T* pointer;
      typedef const T* const_pointer;
      typedef ptrdiff_t difference_type;
      typedef size_t size_type;

      struct iterator;
      struct const_iterator;
    };

In C++'s parlance, a stack iterator is a forward iterator, and it may be used in any function that accepts them.

A stack iterator is a cs_guard, which means that an implicit critical section is entered when one is created, and is extended until its destruction.

The following describes the public interface for stack<T>

stack ();: Default constructor. Initializes a stack to be empty.
template <class Integer> stack (Integer n, T value);: Initializes a stack to contain n times value.
template <class Iter> stack (Iter first, Iter last);: Initializes a stack with the values in the range [first, last)
stack (std::initializer_list<T> lst);: Initializes a stack with the values in lst.
stack (const stack<T>& other);: Copy constructor. Initializes a stack with the values in other.
stack (stack<T>&& other);: Move constructor. Takes ownership of the values in other.
void push (const T& value);: Pushes value to the top of the stack.
template <class Iter> void push (Iter first, Iter last): Pushes the values in the range [first, last) to the stack.
template <class Integer> void push (Integer n, T value): Pushes n times value to the stack.
template<class ...Args> void emplace (Args&& ...args): Same as push, only it constructs a new item by calling the move constructor with args....
optional<T> pop ();: Removes the item at the top of the stack, and returns an optional with that value. If the stack is empty, the method returns an optional with no value.
optional<T> top ();: Fetches the current item at the top of the stack, and returns an optional with it as its value. If the stack is empty, returns an optional with no value.
size_type size () const;: Returns the current size of the stack.
size_type max_size () const;: Returns the maximum allowed size for the stack.
bool empty () const;: Returns true if the stack is empty (equivalent to size () == 0).
void swap (stack<T>& other);: Swaps the contents of the stack with other, but only if other is not the same object.
stack<T>& operator= (const stack<T>& other);: Assigns the contents of other to the stack and returns *this.
stack<T>& operator= (stack<T>&& other);: Move assignment. Takes ownership of the elements in other. Returns *this.
bool operator== (const stack<T>& other) const;: Compares the elements of the stack with the ones in other. Returns true if they are all equal.
bool operator!= (const stack<T>& other) const;: Compares the elements of the stack with the ones in other. Returns true if any two of them are not equal.
bool operator< (const stack<T>& other) const;
bool operator> (const stack<T>& other) const;
bool operator<= (const stack<T>& other) const;
bool operator>= (const stack<T>& other) const;: Lexicographically compares the elements of the stack with the ones in other, in a way that is equivalent to calling std::lexicographical_compare.
void clear ();: Removes every element from the stack.
template <class T1, class T2> void assign (T1 x, T2 y);: Assigns to the stack the elements described by (x, y). They could be an iterator range, or a pair of (integer, value), as is the case with the stack's constructor.
void assign (std::initializer_list<T> lst);: Assigns to the stack the items in lst.
iterator begin ();
const_iterator cbegin () const;: Returns an iterator to the first element of the stack.
iterator end ();
const_iterator cend () const;: Returns an iterator one past the last element of the stack.
iterator::iterator ();: Default constructor for stack iterators. Leaves the object in an invalid state. Dereferencing or incrementing such an iterator has undefined behavior.
iterator::iterator (const iterator& other);: Copy constructor. Initializes the iterator to be equal to other.
T& iterator::operator* ();
const T& const_iterator::operator* () const;: Returns a (possibly const) reference to the iterator's underlying object.
T* iterator::operator-> ();
const T* const_iterator::operator-> () const;: Returns a (possibly const) pointer to the iterator's underlying object.
iterator& iterator::operator++ ();: Pre-increment operator. Moves the iterator forward, and returns it. If the iterator was at the end of the stack, the results are undefined.
iterator iterator::operator++ (int);: Post-increment operator. Moves the iterator forward, and returns an iterator equal to what it was before incrementing. If the iterator was at the end of the stack, the results are undefined.
bool iterator::operator== (const iterator& other);
bool iterator::operator!= (const iterator& other);: Tests for iterator (in)equality.

Implementation details

There really isn't much to say about the internal details of the stack. It's probably the easiest lock free structure to implement. As with most other implementations, the one in XRCU simply consists of an atomic pointer to the top node. The use of RCU prevents the biggest issue with this design, that is, the ABA problem.

The only noteworthy thing to point out is that the swap method is safe to call from multiple threads as well. In order to achive this, the library uses a special sentinel bit that is temporarily set at the head node when a swap is undergoing. During a swap, push and pop cannot proceed, since they check that this special bit is not set before modifying the stack.

Note that because stack iterators are implicit critical sections, and because of stacks' implementation, iterating a stack will always be safe, even in the presence of operations like push and pop. The only way for an iterator to be invalidated is it's at the beggining of the stack, and a call to pop is made. Even then, dereferencing the iterator is valid, but advancing it will end prematurely, since the object was unlinked from the stack.

Queues

    #include <xrcu/queue.hpp>

This is a multi-producer, multi-consumer, FIFO queue. Much like the stack, elements can be pushed or popped from it, but the order is different: They will be retrieved in the same order they were inserted.

In XRCU, queues meet the requirements for the C++ concept of Container.

Queue API

Queues are templated types, and they may be instantiated with a type T:

  template <class T>
  struct queue
    {
      typedef T value_type;
      typedef T& reference;
      typedef const T& const_reference;
      typedef T* pointer;
      typedef const T* const_pointer;
      typedef ptrdiff_t difference_type;
      typedef size_t size_type;

      struct iterator;
      struct const_iterator;
    };

As with the stack, a queue's iterator can be considered a forward iterator, and it's also a cs_guard, so that a queue may be examined by an iterator, at any time.

The following describes the public interface for queue<T>

queue ();: Default constructor. Initializes a queue to be empty.
template <class Integer> queue (Integer n, T value);: Initializes a queue to contain n times value.
template <class Iter> queue (Iter first, Iter last);: Initializes a queue with the values in the range [first, last).
queue (std::initializer_list<T> lst);: Initializes a queue with the values in lst.
queue (const queue<T>& other);: Copy constructor. Initializes a queue with the values in other.
queue (queue<T>&& other);: Move constructor. Takes ownership of the values in other.
void push (const T& value);: Pushes value to the top of the queue.
template <class ...Args> void emplace (Args&& ...args): Same as push, only the new item is constructed by calling the move constructor with args....
optional<T> pop ();: Removes the first item from the queue, and returns an optional with that value. If the queue is empty, the method returns an optional with no value.
optional<T> front () const;: Returns an optional with the first value from the queue. If the queue is empty, an optional with no value is returned instead.
optional<T> back () const;: Returns an optional with the last value from the queue. If the queue is empty, an optional with no value is returned instead.
size_type size () const;: Returns the size of the queue.
size_type max_size () const;: Returns the maximum allowed size for the queue.
void swap (queue<T>& other);: Swaps the contents of the queue with other, but only if other is not the same object.
queue<T>& operator= (const queue<T>& other);: Assigns the contents of other to the queue and returns *this.
queue<T>& operator= (queue<T>&& other);: Move assignment. Takes ownership of the elements in other. Returns *this.
bool operator== (const queue<T>& other);: Compares the elements of the queue with the ones in other. Returns true if they are all equal.
bool operator!= (const queue<T>& other);: Compares the elements of the queue with the ones in other. Returns true if any two of them are not equal.
bool operator< (const queue<T>& other) const;
bool operator> (const queue<T>& other) const;
bool operator<= (const queue<T>& other) const;
bool operator>= (const queue<T>& other) const;: Lexicographically compares the elements of the queue with the ones in other, in a way that is equivalent to calling std::lexicographical_compare.
void clear ();: Removes every element from the queue.
template <class T1, class T2> void assign (T1 x, T2 y);: Assigns to the queue the elements described by (x, y). They can be an iterator range or a pair of (integer, value), as is the case with the queue's constructor.
void assign (std::initializer_list<T> lst);: Assigns to the queue the elements in lst.
iterator begin ();
const_iterator cbegin () const;: Returns an iterator to the first element of the queue.
iterator end ();
const_iterator cend () const;: Returns an iterator one past the last element of the queue.
iterator::iterator ();: Default constructor for queue iterators. Leaves the object in an invalid state. Dereferencing or incrementing such an iterator has undefined behavior.
iterator::iterator (const iterator& other);: Copy constructor. Initializes the iterator to be equal to other.
const T& iterator::operator* ();
const T& const_iterator::operator* () const;: Returns a const reference to the iterator's underlying object.
iterator& iterator::operator++ ();: Pre-increment operator. Moves the iterator forward, and returns it. If the iterator was at the end of the stack, the results are undefined.
iterator iterator::operator++ (int);: Post-increment operator. Moves the iterator forward, and returns an iterator equal to what it was before incrementing. If the iterator was at the end of the stack, the results are undefined.
bool iterator::operator== (const iterator& other);
bool iterator::operator!= (const iterator& other);: Tests for iterator (in)equality.

Implementation details

The design of the multi-producer, multi-consumer queue in XRCU is rather simple and different from other implementations. It consists of a single array that holds either the elements themselvers, or pointers to them, a decision taken at compile time via type traits.

In addition to the vector, the queue is composed of three additional counters: the capacity, a read cursor and a write cursor. The capacity is static, and simply stores the raw number of elements that the queue can hold before it's forced to reallocate. The read and write cursors are indices that indicate at which positions the pop and push (or emplace) operations may take place.

When an element is pushed into a queue, the write cursor is checked against the capacity. If the queue is full, then it must be reallocated. This is done by marking each element with a special bit, allocating a new queue and then copying over the elements. If there's room in the queue, the element is atomically set via compare-and-swap, and if successful, the write cursor is also updated atomically.

Popping elements proceeds in a similar way. The read cursor is compared against the write cursor, and returns prematurely with failure if there are no more elements to pop. Otherwise, it's atomically replaced with a special marker, after which the read cursor is updated.

Skip lists

    #include <xrcu/skip_list.hpp>

A skip list is an associative container that holds a sorted set of unique objects of a particular type. In addition to the Key type, skip lists are instantiated with a comparator type that allows them to determine ordering.

In XRCU, skip lists meet the requirements for the C++ concept of Container, and AssociativeContainer.

Skip list API

Skip lists are templated types, defined in the following way:

    template <class T, class Cmp = std::less<T> >
    struct skip_list
      {
        typedef T value_type;
        typedef T key_type;
        typedef Cmp key_compare;
        typedef Cmp value_compare;
        typedef T& reference;
        typedef const T& const_reference;
        typedef T* pointer;
        typedef const T* const_pointer;
        typedef ptrdiff_t difference_type;
        typedef size_t size_type;

        struct iterator;
        struct const_iterator;
      };

The template parameter T refers to the key type, whereas Cmp is the comparator type, which defaults to std::less. Under most circumstances, that is usually enough, but users may instantiate with any other type that defines the operator() which returns a boolean.

As shown above, skip_list has iterators, but they are always constant, for practicality reasons. Much like with other containers, skip list iterators are forward iterators, and an implicit cs_guard.

The following describes the public interface for skip_list<T, Cmp>

skip_list (Cmp c = Cmp (), unsigned int depth = ...);: Initializes the skip list with comparator c, which defaults to a default constructed value. Also takes a parameter indicating the maximum depth a skip list node may have, which defaults to an implementation-specified value. Users shouldn't need to change the latter, but it can be useful when tuning the application for performance. As a general rule, a higher value implies more memory usage, but better performance.
template <class Iter> skip_list (Iter first, Iter last, Cmp c = Cmp (), unsigned int depth = ...);: Initializes the skip list to the values in [first, last). The comparator and depth parameters are the same as explained above.
skip_list (std::initializer_list<T> lst, Cmp c = Cmp (), unsigned int depth = ...);: Initializes the skip list to the values in lst. The comparator and depth parameters are the same as explained above.
skip_list (const skip_list<T, Cmp>& other);: Copy constructor. Initializes the skip list to hold the values in other.
skip_list (skip_list<T, Cmp>&& other);: Move constructor. Takes ownership of the values in other.
optional<T> find (const T& key) const;: Searches for an element equivalent to key in the skip list, and returns an optional with it as its value. If the key couldn't be found, the returned optional has no value.
bool contains (const T& key) const;: Returns true if key is present in the skip list.
bool insert (const T& key) const;: Inserts key in the skip list. Returns true if the key wasn't present previous to this call.
bool erase (const T& key) const;: Erases key from the skip list. Returns true if the key was present previous to this call.
optional<T> remove (const T& key);: Erases key from the skip list and returns an optional with that element as its value, if the key was present. Otherwise, the returned optional has no value.
iterator begin ();
const_iterator cbegin () const;: Returns an iterator to the first element of the skip list.
iterator end ();
const_iterator cend () const;: Returns an iterator one past the last element of the skip list.
size_type size () const;: Returns the current size of the skip list.
size_type max_size () const;: Returns the maximum allowed size for a skip list.
bool empty () const;: Returns true if the skip list is empty (i.e: its size is 0).
template <class Iter> void assign (Iter first, Iter last);: Assigns to the skip list the elements in [first, last).
void assign (std::initializer_list<T> lst);: Assigns to the skip list the elements in lst.
skip_list& operator= (const skip_list<T, Cmp>& other);: Assigns the elements in other to the skip list. Returns *this.
skip_list& operator= (skip_list<T, Cmp>&& other);: Move assignment. Takes ownership of the elements in other. Returns *this.
void swap (skip_list<T, Cmp>& other);: Swaps the contents of the skip list with other, but only if other is a different object.
void clear ();: Removes every element from the skip list.

Implementation details

In XRCU, skip lists are implemented as described in any piece of literature that talks about them. Basically, every skip list of depth D has a head node that can be linked with up to other D nodes. When performing a lookup for an element, we start at the head node, and move horizontally until the current element is equal or greater. If it's equal, we the lookup succeeded. Otherwise, we move vertically to the next node, until we either find the element, or we exhausted every node.

Insertions first perform a lookup, and if the search failed, we create a new node with the specified value as its key. The number of linked nodes it will have is determined in a semi-random way, but it must never exceed the maximum depth of the skip list. After the node is created, we try to atomically link it with its computed predecesors and successors (Returned in the same lookup call we did before). If we succeed, the node is considered part of the skip list; otherwise, we free it and retry.

Erasing an element proceeds in a similar fashion, only we check that the lookup did not fail, and then atomically relink the node's predecessors with the node's own successors, thereby removing the key from the skip list.

Skip lists mantain an atomic word to store the length. However, this is used slightly differently as one would think, because the lowest bit is used as an ad-hoc lock bit in order to serialize concurrent calls to swap.

Hash tables

    #include <xrcu/hash_table.hpp>

Hash tables are associative containers that map unique keys to values. Ordering is unspecified for both keys and values. In addition to the key and value types, hash tables are instantiated with a hashing type and an equality type; callables that compute a hash value for a given key, and one that tests for equality, given two keys, respectively.

Additionally, hash tables mantain a load factor that determines when a full rehash is performed. A rehash implies moving all the key-value pairs into a new location to reduce the overhead of most operations.

In XRCU, hash tables meet the requirements for the C++ concept of Container and UnorderedAssociativeContainer.

Hash table API

As mentioned above, hash tables are template types, defined like this:

    template <class Key, class Val,
              class Equal = std::equal<Key>,
              class Hash = std::hash<Key> >
    struct hash_table
      {
        typedef Val mapped_type;
        typedef Key key_type;
        typedef std::pair<Key, Val> value_type;
        typedef Equal key_equal;
        typedef Hash hasher;
        typedef value_type& reference;
        typedef const value_type& const_reference;
        typedef value_type* pointer;
        typedef const value_type* const_pointer;
        typedef ptrdiff_t difference_type;
        typedef size_t size_type;

        struct iterator;
        struct const_iterator;
      };

The template parameters should be pretty self explanatory: They refer to the key, value, equality and hashing types, in that order. The Equal type has to operate on two keys and return a boolean value that determines their equality, whereas the Hash type has to operate on keys and return unsigned integers. Both are allowed to throw exceptions, although it is not really wise to do so.

Much like with skip lists, hash table iterators are always constant, and also an implicit cs_guard.

The following describes public interface for hash tables:

hash_table (size_t size = 0, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table to hold size elements, with a load factor of lf, using the equality predicate e and the hasher h. The load factor must be in the range [0.4, 0.9]; if it's not, then it will silently be set to the default of 0.85.

template <class It> hash_table (It first, It last, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table with the values between (first, last]. The elements in that range must have two public members defined: first and second, referring to the key and value of each element, respectively, and in a similar fashion to what std::pair does. The other parameters work as with the previous constructor.

hash_table (std::initializer_list<std::pair<Key, Val> >, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table with the values in lst. The rest of the parameters work as with the previous constructors.

hash_table (const hash_table<Key, Val, Hash, Equal>& other);

Copy constructor. Initializes the hash table with the values in other.

hash_table (hash_table<Key, Val, Hash, Equal>&& other);

Move constructor. Takes ownership of the values in other.

size_t size () const;

Returns the hash table size.

size_t max_size () const;

Returns the maximum allowed size for a hash table.

bool empty () const;

Returns true if the hash table is empty.

optional<Val> find (const Key& key) const;

Val find (const Key& key, const Val& defl) const;

Look for key in the hash table. If it's found, returns the value associated to it. If it's not found, returns an empty optional in the first version, or defl in the second one.

bool contains (const Key& key) const;

Returns true if the key is present in the hash table.

bool insert (const Key& key, const Val& val);

Associates key with val in the hash table. Returns true if the value was not present. Otherwise, the former value is replaced.

template <class Fn, class ...Args> bool update (const Key& key, Fn f, Args... args);

Updates the value associated to key by calling f with it and the rest of the arguments. In other words, if val is the value associated to key, calls f (val, args...) and updates the value with the result. If no value was present for key, the function is called with a default constructed value instead (as if by calling Val ()).

Note that the function takes a reference to the value, and so it's possible to modify it in place. Additionally, if the function returns the very same object that was passed (i.e: the same reference that it received), there won't be any modifications made to the value in the table. If, however, the function returns a different object, this call will need to atomically update the value. The passed function may be called more than once if the atomic updates fail.

Returns true if key was not present in the hash table before the call.

bool erase (const Key& key);

Erases key from the hash table. Returns true if key was present before the call.

optional<Val> remove (const Key& key);

Removes key from the hash table and returns the value that was associated to if, if it was present.

void clear ();

Removes every element from the hash table.

template <class Iter> void assign (Iter first, Iter last);

Assigns the elements in [first, last) to the hash table. The same restrictions as the range constructor apply here.

void assign (std::initializer_list<std::pair<Key, Val>> lst);

Assigns the elements in lst to the hash table.

hash_table<Key, Val>& operator= (const hash_table<Key, Val>& other);

Assigns the elements in other to the hash table. Returns *this.

hash_table<Key, Val>& operator= (hash_table<Key, Val>&& other);

Move assignment. Takes ownership of the elements in other. Returns *this.

void swap (hash_table<Key, Val>& other);

Swaps the contents of the hash table with other, but only if other is a different object.

iterator begin ();

const_iterator cbegin () const;

Returns an iterator to the first element of the hash table.

iterator end ();

const_iterator cend () const;

Returns an iterator one past the last element of the hash table.

iterator::iterator ();

Default constructor for hash table iterators. Leaves the object in an invalid state. Dereferencing or incrementing such an iterator has undefined behavior.

iterator::iterator (const iterator& other);

Copy constructor. Initializes the iterator to be equal to other.

Key iterator::key () const;

Returns the key for the iterator.

Val iterator::value () const;

Returns the value for the iterator.

std::pair<Key, Val> iterator::operator* () const;

Returns an std::pair constructed with the key and value of the iterator.

iterator& iterator::operator++ ();

Pre-increment operator. Moves the iterator forward and returns it. If the iterator was at the end, the results are undefined.

iterator iterator::operator++ (int);

Post-increment operator. Moves the iterator forward, and returns an iterator equal to what it was before incrementing. If the iterator was at the end, the results are undefined.

bool iterator::operator== (const iterator& other);

bool iterator::operator!= (const iterator& other);

Test for iterator (in)equality.

Implementation details

Hash tables are somewhat complex, because the atomicity requirements force us to do some rather convoluted things. To start off, a hash table is essentially a vector of consecutive key and value pairs, with some special values indicating free and deleted entries. However, since we can only operate atomically on integers, we wrap any other type that is not integral into a dynamically allocated pointer. This is done based on the template instantiation and is figured out at compile time.

When a lookup is performed, the passed key is hashed with the table's own internal hasher and a hash code is computed. Using this code, we produce an index into the vector and then we iterate over it, looking for a key that matches, or until we find a free entry, in which case the lookup fails.

Insertions are a bit more complicated. We start by performing a lookup, as described above, and then determine what to do based on the result: If the lookup came up empty, then we need to insert both the key and the value; otherwise, we only need to update the value.

If only the value needs to be updated (because the key is already present), then the insertion simply consists of atomically swapping the old value with the new one, and, if succesful, finalizing the old value as well.

If we need to insert the key as well, first we decrement an internal counter that keeps track of how many additional elements we can insert before a rehash is triggered. If the counter has already reached zero, then we have to rehash before inserting the key and value. If no rehash is needed, or after it's done, both the key and value are updated atomically (Some platforms allow us to do it in a single step, otherwise the operations are done sequentially).

At this point, we have to pause to make a sort of confession: Hash tables as implemented in XRCU are not entirely lock free, because rehashing actually takes an internal lock (I know, you're crushed). This not because of any intrinsical limitation, but rather, to make things easier. Since rehashes are probably the least common of the operations, we figured it wasn't that big of a deal anyways.

Going back to rehashes, these are the most expensive operations. As mentioned, they start off by acquiring an internal lock, and then they iterate over the full vector, marking the value indexes with a special bit that signals that the table is being rehashed. When this bit is set, insertions and erasures are forbidden (They both check for this bit before proceeding).

Erasures are pretty simple in comparison. They obviously perform a lookup on the key, and if it didn't come up empty, they atomically swap out the value for the special empty constant. Afterwards, they can mutate the key entry without atomicity (Because erased entries cannot be reused).

BUGS

All implemented containers use standard operators new and delete to perform memory (de)allocations. There's no way to specify custom allocators yet, although it's planned in the future.

AUTHORS

Agustina Arzille (avarzille@riseup.net) and Luciano Lo Giudice (lmlogiudice@gmail.com)