C++ API

Being written in C++, libbdsg and libhandlegraph offer a C++ API.

Handle Graph API

The handlegraph namespace defines the handle graph interface.

Handles

The namespace contains definitions for different types of handles. THese are references to graph elements. A basic handlegraph::handle_t is a reference to a strand or orientation of a node in the graph.

struct handle_t

Represents a traversal of a node in a graph in a particular direction.

struct path_handle_t

Represents the internal id of a path entity.

struct step_handle_t

A step handle is an opaque reference to a single step of an oriented node on a path in a graph.

struct net_handle_t

A net handle is an opaque reference to a category of traversals of a single node, a chain, or the interior of a snarl, in the snarl decomposition of a graph.

Snarls and chains are bounded by two particular points, but the traversal may not visit both or any of them (as is the case for traversals between internal tips).

The handle refers to the snarl or chain itself and also a particular category of traversals of it. Each of the start and end of the traversal can be the start of the snarl/chain, the end of the snarl/chain, or some internal tip, for 6 distinct combinations.

For single nodes, we only have forward and reverse.

Graph Interfaces

The handlegraph namespace also defines a hierarchy of interfaces for graph implementations that provide different levels of features.

HandleGraph

The most basic is the handlegraph::HandleGraph, a completely immutable, unannotated graph.

class HandleGraph

This is the interface that a graph that uses handles needs to support. It is also the interface that users should code against.

Subclassed by bdsg::SubgraphOverlay, handlegraph::ExpandingOverlayGraph, handlegraph::MutableHandleGraph, handlegraph::PathHandleGraph, handlegraph::RankedHandleGraph

Public Functions

virtual bool has_node(nid_t node_id) const = 0

Method to check if a node exists by ID.

virtual handle_t get_handle(const nid_t &node_id, bool is_reverse = false) const = 0

Look up the handle for the node with the given ID in the given orientation.

virtual nid_t get_id(const handle_t &handle) const = 0

Get the ID from a handle.

virtual bool get_is_reverse(const handle_t &handle) const = 0

Get the orientation of a handle.

virtual handle_t flip(const handle_t &handle) const = 0

Invert the orientation of a handle (potentially without getting its ID)

virtual size_t get_length(const handle_t &handle) const = 0

Get the length of a node.

virtual std::string get_sequence(const handle_t &handle) const = 0

Get the sequence of a node, presented in the handle’s local forward orientation.

virtual size_t get_node_count() const = 0

Return the number of nodes in the graph.

virtual nid_t min_node_id() const = 0

Return the smallest ID in the graph, or some smaller number if the smallest ID is unavailable. Return value is unspecified if the graph is empty.

virtual nid_t max_node_id() const = 0

Return the largest ID in the graph, or some larger number if the largest ID is unavailable. Return value is unspecified if the graph is empty.

template<typename Iteratee>
bool follow_edges(const handle_t &handle, bool go_left, const Iteratee &iteratee) const

Loop over all the handles to next/previous (right/left) nodes. Passes them to a callback. If called with a bool-returning invocable thing, can stop early when the function returns false. Returns true if we finished and false if we stopped early.

template<typename Iteratee>
bool for_each_handle(const Iteratee &iteratee, bool parallel = false) const

Loop over all the nodes in the graph in their local forward orientations, in their internal stored order. If called with a bool-returning invocable thing, can stop early when the function returns false. Returns true if we finished and false if we stopped early. Can be told to run in parallel, in which case stopping after a false return value is on a best-effort basis and iteration order is not defined.

virtual size_t get_degree(const handle_t &handle, bool go_left) const

Get the number of edges on the right (go_left = false) or left (go_left = true) side of the given handle. The default implementation is O(n) in the number of edges returned, but graph implementations that track this information more efficiently can override this method.

virtual bool has_edge(const handle_t &left, const handle_t &right) const

Returns true if there is an edge that allows traversal from the left handle to the right handle. By default O(n) in the number of edges on left, but can be overridden with more efficient implementations.

inline bool has_edge(const edge_t &edge) const

Convenient wrapper of has_edge for edge_t argument.

virtual size_t get_edge_count() const

Return the total number of edges in the graph. If not overridden, counts them all in linear time.

virtual size_t get_total_length() const

Return the total length of all nodes in the graph, in bp. If not overridden, loops over all nodes in linear time.

virtual char get_base(const handle_t &handle, size_t index) const

Returns one base of a handle’s sequence, in the orientation of the handle.

virtual std::string get_subsequence(const handle_t &handle, size_t index, size_t size) const

Returns a substring of a handle’s sequence, in the orientation of the handle. If the indicated substring would extend beyond the end of the handle’s sequence, the return value is truncated to the sequence’s end. By default O(n) in the size of the handle’s sequence, but can be overriden.

handle_t forward(const handle_t &handle) const

Get the locally forward version of a handle.

edge_t edge_handle(const handle_t &left, const handle_t &right) const

A pair of handles can be used as an edge. When so used, the handles have a canonical order and orientation.

handle_t traverse_edge_handle(const edge_t &edge, const handle_t &left) const

Such a pair can be viewed from either inward end handle and produce the outward handle you would arrive at.

template<typename Iteratee>
bool for_each_edge(const Iteratee &iteratee, bool parallel = false) const

Loop over all edges in their canonical orientation (as returned by edge_handle) as edge_t items and execute an iteratee on each one. If the iteratee returns bool, and it returns false, stop iteration. Return true if the iteration completed and false if it stopped early. If run in parallel (parallel = true), stopping early is best-effort.

PathHandleGraph

On top of this, there is the handlegraph::PathHandleGraph, which allows for embedded, named paths in the graph.

class PathHandleGraph : public virtual handlegraph::HandleGraph, public virtual handlegraph::PathMetadata

This is the interface for a handle graph that stores embedded paths.

Subclassed by bdsg::PathHandleGraphProxy< BackingGraph >, bdsg::PathSubgraphOverlay, bdsg::PathVectorizableOverlay, handlegraph::MutablePathHandleGraph, handlegraph::PathPositionHandleGraph

Public Functions

virtual size_t get_path_count() const = 0

Returns the number of paths stored in the graph.

virtual bool has_path(const std::string &path_name) const = 0

Determine if a path name exists and is legal to get a path handle for.

virtual path_handle_t get_path_handle(const std::string &path_name) const = 0

Look up the path handle for the given path name. The path with that name must exist.

virtual std::string get_path_name(const path_handle_t &path_handle) const = 0

Look up the name of a path from a handle to it.

virtual bool get_is_circular(const path_handle_t &path_handle) const = 0

Look up whether a path is circular.

virtual size_t get_step_count(const path_handle_t &path_handle) const = 0

Returns the number of node steps in the path.

virtual size_t get_step_count(const handle_t &handle) const

Returns the number of node steps on a handle.

virtual handle_t get_handle_of_step(const step_handle_t &step_handle) const = 0

Get a node handle (node ID and orientation) from a handle to an step on a path.

virtual path_handle_t get_path_handle_of_step(const step_handle_t &step_handle) const = 0

Returns a handle to the path that an step is on.

virtual step_handle_t path_begin(const path_handle_t &path_handle) const = 0

Get a handle to the first step, which will be an arbitrary step in a circular path that we consider “first” based on our construction of the path. If the path is empty, then the implementation must return the same value as path_end().

virtual step_handle_t path_end(const path_handle_t &path_handle) const = 0

Get a handle to a fictitious position past the end of a path. This position is returned by get_next_step for the final step in a path in a non-circular path. Note: get_next_step will NEVER return this value for a circular path.

virtual step_handle_t path_back(const path_handle_t &path_handle) const = 0

Get a handle to the last step, which will be an arbitrary step in a circular path that we consider “last” based on our construction of the path. If the path is empty then the implementation must return the same value as path_front_end().

virtual step_handle_t path_front_end(const path_handle_t &path_handle) const = 0

Get a handle to a fictitious position before the beginning of a path. This position is return by get_previous_step for the first step in a path in a non-circular path. Note: get_previous_step will NEVER return this value for a circular path.

virtual bool has_next_step(const step_handle_t &step_handle) const = 0

Returns true if the step is not the last step in a non-circular path.

virtual bool has_previous_step(const step_handle_t &step_handle) const = 0

Returns true if the step is not the first step in a non-circular path.

virtual step_handle_t get_next_step(const step_handle_t &step_handle) const = 0

Returns a handle to the next step on the path. If the given step is the final step of a non-circular path, this method has undefined behavior. In a circular path, the “last” step will loop around to the “first” step.

virtual step_handle_t get_previous_step(const step_handle_t &step_handle) const = 0

Returns a handle to the previous step on the path. If the given step is the first step of a non-circular path, this method has undefined behavior. In a circular path, it will loop around from the “first” step (i.e. the one returned by path_begin) to the “last” step.

template<typename Iteratee>
bool for_each_path_handle(const Iteratee &iteratee) const

Execute a function on each path_handle_t in the graph. If it returns bool, and it returns false, stop iteration. Returns true if we finished and false if we stopped early.

If the graph contains compressed haplotype paths and properly implements for_each_path_of_sense to retrieve them, they should not be visible here. Only reference or generic named paths should be visible.

template<typename Iteratee>
bool for_each_step_on_handle(const handle_t &handle, const Iteratee &iteratee) const

Execute a function on each step (step_handle_t) of a handle in any path. If it returns bool and returns false, stop iteration. Returns true if we finished and false if we stopped early.

If the graph contains compressed haplotype paths and properly implements for_each_step_of_sense to find them, they should not be visible here. Only reference or generic named paths should be visible.

virtual std::vector<step_handle_t> steps_of_handle(const handle_t &handle, bool match_orientation = false) const

Returns a vector of all steps of a node on paths. Optionally restricts to steps that match the handle in orientation.

virtual bool is_empty(const path_handle_t &path_handle) const

Returns true if the given path is empty, and false otherwise.

PathForEachSocket scan_path(const path_handle_t &path) const

Returns a class with an STL-style iterator interface that can be used directly in a for each loop like: for (handle_t handle : graph->scan_path(path)) { }

template<typename Iteratee>
bool for_each_step_in_path(const path_handle_t &path, const Iteratee &iteratee) const

Loop over all the steps (step_handle_t) along a path. In a non-circular path, iterates from first through last step. In a circular path, iterates from the step returned by path_begin forward around to the step immediately before it. If the iteratee returns bool, and it returns false, stop. Returns true if we finished and false if we stopped early.

Mutable and Deletable Interfaces

Then for each there are versions where the underlying graph is “mutable” (meaning that material can be added to it and nodes can be split) and “deletable” (meaning that nodes and edges can actually be removed from the graph), and for handlegraph::PathHandleGraph there are versions where the paths can be altered.

class MutableHandleGraph : public virtual handlegraph::HandleGraph

This is the interface for a handle graph that supports addition of new graph material.

Subclassed by handlegraph::DeletableHandleGraph, handlegraph::MutablePathMutableHandleGraph

Public Functions

virtual handle_t create_handle(const std::string &sequence) = 0

Create a new node with the given sequence and return the handle. The sequence may not be empty.

virtual handle_t create_handle(const std::string &sequence, const nid_t &id) = 0

Create a new node with the given id and sequence, then return the handle. The sequence may not be empty. The ID must be strictly greater than 0.

virtual void create_edge(const handle_t &left, const handle_t &right) = 0

Create an edge connecting the given handles in the given order and orientations. Ignores existing edges.

inline void create_edge(const edge_t &edge)

Convenient wrapper for create_edge.

virtual handle_t apply_orientation(const handle_t &handle) = 0

Alter the node that the given handle corresponds to so the orientation indicated by the handle becomes the node’s local forward orientation. Rewrites all edges pointing to the node and the node’s sequence to reflect this. Invalidates all handles to the node (including the one passed). Returns a new, valid handle to the node in its new forward orientation. Note that it is possible for the node’s ID to change. Does not update any stored paths. May change the ordering of the underlying graph.

virtual std::vector<handle_t> divide_handle(const handle_t &handle, const std::vector<size_t> &offsets) = 0

Split a handle’s underlying node at the given offsets in the handle’s orientation. Returns all of the handles to the parts. Other handles to the node being split may be invalidated. The split pieces stay in the same local forward orientation as the original node, but the returned handles come in the order and orientation appropriate for the handle passed in. Updates stored paths. This invalidates the passed handles, but not other handles.

inline std::pair<handle_t, handle_t> divide_handle(const handle_t &handle, size_t offset)

Specialization of divide_handle for a single division point.

virtual void optimize(bool allow_id_reassignment = true) = 0

Adjust the representation of the graph in memory to improve performance. Optionally, allow the node IDs to be reassigned to further improve performance. Note: Ideally, this method is called one time once there is expected to be few graph modifications in the future. This may invalidate outstanding handles.

virtual bool apply_ordering(const std::vector<handle_t> &order, bool compact_ids = false) = 0

Reorder the graph’s internal structure to match that given. This sets the order that is used for iteration in functions like for_each_handle. If compact_ids is true, may (but will not necessarily) compact the id space of the graph to match the ordering, from 1->|ordering|. In other cases, node IDs will be preserved. This may be a no-op in the case of graph implementations that do not have any mechanism to maintain an ordering. This may invalidate outstanding handles. Returns true if node IDs actually were adjusted to match the given order, and false if they remain unchanged.

virtual void set_id_increment(const nid_t &min_id) = 0

Set a minimum id to increment the id space by, used as a hint during construction. May have no effect on a backing implementation.

virtual void increment_node_ids(nid_t increment)

Add the given value to all node IDs. Has a default implementation in terms of reassign_node_ids, but can be implemented more efficiently in some graphs.

virtual void increment_node_ids(long increment)

This specialization for long appears to be needed to avoid confusion about nid_t.

virtual void reassign_node_ids(const std::function<nid_t(const nid_t&)> &get_new_id) = 0

Renumber all node IDs using the given function, which, given an old ID, returns the new ID. Modifies the graph in place. Invalidates all outstanding handles. If the graph supports paths, they also must be updated. The mapping function may return 0. In this case, the input ID will remain unchanged. The mapping function should not return any ID for which it would return 0.

class DeletableHandleGraph : public virtual handlegraph::MutableHandleGraph

This is the interface for a handle graph that supports both addition of new nodes and edges as well as deletion of nodes and edges.

Subclassed by handlegraph::MutablePathDeletableHandleGraph

Public Functions

virtual void destroy_handle(const handle_t &handle) = 0

Remove the node belonging to the given handle and all of its edges. Either destroys any paths in which the node participates, or leaves a “hidden”, un-iterateable handle in the path to represent the sequence of the removed node. Invalidates the destroyed handle. May be called during serial for_each_handle iteration ONLY on the node being iterated. May NOT be called during parallel for_each_handle iteration. May NOT be called on the node from which edges are being followed during follow_edges. May NOT be called during iteration over paths, if it could destroy a path. May NOT be called during iteration along a path, if it could destroy that path.

virtual void destroy_edge(const handle_t &left, const handle_t &right) = 0

Remove the edge connecting the given handles in the given order and orientations. Ignores nonexistent edges. Does not update any stored paths.

virtual handle_t truncate_handle(const handle_t &handle, bool trunc_left, size_t offset)

Shorten a node by truncating either the left or right side of the node, relative to the orientation of the handle, starting from a given offset along the nodes sequence. Any edges on the truncated end of the node are deleted. Returns a (possibly altered) handle to the truncated node. May invalid stored paths.

inline void destroy_edge(const edge_t &edge)

Convenient wrapper for destroy_edge.

virtual void clear() = 0

Remove all nodes and edges. May also remove all paths, if applicable.

class MutablePathHandleGraph : public virtual handlegraph::PathHandleGraph, public virtual handlegraph::MutablePathMetadata

This is the interface for a handle graph with embedded paths where the paths can be modified. Note that if the graph can also be modified, the implementation will also need to inherit from MutableHandleGraph, via the combination MutablePathMutableHandleGraph interface. TODO: This is a very limited interface at the moment. It will probably need to be extended.

Subclassed by handlegraph::MutablePathMutableHandleGraph

Public Functions

virtual void destroy_path(const path_handle_t &path_handle) = 0

Destroy the given path. Invalidates handles to the path and its steps.

virtual path_handle_t create_path_handle(const std::string &name, bool is_circular = false) = 0

Create a path with the given name. The caller must ensure that no path with the given name exists already, or the behavior is undefined. Returns a handle to the created empty path. Handles to other paths must remain valid.

virtual path_handle_t rename_path(const path_handle_t &path_handle, const std::string &new_name)

Renames a path. Existing path_handle_t’s may become invalidated..

virtual step_handle_t append_step(const path_handle_t &path, const handle_t &to_append) = 0

Append a visit to a node to the given path. Returns a handle to the new final step on the path which is appended. If the path is cirular, the new step is placed between the steps considered “last” and “first” by the method path_begin. Handles to prior steps on the path, and to other paths, must remain valid.

virtual step_handle_t prepend_step(const path_handle_t &path, const handle_t &to_prepend) = 0

Prepend a visit to a node to the given path. Returns a handle to the new first step on the path which is appended. If the path is cirular, the new step is placed between the steps considered “last” and “first” by the method path_begin. Handles to later steps on the path, and to other paths, must remain valid.

virtual void pop_front_step(const path_handle_t &path_handle)

Remove the first step in a path. Undefined behavior if path is empty.

virtual void pop_back_step(const path_handle_t &path_handle)

Remove the last step in a path. Undefined behavior if path is empty.

virtual std::pair<step_handle_t, step_handle_t> rewrite_segment(const step_handle_t &segment_begin, const step_handle_t &segment_end, const std::vector<handle_t> &new_segment) = 0

Delete a segment of a path and rewrite it as some other sequence of steps. Returns a pair of step_handle_t’s that indicate the range of the new segment in the path. The segment to delete should be designated by the first (begin) and past-last (end) step handles. If the step that is returned by path_begin is deleted, path_begin will now return the first step from the new segment or, in the case that the new segment is empty, the step used as segment_end. Empty ranges consist of two copies of the same step handle. Empty ranges in empty paths consist of two copies of the end sentinel handle for the path. Rewriting an empty range inserts before the provided end handle.

virtual void set_circularity(const path_handle_t &path, bool circular) = 0

Make a path circular or non-circular. If the path is becoming circular, the last step is joined to the first step. If the path is becoming linear, the step considered “last” is unjoined from the step considered “first” according to the method path_begin.

class MutablePathMutableHandleGraph : public virtual handlegraph::MutablePathHandleGraph, public virtual handlegraph::MutableHandleGraph

This is the interface for a graph which is mutable and which has paths which are also mutable.

Subclassed by handlegraph::MutablePathDeletableHandleGraph

class MutablePathDeletableHandleGraph : public virtual handlegraph::MutablePathMutableHandleGraph, public virtual handlegraph::DeletableHandleGraph

This is the interface for a graph which is deletable and which has paths which are also mutable.

Subclassed by bdsg::GraphProxy< BackingGraph >, bdsg::HashGraph, bdsg::MutablePositionOverlay, bdsg::GraphProxy< BasePackedGraph< MappedBackend > >, bdsg::GraphProxy< BasePackedGraph<> >

Note that there is no handlegraph::PathMutableHandleGraph or handlegraph::PathDeletableHandleGraph; it does not make sense for the paths to be static while the graph can be modified.

Position and Ordering Interfaces

For paths, there is also the handlegraph::PathPositionHandleGraph which provides efficient random access by or lookup of base offset along each embedded path. Additionally, there is handlegraph::VectorizableHandleGraph which provides the same operations for a linearization of all of the graph’s bases. There is also a handlegraph::RankedHandleGraph interface, which provides an ordering, though not necessarily a base-level linearization, of nodes and edges.

class PathPositionHandleGraph : public virtual handlegraph::PathHandleGraph

This is the interface for a path handle graph with path positions

Subclassed by bdsg::PackedPositionOverlay, bdsg::PathPositionVectorizableOverlay, bdsg::PositionOverlay

Public Functions

virtual size_t get_path_length(const path_handle_t &path_handle) const = 0

Returns the length of a path measured in bases of sequence.

virtual size_t get_position_of_step(const step_handle_t &step) const = 0

Returns the position along the path of the beginning of this step measured in bases of sequence. In a circular path, positions start at the step returned by path_begin().

virtual step_handle_t get_step_at_position(const path_handle_t &path, const size_t &position) const = 0

Returns the step at this position, measured in bases of sequence starting at the step returned by path_begin(). If the position is past the end of the path, returns path_end().

virtual bool for_each_step_position_on_handle(const handle_t &handle, const std::function<bool(const step_handle_t&, const bool&, const size_t&)> &iteratee) const

Execute an iteratee on each step on a path, along with its orientation relative to the path (true if it is reverse the orientation of the handle on the path), and its position measured in bases of sequence along the path. Positions are always measured on the forward strand.

Iteration will stop early if the iteratee returns false. This method returns false if iteration was stopped early, else true

class VectorizableHandleGraph : public virtual handlegraph::RankedHandleGraph

Defines an interface providing a vectorization of the graph nodes and edges, which can be co-inherited alongside HandleGraph.

Subclassed by bdsg::VectorizableOverlay

Public Functions

virtual size_t node_vector_offset(const nid_t &node_id) const = 0

Return the start position of the node in a (possibly implict) sorted array constructed from the concatenation of the node sequences

virtual nid_t node_at_vector_offset(const size_t &offset) const = 0

Return the node overlapping the given offset in the implicit node vector.

virtual size_t edge_index(const edge_t &edge) const = 0

Return a unique index among edges in the graph.

class RankedHandleGraph : public virtual handlegraph::HandleGraph

Defines an interface for a handle graph that can rank its nodes and handles.

Doesn’t actually require an efficient node lookup by sequence position as in VectorizableHandleGraph.

Subclassed by handlegraph::VectorizableHandleGraph

Public Functions

virtual size_t id_to_rank(const nid_t &node_id) const = 0

Return the rank of a node (ranks start at 1 and are dense).

virtual nid_t rank_to_id(const size_t &rank) const = 0

Return the node with a given rank.

virtual size_t handle_to_rank(const handle_t &handle) const

Return the rank of a handle (ranks start at 1 and are dense, and each orientation has its own rank). Handle ranks may not have anything to do with node ranks.

virtual handle_t rank_to_handle(const size_t &rank) const

Return the handle with a given rank.

Algorithm implementers are encouraged to take the least capable graph type necessary for their algorithm to function.

SerializableHandleGraph

Orthogonal to the mutability and paths hierarchy, there is a handlegraph::SerializableHandleGraph interface that is implemented by graphs that can be saved to and loaded from disk. The C++ API supports saving to and loading from C++ streams, but the Python API provides only the ability to save to or load from filenames.

class SerializableHandleGraph : public handlegraph::Serializable

Subclassed by bdsg::GraphProxy< BackingGraph >, bdsg::HashGraph, bdsg::GraphProxy< BasePackedGraph< MappedBackend > >, bdsg::GraphProxy< BasePackedGraph<> >

SnarlDecomposition

A “snarl decomposition” describes the decomposition of the graph into nested substructures known as snarls and chains. The handlegraph::SnarlDecomposition interface defines methods for traversing the snarl decomposition of a graph using handlegraph::net_handle_t.

class SnarlDecomposition

Subclassed by bdsg::SnarlDistanceIndex, handlegraph::BuildableSnarlDecomposition

Public Types

enum endpoint_t

Represents a place that a traversal can start or end. Traversals can start or end at the start, end, or an internal tip of the thing they traverse.

Values:

enumerator START
enumerator END
enumerator TIP

Public Functions

virtual net_handle_t get_root() const = 0

Get a net handle referring to a tip-to-tip traversal of the contents of the root snarl. TODO: Special handling for circular things in the root snarl? Circular traversal type?

virtual bool is_root(const net_handle_t &net) const = 0

Return true if the given handle refers to (a traversal of) the root snarl, and false otherwise.

virtual bool is_snarl(const net_handle_t &net) const = 0

Returns true if the given net handle refers to (a traversal of) a snarl.

virtual bool is_chain(const net_handle_t &net) const = 0

Returns true if the given net handle refers to (a traversal of) a chain.

virtual bool is_node(const net_handle_t &net) const = 0

Returns true if the given net handle refers to (a traversal of) a single node, and thus has a corresponding handle_t.

virtual bool is_sentinel(const net_handle_t &net) const = 0

Return true if the given net handle is a snarl bound sentinel (in either inward or outward orientation), and false otherwise.

virtual net_handle_t get_net(const handle_t &handle, const HandleGraph *graph) const = 0

Turn a handle to an oriented node into a net handle for a start-to-end or end-to-start traversal of the node, as appropriate.

virtual handle_t get_handle(const net_handle_t &net, const HandleGraph *graph) const = 0

For a net handle to a traversal of a single node, get the handle for that node in the orientation it is traversed. May not be called for other net handles.

virtual net_handle_t get_parent(const net_handle_t &child) const = 0

Get the parent snarl of a chain, or the parent chain of a snarl or node. If the child is start-to-end or end-to-start, and the parent is a chain, the chain comes out facing the same way, accounting for the relative orientation of the child snarl or node in the chain. Otherwise, everything is produced as start-to-end, even if that is not actually a realizable traversal of a snarl or chain. May not be called on the root snarl.

Also works on snarl boundary sentinels.

virtual net_handle_t get_bound(const net_handle_t &snarl, bool get_end, bool face_in) const = 0

Get the bounding handle for the snarl or chain referenced by the given net handle, getting the start or end facing in or out as appropriate.

For snarls, returns the bounding sentinel net handles. For chains, returns net handles for traversals of the bounding nodes of the chain.

Ignores traversal type.

May not be called on traversals of individual nodes.

virtual net_handle_t flip(const net_handle_t &net) const = 0

Return a net handle to the same snarl/chain/node in the opposite orientation. No effect on tip-to-tip, start-to-start, or end-to-end net handles. Flips all the others.

virtual net_handle_t canonical(const net_handle_t &net) const = 0

Get a canonical traversal handle from any net handle. All handles to the same net graph element have the same canonical traversal. That canonical traversal must be realizable, and might not always be start-to-end or even consistently be the same kind of traversal for different snarls, chains, or nodes. Mostly useful to normalize for equality comparisons.

virtual endpoint_t starts_at(const net_handle_t &traversal) const = 0

Return the kind of location at which the given traversal starts.

virtual endpoint_t ends_at(const net_handle_t &traversal) const = 0

Return the kind of location at which the given traversal ends.

template<typename Iteratee>
bool for_each_child(const net_handle_t &parent, const Iteratee &iteratee) const

Loop over the child snarls and nodes of a chain, or the child chains of a snarl. If the parent is a chain and is in start-to-end or end-to-start order, children are produced in the order and orientation they are encountered when viewing the chain that way. Otherwise, children are produced in arbitrary order and start-to-end orientation, even if that is not actually a realizable traversal of a snarl or chain.

Return false from the iteratee to stop, or void or true to keep going. Returns false if iteration was stopped early, and true otherwise.

template<typename Iteratee>
bool for_each_traversal(const net_handle_t &item, const Iteratee &iteratee) const

Given a net handle to any type of traversal of a snarl, chain, or node, loop over only the types of traversals that are possible. Produces a net handle to the same snarl, chain, or node for each possible combination of traversal starts and ends (start, end, or internal tip) that is permitted by the internal connectivity of the snarl, chain, or node.

Return false from the iteratee to stop, or void or true to keep going. Returns false if iteration was stopped early, and true otherwise.

template<typename Iteratee>
bool follow_net_edges(const net_handle_t &here, const HandleGraph *graph, bool go_left, const Iteratee &iteratee) const

Given a net handle to a traversal of a snarl, chain, or node, iterate over all kinds of traversals of all snarls, chains, or nodes that are reachable by going either left or right from the given traversal.

Does not check to see if the given traversal is possible given the internal connectivity of a snarl or chain being traversed.

Does not leave a chain; looking form a node traversal out the end of its chain will produce nothing.

Produces and accepts snarl start and end sentinels, and does not leave a snarl.

If you do want to leave a snarl or a chain, jump up to the parent with get_parent_traversal().

template<typename Iteratee>
bool for_each_tippy_child(const net_handle_t &parent, const Iteratee &iteratee) const

Execute a function on each realizable traversal of a child of the given snarl or chain that begins with an internal tip. Traversals that end with tips are obtained by flipping the results.

TODO: allow to be restricted to those that can e.g. reach the start or end of the parent, or another internal tip.

virtual net_handle_t get_parent_traversal(const net_handle_t &traversal_start, const net_handle_t &traversal_end) const = 0

Get a net handle for traversals of a the snarl or chain that contains the given oriented bounding node traversals or sentinels. Given two sentinels for a snarl, produces a net handle to a start-to-end, end-to-end, end-to-start, or start-to-start traversal of that snarl. Given handles to traversals of the bounding nodes of a chain, similarly produces a net handle to a traversal of the chain.

For a chain, either or both handles can also be a snarl containing tips, for a tip-to-start, tip-to-end, start-to-tip, end-to-tip, or tip-to-tip traversal. Similarly, for a snarl, either or both handles can be a chain in the snarl that contains internal tips, or that has no edges on the appropriate end.

May only be called if a path actually exists between the given start and end.

template<typename Iteratee>
bool for_each_traversal_start(const net_handle_t &traversal, const Iteratee&) const

Loop over all the child net graph item traversals that could potentially start the given traversal. For traversals starting at tips, no guarantee is made that there is a path from the yielded traversals to the overall traversal’s final destination.

template<typename Iteratee>
bool for_each_traversal_end(const net_handle_t &traversal, const Iteratee&) const

Loop over all the child net graph item traversals that could potentially end the given traversal. For traversals ending at tips, no guarantee is made that there is a path to the yielded traversals from the overall traversal’s initial position.

inline net_handle_t get_start_bound(const net_handle_t &parent) const

Get a handle to the inward-facing traversal of the first node in a chain or the start boundary in a snarl.

This isn’t necessarily where the traversal specified by the given handle actually starts (it may be end to end or tip to tip, for example).

inline net_handle_t get_end_bound(const net_handle_t &parent) const

Get a handle to the outward-facing traversal of the last node in a chain or the end boundary in a snarl.

This isn’t necessarily where the traversal specified by the given handle actually ends (it may be start to start or tip to tip, for example).

inline bool starts_at_start(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that starts at the local start of the snarl/chain/node.

inline bool starts_at_end(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that starts at the local end of the snarl/chain/node.

inline bool starts_at_tip(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that starts at an internal tip in the snarl/chain. Never true for nodes.

inline bool ends_at_start(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that ends at the local start of the snarl/chain/node.

inline bool ends_at_end(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that ends at the local end of the snarl/chain/node.

inline bool ends_at_tip(const net_handle_t &net) const

Return true if the given net handle describes a category of traversal that ends at the an internal tip in the snarl/chain. Never true for nodes.

libbdsg Handle Graph Implementations

The bdsg namespace provides useful implementations of the Handle Graph API.

Full Graph Implementations

There are two full graph implementations in the module: bdsg::PackedGraph, bdsg::HashGraph. Previously, a third implementation, ODGI, was provided, but that implementation is now part of its own odgi project.

PackedGraph

class PackedGraph : public bdsg::GraphProxy<BasePackedGraph<>>

HashGraph

class HashGraph : public handlegraph::MutablePathDeletableHandleGraph, public handlegraph::SerializableHandleGraph

HashGraph is a HandleGraph implementation designed for simplicity. Nodes are plain C++ objects stored in a hash map, which contain C++ vectors representing their adjacencies.

HashGraph is a good choice when fast access to or modification of a graph is required, but can use more memory than other graph implementations.

Public Functions

HashGraph(istream &in)

Deserialize from a stream of data.

virtual bool has_node(nid_t node_id) const

Method to check if a node exists by ID.

virtual handle_t get_handle(const nid_t &node_id, bool is_reverse = false) const

Look up the handle for the node with the given ID in the given orientation.

virtual nid_t get_id(const handle_t &handle) const

Get the ID from a handle.

virtual bool get_is_reverse(const handle_t &handle) const

Get the orientation of a handle.

virtual handle_t flip(const handle_t &handle) const

Invert the orientation of a handle (potentially without getting its ID)

virtual size_t get_length(const handle_t &handle) const

Get the length of a node.

virtual string get_sequence(const handle_t &handle) const

Get the sequence of a node, presented in the handle’s local forward orientation.

virtual bool follow_edges_impl(const handle_t &handle, bool go_left, const std::function<bool(const handle_t&)> &iteratee) const

Loop over all the handles to next/previous (right/left) nodes. Passes them to a callback which returns false to stop iterating and true to continue. Returns true if we finished and false if we stopped early.

virtual bool for_each_handle_impl(const std::function<bool(const handle_t&)> &iteratee, bool parallel = false) const

Loop over all the nodes in the graph in their local forward orientations, in their internal stored order. Stop if the iteratee returns false. Can be told to run in parallel, in which case stopping after a false return value is on a best-effort basis and iteration order is not defined.

virtual size_t get_node_count(void) const

Return the number of nodes in the graph TODO: can’t be node_count because XG has a field named node_count.

virtual nid_t min_node_id(void) const

Return the smallest ID in the graph, or some smaller number if the smallest ID is unavailable. Return value is unspecified if the graph is empty.

virtual nid_t max_node_id(void) const

Return the largest ID in the graph, or some larger number if the largest ID is unavailable. Return value is unspecified if the graph is empty.

virtual size_t get_degree(const handle_t &handle, bool go_left) const

Efficiently get the number of edges attached to one side of a handle.

virtual char get_base(const handle_t &handle, size_t index) const

Returns one base of a handle’s sequence, in the orientation of the handle.

virtual string get_subsequence(const handle_t &handle, size_t index, size_t size) const

Returns a substring of a handle’s sequence, in the orientation of the handle. If the indicated substring would extend beyond the end of the handle’s sequence, the return value is truncated to the sequence’s end.

virtual handle_t create_handle(const std::string &sequence)

Create a new node with the given sequence and return the handle. The sequence may not be empty.

virtual handle_t create_handle(const std::string &sequence, const nid_t &id)

Create a new node with the given id and sequence, then return the handle. The sequence may not be empty. The ID must be strictly greater than 0.

virtual void destroy_handle(const handle_t &handle)

Remove the node belonging to the given handle and all of its edges. Destroys any paths in which the node participates. Invalidates the destroyed handle. May be called during serial for_each_handle iteration ONLY on the node being iterated. May NOT be called during parallel for_each_handle iteration. May NOT be called on the node from which edges are being followed during follow_edges. May NOT be called during iteration over paths, if it would destroy a path. May NOT be called during iteration along a path, if it would destroy that path.

virtual void create_edge(const handle_t &left, const handle_t &right)

Create an edge connecting the given handles in the given order and orientations. Ignores existing edges.

virtual void destroy_edge(const handle_t &left, const handle_t &right)

Remove the edge connecting the given handles in the given order and orientations. Ignores nonexistent edges. Does not update any stored paths.

virtual handle_t truncate_handle(const handle_t &handle, bool trunc_left, size_t offset)

Shorten a node by truncating either the left or right side of the node, relative to the orientation of the handle, starting from a given offset along the nodes sequence. Any edges on the truncated end of the node are deleted. Returns a (possibly altered) handle to the truncated node. May invalid stored paths.

virtual void clear(void)

Remove all nodes and edges. Does not update any stored paths.

virtual handle_t apply_orientation(const handle_t &handle)

Alter the node that the given handle corresponds to so the orientation indicated by the handle becomes the node’s local forward orientation. Rewrites all edges pointing to the node and the node’s sequence to reflect this. Invalidates all handles to the node (including the one passed). Returns a new, valid handle to the node in its new forward orientation. Note that it is possible for the node’s ID to change. Does not update any stored paths. May change the ordering of the underlying graph.

virtual vector<handle_t> divide_handle(const handle_t &handle, const std::vector<size_t> &offsets)

Split a handle’s underlying node at the given offsets in the handle’s orientation. Returns all of the handles to the parts. Other handles to the node being split may be invalidated. The split pieces stay in the same local forward orientation as the original node, but the returned handles come in the order and orientation appropriate for the handle passed in. Updates stored paths.

virtual void optimize(bool allow_id_reassignment = true)

Adjust the representation of the graph in memory to improve performance. Optionally, allow the node IDs to be reassigned to further improve performance. Note: Ideally, this method is called one time once there is expected to be few graph modifications in the future.

bool apply_ordering(const vector<handle_t> &order, bool compact_ids = false)

Reorder the graph’s internal structure to match that given. This sets the order that is used for iteration in functions like for_each_handle. If compact_ids is true, may (but will not necessarily) compact the id space of the graph to match the ordering, from 1->|ordering|. In other cases, node IDs will be preserved. This may be a no-op in the case of graph implementations that do not have any mechanism to maintain an ordering. This may invalidate outstanding handles. Returns true if node IDs actually were adjusted to match the given order, and false if they remain unchanged.

virtual size_t get_path_count() const

Returns the number of paths stored in the graph.

virtual bool has_path(const std::string &path_name) const

Determine if a path name exists and is legal to get a path handle for.

virtual path_handle_t get_path_handle(const std::string &path_name) const

Look up the path handle for the given path name. The path with that name must exist.

virtual string get_path_name(const path_handle_t &path_handle) const

Look up the name of a path from a handle to it.

virtual bool get_is_circular(const path_handle_t &path_handle) const

Look up whether a path is circular.

virtual size_t get_step_count(const path_handle_t &path_handle) const

Returns the number of node steps in the path.

virtual handle_t get_handle_of_step(const step_handle_t &step_handle) const

Get a node handle (node ID and orientation) from a handle to a step on a path.

virtual path_handle_t get_path_handle_of_step(const step_handle_t &step_handle) const

Returns a handle to the path that a step is on.

virtual step_handle_t path_begin(const path_handle_t &path_handle) const

Get a handle to the first step, or in a circular path to an arbitrary step considered “first”. If the path is empty, returns the past-the-last step returned by path_end.

virtual step_handle_t path_end(const path_handle_t &path_handle) const

Get a handle to a fictitious position past the end of a path. This position is return by get_next_step for the final step in a path in a non-circular path. Note that get_next_step will NEVER return this value for a circular path.

virtual step_handle_t path_back(const path_handle_t &path_handle) const

Get a handle to the last step, which will be an arbitrary step in a circular path that we consider “last” based on our construction of the path. If the path is empty then the implementation must return the same value as path_front_end().

virtual step_handle_t path_front_end(const path_handle_t &path_handle) const

Get a handle to a fictitious position before the beginning of a path. This position is return by get_previous_step for the first step in a path in a non-circular path. Note: get_previous_step will NEVER return this value for a circular path.

virtual bool has_next_step(const step_handle_t &step_handle) const

Returns true if the step is not the last step in a non-circular path.

virtual bool has_previous_step(const step_handle_t &step_handle) const

Returns true if the step is not the first step in a non-circular path.

virtual step_handle_t get_next_step(const step_handle_t &step_handle) const

Returns a handle to the next step on the path. If the given step is the final step of a non-circular path, returns the past-the-last step that is also returned by path_end. In a circular path, the “last” step will loop around to the “first” (i.e. the one returned by path_begin). Note: to iterate over each step one time, even in a circular path, consider for_each_step_in_path.

virtual step_handle_t get_previous_step(const step_handle_t &step_handle) const

Returns a handle to the previous step on the path. If the given step is the first step of a non-circular path, this method has undefined behavior. In a circular path, it will loop around from the “first” step (i.e. the one returned by path_begin) to the “last” step. Note: to iterate over each step one time, even in a circular path, consider for_each_step_in_path.

virtual bool for_each_path_handle_impl(const std::function<bool(const path_handle_t&)> &iteratee) const

Execute a function on each path in the graph.

bool for_each_step_on_handle_impl(const handle_t &handle, const function<bool(const step_handle_t&)> &iteratee) const

Calls a function with all steps of a node on paths.

virtual void destroy_path(const path_handle_t &path)

Destroy the given path. Invalidates handles to the path and its node steps.

path_handle_t create_path_handle(const string &name, bool is_circular = false)

Create a path with the given name. The caller must ensure that no path with the given name exists already, or the behavior is undefined. Returns a handle to the created empty path. Handles to other paths must remain valid.

virtual step_handle_t append_step(const path_handle_t &path, const handle_t &to_append)

Append a visit to a node to the given path. Returns a handle to the new final step on the path which is appended. If the path is cirular, the new step is placed between the steps considered “last” and “first” by the method path_begin. Handles to prior steps on the path, and to other paths, must remain valid.

virtual step_handle_t prepend_step(const path_handle_t &path, const handle_t &to_prepend)

Prepend a visit to a node to the given path. Returns a handle to the new first step on the path which is appended. If the path is cirular, the new step is placed between the steps considered “last” and “first” by the method path_begin. Handles to later steps on the path, and to other paths, must remain valid.

virtual pair<step_handle_t, step_handle_t> rewrite_segment(const step_handle_t &segment_begin, const step_handle_t &segment_end, const std::vector<handle_t> &new_segment)

Delete a segment of a path and rewrite it as some other sequence of steps. Returns a pair of step_handle_t’s that indicate the range of the new segment in the path. The segment to delete should be designated by the first (begin) and past-last (end) step handles. If the step that is returned by path_begin is deleted, path_begin will now return the first step from the new segment or, in the case that the new segment is empty, the step used as segment_end. Empty ranges consist of two copies of the same step handle. Empty ranges in empty paths consist of two copies of the end sentinel handle for the path. Rewriting an empty range inserts before the provided end handle.

virtual void set_circularity(const path_handle_t &path, bool circular)

Make a path circular or non-circular. If the path is becoming circular, the last step is joined to the first step. If the path is becoming linear, the step considered “last” is unjoined from the step considered “first” according to the method path_begin.

virtual void set_id_increment(const nid_t &min_id)

Set a minimum id to increment the id space by, used as a hint during construction. May have no effect on a backing implementation.

virtual void increment_node_ids(nid_t increment)

Add the given value to all node IDs

virtual void reassign_node_ids(const std::function<nid_t(const nid_t&)> &get_new_id)

Reassign all node IDs as specified by the old->new mapping function.

virtual uint32_t get_magic_number() const

Returns a static high-entropy number to indicate the class.

HashGraph(const HashGraph &other)

Move/copy constructors/assignment operators.

Snarl Decomposition Implementations

There is one implementation for a snarl decomposition

SnarlDistanceIndex

class SnarlDistanceIndex : public handlegraph::SnarlDecomposition, public handlegraph::TriviallySerializable

The distance index, which also acts as a snarl decomposition.

The distance index provides an interface to traverse the snarl tree and to find minimum distances between two sibling nodes in the snarl tree (eg between two chains that are children of the same snarl).

It also provides a method for quickly calculating the minimum distance between two positions on the graph.

The implementation here is tightly coupled with the filling-in code in vg (see vg::fill_in_distance_index()). To make a SnarlDistanceIndex that actually works, you have to construct the object, and then call get_snarl_tree_records() with zero or more TemporaryDistanceIndex objects for connected components, and a graph.

The TemporaryDistanceIndex

needs to have a variety of TemporaryRecord implementation classes (TemporaryChainRecord, TemporarySnarlRecord, TemporaryNodeRecord) set up and added to it; this all has to be done “by

hand”, as it were, because no code is in this library to help you do it.

Public Types

enum connectivity_t

The connectivity of a net_handle- this defines the direction that the net_handle is traversed.

Values:

enumerator START_START
enumerator START_END
enumerator START_TIP
enumerator END_START
enumerator END_END
enumerator END_TIP
enumerator TIP_START
enumerator TIP_END
enumerator TIP_TIP
enum net_handle_record_t

Type of a net_handle_t, which may not be the type of the record This is to allow a node record to be seen as a chain from the perspective of a handle

Values:

enumerator ROOT_HANDLE
enumerator NODE_HANDLE
enumerator SNARL_HANDLE
enumerator CHAIN_HANDLE
enumerator SENTINEL_HANDLE
enum record_t

A record_t is the type of structure that a record can be.

NODE, SNARL, and CHAIN indicate that they don’t store distances. SIMPLE_SNARL is a snarl with all children connecting only to the boundary nodes in one direction. OVERSIZED_SNARL only stores distances to the boundaries. ROOT_SNARL represents a connected component of the root. It has no start or end node so its children technically belong to the root. MULTICOMPONENT_CHAIN can represent a chain with snarls that are not start-end connected. The chain is split up into components between these snarls, each node is tagged with which component it belongs to.

Values:

enumerator ROOT
enumerator NODE
enumerator DISTANCED_NODE
enumerator TRIVIAL_SNARL
enumerator DISTANCED_TRIVIAL_SNARL
enumerator SIMPLE_SNARL
enumerator DISTANCED_SIMPLE_SNARL
enumerator SNARL
enumerator DISTANCED_SNARL
enumerator OVERSIZED_SNARL
enumerator ROOT_SNARL
enumerator DISTANCED_ROOT_SNARL
enumerator CHAIN
enumerator DISTANCED_CHAIN
enumerator MULTICOMPONENT_CHAIN
enumerator CHILDREN

Public Functions

virtual void dissociate()

Break the write-back link between this object and the file it was loaded from, if any. Future modifications to the object will not affect the file, although future modifications to the file may still affect the object.

virtual void serialize(const std::function<void(const void*, size_t)> &iteratee) const

Serialize as blocks of data shown to the given function. The pointer must not be null. The blocks must include the magic number.

virtual void serialize(int fd)

Write the contents of this object to an open file descriptor. Makes sure to include a leading magic number. If the file is a normal file, future modifications to the object will affect the file until dissociate() is called or another normal file is associated.

Assumes that the file entirely belongs to this object.

virtual void deserialize(int fd)

Sets the contents of this object to the contents of a serialized object from an open file descriptor. The serialized object must be from the same implementation of this interface as is calling deserialize(). Can only be called on an empty object If the file is a normal writeable file, future modifications to the object will affect the file until dissociate() is called or another normal file is associated.

Assumes that the file entirely belongs to this object.

virtual void serialize_members(std::ostream &out) const

Underlying implementation for “serialize” method.

virtual void deserialize_members(std::istream &in)

Underlying implementation to “deserialize” method.

virtual uint32_t get_magic_number() const

Returns a number that is specific to the serialized implementation for type checking. Does not depend on the contents of any particular instantiation (i.e. behaves as if static, but cannot be static and virtual).

void preload(bool blocking = false) const

Allow for preloading the index for more accurate timing of algorithms that use it, if it fits in memory. If blocking is true, waits for the index to be paged in. Otherwise, just tells the OS that we will want to use it.

size_t minimum_distance(const handlegraph::nid_t id1, const bool rev1, const size_t offset1, const handlegraph::nid_t id2, const bool rev2, const size_t offset2, bool unoriented_distance = false, const HandleGraph *graph = nullptr, pair<vector<tuple<net_handle_t, int32_t, int32_t>>, vector<tuple<net_handle_t, int32_t, int32_t>>> *distance_traceback = nullptr) const

Get the minimum distance between two positions in the graph. If unoriented_distance is true, then ignore the orientations of the positions. Otherwise, distance is calculated from the first position going forward to the second position going forward. The distance includes one of the positions; the distance from one position to itself is 0. Returns std::numeric_limits<size_t>::max() if there is no path between the two positions.

distance_traceback are is helping to find actual distance path. For each of the nodes, keep a vector of the ancestors of the node and the distance and direction (as + or - values of the distance) taken in the minimum distance path to get to the start and end of the parent.

For example, if the first node is traversed forward and only reaches the end node of its parent chain, then the values stored will be <inf, +distance>. If it were traversed backwards to reach the start node and couldn’t reach the end node, then the values stored would be <-distance, inf>. The distance value is the distance to the end of the node to the and of the chain/snarl, and distances stored are distances within the net_handle associated with it. So the first value in the vector will always be node itself and the offset in the node, and the second value will be the parent chain and the distance from the. ends of the node to the ends of the parent ( or the distance between nodes in the chain). The last value of the two nodes should match (in the absolute value at least). It will be the common ancestor node and the distances will be the distance to the start/end of the other node with the same meaning for the sign of the distance. The hints will go up to the lowest common ancestor needed for finding the shortest distance, which may or may not be the LCA or the root.

size_t maximum_distance(const handlegraph::nid_t id1, const bool rev1, const size_t offset1, const handlegraph::nid_t id2, const bool rev2, const size_t offset2, bool unoriented_distance = false, const HandleGraph *graph = nullptr) const

Find an approximation of the maximum distance between two positions. This isn’t a true maximum- the only guarantee is that it’s greater than or equal to the minimum distance.

size_t max_distance_in_parent(const net_handle_t &parent, const net_handle_t &child1, const net_handle_t &child2, const HandleGraph *graph = nullptr, size_t distance_limit = std::numeric_limits<size_t>::max()) const

Find the maximum distance between two children in the parent. This is the same as distance_in_parent for everything except children of chains

size_t distance_to_parent_bound(const net_handle_t &parent, bool to_start, net_handle_t child, tuple<net_handle_record_t, net_handle_record_t, net_handle_record_t, net_handle_record_t> parent_and_child_types = make_tuple(ROOT_HANDLE, ROOT_HANDLE, ROOT_HANDLE, ROOT_HANDLE)) const

Get the distance from the child to the start or end bound of the parent. parent_and_child_types are hints to figure out the type of snarl/chain records the parent and child are. tuple of parent record type, parent handle type, child record type, child handle type. This is really just used to see if the parent and child are trivial chains, so it might not be exactly what the actual record is.

tuple<nid_t, bool, bool> into_which_snarl(const nid_t &id, const bool &reverse) const

If this node id and orientation is pointing into a snarl, then return the start. node id and orientation pointing into the snarl, and if the snarl is trivial. Returns <0, false, false> if it doesn’t point into a snarl.

bool is_ordered_in_chain(const net_handle_t &child1, const net_handle_t &child2) const

Return true if child1 comes before child2 in the chain.

pair<net_handle_t, bool> lowest_common_ancestor(const net_handle_t &net1, const net_handle_t &net2) const

For two net handles, get a net handle lowest common ancestor. If the lowest common ancestor is the root, then the two handles may be in different connected components. In this case, return false.

size_t node_length(const net_handle_t &net) const

Return the length of the net, which must represent a node (or sentinel of a snarl)

size_t minimum_length(const net_handle_t &net) const

This is also the length of a net, but it can also be a snarl or chain. The length of a chain includes the boundary nodes, a snarl does not. A looping chain only includes the start/end node once

size_t chain_minimum_length(const net_handle_t &net) const

The length of a chain. If it is a multicomponent chain, then the length of the last component, which is used for calculating distance, instead of inf

nid_t node_id(const net_handle_t &net) const

What is the node id of the node represented by this net handle. net must be a node or a sentinel

bool has_node(const nid_t id) const

Does the graph have this node?

bool is_reversed_in_parent(const net_handle_t &net) const

Only really relevant for nodes in chains, is the node traversed backwards relative to the orientation of the chain

net_handle_t get_node_net_handle(const nid_t id, bool rev = false) const

Get a net handle from a node and, optionally, an orientation.

size_t get_max_tree_depth() const

How deep is the snarl tree? The root is 0, top-level chain is 1, etc Only counts chains

size_t get_depth(const net_handle_t &net) const

What is the depth of this net handle? Nodes and snarls get the depth of their parent. The depth of the root is 0, the depth of its child chains is 1, the depth of the nodes and snarls that are children of those chains is also 1, and the chains that are children of those snarls have depth 2

net_handle_t get_handle_from_connected_component(size_t num) const

Given the connected component number (from get_connected_component_number), get the root-level handle pointing to it. If the connected component is a root-level snarl, then this may return a “root” handle, but it will actually point to the snarl

bool has_connectivity(const net_handle_t &net, endpoint_t start, endpoint_t end) const

Is there a path between the start and end endpoints within the net handle?

bool has_external_connectivity(const net_handle_t &net, endpoint_t start, endpoint_t end) const

Is there a path between the start and end endpoints outside the net handle? This is used for children of the root

size_t get_prefix_sum_value(const net_handle_t net) const

Get the prefix sum value for a node in a chain. Fails if the parent of net is not a chain

size_t get_forward_loop_value(const net_handle_t net) const

Get the prefix sum value for a node in a chain. Fails if the parent of net is not a chain

size_t get_reverse_loop_value(const net_handle_t net) const

Get the prefix sum value for a node in a chain. Fails if the parent of net is not a chain

virtual net_handle_t get_root() const

Get a net handle referring to a tip-to-tip traversal of the contents of the root snarl.

virtual bool is_root(const net_handle_t &net) const

Return true if the given handle refers to (a traversal of) the root snarl, and false otherwise.

bool is_root_snarl(const net_handle_t &net) const

Return true if the given handle refers to (a traversal of) a snarl of the root, which is considered to be the root but actually refers to a subset of the children of the root that are connected

virtual bool is_snarl(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a snarl.

bool is_simple_snarl(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a simple snarl A simple snarl is a bubble where each child node can only reach the boundary nodes, and each side of a node reaches a different boundary node

virtual bool is_chain(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a chain.

bool is_multicomponent_chain(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a chain that is not start-end connected.

bool is_looping_chain(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a chain that loops (a chain where the first and last node are the same).

bool is_trivial_chain(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a trivial chain that represents a single node.

virtual bool is_node(const net_handle_t &net) const

Returns true if the given net handle refers to (a traversal of) a single node, and thus has a corresponding handle_t.

virtual bool is_sentinel(const net_handle_t &net) const

Return true if the given net handle is a snarl bound sentinel (in either inward or outward orientation), and false otherwise.

virtual net_handle_t get_net(const handle_t &handle, const handlegraph::HandleGraph *graph) const

Turn a handle to an oriented node into a net handle for a start-to-end or end-to-start traversal of the node, as appropriate.

virtual handle_t get_handle(const net_handle_t &net, const handlegraph::HandleGraph *graph) const

For a net handle to a traversal of a single node, get the handle for that node in the orientation it is traversed. May not be called for other net handles.

virtual net_handle_t get_parent(const net_handle_t &child) const

Get the parent snarl of a chain, or the parent chain of a snarl or node. If the child is start-to-end or end-to-start, and the parent is a chain, the chain comes out facing the same way, accounting for the relative orientation of the child snarl or node in the chain. Otherwise, everything is produced as start-to-end, even if that is not actually a realizable traversal of a snarl or chain. May not be called on the root snarl.

Also works on snarl boundary sentinels.

virtual net_handle_t get_bound(const net_handle_t &snarl, bool get_end, bool face_in) const

Get the bounding handle for the snarl or chain referenced by the given net handle, getting the start or end facing in or out as appropriate.

For snarls, returns the bounding sentinel net handles. For chains, returns net handles for traversals of the bounding nodes of the chain. If the chain is a looping chain, then the start and end of the chain are the same, so the connectivity of the bound indicates which we’re looking at; the connectivity will be start-start if it is going backwards in the node, and end-end if it is going forwards.

Ignores traversal type.

May not be called on traversals of individual nodes.

net_handle_t get_node_from_sentinel(const net_handle_t &sentinel) const

Given the sentinel of a snarl, return a handle to the node representing it.

virtual net_handle_t flip(const net_handle_t &net) const

Return a net handle to the same snarl/chain/node in the opposite orientation. No effect on tip-to-tip, start-to-start, or end-to-end net handles. Flips all the others.

virtual net_handle_t canonical(const net_handle_t &net) const

Get a canonical traversal handle from any net handle. All handles to the same net graph element have the same canonical traversal. That canonical traversal must be realizable, and might not always be start-to-end or even consistently be the same kind of traversal for different snarls, chains, or nodes. Mostly useful to normalize for equality comparisons.

Any root snarl will become just the root Anything without connectivity will get START_END

net_handle_t start_end_traversal_of(const net_handle_t &net) const

Makes a start-end traversal of the net. Faster than canonical because it doesn’t check the index for anything

virtual endpoint_t starts_at(const net_handle_t &traversal) const

Return the kind of location at which the given traversal starts.

virtual endpoint_t ends_at(const net_handle_t &traversal) const

Return the kind of location at which the given traversal ends.

size_t get_rank_in_parent(const net_handle_t &net) const

For a child of a snarl, the rank is used to calculate the distance.

size_t connected_component_count() const

How many connected components are in this graph? This returns the number of topological connected components, not necessarily the number of nodes in the top-level snarl

virtual net_handle_t get_parent_traversal(const net_handle_t &traversal_start, const net_handle_t &traversal_end) const

Get a net handle for traversals of a snarl or chain that contains the given oriented bounding node traversals or sentinels. Given two sentinels for a snarl, produces a net handle to a start-to-end, end-to-end, end-to-start, or start-to-start traversal of that snarl. Given handles to traversals of the bounding nodes of a chain, similarly produces a net handle to a traversal of the chain.

For a chain, either or both handles can also be a snarl containing tips, for a tip-to-start, tip-to-end, start-to-tip, end-to-tip, or tip-to-tip traversal. Similarly, for a snarl, either or both handles can be a chain in the snarl that contains internal tips, or that has no edges on the appropriate end.

May only be called if a path actually exists between the given start and end.

Public Static Functions

static inline const net_handle_record_t get_record_handle_type(record_t type)

Given the type of the record, return the handle type. Some record types can represent multiple things, for example a simple snarl record is used to represent a snarl, and the nodes/trivial chains in it. This will return whatever is higher on the snarl tree. A simple snarl will be considered a snarl, a root snarl will be considered a root, etc

static inline const size_t get_record_offset(const handlegraph::net_handle_t &net_handle)

The offset into records that this handle points to.

static inline const size_t get_node_record_offset(const handlegraph::net_handle_t &net_handle)

The offset of a node in a trivial snarl (0 if it isn’t a node in a trivial snarl)

static inline const size_t get_node_pointer_offset(const handlegraph::nid_t &id, const handlegraph::nid_t &min_node_id, size_t component_count)

Get the offset into snarl_tree_records for the pointer to a node record.

static inline size_t sum(const size_t &val1, const size_t &val2)

Add integers, returning max() if any of them are max()

class TemporaryDistanceIndex
struct TemporaryChainRecord : public bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporaryRecord
struct TemporaryNodeRecord : public bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporaryRecord
struct TemporaryRecord

Subclassed by bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporaryChainRecord, bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporaryNodeRecord, bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporarySnarlRecord

struct TemporarySnarlRecord : public bdsg::SnarlDistanceIndex::TemporaryDistanceIndex::TemporaryRecord

Graph Overlays

In addition to these basic implementations, there are several “overlays”. These overlays are graphs that wrap other graphs, providing features not avialable in the backing graph, or otherwise transforming it.

class PositionOverlay : public handlegraph::PathPositionHandleGraph, public handlegraph::ExpandingOverlayGraph

Subclassed by bdsg::MutablePositionOverlay

class PackedPositionOverlay : public handlegraph::PathPositionHandleGraph, public handlegraph::ExpandingOverlayGraph

Subclassed by bdsg::PackedReferencePathOverlay

class MutablePositionOverlay : public bdsg::PositionOverlay, public handlegraph::MutablePathDeletableHandleGraph
class VectorizableOverlay : public virtual handlegraph::VectorizableHandleGraph, public virtual handlegraph::ExpandingOverlayGraph

Subclassed by bdsg::PathVectorizableOverlay

class PathVectorizableOverlay : public bdsg::VectorizableOverlay, public virtual handlegraph::PathHandleGraph

Subclassed by bdsg::PathPositionVectorizableOverlay

class PathPositionVectorizableOverlay : public bdsg::PathVectorizableOverlay, public virtual handlegraph::PathPositionHandleGraph

Many of these are based on the handlegraph::ExpandingOverlayGraph interface, which guarantees that the overlay does not remove any graph material, and allows handles form the backing graph and the overlay graph to be interconverted.

class ExpandingOverlayGraph : public virtual handlegraph::HandleGraph

This is the interface for a graph that represents a transformation of some underlying HandleGraph where every node in the overlay corresponds to a node in the underlying graph, but where more than one node in the overlay can map to the same underlying node.

Subclassed by bdsg::PackedPositionOverlay, bdsg::PackedSubgraphOverlay, bdsg::PositionOverlay, bdsg::StrandSplitOverlay, bdsg::VectorizableOverlay, handlegraph::algorithms::SubHandleGraph

Public Functions

virtual ~ExpandingOverlayGraph() = default
virtual handle_t get_underlying_handle(const handle_t &handle) const = 0

Returns the handle in the underlying graph that corresponds to a handle in the overlay

Graph Overlay Helpers

From C++, some types are available to allow code to take an input of a more general type (say, a bdsg::HandleGraph) and get a view of it as a more specific type (such as a bdsg::VectorizableHandleGraph), using an overlay to bridge the gap if the backing graph implementation does not itself support the requested feature. For each pf these “overlay helpers”, you instantiate the object (which allocates storage), use the apply() method to pass it a pointer to the backing graph and get a pointer to a graph of the requested type, and then use the get() method later if you need to get the requested-type graph pointer again.

typedef OverlayHelper<PathPositionHandleGraph, PackedPositionOverlay, PathHandleGraph> bdsg::PathPositionOverlayHelper

Helper to ensure that a PathHandleGraph has the PathPositionHandleGraph interface.

typedef OverlayHelper<RankedHandleGraph, VectorizableOverlay, HandleGraph> bdsg::RankedOverlayHelper

Helper to ensure that a HandleGraph has the RankedHandleGraph interface.

typedef OverlayHelper<RankedHandleGraph, PathVectorizableOverlay, PathHandleGraph> bdsg::PathRankedOverlayHelper

Helper to ensure that a PathHandleGraph has the RankedeHandleGraph interface.

typedef OverlayHelper<VectorizableHandleGraph, VectorizableOverlay, HandleGraph> bdsg::VectorizableOverlayHelper

Helper to ensure that a HandleGraph has the VectorizableHandleGraph interface.

typedef OverlayHelper<VectorizableHandleGraph, PathVectorizableOverlay, PathHandleGraph> bdsg::PathVectorizableOverlayHelper

Helper to ensure that a PathHandleGraph has the VectorizableHandleGraph interface.

typedef PairOverlayHelper<PathPositionHandleGraph, PackedPositionOverlay, PathHandleGraph, VectorizableHandleGraph, PathPositionVectorizableOverlay, PathPositionHandleGraph> bdsg::PathPositionVectorizableOverlayHelper

Helper to ensure that a PathHandleGraph has the VectorizableHandleGraph and PathPositionHandleGraph interfaces.

All these overlay helpers are really instantiations of a couple of templates:

template<typename T, typename U, typename V>
class OverlayHelper

Implementation of overlay helper core functionality. T = desired class U = overlay class V = input class

Public Functions

inline T *apply(V *input_graph)
inline const T *apply(const V *input_graph)
inline const T *get() const
template<typename T1, typename U1, typename V1, typename T2, typename U2, typename V2>
class PairOverlayHelper

Implementation of overlay helper functionality for when multiple overlays need to be stacked.

Public Functions

inline T2 *apply(V1 *input_graph)
inline const T2 *apply(const V1 *input_graph)
inline const T2 *get() const