/*
https://github.com/bcgsc/ntJoin
Scaffolding draft assemblies using reference assemblies and minimizer graphs

Description of the algorithm

ntJoin takes a target assembly and one or more 'reference' assemblies as input, and uses information from the reference(s) to scaffold the target assembly. The 'reference' assemblies can be true reference assembly builds or different draft genome assemblies.
Instead of costly alignments, ntJoin uses lightweight minimizer graphs to derive a mapping between the input assemblies.

Main steps in the algorithm:
1. Generate an ordered minimizer sketch for each contig of each input assembly
2. Filter the minimizers to retain only those that are:
   - Unique within each assembly
   - Found in all assemblies (target + all references)
3. Build a minimizer graph
   - Nodes: minimizers
   - Edges: between minimizers that are adjacent in at least one of the assemblies. The weight of an edge is the sum of the weights of the assemblies that support it.
4. Filter the graph based on the minimum edge weight (n)
5. For each node that is a branch node (degree > 2), filter the incident edges with an increasing edge threshold
6. Convert each linear path to a list of oriented target assembly contig regions to scaffold together
7. Print out the target assembly scaffolds
*/
graph G {
"3714041376220621505" [label="3714041376220621505
('test', 800)
('1_f', 800)"]
"10820188111283998344" [label="10820188111283998344
('test', 1630)
('1_f', 1630)"]
"4671501941577321508" [label="4671501941577321508
('test', 3156)
('2_f', 1155)"]
"16743415676028282381" [label="16743415676028282381
('test', 3439)
('2_f', 1438)"]
"17184023496183651984" [label="17184023496183651984
('test', 2656)
('2_f', 655)"]
"8162189927378643732" [label="8162189927378643732
('test', 1575)
('1_f', 1575)"]
"3714041376220621505" -- "8162189927378643732" [weight=3.0 color=lightgrey]
"10820188111283998344" -- "8162189927378643732" [weight=3.0 color=lightgrey]
"10820188111283998344" -- "17184023496183651984" [weight=2.0 color=red]
"4671501941577321508" -- "17184023496183651984" [weight=3.0 color=lightgrey]
"4671501941577321508" -- "16743415676028282381" [weight=3.0 color=lightgrey]
}
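
/*
A minimal sketch of the minimum-edge-weight filter named in the comment at the
top of this file, applied by hand to graph G. It is an illustration, not ntJoin
output. Assumptions: the threshold n = 3 and the graph name G_filtered are
hypothetical, chosen only so that the filter has a visible effect, and an edge
is taken to survive when its weight is at least n. Under those assumptions the
single weight=2.0 edge is dropped and two linear paths remain. With a
threshold of 2 or lower, G would be unchanged. Node labels are omitted for
brevity, so each node is drawn with its minimizer ID.
*/
graph G_filtered {
/* First remaining linear path */
"3714041376220621505" -- "8162189927378643732" [weight=3.0 color=lightgrey]
"10820188111283998344" -- "8162189927378643732" [weight=3.0 color=lightgrey]
/* Second remaining linear path */
"4671501941577321508" -- "17184023496183651984" [weight=3.0 color=lightgrey]
"4671501941577321508" -- "16743415676028282381" [weight=3.0 color=lightgrey]
/* Dropped under the assumed n = 3:
   "10820188111283998344" -- "17184023496183651984" [weight=2.0 color=red] */
}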