New bioinformatics software: nesoni, and updated treemaker key-value store

homeblogmastodonblueskythingiverse



I just put some code I've been working on at work up on the web. Nesoni is Python / Cython software for analysing high-throughput sequencing data. If you have huge piles of high-throughput sequencing data, especially of bacterial genomes, you might find it useful.

It's still fairly alpha. There's not a lot of documentation, but it's getting everyday use.


Perhaps of more interest to those of you who don't have to deal with huge steaming piles of high-throughput sequencing data, nesoni includes an updated version of the "treemaker" key value store I described in a previous blog post. Treemaker is useful for huge steaming piles of data of any kind! Treemaker excels at fast random writes. A simple example application would be an always live, always up-to-date inverted index.

This new version can perform merges in the background, using the multiprocessing module. It can happily keep several CPUs busy when under heavy load.

Treemaker is my secret sauce. For example, I've recently been using it to map paired end sequencing data onto a de Bruijn graph layout, using an inverted index from k-mers to read-pairs. This isn't yet part of nesoni, but should hopefully be part of a future version. The Velvet Assembler could probably be rewritten using treemaker, to remove its current excessive memory demands. I am, kind of by accident, about a quarter of the way to having done this.




[æ]