AllBio / EU CodeFest

bmpvieira.com/allbio14


Bruno Vieira | @bmpvieira

PhD Student @ QMUL

Bioinformatics and Population Genomics

Supervisor:
Yannick Wurm | @yannick__

Before:
FCUL CoBiG2 eseb2013 geeklist bbsrc

© 2014 Bruno Vieira CC-BY 4.0

Some problems I faced during my research:

  • Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs

  • For web projects, needed to implement the same functionality on browser and server

  • Difficulty writing scalable, reproducible and complex bioinformatic pipelines

Bionode.io - Modular and universal bioinformatics

Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and browser.
Collaborates with BioJS - Represent biological data on the web

Dat - Build data pipelines

Provides a streaming interface between every file format and data storage backend. "git for data"

dat-data.com | @maxogden | @mafintosh

bionode.io (online shell)

Examples

BASH

bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna
# => http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG_genomic.fna.gz

bionode-ncbi download sra arthropoda | bionode-sra
bionode-ncbi download gff bacteria

JavaScript

var ncbi = require('bionode-ncbi')

ncbi.urls('assembly', 'Solenopsis invicta', gotData)

function gotData(urls) {
  var genome = urls[0].genomic.fna
  download(genome)
}

ncbi.search('biosample', 'human').pipe(filter).pipe(analyse)

Difficulty getting relevant descriptions and datasets from the NCBI API using bio* libs

Python example

import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"
esearch_handle = Entrez.esearch(db="assembly", term="Achromyrmex")
esearch_record = Entrez.read(esearch_handle)
for id in esearch_record['IdList']:
    esummary_handle = Entrez.esummary(db="assembly", id=id)
    esummary_record = Entrez.read(esummary_handle)
    documentSummarySet = esummary_record['DocumentSummarySet']
    document = documentSummarySet['DocumentSummary'][0]
    metadata_XML = document['Meta'].encode('utf-8')
    metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
    for entry in metadata[1]:
        print entry.text

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG

Solution: bionode-ncbi
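For comparison, a minimal sketch of the same kind of query with bionode-ncbi, using its streaming API (assuming the module is installed; the search term mirrors the Python example):

var ncbi = require('bionode-ncbi')

ncbi.search('assembly', 'Achromyrmex')
  .on('data', function (data) {
    console.log(data) // each document summary arrives as a plain JavaScript object
  })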

Needed to reimplement the same code in the browser and on the server.

Solution: JavaScript everywhere
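Because Bionode modules are plain Node.js (CommonJS) modules, the same file can run on the server and, after bundling with a tool like browserify, in the browser. A minimal sketch, with hypothetical file and function names:

// search.js — runs unchanged under Node.js; for the browser, bundle it
// with e.g. `browserify search.js -o bundle.js` and include bundle.js in a page.
var ncbi = require('bionode-ncbi')

module.exports = function searchGenomes(term, done) {
  var results = []
  ncbi.search('genome', term)
    .on('data', function (data) { results.push(data) })
    .on('end', function () { done(results) })
}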


Biodalliance is converting its parsers into Bionode modules

Difficulty writing scalable, reproducible and complex bioinformatic pipelines.

Solution: Node.js Streams everywhere

var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
var fork1 = through.obj()
var fork2 = through.obj() // second tee point, unused in this excerpt

// dat.reads, dat.samples and dat.papers are writable streams
// (e.g. Dat import streams) defined elsewhere.
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

// fork1 fans the SRA results out to two further pipelines:
fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
  .pipe(ncbi.search('pubmed'))
  .pipe(dat.papers)

Benefit from other JS projects

Dat
BioJS
NoFlo

Reusable, small and tested modules

(per-module status badges)
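As an illustration of that philosophy, a minimal sketch of a small stream module with its test (the reverse-complement example and the tape test runner are assumptions for illustration, not Bionode's actual code):

// A tiny, self-contained module + test sketch (run with `node example.js`).
var through = require('through2')
var test = require('tape')

// The "module": an object-mode transform stream that reverse-complements DNA.
function reverseComplement() {
  var map = { A: 'T', T: 'A', C: 'G', G: 'C' }
  return through.obj(function (seq, enc, next) {
    this.push(seq.split('').reverse().map(function (b) { return map[b] }).join(''))
    next()
  })
}

// The test: small and fast, so it can run on every `npm test`.
test('reverse complements a DNA string', function (t) {
  t.plan(1)
  reverseComplement()
    .on('data', function (seq) { t.equal(seq, 'CAT') })
    .end('ATG')
})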

Some users and contributors:

Soon:

Thanks!

Acknowledgements:

@yannick__
@maxogden
@mafintosh
@alanmrice
@dasmoth
@biodevops

Why Node.js / JavaScript

Module counts

(chart of module counts per package registry over time)

Package Manager that works

NPM


npm install bionode        # install as a local dependency
npm install bionode -g     # install globally (provides the command line tool)
npm test                   # run the module's tests
npm start                  # run the module's start script
npm run test-browser       # run the tests in a browser
npm run build-docs         # build the documentation
npm init                   # scaffold a new package.json
npm publish                # publish the module to the npm registry
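The `npm run` entries simply execute commands defined in the scripts section of package.json; a hypothetical sketch (the real Bionode scripts may differ):

{
  "name": "my-module",
  "scripts": {
    "test": "tape test/*.js",
    "test-browser": "browserify test/*.js | testling",
    "build-docs": "docco lib/*.js"
  }
}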

Not only for JavaScript, C/C++ too:
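npm also handles packages whose core is written in C or C++: the native addon is compiled on install (via node-gyp) and then required from JavaScript like any other module. A sketch assuming the leveldown package (C++ LevelDB bindings) is installed:

// npm install leveldown   (compiles the C++ addon during installation)
var leveldown = require('leveldown')

var db = leveldown('./example-db')
db.open(function (err) {
  if (err) throw err
  db.put('key', 'value', function (err) {
    if (err) throw err
    console.log('stored a record through C++ LevelDB bindings')
  })
})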