Bionode intro

Bionode

Bionode.io - Modular and universal bioinformatics

Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and browser. #bionode
gitter.im/bionode/bionode

Problem: Too much data

Reproducibility crisis

Reproducibility layers

Code

Data

Workflow

Environment

Bionode also collaborates with BioJS

Bionode - list of modules

Name	Type	People
ncbi	Data access
fasta	Parser
seq	Wrangling	IM
ensembl	Data access
blast-parser	Parser

Bionode - list of modules

Name	Type	Status	People
template	Documentation
JS pipeline	Documentation
Gasket pipeline	Documentation
Dat/Bionode workshop	Documentation

Bionode - list of modules

Name	Type	Status	People
sra	Wrappers
bwa	Wrappers
sam	Wrappers
bbi	Parser

Bionode - list of modules

Name	Type	People
ebi	Data access
semantic	Data access
vcf	Parser
gff	Parser
bowtie	Wrappers
sge	Wrappers	badryan
blast	Wrappers

Bionode - list of modules

Name	Type	People
vsearch	Wrappers
khmer	Wrappers
rsem	Wrappers
gmap	Wrappers
star	Wrappers
go	Wrappers	badryan

Dat workshop

maxogden.github.io/get-dat

Bionode

npm install -g bionode
bionode ncbi download gff bacteria
bionode ncbi download sra arthropoda | bionode sra fastq-dump
npm install -g bionode-ncbi
bionode-ncbi search assembly formicidae | dat import --json

Some problems I faced during my research:

For web projects, needed to implement the same functionality on browser and server
Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs
Difficulty writing scalable, reproducible and complex bioinformatic pipelines

Need to reimplement the same code on browser and server.

Solution: JavaScript everywhere
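
As a rough illustration of what "JavaScript everywhere" buys, here is a minimal sketch of one function that can run unchanged in Node.js and in the browser. The reverseComplement helper is hypothetical and not part of Bionode. It only assumes plain DNA strings made of A, C, G, T and N.

```javascript
// Hypothetical helper, not part of Bionode: reverse-complement a DNA string.
// It uses only core language features, so the same file can be loaded with
// require() in Node.js or bundled for the browser.
const COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C', N: 'N' }

function reverseComplement (sequence) {
  return sequence
    .toUpperCase()
    .split('')
    .reverse()
    .map(function (base) {
      if (!Object.prototype.hasOwnProperty.call(COMPLEMENT, base)) {
        throw new Error('Unexpected base: ' + base)
      }
      return COMPLEMENT[base]
    })
    .join('')
}

// Export for Node.js (CommonJS) when available; otherwise attach to the global object.
if (typeof module !== 'undefined' && module.exports) {
  module.exports = { reverseComplement: reverseComplement }
} else {
  globalThis.reverseComplement = reverseComplement
}
```

Because nothing here touches the file system or the DOM, the server and the web page can share this file instead of each needing its own reimplementation.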

Difficulty getting relevant description and datasets from NCBI API using bio* libs

Python example: URL for the Acromyrmex assembly?

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG

import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"
# Search the NCBI Assembly database for the genus
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)
for id in esearch_record['IdList']:
  # Fetch the summary document for each matching assembly
  esummary_handle = Entrez.esummary(db="assembly", id=id)
  esummary_record = Entrez.read(esummary_handle)
  documentSummarySet = esummary_record['DocumentSummarySet']
  document = documentSummarySet['DocumentSummary'][0]
  # 'Meta' holds an XML fragment, so wrap it in a root element before parsing
  # (the wrapper tags were lost from the slide text; '<root>' is assumed)
  metadata_XML = document['Meta']
  metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
  for entry in metadata[1]:
    print(entry.text)

Solution: bionode-ncbi

Difficulty getting relevant description and datasets from NCBI API using bio* libs

Example: URL for the Acromyrmex assembly?

http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz

JavaScript

var bio = require('bionode')
bio.ncbi.urls('assembly', 'Acromyrmex', function(urls) {
  console.log(urls[0].genomic.fna)
})

bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL)
function printGenomeURL(urls) {
  console.log(urls[0].genomic.fna)
}

Difficulty getting relevant description and datasets from NCBI API using bio* libs

Example: URL for the Acromyrmex assembly?

http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz

JavaScript

var ncbi = require('bionode-ncbi')
var ndjson = require('ndjson')
ncbi.urls('assembly', 'Acromyrmex')
.pipe(ndjson.stringify())
.pipe(process.stdout)
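
To make the piping pattern above concrete without touching the network, the following sketch builds a small stand-in for the stringify step using only Node's built-in stream module. It is not the ndjson package's implementation. The ndjsonStringify and toNdjsonLine names and the sample records are made up for illustration.

```javascript
const { Readable, Transform } = require('stream')

// Serialize one record as a single newline-terminated JSON line.
function toNdjsonLine (record) {
  return JSON.stringify(record) + '\n'
}

// Hypothetical stand-in for a stringify step: takes objects in, emits text out.
function ndjsonStringify () {
  return new Transform({
    writableObjectMode: true, // accept objects on the writable side
    transform (record, encoding, callback) {
      callback(null, toNdjsonLine(record))
    }
  })
}

// Made-up records standing in for search results.
const records = [
  { id: 1, name: 'example-assembly-a' },
  { id: 2, name: 'example-assembly-b' }
]

// Objects flow through the transform one at a time and land on stdout as text.
Readable.from(records)
  .pipe(ndjsonStringify())
  .pipe(process.stdout)
```

Each record is handled as it arrives, so the whole result set never has to sit in memory at once. The same property is what lets the command line tools be chained with pipes.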

BASH

bionode-ncbi urls assembly Acromyrmex |
tool-stream extractProperty genomic.fna
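
The second command in the pipe picks one nested field out of every JSON line. The sketch below shows that idea in plain JavaScript. It is not how tool-stream is implemented. The getPath and extractFromNdjson names and the example.org URL are assumptions for illustration, and the record shape is guessed from the `.genomic.fna` access in the earlier slides.

```javascript
// Walk a dotted path such as 'genomic.fna' through nested objects.
function getPath (record, path) {
  return path.split('.').reduce(function (value, key) {
    return value == null ? undefined : value[key]
  }, record)
}

// Parse newline-delimited JSON text and collect the value at `path`
// from each record, skipping blank lines and records without that field.
function extractFromNdjson (text, path) {
  return text
    .split('\n')
    .filter(function (line) { return line.trim() !== '' })
    .map(function (line) { return getPath(JSON.parse(line), path) })
    .filter(function (value) { return value !== undefined })
}

// Made-up record; the URL is a placeholder, not a real NCBI path.
const sample = JSON.stringify({
  genomic: { fna: 'http://example.org/genome.fna.gz' }
}) + '\n'

console.log(extractFromNdjson(sample, 'genomic.fna'))
```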