28 May 2014

Uniprot → SVG

My colleague @Solena recently asked me to write a tool that would help her to prepare a large number of figures for an article. The tool I wrote fetches an entry for a given Uniprot accession and creates a SVG diagram , editable in Inkscape. It is available at


That's it,

Pierre

22 May 2014

Breaking the " same origin security policy" with CORS. An example with @GenomeBrowser / DAS.

Jerven Bolleman recently taught me about the CORS/Cross-origin resource sharing:



"Cross-origin resource sharing (CORS) is a mechanism that allows many resources (e.g. fonts, JavaScript, etc.) on a web page to be requested from another domain outside of the domain the resource originated from. In particular, JavaScript's AJAX calls can use the XMLHttpRequest mechanism. Such "cross-domain" requests would otherwise be forbidden by web browsers, per the same origin security policy."

I've created a page testing if some bioinformatics web-service support CORS. This page is available at : http://lindenb.github.io/pages/cors/index.html

Interestingly NCBI, Uniprot and UCSC support CORS. As an example, the following <form> fetches a DNA sequence using the DAS server of the UCSC and display it:



The script:



That's it
Pierre

20 May 2014

A nodejs-based REST server for the UCSC @GenomeBrowser



Node.js provides a simple mechanism to write a REST server. As an exercise, I wrote a REST server for the mysql server of the UCSC genome bowser. The code is available on github at:


Starting the server


$ cd bionode
$ node ucsc/ucsc.js
Server running at http://localhost:8080/

METHOD: /schema/databases



Lists the available databases :e.g: http://localhost:8080/schemas/databases


[
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",

(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]


This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/databases?callback=handle


handle([
"information_schema",
"ailMel1",
"allMis1",
"anoCar1",
(...)
"visiGene",
"xenTro1",
"xenTro2",
"xenTro3"
]);


METHOD: /schema/:database/tables


Lists the available tables for a given database :e.g: http://localhost:8080/schemas/anoCar1/tables


[
"all_mrna",
"author",
"blastHg18KG",
"cds",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/anoCar1/tables?callback=handle


handle([
"all_mrna",
"author",
"blastHg18KG",
"cds",
"cell",
(...)
"xenoRefFlat",
"xenoRefGene",
"xenoRefSeqAli"
]);

METHOD: /schema/:database/:table


Returns a schema for the given database.table. E.g: http://localhost:8080/schemas/anoCar1/xenoMrna



{"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]}


This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/schemas/anoCar1/xenoMrna?callback=handler


handler({"database":"anoCar1","table":"xenoMrna","fields":[{"name":"bin","type":"smallint(5) unsigned","key":""},{"name":"matches","type":"int(10) unsigned","key":""},{"name":"misMatches","type":"int(10) unsigned","key":""},{"name":"repMatches","type":"int(10) unsigned","key":""},{"name":"nCount","type":"int(10) unsigned","key":""},{"name":"qNumInsert","type":"int(10) unsigned","key":""},{"name":"qBaseInsert","type":"int(10) unsigned","key":""},{"name":"tNumInsert","type":"int(10) unsigned","key":""},{"name":"tBaseInsert","type":"int(10) unsigned","key":""},{"name":"strand","type":"char(2)","key":""},{"name":"qName","type":"varchar(255)","key":"MUL"},{"name":"qSize","type":"int(10) unsigned","key":""},{"name":"qStart","type":"int(10) unsigned","key":""},{"name":"qEnd","type":"int(10) unsigned","key":""},{"name":"tName","type":"varchar(255)","key":"MUL"},{"name":"tSize","type":"int(10) unsigned","key":""},{"name":"tStart","type":"int(10) unsigned","key":""},{"name":"tEnd","type":"int(10) unsigned","key":""},{"name":"blockCount","type":"int(10) unsigned","key":""},{"name":"blockSizes","type":"longblob","key":""},{"name":"qStarts","type":"longblob","key":""},{"name":"tStarts","type":"longblob","key":""}]});


METHOD: /ucsc/:database/:table/:column/:key


Fetch the rows for a given database.name having a :column==:key . The :column must be indexed. E.g: http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346


[
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/ucsc/anoCar1/ensGene/name/ENSACAT00000004346?callback=handler


handler([
{"bin":592,"name":"ENSACAT00000004346","chrom":"scaffold_111","strand":"-","txStart":991522,"txEnd":996396,"cdsStart":991522,"cdsEnd":996396,"exonCount":3,"exonStarts":"991522,995669,995976,","exonEnds":"991954,995972,996396,","score":0,"name2":"PELO","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,0,"}
]);

METHOD: /ucsc/:database/:table?chrom=?&start=?&end=?




Fetch the rows for a given genomic database.name overlapping the given range. This method uses the UCSC-bin index if it is available; E.g: http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000


[
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]

This method accepts a parameter callback for JSON-P : e.g: http://localhost:8080/ucsc/anoCar1/ensGene?chrom=scaffold_111&start=600000&end=900000&callback=handler


handler([
{"bin":589,"name":"ENSACAT00000003906","chrom":"scaffold_111","strand":"-","txStart":594783,"txEnd":614216,"cdsStart":595000,"cdsEnd":614201,"exonCount":9,"exonStarts":"594783,601291,601744,603640,604745,604865,609139,611740,614097,","exonEnds":"595105,601406,601813,603736,604771,604942,609173,611840,614216,","score":0,"name2":"DPM1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,2,2,2,0,1,0,2,0,"},
{"bin":589,"name":"ENSACAT00000003908","chrom":"scaffold_111","strand":"+","txStart":614382,"txEnd":615600,"cdsStart":614382,"cdsEnd":615600,"exonCount":1,"exonStarts":"614382,","exonEnds":"615600,","score":0,"name2":"MOCS3","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,"},
{"bin":589,"name":"ENSACAT00000003918","chrom":"scaffold_111","strand":"-","txStart":638920,"txEnd":642127,"cdsStart":638920,"cdsEnd":642127,"exonCount":2,"exonStarts":"638920,641368,","exonEnds":"639691,642127,","score":0,"name2":"KCNG1","cdsStartStat":"cmpl","cdsEndStat":"cmpl","exonFrames":"0,0,"},
{"bin":591,"name":"ENSACAT00000003920","chrom":"scaffold_111","strand":"+","txStart":814576,"txEnd":826972,"cdsStart":814576,"cdsEnd":826972,"exonCount":3,"exonStarts":"814576,825125,826845,","exonEnds":"814594,825247,826972,","score":0,"name2":"ENSACAG00000003945","cdsStartStat":"incmpl","cdsEndStat":"cmpl","exonFrames":"0,0,2,"},
{"bin":591,"name":"ENSACAT00000004042","chrom":"scaffold_111","strand":"-","txStart":849731,"txEnd":881887,"cdsStart":849731,"cdsEnd":881887,"exonCount":24,"exonStarts":"849731,851343,855421,856165,857842,858090,861054,861943,862949,863773,865029,865639,867414,868216,872220,873601,874396,876850,877105,877711,878919,879681,881320,881738,","exonEnds":"849809,851460,855511,856279,857947,858201,861157,862027,863026,863866,865171,865722,867525,868368,872360,873738,874600,876994,877263,877850,878993,879847,881471,881887,","score":0,"name2":"ITGA2","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"0,0,0,0,0,0,2,2,0,0,2,0,0,1,2,0,0,0,1,0,1,0,2,0,"},
{"bin":591,"name":"ENSACAT00000004050","chrom":"scaffold_111","strand":"-","txStart":883724,"txEnd":897808,"cdsStart":883724,"cdsEnd":897808,"exonCount":5,"exonStarts":"883724,885433,889264,889742,897701,","exonEnds":"883858,885548,889356,889852,897808,","score":0,"name2":"ENSACAG00000004086","cdsStartStat":"incmpl","cdsEndStat":"incmpl","exonFrames":"1,0,1,2,0,"}
]);


That's it,

Pierre

15 May 2014

How I start a bioinformatics project

Phil Ashton tweeted a link to a paper about how to set up a bioinformatics project file hierarchy: " A Quick Guide to Organizing Computational Biology Projects ".

Nick Loman posted his version yesterday : "How I start a bioinformatics project" on http://nickloman.github.io/2014/05/14/how-i-start-a-bioinformatics-project/.

Here is mine (simplified):

  • I start by creating a directory managed by git
  • I create a JSON-based description of my data, including the path to the softwares, to the references
  • I create a git submodule for a project hosting an Apache-velocity template transforming a Makefile from config.json :
  • The Makefile is generated using jsvelocity :It produces the following Makefile:
  • The Makefile is invoked with option -j N(Allow N jobs at once) using GNU-Make or QMake(distributed parallel make, scheduled by Sun Grid Engine)

That's it,

Pierre

12 May 2014

Generating wikipedia semantic links from a pubmed-id

In "Building a biomedical semantic network in Wikipedia with Semantic Wiki Links" (Database . 2012 Mar 20;2012) Benjamin Good & al. introduced the Semantic Wiki Link (SWL):

An SWL is a hyperlink on Wikipedia that allows the editor to explicitly specify the type of relationship between the concept described on the page being edited and the concept that is being linked to (http://en.wikipedia.org/wiki/Template:SWL). These SWLs are implemented using MediaWiki templates.
(...)
any programmer can now write computer programs to parse Wikipedia content for SWLs and import them into third-party tools (e.g. triplestores, etc.)
Example: Phospholamban:
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase ({{SWL|type=substrate_for|target=protein kinase A|label=PKA}}) in cardiac muscle.




Using Entrez-Ajax (Loman & al.) and the Wikipedia API, I wrote a HTML+JS interface to accelerate the creation of a semantic SWL wiki-text from a PUBMED-id:


and.. well, that's it,

Pierre