21 May 2009

XML Pipelines/ XProc for bioinformatics: my notebook

In this post I describe how I used XProc, the XML pipeline language, to create a workflow that queries the NCBI for a few SNPs and builds an HTML table describing those markers.
From the W3C: "XProc: (the) XML Pipeline Language, (is) a language for describing operations to be performed on XML documents."

An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps which perform atomic operations on XML documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.

The implementation I've chosen is Norman Walsh's XMLCalabash. It seemed to be the de facto standard implementation. However, I found it a little slow and I didn't like the fact that it sent 'log' messages to http://xproc.org/. Here, XMLCalabash requires the Saxon and Apache HttpClient libraries.
The XProc language itself was not easy to learn: it lacks good examples for each feature.

A first workflow


Say the file "rslist.xml" contains a list of SNPs packed in an HTML list:
<ul>
<li>rs25</li>
<li>rs26</li>
</ul>

The following XProc script reads an XML file and returns the original input:
<p:declare-step name="snp">
<p:documentation>Reads a list of SNP and echoes it</p:documentation>
<p:input port="listOfSnp" primary="true">
</p:input>
<p:output port="result" primary="true"/>
<p:identity/>
</p:declare-step>

XMLCalabash is invoked by binding the input port listOfSnp to our file "rslist.xml":
java -cp ${CLASSPATH} com.xmlcalabash.drivers.Main --input listOfSnp=rslist.xml xproc01.xpl

It returns the original file:
<ul>
<li>rs25</li>
<li>rs26</li>
</ul>

Workflow 2


In this second workflow, we loop over the SNPs and echo each node. The attribute sequence="true" on <p:output> tells XMLCalabash that the result will be a sequence of XML documents.
<p:declare-step name="snp">
<p:input port="listOfSnp" primary="true">
</p:input>
<p:output port="result" primary="true" sequence="true"/>
<p:for-each name="loopOverRs">
<p:iteration-source select="/ul/li"/>
<p:identity/>
</p:for-each>
</p:declare-step>

And here is the result:
<li>rs25</li><li>rs26</li>


Workflow 3


This third workflow loops over each SNP, builds a URI pointing to the SNP's XML definition at the NCBI, and loads that document:
<p:declare-step name="snp">
<p:input port="listOfSnp" primary="true">
</p:input>
<p:output port="result" primary="true" sequence="true"/>
<p:for-each name="loopOverRs">
<p:iteration-source select="/ul/li"/>
<p:variable name="rsId" select="substring(.,3)"/>
<p:load>
<p:with-option name="href" select="concat('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&amp;retmode=xml&amp;id=',$rsId)"/>
</p:load>
</p:for-each>
</p:declare-step>

Here is the result, two concatenated <ExchangeSet> documents:
<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum http://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">
<Rs rsId="25" snpClass="snp" snpType="notwithdrawn" molType="cDNA" genotype="true" bitField="030000080001020500020101">
<Het type="est" value="0.499585956335068" stdError="0.0143825300037861"/>
<RsLinkout resourceId="1" linkValue="25"/>
<hgvs>NM_015204.1:c.1454-1398A&gt;G</hgvs>
<hgvs>NT_007819.16:g.11073100T&gt;C</hgvs>
</Rs>
(...)
</ExchangeSet>



<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum http://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">
<Rs rsId="26" snpClass="mixed" snpType="notwithdrawn" molType="cDNA" bitField="030100080001000000000700">
<Validation byCluster="true"/>
<Create build="36" date="2000-09-19 17:02"/>
(...)
<hgvs>NM_015204.1:c.1454-727A&gt;G</hgvs>
<hgvs>NT_007819.16:g.11072429T&gt;C</hgvs>
</Rs>
</ExchangeSet>
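For readers more at home in Java than in XPath, the href computed inside <p:load> can be sketched as follows. This is only an illustration and not part of the pipeline: the class name EfetchUrl and the method forRs are made up for the example.

```java
final class EfetchUrl {
    /** Base EFetch URL, copied from the workflow above. */
    static final String BASE =
        "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&retmode=xml&id=";

    /**
     * Mirrors the XPath expression substring(.,3): XPath counts from 1,
     * so the first two characters ("rs") are dropped and the number is kept.
     */
    static String forRs(String rsName) {
        if (rsName == null || rsName.length() < 3) {
            throw new IllegalArgumentException("expected something like rs25, got: " + rsName);
        }
        return BASE + rsName.substring(2);
    }

    public static void main(String[] args) {
        // One URL per <li> of rslist.xml
        System.out.println(forRs("rs25"));
        System.out.println(forRs("rs26"));
    }
}
```

Note that inside an XML attribute the same URL has to be written with &amp; instead of a bare ampersand.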

Workflow 4


This fourth workflow is the same as the previous one, but it uses <p:unwrap> to strip the <ExchangeSet> wrapper from each document returned by the NCBI, keeping only its children. At the end of the workflow, <p:wrap-sequence wrapper="ncbi:ExchangeSet"> is called to merge all those children into a new <ExchangeSet>:
<p:declare-step name="snp">
<p:input port="listOfSnp" primary="true">
</p:input>
<p:output port="result" primary="true"/>

<p:for-each name="loopOverRs">
<p:iteration-source select="/ul/li"/>
<p:output port="efetch-doc" sequence="true"/>
<p:variable name="rsId" select="substring(.,3)"/>
<p:load>
<p:with-option name="href" select="concat('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&amp;retmode=xml&amp;id=',$rsId)"/>
</p:load>
<p:unwrap match="/ncbi:ExchangeSet"/>
</p:for-each>

<p:wrap-sequence wrapper="ncbi:ExchangeSet">
</p:wrap-sequence>
</p:declare-step>


So, at the end, a single well-formed document is returned:
<ncbi:ExchangeSet xmlns:ncbi="http://www.ncbi.nlm.nih.gov/SNP/docsum">
<Rs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" rsId="25" snpClass="snp" snpType="notwithdrawn" molType="cDNA" genotype="true" bitField="030000080001020500020101">
<Het type="est" value="0.499585956335068" stdError="0.0143825300037861"/>

(...)

</PrimarySequence>
<RsLinkout resourceId="1" linkValue="26"/>
<hgvs>NM_015204.1:c.1454-727A&gt;G</hgvs>
<hgvs>NT_007819.16:g.11072429T&gt;C</hgvs>
</Rs>

</ncbi:ExchangeSet>
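The unwrap-then-wrap idea is not specific to XProc. Below is a rough Java/DOM sketch of the same operation, assuming the EFetch answers are already available as strings. The class MergeExchangeSets and its merge method are hypothetical names; nothing here is taken from XMLCalabash.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class MergeExchangeSets {
    static final String NCBI_NS = "http://www.ncbi.nlm.nih.gov/SNP/docsum";

    /** Copies the children of every input root element under one new ncbi:ExchangeSet. */
    static Document merge(List<String> xmlDocuments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // "wrap-sequence": the single new wrapper element
            Document out = builder.newDocument();
            Element wrapper = out.createElementNS(NCBI_NS, "ncbi:ExchangeSet");
            out.appendChild(wrapper);

            for (String xml : xmlDocuments) {
                Document in = builder.parse(new InputSource(new StringReader(xml)));
                // "unwrap": skip the old root, keep its children (text nodes included)
                for (Node child = in.getDocumentElement().getFirstChild();
                     child != null;
                     child = child.getNextSibling()) {
                    wrapper.appendChild(out.importNode(child, true));
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("cannot merge the documents", e);
        }
    }
}
```

importNode makes a deep copy, so the source documents are left untouched while their children are collected.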

Workflow 5


This time, we will use a list of Entrez queries grouped in an HTML list:
<ul>
<li>"snp_gene_clin"[Filter] AND "snp_pubmed_cited"[Filter] AND 2[CHR]</li>
<li>(1000[CHRPOS] : 5000[CHRPOS]) AND 2[CHR] AND "homo sapiens"[Organism]</li>
</ul>

The next workflow calls NCBI ESearch on the SNP database for each query. It then calls NCBI EFetch to retrieve the information about the SNPs, using the parameters (WebEnv and QueryKey) found in the result of the previous ESearch call.
At the end, the ExchangeSet document is transformed into HTML using an XSLT stylesheet defined inline in the workflow:
<p:declare-step name="snp">
<p:input port="queries" primary="true">
</p:input>
<p:output port="result" primary="true"/>

<p:for-each name="loopOverQueries">
<p:iteration-source select="/ul/li"/>
<p:output port="efetch-doc" sequence="true"/>
<p:variable name="term" select="encode-for-uri(.)"/>
<p:load>
<p:with-option name="href" select="concat('http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&amp;usehistory=y&amp;term=',$term)"/>
</p:load>

<p:load>
<p:with-option name="href" select="concat('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&amp;retmode=xml&amp;WebEnv=', encode-for-uri(/eSearchResult/WebEnv), '&amp;query_key=', encode-for-uri(/eSearchResult/QueryKey) )"/>
</p:load>
<p:unwrap match="/ncbi:ExchangeSet"/>
</p:for-each>
<p:wrap-sequence wrapper="ncbi:ExchangeSet"/>
<p:xslt name="tr2html">
<p:input port="parameters">
<p:empty/>
</p:input>
<p:input port="stylesheet">
<p:inline>
<xsl:stylesheet version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<html><body><table>
<thead>
<tr>
<th>rs#</th>
<th>dbSnpBuild</th>
<th>group</th>
<th>build</th>
<th>Chrom</th>
<th>Position</th>
</tr>
</thead>
<tbody>
<xsl:for-each select="/ncbi:ExchangeSet/ncbi:Rs">
<xsl:variable name="rsId"><xsl:value-of select="@rsId"/></xsl:variable>
<xsl:for-each select="ncbi:Assembly">
<xsl:variable name="dbSnpBuild"><xsl:value-of select="@dbSnpBuild"/></xsl:variable>
<xsl:variable name="groupLabel"><xsl:value-of select="@groupLabel"/></xsl:variable>
<xsl:variable name="genomeBuild"><xsl:value-of select="@genomeBuild"/></xsl:variable>
<xsl:for-each select="ncbi:Component">
<xsl:variable name="chromosome"><xsl:value-of select="@chromosome"/></xsl:variable>
<xsl:for-each select="ncbi:MapLoc">
<xsl:element name="tr">

<td>
<xsl:element name="a">
<xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=<xsl:value-of select="$rsId"/></xsl:attribute>rs<xsl:value-of select="$rsId"/></xsl:element>
</td>
<td><xsl:value-of select="$dbSnpBuild"/></td>
<td><xsl:value-of select="$groupLabel"/></td>
<td><xsl:value-of select="$genomeBuild"/></td>
<td>chr<xsl:value-of select="$chromosome"/></td>
<td><xsl:value-of select="@physMapInt"/></td>
</xsl:element>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</tbody>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
</p:inline>
</p:input>
</p:xslt>


</p:declare-step>

And here is the HTML returned:


rs#         dbSnpBuild  group      build  Chrom  Position
rs28928870  129         Celera     36_3   chr2   49031258
rs28928870  129         HuRef      36_3   chr2   48924538
rs28928870  129         reference  36_3   chr2   49044117
(...)
rs10168026  129         Celera     36_3   chr2   71241
rs10168026  129         HuRef      36_3   chr2   4967
rs10168026  129         reference  36_3   chr2   4484
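The ESearch/EFetch hand-off used in this workflow can also be written down as two small URL builders. The sketch below is an assumption-laden illustration: the class EutilsUrls and its methods are invented for this note, and URLEncoder is used as an approximation of the XPath function encode-for-uri (it encodes a space as "+" where encode-for-uri produces "%20").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EutilsUrls {
    /** Percent-encodes a value for use in a query string. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    /** Step 1: ESearch with usehistory=y, so that the answer carries WebEnv and QueryKey. */
    static String esearchUrl(String term) {
        return "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&usehistory=y&term="
            + encode(term);
    }

    /** Step 2: EFetch, reusing the two values read from /eSearchResult. */
    static String efetchUrl(String webEnv, String queryKey) {
        return "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&retmode=xml&WebEnv="
            + encode(webEnv) + "&query_key=" + encode(queryKey);
    }

    public static void main(String[] args) {
        System.out.println(esearchUrl("(1000[CHRPOS] : 5000[CHRPOS]) AND 2[CHR]"));
        // "someWebEnv" and "1" are placeholders, not real values returned by the NCBI
        System.out.println(efetchUrl("someWebEnv", "1"));
    }
}
```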


I've uploaded this workflow to myExperiment; to my knowledge, this is the first XProc script available there.

That's it !
Pierre

20 May 2009

XForms for Bioinformatics : my notebook.


Here, I describe my experience with XFORMS:(W3C) XForms is an XML application that represents the next generation of forms for the Web. By splitting traditional XHTML forms into three parts—XForms model, instance data, and user interface, it separates presentation from content, allows reuse, gives strong typing—reducing the number of round-trips to the server, as well as offering device independence and a reduced need for scripting.

XForms currently requires the XForms plugin for Firefox. This plugin is available at https://addons.mozilla.org/en-US/firefox/addon/824. The current post was written using Mozilla XForms 0.8.6ff3.

In this post I show how to send an RDF/XML document describing a SNP to a web server using XForms. The form contains a field for the name of the SNP. It also contains a table with two columns (chromosome and position) for mapping the SNP. As this SNP could be mapped more than once on the genome (don't use this for genotyping !), the user will be able to append a row at the end of the table for each position.


Note: The file must be an XHTML file and the form must send its data to the same domain.

The <head> contains the <xforms:model>. This element represents a form definition and is used as a container for elements that define the XForms Model.

<xforms:model> contains the <xforms:instance>. An instance defines a template for the data to be collected. Here the template is an RDF document describing a SNP that is mapped twice:
<xforms:instance id="me">
<rdf:RDF>
<bio:SNP rdf:about="">
<bio:rsId>rs25</bio:rsId>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr1</bio:chromosome>
<bio:position>1000</bio:position>
</bio:Position>
</bio:mapping>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr2</bio:chromosome>
<bio:position>2000</bio:position>
</bio:Position>
</bio:mapping>
</bio:SNP>
</rdf:RDF>
</xforms:instance>

We then bind nodes of this instance, selected with XPath expressions, to unique identifiers. Note that the attribute @type can be used to add a restriction to the form (for example, the position must be a non-negative integer); if the form does not validate, the data won't be sent to the server. A finer validation could also be performed if an XSD schema were attached to the model. Note also that the @rdf:about attribute (a link to the NCBI page for the SNP) is calculated from the name of the SNP.
<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:rsId" id="rsId" required="true( )"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/@rdf:about" calculate="concat('http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=',substring(../bio:rsId,3))"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position/bio:chromosome" type="xsd:NMTOKEN" id="mapChrom" required="true( )"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position/bio:position" type="xsd:nonNegativeInteger" id="mapPos" required="true( )"/>
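To make the calculate and type rules above more concrete, here is roughly what they amount to, written as plain Java. This is a sketch for illustration only: SnpFormRules is a hypothetical class, the XForms engine does this work itself, and the regular expression only approximates the lexical space of xsd:nonNegativeInteger.

```java
final class SnpFormRules {
    /** Same computation as the calculate attribute: concat(url, substring(rsId, 3)). */
    static String aboutUri(String rsId) {
        // XPath substring() returns "" when the string is too short; do the same here
        String number = rsId.length() > 2 ? rsId.substring(2) : "";
        return "http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=" + number;
    }

    /** Approximates type="xsd:nonNegativeInteger": digits with an optional "+", or "-0". */
    static boolean isNonNegativeInteger(String value) {
        return value != null && value.trim().matches("\\+?[0-9]+|-0+");
    }

    public static void main(String[] args) {
        System.out.println(aboutUri("rs25"));
        System.out.println(isNonNegativeInteger("1000"));
        System.out.println(isNonNegativeInteger("-5"));
    }
}
```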

We then describe how the data should be submitted. Here are two <xforms:submission> elements: one sends the data as XML just like a regular HTML form, the other sends a mail (this second one didn't work).

<xforms:submission action="echo.php" method="post" id="submit" ref="/rdf:RDF" instance="me" encoding="ISO-8859-1" replace="all"/>
<xforms:submission action="mailto:me@yahoo.fr" method="post" id="mail" ref="/rdf:RDF" instance="me" encoding="ISO-8859-1" replace="all"/>


Then, the input fields are written in the body of the XHTML document. Each field contains a xforms:label as well as a xforms:alert displayed if the field is not valid.
<xforms:input ref="bio:chromosome">
<xforms:label>Chr</xforms:label>
<xforms:alert class="inline">Chromosome is a xsd:NMTOKEN</xforms:alert>
</xforms:input>

For each position of this SNP, the fields are inserted in a table using <xforms:repeat>
<table border="1"><xforms:repeat nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position" id="repeatpositions"><tr>
<td>

<xforms:input ref="bio:chromosome">
<xforms:label>Chr</xforms:label>
<xforms:alert class="inline">Chromosome is a xsd:NMTOKEN</xforms:alert>
</xforms:input>
</td>
<td>
<xforms:input ref="bio:position">
<xforms:label>Pos</xforms:label>
<xforms:alert class="inline">Position must be greater than or equal to 0</xforms:alert>
</xforms:input>
</td>
</tr></xforms:repeat></table>

A button is used to append a new row in this table:
<xforms:trigger>
<xforms:label>Insert Row</xforms:label>
<xforms:insert nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position" at="index('repeatpositions')" position="after" ev:event="DOMActivate"/>
</xforms:trigger>

And at the end, here are the two 'submit' buttons:
<xforms:submit submission="submit">
<xforms:label>Go</xforms:label>
<xforms:hint>Click to post</xforms:hint>
</xforms:submit>
<xforms:submit submission="mail">
<xforms:label>Mail</xforms:label>
<xforms:hint>Click to Send</xforms:hint>
</xforms:submit>

... and in the header, a little piece of CSS for styling:
<style type="text/css">
@namespace xforms url("http://www.w3.org/2002/xforms");

xforms|input {
color:blue; font-weight:bold;font-size:20px;width:500px
}

xforms|label {
color:blue; font-weight:bold;font-size:20px;width:500px
}

</style>




And here is the XML sent to the server:

<rdf:RDF>
<bio:SNP rdf:about="http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=25">
<bio:rsId>rs25</bio:rsId>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr1</bio:chromosome>
<bio:position>1000</bio:position>
</bio:Position>
</bio:mapping>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr2</bio:chromosome>
<bio:position>2000</bio:position>
</bio:Position>
</bio:mapping>
</bio:SNP>
</rdf:RDF>


All in one...

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bio="http://ontology.lindenb.org/snp#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<head>
<title>XFORM</title>
<!-- Let's add style to the XFORM components -->
<style type="text/css">
@namespace xforms url("http://www.w3.org/2002/xforms");

xforms|input {
color:blue; font-weight:bold;font-size:20px;width:500px
}

xforms|label {
color:blue; font-weight:bold;font-size:20px;width:500px
}

</style>

<!--

This element represents a form definition and is used as
a container for elements that define the XForms Model.

-->


<xforms:model>

<!-- instance defines a template for the data to be collected. -->
<xforms:instance id="me">
<!-- we send a RDF document to the server -->
<rdf:RDF>
<bio:SNP rdf:about="">
<bio:rsId>rs25</bio:rsId>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr1</bio:chromosome>
<bio:position>1000</bio:position>
</bio:Position>
</bio:mapping>
<bio:mapping>
<bio:Position>
<bio:chromosome>chr2</bio:chromosome>
<bio:position>2000</bio:position>
</bio:Position>
</bio:mapping>
</bio:SNP>
</rdf:RDF>
</xforms:instance>

<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:rsId" id="rsId" required="true( )"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/@rdf:about" calculate="concat('http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=',substring(../bio:rsId,3))"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position/bio:chromosome" type="xsd:NMTOKEN" id="mapChrom" required="true( )"/>
<xforms:bind nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position/bio:position" type="xsd:nonNegativeInteger" id="mapPos" required="true( )"/>
<xforms:submission action="echo.php" method="post" id="submit" ref="/rdf:RDF" instance="me" encoding="ISO-8859-1" replace="all"/>
<xforms:submission action="mailto:me@yahoo.fr" method="post" id="mail" ref="/rdf:RDF" instance="me" encoding="ISO-8859-1" replace="all"/>
</xforms:model>
</head>
<body>
This page requires the
<a href="https://addons.mozilla.org/en-US/firefox/addon/824">XForms plugin for Firefox</a><hr/>
<xforms:input bind="rsId">
<xforms:label>RsId</xforms:label>
<xforms:hint>Rs##</xforms:hint>
</xforms:input><br/>
<table border="1">

<xforms:repeat nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position" id="repeatpositions"><tr>
<td>

<xforms:input ref="bio:chromosome">
<xforms:label>Chr</xforms:label>
<xforms:alert class="inline">Chromosome is a xsd:NMTOKEN</xforms:alert>
</xforms:input>
</td>
<td>
<xforms:input ref="bio:position">
<xforms:label>Pos</xforms:label>
<xforms:alert class="inline">Position must be greater or equal to 0</xforms:alert>
</xforms:input>
</td>
</tr></xforms:repeat>
<tfoot><tr>
<xforms:group>
<xforms:trigger>
<xforms:label>Insert Row</xforms:label>
<xforms:insert nodeset="/rdf:RDF/bio:SNP/bio:mapping/bio:Position" at="index('repeatpositions')" position="after" ev:event="DOMActivate"/>
</xforms:trigger>
</xforms:group>
</tr></tfoot>
</table>

<xforms:submit submission="submit">
<xforms:label>Go</xforms:label>
<xforms:hint>Click to post</xforms:hint>
</xforms:submit>
<xforms:submit submission="mail">
<xforms:label>Mail</xforms:label>
<xforms:hint>Click to Send</xforms:hint>
</xforms:submit>
<tr/>
<hr/>
<h6><a href="mailto:plindenbaum@yahoo.fr">Pierre Lindenbaum PhD</a> |
<a href="http://plindenbaum.blogspot.com">http://plindenbaum.blogspot.com</a></h6>
</body>
</html>




That's it,
Pierre

18 May 2009

Rare Disease: How-to ?

I had a conversation with a colleague about how to study a gene involved in a rare, lethal human disease. A set of mutations in this gene favors the onset of the symptoms and only a tiny number of samples is available. How can we learn more about this gene ? I've just got one or two ideas (hey ! I've not been working at the bench for 10 years :-)...)

  • using the product of this gene as a bait in a Yeast two hybrid screen to find one or more cellular partners to this protein.
  • A Q-PCR (or chip ?) for a few candidate genes to see how they are expressed in different tissues on affected/non-affected
  • ?
hum... that's short. Any other idea ?

14 May 2009

WebServices/JAXWS for SNP, Glassfish, Taverna: my notebook

In this post I describe how to deploy a WebService in the GlassFish web server and how to use it via the Taverna workflow engine.

Server side


Classes


The JAX-WS API (the java API for Web Services) was used here. Our Web Service will be designed to
  • find the position of a SNP from its name
  • find the SNPs in a given region
First of all, a simple POJO (Plain Old Java Object) for a SNP (name, chromosome, position, ....) was created
import java.io.Serializable;

public class SNP
implements Serializable
{
private static final long serialVersionUID = 1L;
private String name=null;
private String acn=null;
private String chromosome=null;
private String sequence=null;
private int position=-1;

(...)
//getters and setters here...
(...)

}

The interface of our web service "SnpTool" is then defined. The class is decorated with the JAX-WS annotations defining the methods of the web service and the name of the parameters:
package fr.cephb.operon.server.ws;
import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebResult;
import javax.jws.WebService;
import javax.jws.soap.SOAPBinding;
import javax.jws.soap.SOAPBinding.Style;

@WebService
@SOAPBinding(style=Style.DOCUMENT)
public interface SnpTool
{
@WebMethod
@WebResult(name="SnpDescriptorList")
public SNP[] getSNPByName(@WebParam(name="rsNumber")String name) throws Exception;

@WebMethod
@WebResult(name="SnpList")
public SNP[] getSNPByPosition(@WebParam(name="chromosome")String chrom,@WebParam(name="start")int start,@WebParam(name="end")int end) throws Exception;
}
The annotation @SOAPBinding(style=Style.DOCUMENT) was critical because Taverna doesn't seem to handle Style.RPC.
The service for SnpTool is then implemented. As I want to configure this WebService at deployment time (for example to specify a maximum number of SNPs to be retrieved, the default assembly, etc...), we need to get a handle to the web container: this handle is obtained by injecting a WebServiceContext with the @Resource annotation. This context is then used to retrieve the initialization parameters of the web application. Warning: this context is not yet initialized in the SNPToolWeb constructor.
package fr.cephb.operon.server.ws;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.annotation.Resource;
import javax.jws.WebService;
import javax.servlet.ServletContext;
import javax.xml.ws.WebServiceContext;
import javax.xml.ws.handler.MessageContext;

import fr.cephb.joperon.core.bio.Assembly;

@WebService(endpointInterface="fr.cephb.operon.server.ws.SnpTool")
public class SNPToolWeb
implements SnpTool
{
@Resource
private WebServiceContext wsContext=null;

/** max number of SNP to be retrieved */
private Integer maxNumberOfSNP=null;

private Integer getMaxNumberOfSNP() throws Exception
{
if(maxNumberOfSNP!=null) return maxNumberOfSNP;
MessageContext ctxt = wsContext.getMessageContext();
ServletContext ctx = (ServletContext)ctxt.get(MessageContext.SERVLET_CONTEXT);
String s= ctx.getInitParameter("limit");
if(s!=null)
{
maxNumberOfSNP=Integer.parseInt(s);
}
return maxNumberOfSNP;
}



@Override
public SNP[] getSNPByName(String rsName) throws Exception {
//get your data here from a database, a file, etc....
//yes it returns an array because some SNP may have been merged
SNP[] snp = new SNP[0]; //placeholder: fill it with the records found
return snp;
}

@Override
public SNP[] getSNPByPosition(String krom, int start, int end)
throws Exception {
List<SNP> list= new ArrayList<SNP>();

//get your data here....

return list.toArray(new SNP[list.size()]);
}

}
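The "get your data here" parts are left open above. As a purely illustrative stand-in, a lookup by position over an in-memory list could look like the sketch below. Everything in it is an assumption made for the example: the class SnpRangeLookup, the reduced Snp class (only three of the POJO's fields), the inclusive bounds, and the way the "limit" parameter caps the result.

```java
import java.util.ArrayList;
import java.util.List;

final class SnpRangeLookup {
    /** Reduced stand-in for the SNP POJO: name, chromosome and position only. */
    static final class Snp {
        final String name;
        final String chromosome;
        final int position;

        Snp(String name, String chromosome, int position) {
            this.name = name;
            this.chromosome = chromosome;
            this.position = position;
        }
    }

    /** Returns the SNPs on chrom with start <= position <= end, at most limit of them. */
    static List<Snp> byPosition(List<Snp> all, String chrom, int start, int end, int limit) {
        List<Snp> hits = new ArrayList<Snp>();
        for (Snp snp : all) {
            if (hits.size() >= limit) {
                break; // same role as the "limit" context parameter
            }
            if (snp.chromosome.equals(chrom) && snp.position >= start && snp.position <= end) {
                hits.add(snp);
            }
        }
        return hits;
    }
}
```

A real implementation would query a database or an indexed file rather than scan a list; the scan is only there to make the contract of getSNPByPosition visible.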

Compile and Deploy

The deployment descriptor (web.xml) looks like this:
<web-app>
<context-param>
<param-name>limit</param-name>
<param-value>1000000</param-value>
</context-param>
</web-app>

Ant is called to deploy this web service in Glassfish. Note that wsgen was invoked to generate the JAX-WS portable artifacts used in JAX-WS web services.
<project default="all" basedir=".">
<property environment="env"/>
<property name="home.dir" value="${env.HOME}"/>
<property name="rootdir" value="."/>
<property name="builddir" value="${rootdir}/build"/>
<property name="compiledir" value="${builddir}/compile"/>
<property file="${home.dir}/.project-properties"/>

<path id="libraries">
<pathelement path="lib1.jar"/>
<pathelement path="lib2.jar"/>
</path>

<path id="j2eelib">
<pathelement path="${appserver.dir}/lib/webservices-rt.jar"/>
<pathelement path="${appserver.dir}/lib/webservices-tools.jar"/>
</path>



<target name="demoservlet">
<mkdir dir="${compiledir}"/>
<mkdir dir="${builddir}"/>

<copy todir="${compiledir}" includeEmptyDirs="false">
<fileset dir="${rootdir}/src/java">
<filename name="**/*.java"/>
</fileset>
<fileset dir="${rootdir}/src/java">
<filename name="**/*.xml"/>
</fileset>
</copy>

<javac srcdir="${compiledir}" destdir="${compiledir}" debug="true" source="1.6" target="1.6">
<include name="**/SNPServlet.java"/>
<include name="**/SNPToolWeb.java"/>
<classpath>
<path refid="libraries"/>
<pathelement location="${appserver.dir}/lib/j2ee.jar"/>
</classpath>
</javac>


<echo message="Running wsgen"/>
<exec executable="${appserver.dir}/bin/wsgen">
<arg value="-cp"/> <arg value="${appserver.dir}/lib/j2ee.jar:lib1.jar:lib2.jar:${compiledir}"/>
<arg value="-verbose"/>
<arg value="-s"/> <arg value="${compiledir}"/>
<arg value="-d"/> <arg value="${compiledir}"/>
<arg value="-wsdl"/>
<arg value="-keep"/>
<arg value="fr.cephb.operon.server.ws.SNPToolWeb"/>
</exec>


<delete includeEmptyDirs="true">
<fileset dir="${compiledir}" includes="**/*.java"/>
</delete>

<war destfile="${builddir}/snp2rdf.war" webxml="${compiledir}/WEB-INF/web.xml">
<lib file="lib1.jar"/>
<classes dir="${compiledir}"/>
</war>

<delete dir="${compiledir}"/>

<exec executable="asadmin" failonerror="true">
<arg value="deploy"/>
<arg line="--user username"/>
<arg line="--passwordfile ${passwordfile}"/>
<arg value="--host"/>
<arg value="localhost"/>
<arg value="--port"/>
<arg value="${domain.admin.port}"/>
<arg value="--echo=true"/>
<arg value="--libraries"/><arg value="lib1.jar"/>
<arg value="${builddir}/snp2rdf.war"/>
</exec>
</target>
</project>

The WSDL

After this WebService was compiled and deployed, we can see its WSDL at http://www.example.org:8080/snp2rdf/SNPToolWebService?wsdl:
<definitions targetNamespace="http://ws.server.operon.cephb.fr/" name="SNPToolWebService">
<types>
<xsd:schema>
<xsd:import namespace="http://ws.server.operon.cephb.fr/" schemaLocation="http://www.example.org:8080/snp2rdf/SNPToolWebService?xsd=1"/>
</xsd:schema>
</types>
<message name="getSNPByName">
<part name="parameters" element="tns:getSNPByName"/>
</message>
<message name="getSNPByNameResponse">
<part name="parameters" element="tns:getSNPByNameResponse"/>
</message>
<message name="Exception">
<part name="fault" element="tns:Exception"/>
</message>
<message name="getAssemblyName">
<part name="parameters" element="tns:getAssemblyName"/>
</message>
<message name="getAssemblyNameResponse">
<part name="parameters" element="tns:getAssemblyNameResponse"/>
</message>
<message name="getSNPByPosition">
<part name="parameters" element="tns:getSNPByPosition"/>
</message>
<message name="getSNPByPositionResponse">
<part name="parameters" element="tns:getSNPByPositionResponse"/>
</message>
<portType name="SnpTool">
<operation name="getSNPByName">
<input message="tns:getSNPByName"/>
<output message="tns:getSNPByNameResponse"/>
<fault message="tns:Exception" name="Exception"/>
</operation>
<operation name="getAssemblyName">
<input message="tns:getAssemblyName"/>
<output message="tns:getAssemblyNameResponse"/>
<fault message="tns:Exception" name="Exception"/>
</operation>
<operation name="getSNPByPosition">
<input message="tns:getSNPByPosition"/>
<output message="tns:getSNPByPositionResponse"/>
<fault message="tns:Exception" name="Exception"/>
</operation>
</portType>
<binding name="SNPToolWebPortBinding" type="tns:SnpTool">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"/>
<operation name="getSNPByName">
<soap:operation soapAction=""/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
<fault name="Exception">
<soap:fault name="Exception" use="literal"/>
</fault>
</operation>
<operation name="getAssemblyName">
<soap:operation soapAction=""/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
<fault name="Exception">
<soap:fault name="Exception" use="literal"/>
</fault>
</operation>
<operation name="getSNPByPosition">
<soap:operation soapAction=""/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
<fault name="Exception">
<soap:fault name="Exception" use="literal"/>
</fault>
</operation>
</binding>
<service name="SNPToolWebService">
<port name="SNPToolWebPort" binding="tns:SNPToolWebPortBinding">
<soap:address location="http://www.example.org:8080/snp2rdf/SNPToolWebService"/>
</port>
</service>
</definitions>


Client side


Creating a Client with wsimport


In a previous post, I described how to use wsimport to generate the client classes for a WebService (see "The EBI/IntAct Web-Service API, my notebook").

Using the WS with Taverna



I've used this WSDL to run my first Taverna workflow: the input is the SNP "rs25". The web service is invoked to find its position on the human genome, then to find its neighbours within 100bp. The XML result is then saved to a local file.
  • The green nodes are the WebServices.
  • The blue nodes are the constants (e.g. "rs25")
  • The orange node is a simple Java BeanShell script extending the position of the SNP:
    left=Math.max(0,Integer.parseInt(position)-Integer.parseInt(extend));
    right=Integer.parseInt(position)+Integer.parseInt(extend);
  • the purple nodes are the XML scavengers (a mysterious thing used to convert a structure to/from a XML file) and the processors (e.g. write to a file)
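The BeanShell step in the list above is small enough to restate as ordinary Java. The class name SnpWindow is hypothetical; the arithmetic is the same as in the script, with the inputs arriving as strings as they do in Taverna.

```java
final class SnpWindow {
    /** Left bound: position - extend, clamped so that it never goes below 0. */
    static int left(String position, String extend) {
        return Math.max(0, Integer.parseInt(position) - Integer.parseInt(extend));
    }

    /** Right bound: position + extend (no clamping against the chromosome length). */
    static int right(String position, String extend) {
        return Integer.parseInt(position) + Integer.parseInt(extend);
    }

    public static void main(String[] args) {
        // Arbitrary example values, not the actual position of rs25
        System.out.println(left("5000", "100") + "-" + right("5000", "100"));
    }
}
```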

When this workflow was invoked, it saved the following file:
<ns2:getSNPByPositionResponse>
<SnpList>
<acn>rs12699208</acn>
<chromosome>Chr7</chromosome>
<name>rs12699208</name>
<position>11549694</position>
<sequence>(...)TATAGCTTCAACATATATGAAAAAAATGTCCACTGA[R]TAGTTCCTGGTGGAGAACTCTCCCATCTCTTTTG</sequence>
</SnpList>
<SnpList>
<acn>rs27</acn>
<chromosome>Chr7</chromosome>
<name>rs27</name>
<position>11549750</position>
<sequence>(..)CCCCCATTTGAGATCCTTCTTCATCTCACCTG[S]TACCTCTCAATCCCGGTGAACCAAAAGAGATGGG(...)</sequence>
</SnpList>

(...)

<SnpList>
<acn>rs7458209</acn>
<chromosome>Chr7</chromosome>
<name>rs7458209</name>
<position>11551560</position>
<sequence>(...)GAAAACTTTAGGAAGCAAACAT[Y]GTTTTATTAAGAAAACAGGTTAAGCAAGATGGCTGACAGGAAGAGCTTCTCC(...)</sequence>
</SnpList>
</ns2:getSNPByPositionResponse>

This workflow was then uploaded and shared. Please note that this web service is under development and might be replaced and/or switched off soon.

That's it !
Pierre

01 May 2009

Beautiful Data : The Stories Behind Elegant Data Solutions



My contribution to this book was minor, so I really want to thank Jean Claude Bradley, Rajarshi Guha, Andrew Lang, Cameron Neylon, Antony Williams and Egon Willighagen for keeping my name in the list of authors. They wrote a great chapter for this book.

Pierre