15 December 2006

JAVA 1.6 Mustang, Derby/JavaDB and Bioinformatics.

Java1.6 nows contains an embedded SQL database engine called derby. In this post I will show how I have tested derby to store some fasta sequences. What is cool is that you can call any public java static methods directly from the SQL queries.

The source code is available here

When the SQL driver is called, it creates a new directory (here derby4fasta) where derby will store its data


File database=new File("derby4fasta");
String DRIVER="org.apache.derby.jdbc.EmbeddedDriver";
Class<?> driver=Class.forName(DRIVER);
boolean create=!database.exists();
driver.newInstance();
String url="jdbc:derby:";
Properties props= new Properties();
if(create)
{
props.setProperty("create", "true");
}
props.setProperty("user", "dba");
props.setProperty("password", "");
props.setProperty("databaseName","derby4fasta");
this.connection = DriverManager.getConnection(url,props);


I declare a static method returning the GC% of a sequence.

public static double gcPercent(String sequence)
{
int n=0;
for(int i=0;i< sequence.length();++i)
{
char base= Character.toUpperCase(sequence.charAt(i));
n+=(base=='G' || base=='C'?1:0);
}
return (double)(n/(double)sequence.length());
}


I declare a new function that will call this method.

/* the function FASTA.GC call org.lindenb.sandbox.Derby4Fasta.gcPercent() */
statement.executeUpdate(
"create function FASTA.GC( seq VARCHAR(2000) ) returns DOUBLE "+
" LANGUAGE JAVA "+
" NO SQL "+
" PARAMETER STYLE JAVA "+
" EXTERNAL NAME \'org.lindenb.sandbox.Derby4Fasta.gcPercent\'"
);




The fasta sequences are read and inserted in the database. We can now select the sequences having a GC% greater than 55%.

//find sequences having a GC% > 55%
Statement stmt= app.connection.createStatement();
ResultSet row=stmt.executeQuery("select FASTA.GC(seq)*100.0,name,length(seq) from FASTA.SEQUENCE " +
"where FASTA.GC(seq)>0.55");
while(row.next())
{
System.out.println(row.getInt(1)+"%\t"+row.getString(2)+"("+row.getInt(3)+")");
}


compiling and executing....

55% >gi|11448650|gb|BF436335.1|BF436335 7p06d03.x1 NCI_CGA
55% >gi|23257376|gb|BU583411.1|BU583411 mai04h08.y1 McCarr
57% >gi|10811235|gb|BF057339.1|BF057339 7k19e02.x1 NCI_CGA
57% >gi|11271233|gb|BF321955.1|BF321955 uz66f08.y1 NCI_CGA
57% >gi|11271233|gb|BF321955.1|BF321955 uz66f08.y1 NCI_CGA
58% >gi|13675777|gb|BG625264.1|BG625264 pgn1c.pk002.c2 Nor
58% >gi|13675786|gb|BG625273.1|BG625273 pgn1c.pk002.d1 Nor
58% >gi|13675786|gb|BG625273.1|BG625273 pgn1c.pk002.d1 Nor
63% >gi|84131965|gb|CV878005.1|CV878005 PDUts1172A01 Porci


That's it.

Pierre

No comments: