26 September 2011

PostScript as a Programming Language for Bioinformatics: mynotebook

"PostScript (PS) is an interpreted, stack-based programming language. It is best known for its use as a page description language in the electronic and desktop publishing areas."[wikipedia]. In this post, I'll show how I've used to create a simple and lightweight view of the genome.

Introduction: just a simple postscript program

The following PS program fills a 100x100 gray square. You can display the result with ghostview, evince, etc.:
%!PS
newpath
50 50 moveto
0 100 rlineto
100 0 rlineto
0 -100 rlineto
closepath
0.5 setgray
fill
showpage

Some global variables

The page width

/screenWidth 1000 def

The page height

/screenHeight 1000 def

The minimum 5' position

/minChromStart 1E9 def

The maximum 3' position

/maxChromEnd -1 def

The height of the row used to draw a genomic feature

/featureHeight 20 def

The distance (in pixels) between two 'ticks' showing the strand orientation

/ticksx 20 def

The font size

/theFontSize 9 def

The variable knownGene is a PS array of genes:

/knownGene [
[(uc002zkr.3) (chr22) (-) 161242...
...]
] def

Each gene is a PS array following the structure of the UCSC knownGene table, that is to say: name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds:

[(uc002zmh.2) (chr22) (-) 17618410 17646177 17618910 17646134
   [17618410 17619439 17621948 17623987 17625913 17629337 17630431 17646098 ]
   [17619247 17619628 17622123 17624021 17626007 17629450 17630635 17646177 ]
]
A simple command line can be used to fetch this data (here, the first 20 genes on chr22):
%  curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\
gunzip -c | grep chr22 | head -n 20 |\
awk '{printf("[(%s) (%s) (%s) %s %s %s %s [%s] [%s] ]\n",$1,$2,$3,$4,$5,$6,$7,$9,$10);}' |\
tr "," " " > result.txt 
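If awk is not at hand, the same conversion can be sketched in C. This is a minimal, hypothetical helper (knownGeneToPs is a name made up here) that assumes the tab-separated column order used by the awk command above: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, followed by extra columns that are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts one tab-separated line of knownGene.txt into
 * a PostScript array literal, mirroring the awk | tr pipeline: columns 1-7
 * are copied, column 8 (exonCount) is skipped, and the comma-separated
 * exonStarts/exonEnds (columns 9-10) become space-separated PS arrays.
 * Returns 0 on success, -1 on a malformed line or a too-small buffer. */
static int knownGeneToPs(const char *line, char *out, size_t outSize) {
    char buf[8192];
    char *fields[10];
    int n = 0;

    if (strlen(line) >= sizeof buf) return -1;
    strcpy(buf, line);
    buf[strcspn(buf, "\r\n")] = '\0';           /* strip the end of line */

    char *p = buf;
    while (n < 10) {                             /* split the first 10 columns */
        fields[n++] = p;
        char *tab = strchr(p, '\t');
        if (tab == NULL) break;
        *tab = '\0';
        p = tab + 1;
    }
    if (n < 10) return -1;                       /* too few columns */

    for (int i = 8; i < 10; i++)                 /* same job as tr "," " " */
        for (char *c = fields[i]; *c != '\0'; c++)
            if (*c == ',') *c = ' ';

    int len = snprintf(out, outSize, "[(%s) (%s) (%s) %s %s %s %s [%s] [%s] ]",
                       fields[0], fields[1], fields[2], fields[3], fields[4],
                       fields[5], fields[6], fields[8], fields[9]);
    return (len < 0 || (size_t)len >= outSize) ? -1 : 0;
}
```

A small main() reading stdin with fgets and printing each converted line would replace the awk | tr part of the pipeline; the chr22 filtering would still be done beforehand.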

Some utilities

Converting a PS object to a string (cvs into a new 20-character buffer)

/toString
{
20 string cvs 
} bind def

Converting a string of digits to an integer (loop over each character and accumulate the value; the built-in cvi operator would also accept a numeric string):

/toInteger
{
2 dict begin
/s exch def
/n 0 def
s {
  % forall pushes the character code of each character of s
  48 sub n 10 mul add /n exch def %48=ascii('0')
  } forall
n % leave n on the stack
end
} bind def
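The same digit-accumulation loop, sketched in C for comparison. Like the PostScript procedure, it assumes an unsigned decimal string and does no overflow checking; unlike it, this version stops at the first non-digit character.

```c
#include <ctype.h>

/* Digit accumulation as in the PS 'toInteger': n = n * 10 + (c - '0').
 * Assumes an unsigned decimal string; stops at the first non-digit. */
static long toInteger(const char *s) {
    long n = 0;
    for (; isdigit((unsigned char)*s); s++)
        n = n * 10 + (*s - '0');   /* '0' is 48 in ASCII */
    return n;
}
```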

Convert a genomic position to an x coordinate on the page (minChromStart maps to 0, maxChromEnd to screenWidth):

/convertPos2pixel
{
minChromStart sub maxChromEnd minChromStart sub div screenWidth mul
} bind def
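convertPos2pixel is a plain linear rescaling. Here is a C sketch of the same formula, handy for checking coordinates by hand (the parameter names simply mirror the PS globals):

```c
/* Linear mapping used by convertPos2pixel:
 *   x = (pos - minChromStart) / (maxChromEnd - minChromStart) * screenWidth
 * Computed in double precision (PostScript's 'div' also returns a real). */
static double convertPos2pixel(double pos, double minChromStart,
                               double maxChromEnd, double screenWidth) {
    return (pos - minChromStart) / (maxChromEnd - minChromStart) * screenWidth;
}
```

Positions outside [minChromStart, maxChromEnd] map outside [0, screenWidth].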

Extract the chromosome (that is to say, the second element, at index 1, of the gene array on the stack)

/getChrom
{
1 get
} bind def

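Create a hyperlink to the UCSC genome browser. concatstringarray is not a built-in PostScript operator; a minimal definition joining an array of strings is given first:

/concatstringarray
{
dup 0 exch { length add } forall % total length of the strings
string exch 0 exch { 3 copy putinterval length add } forall % copy each string at its offset
pop % drop the final offset, leave the new string
} bind def

/getHyperLink
{
3 dict begin
/E exch def %% END
/S exch def %% START
/C exch def %% CHROMOSOME
[ (http://genome.ucsc.edu/cgi-bin/hgTracks?position=) C (:) S toString (-) E toString (&db=hg19) ] concatstringarray
end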
} bind def
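For reference, a C sketch building the same UCSC Genome Browser URL with snprintf. The assembly (hg19) is hard-coded as in the PS code, and the function name is only illustrative.

```c
#include <stdio.h>

/* Hypothetical C counterpart of getHyperLink: writes
 * http://genome.ucsc.edu/cgi-bin/hgTracks?position=CHROM:START-END&db=hg19
 * into out. Returns 0 on success, -1 if the buffer is too small. */
static int getHyperLink(char *out, size_t outSize,
                        const char *chrom, long start, long end) {
    int len = snprintf(out, outSize,
        "http://genome.ucsc.edu/cgi-bin/hgTracks?position=%s:%ld-%ld&db=hg19",
        chrom, start, end);
    return (len < 0 || (size_t)len >= outSize) ? -1 : 0;
}
```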

Build a rectangular path (usage: x y width height box); the caller then fills or strokes it:

/box
{
4 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
x y moveto
width 0 rlineto
0 height rlineto
width -1 mul 0 rlineto
0 height -1 mul rlineto
end
} bind def

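Paint a gray gradient in the rectangle (x y width height gradient): nested bands centered on the rectangle's horizontal midline go from black (outermost) to white (center), then a thin black border is stroked around the rectangle:

/gradient
{
5 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
/i height 2 div def % half-height of the current band

0 1 height 2 div {
	pop % discard the loop counter pushed by 'for'
	1 i height 2.0 div div sub setgray
	newpath
	x
	y height 2 div i sub add
	width
	i 2 mul
	box
	closepath
	fill
	i 1 sub /i exch def
	}for
newpath
0 setgray
0.4 setlinewidth
x y width height box
closepath
stroke
end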
} bind def
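The arithmetic of gradient can be checked with a small C sketch that lists the bands instead of painting them. Band and gradientBands are hypothetical names, and the sketch assumes height > 0.

```c
/* Hypothetical description of one filled band. */
typedef struct { double y, h, gray; } Band;

/* Hypothetical helper: band k has half-height i = height/2 - k, is centred on
 * the rectangle's midline, and gets gray 1 - i/(height/2), i.e. 0 (black)
 * for the outermost band up to 1 (white) at the centre.
 * Fills at most maxBands entries and returns how many were written. */
static int gradientBands(double y, double height, Band *bands, int maxBands) {
    double half = height / 2.0;
    int n = 0;
    for (double i = half; i >= 0.0 && n < maxBands; i -= 1.0, n++) {
        bands[n].y = y + half - i;      /* bottom edge of the band */
        bands[n].h = 2.0 * i;           /* band height */
        bands[n].gray = 1.0 - i / half; /* 0 = black, 1 = white */
    }
    return n;
}
```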

Methods extracting data about the gene on top of the PS stack.

Extract the transcription start:

/getTxStart
{
3 get
} bind def

Extract the transcription end:

/getTxEnd
{
4 get
} bind def

Extract the CDS start:

/getCdsStart
{
5 get
} bind def

Extract the CDS end:

/getCdsEnd
{
6 get
} bind def

Extract the strand (1 for '+', -1 otherwise):

/getStrand
{
2 get (+) eq {1} {-1} ifelse
} bind def

Get the gene name:

/getKgName
{
0 get
} bind def

Get the number of exons:

/getExonCount
{
7 get length
} bind def

Get the start position of the i-th exon:

/getExonStart
{
2 dict begin
/i exch def
/gene exch def
gene 7 get i get
end
} bind def

Get the end position of the i-th exon:

/getExonEnd
{
2 dict begin
/i exch def
/gene exch def
gene 8 get i get
end
} bind def
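To make the index layout used by these accessors explicit, here is a hypothetical C mirror of one gene array. The enum and the KnownGene struct are illustrations only; the PS code works directly on arrays.

```c
/* Hypothetical mirror of one gene array:
 *   [name chrom strand txStart txEnd cdsStart cdsEnd [exonStarts] [exonEnds]]
 * The enum records the index each PS accessor passes to 'get'. */
enum {
    KG_NAME, KG_CHROM, KG_STRAND, KG_TX_START, KG_TX_END,
    KG_CDS_START, KG_CDS_END, KG_EXON_STARTS, KG_EXON_ENDS
};

typedef struct {
    const char *name;
    const char *chrom;
    char strand;               /* '+' or '-' */
    long txStart, txEnd;       /* transcription start/end */
    long cdsStart, cdsEnd;     /* coding region start/end */
    int exonCount;             /* what getExonCount obtains with 'length' */
    const long *exonStarts;
    const long *exonEnds;
} KnownGene;

/* Hypothetical counterparts of getStrand, getExonStart and getExonEnd. */
static int getStrand(const KnownGene *g) { return g->strand == '+' ? 1 : -1; }
static long getExonStart(const KnownGene *g, int i) { return g->exonStarts[i]; }
static long getExonEnd(const KnownGene *g, int i) { return g->exonEnds[i]; }
```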

Should we draw this gene? Only if it overlaps the [minChromStart, maxChromEnd] window:

/isVisible
{
1 dict begin
/gene exch def
minChromStart gene getTxEnd gt 
	{
	false
	}
	{
	gene getTxStart maxChromEnd gt
		{
		false
		}
		{
		true
		}ifelse
	}ifelse
end
}bind def
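isVisible is an interval-overlap test. Sketched in C with names mirroring the PS ones; since the PS code only uses strict gt comparisons, a gene that merely touches the window boundary counts as visible.

```c
#include <stdbool.h>

/* A gene is hidden only if it ends before the window starts
 * (minChromStart > txEnd) or starts after the window ends
 * (txStart > maxChromEnd); otherwise the two intervals overlap. */
static bool isVisible(long txStart, long txEnd, long minChromStart, long maxChromEnd) {
    return !(minChromStart > txEnd || txStart > maxChromEnd);
}
```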

Methods for an array of genes

Loop over the genes and extract the lowest 5' position (txStart). PostScript has no min operator, so the values are compared explicitly:

/getMinChromStart
{
2 dict begin
/genes exch def
/pos 10E9 def
genes {
	getTxStart dup pos lt { /pos exch def } { pop } ifelse
	} forall
pos
end
} bind def

Loop over the genes and extract the highest 3' position (txEnd). PostScript has no max operator either:

/getMaxChromEnd
{
2 dict begin
/genes exch def
/pos -1E9 def
genes {
	getTxEnd dup pos gt { /pos exch def } { pop } ifelse
	} forall
pos
end
} bind def
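Both loops can be merged into a single pass. A C sketch, using a hypothetical GeneSpan struct that holds only txStart and txEnd, with LONG_MAX/LONG_MIN as sentinels instead of 10E9/-1E9:

```c
#include <limits.h>

/* Hypothetical struct: only the two fields needed here. */
typedef struct { long txStart, txEnd; } GeneSpan;

/* Hypothetical helper computing, in one pass, what getMinChromStart and
 * getMaxChromEnd compute separately. With n == 0 the sentinels are returned. */
static void getChromExtent(const GeneSpan *genes, int n, long *minStart, long *maxEnd) {
    *minStart = LONG_MAX;
    *maxEnd = LONG_MIN;
    for (int i = 0; i < n; i++) {
        if (genes[i].txStart < *minStart) *minStart = genes[i].txStart;
        if (genes[i].txEnd > *maxEnd) *maxEnd = genes[i].txEnd;
    }
}
```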

Painting ONE Gene

/paintGene
{
5 dict begin
/gene exch def %% the GENE argument
/midy featureHeight 2.0 div def %the middle of the row
/x0 gene getTxStart convertPos2pixel def % 5' side of the gene in pixel
/x1 gene getTxEnd convertPos2pixel def % 3' side of the gene in pixel
/i 0 def
0.1 setlinewidth

1 0 0 setrgbcolor

newpath
x0 midy moveto
x1 midy lineto
closepath
stroke


% paint ticks showing the strand
% (ticksHeight, cdsHeight and exonHeight are assumed to be globals defined elsewhere in the file)
0 1 x1 x0 sub ticksx div{
	pop % discard the loop counter pushed by 'for'
	newpath
	gene getStrand 1 eq
		{
		x0 ticksHeight sub i add midy ticksHeight add moveto
		x0 i add midy lineto
		x0 ticksHeight sub i add midy ticksHeight sub lineto
		}
	%else
		{
		x0 ticksHeight add i add midy ticksHeight add moveto
		x0 i add midy lineto
		x0 ticksHeight add i add midy ticksHeight sub lineto
		} ifelse
	stroke
	i ticksx add /i exch def
	} for

% paint the coding region (cdsStart to cdsEnd)
0 0 1 setrgbcolor
newpath
gene getCdsStart convertPos2pixel
midy cdsHeight 2 div sub
gene getCdsEnd convertPos2pixel gene getCdsStart convertPos2pixel sub 
cdsHeight box
closepath
fill

% loop over exons
0 /i exch def
gene getExonCount
	{
	gene i getExonStart convertPos2pixel
	midy exonHeight 2 div sub
	gene i getExonEnd convertPos2pixel gene i getExonStart convertPos2pixel sub
	exonHeight gradient
	i 1 add /i exch def
	} repeat
0 setgray
gene getTxEnd convertPos2pixel 10 add midy moveto
gene getKgName show

%URL 
[ /Rect [x0 0 x1 1 add featureHeight]
/Border [1 0 0]
/Color [1 0 0]
/Action << /Subtype /URI /URI gene getChrom gene getTxStart gene getTxEnd getHyperLink  >>
/Subtype /Link
/ANN pdfmark

end
} bind def
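The strand ticks are chevrons repeated every ticksx pixels along the gene line. This C sketch (emitTicks is a hypothetical helper) writes the same path commands as the tick loop of paintGene, which makes the geometry easy to inspect; ticksHeight is passed as a parameter because its value is not shown in this post.

```c
#include <stdio.h>

/* Hypothetical helper: tick k sits at x = x0 + k*ticksx for
 * k = 0 .. floor((x1 - x0) / ticksx). On the '+' strand (strand == 1) the
 * chevron points right (">"), otherwise it points left ("<").
 * Returns the number of ticks written to out. */
static int emitTicks(FILE *out, double x0, double x1, double midy,
                     double ticksx, double ticksHeight, int strand) {
    if (x1 < x0 || ticksx <= 0.0) return 0;
    double dx = (strand == 1) ? -ticksHeight : ticksHeight; /* arm offset */
    int n = (int)((x1 - x0) / ticksx) + 1;  /* truncation == floor here */
    for (int k = 0; k < n; k++) {
        double x = x0 + k * ticksx;
        fprintf(out, "newpath %g %g moveto %g %g lineto %g %g lineto stroke\n",
                x + dx, midy + ticksHeight, x, midy, x + dx, midy - ticksHeight);
    }
    return n;
}
```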

Paint all Genes

/paintGenes
{
3 dict begin
/genes exch def %the GENE argument (an array)
/i 0 def % loop iterator
/j 0 def % row iterator


% draw 11 vertical lines (10 intervals) with their genomic positions
0 setgray
0 1 10 {
	/i exch def % use the loop counter pushed by 'for'
	%draw a vertical line
	screenWidth 10 div i mul 0 moveto
	screenWidth 10 div i mul screenHeight lineto
	stroke
	% print the position at the top, rotated by 90°
	screenWidth 10 div i mul 10 add screenHeight 5 sub moveto
	-90 rotate
	maxChromEnd minChromStart sub i 10 div mul minChromStart add cvi toString show
	90 rotate
	} for

0 /i exch def
genes length {
	genes i get isVisible
		{
		gsave
		0 j  featureHeight 2 add mul translate
		genes i get paintGene
		j 1 add /j exch def
		grestore
		} if
	i 1 add /i exch def
	}repeat
end
} bind def
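The layout arithmetic of paintGenes, sketched in C: the axis labels, the position of each vertical line, and the vertical offset of each drawn row. axisLabel, axisX and rowOffset are hypothetical names; integer arithmetic is used for the labels, whereas the PS code computes them with reals.

```c
/* Hypothetical helper: genomic position written next to vertical line k
 * (k = 0..10), i.e. minChromStart + k/10 * (maxChromEnd - minChromStart).
 * long long avoids overflow of (maxChromEnd - minChromStart) * k. */
static long long axisLabel(int k, long long minChromStart, long long maxChromEnd) {
    return minChromStart + (maxChromEnd - minChromStart) * k / 10;
}

/* Hypothetical helper: x coordinate of vertical line k on the page. */
static double axisX(int k, double screenWidth) {
    return screenWidth / 10.0 * k;
}

/* Hypothetical helper: vertical offset applied by 'translate' to the j-th
 * drawn gene; only visible genes increment j, so no row is left empty. */
static double rowOffset(int j, double featureHeight) {
    return j * (featureHeight + 2.0);
}
```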

All in one: the postscript code


Open the PS file in ghostview, evince, ...

Zooming? Yes we can.

Ghostscript, the interpreter behind ghostview, has an option -sname=string (from its man page):
       -Sname=string
       -sname=string
              Define  a  name  in  "systemdict"  with a given string as value.
              This is different from -d.

In my PostScript file, the default values of minChromStart and maxChromEnd are overridden by the user's parameters. As -s defines strings, they are converted with toInteger:

systemdict /userChromStart known {
	userChromStart toInteger /minChromStart  exch def
	} if

systemdict /userChromEnd known {
	userChromEnd toInteger /maxChromEnd  exch def
	} if
That's it,

Pierre

23 September 2011

Joining genomic annotation files with the tabix API.

Tabix is a tool that is part of the samtools package. Once a file has been indexed, tabix can quickly retrieve the data lines overlapping a genomic region (see also my previous post about tabix). Here, I wrote a tool named jointabix that joins the data of a (chrom/start/end) file with a file indexed with tabix. I've posted the code on github at: https://github.com/lindenb/samtools-utilities/blob/master/src/jointabix.c.

Usage


$ jointabix  -h

Usage: jointabix (options) {stdin|file|gzfiles}:

  -d   column delimiter. default: TAB
  -c   chromosome column (1).
  -s   start column (2).
  -e   end column (2).
  -i   ignore lines starting with ('#').
  -t   tabix file (required).
  +1 add 1 to the genomic coordinates.
  -1 subtract 1 from the genomic coordinates.
 

Example:

In the following example, I'm going to join the SNPs from the 1000 Genomes Project with the UCSC "cytoBand" table.

##download, compress (bgzip) and index the UCSC cytobands:
$ wget -O cytoBand.txt.gz "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz"
$ gunzip cytoBand.txt.gz
$ bgzip cytoBand.txt
$ tabix -p bed cytoBand.txt.gz


$ curl -s  "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20100804/ALL.2of4intersection.20100804.sites.vcf.gz" |\
   gunzip -c |\
   sed 's/^\([^#]\)/chr\1/' |\
   cut -f 1-5 |\
   jointabix -c 1 -s 2 -e 2 -1 -f cytoBand.txt.gz |\
   grep -v "##"
 
#CHROM	POS	ID	REF	ALT
chr1	10327	rs112750067	T	C	chr1	0	2300000	p36.33	gneg
chr1	10469	rs117577454	C	G	chr1	0	2300000	p36.33	gneg
chr1	10492	rs55998931	C	T	chr1	0	2300000	p36.33	gneg
chr1	10583	rs58108140	G	A	chr1	0	2300000	p36.33	gneg
chr1	11508	.	A	G	chr1	0	2300000	p36.33	gneg
chr1	11565	.	G	T	chr1	0	2300000	p36.33	gneg
chr1	12783	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	13116	.	T	G	chr1	0	2300000	p36.33	gneg
chr1	13327	.	G	C	chr1	0	2300000	p36.33	gneg
chr1	13980	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	14699	.	C	G	chr1	0	2300000	p36.33	gneg
chr1	14930	.	A	G	chr1	0	2300000	p36.33	gneg
chr1	14933	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	14948	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	15118	.	A	G	chr1	0	2300000	p36.33	gneg
chr1	15211	.	T	G	chr1	0	2300000	p36.33	gneg
chr1	15274	.	A	T	chr1	0	2300000	p36.33	gneg
chr1	15820	.	G	T	chr1	0	2300000	p36.33	gneg
chr1	16206	.	T	A	chr1	0	2300000	p36.33	gneg
chr1	16257	.	G	C	chr1	0	2300000	p36.33	gneg
chr1	16280	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	16298	.	C	T	chr1	0	2300000	p36.33	gneg
chr1	16378	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	16495	.	G	C	chr1	0	2300000	p36.33	gneg
chr1	16534	.	C	T	chr1	0	2300000	p36.33	gneg
chr1	16841	.	G	T	chr1	0	2300000	p36.33	gneg
chr1	28376	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	28563	.	A	G	chr1	0	2300000	p36.33	gneg
chr1	30860	.	G	C	chr1	0	2300000	p36.33	gneg
chr1	30885	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	30923	.	G	T	chr1	0	2300000	p36.33	gneg
chr1	31295	.	A	C	chr1	0	2300000	p36.33	gneg
chr1	31467	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	31487	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	40261	.	C	A	chr1	0	2300000	p36.33	gneg
chr1	46633	.	T	A	chr1	0	2300000	p36.33	gneg
chr1	48183	.	C	A	chr1	0	2300000	p36.33	gneg
chr1	48186	.	T	G	chr1	0	2300000	p36.33	gneg
chr1	49272	.	G	A	chr1	0	2300000	p36.33	gneg
chr1	49298	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	49554	.	A	G	chr1	0	2300000	p36.33	gneg
chr1	51479	rs116400033	T	A	chr1	0	2300000	p36.33	gneg
chr1	51673	.	T	C	chr1	0	2300000	p36.33	gneg
chr1	51803	rs62637812	T	C	chr1	0	2300000	p36.33	gneg
chr1	51898	rs76402894	C	A	chr1	0	2300000	p36.33	gneg
chr1	52058	rs62637813	G	C	chr1	0	2300000	p36.33	gneg
chr1	52238	.	T	G	chr1	0	2300000	p36.33	gneg
chr1	52727	.	C	G	chr1	0	2300000	p36.33	gneg
chr1	54353	.	C	A	chr1	0	2300000	p36.33	gneg
(...)
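The join itself boils down to: for each input record, find the indexed intervals overlapping its position and print both lines side by side. The C sketch below is not the code of jointabix: to stay self-contained it replaces the tabix index with a linear scan over an in-memory array, and it treats the shifted query [start, end] as a closed interval tested against 0-based, half-open UCSC intervals. Interval and joinRecord are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one line of the tabix-indexed file. */
typedef struct {
    const char *chrom;
    long start, end;     /* 0-based, half-open: [start, end) */
    const char *line;    /* the whole tab-separated record */
} Interval;

/* Hypothetical helper: prints `record` followed by every interval of `db`
 * overlapping chrom:[start+shift, end+shift]; `shift` plays the role of the
 * +1/-1 options. A linear scan replaces the tabix query. Returns the number
 * of joined lines printed. */
static int joinRecord(FILE *out, const char *record, const char *chrom,
                      long start, long end, int shift,
                      const Interval *db, int n) {
    int hits = 0;
    start += shift;
    end += shift;
    for (int i = 0; i < n; i++) {
        if (strcmp(db[i].chrom, chrom) != 0) continue;   /* other chromosome */
        if (start < db[i].end && end >= db[i].start) {  /* overlap */
            fprintf(out, "%s\t%s\n", record, db[i].line);
            hits++;
        }
    }
    return hits;
}
```

With -1, the VCF position 10327 becomes the 0-based coordinate 10326, which falls in chr1:[0, 2300000) (band p36.33), as in the first line of the output above.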

That's it,

Pierre

11 September 2011

The Wikipedia Template:Infobox_biodatabase is now integrated in DBPedia

In January 2011, I started the project Template:Infobox_biodatabase. Its goal is to annotate the biological databases in Wikipedia with an infobox. The pages annotated with this template have now been integrated into DBpedia 3.7, and the data can be queried through a SPARQL endpoint.
(Note: while I was writing the new pages in Wikipedia, a few articles were proposed for deletion for notability reasons; I didn't contest the choice of the Wikipedia editors.)

Articles in category: "Biological database"

SPARQL

List the biological databases.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT   ?title ?uri WHERE {
  ?uri a dbpedia:BiologicalDatabase .
  OPTIONAL {
  	?uri dbpedia:title ?title.
  	}
} ORDER BY ?uri

Result:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| title                                                                      | uri                                                                                                                     |
========================================================================================================================================================================================================
| "3did"@en                                                                  | <http://dbpedia.org/resource/3did>                                                                                      |
| "ABCdb"@en                                                                 | <http://dbpedia.org/resource/ABCdb>                                                                                     |
| "AREsite"@en                                                               | <http://dbpedia.org/resource/AREsite>                                                                                   |
| "AlloSteric Database"@en                                                   | <http://dbpedia.org/resource/ASD_%28database%29>                                                                        |
| "AgBase"@en                                                                | <http://dbpedia.org/resource/AgBase>                                                                                    |
| "Allele frequency net"@en                                                  | <http://dbpedia.org/resource/Allele_frequency_net_database>                                                             |
| "ASTD"@en                                                                  | <http://dbpedia.org/resource/Alternative_splicing_and_transcript_diversity_database>                                    |
| "ASAP"@en                                                                  | <http://dbpedia.org/resource/Alternative_splicing_annotation_project>                                                   |
| "AmoebaDB"@en                                                              | <http://dbpedia.org/resource/AmoebaDB>                                                                                  |
| "ArachnoServer"@en                                                         | <http://dbpedia.org/resource/ArachnoServer>                                                                             |
| "ArtadeDB"@en                                                              | <http://dbpedia.org/resource/Artade>                                                                                    |
| "ASPicDB"@en                                                               | <http://dbpedia.org/resource/AspicDB>                                                                                   |
| "The Autophagy Database"@en                                                | <http://dbpedia.org/resource/Autophagy_database>                                                                        |
| "BGMUT"@en                                                                 | <http://dbpedia.org/resource/BGMUT>                                                                                     |
| "BISC"@en                                                                  | <http://dbpedia.org/resource/BISC_%28database%29>                                                                       |
| "BRENDA"@en                                                                | <http://dbpedia.org/resource/BRENDA>                                                                                    |
| "The BRENDA Tissue Ontology (BTO)"@en                                      | <http://dbpedia.org/resource/BRENDA_tissue_ontology>                                                                    |
|                                                                            | <http://dbpedia.org/resource/BindingDB>                                                                                 |
| "Bio2RDF"@en                                                               | <http://dbpedia.org/resource/Bio2RDF>                                                                                   |
| "BioGRID"@en                                                               | <http://dbpedia.org/resource/BioGRID>                                                                                   |
| "BioModels Database"@en                                                    | <http://dbpedia.org/resource/BioModels_Database>                                                                        |
| "BSDB"@en                                                                  | <http://dbpedia.org/resource/Biomolecule_stretching_database>                                                           |
| "Bovine Genome Database"@en                                                | <http://dbpedia.org/resource/Bovine_genome_database>                                                                    |
| "BriX"@en                                                                  | <http://dbpedia.org/resource/Brix_%28database%29>                                                                       |
| "CADgene"@en                                                               | <http://dbpedia.org/resource/CADgene>                                                                                   |
| "CATH"@en                                                                  | <http://dbpedia.org/resource/CATH>                                                                                      |
| "CLIPZ:"@en                                                                | <http://dbpedia.org/resource/CLIPZ>                                                                                     |
| "COSMIC"@en                                                                | <http://dbpedia.org/resource/COSMIC_cancer_database>                                                                    |
| "CaSNP"@en                                                                 | <http://dbpedia.org/resource/CaSNP>                                                                                     |
| "CancerResource:"@en                                                       | <http://dbpedia.org/resource/CancerResource>                                                                            |
| "cBARBEL"@en                                                               | <http://dbpedia.org/resource/Catfish_genome_database>                                                                   |
| "CCDB"@en                                                                  | <http://dbpedia.org/resource/Cervical_cancer_gene_database>                                                             |
|                                                                            | <http://dbpedia.org/resource/ChEBI>                                                                                     |
|                                                                            | <http://dbpedia.org/resource/ChEMBL>                                                                                    |
| "ChemProt"@en                                                              | <http://dbpedia.org/resource/ChemProt>                                                                                  |
| "ChimerDB"@en                                                              | <http://dbpedia.org/resource/ChimerDB>                                                                                  |
| "MDS_IES_DB"@en                                                            | <http://dbpedia.org/resource/Ciliate_MDS/IES_database>                                                                  |
| "Ciona intestinalis protein database"@en                                   | <http://dbpedia.org/resource/Ciona_intestinalis_protein_database>                                                       |
| "ACLAME"@en                                                                | <http://dbpedia.org/resource/Classification_of_mobile_genetic_elements>                                                 |
| "COMBREX: COMputational BRidges to EXperiments"@en                         | <http://dbpedia.org/resource/Combrex>                                                                                   |
| "CAMERA"@en                                                                | <http://dbpedia.org/resource/Community_Cyberinfrastructure_for_Advanced_Marine_Microbial_Ecology_Research_and_Analysis> |
| "CORG"@en                                                                  | <http://dbpedia.org/resource/Comparative_regulatory_genomics_database>                                                  |
| "CPLA"@en                                                                  | <http://dbpedia.org/resource/Compendium_of_protein_lysine_acetylation>                                                  |
| "Conformational dynamics data bank"@en                                     | <http://dbpedia.org/resource/Conformational_dynamics_data_bank>                                                         |
| "ConsensusPathDB"@en                                                       | <http://dbpedia.org/resource/ConsensusPathDB>                                                                           |
| "CDD"@en                                                                   | <http://dbpedia.org/resource/Conserved_domain_database>                                                                 |
| "DAnCER"@en                                                                | <http://dbpedia.org/resource/DAnCER_%28database%29>                                                                     |
| "DBASS3 and DBASS5"@en                                                     | <http://dbpedia.org/resource/DBASS3/5>                                                                                  |
| "DIMA"@en                                                                  | <http://dbpedia.org/resource/DIMA_%28database%29>                                                                       |
| "DNA Data Bank of Japan"@en                                                | <http://dbpedia.org/resource/DNA_Data_Bank_of_Japan>                                                                    |
| "PCDB"@en                                                                  | <http://dbpedia.org/resource/Database_of_protein_conformational_diversity>                                              |
| "dbCRID"@en                                                                | <http://dbpedia.org/resource/DbCRID>                                                                                    |
| "dbDNV"@en                                                                 | <http://dbpedia.org/resource/DbDNV>                                                                                     |
| "dbSNP"@en                                                                 | <http://dbpedia.org/resource/DbSNP>                                                                                     |
| "DiProDB: a database for dinucleotide properties."@en                      | <http://dbpedia.org/resource/DiProDB>                                                                                   |
| "dictyBase"@en                                                             | <http://dbpedia.org/resource/DictyBase>                                                                                 |
| "DOMINE"@en                                                                | <http://dbpedia.org/resource/Domine_Database>                                                                           |
| "DroID"@en                                                                 | <http://dbpedia.org/resource/Droid_%28database%29>                                                                      |
| "ECRbase"@en                                                               | <http://dbpedia.org/resource/ECRbase>                                                                                   |
| "ECgene"@en                                                                | <http://dbpedia.org/resource/ECgene>                                                                                    |
| "EDAS."@en                                                                 | <http://dbpedia.org/resource/EDAS>                                                                                      |
| "EMAGE"@en                                                                 | <http://dbpedia.org/resource/EMAGE>                                                                                     |
| "EMDataBank.org"@en                                                        | <http://dbpedia.org/resource/EM_Data_Bank>                                                                              |
| "ENCODE"@en                                                                | <http://dbpedia.org/resource/ENCODE>                                                                                    |
| "EcoCyc"@en                                                                | <http://dbpedia.org/resource/EcoCyc>                                                                                    |
| "Effective-"@en                                                            | <http://dbpedia.org/resource/Effective_%28database%29>                                                                  |
| "The Ensembl genome database project."@en                                  | <http://dbpedia.org/resource/Ensembl>                                                                                   |
| "EID"@en                                                                   | <http://dbpedia.org/resource/Exon-intron_database>                                                                      |
| "ExtraTrain"@en                                                            | <http://dbpedia.org/resource/ExtraTrain>                                                                                |
| "FANTOM"@en                                                                | <http://dbpedia.org/resource/FANTOM>                                                                                    |
| "FINDbase"@en                                                              | <http://dbpedia.org/resource/FINDbase>                                                                                  |
| "FREP"@en                                                                  | <http://dbpedia.org/resource/FREP>                                                                                      |
| "FishBase"@en                                                              | <http://dbpedia.org/resource/FishBase>                                                                                  |
| "FlyFactorSurvey"@en                                                       | <http://dbpedia.org/resource/FlyFactorSurvey>                                                                           |
| "Full-parasites"@en                                                        | <http://dbpedia.org/resource/Full-parasites>                                                                            |
| "FESD"@en                                                                  | <http://dbpedia.org/resource/Functional_element_SNPs_database>                                                          |
| "FGDB"@en                                                                  | <http://dbpedia.org/resource/Fusarium_graminearum_genome_database>                                                      |
| "GISSD"@en                                                                 | <http://dbpedia.org/resource/GISSD>                                                                                     |
| "GPnotebook"@en                                                            | <http://dbpedia.org/resource/GPnotebook>                                                                                |
| "GPCRDB"@en                                                                | <http://dbpedia.org/resource/G_protein-coupled_receptors_database>                                                      |
| "GenBank"@en                                                               | <http://dbpedia.org/resource/GenBank>                                                                                   |
| "Genetic codes"@en                                                         | <http://dbpedia.org/resource/Genetic_codes_%28database%29>                                                              |
| "GlycomeDB"@en                                                             | <http://dbpedia.org/resource/GlycomeDB>                                                                                 |
| "GyDB of mobile genetic elements:"@en                                      | <http://dbpedia.org/resource/Gypsy_%28database%29>                                                                      |
| "The H-Invitational"@en                                                    | <http://dbpedia.org/resource/H-Invitational>                                                                            |
| "HGNC"@en                                                                  | <http://dbpedia.org/resource/HUGO_Gene_Nomenclature_Committee>                                                          |
| "HitPredict"@en                                                            | <http://dbpedia.org/resource/HitPredict>                                                                                |
| "HOLLYWOOD"@en                                                             | <http://dbpedia.org/resource/Hollywood_%28database%29>                                                                  |
| "HUMHOT"@en                                                                | <http://dbpedia.org/resource/HumHot>                                                                                    |
| "H-DBAS"@en                                                                | <http://dbpedia.org/resource/Human-transcriptome_database_for_alternative_splicing>                                     |
| "Hymenoptera Genome Database"@en                                           | <http://dbpedia.org/resource/Hymenoptera_genome_database>                                                               |
| "IGRhCellID"@en                                                            | <http://dbpedia.org/resource/IGRhCellID>                                                                                |
| "IUPHAR-DB."@en                                                            | <http://dbpedia.org/resource/IUPHAR_%28database%29>                                                                     |
| "InSatDb"@en                                                               | <http://dbpedia.org/resource/InSatDb>                                                                                   |
|                                                                            | <http://dbpedia.org/resource/Indian_Genetic_Disease_Database_%28IGDD%29>                                                |
| "InterPro"@en                                                              | <http://dbpedia.org/resource/InterPro>                                                                                  |
| "INTERFEROME"@en                                                           | <http://dbpedia.org/resource/Interferome>                                                                               |
| "IKMC: International Knockout Mouse Consortium"@en                         | <http://dbpedia.org/resource/International_Knockout_Mouse_Consortium>                                                   |
| "Intronerator"@en                                                          | <http://dbpedia.org/resource/Intronerator>                                                                              |
| "ISfinder"@en                                                              | <http://dbpedia.org/resource/Isfinder>                                                                                  |
| "Islander"@en                                                              | <http://dbpedia.org/resource/Islander_%28database%29>                                                                   |
| "IsoBase"@en                                                               | <http://dbpedia.org/resource/IsoBase>                                                                                   |
| "KEGG"@en                                                                  | <http://dbpedia.org/resource/KEGG>                                                                                      |
| "KUPS"@en                                                                  | <http://dbpedia.org/resource/KUPS_%28database%29>                                                                       |
| "KaPPA-View4"@en                                                           | <http://dbpedia.org/resource/KaPPA-View4>                                                                               |
| "L1Base"@en                                                                | <http://dbpedia.org/resource/L1Base>                                                                                    |
| "Laminin database"@en                                                      | <http://dbpedia.org/resource/Laminin_database>                                                                          |
| "LarvalBase"@en                                                            | <http://dbpedia.org/resource/LarvalBase>                                                                                |
| "lncRNAdb"@en                                                              | <http://dbpedia.org/resource/LncRNAdb>                                                                                  |
| "LocDB"@en                                                                 | <http://dbpedia.org/resource/LocDB>                                                                                     |
| "mESAdb"@en                                                                | <http://dbpedia.org/resource/MESAdb>                                                                                    |
| "MICdb"@en                                                                 | <http://dbpedia.org/resource/MICdb>                                                                                     |
| "MPromDb"@en                                                               | <http://dbpedia.org/resource/Mammalian_promoter_database>                                                               |
| "MatrixDB, the extracellular matrix interaction database."@en              | <http://dbpedia.org/resource/MatrixDB>                                                                                  |
| "MetaCyc"@en                                                               | <http://dbpedia.org/resource/MetaCyc>                                                                                   |
| "MethDB-"@en                                                               | <http://dbpedia.org/resource/MethDB>                                                                                    |
| "miRBase"@en                                                               | <http://dbpedia.org/resource/MiRBase>                                                                                   |
| "miRGator"@en                                                              | <http://dbpedia.org/resource/MiRGator>                                                                                  |
| "miRTarBase"@en                                                            | <http://dbpedia.org/resource/MiRTarBase>                                                                                |
| "ModBase"@en                                                               | <http://dbpedia.org/resource/ModBase>                                                                                   |
| "The Mouse Genome Database"@en                                             | <http://dbpedia.org/resource/Mouse_Genome_Database>                                                                     |
| "The mouse Gene Expression Database"@en                                    | <http://dbpedia.org/resource/Mouse_gene_expression_database>                                                            |
| "MIPS"@en                                                                  | <http://dbpedia.org/resource/Munich_Information_Center_for_Protein_Sequences>                                           |
| "NCBI Epigenomics"@en                                                      | <http://dbpedia.org/resource/NCBI_Epigenomics>                                                                          |
| "PID"@en                                                                   | <http://dbpedia.org/resource/NCI-Nature_Pathway_Interaction_Database>                                                   |
| "NGSmethDB"@en                                                             | <http://dbpedia.org/resource/NGSmethDB>                                                                                 |
| "neXtProt"@en                                                              | <http://dbpedia.org/resource/NeXtProt>                                                                                  |
| "NetPath"@en                                                               | <http://dbpedia.org/resource/Netpath>                                                                                   |
| "NeuroLex"@en                                                              | <http://dbpedia.org/resource/NeuroLex>                                                                                  |
| "Non-B DB"@en                                                              | <http://dbpedia.org/resource/Non-B_database>                                                                            |
| "NPRD"@en                                                                  | <http://dbpedia.org/resource/Nucleosome_positioning_region_database>                                                    |
| "OMPdb"@en                                                                 | <http://dbpedia.org/resource/OMPdb>                                                                                     |
| "TOPSAN"@en                                                                | <http://dbpedia.org/resource/Open_protein_structure_annotation_network>                                                 |
| "ODB"@en                                                                   | <http://dbpedia.org/resource/Operon_database>                                                                           |
| "OriDB"@en                                                                 | <http://dbpedia.org/resource/OriDB>                                                                                     |
| "Orientations of Proteins in Membranes"@en                                 | <http://dbpedia.org/resource/Orientations_of_Proteins_in_Membranes_database>                                            |
| "OrthoDB"@en                                                               | <http://dbpedia.org/resource/OrthoDB>                                                                                   |
| "OMA"@en                                                                   | <http://dbpedia.org/resource/Orthologous_MAtrix>                                                                        |
| "P2CS"@en                                                                  | <http://dbpedia.org/resource/P2CS>                                                                                      |
| "PANDIT"@en                                                                | <http://dbpedia.org/resource/PANDIT_%28database%29>                                                                     |
| "PCRPi-DB"@en                                                              | <http://dbpedia.org/resource/PCRPi-DB>                                                                                  |
| "PDBSum"@en                                                                | <http://dbpedia.org/resource/PDBsum>                                                                                    |
| "PROSITE"@en                                                               | <http://dbpedia.org/resource/PROSITE>                                                                                   |
| "PSORTdb"@en                                                               | <http://dbpedia.org/resource/PSORTdb>                                                                                   |
| "ParameciumDB"@en                                                          | <http://dbpedia.org/resource/ParameciumDB>                                                                              |
| "Pathway Commons"@en                                                       | <http://dbpedia.org/resource/Pathway_commons>                                                                           |
| "Patome"@en                                                                | <http://dbpedia.org/resource/Patome>                                                                                    |
| "PREX"@en                                                                  | <http://dbpedia.org/resource/Peroxiredoxin_classification_index>                                                        |
| "Pfam"@en                                                                  | <http://dbpedia.org/resource/Pfam>                                                                                      |
| "PhEVER"@en                                                                | <http://dbpedia.org/resource/PhEVER>                                                                                    |
| "PHOSIDA"@en                                                               | <http://dbpedia.org/resource/Phosida>                                                                                   |
| "Phospho.ELM"@en                                                           | <http://dbpedia.org/resource/Phospho.ELM>                                                                               |
| "Phospho3D"@en                                                             | <http://dbpedia.org/resource/Phospho3D>                                                                                 |
| "PhylomeDB"@en                                                             | <http://dbpedia.org/resource/PhylomeDB>                                                                                 |
| "PlasmoDB"@en                                                              | <http://dbpedia.org/resource/PlasmoDB>                                                                                  |
| "PmiRKB"@en                                                                | <http://dbpedia.org/resource/PmiRKB>                                                                                    |
| "PolyQ"@en                                                                 | <http://dbpedia.org/resource/PolyQ_%28database%29>                                                                      |
| "PolymiRTS"@en                                                             | <http://dbpedia.org/resource/PolymiRTS>                                                                                 |
| "PSSRdb"@en                                                                | <http://dbpedia.org/resource/Polymorphic_simple_sequence_repeats_database>                                              |
| "ProSAS"@en                                                                | <http://dbpedia.org/resource/ProSAS>                                                                                    |
| "ProtCID"@en                                                               | <http://dbpedia.org/resource/ProtCID>                                                                                   |
| "PRIDB"@en                                                                 | <http://dbpedia.org/resource/Protein-RNA_interface_database>                                                            |
| "The Protein Data Bank."@en                                                | <http://dbpedia.org/resource/Protein_Data_Bank>                                                                         |
| "PCDDB"@en                                                                 | <http://dbpedia.org/resource/Protein_circular_dichroism_data_bank>                                                      |
| "Pseudogene.org"@en                                                        | <http://dbpedia.org/resource/Pseudogene_%28database%29>                                                                 |
| "Pseudomonas Genome Database"@en                                           | <http://dbpedia.org/resource/Pseudomonas_genome_database>                                                               |
| "PubChem"@en                                                               | <http://dbpedia.org/resource/PubChem>                                                                                   |
| "PubMed"@en                                                                | <http://dbpedia.org/resource/PubMed>                                                                                    |
| "REDfly"@en                                                                | <http://dbpedia.org/resource/REDfly>                                                                                    |
| "REPAIRtoire"@en                                                           | <http://dbpedia.org/resource/REPAIRtoire>                                                                               |
| "RIKEN integrated database of mammals."@en                                 | <http://dbpedia.org/resource/RIKEN_integrated_database_of_mammals>                                                      |
| "RBPDB"@en                                                                 | <http://dbpedia.org/resource/RNA-binding_protein_database>                                                              |
| "RNA helicase database."@en                                                | <http://dbpedia.org/resource/RNA_helicase_database>                                                                     |
| "RNAMDB"@en                                                                | <http://dbpedia.org/resource/RNA_modification_database>                                                                 |
| "Reactome: a database of reactions, pathways and biological processes."@en | <http://dbpedia.org/resource/Reactome>                                                                                  |
| "REBASE"@en                                                                | <http://dbpedia.org/resource/Rebase_%28database%29>                                                                     |
| "RECODE"@en                                                                | <http://dbpedia.org/resource/Recode_%28database%29>                                                                     |
| "Refseq"@en                                                                | <http://dbpedia.org/resource/RefSeq>                                                                                    |
| "RegPhos"@en                                                               | <http://dbpedia.org/resource/RegPhos>                                                                                   |
| "RegTransBase"@en                                                          | <http://dbpedia.org/resource/RegTransBase>                                                                              |
| "RegulonDB"@en                                                             | <http://dbpedia.org/resource/RegulonDB>                                                                                 |
| "RepTar"@en                                                                | <http://dbpedia.org/resource/RepTar_%28database%29>                                                                     |
| "RetrOryza"@en                                                             | <http://dbpedia.org/resource/RetrOryza>                                                                                 |
| "Rfam"@en                                                                  | <http://dbpedia.org/resource/Rfam>                                                                                      |
| "S/MARt DB"@en                                                             | <http://dbpedia.org/resource/S/MARt>                                                                                    |
| "STRING"@en                                                                | <http://dbpedia.org/resource/STRING>                                                                                    |
| "SUPERFAMILY"@en                                                           | <http://dbpedia.org/resource/SUPERFAMILY>                                                                               |
| "SeaLifeBase"@en                                                           | <http://dbpedia.org/resource/SeaLifeBase>                                                                               |
| "SMART"@en                                                                 | <http://dbpedia.org/resource/Simple_Modular_Architecture_Research_Tool>                                                 |
| "SNPSTR"@en                                                                | <http://dbpedia.org/resource/Snptstr_%28database%29>                                                                    |
| "SPIKE"@en                                                                 | <http://dbpedia.org/resource/Spike_%28database%29>                                                                      |
| "SpliceInfo"@en                                                            | <http://dbpedia.org/resource/SpliceInfo>                                                                                |
| "StarBase"@en                                                              | <http://dbpedia.org/resource/StarBase_%28database%29>                                                                   |
| "SCLD"@en                                                                  | <http://dbpedia.org/resource/Stem_cell_lineage_database>                                                                |
| "STRBase"@en                                                               | <http://dbpedia.org/resource/Strbase>                                                                                   |
| "SAHG"@en                                                                  | <http://dbpedia.org/resource/Structure_atlas_of_human_genome>                                                           |
| "SuperSweet"@en                                                            | <http://dbpedia.org/resource/SuperSweet>                                                                                |
| "SGDB"@en                                                                  | <http://dbpedia.org/resource/Synthetic_gene_database>                                                                   |
| "TIARA"@en                                                                 | <http://dbpedia.org/resource/TIARA_%28database%29>                                                                      |
| "The TIGR Plant Repeat Databases"@en                                       | <http://dbpedia.org/resource/TIGR_plant_repeat_database>                                                                |
| "TIGR Plant Transcript Assemblies database."@en                            | <http://dbpedia.org/resource/TIGR_plant_transcript_assembly_database>                                                   |
| "TMPad"@en                                                                 | <http://dbpedia.org/resource/TMPad>                                                                                     |
| "tRNADB"@en                                                                | <http://dbpedia.org/resource/TRNADB>                                                                                    |
| "TRDB-"@en                                                                 | <http://dbpedia.org/resource/Tandem_repeats_database>                                                                   |
| "TassDB"@en                                                                | <http://dbpedia.org/resource/TassDB>                                                                                    |
| "TcoF-DB"@en                                                               | <http://dbpedia.org/resource/TcoF-DB>                                                                                   |
| "ThYme"@en                                                                 | <http://dbpedia.org/resource/ThYme_%28database%29>                                                                      |
| "TADB"@en                                                                  | <http://dbpedia.org/resource/Toxin-antitoxin_database>                                                                  |
| "TRIP"@en                                                                  | <http://dbpedia.org/resource/Transient_receptor_potential_channel-interacting_protein_database>                         |
| "TranspoGene and microTranspoGene"@en                                      | <http://dbpedia.org/resource/Transpogene>                                                                               |
| "TreeFam"@en                                                               | <http://dbpedia.org/resource/TreeFam>                                                                                   |
| "U12DB"@en                                                                 | <http://dbpedia.org/resource/U12_intron_database>                                                                       |
| "The UCSC Genome Browser"@en                                               | <http://dbpedia.org/resource/UCSC_Genome_Browser>                                                                       |
| "UCbase & miRfunc"@en                                                      | <http://dbpedia.org/resource/UCbase>                                                                                    |
| "UKPMC"@en                                                                 | <http://dbpedia.org/resource/UK_PubMed_Central>                                                                         |
| "UTRdb and UTRsite"@en                                                     | <http://dbpedia.org/resource/UTRdb>                                                                                     |
| "UTRome"@en                                                                | <http://dbpedia.org/resource/UTRome>                                                                                    |
| "UgMicroSatdb"@en                                                          | <http://dbpedia.org/resource/UgMicroSatdb>                                                                              |
| "UniGene"@en                                                               | <http://dbpedia.org/resource/UniGene>                                                                                   |
| "UniPROBE"@en                                                              | <http://dbpedia.org/resource/UniPROBE>                                                                                  |
| "UniProt"@en                                                               | <http://dbpedia.org/resource/UniProt>                                                                                   |
| "UniVec"@en                                                                | <http://dbpedia.org/resource/Univec>                                                                                    |
| "VISTA Enhancer Browser"@en                                                | <http://dbpedia.org/resource/VISTA_%28comparative_genomics%29>                                                          |
| "VnD"@en                                                                   | <http://dbpedia.org/resource/Variations_and_drugs_database>                                                             |
| "VectorDB"@en                                                              | <http://dbpedia.org/resource/VectorDB>                                                                                  |
| "ViralZon"@en                                                              | <http://dbpedia.org/resource/ViralZone>                                                                                 |
| "VKCDB"@en                                                                 | <http://dbpedia.org/resource/Voltage-gated_potassium_channel_database>                                                  |
| "WebGeSTer DB"@en                                                          | <http://dbpedia.org/resource/WebGeSTer>                                                                                 |
| "WormBase"@en                                                              | <http://dbpedia.org/resource/WormBase>                                                                                  |
| "YPA"@en                                                                   | <http://dbpedia.org/resource/Yeast_promoter_atlas>                                                                      |
| "YEASTRACT"@en                                                             | <http://dbpedia.org/resource/Yeastract>                                                                                 |
|                                                                            | <http://dbpedia.org/resource/ZINC_database>                                                                             |
| "ZFIN"@en                                                                  | <http://dbpedia.org/resource/Zebrafish_Information_Network>                                                             |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
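Two rows in the table above have an empty title cell (the Indian Genetic Disease Database and ZINC). That is how an unbound variable shows up, presumably because the title is requested inside an OPTIONAL pattern, as in the queries below: when the optional pattern does not match, the solution simply carries no value for that variable. If the results are fetched in the W3C SPARQL 1.1 JSON results format instead of as a text table, an unbound variable is absent from the binding object. Here is a minimal Python sketch, assuming the JSON document has already been decoded into a dict. `flatten_bindings` is a hypothetical helper, and `SAMPLE` is hand-written from the KEGG row below and the ZINC row above; it is not a captured server response.

```python
def flatten_bindings(result):
    """Turn a SPARQL JSON result document into one tuple per solution.

    Variables left unbound (e.g. by an OPTIONAL pattern that did not match)
    are missing from the binding object, so they become None here.
    """
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append(tuple(
            binding[var]["value"] if var in binding else None
            for var in variables
        ))
    return rows


# Hand-written sample in the SPARQL 1.1 JSON results layout (illustrative only).
SAMPLE = {
    "head": {"vars": ["title", "description", "uri"]},
    "results": {"bindings": [
        {
            "title": {"type": "literal", "xml:lang": "en", "value": "KEGG"},
            "description": {"type": "literal", "xml:lang": "en",
                            "value": "The KEGG resource for deciphering the genome."},
            "uri": {"type": "uri", "value": "http://dbpedia.org/resource/KEGG"},
        },
        {
            # Only ?uri is bound: no title, no description.
            "uri": {"type": "uri", "value": "http://dbpedia.org/resource/ZINC_database"},
        },
    ]},
}

if __name__ == "__main__":
    for row in flatten_bindings(SAMPLE):
        print(row)
```

Mapping unbound variables to `None` keeps every tuple the same width, so the rows line up with the column order given in `head.vars`.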

List the biological databases in the category "Systems Biology"

SPARQL

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT   ?title ?description ?uri WHERE {
  ?uri a dbpedia:BiologicalDatabase .
  ?uri <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Systems_biology> .
  OPTIONAL {
  	?uri <http://dbpedia.org/property/title> ?title.
  	}
  OPTIONAL {
  	?uri <http://dbpedia.org/property/description> ?description .
  	}
} ORDER BY ?title
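The same query can also be sent from a script. Below is a minimal sketch using only the Python standard library. The endpoint URL `http://dbpedia.org/sparql` is an assumption (this post does not say which endpoint was used), and `build_request_url` and `run_query` are hypothetical helper names. The query is passed in the `query` parameter of a GET request, as described by the SPARQL 1.1 Protocol, and the `Accept` header asks for the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint (not stated in the original post).
ENDPOINT = "http://dbpedia.org/sparql"

SYSTEMS_BIOLOGY_QUERY = """
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?title ?description ?uri WHERE {
  ?uri a dbpedia:BiologicalDatabase .
  ?uri <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Systems_biology> .
  OPTIONAL { ?uri <http://dbpedia.org/property/title> ?title . }
  OPTIONAL { ?uri <http://dbpedia.org/property/description> ?description . }
} ORDER BY ?title
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and decode the JSON result document."""
    request = urllib.request.Request(
        build_request_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `run_query(SYSTEMS_BIOLOGY_QUERY)` needs network access and depends on the endpoint being up; `build_request_url` alone is enough to inspect the encoded request.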

Results:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| title                                       | description                                                                                                  | uri                                                                   |
======================================================================================================================================================================================================================================
| "BISC"@en                                   | "Protein–protein interaction database linking structural biology with functional genomics"@en                | <http://dbpedia.org/resource/BISC_%28database%29>                     |
| "BioGRID"@en                                | "interaction data."@en                                                                                       | <http://dbpedia.org/resource/BioGRID>                                 |
| "BioModels Database"@en                     | "A database for storing, exchanging and retrieving published quantitative models of biological interest."@en | <http://dbpedia.org/resource/BioModels_Database>                      |
| "ChemProt"@en                               | "disease chemical biology database."@en                                                                      | <http://dbpedia.org/resource/ChemProt>                                |
| "ConsensusPathDB"@en                        | "human functional interaction networks."@en                                                                  | <http://dbpedia.org/resource/ConsensusPathDB>                         |
| "DIMA"@en                                   | "predicted and known interactions between protein domains"@en                                                | <http://dbpedia.org/resource/DIMA_%28database%29>                     |
| "HitPredict"@en                             | "quality assessed protein-protein interactions in nine species."@en                                          | <http://dbpedia.org/resource/HitPredict>                              |
| "KEGG"@en                                   | "The KEGG resource for deciphering the genome."@en                                                           | <http://dbpedia.org/resource/KEGG>                                    |
| "KUPS"@en                                   | "datasets of interacting and non-interacting protein pairs with associated attributions."@en                 | <http://dbpedia.org/resource/KUPS_%28database%29>                     |
| "PID"@en                                    | "Pathway Interaction Database."@en                                                                           | <http://dbpedia.org/resource/NCI-Nature_Pathway_Interaction_Database> |
| "Pathway Commons"@en                        | "biological pathways."@en                                                                                    | <http://dbpedia.org/resource/Pathway_commons>                         |
| "ProtCID"@en                                | "interactions of homologous proteins in multiple crystal forms."@en                                          | <http://dbpedia.org/resource/ProtCID>                                 |
| "REPAIRtoire"@en                            | <http://dbpedia.org/resource/DNA_repair>                                                                     | <http://dbpedia.org/resource/REPAIRtoire>                             |
| "REPAIRtoire"@en                            | <http://dbpedia.org/resource/Systems_biology>                                                                | <http://dbpedia.org/resource/REPAIRtoire>                             |
| "SPIKE"@en                                  | "highly curated human signaling pathways."@en                                                                | <http://dbpedia.org/resource/Spike_%28database%29>                    |
| "STRING"@en                                 | "Search Tool for the Retrieval of Interacting Genes/Proteins"@en                                             | <http://dbpedia.org/resource/STRING>                                  |
| "3"^^<http://www.w3.org/2001/XMLSchema#int> | "identification and classification of domain-based interactions of known three-dimensional structure."@en    | <http://dbpedia.org/resource/3did>                                    |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
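The cells of these result tables use an N-Triples-like notation for RDF terms: `"KEGG"@en` is a language-tagged literal, `<http://dbpedia.org/resource/KEGG>` is an IRI, and the odd `"3"^^<http://www.w3.org/2001/XMLSchema#int>` in the 3did row is a typed literal. That last one is a title that DBpedia stored as an integer. If you fetch the results as JSON instead of a text table, a small formatter like the one below can reproduce that notation. This is a minimal sketch. It assumes the W3C SPARQL 1.1 Query Results JSON format, and `format_term` is a hypothetical helper, not part of any library.

```python
from typing import Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


def format_term(term: Dict[str, str]) -> str:
    """Render one RDF term from a SPARQL JSON result binding in an
    N-Triples-like notation: <iri>, "text"@lang, "text"^^<datatype>, _:bnode."""
    kind = term["type"]
    value = term["value"]
    if kind == "uri":
        return "<" + value + ">"
    if kind == "bnode":
        return "_:" + value
    # "typed-literal" is the older SPARQL 1.0 JSON spelling of a datatyped literal
    if kind in ("literal", "typed-literal"):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        if "xml:lang" in term:
            return '"' + escaped + '"@' + term["xml:lang"]
        if "datatype" in term:
            return '"' + escaped + '"^^<' + term["datatype"] + ">"
        return '"' + escaped + '"'
    raise ValueError("unknown RDF term type: " + kind)


if __name__ == "__main__":
    # The 3did row: its title came back as an xsd:int literal, not a string.
    print(format_term({"type": "literal", "datatype": XSD + "int", "value": "3"}))
    print(format_term({"type": "literal", "xml:lang": "en", "value": "STRING"}))
    print(format_term({"type": "uri", "value": "http://dbpedia.org/resource/3did"}))
```

Run as a script, this prints `"3"^^<http://www.w3.org/2001/XMLSchema#int>`, `"STRING"@en` and `<http://dbpedia.org/resource/3did>`, matching the cells above.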

List the databases available at the NCBI

SPARQL query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?title ?description ?uri WHERE {
  ?uri a <http://dbpedia.org/ontology/BiologicalDatabase> .
  ?uri <http://dbpedia.org/property/center> <http://dbpedia.org/resource/National_Center_for_Biotechnology_Information> .
  OPTIONAL {
    ?uri <http://dbpedia.org/property/title> ?title .
  }
  OPTIONAL {
    ?uri <http://dbpedia.org/property/description> ?description .
  }
} ORDER BY ?title
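Such a query can be sent to a SPARQL endpoint over plain HTTP. Under the SPARQL Protocol, the URL-encoded text goes in a `query` parameter, and the `Accept` header selects the result format. The sketch below only builds the request and does not send it. It assumes DBpedia's public endpoint is at `https://dbpedia.org/sparql`, which may change. `build_sparql_request` and `NCBI_QUERY` are hypothetical names used for illustration.

```python
import urllib.parse
import urllib.request

# Assumption: DBpedia's public endpoint; any SPARQL 1.1 Protocol endpoint should behave alike.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# The NCBI query from the text (the unused PREFIX declarations are omitted).
NCBI_QUERY = """SELECT ?title ?description ?uri WHERE {
  ?uri a <http://dbpedia.org/ontology/BiologicalDatabase> .
  ?uri <http://dbpedia.org/property/center> <http://dbpedia.org/resource/National_Center_for_Biotechnology_Information> .
  OPTIONAL { ?uri <http://dbpedia.org/property/title> ?title . }
  OPTIONAL { ?uri <http://dbpedia.org/property/description> ?description . }
} ORDER BY ?title"""


def build_sparql_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Build (without sending) a SPARQL Protocol GET request: the query travels
    URL-encoded in the `query` parameter, and the Accept header asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request(NCBI_QUERY)
    print(req.get_method(), req.full_url[:60])
    # Sending it needs network access and a live endpoint:
    # import json
    # with urllib.request.urlopen(req) as resp:
    #     data = json.load(resp)
```

The Accept header is the protocol's standard way to choose a format. Some servers also accept vendor-specific URL parameters for this, but those are not portable.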

Result:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| title                 | description                                                                                                        | uri                                                     |
========================================================================================================================================================================================================
| "BGMUT"@en            | "database of variations in the genes that encode antigens of blood group systems"@en                               | <http://dbpedia.org/resource/BGMUT>                     |
| "CDD"@en              | "Conserved Domain Database for the functional annotation of proteins."@en                                          | <http://dbpedia.org/resource/Conserved_domain_database> |
| "GenBank"@en          | "Nucleotide sequences for more than 300 000 organisms with supporting bibliographic and biological annotation."@en | <http://dbpedia.org/resource/GenBank>                   |
| "NCBI Epigenomics"@en | "epigenomic data sets."@en                                                                                         | <http://dbpedia.org/resource/NCBI_Epigenomics>          |
| "Refseq"@en           | "curated non-redundant sequence database of genomes."@en                                                           | <http://dbpedia.org/resource/RefSeq>                    |
| "UniGene"@en          | <http://dbpedia.org/resource/Transcriptome>                                                                        | <http://dbpedia.org/resource/UniGene>                   |
| "dbSNP"@en            | <http://dbpedia.org/resource/Database>                                                                             | <http://dbpedia.org/resource/DbSNP>                     |
| "dbSNP"@en            | <http://dbpedia.org/resource/Single-nucleotide_polymorphism>                                                       | <http://dbpedia.org/resource/DbSNP>                     |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
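dbSNP appears twice in this result because its `description` property has two values, and a SPARQL SELECT returns one row per combination of bindings. One way to get one entry per database is to collapse the rows on the client side. This is a minimal sketch. It assumes the results were fetched in the W3C SPARQL 1.1 JSON format, and `group_by_uri` is a hypothetical helper.

```python
from typing import Any, Dict


def group_by_uri(results: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Collapse SPARQL JSON result rows that share the same ?uri.

    Variables left unbound by an OPTIONAL clause are simply absent from a row.
    """
    grouped: Dict[str, Dict[str, Any]] = {}  # dicts keep insertion order (Python 3.7+)
    for row in results["results"]["bindings"]:
        uri = row["uri"]["value"]
        entry = grouped.setdefault(uri, {"title": None, "descriptions": []})
        if entry["title"] is None and "title" in row:
            entry["title"] = row["title"]["value"]
        if "description" in row:
            description = row["description"]["value"]
            if description not in entry["descriptions"]:  # drop exact duplicates
                entry["descriptions"].append(description)
    return grouped


if __name__ == "__main__":
    # The two dbSNP rows of the result table, written as SPARQL JSON bindings.
    title = {"type": "literal", "xml:lang": "en", "value": "dbSNP"}
    uri = {"type": "uri", "value": "http://dbpedia.org/resource/DbSNP"}
    sample = {
        "head": {"vars": ["title", "description", "uri"]},
        "results": {"bindings": [
            {"title": title, "uri": uri,
             "description": {"type": "uri", "value": "http://dbpedia.org/resource/Database"}},
            {"title": title, "uri": uri,
             "description": {"type": "uri", "value": "http://dbpedia.org/resource/Single-nucleotide_polymorphism"}},
        ]},
    }
    for key, entry in group_by_uri(sample).items():
        print(key, entry["title"], entry["descriptions"])
```

SPARQL 1.1's `GROUP_CONCAT` aggregate can do a similar collapse on the server, at the cost of turning the values into a single string.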

List the biological databases having a SPARQL endpoint

SPARQL query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?uri ?endpoint WHERE {
  ?uri a <http://dbpedia.org/ontology/BiologicalDatabase> .
  ?uri <http://dbpedia.org/property/sparql> ?endpoint .
}

Result:

------------------------------------------------------------------------------------
| uri                                  | endpoint                                  |
====================================================================================
| <http://dbpedia.org/resource/ChEBI>  | <http://chebi.bio2rdf.org>                |
| <http://dbpedia.org/resource/ChEMBL> | <http://rdf.farmbio.uu.se/chembl/snorql/> |
------------------------------------------------------------------------------------
That's it,
Pierre