11 November 2008

SPARQL for solubility/RDF: my notebook

In a recent thread on FriendFeed, I transformed Jean-Claude Bradley's data about the solubility of some compounds into RDF.


The original data set looks like this:

The RDF version looks like this:
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY doap "http://usefulinc.com/ns/doap#">
<!ENTITY foaf "http://xmlns.com/foaf/0.1/">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY chem "http://blueobelisk.sourceforge.net/chemistryblogs/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:dc="&dc;"
xmlns:rdfs="&rdfs;"
xmlns:doap="&doap;"
xmlns:foaf="&foaf;"
xmlns:chem="&chem;"
>
<!--=== PERSONS ============================================================== -->
<foaf:Person rdf:about="http://www.chemistry.drexel.edu/people/bradley/bradley.asp">
<foaf:name>Jean-Claude Bradley</foaf:name>
<foaf:nick>jcbradley</foaf:nick>
<foaf:sha1_sum>b68f7dca9555a1cfe1ad18c6d2be0db6e552d678</foaf:sha1_sum>
<foaf:holdsAccount>
<foaf:OnlineAccount>
<foaf:accountServiceHomepage rdf:resource="http://www.linkedin.com/"/>
<foaf:accountProfilePage rdf:resource="http://www.linkedin.com/in/jcbradley"/>
</foaf:OnlineAccount>
</foaf:holdsAccount>
</foaf:Person>

<!--=== PROJECT ============================================================== -->
<doap:Project rdf:ID="SolubilityProject">
<doap:name>Solubility</doap:name>
<doap:homepage rdf:resource="http://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ" />
<doap:shortdesc xml:lang="en">Solubility</doap:shortdesc>
<doap:shortdesc xml:lang="fr">Solubilité</doap:shortdesc>
<doap:description xml:lang="en">Solubility</doap:description>
<doap:description xml:lang="fr">Solubilité</doap:description>
</doap:Project>
<!--=== Compound ============================================================== -->

<chem:Compound rdf:about="http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png">
<chem:name>D-Manitol</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/b/bb/D-Mannitol_structure.png"/>
<chem:smiles>O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO</chem:smiles>
</chem:Compound>

<chem:Compound rdf:about="http://en.wikipedia.org/wiki/Ethanol">
<chem:name>Ethanol</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/6/6f/Ethanol_flat_structure.png"/>
<chem:smiles>OCC</chem:smiles>
</chem:Compound>

<chem:Compound rdf:about="http://en.wikipedia.org/wiki/Sodium_chloride">
<chem:name>Sodium chloride</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/e/e9/Sodium-chloride-3D-ionic.png"/>
<chem:smiles>[Na+].[Cl-]</chem:smiles>
</chem:Compound>

<!--=== Experiment ============================================================== -->
<chem:Experiment rdf:about="http://usefulchem.wikispaces.com/exp207">
<dc:name>Hello, I'm Experiment 207</dc:name>
<chem:project rdf:resource="#SolubilityProject"/>
</chem:Experiment>

<chem:Experiment rdf:about="http://usefulchem.wikispaces.com/exp1">
<dc:name>Hello, I'm Experiment 1</dc:name>
<chem:project rdf:resource="#SolubilityProject"/>
</chem:Experiment>

<!--=== Sample ============================================================== -->
<chem:Sample rdf:about="sample:11">
<dc:name>Hello, I'm Sample 11</dc:name>
</chem:Sample>
<chem:Sample rdf:about="sample:3">
<dc:name>Hello, I'm Sample 3</dc:name>
</chem:Sample>
<chem:Sample rdf:about="sample:12">
<dc:name>Hello, I'm Sample 12</dc:name>
</chem:Sample>
<!--=== Experimental Data ============================================================== -->
<chem:ExperimentalData >
<dc:date>2008-01-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://usefulchem.wikispaces.com/exp207"/>
<chem:sample rdf:resource="sample:11"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

<chem:ExperimentalData>
<dc:date>2008-02-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://en.wikipedia.org/wiki/Sodium_chloride"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://onschallenge.wikispaces.com/JennyHale-1"/>
<chem:sample rdf:resource="sample:3"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

<chem:ExperimentalData>
<dc:date>2008-03-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://en.wikipedia.org/wiki/Sodium_chloride"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://usefulchem.wikispaces.com/exp207"/>
<chem:sample rdf:resource="sample:12"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

</rdf:RDF>
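One detail of the file above is worth checking: the namespace declarations go through XML entities (&rdf;, &chem;), but rdf:datatype="chem:Molar" is a plain attribute value, so nothing expands the chem: prefix there. The Python sketch below uses only the standard library and a trimmed excerpt of the document to show the difference. The helper name datatype_of is made up for illustration, and this is a quick check, not a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CHEM_NS = "http://blueobelisk.sourceforge.net/chemistryblogs/"

# Trimmed excerpt of the RDF/XML above: one record, two entities.
EXCERPT = """<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY chem "http://blueobelisk.sourceforge.net/chemistryblogs/">
]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:chem="&chem;">
<chem:ExperimentalData>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>
</rdf:RDF>"""

def datatype_of(xml_text):
    """Return the rdf:datatype attribute of the first chem:concentration element."""
    root = ET.fromstring(xml_text)
    conc = root.find(".//{%s}concentration" % CHEM_NS)
    return conc.get("{%s}datatype" % RDF_NS)

# As written in the file: the prefix is kept verbatim.
print(datatype_of(EXCERPT))
# Hypothetical variant using the entity: the full namespace URI is substituted.
print(datatype_of(EXCERPT.replace('"chem:Molar"', '"&chem;Molar"')))
```

This would explain why the concentration literals come back typed as <chem:Molar> in the last result table of this post rather than with the full chem namespace URI.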

Here I describe how I used SPARQL to retrieve Jean-Claude's original data set from this RDF file.
I've downloaded ARQ, the SPARQL engine, from http://jena.sourceforge.net/ARQ/.
Here are a few queries:

Listing all the chem:Compound resources


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

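SELECT ?x
{
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://blueobelisk.sourceforge.net/chemistryblogs/Compound>
}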

Running the query


sparql --query jeter.rq --data=solubility.rdf

Result


----------------------------------------------------------------------
| x |
======================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> |
| <http://en.wikipedia.org/wiki/Ethanol> |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> |
----------------------------------------------------------------------

The same query but using prefixes


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?x
{
?x rdf:type chem:Compound
}

result


----------------------------------------------------------------------
| x |
======================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> |
| <http://en.wikipedia.org/wiki/Ethanol> |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> |
----------------------------------------------------------------------

Listing the compounds , their names, their 'smiles'


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
}

result


-----------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles |
===================================================================================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" |
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" |
-----------------------------------------------------------------------------------------------------------------------------------

The same, but only the compounds with a name containing "ol" (case-insensitive)


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
FILTER regex(?compoundName, "ol", "i")
}

result


------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles |
==============================================================================================================================
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" |
------------------------------------------------------------------------------------------------------------------------------
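To see why "Sodium chloride" drops out of this result, here is a small Python sketch of the same case-insensitive test. It only mirrors the FILTER for this literal pattern: SPARQL's regex follows the XPath/XQuery regular expression syntax, which differs from Python's re module in some details, and the helper name name_matches is made up for illustration.

```python
import re

# Compound names as they appear in the RDF file (typo in "D-Manitol" included).
names = ["Sodium chloride", "Ethanol", "D-Manitol"]

def name_matches(name, pattern="ol", flags="i"):
    """Rough analogue of FILTER regex(?compoundName, "ol", "i")."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, name, re_flags) is not None

kept = [name for name in names if name_matches(name)]
print(kept)
```

"chloride" contains "lo" but never "ol", so only the two names ending in "ol" are kept.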

The same, but add the 'chem:description', if any


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles ?description
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
FILTER regex(?compoundName, "ol", "i")
OPTIONAL { ?compound chem:description ?description }

}

result


--------------------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles | description |
============================================================================================================================================
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" | |
--------------------------------------------------------------------------------------------------------------------------------------------
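The description column is empty because the file holds no chem:description triples, yet both rows are still returned. OPTIONAL behaves like a left join, which the following Python sketch imitates with a dictionary lookup. The function name with_optional is an assumption for illustration only.

```python
# Rows that already passed the required patterns and the FILTER.
compounds = [
    ("http://en.wikipedia.org/wiki/Ethanol", "Ethanol"),
    ("http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png", "D-Manitol"),
]

# The file contains no chem:description triples, so this lookup is empty.
descriptions = {}

def with_optional(rows, optional):
    """Keep every row; append the optional value, or None when it is missing."""
    return [(uri, name, optional.get(uri)) for uri, name in rows]

for row in with_optional(compounds, descriptions):
    print(row)
```

Moving the chem:description pattern out of OPTIONAL would instead turn it into a required match, and the query would return no rows at all for this file.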

Retrieving Jean-Claude's data


The query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT
?exp
?sample
?solvent ?solventName ?solventSmiles
?solute ?soluteName ?soluteSmiles
?conc

{
?solvent rdf:type chem:Compound .
?solvent chem:name ?solventName .
?solvent chem:smiles ?solventSmiles .

?solute rdf:type chem:Compound .
?solute chem:name ?soluteName .
?solute chem:smiles ?soluteSmiles .

?expData rdf:type chem:ExperimentalData .
?expData chem:solute ?solute .
?expData chem:solvent ?solvent .
?expData chem:concentration ?conc .
?expData chem:experiment-id ?exp .
?expData chem:sample ?sample .
}

result


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| exp | sample | solvent | solventName | solventSmiles | solute | soluteName | soluteSmiles | conc |
==================================================================================================================================================================================================================================================================================================
| <http://usefulchem.wikispaces.com/exp207> | <sample:12> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" | "0.00"^^<chem:Molar> |
| <http://onschallenge.wikispaces.com/JennyHale-1> | <sample:3> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" | "0.00"^^<chem:Molar> |
| <http://usefulchem.wikispaces.com/exp207> | <sample:11> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" | "0.00"^^<chem:Molar> |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
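The query above is a single basic graph pattern: every triple pattern must match, and a variable that appears in several patterns (?solute, ?solvent, ?expData) must take the same value in all of them. As a toy illustration of that joining behaviour, here is a naive matcher in Python over a handful of triples. The shortened identifiers ("ethanol", "nacl", "data1") and the functions match and solve are invented for this sketch; ARQ's actual evaluation strategy is certainly more sophisticated.

```python
# A tiny in-memory graph: (subject, predicate, object) tuples.
TRIPLES = [
    ("ethanol", "rdf:type", "chem:Compound"),
    ("ethanol", "chem:name", "Ethanol"),
    ("nacl", "rdf:type", "chem:Compound"),
    ("nacl", "chem:name", "Sodium chloride"),
    ("data1", "rdf:type", "chem:ExperimentalData"),
    ("data1", "chem:solute", "nacl"),
    ("data1", "chem:solvent", "ethanol"),
    ("data1", "chem:concentration", "0.00"),
]

# Triple patterns; terms starting with "?" are variables.
PATTERNS = [
    ("?expData", "rdf:type", "chem:ExperimentalData"),
    ("?expData", "chem:solute", "?solute"),
    ("?expData", "chem:solvent", "?solvent"),
    ("?solute", "chem:name", "?soluteName"),
    ("?solvent", "chem:name", "?solventName"),
    ("?expData", "chem:concentration", "?conc"),
]

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def solve(patterns, triples):
    """Join the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings

solutions = solve(PATTERNS, TRIPLES)
for solution in solutions:
    print(solution["?solventName"], solution["?soluteName"], solution["?conc"])
```

Each solution corresponds to one row of the result table, which is why the three chem:ExperimentalData records in the file give three rows.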


That's it

Pierre

4 comments:

Egon Willighagen said...

There is information in the RDF that is not in the spreadsheet... did you compile it manually?

Pierre Lindenbaum said...

Egon, yes, that was just a test to learn SPARQL.

Egon Willighagen said...

OK, fine.

Writing some Jena code to convert the CSV into RDF. Will upload to GitHub...

Egon Willighagen said...

Pierre, on GitHub I have uploaded a slightly different route to get to the Wikipedia links: dbpedia.org.