06 September 2011

Merging two XML files using the libxml streaming API.

I just wrote a program that appends one or more XML files to another XML file using the libxml streaming API. The program, named mergexml, is available on GitHub.

Usage

$ mergexml -h
mergexml: Pierre Lindenbaum PHD. 2011.
Compilation: Sep 6 2011 at 19:54:22.
Usage:
mergexml [options] -i database.xml xml1, xml2, ....xmln
Options:
-i (required)
-o (default:stdout)
--replace replace xml-in
-r ignore root of XML docs
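
The --replace option overwrites the input file with the merged result. How mergexml does this internally is not shown in this post; the following is only a minimal Python sketch of one common way to replace a file safely, assuming the new content is first written to a temporary file in the same directory and then swapped in. The names replace_file and write_content are hypothetical and are not part of mergexml.

```python
import os
import tempfile


def replace_file(path, write_content):
    """Write new content to a temp file, then swap it in place of `path`.

    `write_content` is a hypothetical callback that receives a writable
    text stream and writes the complete new document to it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            write_content(tmp)
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original as-is.
        os.unlink(tmp_path)
        raise
```

With this pattern the original file is only touched once the new content has been written completely, so a failure halfway through leaves the database unchanged.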

Example

Say you have an RDF-based collection of quotes in a large file database.rdf:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:q="http://quote.lindenb.org/ontology"
>
<!-- (...) -->
<foaf:Person rdf:about="http://en.wikipedia.org/wiki/Louis_Pasteur">
<foaf:name>Louis Pasteur</foaf:name>
</foaf:Person>

<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Louis_Pasteur"/>
<q:quote xml:lang="en">Science knows no country, because knowledge belongs to humanity, and is the torch which illuminates the world.</q:quote>
</q:Quote>

<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Louis_Pasteur"/>
<q:quote xml:lang="en">I am utterly convinced that Science and Peace will triumph over Ignorance and War, that nations will eventually unite not to destroy but to edify, and that the future will belong to those who have done the most for the sake of suffering humanity.</q:quote>
</q:Quote>

</rdf:RDF>
And you want to append another file, 'chunk.rdf', to the end of 'database.rdf':
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:q="http://quote.lindenb.org/ontology"
>
<foaf:Person rdf:about="http://en.wikipedia.org/wiki/Isaac_Asimov">
<foaf:name>Isaac Asimov</foaf:name>
</foaf:Person>

<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Violence is the last refuge of the incompetent.</q:quote>
</q:Quote>

<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Never let your sense of morals prevent you from doing what is right.</q:quote>
</q:Quote>

<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Creationists make it sound as though a "theory" is something you dreamt up after being drunk all night</q:quote>
</q:Quote>


<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="fr">Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra.</q:quote>
</q:Quote>

</rdf:RDF>


Append the file:
mergexml --replace -r -i database.rdf chunk.rdf

Result:
$ xmllint --format database.rdf
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:q="http://quote.lindenb.org/ontology">
<foaf:Person rdf:about="http://en.wikipedia.org/wiki/Louis_Pasteur">
<foaf:name>Louis Pasteur</foaf:name>
</foaf:Person>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Louis_Pasteur"/>
<q:quote xml:lang="en">Science knows no country, because knowledge belongs to humanity, and is the torch which illuminates the world.</q:quote>
</q:Quote>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Louis_Pasteur"/>
<q:quote xml:lang="en">I am utterly convinced that Science and Peace will triumph over Ignorance and War, that nations will eventually unite not to destroy but to edify, and that the future will belong to those who have done the most for the sake of suffering humanity.</q:quote>
</q:Quote>
<foaf:Person rdf:about="http://en.wikipedia.org/wiki/Isaac_Asimov">
<foaf:name>Isaac Asimov</foaf:name>
</foaf:Person>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Violence is the last refuge of the incompetent.</q:quote>
</q:Quote>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Never let your sense of morals prevent you from doing what is right.</q:quote>
</q:Quote>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="en">Creationists make it sound as though a "theory" is something you dreamt up after being drunk all night</q:quote>
</q:Quote>
<q:Quote>
<q:author rdf:resource="http://en.wikipedia.org/wiki/Isaac_Asimov"/>
<q:quote xml:lang="fr">Si le savoir peut créer des problèmes, ce n'est pas l'ignorance qui les résoudra.</q:quote>
</q:Quote>
</rdf:RDF>
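
For readers who want to see the idea in code, here is a minimal sketch of the same kind of streaming merge. It is not the mergexml source: it uses Python's standard xml.sax module instead of libxml, and the names MergeHandler and merge_xml are made up for this illustration. It copies the database up to (but not including) its closing root tag, then copies the children of each chunk's root (the behaviour of -r), and finally closes the root. It assumes the chunks use namespace prefixes already declared on the database root, since the declarations on the chunk roots are dropped. Comments are not forwarded by this handler, and empty elements are written as an open/close pair.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class MergeHandler(XMLGenerator):
    """Copy SAX events to `out`, never closing the root element.

    With skip_root=True the root's start tag and the XML declaration
    are skipped as well, so only the root's content is written.
    """

    def __init__(self, out, skip_root):
        super().__init__(out, encoding="UTF-8")
        self.skip_root = skip_root
        self.depth = 0          # number of currently open elements
        self.root_name = None   # qualified name of the root element

    def startDocument(self):
        if not self.skip_root:
            super().startDocument()  # writes the XML declaration

    def endDocument(self):
        pass  # the caller closes the document after the last chunk

    def startElement(self, name, attrs):
        if self.depth == 0:
            self.root_name = name
        if self.depth > 0 or not self.skip_root:
            super().startElement(name, attrs)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.depth > 0:  # the root's end tag is never written here
            super().endElement(name)

    def processingInstruction(self, target, data):
        if self.depth > 0:  # only keep instructions inside the root
            super().processingInstruction(target, data)


def merge_xml(database, chunks, out):
    """Write `database` followed by the root content of each chunk to `out`."""
    db_handler = MergeHandler(out, skip_root=False)
    xml.sax.parse(database, db_handler)
    for chunk in chunks:
        xml.sax.parse(chunk, MergeHandler(out, skip_root=True))
    out.write("</%s>\n" % db_handler.root_name)  # close the database root


if __name__ == "__main__":
    database = io.BytesIO(b'<r xmlns:q="urn:q"><q:a>1</q:a></r>')
    chunk = io.BytesIO(b'<r xmlns:q="urn:q"><q:a>2</q:a></r>')
    merged = io.StringIO()
    merge_xml(database, [chunk], merged)
    print(merged.getvalue())
```

Because the SAX parser delivers events as it reads, neither the database nor the chunks are ever loaded into memory as a whole tree, which is the point of using a streaming API for a large file. The inputs can be file names or open binary streams.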


That's it

Pierre
