15 July 2013

Playing with the "UCSC Genome Browser Track Hubs". my notebook

The UCSC has recently created the Genome Browser Track Hubs: "Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser." I've created a hub for the Rotavirus genome, hosted on GitHub at: https://github.com/lindenb/genomehub.
My data are primarily described in an XML file. It contains a description of the genome and of the tracks, the path to the FASTA sequence, etc. The FASTA sequence was provided by Dr Didier Poncet (CNRS/Gig). As far as I understand, it is not currently possible to specify that a track describes a protein.

<?xml version="1.0" encoding="UTF-8"?>
<genomeHub >
  <name>Rotavirus</name>
  <shortLabel>Rotavirus</shortLabel>
  <longLabel>Rotavirus</longLabel>
  (...)
  <accessions id="set1">
    <acn>GU144588</acn>
    <acn source="uniprot">Q0H8C5</acn>
    <acn source="uniprot">Q45UF6</acn>
    (..)
  <genome id="rf11">
    <description>Rotavirus RF11</description>
    <organism>Rotavirus</organism>
    <defaultPos>RF01:1-10</defaultPos>
    <scientificName>Rotavirus</scientificName>
    <organism>Rotavirus</organism>
    <orderKey>10970</orderKey>
    <fasta>rotavirus/rf/rf.fa</fasta>
     (...)
 <group id="active_site"><accessions ref="set1"/><include>active site</include></group>
 <group id="calcium-binding_region"><accessions ref="set1"/><include>calcium-binding region</include></group>
 <group id="chain"><accessions ref="set1"/><include>chain</include></group>
   (...)
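To make the role of the <group> elements concrete, here is a minimal Python sketch that reads them with the standard library. The three <group> lines are copied from the listing above; the enclosing <groups> element is an assumption added only to get a well-formed fragment, since the real nesting is elided. Judging from the file names produced later (active_site.bed, chain.bb), each group id appears to become one track, but that mapping is an inference, not something the XML states.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <groups> is NOT in the original file, it only makes
# this fragment well-formed. The <group> elements are taken from the post.
FRAGMENT = """<groups>
  <group id="active_site"><accessions ref="set1"/><include>active site</include></group>
  <group id="calcium-binding_region"><accessions ref="set1"/><include>calcium-binding region</include></group>
  <group id="chain"><accessions ref="set1"/><include>chain</include></group>
</groups>"""


def read_groups(xml_text):
    """Return a list of (group id, accession-set ref, included feature types)."""
    root = ET.fromstring(xml_text)
    groups = []
    for group in root.iter("group"):
        ref = group.find("accessions").get("ref")
        features = [inc.text for inc in group.findall("include")]
        groups.append((group.get("id"), ref, features))
    return groups


for group_id, ref, features in read_groups(FRAGMENT):
    # Assumed naming scheme: one BED file per group id.
    print(f"{group_id}.bed <- features {features} from accession set {ref}")
```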
This XML file is then processed with the following XSL stylesheet: https://github.com/lindenb/genomehub/blob/master/data/genomehub.xml. It generates a Makefile that translates the FASTA sequence to 2bit, creates the BED files by aligning some annotated sequences to the reference with BLAST, and converts them to bigBed.
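The Makefile steps described above can be sketched in Python. This is not the generated Makefile itself: it only computes the chrom.sizes content from a FASTA file and builds the command lines such a Makefile would plausibly run. The tool names (faToTwoBit, bedToBigBed, makeblastdb) are real UCSC and BLAST+ programs, but their use here is an assumption; the original may have used a different BLAST indexing command. The demo sequences are made up.

```python
import io


def chrom_sizes(fasta_lines):
    """Return [(sequence name, length)] in file order for lines of FASTA text."""
    sizes = []
    name, length = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            # Keep only the first word of the header as the sequence name.
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def build_commands(fasta, twobit, sizes_file, bed, bigbed):
    """Command lines for the conversion steps (built here, not executed)."""
    return [
        ["faToTwoBit", fasta, twobit],                     # FASTA -> 2bit
        ["makeblastdb", "-in", fasta, "-dbtype", "nucl"],  # assumed BLAST+ index
        ["bedToBigBed", bed, sizes_file, bigbed],          # sorted BED -> bigBed
    ]


# Made-up sequences, for illustration only.
demo = io.StringIO(">RF01 segment 1\nGGCTATTAAA\nGCTAT\n>RF02\nGGCT\n")
for seq_name, seq_length in chrom_sizes(demo):
    print(f"{seq_name}\t{seq_length}")

for cmd in build_commands("rf.fa", "rf.2bit", "chrom.sizes", "chain.bed", "chain.bb"):
    print(" ".join(cmd))
```

Note that bedToBigBed expects its input BED file to be sorted by sequence name and start position, so a sort step would normally come before the last command.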
At the end, my directory contains the following files:
./data/genomehub.xml
./data/genomehub2make.xsl
./data/sequence2fasta.xsl
./data/hub.txt
./data/genomes.txt
./data/rotavirus
./data/rotavirus/rf
./data/rotavirus/rf/signal_peptide.bed
./data/rotavirus/rf/CDS.bed
./data/rotavirus/rf/turn.bb
./data/rotavirus/rf/chrom.sizes
./data/rotavirus/rf/site.bed
./data/rotavirus/rf/coiled-coil_region.bed
./data/rotavirus/rf/mutagenesis_site.bb
./data/rotavirus/rf/UTR.bed
./data/rotavirus/rf/reference.fa~
./data/rotavirus/rf/misc_feature.bed
./data/rotavirus/rf/CDS.bb
./data/rotavirus/rf/helix.bed
./data/rotavirus/rf/strand.bb
./data/rotavirus/rf/sequence_conflict.bb
./data/rotavirus/rf/modified_residue.bb
./data/rotavirus/rf/coiled-coil_region.bb
./data/rotavirus/rf/topological_domain.bb
./data/rotavirus/rf/active_site.bed
./data/rotavirus/rf/sequence_variant.bb
./data/rotavirus/rf/transmembrane_region.bb
./data/rotavirus/rf/zinc_finger_region.bed
./data/rotavirus/rf/region_of_interest.bb
./data/rotavirus/rf/glycosylation_site.bb
./data/rotavirus/rf/domain.bb
./data/rotavirus/rf/region_of_interest.bed
./data/rotavirus/rf/misc_feature.bb
./data/rotavirus/rf/topological_domain.bed
./data/rotavirus/rf/sequence_conflict.bed
./data/rotavirus/rf/UTR.bb
./data/rotavirus/rf/compositionally_biased_region.bed
./data/rotavirus/rf/chain.bed
./data/rotavirus/rf/glycosylation_site.bed
./data/rotavirus/rf/trackDb.txt
./data/rotavirus/rf/modified_residue.bed
./data/rotavirus/rf/disulfide_bond.bed
./data/rotavirus/rf/strand.bed
./data/rotavirus/rf/helix.bb
./data/rotavirus/rf/compositionally_biased_region.bb
./data/rotavirus/rf/transmembrane_region.bed
./data/rotavirus/rf/rf.fa
./data/rotavirus/rf/rf.2bit
./data/rotavirus/rf/splice_variant.bed
./data/rotavirus/rf/short_sequence_motif.bed
./data/rotavirus/rf/rf.fa.nsq
./data/rotavirus/rf/ALL.bed.blast.xml~
./data/rotavirus/rf/gene.bed
./data/rotavirus/rf/sequence_variant.bed
./data/rotavirus/rf/disulfide_bond.bb
./data/rotavirus/rf/signal_peptide.bb
./data/rotavirus/rf/rf.fa.nin
./data/rotavirus/rf/short_sequence_motif.bb
./data/rotavirus/rf/turn.bed
./data/rotavirus/rf/domain.bed
./data/rotavirus/rf/mutagenesis_site.bed
./data/rotavirus/rf/zinc_finger_region.bb
./data/rotavirus/rf/chain.bb
./data/rotavirus/rf/rf.fa.nhr
./data/rotavirus/rf/splice_variant.bb
./data/rotavirus/rf/active_site.bb
./data/rotavirus/rf/site.bb
./data/rotavirus/rf/description.html
./README.md

The files required by the UCSC are then pushed to GitHub, and the URL pointing to hub.txt (https://raw.github.com/lindenb/genomehub/master/data/hub.txt) is registered at http://genome.ucsc.edu/cgi-bin/hgHubConnect. And a few clicks later...
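For readers who have not seen the three text files the browser reads (hub.txt, genomes.txt, trackDb.txt), here is a hedged Python sketch of what they might contain. The setting names follow the UCSC track hub and assembly hub conventions. The values are taken from the XML and the directory listing above where available; the email address is a placeholder and the track labels are guesses. The actual files in the repository may differ.

```python
def stanza(pairs):
    """Format (key, value) pairs as one 'key value' line each."""
    return "\n".join(f"{key} {value}" for key, value in pairs) + "\n"


# hub.txt: top-level description of the hub.
hub_txt = stanza([
    ("hub", "Rotavirus"),
    ("shortLabel", "Rotavirus"),
    ("longLabel", "Rotavirus"),
    ("genomesFile", "genomes.txt"),
    ("email", "user@example.org"),  # placeholder, not the real address
])

# genomes.txt: one stanza per assembly; paths are relative to this file.
genomes_txt = stanza([
    ("genome", "rf11"),
    ("trackDb", "rotavirus/rf/trackDb.txt"),
    ("twoBitPath", "rotavirus/rf/rf.2bit"),
    ("description", "Rotavirus RF11"),
    ("organism", "Rotavirus"),
    ("defaultPos", "RF01:1-10"),
    ("orderKey", "10970"),
    ("scientificName", "Rotavirus"),
    ("htmlPath", "rotavirus/rf/description.html"),
])


def track_stanza(track_id, label):
    """One trackDb.txt stanza for a bigBed track stored next to trackDb.txt."""
    return stanza([
        ("track", track_id),
        ("bigDataUrl", f"{track_id}.bb"),
        ("shortLabel", label),  # assumed label
        ("longLabel", label),   # assumed label
        ("type", "bigBed"),
    ])


# Stanzas in trackDb.txt are separated by a blank line.
track_db_txt = "\n".join(
    track_stanza(track_id, track_id.replace("_", " "))
    for track_id in ["CDS", "chain", "active_site"]
)

print(hub_txt)
print(genomes_txt)
print(track_db_txt)
```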





That's it,

Pierre



