26 September 2011

PostScript as a Programming Language for Bioinformatics: mynotebook

"PostScript (PS) is an interpreted, stack-based programming language. It is best known for its use as a page description language in the electronic and desktop publishing areas."[wikipedia]. In this post, I'll show how I've used to create a simple and lightweight view of the genome.

Introduction: just a simple postscript program

The following PS program fills a rectangular gray shape; You can display the result using ghostview, a2ps, etc...
%!PS
newpath
50 50 moveto
0 100 rlineto
100 0 rlineto
0 -100 rlineto
closepath
0.5 setgray
fill
showpage

Some global variables

The page width

/screenWidth 1000 def

The page width

/screenHeight 1000 def

The minimum 5' position

/minChromStart 1E9 def

The maximum 3' position

/maxChromEnd -1 def

The size of a genomic feature

/featureHeight 20 def

The distance between two 'ticks' for drawing the orientation

/ticksx 20 def

The font size

/theFontSize 9 def
The variable knownGene is a PS array of genes.

/knownGene [
[(uc002zkr.3) (chr22) (-) 161242...
...]
] def

Each Gene is a PS array holding the structure of the UCSC knownGene table, that is to say: name , chromosome, txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds:

[(uc002zmh.2) (chr22) (-) 17618410 17646177 17618910 17646134
   [17618410 17619439 17621948 17623987 17625913 17629337 17630431 17646098 ]
   [17619247 17619628 17622123 17624021 17626007 17629450 17630635 17646177 ]
]
. a simple command line can be used to fetch those data:
%  curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" |\
gunzip -c | grep chr22 | head -n 20 |\
awk '{printf("[(%s) (%s) (%s) %s %s %s %s [%s] [%s] ]\n",$1,$2,$3,$4,$5,$6,$7,$9,$10);}' |\
tr "," " " > result.txt 

Some utilities

converting a PS object to string

/toString
{
20 string cvs 
} bind def

Converting a string to interger (loop over each character an increase the current value)

/toInteger
{
3 dict begin
/s exch def
/i 0 def
/n 0 def
s {
  n 10 mul /n exch def
  s i get 48 sub n add /n exch def %48=ascii('0')
  i 1 add /i exch def
  } forall
n % leave n on the stack
end
} bind def

Convert a genomic position to a index on the page 'x' axis

/convertPos2pixel
{
minChromStart sub maxChromEnd minChromStart sub div screenWidth mul
} bind def

Extract the chromosome (that is to say, extract the 1st element of the current array on the stack)

/getChrom
{
1 get
} bind def

Create a hyperlink to the UCSC genome browser

/getHyperLink
{
3 dict begin
/E exch def %% END
/S exch def %% START
/C exch def %% CHROMOSOME
[ (http://genome.ucsc.edu/cgi-bin/hgTracks?position=) C (:) S toString (-) E toString (&) (&db=hg19) ] concatstringarray
end
} bind def

Paint a rectangle

/box
{
4 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
x y moveto
width 0 rlineto
0 height rlineto
width -1 mul 0 rlineto
0 height -1 mul rlineto
end
} bind def

Paint a gray gradient

/gradient
{
4 dict begin
/height exch def
/width exch def
/y exch def
/x exch def
/i 0 def
height 2 div /i exch def

0 1 height 2 div {
	1 i height 2.0 div div sub setgray
	newpath
	x  
	y height 2 div i sub  add
	width
	i 2 mul
	box
	closepath
	fill
	i 1 sub /i exch def
	}for
newpath
0 setgray
0.4 setlinewidth
x y width height box
closepath
stroke
end
} bind def

Methods extracting a data about the current gene on the PS stack.

Extract the transcription start:

/getTxStart
{
3 get
} bind def

Extract the transcription end:

/getTxEnd
{
4 get
} bind def

Extract the CDS start:

/getCdsStart
{
5 get
} bind def

Extract the transcription end:

/getCdsEnd
{
6 get
} bind def

Extract the strand:

/getStrand
{
2 get (+) eq {1} {-1} ifelse
} bind def
Get the gene name

/getKgName
{
0 get
} bind def

Get the number of exons:

/getExonCount
{
7 get length
} bind def

Get the start position of the i-th exon:

/getExonStart
{
2 dict begin
/i exch def
/gene exch def
gene 7 get i get
end
} bind def

Get the end position of the i-th exon:

/getExonEnd
{
2 dict begin
/i exch def
/gene exch def
gene 8 get i get
end
} bind def

Should we draw this gene on the page ?

/isVisible
{
1 dict begin
/gene exch def
minChromStart gene getTxEnd gt 
	{
	false
	}
	{
	gene getTxStart maxChromEnd gt
		{
		false
		}
		{
		true
		}ifelse
	}ifelse
end
}bind def

Methods for an array of genes

Loop over the genes and extract the lowest 5' index:

/getMinChromStart
{
3 dict begin
/genes exch def
/pos 10E9 def
/i 0 def
genes length {
	genes i get getTxStart pos min /pos  exch def
	i 1 add /i exch def
	}repeat
pos
end
} bind def

Loop over the genes and extract the highest 3' index:

/getMaxChromEnd
{
3 dict begin
/genes exch def
/pos -1E9 def
/i 0 def
genes length {
	genes i get getTxEnd pos max /pos  exch def
	i 1 add /i exch def
	}repeat
pos
end
} bind def

Painting ONE Gene

/paintGene
{
5 dict begin
/gene exch def %% the GENE argument
/midy featureHeight 2.0 div def %the middle of the row
/x0 gene getTxStart convertPos2pixel def % 5' side of the gene in pixel
/x1 gene getTxEnd convertPos2pixel def % 3' side of the gene in pixel
/i 0 def
0.1 setlinewidth

1 0 0 setrgbcolor

newpath
x0 midy moveto
x1 midy lineto
closepath
stroke


% paint ticks
0 1 x1 x0 sub ticksx div{
	newpath
	gene getStrand 1 eq 
		{
		x0 ticksHeight sub i add midy ticksHeight add moveto
		x0 i add midy lineto
		x0 ticksHeight sub i add midy ticksHeight sub lineto
		}
	%else
		{
		x0 ticksHeight add i add midy ticksHeight add moveto
		x0 i add midy lineto
		x0 ticksHeight add i add midy ticksHeight sub lineto
		} ifelse
	stroke
	i ticksx add /i exch def
	} for

%paint Transcript start-end
0 0 1 setrgbcolor
newpath
gene getCdsStart convertPos2pixel
midy cdsHeight 2 div sub
gene getCdsEnd convertPos2pixel gene getCdsStart convertPos2pixel sub 
cdsHeight box
closepath
fill

% loop over exons
0 /i exch def
gene getExonCount
	{
	gene i getExonStart convertPos2pixel
	midy exonHeight 2 div sub
	gene i getExonEnd convertPos2pixel gene i getExonStart convertPos2pixel sub
	exonHeight gradient
	i 1 add /i exch def
	} repeat
0 setgray
gene getTxEnd convertPos2pixel 10 add midy moveto
gene getKgName show

%URL 
[ /Rect [x0 0 x1 1 add featureHeight]
/Border [1 0 0]
/Color [1 0 0]
/Action << /Subtype /URI /URI gene getChrom gene getTxStart gene getTxEnd getHyperLink  >>
/Subtype /Link
/ANN pdfmark

end
} bind def

Paint all Genes

/paintGenes
{
3 dict begin
/genes exch def %the GENE argument (an array)
/i 0 def % loop iterator
/j 0 def % row iterator


% draw 10 vertical lines
i 0 /i exch def
0 setgray
0 1 10 {
	%draw a vertical line
	screenWidth 10 div i mul 0 moveto
	screenWidth 10 div i mul screenHeight lineto
	stroke
	% print the position at the top rotate by 90°
	screenWidth 10 div i mul 10 add screenHeight 5 sub moveto
	-90 rotate
	maxChromEnd minChromStart sub i 10 div mul minChromStart add toString show
	90 rotate
	i 1 add /i exch def
	} for

0 /i exch def
genes length {
	genes i get isVisible
		{
		gsave
		0 j  featureHeight 2 add mul translate
		genes i get paintGene
		j 1 add /j exch def
		grestore
		} if
	i 1 add /i exch def
	}repeat
end
} bind def

All in one: the postscript code


Open the PS file in ghostview, evince, ...

Zooming ? Yes we can.

Ghostview has an option -Sname=string
       -Sname=string
       -sname=string
              Define  a  name  in  "systemdict"  with a given string as value.
              This is different from -d.

In my postscript file, the default values for minChromStart and maxChromEnd are overridden by the user's parameters:

systemdict /userChromStart known {
	userChromStart toInteger /minChromStart  exch def
	} if

systemdict /userChromEnd known {
	userChromEnd toInteger /maxChromEnd  exch def
	} if
That's it,

Pierre

3 comments:

Tim said...

That's really lightweight, great! Is there a way to pass the gene list in the command-line instead of copy-paste it in the ".ps" file?

Pierre Lindenbaum said...

@Tim I don't know, I'm not a PS guru. I briefly searched on the web but found no answer.

Con said...

Yes PS has file operators, and you can use them in GhistScript. See http://www.ghostscript.com/doc/current/Use.htm#Finding_files