Data visualization with Python and SVG

Page 1: Data visualization with Python and SVG

Data visualization with Python and SVG
Plotting an RNA secondary structure

Sukjun Kim
The Baek Research Group of Computational Biology

Seoul National University

April 11th, 2015

Special Lecture at Biospin Group

Page 2: Data visualization with Python and SVG

Plotting libraries for data visualization

• Each tool has its own language for plotting.

• They must be installed before use.

• They often depend on other libraries.

• They are designed for high-level graphics.

• Customizing a plot at a low level is difficult or impossible.

Examples: R, matplotlib, d3.js, gnuplot, Origin, PgfPlots, PLplot, Pyxplot, Grace

Page 3: Data visualization with Python and SVG

SVG (Scalable Vector Graphics)

• XML-based vector image format for two-dimensional graphics.

• The SVG specification is an open standard developed by the World Wide Web Consortium (W3C) since 1999.

• As XML files, SVG images can be created and edited with any text editor.

• All major modern web browsers – including Mozilla Firefox, Internet Explorer, Google Chrome, Opera, and Safari – have at least some degree of SVG rendering support.

(Wikipedia – Scalable Vector Graphics)

Data visualization by writing SVG documents

• The SVG markup language is an open standard and is easy to learn.

• Any programming language that can write a text file can be used, not only Python.

• No external libraries are required.

• Graphic elements can be customized at a low level.

Page 4: Data visualization with Python and SVG

Structure of an SVG document

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="100" height="100">

<circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow"/>

</svg>

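In the document above:

• <?xml ...?>: the XML declaration.

• <!DOCTYPE ...>: the DOCTYPE declaration.

• <svg ...>: the start of the SVG tag.

• <circle .../>: the contents of the SVG document.

• </svg>: the end of the SVG tag.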

SVG elements

• SVG has predefined shape elements: rectangle <rect>, circle <circle>, ellipse <ellipse>, line <line>, polyline <polyline>, polygon <polygon>, path <path>, ...

• Other elements include group <g>, hyperlink <a>, text <text>, ...

[Figure: the rendered circle, centred at (50,50) with radius 40]
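Because an SVG document is plain XML text, Python's standard library is enough both to produce it and to check that it is well-formed. The following is a minimal sketch that rebuilds the example document above as a string and parses it back with `xml.etree.ElementTree`; the function name `circle_document` is hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def circle_document(cx, cy, r):
    """Build the example document above as a string (hypothetical helper)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
        '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">',
        '<svg xmlns="%s" version="1.1" width="100" height="100">' % SVG_NS,
        '<circle cx="%d" cy="%d" r="%d" stroke="green" stroke-width="4" fill="yellow"/>'
        % (cx, cy, r),
        '</svg>',
    ]
    return "\n".join(lines) + "\n"


doc = circle_document(50, 50, 40)

# ET.fromstring raises ET.ParseError if the markup is not well-formed XML.
root = ET.fromstring(doc)
# Elements live in the SVG namespace, so the tag name is qualified with it.
circle = root.find("{%s}circle" % SVG_NS)
print(circle.get("cx"), circle.get("cy"), circle.get("r"))
```

Writing `doc` to a file such as circle.svg and opening it in a browser should show the same yellow circle.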

Page 5: Data visualization with Python and SVG

RNA secondary structural data

## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

How to generate RNA structural data?

seq → RNAfold → dotbr, pairs → RNAplot → coor

(Vienna RNA package, http://www.tbi.univie.ac.at/RNA/)

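• seq: the RNA sequence.

• dotbr: the secondary structure in dot-bracket notation, where matching parentheses mark paired nucleotides and dots mark unpaired ones.

• pairs: the base pairs, as (i, j) tuples of zero-based nucleotide indices.

• coor: the x and y coordinates of each nucleotide.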

[Figure: this is the final image we will plot]
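The pairs list carries the same information as dotbr: each "(" pairs with the ")" that closes it, so the pairs can be recovered with a stack. A minimal sketch, assuming only the plain "(", ")" and "." symbols used here (no pseudoknot brackets); the function `dotbracket_to_pairs` is a hypothetical helper, not part of the Vienna RNA package.

```python
def dotbracket_to_pairs(dotbr):
    """Convert dot-bracket notation to a sorted list of (i, j) pairs with i < j."""
    stack = []  # indices of "(" still waiting for their partner
    pairs = []
    for j, symbol in enumerate(dotbr):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % j)
            i = stack.pop()  # the most recent open bracket closes here
            pairs.append((i, j))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r at position %d" % (symbol, j))
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


dotbr = "(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))"
pairs = dotbracket_to_pairs(dotbr)
print(len(pairs), pairs[0], pairs[-1])
```

For the microRNA above this yields the same 27 pairs listed on the slide, from (0, 68) to (29, 39).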

Page 6: Data visualization with Python and SVG

Writing an SVG tag in a Python script

rna.py:

out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')
## svg elements here
out.append('</svg>\n')
# Write the collected tags to rna.svg; "with" closes the file afterwards.
with open('rna.svg', 'w') as f:
    f.write(''.join(out))

rna.svg:

<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
</svg>

At a minimum, an SVG document needs an opening and a closing <svg> tag.
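The bare <svg> tag above has no width or height, so the viewer decides the canvas size. One option, sketched here as an assumption rather than the original script's approach, is to size the canvas from the bounding box of the coordinates using SVG's viewBox attribute. The helper `svg_document` and its `margin` parameter are hypothetical.

```python
def svg_document(elements, coords, margin=10):
    """Wrap element strings in an <svg> tag whose viewBox covers every (x, y) in coords."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Pad the bounding box so shapes centred on the outermost points are not clipped.
    min_x = min(xs) - margin
    min_y = min(ys) - margin
    width = max(xs) - min(xs) + 2 * margin
    height = max(ys) - min(ys) + 2 * margin
    header = ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1" '
              'width="%.3f" height="%.3f" viewBox="%.3f %.3f %.3f %.3f">\n'
              % (width, height, min_x, min_y, width, height))
    return header + "".join(elements) + "</svg>\n"


# The first three entries of coor, with no drawing elements yet.
coor = [(69.515, 526.033), (69.515, 511.033), (69.515, 496.033)]
doc = svg_document([], coor)
print(doc)
```

In the full script, the strings collected between the opening and closing tags would become `elements`; a margin of 10 leaves room for the radius-5 circles drawn around each nucleotide.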

Page 7: Data visualization with Python and SVG

SVG Polyline

<polyline points="10,10 20,10 10,20 20,20" style="fill:none;stroke:black;stroke-width:3"/>

[Figure: the rendered polyline through (10,10), (20,10), (10,20) and (20,20), with fill:none, stroke:black and stroke-width:3]
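The points attribute is just the coordinate pairs joined by spaces, so a small function can build the tag from a list of (x, y) tuples. This is a sketch; the name `polyline_tag` and its keyword arguments are illustrative, not part of any library.

```python
def polyline_tag(points, stroke="black", stroke_width=1):
    """Return an unfilled SVG <polyline> through the given (x, y) points."""
    # "%g" prints 10.0 as "10", keeping small integer coordinates compact.
    point_text = " ".join("%g,%g" % (x, y) for x, y in points)
    return '<polyline points="%s" style="fill:none;stroke:%s;stroke-width:%g"/>' % (
        point_text, stroke, stroke_width)


# Reproduces the example above.
tag = polyline_tag([(10, 10), (20, 10), (10, 20), (20, 20)], stroke_width=3)
print(tag)
```

Note that "%g" keeps only six significant digits, which is enough for these coordinates; for large or precise values a fixed format such as "%.3f" is safer.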

Page 8: Data visualization with Python and SVG

Drawing the phosphate backbone

In DNA and RNA, the phosphate backbone can be regarded as the skeleton of the molecule. We represent this skeleton with an SVG <polyline> tag.

We have the x and y coordinates of each nucleotide, as below.

coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

Using these coordinates, we build the points attribute of the polyline tag.

points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))

Page 9: Data visualization with Python and SVG

SVG Line

<line x1="0" y1="0" x2="20" y2="20" style="stroke:red;stroke-width:2"/>

[Figure: the rendered line from (0,0) to (20,20), with stroke:red and stroke-width:2]
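Writing every attribute by hand in a format string gets repetitive. A hypothetical generic helper, `element`, can build any empty SVG tag from keyword arguments, turning Python-safe names such as stroke_width into SVG's hyphenated stroke-width. It escapes attribute values with the standard library's `xml.sax.saxutils.escape`, so quotes or ampersands in a value cannot break the markup.

```python
from xml.sax.saxutils import escape


def element(name, **attrs):
    """Return an empty SVG element; underscores in keyword names become hyphens."""
    parts = [name]
    for key, value in attrs.items():  # keyword order is preserved (Python 3.7+)
        # Escape &, <, > and double quotes so any value is safe inside "...".
        safe = escape(str(value), {'"': "&quot;"})
        parts.append('%s="%s"' % (key.replace("_", "-"), safe))
    return "<%s/>" % " ".join(parts)


tag = element("line", x1=0, y1=0, x2=20, y2=20, style="stroke:red;stroke-width:2")
print(tag)
```

The output matches the <line> example above attribute for attribute.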

Page 10: Data visualization with Python and SVG

Drawing base pairs

Watson-Crick base pairs form between A and U and between C and G; RNA also forms G-U wobble pairs, several of which occur in this structure. We will use a <line> tag to represent each base pair.

In addition to the coordinates, we have base-pairing information: each pair is a tuple holding the indices of the two paired nucleotides.

pairs = [(0,68), (1,67), (2,66), (4,64), (5,63), (6,62), ... , (29,39)]
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

Combining the two, each base pair is drawn as a straight line between the coordinates of its two nucleotides.

for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))
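If you want the plot to distinguish canonical Watson-Crick pairs from G-U wobble pairs, one option (an addition, not part of the original plot) is to look up both bases in seq and choose the line colour from the pair type. The function `pair_type` and the black/red colour scheme below are hypothetical choices for illustration.

```python
seq = "CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG"
pairs = [(0, 68), (1, 67), (2, 66), (4, 64), (5, 63), (6, 62), (7, 61), (9, 59),
         (10, 58), (11, 57), (12, 56), (13, 55), (14, 54), (15, 53), (16, 52),
         (17, 51), (19, 49), (20, 48), (21, 47), (22, 46), (23, 45), (24, 44),
         (25, 43), (26, 42), (27, 41), (28, 40), (29, 39)]

WATSON_CRICK = {"AU", "UA", "CG", "GC"}
WOBBLE = {"GU", "UG"}


def pair_type(seq, i, j):
    """Classify the base pair (i, j) as 'watson-crick', 'wobble' or 'other'."""
    bases = seq[i] + seq[j]
    if bases in WATSON_CRICK:
        return "watson-crick"
    if bases in WOBBLE:
        return "wobble"
    return "other"


# Hypothetical colour scheme: black for Watson-Crick pairs, red for the rest.
COLOURS = {"watson-crick": "black", "wobble": "red", "other": "red"}

wobble_pairs = [(i, j) for i, j in pairs if pair_type(seq, i, j) == "wobble"]
strokes = [COLOURS[pair_type(seq, i, j)] for i, j in pairs]
print(wobble_pairs)
```

In the drawing loop, stroke:black would then become stroke:%s filled with `COLOURS[pair_type(seq, i, j)]`. For this microRNA, five of the 27 pairs are G-U wobble pairs.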

Page 11: Data visualization with Python and SVG

SVG Circle

<circle cx="50" cy="50" r="20" style="fill:red;stroke:black;stroke-width:3"/>

[Figure: the rendered circle centred at (50,50), with fill:red, stroke:black and stroke-width:3]

Page 12: Data visualization with Python and SVG

SVG Text

<text x="0" y="15" font-size="15" style="fill:blue">I love SVG!</text>

[Figure: the text "I love SVG!" in blue at font-size 15, with its baseline starting at (0,15)]
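Text content is XML character data, so characters such as <, > and & must be escaped before they are placed between <text> and </text>; otherwise the document is no longer well-formed. A minimal sketch using the standard library's `xml.sax.saxutils.escape`; the helper name `text_tag` and its defaults are hypothetical.

```python
from xml.sax.saxutils import escape


def text_tag(x, y, content, font_size=15, fill="blue"):
    """Return an SVG <text> element whose baseline starts at (x, y)."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return '<text x="%g" y="%g" font-size="%g" style="fill:%s">%s</text>' % (
        x, y, font_size, fill, escape(content))


print(text_tag(0, 15, "I love SVG!"))
print(text_tag(0, 15, "5 < 6 & 7 > 2"))
```

The first call reproduces the example above; the second shows the escaped form a browser will still display as "5 < 6 & 7 > 2".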

Page 13: Data visualization with Python and SVG

Drawing nucleotides

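[Figure: a single nucleotide, the letter A inside a circle]

Each nucleotide will be represented by a one-character <text> label enclosed in a <circle>. This requires the RNA sequence and the coordinates.

seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
coor = [(69.515,526.033), (69.515,511.033), ... , (84.515,526.033)]

for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))

The <text> tag must be written after the <circle> tag: SVG paints elements in document order, so a circle written later would cover the letter. The y coordinate of a <text> element is its baseline, so it is shifted down by 0.35 times the font size (6*0.35) to centre the letter roughly vertically in the circle, while text-anchor="middle" centres it horizontally.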
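The loop above hard-codes the circle radius (5) and the font size (6). A sketch that turns them into parameters while keeping the same 0.35 baseline shift and the circle-before-text order; the function `nucleotide_tags` and the constant `BASELINE_SHIFT` are hypothetical names.

```python
BASELINE_SHIFT = 0.35  # fraction of the font size used to centre a letter vertically


def nucleotide_tags(x, y, base, radius=5, font_size=6):
    """Return the <circle> and <text> tags for one nucleotide, circle first."""
    circle = ('<circle cx="%.3f" cy="%.3f" r="%.3f" '
              'style="fill:white; stroke:black; stroke-width:1"/>\n' % (x, y, radius))
    text = ('<text x="%.3f" y="%.3f" font-size="%g" text-anchor="middle" '
            'style="fill:black">%s</text>\n'
            % (x, y + font_size * BASELINE_SHIFT, font_size, base))
    # Order matters: later elements are painted on top, so the text comes last.
    return [circle, text]


out = []
out.extend(nucleotide_tags(69.515, 526.033, "C"))
print("".join(out))
```

With the default arguments the two strings are identical to what the original loop writes for the first nucleotide.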

Page 14: Data visualization with Python and SVG

Contents of the Python script

## microRNA structural data
seq = 'CCACCACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGAUGG'
dotbr = '(((.((((.(((((((((.(((((((((((.........))))))))))).))))))))).)))).)))'
pairs = [(0, 68), (1, 67), (2, 66), (4, 64), (5, 63), (6, 62), (7, 61), (9, 59), (10, 58), (11, 57), (12, 56), (13, 55), (14, 54), (15, 53), (16, 52), (17, 51), (19, 49), (20, 48), (21, 47), (22, 46), (23, 45), (24, 44), (25, 43), (26, 42), (27, 41), (28, 40), (29, 39)]
coor = [(69.515,526.033),(69.515,511.033),(69.515,496.033),(61.778,483.306),(69.515,469.506),(69.515,454.506),(69.515,439.506),(69.515,424.506),(62.691,412.302),(69.515,400.099),(69.515,385.099),(69.515,370.099),(69.515,355.099),(69.515,340.099),(69.515,325.099),(69.515,310.099),(69.515,295.099),(69.515,280.099),(61.778,266.298),(69.515,253.571),(69.515,238.571),(69.515,223.571),(69.515,208.571),(69.515,193.571),(69.515,178.571),(69.515,163.571),(69.515,148.571),(69.515,133.571),(69.515,118.571),(69.515,103.571),(56.481,95.317),(50.000,81.317),(52.139,66.039),(62.216,54.357),(77.015,50.000),(91.814,54.357),(101.891,66.039),(104.030,81.317),(97.549,95.317),(84.515,103.571),(84.515,118.571),(84.515,133.571),(84.515,148.571),(84.515,163.571),(84.515,178.571),(84.515,193.571),(84.515,208.571),(84.515,223.571),(84.515,238.571),(84.515,253.571),(92.252,266.298),(84.515,280.099),(84.515,295.099),(84.515,310.099),(84.515,325.099),(84.515,340.099),(84.515,355.099),(84.515,370.099),(84.515,385.099),(84.515,400.099),(91.339,412.302),(84.515,424.506),(84.515,439.506),(84.515,454.506),(84.515,469.506),(92.252,483.306),(84.515,496.033),(84.515,511.033),(84.515,526.033)]

out = []
out.append('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">\n')

## [1] phosphate backbone - <polyline> tag
points = ' '.join(['%.3f,%.3f'%(x, y) for x, y in coor])
out.append('<polyline points="%s" style="fill:none; stroke:black; stroke-width:1;"/>\n'%(points))

## [2] base-pairing - <line> tag
for i, j in pairs:
    x1, y1 = coor[i]
    x2, y2 = coor[j]
    out.append('<line x1="%.3f" y1="%.3f" x2="%.3f" y2="%.3f" style="stroke:black; stroke-width:1;"/>\n'%(x1, y1, x2, y2))

## [3] nucleotide - <circle> and <text> tags
for i, base in enumerate(seq):
    x, y = coor[i]
    out.append('<circle cx="%.3f" cy="%.3f" r="%.3f" style="fill:white; stroke:black; stroke-width:1"/>\n'%(x, y, 5))
    out.append('<text x="%.3f" y="%.3f" font-size="6" text-anchor="middle" style="fill:black">%s</text>\n'%(x, y+6*0.35, base))

out.append('</svg>\n')
with open('rna.svg', 'w') as f:
    f.write(''.join(out))
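After running the script, one quick way to confirm that rna.svg is well-formed and contains what we expect is to parse it back with the standard library and count elements by tag name. The helper `count_elements` is a hypothetical addition, not part of the original script; it is shown here on a small inline sample.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(svg_text):
    """Parse SVG markup and count elements by local tag name (namespace stripped)."""
    root = ET.fromstring(svg_text)  # raises ET.ParseError if not well-formed
    # Tags look like "{http://www.w3.org/2000/svg}circle"; keep the part after "}".
    return Counter(el.tag.split("}")[-1] for el in root.iter())


sample = ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">'
          '<circle cx="1" cy="1" r="1"/><text x="1" y="1">A</text></svg>')
print(count_elements(sample))
```

Applied to the script's output, e.g. `count_elements(open('rna.svg').read())`, it should report the svg root plus 1 polyline, 27 lines (one per pair) and 69 circles and 69 texts (one each per nucleotide).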

Page 15: Data visualization with Python and SVG

How to use other SVG tags? Go to w3schools.com!

Page 16: Data visualization with Python and SVG

Real examples with Python and SVG

Page 17: Data visualization with Python and SVG

reciPlot

A plot for visualizing the tissue-specific expression of genes.

SVG elements used: <text>, <polygon>

Page 18: Data visualization with Python and SVG

escPlot

A plot for representing the expression, structure, and conservation data of an RNA together in a single plot.

SVG elements used: <line>, <text>, <path>, <circle>, <polyline>

Page 19: Data visualization with Python and SVG

wheelPlot

A plot for visualizing all suboptimal RNA secondary structures.

SVG elements used: <circle>, <polyline>, <path>, <line>, <rect>, <text>

Page 20: Data visualization with Python and SVG

Conclusions

• There are many graphic tools and libraries for data visualization.

• These tools are largely limited to high-level graphics.

• Writing SVG documents requires no external libraries and no significant time spent learning a tool-specific plotting language.

• If you want to plot a non-standard type of graph and customize it at a low level, writing an SVG document with Python is the best way to meet that need.

Page 21: Data visualization with Python and SVG

Thank you!
Have a nice weekend.