Lecture 4 - Stanford University
Lecture 4: Visualization
Outline
• Basic plotting commands
• Types of plots
• Customizing plots graphically
• Specifying color
• Customizing plots programmatically
• Exporting figures
Why use Matlab for visualization?
• Matlab is flexible enough to let you quickly visualize data, and powerful enough to give you complete control over the final product
• Features:
  • Interactive plotting
  • Simple 3D plotting
  • Programmatic annotation
Basic Plots
• 2D Visualization
  • plot (line plots)
  • histogram
  • scatter (scatter plots)
  • image/imagesc (images)
• 3D Visualization
  • surf/mesh (surfaces)
  • plot3 (lines)
  • scatter3
Working with Figures
• Create a new figure: figure();
• Specify a figure number: figure(1)
• Hold onto a figure "handle":
  • figHandle1 = figure(1);
• Re-select a figure:
  • figure(figHandle1)
• Some useful functions:
  • clf: clear figure
  • close all: closes all figures
  • gcf: get handle to current figure
Demo: Figures
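The figure commands above can be tried together in a short script. This is only a minimal sketch: the names figHandle2 and isCurrent are illustrative and do not come from the slides.

```matlab
% Create two figures and keep their handles
figHandle1 = figure(1);
figHandle2 = figure(2);

% Re-select the first figure; it becomes the current figure again
figure(figHandle1);
isCurrent = isequal(gcf, figHandle1);   % gcf returns the current figure

clf;          % clear the contents of the current figure (figure 1)
close all;    % close every open figure
```

Keeping the handle in a variable means a script can return to a specific figure later without relying on whichever window happens to be in focus.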
plot
– Syntax: plot(x,y) plots points in the vector y against points in the vector x
histogram/bar
– Syntax: histogram(y) plots a histogram of the values in y; bar(x,y) plots bars at the points given by (x,y)
scatter
– Syntax: scatter(x,y,s,c) lets you specify the size (s) and color (c) of each point given by (x,y)
image/imagesc
– Syntax: image(C) plots the values stored in the matrix C as an image
surf & mesh
– Syntax: surf(x,y,z) and mesh(x,y,z) are used to visualize a surface in three dimensions
plot3
– Syntax: plot3(x,y,z) plots points in 3D
Demo: Plot Types
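A rough sketch of the plot types listed above, each drawn in its own figure. The sample data (a sine wave, random numbers, a magic square, a small surface, a helix) are assumptions chosen only for illustration, and histogram needs R2014b or later.

```matlab
x = linspace(0, 2*pi, 50);
y = sin(x);

figure; plot(x, y);                      % line plot
figure; histogram(randn(1000, 1));       % histogram of 1000 random values
figure; bar(1:5, [3 1 4 1 5]);           % bars at the points (x, y)
figure; scatter(x, y, 36, y);            % marker size 36, color mapped from y
figure; imagesc(magic(8)); colorbar;     % matrix shown as a scaled image

[X, Y] = meshgrid(-2:0.2:2);             % 21-by-21 grid
Z = X .* exp(-X.^2 - Y.^2);
figure; hSurf = surf(X, Y, Z);           % shaded surface
figure; mesh(X, Y, Z);                   % wireframe surface

t = linspace(0, 10*pi, 500);
figure; plot3(cos(t), sin(t), t);        % 3D line (a helix)
```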
http://www.mathworks.com/help/matlab/2-and-3d-plots.html
subplots
– The 'subplot' command lets you place multiple plots in one figure
– Syntax: subplot(nRows, nCols, index)
– The index counts across each row first, then moves down to the next row
(Figure 1) 1-by-3 layout, left to right: subplot(1,3,1), subplot(1,3,2), subplot(1,3,3)
(Figure 2) 3-by-2 layout:
– Left column, top to bottom: subplot(3,2,1), subplot(3,2,3), subplot(3,2,5)
– Right column, top to bottom: subplot(3,2,2), subplot(3,2,4), subplot(3,2,6)
Demo: Subplots
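The 3-by-2 layout from the subplot slides can be reproduced with a loop. This is a sketch under assumptions: the array name ax and the plotted curves are illustrative, and gobjects needs R2013a or later.

```matlab
x = linspace(0, 1, 100);
figure;
ax = gobjects(3, 2);              % illustrative name: holds the six axes handles
for k = 1:6
    row = ceil(k / 2);            % index k counts across rows first
    col = mod(k - 1, 2) + 1;
    ax(row, col) = subplot(3, 2, k);
    plot(x, x.^k);
    title(sprintf('subplot(3,2,%d)', k));
end
```

Storing each axes handle by (row, column) makes it easy to go back and adjust one panel later, for example with set(ax(2,1), 'YLim', [0 1]).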
Other functions
– gca: get handle to current axes
– panel
Panel()
– User-submitted function from Matlab File Exchange (FEX)
– http://www.mathworks.com/matlabcentral/fileexchange/20003-panel
– Provides MUCH more control over subplot positioning, layout, margins, etc.
Customizing Graphs Graphically
Plot Tools
Demo: Customizing Graphically
The plot() function (again)
• plot() plots along dimension 1 of an array.
• If there are multiple dimensions, plot creates a separate line for each column.
• If your data isn't constructed this way, just transpose with the apostrophe character:
  • plot(data')
Demo: Plot()
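A small sketch of the column-wise behavior described above. The variable names (data, dataRows, hCols, hWrong, hRight) and the three test signals are assumptions made for illustration.

```matlab
t = (0:0.1:10)';                       % 101-by-1 time vector
data = [sin(t), cos(t), sin(2*t)];     % 101-by-3: one signal per column

figure;
hCols = plot(data);                    % three lines, one per column

dataRows = data';                      % 3-by-101: one signal per row
figure;
hWrong = plot(dataRows);               % 101 lines of 3 points each (not intended)
figure;
hRight = plot(dataRows');              % transpose first: three lines again
```

Counting the returned line handles with numel is a quick way to check that plot interpreted the matrix the way you meant.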
Plot Line Style
• For line plots, specify the line type using a format string:
  • plot(x,y,'b') % plots a blue line (solid is the default line style)
  • plot(x,y,'b.') % plots blue dots
  • plot(x,y,'b:') % plots a blue dotted line
  • plot(x,y,'k--') % plots a black dashed line
  • plot(x,y,'ro') % plots red circles
• Chain together characters for full specification of color, marker, and line:
  • plot(x,y,'ro-') % plots red circles with a solid line
  • plot(x,y,'ro:') % plots red circles with a dotted line
Demo: LineSpec
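The format strings can be checked by reading back the properties they set. In this minimal sketch the handle names h1 to h4 and the sample curves are illustrative only.

```matlab
x = linspace(0, 2*pi, 30);
figure; hold on;
h1 = plot(x, sin(x),   'b');     % blue solid line
h2 = plot(x, cos(x),   'k--');   % black dashed line
h3 = plot(x, sin(2*x), 'ro');    % red circles, no connecting line
h4 = plot(x, cos(2*x), 'ro:');   % red circles joined by a dotted line
hold off;

styleOfH3 = get(h3, 'LineStyle');   % a marker-only spec leaves the line style as 'none'
```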
Customizing Programmatically
• Everything you can do graphically you can also do programmatically.
• DON’T do something by hand if you have to do it more than once!
• Examples
  • Axes labels: xlabel('text'), ylabel('text')
  • Plot/axis title: title('text')
  • Add text: text(x,y,'text to add')
• Graphics parameters are usually specified as 'parameter', value pairs:
  • plot(x,y,'linewidth',1.4)
  • plot(x,y,'bo-','linewidth',2,'markersize',15)
  • plot(x,y,'o-','MarkerFaceColor',[1 0 0],'MarkerEdgeColor',[0 0 1])
Other useful functions
• grid on: adds grid lines
• axis off: turns off the axes
• colorbar: adds a colorbar to an image plot
• colormap hot: switches the colormap
Demo: Customizing Programmatically
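Putting the annotation functions and the 'parameter', value pairs together might look like the sketch below. The data and the label strings are assumptions chosen only to have something to annotate.

```matlab
x = 0:0.5:10;
y = sqrt(x);

figure;
h = plot(x, y, 'o-', 'LineWidth', 2, 'MarkerSize', 8, ...
         'MarkerFaceColor', [1 0 0], 'MarkerEdgeColor', [0 0 1]);
xlabel('x');
ylabel('sqrt(x)');
title('Square root');
text(4, 1.5, 'text to add');   % annotation placed at data coordinates (4, 1.5)
grid on;                       % add grid lines
```

Because every step is a command, rerunning the script regenerates the same figure exactly, which is the point of not doing it by hand.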
Plot colors
Matlab has 8 built-in colors:
Black (k), Red (r), Blue (b), Green (g), Cyan (c), Magenta (m), Yellow (y), White (w)
We can specify other colors using RGB (red, green, blue) notation:
red = [1 0 0]
blue = [0 0 1]
green = [0 1 0]
gray = [0.2 0.2 0.2]
black = [0 0 0]
All RGB colors are 1x3 arrays, and every element lies between 0 and 1.
Demo: Color
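RGB triplets are passed to plot through the 'Color' property. In this sketch the particular colors (orange, a mid gray, and a blue given as 0-255 values) are illustrative assumptions, not values from the slides.

```matlab
x = linspace(0, 1, 50);
orange = [1 0.5 0];          % RGB triplet; each element lies in [0, 1]
gray   = [0.5 0.5 0.5];

figure; hold on;
hA = plot(x, x,    'Color', orange, 'LineWidth', 2);
hB = plot(x, x.^2, 'Color', gray,   'LineWidth', 2);
hC = plot(x, x.^3, 'Color', [44 123 182] / 255);   % rescale 0-255 values to 0-1
hold off;
```

Dividing by 255 is a handy way to reuse colors copied from tools that report 8-bit RGB values.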
Colormaps
Colormaps are used to specify how data gets mapped onto different colors.
Matlab has a few built-in colormaps, but you can also specify your own!
Why are colormaps important?
Much better!
Avoid the jet colormap (the default before R2014b)
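A colormap is just an n-by-3 matrix of RGB rows, so a custom one can be built directly. The sketch below assumes sample data from peaks and a white-to-dark-blue ramp; the names myMap, white and darkBlue are illustrative.

```matlab
C = peaks(50);               % sample 50-by-50 data matrix

figure;
imagesc(C);
colorbar;
colormap(hot);               % switch to a built-in colormap

% Custom colormap: 64 rows fading linearly from white to dark blue
n = 64;
w = linspace(0, 1, n)';      % 0 at the first row, 1 at the last
white    = [1 1 1];
darkBlue = [0 0 0.5];
myMap = (1 - w) * white + w * darkBlue;   % n-by-3 matrix of RGB rows
colormap(myMap);
```

A single-hue ramp like this changes only in lightness, so the ordering of the data values stays readable, which is the usual argument against rainbow-style maps.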
Manipulating Figures
Figures in Matlab are referenced using “handles”, which are pointers to different parts of the figure.
Example:
myhandle = plot(x,y);
Will return a handle to the plot. Then you can run the following:
get(myhandle); % to see a list of properties
set(myhandle,'Name',Value); % to set the value of a property
Different parts of the figure are organized hierarchically:
>> gcf
>> gca
>> get(gca,'Children')
>> get(gcf,'Children')
Demo: Annotating plots
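The get/set pattern and the figure hierarchy can be explored as in the sketch below. The variable names ax, fig, axChildren and figChildren are assumptions made for illustration.

```matlab
x = linspace(0, 2*pi, 100);
figure;
myhandle = plot(x, sin(x));

get(myhandle);                                        % print all properties of the line
set(myhandle, 'LineWidth', 3);                        % change one property
set(myhandle, 'Color', [0 0 0], 'LineStyle', '--');   % or several at once

ax  = gca;                             % axes that contain the line
fig = gcf;                             % figure that contains the axes
axChildren  = get(ax,  'Children');    % objects drawn in the axes
figChildren = get(fig, 'Children');    % objects contained in the figure
```

Walking from the line to its 'Parent' axes and on to the figure follows the same hierarchy in the opposite direction.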
Exporting Figures - Formats
Matlab saves figures using its own .fig format.
To share figures or view outside matlab, export to other formats, including:
JPG, PNG, EPS, PDF, TIFF
Bitmap vs. Vector graphics
Two main classes of image formats: bitmap vs. vector graphics
Bitmap (jpg, png):
• Fixed image sizes
• Best for actual images (pictures of stuff)
Vector (eps, pdf):
• Variable image sizes
• Best for line / bar graphs, scatter plots, etc.
Exporting Figures
print(figHandle,filename,formattype)
e.g.: print(figure(1),'MyPlot','-dpng')
formats: '-dpng', '-depsc2', '-dpdf', etc.
add flag for resolution: '-r300', etc.
Demo: Exporting Figures
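An export script along the lines of the print syntax above might look like this. It is a sketch under assumptions: writing to tempdir and the name outBase are illustrative choices, and savefig needs R2013b or later.

```matlab
x = linspace(0, 2*pi, 100);
figHandle = figure;
plot(x, sin(x));

outBase = fullfile(tempdir, 'MyPlot');        % illustrative output location

print(figHandle, outBase, '-dpng', '-r300');  % bitmap at 300 dpi -> MyPlot.png
print(figHandle, outBase, '-dpdf');           % vector            -> MyPlot.pdf
savefig(figHandle, [outBase '.fig']);         % Matlab's own .fig format
```

When the filename has no extension, print appends the one that matches the chosen format.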
Other Resources
2D and 3D visualization examples: http://www.mathworks.com/help/matlab/2-and-3d-plots.html
Custom colormaps: http://colorbrewer.org
Panel: http://www.mathworks.com/matlabcentral/fileexchange/20003-panel
Colors in figures (blog post): http://figuredesign.blogspot.com/2012/04/meeting-recap-colors-in-figures.html
Poor Graphs
Cell 502
Figure 1. Classification of TFBS Regions
TFBS regions for Sp1, cMyc, and p53 were classified based upon proximity to annotations (RefSeq, Sanger hand-curated annotations, GenBank full-length mRNAs, and Ensembl predicted genes). The proximity was calculated from the center of each TFBS region. TFBS regions were classified as follows: within 5 kb of the 5′ most exon of a gene, within 5 kb of the 3′ terminal exon, or within a gene, novel or outside of any annotation, and pseudogene/ambiguous (TFBS overlapping or flanking pseudogene annotations, limited to chromosome 22, or TFBS regions falling into more than one of the above categories).
imental data, preliminary evidence for the presence of novel transcripts was derived from chromosome 21 and 22 RNA maps (Kapranov et al., 2002) and from the publicly available EST data. Novel transcripts were verified using RT-PCR analyses in 9/11 regions and were found to have little coding capacity (less than 50 amino acids). Northern hybridization analysis of these isolated transcripts with strand-specific oligonucleotides or riboprobes indicate that they are polyadenylated, in some cases spliced, and are present as single and multi-exon isoforms ranging in size from 800 bp to 9 Kb (Supplemental Figure S3 on Cell website). Together with the strand-specific RT-PCR data, this suggests that several of them might also be antisense to known genes, such as, for example, EP300 (Figures 2C and 2D), UBASH3A (Supplemental Figures S2A and S2B online), SEC14L2 (Supplemental Figures S2C and S2D), and others.
The Ewing sarcoma gene (EWSR1) (Plougastel et al., 1993), the tumor suppressor gene, EP300 (Gayther et al., 2000), and mitogen-activated protein kinase MAPK1 (Gonzalez et al., 1992) on chromosome 22 illustrate potential utilization of common TFs to regulate both well-characterized and novel transcripts (Figure 2). Sequence analysis of the novel transcripts that overlap EWSR1 and EP300 indicate that they are spliced RNAs. Interestingly, a conserved region in the 3′ UTR of the EWSR1 gene is consistent with the evidence of antisense regulation of this gene (Lipman, 1997). The EP300 gene is a striking example (Figures 2C and 2D), having a TFBS region 17 kb away from the 3′ end and a novel transcript that splices from this site into the 3′ end of the gene. Additionally, overlapping novel transcripts from the genes encoding nuclear protein UBASH3A (Supplemental Figures S2A and S2B), phosphatidylinositol transfer-like protein SEC14L2 (Supplemental Figures S2C and S2D), TBC/rabGAP domain protein EPI64 (Supplemental Figures S2E and S2F), guanine-nucleotide exchange factor TIAM1 (Supplemental Figures S2G and S2H), KIAA0376 protein (Supplemental Figures S2I and S2J), and GTSE1 (Supplemental Figures S2K and S2L) were verified by RT-PCR and/or Northern blot analyses (Supplemental Figure S3). In many of these cases, the TFBS that are located on the 3′ end of the well-characterized gene appear to be located 5′ of the overlapping novel transcript, which suggests that these transcripts may be regulated by these factors and in precisely the same way as protein coding genes.
Additional supporting evidence that these TFs may be regulating antisense transcripts was found by relating them to full-length mRNAs and ESTs with confidently assignable strandedness (determined from splicing and polyadenylation sites and signals). 1782 clusters of transcripts were formed of well-oriented sequences from public databases aligning to chromosomes 21 or 22. Among these clusters, there was a significant association (chi-square p value < 10^-15) between the property of proximity to a noncanonical TF and the property of having evidence for transcription on the opposite strand. In this context, a noncanonical TF is one not located at the 5′ end of a known gene and evidence for transcription on the opposite strand is based on public sequence data. Twenty-one percent (363) of these transcript clusters are made up of sense antisense pairs, 44% (161) have an associated noncanonical TF. Of the 161 sense antisense pairs that have a noncanonical TF, 52% contain at least one site conserved between the human and mouse genomes based on BlastZ human-mouse alignments (Schwartz et al., 2003).
Differential Expression Patterns of Novel Transcripts
To address the issue of whether the observed overlapping noncoding transcripts are biologically important, we examined whether some of them exhibited a reproducible and coordinated program of differential expression correlated with the companion coding transcripts. The expression profiles of the poly(A)+ cytosolic RNA fraction were monitored during the response of a pluripotent human germ cell tumor-derived cell line, NCCIT, which undergoes retinoic acid (RA)-induced differentiation into keratin- and neurofilament-positive somatic cells (Damjanov et al., 1993). Empirically derived transcriptional maps of NCCIT using the chromosome 21 and 22 genome tiling arrays during various stages of
Cawley et al., Cell, Volume 116, Issue 4, 20 February 2004.
Poor Graphs
D.J. Cotter et al., Journal of Clinical Epidemiology 57 (2004) 1086–1095 (pages 1091 and 1093)
[Figure: stacked bar chart. x-axis: Hematocrit Group (%), with categories <30, 30 to <33, 33 to <36, 36 to <39, >=39, All. y-axis: Proportion, 0% to 100%. Legend: <=8,738 units/wk; >8,738-13,944 units/wk; >13,944-21,692 units/wk; >21,692 units/wk.]
Fig. 1. Distribution of epoetin dose by quartiles Q1–Q4, using mean dose per week (units/wk) disaggregated by hematocrit group. Within each epoetin dose quartile, the distribution of dosing resembles a bell-shaped curve around the recommended target hematocrit range (33% to <36%). Quartiles are represented by shaded segments on histogram bars, darkest for the first quartile (bottom), lightest for the fourth quartile (bar), with the following values: Q1, ≤8,738; Q2, >8,738 to 13,944; Q3, >13,944 to 21,692; Q4, >21,692.
Surrogates fail for a number of reasons and can be explained by one or another of many failed-surrogate mechanisms [13,16]. In the case of epoetin, mistaken conclusions can potentially occur using two different mechanisms. One, if the surrogate end point is associated with the actual clinical end point due to a shared common cause, a treatment that addresses the surrogate end point without affecting the common causal agent may not have an effect on the actual end point. In a second possibility, treatments can affect outcomes through unanticipated causal pathways that are unrelated to the surrogate end point. The difference in mortality rates among patients with similar hematocrit levels could be related to either of these possibilities. We will discuss the clinical interpretation of each of these possibilities.
As shown in Fig. 3B, the observed relationship between hematocrit and mortality could be due to other, potentially unmeasured, aspects of a patient's health status that independently affect hematocrit, epoetin responsiveness and survival. Factors affecting epoetin responsiveness are not well understood [28], but several possibilities have been mentioned in the literature. Ma et al. [7] cited inflammation as one possible common cause of poor epoetin response and mortality, and Tonelli et al. [29] found an association between sensitivity to epoetin and markers of inflammation. Current guidelines call for investigation of inflammation when a patient exhibits a poor response to epoetin [14]. Because inflammation and protein-energy malnutrition have a high prevalence and are found to be closely related to each other in dialysis patients, they are referred to together as malnutrition–inflammation complex syndrome (MICS) [30]. Taken together, MICS is hypothesized to blunt the responsiveness of anemia of ESRD to epoetin. Although the possible interactions between inflammatory and nutritional markers and their influence on anemia and epoetin hyporesponsiveness require further study, poor responders who continue to have low hematocrit levels despite receiving high doses of epoetin may not benefit significantly from more epoetin. With patients who are poor responders to epoetin therapy, K/DOQI recommends that they be considered for other approaches that might be complementary to increase response, such as iron supplementation, improved dialysis adequacy, or improved nutrition.
Table 3. Unadjusted 1-year mortality rates per 1,000 patients, by hematocrit level and epoetin dose quartile
                     Hematocrit group
Dose quartile(a)   <30%   30% to <33%   33% to <36%   36% to <39%   ≥39%   All
Q1                 271    245           185           184           177    203
Q2                 344    278           212           195           186    232
Q3                 425    316           247           199           180    265
Q4                 501    354           280           227           196    310
All                412    297           225           200           186    251
(a) For dose quartiles, see Table 2.
Recently, Ifudu et al. [31] studied 309 hemodialysis patients to determine the relative effects of adequacy of dialysis and intravenous iron on hematocrit. Pointing out that patients with low hematocrit levels may have received inadequate dialysis and may have been inappropriately administered excess intravenous iron as a corrective measure, Ifudu et al. [31] concluded that adequacy of dialysis predicts the response to epoetin therapy. Tonelli et al. [29], however, in a study of 135 chronic hemodialysis patients dialyzed to a Kt/Vurea of 1.6 (where K is clearance, t is time, and V is volume) found no such relationship. The authors theorized a possible threshold effect. Both Tonelli et al. [29] and Eschbach et al. [28] reported that lower serum albumin levels