RDP – Capturing the Unclassified
Use only on data that can be publicly shared.
These are not secure tools.
Genboree RDP Output
• Tutorial 2 Dataset
  – QIIME
    • chimeras removed
  – RDP
    • Sample Period
Download files
• Raw.results.tar.gz
Unarchive and Decompress
• Use 7-Zip
• Seq.fna
• Open in BioEdit
• In BioEdit:
  – Ctrl+A – to select all sequences
  – Shift+Ctrl+C – to copy all sequence titles
• In Excel:
  – Paste into Excel, then work in Column B (or another empty column)
  – =LEFT(A1,number_of_characters_in_titles)
  – Ctrl+Shift+Down arrow – to select the cells below
  – Ctrl+D – to copy to all cells below
• Check your work. Select only your samples. Do not select blank cells. Copy the correct titles.
• In BioEdit:
  – Paste over titles
  – Save as: your_filename.fas
  – In the pull-down menu, choose FASTA
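The Excel =LEFT step above can also be sketched in Python. This is only an illustration under assumptions: the function name `truncate_fasta_titles` and the example titles are hypothetical, and it assumes each title line starts with ">" and that the sample name is the first `n_chars` characters of the title.

```python
def truncate_fasta_titles(lines, n_chars):
    """Return FASTA lines with each title cut to its first n_chars characters."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep the ">" marker, then the first n_chars of the title,
            # like =LEFT(A1, n_chars) applied to the title text.
            out.append(">" + line[1:1 + n_chars])
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return out


# Hypothetical example titles; real titles depend on your dataset.
example = [">S1_0001 extra info", "ACGT", ">S2_0002 more", "GGCC"]
renamed = truncate_fasta_titles(example, 2)
print(renamed)
```

As with the Excel route, check the result before saving: every title should now be one of your sample names.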
rdp.cme.msu.edu
Make an Account
For very tiny datasets
• Do not navigate away
For pyrosequenced datasets
You can navigate away and pick up the results later.
Check in while running?
Done: Download
What do you get back?
• Confidence file
• Classifications
• Failed classifications
  – Check this file. Something went wrong if it is not empty.
• Hierarchy
Open classifications in Excel
• The tutorial focuses on Phylum, but any level can be used.
For tutorial ease, condense sample periods
Keep it Tidy
• Cut out what isn’t needed or being used.
Confidence in the Classification
• Sort on the confidence level
• Odd groups
  – Leave in or take out?
• Replace those below your confidence level
  – Unclassified_
  – =concatenate($column$row,cell)
• $ keeps the column or row static in your formula as you drag to multiple cells
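The relabeling step can be sketched outside Excel as well. This is a minimal illustration, assuming the low-confidence label is built as "Unclassified_" plus the assigned name (as in labels like Unclassified_Firmicutes); the function name and the 0.8 threshold are hypothetical, so use your own cutoff.

```python
def label_with_confidence(name, confidence, threshold=0.8):
    """Prefix names whose confidence is below the threshold with 'Unclassified_'."""
    if confidence < threshold:
        # Same idea as =CONCATENATE("Unclassified_", cell) in the spreadsheet.
        return "Unclassified_" + name
    return name


# Hypothetical (name, confidence) pairs for illustration only.
labels = [
    label_with_confidence("Firmicutes", 0.99),
    label_with_confidence("Firmicutes", 0.42),
    label_with_confidence("Proteobacteria", 0.80),
]
print(labels)
```

A value exactly at the threshold is kept as classified here; whether that is the right choice is a judgment call.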
Copy to a New Column and Remove Duplicates
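For comparison, the same de-duplication can be sketched in Python. The function name and the example column are hypothetical; the result is the list of categories that will become the rows of the count table.

```python
def unique_in_order(labels):
    """Return each label once, in order of first appearance."""
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(labels))


# Hypothetical classification column for illustration only.
column = ["Firmicutes", "Unclassified_Firmicutes", "Firmicutes", "Proteobacteria"]
categories = unique_in_order(column)
print(categories)
```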
Even at the Phylum Level
• 60 categorical levels – (could be 2 for every known phylum)
To count by sample and phylum classification
• =countifs($K:$K,$O2,$A:$A,P$1)
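The COUNTIFS grid can be sketched in Python as a single counting pass, which avoids recalculating a formula in every cell. This is an illustration under assumptions: the function name is hypothetical, and each input row is taken to be a (sample, label) pair, corresponding to columns A and K in the formula above.

```python
from collections import Counter


def count_by_label_and_sample(rows, labels, samples):
    """Build {label: [count for each sample]} from (sample, label) pairs."""
    counts = Counter(rows)  # one pass over the data
    # A Counter returns 0 for pairs that never occur, like COUNTIFS.
    return {
        label: [counts[(sample, label)] for sample in samples]
        for label in labels
    }


# Made-up rows for illustration only.
rows = [
    ("1", "Firmicutes"),
    ("1", "Firmicutes"),
    ("2", "Firmicutes"),
    ("2", "Unclassified_Firmicutes"),
    ("1", "Proteobacteria"),
]
table = count_by_label_and_sample(
    rows,
    labels=["Firmicutes", "Proteobacteria", "Unclassified_Firmicutes"],
    samples=["1", "2"],
)
print(table)
```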
• Learn how to stop recalculation and restart it manually so you don’t crash your machine. You can easily cause hours of computation on large matrices!
Stop Automatic Recalculation
• In the Options menu
• Under Formulas
• F9 to recalculate manually
Fill Formulas and Check Cells
Copy Whole and Paste As Values
Sum Rows and Sort On (Your Favorite)
• Total is customary
• Can rearrange as needed
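Summing each row and sorting on the total can be sketched as follows. The function name and the counts are made up for illustration; the table is assumed to map each classification label to a list of per-sample counts.

```python
def sort_rows_by_total(table):
    """Return (label, counts) pairs sorted by row total, largest first."""
    return sorted(table.items(), key=lambda item: sum(item[1]), reverse=True)


# Made-up counts for two samples, for illustration only.
example_table = {
    "Proteobacteria": [5, 7],
    "Firmicutes": [20, 10],
    "TM7": [1, 0],
}
ordered = sort_rows_by_total(example_table)
print([label for label, _ in ordered])
```

Any other sort key (for example, the count in one sample) can be swapped in by changing the `key` function.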
Select Data and Titles Only
Make a 100% Stacked Chart
• Not very pretty
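A 100% stacked chart shows each sample's counts as percentages of that sample's total. The arithmetic behind it can be sketched as below; the function name and the counts are hypothetical, and the table is assumed to map each label to a list of per-sample counts of equal length.

```python
def to_sample_percentages(table):
    """Convert per-sample counts to percentages of each sample's total."""
    n_samples = len(next(iter(table.values())))
    # Column totals: one total per sample.
    totals = [sum(row[i] for row in table.values()) for i in range(n_samples)]
    return {
        label: [
            100.0 * row[i] / totals[i] if totals[i] else 0.0
            for i in range(n_samples)
        ]
        for label, row in table.items()
    }


# Made-up counts for two samples, for illustration only.
counts = {"Firmicutes": [1, 3], "Proteobacteria": [3, 1]}
percentages = to_sample_percentages(counts)
print(percentages)
```

Each sample's percentages sum to 100, which is why samples with very different read counts can be compared side by side.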
Switch Perspectives
Size Correctly
To Compare to Genboree
• RDP must be run
• png.result.tar.gz
What did we learn?
[Figure: 100% stacked chart of phylum-level classifications for samples 1–5; vertical axis 0% to 100%. Legend: Tenericutes, Synergistetes, OD1, Fusobacteria, Deinococcus-Thermus, BRC1, Armatimonadetes, Thermotogae, OP11, Gemmatimonadetes, TM7, Lentisphaerae, WS3, Deferribacteres, Nitrospira, Spirochaetes, Actinobacteria, Verrucomicrobia, Planctomycetes, Cyanobacteria/Chloroplast, Chlorobi, Acidobacteria, Bacteroidetes, Chloroflexi, Unclassified_others, Unclassified_Chloroflexi, Unclassified_Proteobacteria, Unclassified_Firmicutes, Proteobacteria, Firmicutes]
Some Problems Commonly Encountered
• Column formatting is not always followed with RDP output.
• To get a clean graph with all taxonomic levels on one column, you may need to sort and remove sections of data.
• Some have additional levels
• Some have fewer levels of classification
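One way to sidestep shifted columns is to look a rank up by its label instead of by its column position. The sketch below assumes each row holds repeating name, rank, confidence fields (for example: Proteobacteria, phylum, 0.95); that layout and the function name are assumptions, so check them against your own classifications file.

```python
def find_rank(fields, rank="phylum"):
    """Return (name, confidence) for the given rank, or None if it is absent."""
    for i, value in enumerate(fields):
        # The name is assumed to sit just before the rank label,
        # and the confidence just after it.
        if value == rank and 0 < i < len(fields) - 1:
            return fields[i - 1], float(fields[i + 1])
    return None


# Hypothetical rows: one with an extra level, one that stops early.
row_extra = ["seq1", "", "Root", "rootrank", "1.0", "Bacteria", "domain", "1.0",
             "Proteobacteria", "phylum", "0.95",
             "Deltaproteobacteria", "class", "0.90"]
row_short = ["seq2", "", "Root", "rootrank", "1.0", "Bacteria", "domain", "0.60"]

print(find_rank(row_extra))
print(find_rank(row_short))
```

Rows that return None have no entry at that level and can be set aside for manual review rather than shifted by hand.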
Additional Levels of Classification
[Figure annotations: Delete, Move over, Delete, Move over]
Fewer Levels of Classification
Common Trouble Makers
• Bacteroidetes
• Verrucomicrobia
• Acidobacteria
• Dehalococcoidetes
• Cyanobacteria
• Chloroplast
• Deltaproteobacteria
• OD1_genera_incertae_sedis
• TM7_genera_incertae_sedis
• Armatimonadetes
• WS3_genera_incertae_sedis
Move Over