Sub-Topic Classification of HIV-Related Opportunistic Infections
Miguel Anderson and Joseph Fonseca
Introduction
● Image collected from the CDC https://www.cdc.gov/hiv/basics/statistics.html
Background Info
● What is HIV?
● How is it transferred?
● How is it treated?
● What are opportunistic infections?
The viral particles
● “HIV Is a Retrovirus.” The Montgolfier Brothers, University of Bristol, www.chm.bris.ac.uk/webprojects2002/levasseur/hiv/hiv3.htm.
Binding of the coat protein to receptor
VIRAL COAT PROTEIN
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video,
posted by, Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Conformational change; binding of second protein
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Fusion of the Membranes; phospholipids
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube
Video, posted by, Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Degradation of the matrix and capsid protein
MATRIX PROTEIN
CAPSID PROTEIN
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
HIV RNA in Cell
VIRAL SS RNA
VIRAL DNA with host nucleotides; single stranded
VIRAL SS DNA
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Double stranded DNA with Reverse Transcriptase
VIRAL DOUBLE STRANDED DNA
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video,
posted by, Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Integrase Protein
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted
by, Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Inside the Nucleus
NUCLEAR PORE
HOST CHROMOSOME
VIRAL DNA
INTEGRATION ESTABLISHES LIFE LONG INFECTION; ENDONUCLEASE ACTIVITY
Production of Viral Particles-Transcription
RNA POLYMERASE
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
Translation
Any viral protein
Ribosome
mRNA
ROUGH ER
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
VIRAL Polyproteins and RNA at Infected Cell Surface
Virus Budding Off From Infected Cell
Protease Clips Polyprotein Chains
PROTEASE
POLYPROTEIN CHAIN
https://www.cdc.gov/hiv/pdf/library/factsheets/hiv101-consumer-info.pdf
Treatment with Molecular Targets: Antiretrovirals
● Two to three drugs are combined to prevent resistance, because HIV has a fast mutation rate.
● Fusion inhibitors (viral envelope proteins gp120/gp41)
● CCR5 antagonists
● Nucleoside reverse transcriptase inhibitors (block reverse transcriptase with dummy nucleosides: base and sugar)
● Integrase inhibitors (allosteric sites). Less integration causes fewer CD4 cells to go through apoptosis.
● Protease inhibitors (active site)
Opportunistic Infections and Stages of HIV
● Opportunistic infections (OIs) are infections that occur more frequently and
are more severe in individuals with weakened immune systems, including
people with HIV
● Stage 1: Acute infection (within 2-4 weeks of infection)
● Stage 2: Clinical latency (HIV inactivity)
● Stage 3: AIDS, which can lead to opportunistic infections
Overview
● Fonseca et al. (2018) - Social network analysis of HIV/AIDS literature
○ Similar to the Golgi 2 approach
● Pletscher-Frankild et al. (2015) - Co-occurrences of features in abstracts,
based on occurrence counts
○ Differs from the approach used here: tf*idf and a cosine similarity network
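To make the tf*idf and cosine similarity idea concrete, here is a minimal sketch in plain Python. It is not the Golgi 2 implementation; the tokenizer, the specific tf and idf formulas, and the toy abstracts are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would clean more.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy abstracts, invented for illustration (not from the study corpus).
abstracts = [
    "hiv coinfection with hepatitis c virus",
    "hepatitis c treatment in hiv patients",
    "tuberculosis screening in hiv clinics",
]
vectors = tfidf_vectors(abstracts)
sim_hcv_hcv = cosine_similarity(vectors[0], vectors[1])
sim_hcv_tb = cosine_similarity(vectors[0], vectors[2])
```

Because "hiv" appears in every toy abstract, its idf is zero and it contributes nothing to the similarity. The two hepatitis C abstracts are therefore linked through their other shared terms, while the hepatitis C and tuberculosis abstracts are not linked at all. Edges in a similarity network could be drawn between documents whose cosine similarity exceeds a chosen threshold.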
Approaches tested
Train Model in Two Ways
● Binary
○ Viral vs Bacterial opportunistic infections
● Multi Class
○ HBV vs HCV vs Syphilis vs Tuberculosis
Precision and recall for the model
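As a reminder of how precision and recall are defined for one class, the following sketch computes both from lists of true and predicted labels. The labels are invented for illustration and are not the study's predictions; the function name `precision_recall` is hypothetical.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, treating it as the positive label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented example: one HBV abstract is misclassified as HCV.
true_labels = ["HBV", "HBV", "HCV", "HCV", "Syphilis", "Tuberculosis"]
predicted = ["HBV", "HCV", "HCV", "HCV", "Syphilis", "Tuberculosis"]

hcv_precision, hcv_recall = precision_recall(true_labels, predicted, "HCV")
hbv_precision, hbv_recall = precision_recall(true_labels, predicted, "HBV")
```

In this toy case the single HBV abstract labeled as HCV lowers HCV precision (a false positive for HCV) and HBV recall (a false negative for HBV) at the same time.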
Approaches tested
MeSH terms searched and abstracts retrieved (all terms searched with HIV and Boolean operators)
Infection Type    MeSH Term                  # of Abstracts retrieved
Viral             Viral Infection            67667
Viral             Hepatitis C                2370
Viral             Hepatitis B                1866
Bacterial         Bacterial infection        3429
Bacterial         Tuberculosis               1829
Bacterial         Syphilis                   581
Fungal            Fungal Infection           947
Fungal            Pneumocystis Pneumonia     459
Fungal            Candidiasis                377
Fungal            Cryptococcal Meningitis    131
● Query the actual disease with HIV
● Retrieved by Golgi2 and PubMed manual search
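The slides do not give the exact search strings, so the sketch below only illustrates one way a disease term could be combined with HIV using a Boolean AND. The `[MeSH Terms]` field tag is standard PubMed search syntax; the query shape and the helper name `build_query` are assumptions.

```python
def build_query(disease_term, base_term="HIV"):
    """Combine the base term and a disease term with a Boolean AND."""
    return f'"{base_term}"[MeSH Terms] AND "{disease_term}"[MeSH Terms]'

# Disease terms taken from the table of searched MeSH terms.
disease_terms = ["Hepatitis C", "Hepatitis B", "Tuberculosis", "Syphilis"]
queries = {term: build_query(term) for term in disease_terms}

for term, query in queries.items():
    print(f"{term}: {query}")
```

Each resulting string could be pasted into the PubMed search box or passed to an abstract scraper.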
Exploratory Analysis Methods
Golgi 2 Parameters
Min doc frequency %: 5
# tokens can be included in phrase: 3
Threshold rank b/w 0 and 1: .6
Golgi 2 Parameters
Min doc frequency %: 3
# tokens can be included in phrase: 3
Threshold rank b/w 0 and 1: .7
Golgi 2 and PubMed scraper
● Vectorize documents and weighting scheme
● n-gram*Freq-IDF ranking
● Latent semantic analysis
● Semantic Concept clustering
● Cluster visualization
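The two Golgi 2 parameters that bound phrase length and minimum document frequency can be read as a phrase filter. The sketch below shows one plausible interpretation, under the assumption that "min doc frequency %" means the share of documents a phrase must appear in. It is not Golgi 2's code, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, max_n=3):
    """Yield every phrase of 1..max_n consecutive tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def frequent_phrases(docs, max_n=3, min_doc_freq_pct=5.0):
    """Keep phrases found in at least min_doc_freq_pct percent of documents."""
    doc_freq = Counter()
    for doc in docs:
        # Use a set so a phrase counts once per document.
        doc_freq.update(set(ngrams(doc.lower().split(), max_n)))
    threshold = len(docs) * min_doc_freq_pct / 100.0
    return {p: c for p, c in doc_freq.items() if c >= threshold}

# Toy documents, invented for illustration.
docs = [
    "hiv and hepatitis c coinfection",
    "hepatitis c therapy",
    "hiv and tuberculosis",
    "syphilis screening",
]
kept = frequent_phrases(docs, max_n=3, min_doc_freq_pct=50.0)
```

With a 50% threshold on four toy documents, only phrases present in at least two documents survive, such as "hepatitis c" and "hiv and". The surviving phrases would then be weighted and ranked in the later steps.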
Predictive Analysis Methods
● LightSIDE binary classification using Naïve Bayes
● LightSIDE sub-topic classification using Logistic Regression
● Weka sub-topic classification using Logistic Regression
● 10-fold cross-validation for both
● Evaluation metric: accuracy of predicted labels
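For readers unfamiliar with 10-fold cross-validation, the sketch below shows the mechanics in plain Python: the data is split into ten folds, each fold is held out once, and accuracy is the share of held-out items labeled correctly. LightSIDE and Weka handle this internally; the majority-class "classifier" here is only a stand-in for Naïve Bayes or Logistic Regression, and all names and data are hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(texts, labels, train_fn, predict_fn, k=10):
    """Return the fraction of held-out items whose label was predicted correctly."""
    correct = 0
    for fold in k_fold_indices(len(texts), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(texts)) if i not in held_out]
        model = train_fn([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(1 for i in fold if predict_fn(model, texts[i]) == labels[i])
    return correct / len(texts)

# Stand-in classifier: always predicts the most common training label.
def train_majority(train_texts, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, text):
    return model

# Invented data: 15 abstracts labeled HCV and 5 labeled HBV.
texts = [f"abstract {i}" for i in range(20)]
labels = ["HCV"] * 15 + ["HBV"] * 5
accuracy = cross_validate(texts, labels, train_majority, predict_majority, k=10)
```

The majority baseline scores 0.75 here simply because 15 of the 20 toy labels are HCV, which is a useful reminder that accuracy should be read against the class balance.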
Subtopic Classification Training Features
•Training features were selected based on the accuracy of the resulting model
•Unigrams outperformed other features
•The best features of the LightSIDE and Weka models were then compared in
Model Confusion Matrix for LightSIDE Model
•The confusion matrix shows where the model is accurately classifying the labels of the abstracts using the LightSIDE features.
•This model performed at 78% accuracy
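A confusion matrix and its accuracy can be derived from label lists as in the sketch below. The labels are invented for illustration, so the resulting numbers do not correspond to the 78% reported for the LightSIDE model; the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

classes = ["HBV", "HCV", "Syphilis", "Tuberculosis"]
# Invented labels, two abstracts per class.
true_labels = ["HBV", "HBV", "HCV", "HCV",
               "Syphilis", "Syphilis", "Tuberculosis", "Tuberculosis"]
predicted = ["HBV", "HCV", "HCV", "HCV",
             "Syphilis", "Syphilis", "Tuberculosis", "HBV"]

matrix = confusion_matrix(true_labels, predicted, classes)
accuracy = accuracy_from_matrix(matrix)

for label, row in zip(classes, matrix):
    print(f"{label:<13}{row}")
```

Off-diagonal cells show which pairs of subtopics the model mixes up, for example an HBV abstract predicted as HCV.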
Model Confusion Matrix for Weka Model
•The confusion matrix shows where the model is accurately classifying the labels of the abstracts using the UMLS concepts in Weka.
•This model performed at 45% accuracy
Results
•LightSIDE Model Testing Results
•The LightSIDE Model + selected features were used to test the model accuracy
•The model performed at 80.5% accuracy
TAPoR Text Analysis Comparator
The figure to the right represents the word distribution of the most common words
POI: relative ratio
The figure to the left represents the word distribution of unique words
POI: text counts
TAPoR Statistics on the text analysis
Discussion/Conclusion
● Examined why the predictive model still confused the HBV (text 1)
and HCV (text 2) subtopics (a possible reason for the high rate of false
negatives in the model).
● The word counts of the abstracts were not equal across subtopics.
● Although the same number of abstracts was used, the quality of the
abstracts was not accounted for.
● Based on the analyses, it is apparent that there was a bias towards HCV
subtopic classification due to the number of unique words it possessed.
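The imbalance in word counts and unique words described above can be checked with a few lines of Python. This is a minimal sketch with invented abstracts; it is not the TAPoR tool, and `vocabulary_stats` is a hypothetical helper.

```python
def vocabulary_stats(abstracts_by_subtopic):
    """Count total tokens and unique words for each subtopic."""
    stats = {}
    for subtopic, abstracts in abstracts_by_subtopic.items():
        tokens = [tok for text in abstracts for tok in text.lower().split()]
        stats[subtopic] = {"tokens": len(tokens), "unique": len(set(tokens))}
    return stats

# Invented abstracts: same number per subtopic, different vocabulary sizes.
corpus = {
    "HBV": [
        "hepatitis b vaccine response",
        "hepatitis b vaccine coverage",
    ],
    "HCV": [
        "hepatitis c antiviral therapy outcomes",
        "hepatitis c genotype and fibrosis progression",
    ],
}
stats = vocabulary_stats(corpus)

for subtopic, counts in stats.items():
    print(subtopic, counts)
```

Even with two abstracts per subtopic, the toy HCV texts carry more tokens and more unique words than the HBV texts, which is the kind of imbalance that could tilt a unigram-based classifier.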
Future Direction
•Combine UMLS and MeSH terms to see if this increases classification model
accuracy
•Add more robust features to the abstract scraper that test for quality of results
•Build a predictive model to test where common opportunistic infections may arise
in a population. This can be used to determine if there are undiagnosed HIV
positive patients in the population
References
● Grimwade, K., & Swingler, G. H. (2006). Cotrimoxazole prophylaxis for opportunistic infections in children with HIV
infection. Cochrane Database of Systematic Reviews. doi:10.1002/14651858.cd003508.pub2
● Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245),
255-260. doi:10.1126/science.aaa8415
● Ortiz, M. S. (2018, June 11). Tokens [Video file]. Retrieved from
https://drive.google.com/file/d/1n57f6eDKRHK7Zx_nqe9e_sy3JCnFSsN7/view
● Ortiz, M. S. (2018, June 11). How does TF-IDF weighting really work? [Video file]. Retrieved from
https://drive.google.com/file/d/1uUnZgJhMZ4S7qQhHf395dOt6bMHzVm4-/view
● “HIV Is a Retrovirus.” The Montgolfier Brothers, University of Bristol,
www.chm.bris.ac.uk/webprojects2002/levasseur/hiv/hiv3.htm
● “HIV life cycle: How HIV infects a cell and replicates itself using reverse transcriptase,” Youtube Video, posted by,
Kleptoplast, Jan 6, 2012. https://www.youtube.com/watch?v=odRyv7V8LAE
● “HIV/AIDS.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 30 May 2017,
www.cdc.gov/hiv/basics/livingwithhiv/opportunisticinfections.html.