MURI Summer
Identifying and Repurposing Novel Drug Candidates for Treating Leukemia Using Drug, Protein and Disease Interaction Networks
Rashell Garretson1, Rut Thakkar2, Zack East2, Bin Peng3, Dr. Jake Chen4 and Dr. Walter Jessen5
1Department of Biology, Purdue School of Science, IUPUI; 2Neuroscience Program, Purdue School of Science, IUPUI; 3Department of Computer and Information Science, Purdue School of Science, IUPUI; 4Indiana University Center for Systems Biology and Personalized Medicine, IUPUI; 5Informatics, Covance, Greenfield, IN
Introduction
Taking a drug from discovery to market takes an average of twelve years. To reduce the time and cost of new drug development, data mining can be used to gather information on currently available drugs and associated data, and to prioritize candidates that can be repurposed to treat other diseases. This study focuses on three subtypes of leukemia: myelomonocytic leukemia, acute megakaryoblastic leukemia and B-cell prolymphocytic leukemia, which lack sufficient treatment options and have a poor prognosis. The data mining process begins by gathering information on FDA approved drugs and drugs in clinical trials for these subtypes. A network is then generated by curating information on drug, protein, and disease interactions. Other diseases are analyzed through disease to disease interactions to compile a list of diseases closely related to the leukemia subtypes of interest. Drugs used for these closely related diseases are then compared with drugs used for leukemia, based on their protein targets, interactions and structure, to identify the drugs most likely to be effective in treating the leukemia subtypes. Repurposing drugs based on structure, protein interactions, and target similarity can save considerable time and resources by using drugs that are already on the market in a novel way, with the ultimate goal of saving lives.
Methods
Defining Subtypes of Interest
• Subtypes were chosen by reviewing articles about the prognosis, 5-year survival rate, and currently available treatments. Subtypes with a poor prognosis, a low survival rate, and few effective treatments were prioritized.
• Myelomonocytic leukemia and acute megakaryoblastic leukemia are subtypes of acute myeloid leukemia (AML), while B-cell prolymphocytic leukemia is a subtype of chronic lymphocytic leukemia (CLL). We used the AML and CLL categories in our drug, disease, and protein interactions to gather more general information. The more specific subtype information will be added later to find drugs that target our subtypes of interest.
Disease to Drug
Drugs are separated into categories using two criteria:
• A D category drug is a drug being used for the specific disease of interest, while an X category drug is a drug currently being used for a related disease.
• A level 1 drug is one that is currently FDA approved. A level 2 drug is one that is currently in clinical trial. A level 3 drug is one whose clinical trial has been terminated, withdrawn or suspended or, in this study, any drug in a clinical trial that has not been updated since 2010.
• D1 and X1: Using cancer.gov and the Leukemia and Lymphoma Society website, information was collected about drugs currently on the market to treat the chosen subtypes or related diseases.
• D2, D3, X2, and X3: Each subtype and related disease was entered into clinicaltrials.gov, and all information on clinical trials was downloaded and sorted. Trials that were listed as terminated, withdrawn, or suspended, or that had not been updated in the last 5 years, were labeled as level 3. The rest were considered level 2. All the drugs from each trial were separated, filtered, and listed.
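The labeling rules above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the procedure the team actually ran: the function names, the status strings, and the cutoff date (any update dated before 2011 is treated as "not updated since 2010") are assumptions.

```python
from datetime import date

# Trial statuses treated as level 3 (assumed spelling; compared case-insensitively).
INACTIVE_STATUSES = {"terminated", "withdrawn", "suspended"}


def drug_level(fda_approved, trial_status=None, last_updated=None,
               stale_before=date(2011, 1, 1)):
    """Return 1, 2 or 3 following the level definitions in the text.

    stale_before is an assumed cutoff: a trial last updated before this
    date counts as "not updated since 2010".
    """
    if fda_approved:
        return 1
    if trial_status is None:
        raise ValueError("drug is neither FDA approved nor in a clinical trial")
    if trial_status.lower() in INACTIVE_STATUSES:
        return 3
    if last_updated is not None and last_updated < stale_before:
        return 3
    return 2


def drug_category(for_disease_of_interest, level):
    """Combine the D/X letter with the level, e.g. 'D1' or 'X3'."""
    return ("D" if for_disease_of_interest else "X") + str(level)


# A drug in an active, recently updated trial for a related disease.
print(drug_category(False, drug_level(False, "Recruiting", date(2014, 5, 1))))
```

Keeping the cutoff as a parameter makes it easy to rerun the sort when the "last 5 years" window moves.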
Disease to Protein
• Preliminary mutated genes associated with AML and CLL were found by reviewing articles on PubMed as well as OMIM.
• Effector genes were identified using the GEO database, which provides data on genes that are up- or down-regulated in a disease.
Drug to Protein
• The D1 drug information collected from the disease to drug curation was evaluated using DrugBank and STITCH, which provided information about protein interactions and targets for each drug.
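One way to turn the drug to protein curation into a ranked list of targets is to count how many drugs hit each protein. The sketch below assumes the curated data has been reduced to a mapping from drug name to UniProt accessions; the drug names are placeholders and the mapping is illustrative, not the team's data.

```python
from collections import Counter


def top_targets(drug_targets, n=10):
    """Rank proteins by the number of distinct drugs that target them."""
    counts = Counter()
    for targets in drug_targets.values():
        # A set ensures a protein listed twice for one drug is counted once.
        counts.update(set(targets))
    return counts.most_common(n)


# Placeholder drugs with illustrative target lists.
example = {
    "drug_a": ["P42574", "P08684"],
    "drug_b": ["P42574"],
    "drug_c": ["P42574", "P08684", "P04637"],
}
print(top_targets(example, n=2))
```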
Protein to Protein
• Using the disease to protein interactions, the key proteins connected with the subtypes of interest were evaluated using the STRING and HAPPI databases. These interactions were used to create networks in Cytoscape.
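Combining interactions from two databases before loading them into Cytoscape could look like the sketch below. It assumes each source has already been exported as a list of protein pairs, and it writes Cytoscape's simple interaction format (SIF) with "pp" as the interaction label; both choices are assumptions, and the protein names are placeholders.

```python
def merge_interactions(*sources):
    """Merge undirected edge lists, dropping duplicates and self-loops."""
    merged = set()
    for edges in sources:
        for a, b in edges:
            if a != b:
                # Sort each pair so (A, B) and (B, A) collapse to one edge.
                merged.add(tuple(sorted((a, b))))
    return sorted(merged)


def to_sif(edges, interaction="pp"):
    """Render edges as tab-separated SIF lines: source, interaction, target."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in edges)


# Placeholder edge lists standing in for STRING and HAPPI exports.
string_edges = [("PROT_A", "PROT_B"), ("PROT_B", "PROT_C")]
happi_edges = [("PROT_B", "PROT_A"), ("PROT_C", "PROT_D")]
print(to_sif(merge_interactions(string_edges, happi_edges)))
```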
Disease to Disease
• The CMBI and DiseaseConnect databases were used to acquire a list of all the diseases associated with AML and CLL.
• The list was then analyzed to obtain the top diseases that are similar to both leukemia subtypes.
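The selection of top related diseases can be sketched as a filter over two score tables. The Table 3 caption gives a CMBI score threshold of 0.3 but does not say whether both the AML and CLL scores must exceed it, so the rule is left as a parameter here; the function name and the "Hypothetical" entries are illustrative.

```python
def shared_related_diseases(aml_scores, cll_scores, threshold=0.3, combine=max):
    """Keep diseases present in both lists whose combined score beats threshold.

    combine=max keeps a disease if either score passes (an assumption);
    combine=min requires both scores to pass.
    """
    shared = set(aml_scores) & set(cll_scores)
    rows = [
        (name, aml_scores[name], cll_scores[name])
        for name in shared
        if combine(aml_scores[name], cll_scores[name]) > threshold
    ]
    # Highest combined score first; ties broken alphabetically.
    return sorted(rows, key=lambda row: (-combine(row[1], row[2]), row[0]))


aml = {
    "Chronic Myeloid Leukemia": 0.3999,
    "Mixed lineage leukemia": 0.2942,
    "Hypothetical disease": 0.10,
    "Hypothetical AML-only disease": 0.50,
}
cll = {
    "Chronic Myeloid Leukemia": 0.4131,
    "Mixed lineage leukemia": 0.3031,
    "Hypothetical disease": 0.20,
}
for name, aml_score, cll_score in shared_related_diseases(aml, cll):
    print(name, aml_score, cll_score)
```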
Conclusion & Future Studies
Current Status of Research
References
• The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43: D204-D212 (2015). http://www.uniprot.org
• Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. doi: 10.1093/nar/gkn760. Epub 2008 Oct 21. http://string-db.org
• Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, von Mering C, Jensen LJ, Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014 Jan;42(Database issue):D401-7. doi: 10.1093/nar/gkt1207. Epub 2013 Nov 28. http://stitch.embl.de
• Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics. 2009 Jul 7;10 Suppl 1:S16. doi: 10.1186/1471-2164-10-S1-S16. http://discovery.informatics.iupui.edu/HAPPI/
• Liu CC, Tseng YT, Li W, Wu CY, Mayzus I, Rzhetsky A, Sun F, Waterman M, Chen JJ, Chaudhary PM, Loscalzo J, Crandall E, Zhou XJ. DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res. 2014 Jul;42(Web Server issue):W137-46. doi: 10.1093/nar/gku412. Epub 2014 Jun 3. http://disease-connect.org
• Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014 Jan 1;42(1):D1091-7. http://www.drugbank.ca
• Bolton E, Wang Y, Thiessen PA, Bryant SH. PubChem: Integrated Platform of Small Molecules and Biological Activities. Chapter 12 in Wheeler RA and Spellmeyer DC, eds. Annual Reports in Computational Chemistry, Volume 4. Oxford, UK: Elsevier, 2008, pp. 217-241. doi: 10.1016/S1574-1400(08)00012-1. https://pubchem.ncbi.nlm.nih.gov
• Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003 Nov;13(11):2498-504. http://cytoscape.org
• Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002 Jan 1;30(1):207-10. http://www.ncbi.nlm.nih.gov/geo/
[Bar chart: Number of Drugs Per Category. x-axis: Category of Drugs (D1, D2, D3, X1, X2, X3); y-axis: Number of Drugs (0 to 600).]
AML and CLL Protein to Protein Interaction Network
Top Proteins Targeted by D1 Drugs
AML Drug Targets    Number of Drugs    CLL Drug Targets    Number of Drugs
P42574 (CASP3)      8                  P42574 (CASP3)      6
P08684 (CYP3A4)     8                  P55211 (CASP9)      5
P33527 (ABCC1)      7                  Q14790 (CASP8)      4
P08183 (ABCB1)      6                  P09874 (PARP1)      4
Q14790 (CASP8)      6                  P33527 (ABCC1)      4
P04637 (TP53)       5                  P08183 (ABCB1)      4
Q9UNQ0 (ABCG2)      5                  P08684 (CYP3A4)     4
Q92887 (ABCC2)      4                  P55210 (CASP7)      3
P55211 (CASP9)      4                  Q9UNQ0 (ABCG2)      3
Q16678 (CYP1B1)     4                  P20815 (CYP3A5)     3
Table 2: Top proteins identified as targets in DrugBank and STITCH from drug to protein interaction. The ten proteins listed are those targeted by the highest number of D1 drugs for each subtype.
Table 3: Top related diseases from CMBI. Diseases that were found in the related disease list for CLL and AML as well as having a CMBI score greater than 0.3 were chosen as the top related diseases. These diseases were used to identify X category drugs as candidates for repurposing.
Our team has developed a website that lets us import the data collected during the data mining process into a database, where it can be analyzed and visualized. The data will be stored in a PostgreSQL database, and Elasticsearch will provide fuzzy searching so that we can narrow our searches down to key information. The website can also present the relationships among drug, disease and protein interactions, which we can use to create models and networks. The next steps are to improve the website's functionality so that we can import all of the data we have collected on drugs, diseases, and proteins. We will then use the website to gather information about the interactions within the imported data and build a model to identify which of the drug candidates we found are best suited for repurposing.
Top Associated Diseases
Disease                               AML/CLL CMBI Score
Acute Lymphoblastic Leukemia          0.3682/0.3979
Chronic Myeloid Leukemia              0.3999/0.4131
Mixed lineage leukemia                0.2942/0.3031
T-cell acute lymphocytic leukemia     0.4212/0.3074
Non-bruton agammaglobulinemia         0.335/0.4215
Hemophagocytic lymphohistiocytosis    0.3328/0.3775
Mycosis Fungoides                     0.3083/0.3194
B-cell lymphoma                       0.3129/0.2957
Non-hodgkin lymphoma                  0.3092/0.3293
Hodgkin lymphoma                      0.3092/0.3424
Burkitt's lymphoma                    0.3092/0.3622
Werner Syndrome                       0.2948/0.303
Figures 1 and 2: Proteins from disease to protein interaction were combined with additional interacting proteins using STRING and HAPPI databases. Cytoscape was used to create a network of connections between proteins associated with each subtype.
Figure 3: Number of drugs found using clinicaltrials.gov, cancer.gov, the Leukemia and Lymphoma Society, and articles found on PubMed for AML and CLL as well as the top associated diseases. D1 = FDA approved drugs for AML or CLL. D2 = drugs currently in clinical trial for AML or CLL. D3 = drugs with clinical trial information that has not been updated in the last five years, or whose clinical trials have been suspended, terminated, or withdrawn. X category drugs follow the same rules as D category drugs, but they are being used for the top associated diseases.
Table 1: Top proteins identified as targets in DrugBank and STITCH from drug to protein interaction. The ten proteins listed are those targeted by the highest number of D1 drugs for each subtype.
Popularity of D Category Drugs by PubMed Search
Name of drug    Number of PubMed articles found    Name of drug                    Number of PubMed articles found
Cytoxan         6967                               Etoposide                       3421
Methotrexate    6668                               Mercaptopurine                  2860
Imatinib        6442                               Tetradecanoylphorbol acetate    2759
Aminopterin     5268                               Asparaginase                    2701
Anthracycline   4175                               Cytosar                         2508