MURI Summer


Identifying and Repurposing Novel Drug Candidates for Treating Leukemia Using Drug, Protein and Disease Interaction Networks

Rashell Garretson1, Rut Thakkar2, Zack East2, Bin Peng3, Dr. Jake Chen4 and Dr. Walter Jessen5

1Department of Biology, Purdue School of Science, IUPUI; 2Neuroscience Program, Purdue School of Science, IUPUI; 3Department of Computer and Information Science, Purdue School of Science, IUPUI; 4Indiana University Center for Systems Biology and Personalized Medicine, IUPUI; 5Informatics, Covance, Greenfield, IN

Introduction
Taking a drug from discovery to market takes an average of twelve years. To reduce the time and cost of new drug development, data mining can be used to identify currently available drugs and their associated data, and to prioritize candidates that can be repurposed to treat other diseases. This study focuses on three subtypes of leukemia (myelomonocytic leukemia, acute megakaryoblastic leukemia, and B-cell prolymphocytic leukemia) because they lack sufficient treatment options and have a poor prognosis. The data mining process begins by gathering information on FDA-approved drugs and drugs in clinical trials for these subtypes. A network is then generated by curating information on drug, protein, and disease interactions. Other diseases are then analyzed through disease-to-disease interactions to compile a list of diseases closely related to our leukemia subtypes of interest. Drugs used for these closely related diseases are then compared with drugs used for leukemia on the basis of their protein targets, interactions, and structure to identify the drugs most likely to be effective in treating our leukemia subtypes. Repurposing drugs based on structure, protein interactions, and target similarity can save substantial time and resources by using drugs already on the market in a novel way, with the ultimate goal of saving lives.

Methods

Defining Subtypes of Interest
• Subtypes were chosen by reviewing articles about prognosis, 5-year survival rate, and currently available treatments. Subtypes with a poor prognosis, a low survival rate, and few effective treatments were prioritized.

• Myelomonocytic leukemia and acute megakaryoblastic leukemia are subtypes of acute myeloid leukemia (AML), while B-cell prolymphocytic leukemia is a subtype of chronic lymphocytic leukemia (CLL). We used the broader AML and CLL categories in our drug, disease, and protein interactions to gather more general information. Subtype-specific information will be added later to find drugs that target our subtypes of interest.

Disease to Drug
Drugs are separated into categories using two criteria:
• A D category drug is a drug being used for the specific disease of interest, while an X category drug is a drug currently being used for a related disease.

• A level 1 drug is currently FDA approved. A level 2 drug is currently in a clinical trial. A level 3 drug is one whose clinical trial has been terminated, withdrawn, or suspended or, in this study, any drug in a clinical trial that has not been updated since 2010.

• D1 and X1: Information about drugs currently on the market to treat the chosen subtypes or related diseases was collected from cancer.gov and the Leukemia and Lymphoma Society website.

• D2, D3, X2, and X3: Each subtype and related disease was entered into clinicaltrials.gov, and all clinical trial information was downloaded and sorted. Trials listed as terminated, withdrawn, or suspended, or not updated in the last 5 years, were labeled level 3. The rest were considered level 2. All the drugs from each trial were separated, filtered, and listed.
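The category rules above can be written down as a small function. This is a minimal sketch, not the team's actual curation code: the function names, the argument names, and the reading of "not updated since 2010" as "last updated on or before 31 December 2010" are all assumptions made for illustration.

```python
from datetime import date

# Trial statuses that the rules above assign to level 3 regardless of date.
INACTIVE_STATUSES = {"terminated", "withdrawn", "suspended"}

# Assumption: "not updated since 2010" means last updated on or before this date.
STALE_CUTOFF = date(2010, 12, 31)


def drug_level(fda_approved, trial_status=None, last_updated=None):
    """Return 1, 2, or 3 following the level rules described above."""
    if fda_approved:
        return 1
    if trial_status is None:
        raise ValueError("a drug that is not approved needs a trial status")
    if trial_status.lower() in INACTIVE_STATUSES:
        return 3
    if last_updated is not None and last_updated <= STALE_CUTOFF:
        return 3
    return 2


def drug_category(for_target_disease, fda_approved,
                  trial_status=None, last_updated=None):
    """Prefix the level with D (disease of interest) or X (related disease)."""
    prefix = "D" if for_target_disease else "X"
    return f"{prefix}{drug_level(fda_approved, trial_status, last_updated)}"


print(drug_category(True, True))
print(drug_category(False, False, "Recruiting", date(2014, 5, 1)))
print(drug_category(True, False, "Completed", date(2009, 1, 1)))
```

Approval is checked first so that an approved drug stays at level 1 even if it also appears in an inactive trial; that ordering is a design choice of this sketch, not something the poster states.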

Disease to Protein
• A preliminary set of mutated genes associated with AML and CLL was found by reviewing articles on PubMed as well as OMIM.

• Effector genes were identified using the GEO database, which lists the up- and down-regulated genes in a disease.

Drug to Protein
• The D1 drug information collected from the disease to drug curation was evaluated using DrugBank and STITCH, which provided information about protein interactions and targets for each drug.
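Ranking proteins by how many D1 drugs target them amounts to a simple tally. The sketch below assumes the DrugBank and STITCH results have already been merged into a mapping from drug name to target accessions. The drug names and the mapping itself are toy placeholders, not curated data.

```python
from collections import Counter


def top_targets(drug_targets, n=10):
    """Return the n proteins targeted by the most drugs, as (protein, count)."""
    counts = Counter()
    for targets in drug_targets.values():
        # A set ensures each drug contributes at most once per protein,
        # even if two databases report the same target.
        counts.update(set(targets))
    return counts.most_common(n)


# Toy input: placeholder drugs mapped to UniProt accessions (illustrative only).
example = {
    "drug_a": ["P42574", "P08684"],
    "drug_b": ["P42574", "P33527"],
    "drug_c": ["P42574", "P08684", "P08684"],
}

print(top_targets(example, n=2))
```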

Protein to Protein
• Starting from the disease to protein interactions, the key proteins connected with the subtypes of interest were evaluated using the STRING and HAPPI databases. These interactions were used to create networks in Cytoscape.
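One way to combine interactions from two sources before loading them into Cytoscape is to treat each interaction as an undirected pair, remove duplicates, and write the result in Cytoscape's simple interaction format (SIF). The following is a sketch under those assumptions. The edge lists are toy examples rather than actual STRING or HAPPI output, and the "pp" interaction label is only a common convention.

```python
def merge_edges(*edge_lists):
    """Merge undirected protein pairs from several sources, dropping duplicates."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            if a != b:  # skip self-interactions
                merged.add(tuple(sorted((a, b))))
    return sorted(merged)


def to_sif(edges, interaction="pp"):
    """Format edges as SIF lines: source, interaction type, target (tab separated)."""
    return "\n".join(f"{a}\t{interaction}\t{b}" for a, b in edges)


# Toy edge lists standing in for two database exports (illustrative only).
string_edges = [("TP53", "CASP3"), ("CASP3", "CASP9")]
happi_edges = [("CASP3", "TP53"), ("CASP8", "CASP3")]

network = merge_edges(string_edges, happi_edges)
print(to_sif(network))
```

Sorting each pair before adding it to the set is what makes ("TP53", "CASP3") and ("CASP3", "TP53") count as the same interaction.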

Disease to Disease
• The CMBI and DiseaseConnect databases were used to acquire a list of all the diseases associated with AML and CLL.

• The list was then analyzed to obtain the top diseases that are similar to both leukemia subtypes.

Conclusion & Future Studies

Current Status of Research

References
• The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43: D204-D212 (2015). http://www.uniprot.org

• Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. doi: 10.1093/nar/gkn760. Epub 2008 Oct 21. http://string-db.org

• Kuhn M, Szklarczyk D, Pletscher-Frankild S, Blicher TH, von Mering C, Jensen LJ, Bork P. STITCH 4: integration of protein-chemical interactions with user data. Nucleic Acids Res. 2014 Jan;42(Database issue):D401-7. doi: 10.1093/nar/gkt1207. Epub 2013 Nov 28. http://stitch.embl.de

• Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics. 2009 Jul 7;10 Suppl 1:S16. doi: 10.1186/1471-2164-10-S1-S16. http://discovery.informatics.iupui.edu/HAPPI/

• Liu CC, Tseng YT, Li W, Wu CY, Mayzus I, Rzhetsky A, Sun F, Waterman M, Chen JJ, Chaudhary PM, Loscalzo J, Crandall E, Zhou XJ. DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res. 2014 Jul;42(Web Server issue):W137-46. doi: 10.1093/nar/gku412. Epub 2014 Jun 3. http://disease-connect.org

• Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014 Jan 1;42(1):D1091-7. http://www.drugbank.ca

• Bolton E, Wang Y, Thiessen PA, Bryant SH. PubChem: Integrated Platform of Small Molecules and Biological Activities. Chapter 12 in Wheeler RA and Spellmeyer DC, eds. Annual Reports in Computational Chemistry, Volume 4. Oxford, UK: Elsevier, 2008, pp. 217-241. doi:10.1016/S1574-1400(08)00012-1. https://pubchem.ncbi.nlm.nih.gov

• Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 2003 Nov;13(11):2498-504. http://cytoscape.org

• Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002 Jan 1;30(1):207-10. http://www.ncbi.nlm.nih.gov/geo/

[Bar chart: Number of Drugs Per Category. X-axis: Category of Drugs (D1, D2, D3, X1, X2, X3). Y-axis: Number of Drugs (0 to 600).]

AML and CLL Protein to Protein Interaction Network

Top Proteins Targeted by D1 Drugs

AML Drug Targets    Number of Drugs    CLL Drug Targets    Number of Drugs
P42574 (CASP3)      8                  P42574 (CASP3)      6
P08684 (CYP3A4)     8                  P55211 (CASP9)      5
P33527 (ABCC1)      7                  Q14790 (CASP8)      4
P08183 (ABCB1)      6                  P09874 (PARP1)      4
Q14790 (CASP8)      6                  P33527 (ABCC1)      4
P04637 (TP53)       5                  P08183 (ABCB1)      4
Q9UNQ0 (ABCG2)      5                  P08684 (CYP3A4)     4
Q92887 (ABCC2)      4                  P55210 (CASP7)      3
P55211 (CASP9)      4                  Q9UNQ0 (ABCG2)      3
Q16678 (CYP1B1)     4                  P20815 (CYP3A5)     3

Table 2: Top proteins identified as targets in DrugBank and STITCH from drug to protein interaction. The ten proteins listed for AML and for CLL are those targeted by the highest number of D1 drugs.

Table 3: Top related diseases from CMBI. Diseases that appeared in the related-disease lists for both CLL and AML and had a CMBI score greater than 0.3 were chosen as the top related diseases. These diseases were used to identify X category drugs as candidates for repurposing.

Our team has developed a website that lets us import the data collected during the data mining process into a database, where it can be analyzed and visualized. The data will be stored in a PostgreSQL database, and Elasticsearch will provide fuzzy searching so that we can narrow our searches to find key information. The website can also show the relationships among drug, disease, and protein interactions, which we can use to create models and networks. The next steps are to improve the website's functionality so that we can import all of our collected data on drugs, diseases, and proteins. We will then use the website to examine the interactions among the imported data and build a model to identify which of the drug candidates we found are best suited for repurposing.
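As an illustration of the fuzzy searching mentioned above, the sketch below builds an Elasticsearch `match` query with `fuzziness` set to `"AUTO"`, which tolerates small misspellings of a drug name. The field name `name` and the helper function are hypothetical; the poster does not describe the site's actual index layout or queries.

```python
import json


def fuzzy_drug_query(text, field="name", size=10):
    """Build a search body that matches `text` while allowing small typos."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" picks the allowed edit distance from the term length.
                    "fuzziness": "AUTO",
                }
            }
        },
    }


# A misspelled drug name that a fuzzy match is meant to tolerate.
body = fuzzy_drug_query("imatnib")
print(json.dumps(body, indent=2))
```

The resulting JSON body would be sent to the search endpoint of whichever index holds the drug records.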

Top Associated Diseases

Disease                              AML CMBI Score    CLL CMBI Score
Acute Lymphoblastic Leukemia         0.3682            0.3979
Mixed lineage leukemia               0.2942            0.3031
Non-Bruton agammaglobulinemia        0.335             0.4215
Mycosis Fungoides                    0.3083            0.3194
Non-Hodgkin lymphoma                 0.3092            0.3293
Burkitt's lymphoma                   0.3092            0.3622
Chronic Myeloid Leukemia             0.3999            0.4131
T-cell acute lymphocytic leukemia    0.4212            0.3074
Hemophagocytic lymphohistiocytosis   0.3328            0.3775
B-cell lymphoma                      0.3129            0.2957
Hodgkin lymphoma                     0.3092            0.3424
Werner Syndrome                      0.2948            0.303

Figures 1 and 2: Proteins from disease to protein interaction were combined with additional interacting proteins using STRING and HAPPI databases. Cytoscape was used to create a network of connections between proteins associated with each subtype.

Figure 3: Number of drugs found using clinicaltrials.gov, cancer.gov, the Leukemia and Lymphoma Society, and articles found on PubMed for AML and CLL as well as the top associated diseases. D1 = FDA-approved drugs for AML or CLL. D2 = drugs currently in clinical trial for AML or CLL. D3 = drugs whose clinical trial information has not been updated in the last five years or whose clinical trials have been suspended, terminated, or withdrawn. X category drugs follow the same rules as D category drugs, but they are being used for the top associated diseases.

Table 1: Top proteins identified as targets in DrugBank and STITCH from drug to protein interaction. The top ten proteins listed were the proteins that were targeted by the highest number of D1 drugs for each subtype.

Popularity of D Category Drugs by PubMed Search

Name of drug                    Number of PubMed articles found
Cytoxan                         6967
Methotrexate                    6668
Imatinib                        6442
Aminopterin                     5268
Anthracycline                   4175
Etoposide                       3421
Mercaptopurine                  2860
Tetradecanoylphorbol acetate    2759
Asparaginase                    2701
Cytosar                         2508