
Java-for-beginners

Written by Administrator, Monday, 01 November 2010 13:26 - Last Updated Tuesday, 02 November 2010 00:35

How to Write a Java Program for Bioinformatics Applications: - A Manual


By Jitesh B Dundas

This manual is a self-help guide for doing programming, especially in Java.

I learnt this way of looking at programming from my MCA Java guest faculty Prof. Lele, so my thanks to him.

It is recommended that you read the following sources for the Java language before or while reading this manual.

Java: Complete Reference by Herbert Schildt

The Sun Java Tutorial (online at www.java.sun.com )



JavaScript / HTML tutorial (online at www.w3schools.com )

Inside Servlets by Justin Callaway

Java Server Faces Technology at www.java.sun.com

In order to write a Java program, we need to take care of the following:-

1) A blueprint of the actual requirements with the expected input and the expected output

2) A series of steps that explain the logic to be used to implement the requirements of Step-1.

3) A set of function calls to the Java language API that implement the logic of Step-2

A Java program is nothing but a set of function calls to the Java language API. It is all about calling the right functions of Java that will satisfy our logic for getting our results.

So let us assume that you need to write a Java program to fetch data from the EMBL String database. How do we go about getting this desired result?

The first thing to do is to get the requirements very clearly. Here, our requirement is to get data from the EMBL String database. Thus, we need to understand:-

1) What are the methods or ways available in the database source to fulfill the requirements?



Well, we know that the EMBL String database (referred to as the db from here on) has a programmable API that allows us to fetch data from the database in real time. Thus, if we make the correct function calls to the correct URL (with the correct parameters and input values), we should be able to get the desired results from the database.

On visiting the String EMBL website, we find that the API is described in the Help/Info section. The documentation there explains how to use the database for fetching data from your Java program. Thus, we first get the URL and the parameters needed to send the request to the database.

Thus, we look for the URL and reach the paragraph where the URL parameters are described.

Try to build the URL using the parameters that the API offers. This can be done in our program by adding parameters as needed. You could add the input values based on normal use of the database and see if you get the same results programmatically as well.

For example, suppose we need to find the list of protein interactors for the p53 protein. On the basis of the URL format presented, together with the keywords, the request looks like this:

http://string-db.org/api/psi-mi-tab/interactionsList?identifiers=p53&required_score=900

In the above example, the search term is the p53 protein with a required score of 900. Look up the API of the String db and check which parameters you can add.
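The URL construction described above can be sketched in Java as follows. This is only a minimal illustration: the class name StringDbUrlBuilder and its method names are made up for this example, and only the two parameters shown in the text (identifiers and required_score) are assumed to exist. The search term is URL-encoded so that spaces or symbols typed by the user cannot break the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch (hypothetical helper): builds a String db request URL. */
final class StringDbUrlBuilder {

    // Base URL copied from the example in the text.
    static final String BASE_URL =
            "http://string-db.org/api/psi-mi-tab/interactionsList";

    /** Returns the request URL for a search term and a required score. */
    static String buildUrl(String identifiers, int requiredScore) {
        return BASE_URL
                + "?identifiers=" + encode(identifiers)
                + "&required_score=" + requiredScore;
    }

    // URL-encodes one parameter value (a space becomes '+').
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("p53", 900));
    }
}
```

Running main prints the same URL as the p53 example above.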

2) How do we actually do this using the methods available?

Ok, now we have the parameters and the URL to be used. So how do we use all this for our program? Well, first we need:-

2.1) The logic for the program

2.2) The function calls and statements in Java for implementing each line of the logic

2.3) The correct and validated set of inputs from the user, as well as the correct format of the total input for the String db.
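Point 2.3 can be illustrated with a small validation sketch. The class QueryInput is hypothetical, and the accepted score range of 0 to 1000 is an assumption made for this example; check the String db API documentation for the real limits.

```java
/** Sketch (hypothetical helper): validated user input for a String db query. */
final class QueryInput {
    final String searchWords;
    final int requiredScore;

    private QueryInput(String searchWords, int requiredScore) {
        this.searchWords = searchWords;
        this.requiredScore = requiredScore;
    }

    /**
     * Checks the raw form values and returns a validated object.
     * Throws IllegalArgumentException with a readable message otherwise.
     */
    static QueryInput parse(String searchWords, String requiredScore) {
        if (searchWords == null || searchWords.trim().isEmpty()) {
            throw new IllegalArgumentException("Search term must not be empty");
        }
        if (requiredScore == null) {
            throw new IllegalArgumentException("Required score is missing");
        }
        int score;
        try {
            score = Integer.parseInt(requiredScore.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Required score must be a whole number", e);
        }
        // Assumed range for this sketch; not taken from the original text.
        if (score < 0 || score > 1000) {
            throw new IllegalArgumentException(
                    "Required score must be between 0 and 1000");
        }
        return new QueryInput(searchWords.trim(), score);
    }

    public static void main(String[] args) {
        QueryInput input = parse(" p53 ", "900");
        System.out.println(input.searchWords + " / " + input.requiredScore);
    }
}
```

Validating before building the URL means a typing mistake is reported to the user immediately instead of surfacing later as a confusing response from the db.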

Right, now we have the steps in place. Let us proceed towards writing the program.

So what is the logic of the program? We need to find the steps by which we can send the data from the user in the format requested by the String db. Thus, we need to understand how Java communicates with an external URL for sending and receiving data. Next, you work out the flow of the logic and then replace each step with Java statements that actually execute it.

Thus, we have the following steps of logic for sending and receiving data from EMBL.

1) Get the input from the user

2) Create the URL for sending the data to the db



3) Get the proper authentication for sending data via the internet to the db

4) Send the input data to the String db API via the internet

5) Read the response that is received.

6) Print the output into the format that the user understands.

7) End the execution
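Steps 4 and 5 of the list above (send the request, read the response) can be sketched as a small standalone class. This is an illustration under assumptions, not the code of the manual: the class ResponseReader and its method names are invented here, and the network call is kept separate from the line-reading loop so that the loop can be tried out without an internet connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper): reads a text response line by line. */
final class ResponseReader {

    /** Steps 4 and 5 for a real URL: open the stream, then read all lines. */
    static List<String> fetchLines(String url) {
        try (Reader reader = new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8)) {
            return readLines(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 5 on its own: collect every line until the stream ends. */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Offline demonstration with an in-memory "response".
        for (String line : readLines(new StringReader("first\nsecond"))) {
            System.out.println(line);
        }
    }
}
```

To query a live server, pass the request URL to fetchLines instead of using the in-memory reader.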

Writing the logic helps in implementing the java program faster.

So now, we write the steps for implementing the logic mentioned above.

The following is the Java program that will actually send the request and get the response from the String EMBL.

1) Get the input from the user

This will be a simple HTML form in which we ask the user to input the parameters, such as the search terms and the required score. This will be just like the String EMBL form that we see on the homepage of the latter. Please refer to the file ImportFromEMBLDb.jsp (ProteomDb project folder) for further details.

Steps 2) to 7) are implemented in the program below.



ImportFromEMBLPI.jsp

The comments are marked with ‘//’ to explain the logic that has been used here. They are not a part of the actual code. To get the working code, go to the ProteomDb project folder and find the file by this name.

Let us build the code step by step.

1) Open any HTML editor or Java IDE (for beginners, simple Notepad or an HTML editor is preferred. Later on, you can use the Eclipse or NetBeans IDE)



2) Save the file as ImportFromEMBLPI.jsp (you can give any other name you want). Next, we add the Java libraries that are needed for our code. You need to read the API documentation to understand which libraries will be needed for your proposed program.

3) Next, we write the try-catch block and define the session. The session object is needed to hold variables across pages and limit the duration of any Java activity (check the Java tutorial and API for further details).



4) Define the URL string by taking user input from the input form.

5) Define the System Properties. These are values that are stored in a file that contains the information needed to log in to the internet via your home network.
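As a rough sketch of this step, the proxy and timeout values can be set through system properties as shown here. The property names are the ones used in the full listing of this manual; the class ProxySettings, the host proxy.example.org and the port 8080 are placeholders invented for this example. Note that the sun.net.client.* timeout properties are specific to the Sun/Oracle implementation of Java.

```java
/** Sketch (hypothetical helper): proxy and timeout settings for HTTP access. */
final class ProxySettings {

    /** Stores the proxy host, proxy port and timeouts as system properties. */
    static void configure(String proxyHost, int proxyPort, int timeoutMillis) {
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", String.valueOf(proxyPort));
        // Implementation-specific defaults for connect and read timeouts.
        System.setProperty("sun.net.client.defaultConnectTimeout",
                String.valueOf(timeoutMillis));
        System.setProperty("sun.net.client.defaultReadTimeout",
                String.valueOf(timeoutMillis));
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own network settings.
        configure("proxy.example.org", 8080, 10000);
        System.out.println(System.getProperty("http.proxyHost") + ":"
                + System.getProperty("http.proxyPort"));
    }
}
```

If your network does not use a proxy, skip this step; setting a proxy host that does not exist will make every request fail.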

6) Define the URL object to hold the URL string. Next, create a file name using the integer value of a Random class object.



7) Now define the handle to read from:- BufferedReader in. This reads the output from the db server and passes it to the Java program.

8) Declare the variables to hold the values.

9) Write the while loop to actually print the content to the browser. You could store it in an array and display it in a tabular form.

First we write the while loop, which reads the output from the db line by line. Then we use string handling functions to print the different parts of the array. The EMBL output is in text format with tab delimiters between the columns. Using string functions, we can parse the output and get the actual column values.

10) Now we keep a counter for managing the number of records. Again, we close the loops and print the values. Lastly, we store the URL as a session attribute. You could redirect the page to another welcome page if wanted using the response.sendRedirect command.
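The parsing idea described above (split each line at the tabs, then split a multi-valued column at the ‘|’ character) can be sketched as follows. The class TabLineParser and the sample line are invented for illustration; the sample is not real String db output. The tab must be written as "\t", and the pipe must be escaped as "\\|" because String.split expects a regular expression.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch (hypothetical helper): parses one tab-delimited output line. */
final class TabLineParser {

    /** Splits a line at every tab; the -1 limit keeps empty trailing columns. */
    static String[] splitColumns(String line) {
        return line.split("\t", -1);
    }

    /** Splits a multi-valued column at each '|' character. */
    static List<String> splitValues(String column) {
        return Arrays.asList(column.split("\\|"));
    }

    public static void main(String[] args) {
        // Made-up sample line: three columns, the last one holds several values.
        String line = "idA\tidB\tscore:0.99|escore:0.5";
        String[] columns = splitColumns(line);
        System.out.println("Columns: " + columns.length);
        for (String value : splitValues(columns[2])) {
            System.out.println(value);
        }
    }
}
```

Checking columns.length before reading a fixed position such as column 14 avoids an ArrayIndexOutOfBoundsException when a line is shorter than expected.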



Code Explanation:-

The first step is to declare that this is a JSP file and that we are going to import and use Java API functions for our program below.

//declare that this is a java program
<%@ page language = "java"%>
//import all the java api libraries
<%@ page import = "java.util.*"%>
<%@ page import = "java.io.*"%>
<%@ page import="java.lang.*"%>
<%@ page import="java.net.*"%>
<%@ page import="java.nio.*"%>
<%@ page import= "javax.xml.parsers.*" %>
<%@ page import= "org.w3c.dom.*" %>
<%
//java program begins.
System.out.println("ImportFromEMBL3.jsp");
//create an http session
javax.servlet.http.HttpSession hs = request.getSession();
try //the try catch block starts here
{
    //this is the base URL of the String EMBL db. Store it in a variable
    String URLString = "http://string-db.org/api/psi-mi-tab/interactionsList?identifiers="
        + request.getParameter("txtSearchWords")
        + "&required_score=" + request.getParameter("txtReqScore");

    //store the authentication settings for connecting to the internet
    Properties systemSettings = System.getProperties();
    systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in"); //proxy host
    systemSettings.put("http.proxyPort", "80"); //proxy port
    //time out settings for connecting to the internet.
    systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
    systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
    //the user id and password to connect to the internet via the home network.
    Authenticator.setDefault(new Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
            // specify your username and password of the iitb login
            return new PasswordAuthentication("aaqua", "aaqua123".toCharArray());
        }
    });
    System.setProperties(systemSettings); //set the user id and settings
    System.out.println(" System Properties Set");
    System.out.println(" URL variables set="+URLString);

    URL yahoo = new URL(URLString); // store the URL in a URL class object

    //create the file name for storing the output from the String db.
    Random rn = new Random();
    int rnval = rn.nextInt() ;
    String fname = "file_embl_pi_"+ String.valueOf(rnval) + ".xml" ;
    File file = new File(fname);
    String tempstr = "";

    //create the handle to the stream that sends the input URL to the db and
    //then starts reading the output data from the db.
    BufferedReader in = new BufferedReader( new InputStreamReader(yahoo.openStream())); //read the file

    //define the variables for reading and writing the data
    String inputLine;
    String temp = "";
    String taxonId = "";
    String arrVal[] ;
    String arrVal2[] ;
    String arrVal3[] ;
    int reccnt = 0;
    String tscore = "";
    String score = "";
    String escore = "";
    String dscore = "";
    String temp2 = "";

    //Print to the browser that the output display starts.
    out.println(" List of Protein Interactors for your query ");
    out.println("-------------------------------------------------------- ");

    //this loop reads the content of the output line by line
    while ((inputLine = in.readLine()) != null)
    {
        //Here we just parse the data received and present it in a readable format.
        out.println("Record No:-"+ ( reccnt+ 1) + " ");
        tempstr = tempstr + inputLine ; //store the line data in another variable.
        temp = inputLine ;
        //split the line content at each tab in the line. Here we assume that we
        //know the format of the output coming from the db, which is a continuous
        //set of words, each separated by a tab.
        arrVal = temp.split("\t");
        for( int i=0;i< arrVal.length; i++) //loop to find all the column values.
        {
            if (i == 0) //if this is the first column of the line i.e. the start of the output line
            {
                //the array has stored the data in each variable. Thus, we now
                //just present the data to the user on the browser.
                out.println("String ID:"+ arrVal[i] +" "); //data at array position 0
                out.println("NCBI ID:"+ arrVal[i+1] +" "); //data at array position 1
                out.println("Preferred Name:"+ arrVal[i+2] +" "); //data at array position 2
                out.println("String DB Name:"+ arrVal[i+3] +" "); //data at array position 3
                out.println("String Db Taxonomy ID:"+ arrVal[i+9] +" "); //data at array position 9
                out.println("NCBI taxonomy ID:"+ arrVal[i+10] +" "); //data at array position 10
                out.println("Score String:"+ arrVal[i+14] +" "); //data at array position 14
                //sometimes the value in an array position is again a string of data.
                //Thus, we need to split the data in the same manner.
                arrVal3 = arrVal[i+14].split("\\|"); //split the value at each '|' character
                for( int j=0; j < arrVal3.length; j++) //loop through all the elements
                {
                    out.println( "" + arrVal3[j] +" "); //print data at the index position
                }
            }
        } //loop ends here
        reccnt++; // variable to count no. of records.
        out.println("-------------------------------------------------------- ");
    }
    in.close(); //close the stream
    //store the URL String
    hs.setAttribute("urlstring",URLString);
    System.out.println( "urlstring="+URLString); //print the URL to the tomcat log
} //try block ends
catch(Exception ex) //catch block to handle any error or exception that may occur
{
    out.println("Exception->"+ex);
    //get the handle to write the error to the browser.
    PrintWriter pw = response.getWriter();
    ex.printStackTrace(pw); //print the details of the error to the browser
}
//code ends
%>
//this means that the java program ends

So the complete code is shown to you with the logic. Notice how easy it becomes to actually write the code if the logic is clear from the start.

Executing the Program

Now we know that the program is working fine (trying to be confident, but this hardly happens. You will need to debug and improve your code).
Thus, you have to place the file in the webapps folder of your Tomcat engine (assuming that you are using the Apache Tomcat 5.5 engine with Java SE 5 and MySQL 5.0 on the Windows XP OS). Next, open any browser window (after starting the Tomcat engine service).

Type the path of your input HTML file (this file will get the input from the client and then send the data to the Java program, which will get the output from the db). Thus, here we have http://localhost:8080/ProteomDb/ImportFromEMBLDb.jsp. Type this in the browser and input the parameters in the resulting client page. Press the button and you will get a list of protein interactors in the resulting window.

Debugging or handling errors

Again, it is easy to run into an error, especially in programming. The try/catch block helps a lot in such cases. There is no standard way of solving errors in your code. However, you can save a lot of time by doing the following:-

1) Know the flow of the program. Are you sure that your program is following the correct logic?

2) Are you sure that the parameters are correctly defined?

3) It is good to print statements at each logical sub-unit of the code so that we can track whether the program is executing fine or not. In case there is any error, you will know at which point the error has occurred.

4) Check the stack trace that is printed by Java. Mostly, it will give you the name of the exception. Just Google the term or look in the Java API documentation to find out what the error means. Do some background research on the internet and you should find the solution to the problem.

5) Check that the internet is connected and that all the supporting environment requirements are running properly. For example, the internet may not be connected and it may give you a ConnectTimeOut exception. Just Google the term ‘ConnectTimeOut’ and you will get a list of possible answers. One of the most common reasons is that the internet is not working properly or that the input settings for connecting to the internet are wrong or missing.

6) When an exception appears in the stack trace, the trace also shows you the line number where the error occurred. Check the syntax of the statement in the API documentation or in the books to correct the code.

7) You must read and understand the Java language properly to be able to write a good program. Reading the above resources once is not enough. Practice is the key, for new beginners as well as experts. Everybody faces problems in coding, and the better you know the language, the better your resulting program will be.

Fetch Data from NCBI

I am assuming that you are aware of the term URL (Uniform Resource Locator).

The following are the four steps in which data is fetched from NCBI.

1) Take input from user

Fetching data from NCBI uses the Entrez programming utility.

In this step, we present an HTML page to the user which will take the necessary information from the user. Each parameter of the request URL (in this case the NCBI Entrez URL) that is to be used is set up as an input field. For example, if one of the parameters to be added is the search words, then we will add a field in the input form called “Search Term”. This field will take in the search terms that the user enters. A snapshot of the form is shown below.

In this step, we first create an interface (or a user page) to allow the user to enter the input parameters for the page. For example, in the functionality to enter the NCBI details, we need to ask the user to enter the search terms (if you have seen the URL values in the Entrez Programming Utility, you will find that there is a field called “term”). Thus, we will need to create a URL in Entrez format based on the input parameters provided.

On looking at the Entrez utility to fetch a list of records for a particular search term, we use the eSearch method. This eSearch method is the base URL that you will use to fetch a list of records from the NCBI database. The URL is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi? Please check the Entrez utility for further details. Now, the next task is to find the set of parameters that are to be sent along with this URL. In the input page, there are two fields defined.

1) Database Name: - This is to tell the NCBI server which database is to be searched.

2) Search Term: - This is to tell the NCBI server the search terms for which the records are to be returned.

Look up the Entrez utility page on NCBI. There, in the eSearch page, you will find the key parameters. For the database name, the keyword is “db”, and for the search term, the keyword is “term”.

Again, we need to tell the server how many records are to be returned. NCBI returns only a limited number of records per request; here we ask for a maximum of 50. Also, you have to tell NCBI which page is to be returned. Say, for example, there are 12000 records found for your search term. With 50 records per page, the result spans 12000 / 50 = 240 pages. Thus, you will need to define which page is to be returned, right?

These two parameters are called “retmax” and “retstart”. They are pre-decided by us and the client does not control these values. Here, we have kept the value of “retmax” as 50, and “retstart” selects the page of the search results (in Entrez, it is the index of the first record to return).
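The paging arithmetic and the eSearch URL can be sketched together as follows. The class EntrezPaging and its method names are invented for this example. The sketch assumes that “retstart” is the zero-based index of the first record on the requested page, which is how the Entrez documentation describes it, so page 1 starts at 0 and page 2 at 50 when “retmax” is 50.

```java
/** Sketch (hypothetical helper): eSearch URL and paging arithmetic. */
final class EntrezPaging {

    // Base URL of the eSearch method, as given in the text.
    static final String ESEARCH_URL =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /** Number of pages needed for all records, rounding up. */
    static int pageCount(int totalRecords, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalRecords + pageSize - 1) / pageSize;
    }

    /** Index of the first record on a page (pages are numbered from 1). */
    static int retStartForPage(int pageNumber, int pageSize) {
        return (pageNumber - 1) * pageSize;
    }

    /** Builds the request URL; db and term are assumed to be URL-safe here. */
    static String buildUrl(String db, String term, int pageNumber, int pageSize) {
        return ESEARCH_URL
                + "?db=" + db
                + "&term=" + term
                + "&retstart=" + retStartForPage(pageNumber, pageSize)
                + "&retmax=" + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(12000, 50));
        System.out.println(buildUrl("genome", "cancer", 2, 50));
    }
}
```

For the example in the text, pageCount(12000, 50) gives 240, matching the calculation above.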
OK, now you have all the parameters along with the base URL. Thus, you need to send this URL to the second program (which will actually send the request to NCBI and get the output).

In order to make this page, you just need a simple HTML editor and some basic HTML/JavaScript programming. I have attached a simple code file for your reference. You might want to look at www.w3schools.com for further details on learning them.

In the form presented above, the user enters the keyword values and then presses the “Submit” button. This sends the details of the page to the second program (explained in Step-2).

2) Send request and response from NCBI

Now we have received the user input. Assume that the user has entered “cancer” as the search term and “genome” as the database. Now we need to present the NCBI server with the input in the specific Entrez format and then send the request.

OK, the URL that will be created is something like:-

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=cancer

We need to pass this URL to the Java program here, which will send it to the NCBI server. Please note that we are using the same URL pattern to get the different pages from the resulting output page. For example, once the first 50 records are shown to the user (Step-4), the user may want to browse the next or previous pages too. For this, we need to send a request to this same Java program. Again, how do we know if the request is from the client page or from the output page? For this, we have used the variable “loopflag”. For requests from the client page, loopflag = no.

Snapshots of the program.

1) First include the libraries.

2) Start the try-catch block and declare the session variable.

3) Define the System properties for connecting to the internet from the home network.

4) Now check if this file has been called from the client or from the output file.

We have placed a flag and check its value. If the request is from the client side, then its value will be no. Else, it will be next or previous, as per the value sent from the output file.

5) Set the connection to the URL of the NCBI server for the database. Note that the methods have been set for the connection. The connection object will connect to the NCBI URL and is set to accept input and send output, with no cache storage and no user interaction. This connection will also accept text/xml output.
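The connection settings of step 5 can be tried out in isolation with the sketch below. It is a simplified variant, not the code of the manual: the class NcbiConnection is invented, and it uses a GET request with an Accept header because all parameters already travel in the URL, whereas the full listing uses POST with a Content-Type header. Opening the connection object does not contact the server yet; the request is only sent once the input stream is read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch (hypothetical helper): prepares an HTTP connection for XML output. */
final class NcbiConnection {

    /** Creates and configures the connection without sending the request. */
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("GET");          // parameters are in the URL
            connection.setDoInput(true);                 // we want to read the reply
            connection.setUseCaches(false);              // always ask the server
            connection.setAllowUserInteraction(false);   // no pop-up dialogs
            connection.setRequestProperty("Accept", "text/xml");
            connection.setConnectTimeout(10000);         // milliseconds
            connection.setReadTimeout(10000);            // milliseconds
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection = open(
                "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
                + "?db=genome&term=cancer");
        // Only prints the configuration; no request is sent here.
        System.out.println(connection.getRequestMethod() + " " + connection.getURL());
    }
}
```

Setting the timeouts on the connection object itself has the advantage that they apply only to this request and not to every connection made by the server.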



6) Declare the variables. Create an object of BufferedReader to read the input from the NCBI server.

7) Now we use the Random class object to create the file name. We create the BufferedWriter class object to write the contents to a file. Next, we create a while loop to read the content line by line and write it to the XML file present in the Tomcat home directory.

8) Now we close the loop and the objects that we created. Next, we store the file name and the variables in the session. Then we print the values and redirect the page to parse the XML file.
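Steps 7 and 8 (copy the response line by line into an XML file, then close everything) can be sketched as shown here. The class ResponseSaver is invented for this example. Two things differ from the manual on purpose: the file name comes from Files.createTempFile instead of a Random integer, which avoids name clashes, and try-with-resources closes the reader and the writer even when an error occurs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch (hypothetical helper): saves a text response to an XML file. */
final class ResponseSaver {

    /** Creates an empty, uniquely named file such as file_ncbi_123.xml. */
    static Path newTempXmlFile() {
        try {
            return Files.createTempFile("file_ncbi_", ".xml");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies every line from the source to the file; returns the line count. */
    static int save(Reader source, Path target) {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(source);
             BufferedWriter out =
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        Path target = newTempXmlFile();
        int count = save(new StringReader("<a>\n<b/>\n</a>"), target);
        System.out.println(count + " lines written to " + target);
    }
}
```

In the JSP, the Reader would wrap the input stream of the NCBI connection, and the resulting path would be stored in the session for the parsing step.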

Check the XML file that was generated to view the output. Hope this explains the code. The whole file is given below.

Here is the Java program for our reference:-

File Name:- ImportFromNCBI3.jsp

// comments are entered in java code in this format.
//define the language for this page as Java
<%@ page language = "java"%>
//import the libraries for java
<%@ page import = "java.util.*"%>
<%@ page import = "java.io.*"%>
<%@ page import="java.lang.*"%>
<%@ page import="java.net.*"%>
<%@ page import="java.nio.*"%>
<%@ page import= "javax.xml.parsers.*" %>
<%@ page import= "org.w3c.dom.*" %>
<%
//print the message to the log file of Tomcat engine.
System.out.println("ImportFromNCBI3.jsp");
//create the session variable
javax.servlet.http.HttpSession hs = request.getSession();
try //start the try-catch block.
{
    String URLString = ""; //variable to hold the URL
    //store the authentication settings for connecting to the internet
    Properties systemSettings = System.getProperties();
    systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in"); //proxy host
    systemSettings.put("http.proxyPort", "80"); //proxy port
    //time out settings for connecting to the internet.
    systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
    systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
    //the user id and password to connect to the internet via the home network.
    Authenticator.setDefault(new Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
            // specify your username and password of the iitb login
            return new PasswordAuthentication("aaqua", "aaqua123".toCharArray());
        }
    });
    System.setProperties(systemSettings); //set the user id and settings
    System.out.println(" System Properties Set");
    System.out.println(" URL variables set="+URLString); //print the URL
    String retstart = ""; //variable to set the page number
    String retmax = ""; //variable to set the maximum records per page

    /* Is the request coming from the client or from the output file? Is it a
       different page being requested for the same search request, or is it a
       new request coming from the client? If the latter is true, then the
       variable loopflag = no, else it will be next or previous. This variable
       loopflag is set in the client input form and the value is received from
       there by this program. */
    if (request.getParameter("loopflag").toString().equals("no") )
    {
        //this is a page from the client input.
        //the URL data is sent from the previous input. The javascript converts the
        //client input into NCBI format URL and then sends the details via a text
        //field named "txtURLString". Only the parameters retmax and retstart are
        //added here
        URLString = request.getParameter("txtURLString").toString();
        System.out.println("loopflag=no"); //print message:- the loopflag value is no
        retstart = request.getParameter("retstart") ; //get the value of retstart from client
        retmax = request.getParameter("retmax"); //get the value of retmax from client
        //print the values of the parameters to the apache tomcat log file.
        System.out.println("loopflag=else part");
        System.out.println("retstart="+retstart);
        System.out.println("retmax="+retmax);
        System.out.println("url_IMPORTFROM="+request.getParameter("txtURLString"));
        //construct the final URL and store it in the variable
        URLString = (String)request.getParameter("txtURLString")+ "&retstart=" + retstart + "&retmax="+ retmax;
        //store the page number and the no. of records in session variables
        hs.setAttribute("retstart",retstart);
        hs.setAttribute("retmax",retmax);
    }
    else if ( request.getParameter("loopflag").toString().equals("next") )
    {
        //the user has clicked on the "next" page of the output page.
        System.out.println("next loopflag="+hs.getAttribute("next"));
        URLString = (String)hs.getAttribute("next");
        //database name stored in the session variable.
        hs.setAttribute("dbname", hs.getAttribute("dbname") ); //database
        System.out.println("loopflag=next"); //next page is being clicked.
    }
    else if ( request.getParameter("loopflag").toString().equals("previous") )
    {
        //the loopflag is from the output page in which the previous page
        //has been requested i.e. the user has clicked on the "previous" page
        //of the output page.
        System.out.println("previous loopflag="+hs.getAttribute("previous"));
        //database name is being set in to the session variable.
        hs.setAttribute("dbname", hs.getAttribute("dbname") );
        //previous page value being read from the session variable.
        URLString = (String)hs.getAttribute("previous");
        System.out.println("loopflag=previous");
    }
    else
    {
        //none of the above cases were executed.
        retstart = request.getParameter("retstart") ;
        retmax = request.getParameter("retmax");
        System.out.println("loopflag=else part");
        System.out.println("retstart="+retstart);
        System.out.println("retmax="+retmax);
        System.out.println("url_IMPORTFROM="+request.getParameter("txtURLString")); //print the URL string
        //store the variables in session.
        hs.setAttribute("retstart",retstart);
        hs.setAttribute("retmax",retmax);
        hs.setAttribute("dbname", hs.getAttribute("dbname") ); //database name
    }
    System.out.println(" URL variables set="+URLString);
    URL url = new URL(URLString); //url string taken from user input.
    //define the HttpURLConnection object
    HttpURLConnection connection = null;
    //open the connection to the following URL.
    connection = (HttpURLConnection) url.openConnection();
    //set the connection type to POST method.
    connection.setRequestMethod("POST");
    connection.setDoInput(true); //allow the connection to read input
    connection.setDoOutput(true); //allow the connection to send output
    connection.setUseCaches(false); //don't use cache to store temporary results.
    connection.setAllowUserInteraction(false); //don't allow user interaction for this connection
    //set the content type to XML/text format. This means that the content will
    //be of text or XML format.
    connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
    //print message. This will help in tracking the errors in tomcat log file.
    System.out.println(" connection Set");
    //set the channel to get the input. This is called the input stream. We use the
    //buffered reader for getting the input. Open the input stream using the buffered
    //reader.
    BufferedReader in = new BufferedReader( new InputStreamReader(connection.getInputStream()));
    String decodedString; //variable to hold the decoded string
    String tempstr = ""; //variable to store the single output
    System.out.println("Reader Set"); //message - stream set
    Random rn = new Random(); //random object to get a random integer
    int rnval = rn.nextInt() ; //get a random integer.
    //create the XML file name
    String fname = "file_ncbi_"+ String.valueOf(rnval) + ".xml" ;
    //a channel to handle the output from the NCBI and then write it to the XML file specified above.
    BufferedWriter bw = new BufferedWriter(new FileWriter(fname));
    //loop through all the records line by line
    while ((decodedString = in.readLine()) != null)
    {
        tempstr = tempstr + decodedString; // append the output line to the existing o/p
        //out.println(tempstr);
        bw.write(decodedString); //write the output line to the XML file.
        if ( tempstr.indexOf("/") == -1 ) //find if there is a new line started
        {
            bw.newLine(); //if yes, then start on a new line
        }
    }
    System.out.println(" output given.Set"); //message - output set.
    bw.close(); //close the streams
    in.close();
    //store the file name and URL in session variables. These values will be passed
    //on to the next program in Step-3)
    hs.setAttribute("NCBIfile",fname.toString()); //store the file name in session variable
    hs.setAttribute("urlstring",URLString);
    //print the values of the variables.
    System.out.println("Session variable Set");
    System.out.println( "fname="+fname.toString() );
    System.out.println( "urlstring="+URLString);
    //move the control to the next page.
    response.sendRedirect("/ProteomDb/FetchDataFromNCBI.jsp");
}
catch(Exception ex) //catch any exceptions
{
    out.println("Exception->"+ex); //print the exception
    PrintWriter pw = response.getWriter(); //get the handle to the output stream for this JSP page
    ex.printStackTrace(pw); //print the error stack details to the JSP page
} //try catch block ends.
%>

3) Parse the output response from NCBI

So now you have the output data from the previous program. The problem is that this data is in XML format. In an XML file, data is present in the form of tags. The value will be present between the tags. We need to extract this information between each tag. This is what the current Java program does.

Please note that the name of the XML file is needed here. We assume that we have the XML file at present. I have attached a snapshot beside each code chunk to explain how it looks.
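Before going through the JSP, the idea of reading the value between two tags can be tried out with a small standalone sketch. The class ESearchXml and the sample document are made up for illustration: the tag names (Count, RetMax, RetStart, IdList, Id) follow the ones used in the program of this section, but the values are invented and a real NCBI response contains more elements, including a DOCTYPE line that is left out here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch (hypothetical helper): reads tag values from an eSearch-like XML. */
final class ESearchXml {

    /** Parses XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
    }

    /** Text of the first element with this tag name, or null if absent. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0
                ? null : nodes.item(0).getTextContent().trim();
    }

    /** Text of every element with this tag name, in document order. */
    static List<String> allText(Document doc, String tag) {
        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample; the ids are not real NCBI record ids.
        String sample = "<eSearchResult><Count>3</Count><RetMax>2</RetMax>"
                + "<RetStart>0</RetStart>"
                + "<IdList><Id>101</Id><Id>102</Id></IdList></eSearchResult>";
        Document doc = parse(sample);
        System.out.println("Count = " + firstText(doc, "Count"));
        System.out.println("Ids = " + allText(doc, "Id"));
    }
}
```

The JSP does the same thing, except that it parses the file saved in Step-2 and walks the child nodes of each tag by hand.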

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>
//set the page language to java
<%@ page language = "java" %>
//import the java libraries
<%@ page import = "java.util.*"%>
<%@ page import = "java.io.*"%>
<%@ page import="java.lang.*"%>
<%@ page import="java.net.*"%>
<%@ page import="java.nio.*"%>
<%@ page import = "java.sql.*" %>
<%@ include file = "header.jsp" %>
<%@ page import = "javax.xml.parsers.*" %>
<%@ page import = "org.w3c.dom.*" %>



<%
try
{
    //create the object for session.
    javax.servlet.http.HttpSession hs = request.getSession();
    //get the name of the file that was stored in the session variable.
    String fname = (String) hs.getAttribute("NCBIfile");
    System.out.println("fname = " + fname); //print the file name.

    File file = new File(fname); //create the File object for the XML file
    //create the DocumentBuilderFactory class object. This will be the handle to
    //the XML file. Create a document object and map it to the XML file.
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(file);
    doc.getDocumentElement().normalize();
    //get the root element and print its name to the Tomcat log.
    System.out.println("Root element " + doc.getDocumentElement().getNodeName());

    //total no. of records retrieved in the search
    //get the NodeList object and get the handle to the tag Count.
    NodeList nodeLst_cnt = doc.getElementsByTagName("Count");
    //get the first item for the tag Count.
    Element fstNmElmnt_cnt = (Element) nodeLst_cnt.item(0);
    //get all the child nodes for the tag Count.
    NodeList fstNm_cnt = fstNmElmnt_cnt.getChildNodes();
    //get the node value at the first child node and print the value to the Tomcat log file.
    System.out.println("Document Count : " + ((Node) fstNm_cnt.item(0)).getNodeValue());

    int rec_cnt = Integer.valueOf( ((Node) fstNm_cnt.item(0)).getNodeValue() );
    if (rec_cnt != 0 ) //check if the no. of records is > 0 or not.
    {
        System.out.println("rec_cnt not 0"); //print the message to the Tomcat log.
        //RetMax variable. Returns the no. of records to be shown per page. Here it is 20.
        //Now we try to get the value for the tag RetMax in the same way.
        NodeList nodeLst_retmax = doc.getElementsByTagName("RetMax");
        Element fstNmElmnt_retmax = (Element) nodeLst_retmax.item(0);
        NodeList fstNm_retmax = fstNmElmnt_retmax.getChildNodes();
        System.out.println("Document RetMax : " + ((Node) fstNm_retmax.item(0)).getNodeValue());
        int rec_max = Integer.valueOf( ((Node) fstNm_retmax.item(0)).getNodeValue() );

        //Now we try to get the value for the tag RetStart in the same way.
        NodeList nodeLst_retstart = doc.getElementsByTagName("RetStart");
        Element fstNmElmnt_retstart = (Element) nodeLst_retstart.item(0);
        NodeList fstNm_retstart = fstNmElmnt_retstart.getChildNodes();
        System.out.println("Document RetStart : " + ((Node) fstNm_retstart.item(0)).getNodeValue());
        int rec_start = Integer.valueOf( ((Node) fstNm_retstart.item(0)).getNodeValue() );

        //The node below is IdList. This tag contains a list of all the record id tags for the
        //search results. Thus, this will be a loop and have multiple child nodes.
        NodeList nodeLst = doc.getElementsByTagName("IdList"); //get the handle to the tag
        //print the number of tags present for IdList.
        System.out.println("Information of all ids WITH IdList Length=" + nodeLst.getLength() );
        //the tag IdList has several child nodes called Id. We need to first count the number of
        //tags and then parse each of the tags.



        for (int s = 0; s < nodeLst.getLength(); s++) //loop through all the "IdList" nodes.
        {
            Node fstNode = nodeLst.item(s); //get the handle to the node at position s.
            System.out.println("in first for loop ");
            //check if the node is of the type ELEMENT
            if (fstNode.getNodeType() == Node.ELEMENT_NODE)
            {
                System.out.println("in first if condition ");
                Element fstElmnt = (Element) fstNode; //cast the node to an Element
                //get the list of Id elements under this node.
                NodeList fstNmElmntLst = fstElmnt.getElementsByTagName("Id");
                //create the string array to store the Ids.
                String[] pubmedids = new String[fstNmElmntLst.getLength()];

                for (int h = 0; h < fstNmElmntLst.getLength(); h++) //gets all the Id tag nodes
                {
                    System.out.println("in second for loop ");
                    //get the handle to each element of the node at position h.
                    Element fstNmElmnt = (Element) fstNmElmntLst.item(h);
                    NodeList fstNm = fstNmElmnt.getChildNodes(); //get the child nodes.
                    pubmedids[h] = (String) fstNm.item(0).getNodeValue() ; //store the value in the array.
                    //print the node value.
                    System.out.println("Id : " + ( (Node) fstNm.item(0)).getNodeValue() + " ");
                    System.out.println("pubmedids@h=:" + h + "=" + pubmedids[h] + " ");
                }

                //store the variables in the session.
                hs.setAttribute("pubmedids", pubmedids);
                hs.setAttribute("dbname","genome");
                hs.setAttribute("retstart", String.valueOf(rec_start) );
                hs.setAttribute("retmax", String.valueOf(rec_max) );
                hs.setAttribute("retcnt",String.valueOf(rec_cnt));

                //print the values of the variables. Next close the loop.
                System.out.println("retstart="+rec_start);
                System.out.println("retmax="+rec_max);
                System.out.println("rec_cnt="+rec_cnt);
            }
        }
        //redirect to the next page to show the output. If there were no records
        //returned, then redirect to the first client page and ask the user to enter the
        //details again.

        response.sendRedirect("/ProteomDb/ShowDataFromNCBI.jsp");
    }
    else
    {
        System.out.println("rec_cnt=0");
        //if no records were found then redirect to the client page with a message.
        response.sendRedirect("/ProteomDb/ImportFromNCBIDb.jsp?noresultsflag=true&db");
    }
    //close the try-catch block. If there is any error present, then the description of the
    //same should be shown.

}
catch (Exception e) //catch the exception.
{
    e.printStackTrace(); //print the stack trace
}
finally {}
%>

4) Display the formatted output to the user

Create a simple HTML page using any HTML editor or write the code yourself. Screenshots below.



Right, now we have the output as formatted text. This output is in the form of a String array. Now we just need to loop through each element of the array and display it to the user in a tabular format.

This is very simple code and should be self-explanatory. Here we just get the values from the session variables and use them to choose the URL and the details page for each database. Please remember that this program allows you to browse through all the NCBI databases, and thus many cases, one for each database, will be present.

I suggest you go through the tutorials and references mentioned above in the manual before reading the code.

<%@ page language = "java" %>
<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>
<%@ page import = "java.util.*"%>
<%@ page import = "java.io.*"%>
<%@ page import="java.lang.*"%>
<%@ page import="java.net.*"%>
<%@ page import="java.nio.*"%>
<%@ page import = "java.sql.*" %>
<%@ include file = "header.jsp" %>
<%@ page import = "javax.xml.parsers.*" %>
<%@ page import = "org.w3c.dom.*" %>
<%
javax.servlet.http.HttpSession hs = request.getSession();
String pubmedids[] = (String[]) hs.getAttribute("pubmedids");
String next = "";
String previous = "" ;
System.out.println("ShowDataFromNCBI.jsp");
String loopflag = "yes";
String operation = "";
String recordspage = (String) hs.getAttribute("dbname");
String pmid_url_base = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gp&retmode=xml&id=";
String url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gp&retmode=xml&id=" ;
String pagename = "";
if ( recordspage.equals("genome") )
{
    url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&rettype=gp&retmode=xml&id=";
    pagename = "ImportGenomeDetails3.jsp";
    out.println("db=genome&pagename="+pagename+" ");
    hs.setAttribute("linkurl",url);
}
//http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=2&rettype=native&retmode=xml
if ( recordspage.equals("taxonomy") ) { url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&rettype=gp&retmode=xml&id="; pagename = "ImportTaxonomyDetails3.jsp"; hs.setAttribute("linkurl",url); }
//http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&id=254085,254086&retmode=xml
if ( recordspage.equals("unists") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&retmode=xml&id="; pagename = "ImportUnistsDetails3.jsp"; hs.setAttribute("linkurl",url); }
//http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&id=254085,254086&retmode=xml
if ( recordspage.equals("structure") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=structure&retmode=xml&id="; pagename = "ImportStructureDetails3.jsp"; hs.setAttribute("linkurl",url); }
//-------
if ( recordspage.equals("biosystems") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=biosystems&retmode=xml&id="; pagename = "ImportBioSystemsDetails3.jsp"; hs.setAttribute("linkurl",url); }
//books
if ( recordspage.equals("books") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=books&retmode=xml&id="; pagename = "ImportBooksDetails3.jsp"; hs.setAttribute("linkurl",url); }
//cancerchromosomes
if ( recordspage.equals("cancerchromosomes") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=cancerchromosomes&retmode=xml&id="; pagename = "ImportCancerChromosomesDetails3.jsp"; hs.setAttribute("linkurl",url); }
//cdd (the name is compared without a trailing space, otherwise this branch never matches)
if ( recordspage.equals("cdd") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=cdd&retmode=xml&id="; pagename = "ImportCddDetails3.jsp"; hs.setAttribute("linkurl",url); }
//gap
if ( recordspage.equals("gap") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gap&retmode=xml&id="; pagename = "ImportCancerChromosomesDetails3.jsp";
hs.setAttribute("linkurl",url); } //domains if ( recordspage.equals("domains") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=domains&retmode=xml&id="; pagename = "ImportDomainsDetails3.jsp"; hs.setAttribute("linkurl",url); } //gene if ( recordspage.equals("gene") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&retmode=xml&id="; pagename = "ImportGeneDetails3.jsp"; hs.setAttribute("linkurl",url); } //genomeprj if ( recordspage.equals("genomeprj") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=genomeprj&retmode=xml&id="; pagename = "ImportGenomeprjDetails3.jsp"; hs.setAttribute("linkurl",url); } // gensat if ( recordspage.equals("gensat") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gensat&retmode=xml&id="; pagename = "ImportGensatDetails3.jsp"; hs.setAttribute("linkurl",url); } //geo if ( recordspage.equals("geo") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=geo&retmode=xml&id="; pagename = "ImportGeoDetails3.jsp"; hs.setAttribute("linkurl",url); } //gds if ( recordspage.equals("gds") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gds&retmode=xml&id="; pagename = "ImportGdsDetails3.jsp"; hs.setAttribute("linkurl",url); } //homologene if ( recordspage.equals("homologene") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=homologene&retmode=xml&id="; pagename = "ImportHomologeneDetails3.jsp"; hs.setAttribute("linkurl",url); } // journals if ( recordspage.equals("journals") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=journals&retmode=xml&id="; pagename = "ImportJournalsDetails3.jsp"; hs.setAttribute("linkurl",url); } //mesh if ( recordspage.equals("mesh") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=mesh&retmode=xml&id="; pagename = "ImportMeshDetails3.jsp"; hs.setAttribute("linkurl",url); } //ncbisearch if ( recordspage.equals("ncbisearch") ) { url 
="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=ncbisearch&retmode=xml&id="; pagename = "ImportNcbisearchDetails3.jsp"; hs.setAttribute("linkurl",url); } //nlmcatalog if ( recordspage.equals("nlmcatalog") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nlmcatalog&retmode=xml&id="; pagename = "ImportNlmcatalogDetails3.jsp"; hs.setAttribute("linkurl",url); } //code starts // omia if ( recordspage.equals("omia") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=omia&retmode=xml&id="; pagename = "ImportOmiaDetails3.jsp"; hs.setAttribute("linkurl",url); } //omim if ( recordspage.equals("omim") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=omim&retmode=xml&id="; pagename = "ImportOmimDetails3.jsp"; hs.setAttribute("linkurl",url); } //pepdome if ( recordspage.equals("pepdome") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pepdome&retmode=xml&id="; pagename = "ImportPepdomeDetails3.jsp"; hs.setAttribute("linkurl",url); } //pmc if ( recordspage.equals("pmc") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pmc&retmode=xml&id="; pagename = "ImportPmcDetails3.jsp"; hs.setAttribute("linkurl",url); } // popset if ( recordspage.equals("popset") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=popset&retmode=xml&id="; pagename = "ImportPopsetDetails3.jsp"; hs.setAttribute("linkurl",url); } // probe if ( recordspage.equals("probe") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=probe&retmode=xml&id="; pagename = "ImportprobeDetails3.jsp"; hs.setAttribute("linkurl",url); } //proteinclusters if ( recordspage.equals("proteinclusters") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=proteinclusters&retmode=xml&id="; pagename = "ImportProteinclustersDetails3.jsp"; hs.setAttribute("linkurl",url); } //pcassay if ( recordspage.equals("pcassay") ) { url 
="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pcassay&retmode=xml&id="; pagename = "ImportPcassayDetails3.jsp"; hs.setAttribute("linkurl",url); } //code starts //pccompound if ( recordspage.equals("pccompound") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pccompound&retmode=xml&id="; pagename = "ImportPccompoundDetails3.jsp"; hs.setAttribute("linkurl",url); } //pcsubstance if ( recordspage.equals("pcsubstance") ) { url ="http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pcsubstance&retmode=xml&id="; pagename = "ImportPcsubstanceDetails3.jsp"; hs.setAttribute("linkurl",url); } //snp if ( recordspage.equals("snp") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&retmode=xml&id="; pagename = "ImportSnpdomeDetails3.jsp"; hs.setAttribute("linkurl",url); } //sra if ( recordspage.equals("sra") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=sra&retmode=xml&id="; pagename = "ImportSraDetails3.jsp"; hs.setAttribute("linkurl",url); } // toolkit if ( recordspage.equals("toolkit") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=toolkit&retmode=xml&id="; pagename = "ImportToolkitDetails3.jsp"; hs.setAttribute("linkurl",url); } // toolkitall if ( recordspage.equals("toolkitall") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=toolkitall&retmode=xml&id="; pagename = "ImportToolkitallDetails3.jsp"; hs.setAttribute("linkurl",url); } //unigene if ( recordspage.equals("unigene") ) { url = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unigene&retmode=xml&id="; pagename = "ImportUnigeneDetails3.jsp"; hs.setAttribute("linkurl",url); } //code ends //---------- String retstart = hs.getAttribute("retstart").toString(); String retmax = hs.getAttribute("retmax").toString() ; String retcnt = hs.getAttribute("retcnt").toString() ; System.out.println("url_string="+ hs.getAttribute("urlstring")); String urlstring=null; urlstring = 
(String)hs.getAttribute("urlstring"); System.out.println("url_sheesecond="+ urlstring); next = previous = urlstring ; out.println("retstart="+retstart); out.println("retmax="+retmax); out.println("retcnt="+retcnt); out.println("urstring="+urlstring); String tempvar = ""; tempvar = urlstring.substring(0, urlstring.indexOf("&retstart")) ; System.out.println("tempvar="+tempvar); urlstring= tempvar ; //String hrefvalue = "txtURLString="+urlstring ; int maxpage = 0 ; if ( ( retcnt != null ) && (retmax != null ) ) { maxpage = Integer.valueOf(retcnt) % Integer.valueOf(retmax) ; //no of pages to be shown System.out.println("maxpage="+maxpage); } if ( Integer.valueOf(retstart) < maxpage ) // if curpage is less than lastpage { next = tempvar + "&retstart=" + (Integer.valueOf(retstart)+1) + "&retmax=" + retmax; //find the value for previous if ( Integer.valueOf(retstart) > 0 ) { //previous = hrefvalue + "&retstart=" + (Integer.valueOf(retstart)-1) + "&retmax=" + retmax; previous = urlstring + "&retstart=" + (Integer.valueOf(retstart)-1) + "&retmax=" + retmax; System.out.println("previous="+previous); } else//first page { previous = "NA"; } } else { next = "NA"; //find the value for previous if ( Integer.valueOf(retstart) > 0 ) { //previous = hrefvalue + "&retstart=" + (Integer.valueOf(retstart)-1) + "&retmax=" + retmax; previous = urlstring + "&retstart=" + (Integer.valueOf(retstart)-1) + "&retmax=" + retmax; } else//first page { previous = "NA"; } } %> <html> <head> <title>New Page 1</title> <style type="text/css"> .style1 { border: 1px solid #3399FF; background-color: #99CCFF; font-size: small; } .style2 { border: 1px solid #3399FF; font-weight: bold; background-color: #99CCFF; font-size: small; } .style3 { border-collapse: collapse; border: 1px solid #3399FF; background-color: #FFFFFF; } .style4 { border: 1px solid #3399FF; } .style5 { text-align: right; } .style7 { font-size: large; font-weight: bold; } .style8 { font-size: large; } .style9 { text-align: center; } 
</style>
</head>
<body>
<form method="POST" action="cancer_des.jsp">
<div align="center">
<center>
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="68%" id="AutoNumber2" height="58">
<tr> <td width="150%" height="19"> Search Results Page for <%=(recordspage)%> Database</td> </tr>
<tr> <td width="150%" height="19" class="style9"> <%=("Select your choice and click your operation button")%> </td> </tr>
</table>
</center>
</div>
The following are the search results of your query. Please click on the link to view the record.
<div align="center">
<center>
<table cellpadding="0" cellspacing="0" style="padding: 2 4; width: 71%; height: 82px;" id="AutoNumber1" class="style3">
<tr>
<td align="center" style="width: 13%" class="style2">Serial No</td>
<td align="center" style="width: 23%" class="style2">Record ID</td>
<td align="center" style="width: 73%" class="style1">URL</td>
</tr>
<% for (int i=0;i < pubmedids.length ; i++ ) { %>
<tr>
<td bgcolor="#FFFFFF" align="center" style="width: 13%" class="style4"><%=(i+1)%></td>
<td bgcolor="#FFFFFF" align="center" style="width: 23%" class="style4"><%=( pubmedids[i])%></td>
<td bgcolor="#FFFFFF" align="center" style="width: 73%" class="style4"> <a href="<%=( pagename + "?loopflag=no&id=" + pubmedids[i] )%>"> View Record</a> </td>
</tr>
<% } %>
<tr>
<td bgcolor="#FFFFFF" align="center" style="width: 13%" class="style4"> </td>
<td bgcolor="#FFFFFF" align="center" style="width: 23%" class="style4"> </td>
<td bgcolor="#FFFFFF" align="center" style="width: 73%" class="style4">
<%
System.out.println("next="+next);
System.out.println("previous="+previous);
hs.setAttribute("next",next);
hs.setAttribute("previous",previous);
//compare strings with equals(), not with != (which compares object references)
if ( !"NA".equals(next) ) { %>
<a href="<%=( "ImportFromNCBI3.jsp?" + "loopflag=next")%>">Next</a>
<% } %>
<% if ( !"NA".equals(previous) ) { %>
<a href="<%=( "ImportFromNCBI3.jsp?" + "loopflag=previous")%>">Previous</a>
<% } %>
</td>
</tr>
</table>
</center>
</div>
<a href="mainpage.jsp">Back</a>
</form>
</body>
</html>

Screenshot of the execution of the above file (assuming it has received the output from the previous step of parsing the XML file).
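The Next and Previous links depend on three numbers returned by eSearch: Count, RetStart and RetMax. In the E-utilities, retstart is the zero-based index of the first record to return, not a page number, so one way to find the neighbouring pages is to step retstart by retmax. The sketch below shows only that arithmetic on plain integers. The class name PagingSketch and its methods are hypothetical and are not part of the JSP above, and it returns -1 where the JSP uses the string "NA". Note that the number of pages comes from a division rounded up; the remainder operator (%) would give the number of records on the last partial page instead.

```java
public class PagingSketch {

    // Number of pages needed to show count records, retmax records per page (rounded up).
    static int pageCount(int count, int retmax) {
        return (count + retmax - 1) / retmax;
    }

    // retstart for the next page, or -1 when the current page is the last one.
    static int nextStart(int count, int retstart, int retmax) {
        int candidate = retstart + retmax;
        return candidate < count ? candidate : -1;
    }

    // retstart for the previous page, or -1 when the current page is the first one.
    static int previousStart(int retstart, int retmax) {
        return retstart > 0 ? Math.max(0, retstart - retmax) : -1;
    }

    public static void main(String[] args) {
        int count = 45;   // hypothetical total number of hits
        int retmax = 20;  // records per page, as in the manual
        System.out.println("pages=" + pageCount(count, retmax));
        // Walk forward through the result set one page at a time.
        for (int start = 0; start != -1; start = nextStart(count, start, retmax)) {
            System.out.println("retstart=" + start
                + " next=" + nextStart(count, start, retmax)
                + " previous=" + previousStart(start, retmax));
        }
    }
}
```

A link for the next page would then be built by appending "&retstart=" and "&retmax=" with these values to the base search URL, in the same way the JSP does.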

So now we have displayed the list of records. Next, the client will click on a particular record. This request is sent to the NCBI server in the same manner. We follow the same 4 steps (Step 1 being the output file of the last request). The final output will be a complete set of details for the selected record, as shown in this file.
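Which URL is used for that request depends on the database the user searched. The long chain of if statements in ShowDataFromNCBI.jsp can also be written as a lookup table. The following is a minimal sketch of that idea, not the code used in the application: the class DbLinks and its nested class Target are hypothetical, and only three of the databases are filled in, with URLs and page names copied from the listing above.

```java
import java.util.HashMap;
import java.util.Map;

public class DbLinks {

    // Holds the E-utilities URL prefix and the JSP page that shows one record.
    static final class Target {
        final String url;
        final String page;
        Target(String url, String page) {
            this.url = url;
            this.page = page;
        }
    }

    static final String EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";
    static final String ESUMMARY = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi";

    // One entry per database name; the rest of the databases would be added the same way.
    static final Map<String, Target> TARGETS = new HashMap<>();
    static {
        TARGETS.put("genome", new Target(EFETCH + "?db=genome&rettype=gp&retmode=xml&id=", "ImportGenomeDetails3.jsp"));
        TARGETS.put("gene", new Target(ESUMMARY + "?db=gene&retmode=xml&id=", "ImportGeneDetails3.jsp"));
        TARGETS.put("cdd", new Target(ESUMMARY + "?db=cdd&retmode=xml&id=", "ImportCddDetails3.jsp"));
    }

    // Returns the target for a database name, or null if the name is unknown.
    // Trimming guards against stray spaces in the stored name.
    static Target lookup(String dbname) {
        return dbname == null ? null : TARGETS.get(dbname.trim());
    }

    // Builds the link shown in the "View Record" column for one record id.
    static String recordLink(String dbname, String id) {
        Target target = lookup(dbname);
        if (target == null) {
            throw new IllegalArgumentException("Unknown database: " + dbname);
        }
        return target.page + "?loopflag=no&id=" + id;
    }

    public static void main(String[] args) {
        System.out.println(lookup("genome").url);
        System.out.println(recordLink("gene", "12345"));
    }
}
```

With a table like this, adding a database is one line, and a mistyped name shows up as an unknown key instead of a branch that silently never runs.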



So the first step is the output file having the list of records from the eSearch utility. Now the request goes to the Java program that sends it to the NCBI server. We use the same program, but the URL will be different. We use the eFetch utility to get the details in XML format.

The Java program to send the request and receive the response follows. The code should be self-explanatory now, since the previous Java program was explained in the same fashion.

File Name:- ImportGenomeDetails3.jsp

<%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%>
<%@ page language = "java"%>
<%@ page import = "java.util.*"%>
<%@ page import = "java.io.*"%>
<%@ page import="java.lang.*"%>
<%@ page import="java.net.*"%>
<%@ page import="java.nio.*"%>
<%@ page import= "javax.xml.parsers.*" %>
<%@ page import= "org.w3c.dom.*" %>
<%
System.out.println("ImportGenomeDetails3.jsp");
javax.servlet.http.HttpSession hs = request.getSession();
try
{
    String URLString = "";
    Properties systemSettings = System.getProperties();
    systemSettings.put("http.proxyHost", "proxy.it.iitb.ac.in");
    systemSettings.put("http.proxyPort", "80");
    systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
    systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
    Authenticator.setDefault(new Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
            // specify your username and password of the iitb login
            return new PasswordAuthentication("aaqua", "aaqua123".toCharArray());
        }
    });
    System.setProperties(systemSettings);
    System.out.println(" System Properties Set");
    String retstart = "";
    String retmax = "";
    URLString = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&rettype=gp&retmode=xml&id=" + request.getParameter("id").toString();
    System.out.println(" URL variables set="+URLString);
    URL url = new URL(URLString); //url string built from the id passed by the user.
    HttpURLConnection connection = null;
    connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("POST");
    connection.setDoInput(true);
    connection.setDoOutput(true);
    connection.setUseCaches(false);
    connection.setAllowUserInteraction(false);
    //the inner quotes around utf-8 must be escaped inside the Java string
    connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
    System.out.println(" connection Set");
    BufferedReader in = new BufferedReader( new InputStreamReader(connection.getInputStream()));
    String decodedString;
    String tempstr = "";
    System.out.println("Reader Set");
    Random rn = new Random();
    int rnval = rn.nextInt() ;
    String fname = "genomefile_ncbi_"+ String.valueOf(rnval) + ".xml" ;
    //File file = new File(fname);
    BufferedWriter bw = new BufferedWriter(new FileWriter(fname));
    while ((decodedString = in.readLine()) != null)
    {
        tempstr = tempstr + decodedString;
        //out.println(tempstr);
        bw.write(decodedString);
        if ( tempstr.indexOf("/") == -1 )
        {
            bw.newLine();
        }
    }
    System.out.println(" output given.Set");
    bw.close();
    in.close();
    hs.setAttribute("genomencbifile",fname.toString());
    hs.setAttribute("urlstring",URLString);
    System.out.println("Session variable Set");
    System.out.println( "fname="+ fname.toString() );
    System.out.println( "urlstring=" + URLString );
    response.sendRedirect("/ProteomDb/FetchGenomeDetailsDataFromNCBI.jsp");
}
catch(Exception ex)
{
    out.println("Exception->"+ex);
    PrintWriter pw = response.getWriter();
    ex.printStackTrace(pw);
}
%>

I hope the code is self-explanatory. It is similar to Step 2 of fetching records using the eSearch utility.

The next step is actually parsing the XML file that is generated. We assume here that you have access to the XML file on your machine.

This file is very complicated to parse and contains nested loops. If you have understood the previous file, then you can easily follow the flow and parse through each tag. You will need to keep the XML file open in another window so that you can understand the file we discuss here.
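The nested loops are easier to follow on a small example first. The sketch below walks the GBReference elements and, inside each one, its GBAuthor elements. The sample XML is invented and much shorter than a real eFetch reply, and the class name GbReferenceSketch and the helper childText are hypothetical. The tag names are the ones used in the JSP that follows.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GbReferenceSketch {

    // Invented sample: two references, the first with two authors, the second with one.
    static final String SAMPLE =
        "<GBSet><GBSeq><GBSeq_locus>DEMO_1</GBSeq_locus><GBSeq_references>"
      + "<GBReference><GBReference_title>First paper</GBReference_title>"
      + "<GBReference_authors><GBAuthor>Author,A.</GBAuthor><GBAuthor>Author,B.</GBAuthor></GBReference_authors>"
      + "</GBReference>"
      + "<GBReference><GBReference_title>Second paper</GBReference_title>"
      + "<GBReference_authors><GBAuthor>Author,C.</GBAuthor></GBReference_authors>"
      + "</GBReference>"
      + "</GBSeq_references></GBSeq></GBSet>";

    // Text of the first descendant of parent with this tag name, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // Returns one line per reference: "title: author; author".
    static List<String> describeReferences(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        NodeList refs = doc.getElementsByTagName("GBReference");
        // Outer loop: one pass per reference, indexed by the outer loop variable h.
        for (int h = 0; h < refs.getLength(); h++) {
            Element ref = (Element) refs.item(h);
            StringBuilder line = new StringBuilder(childText(ref, "GBReference_title")).append(": ");
            // Inner loop: search inside this reference only, so authors are not mixed up.
            NodeList authors = ref.getElementsByTagName("GBAuthor");
            for (int l = 0; l < authors.getLength(); l++) {
                if (l > 0) {
                    line.append("; ");
                }
                line.append(authors.item(l).getTextContent().trim());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeReferences(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

The key point is that the inner search starts from the current reference element rather than from the whole document, and that each loop uses its own index variable.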
File -> FetchGenomeDetailsFromNCBI.jsp //set the language to java and the encoding type to XML/Text <%@ page language = "java" %> <%//@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8"%> //import the java libraries. <%@ page import = "java.util.*"%> <%@ page import = "java.io.*"%> <%@ page import="java.lang.*"%> <%@ page import="java.net.*"%> <%@ page import="java.nio.*"%> <%@ page import = "java.sql.*" %> <% //@ include file = "header.jsp" %> <%@ page import = "javax.xml.parsers.*" %> <%@ page import = "org.w3c.dom.*" %> <% //get the base URL String pmid_url_base ="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids="; out.println("FetchGenomeDetailsFromNCBI.jsp"); try { //create the session object javax.servlet.http.HttpSession hs = request.getSession(); //get the file name from the session variable String fname = (String) hs.getAttribute("genomencbifile"); //print the file name out.println("fname = " + fname.toString() ); //get the handle of the file name File file = new File(fname); DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); DocumentBuilder db = dbf.newDocumentBuilder(); Document doc = db.parse(file); doc.getDocumentElement().normalize(); //get the value for the parent element GbSet(an element without any child //nodes) NodeList nodeLst_gbset = doc.getElementsByTagName("GBSet"); Element fstNmElmnt_gbset = (Element) nodeLst_gbset.item(0); NodeList fstNm_gbset = fstNmElmnt_gbset.getChildNodes(); //out.println("Document GBSet : " + ((Node) fstNm_gbset.item(0)).getNodeValue()); out.println("GBSet node "); //get the value of the node GBSeq NodeList nodeLst = doc.getElementsByTagName("GBSeq"); //out.println(" Information of all ids.Length="+nodeLst.getLength() + " "); Node fstNode = nodeLst.item(0); // get the value of the node GbSeq_locus NodeList nodeLst_gbseqlocus = doc.getElementsByTagName("GBSeq_locus"); Element fstNmElmnt_gbseqlocus = (Element) nodeLst_gbseqlocus.item(0); NodeList 
fstNm_gbseqlocus = fstNmElmnt_gbseqlocus.getChildNodes(); out.println(" Document GBSeqlocus : " + ((Node)fstNm_gbseqlocus.item(0)).getNodeValue() + " "); // GbSeq_locus. // get the value of the node GbSeq_length NodeList nodeLst_gbseqlen = doc.getElementsByTagName("GBSeq_length"); Element fstNmElmnt_gbseqlen = (Element) nodeLst_gbseqlen.item(0); NodeList fstNm_gbseqlen = fstNmElmnt_gbseqlen.getChildNodes(); out.println("Document GBSeqlen : " + ((Node)fstNm_gbseqlen.item(0)).getNodeValue() + " "); // GbSeq_length code ends.. // get the value of the node GBSeq_definition NodeList nodeLst_gbseqdefinition = doc.getElementsByTagName("GBSeq_definition"); Element fstNmElmnt_gbseqdefinition = (Element) nodeLst_gbseqdefinition.item(0); NodeList fstNm_gbseqdefinition = fstNmElmnt_gbseqdefinition.getChildNodes(); out.println(" Document GBSeq_definition : " + ((Node)fstNm_gbseqdefinition.item(0)).getNodeValue() + " "); //GBSeq_definition ends // get the value of the node GBSeq_strandedness NodeList nodeLst_gbseqstrandedness = doc.getElementsByTagName("GBSeq_strandedness"); Element fstNmElmnt_gbseqstrandedness = (Element) nodeLst_gbseqstrandedness.item(0); NodeList fstNm_gbseqstrandedness = fstNmElmnt_gbseqstrandedness.getChildNodes(); out.println(" Document GBSeq_strandedness : " + ((Node)fstNm_gbseqstrandedness.item(0)).getNodeValue() + " "); //GBSeq_strandedness ends // get the value of the node GBSeq_moltype NodeList nodeLst_gbseqmoltype = doc.getElementsByTagName("GBSeq_moltype"); Element fstNmElmnt_gbseqmoltype = (Element) nodeLst_gbseqmoltype.item(0); NodeList fstNm_gbseqmoltype = fstNmElmnt_gbseqmoltype.getChildNodes(); out.println(" Document GBSeq_moltype : " + ((Node)fstNm_gbseqmoltype.item(0)).getNodeValue() + " "); //GBSeq_moltype // get the value of the node GBSeq_topology NodeList nodeLst_gbseqtopology = doc.getElementsByTagName("GBSeq_topology"); Element fstNmElmnt_gbseqtopology = (Element) nodeLst_gbseqtopology.item(0); NodeList fstNm_gbseqtopology = 
fstNmElmnt_gbseqtopology.getChildNodes(); out.println(" Document GBSeq_topology : " + ((Node)fstNm_gbseqtopology.item(0)).getNodeValue() + " "); //GBSeq_topology code ends // get the value of the node GBSeq_division NodeList nodeLst_gbseqdivision = doc.getElementsByTagName("GBSeq_division"); Element fstNmElmnt_gbseqdivision = (Element) nodeLst_gbseqdivision.item(0); NodeList fstNm_gbseqdivision = fstNmElmnt_gbseqdivision.getChildNodes(); out.println(" Document GBSeq_division : " + ((Node)fstNm_gbseqdivision.item(0)).getNodeValue() + " "); //GBSeq_division code ends // get the value of the node GBSeq_update-date NodeList nodeLst_gbsequpdatedate = doc.getElementsByTagName("GBSeq_update-date"); Element fstNmElmnt_gbsequpdatedate = (Element) nodeLst_gbsequpdatedate.item(0); NodeList fstNm_gbsequpdatedate = fstNmElmnt_gbsequpdatedate.getChildNodes(); out.println(" Document GBSeq_updatedate : " + ((Node)fstNm_gbsequpdatedate.item(0)).getNodeValue() + " "); //GBSeq_update-date code ends // get the value of the node GBSeq_create-date NodeList nodeLst_gbseqcreatedate = doc.getElementsByTagName("GBSeq_create-date"); Element fstNmElmnt_gbseqcreatedate = (Element) nodeLst_gbseqcreatedate.item(0); NodeList fstNm_gbseqcreatedate = fstNmElmnt_gbseqcreatedate.getChildNodes(); out.println(" Document GBSeq_createdate : " + ((Node)fstNm_gbseqcreatedate.item(0)).getNodeValue() + " "); //GBSeq_create-date code ends // get the value of the node GBSeq_primary-accession NodeList nodeLst_gbseqpriacc = doc.getElementsByTagName("GBSeq_primary-accession"); Element fstNmElmnt_gbseqpriacc = (Element) nodeLst_gbseqpriacc.item(0); NodeList fstNm_gbseqpriacc = fstNmElmnt_gbseqpriacc.getChildNodes(); out.println("Document GBSeq_primary-accession : " + ((Node)fstNm_gbseqpriacc.item(0)).getNodeValue() + " "); //GBSeq_primary-accession ends.. 
// get the value of the node GBSeq_accession-version
NodeList nodeLst_gbseqpriaccver = doc.getElementsByTagName("GBSeq_accession-version");
Element fstNmElmnt_gbseqpriaccver = (Element) nodeLst_gbseqpriaccver.item(0);
NodeList fstNm_gbseqpriaccver = fstNmElmnt_gbseqpriaccver.getChildNodes();
out.println(" Document GBSeq_accession-version: " + ((Node)fstNm_gbseqpriaccver.item(0)).getNodeValue() + " ");
//GBSeq_accession-version ends..
// get the value of the node GBSeq_source
NodeList nodeLst_gbseqsource = doc.getElementsByTagName("GBSeq_source");
Element fstNmElmnt_gbseqsource = (Element) nodeLst_gbseqsource.item(0);
NodeList fstNm_gbseqsource = fstNmElmnt_gbseqsource.getChildNodes();
out.println(" Document GBSeq_source: " + ((Node)fstNm_gbseqsource.item(0)).getNodeValue() + " ");
//GBSeq_source ends.
// get the value of the node GBSeq_organism
NodeList nodeLst_gbseqorg = doc.getElementsByTagName("GBSeq_organism");
Element fstNmElmnt_gbseqorg = (Element) nodeLst_gbseqorg.item(0);
NodeList fstNm_gbseqorg = fstNmElmnt_gbseqorg.getChildNodes();
out.println(" Document GBSeq_organism: " + ((Node)fstNm_gbseqorg.item(0)).getNodeValue() + " ");
//GBSeq_organism ends
// get the value of the node GBSeq_taxonomy
NodeList nodeLst_gbseqtax = doc.getElementsByTagName("GBSeq_taxonomy");
Element fstNmElmnt_gbseqtax = (Element) nodeLst_gbseqtax.item(0);
NodeList fstNm_gbseqtax = fstNmElmnt_gbseqtax.getChildNodes();
out.println(" Document GBSeq_taxonomy: " + ((Node)fstNm_gbseqtax.item(0)).getNodeValue() + " ");
//GBSeq_taxonomy ends
// get the value of the node GBSeq_references
NodeList nodeLst_GbSeqRefs = doc.getElementsByTagName("GBSeq_references");
//out.println(" Node GBSeq_references ..Length="+nodeLst_GbSeqRefs.getLength() +" ");
//out.println("Node GBSeq_references ");
//loop through all the records of this tag - GBSeq_references.
for (int s = 0; s < nodeLst_GbSeqRefs.getLength(); s++)
{
//get the value for the node item at position s.
Node fstNode_GbSeqRefNode = nodeLst_GbSeqRefs.item(s);
// out.println(" fst_Node_GbSeqRefNode first for loop ");
if (fstNode_GbSeqRefNode.getNodeType() == Node.ELEMENT_NODE) //if condition for the GBSeq_references node
{
    //out.println(" second if condition ");
    //out.println("fstNode_GbSeqRefNode in first if condition ");
    //Element fstElmnt_gbref = (Element) fstNode;
    //get the node for GBReference.
    NodeList fstNmElmntLst_gbref = doc.getElementsByTagName("GBReference");
    //String[] pubmedids = new String[fstNmElmntLst.getLength()];
    //loop through all the GBReference nodes
    for (int h = 0; h < fstNmElmntLst_gbref.getLength(); h++) //gets all the GBReference tag nodes
    {
        //get the node of the item at position h (the index of this loop, not s)
        Node fstNode_GbSeqRefNode2 = fstNmElmntLst_gbref.item(h);
        if (fstNode_GbSeqRefNode2.getNodeType() == Node.ELEMENT_NODE) //if condition - gets all the GBReference tag nodes
        {
            // out.println(" fst_Node_GbSeqRefNode2 second for loop ");
            // out.println(" fstNmElmntLst_gbref loop ");
            Element fstNmElmnt_gbref_ref = (Element) fstNode_GbSeqRefNode2;
            //get node GBReference_reference
            NodeList nodeLst_gbref_ref = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_reference");
            Element nodeLst_gbref_refElement = (Element)nodeLst_gbref_ref.item(0);
            NodeList textnodeLst_gbref_ref = nodeLst_gbref_refElement.getChildNodes();
            out.println(" Document GBReference_reference: " + ((Node)textnodeLst_gbref_ref.item(0)).getNodeValue() + " ");
            //code ends for GBReference_reference
            //get node GBReference_position
            NodeList nodeLst_gbref_pos = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_position");
            Element nodeLst_gbref_posElement = (Element)nodeLst_gbref_pos.item(0);
            NodeList textnodeLst_gbref_pos = nodeLst_gbref_posElement.getChildNodes();
            out.println(" Document GBReference_position: " + ((Node)textnodeLst_gbref_pos.item(0)).getNodeValue() + " ");
            //code ends for GBReference_position
            //gbauth loop for authors fields starts
            //Element fstElmnt_gbref_auth = (Element) fstNodeauth;
            NodeList fstNmElmntLst_gbauth = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_authors");
            for (int k = 0; k < fstNmElmntLst_gbauth.getLength(); k++) //gets all the GBReference_authors tag nodes
            {
                Node fstNode_GbSeqRefNode3 = fstNmElmntLst_gbauth.item(k);
                if (fstNode_GbSeqRefNode3.getNodeType() == Node.ELEMENT_NODE)
                {
                    //out.println(" fst_Node_GbSeqRefNode3 authors second for loop ");
                    //out.println(" fstNmElmntLst_gbauth loop ");
                    Element fstNmElmnt_gbref_auth = (Element) fstNode_GbSeqRefNode3;
                    //get node GBAuthor
                    NodeList nodeLst_gbref_gbauth = fstNmElmnt_gbref_auth.getElementsByTagName("GBAuthor");
                    //loop for all records.
                    //out.println(" No of gbAuthors =" + nodeLst_gbref_gbauth.getLength() + " ");
                    for (int l = 0; l < nodeLst_gbref_gbauth.getLength(); l++) //gets all the GBAuthor tag nodes
                    {
                        out.println(" Author No=" + l + "=" + nodeLst_gbref_gbauth.item(l).getChildNodes().item(0).getNodeValue() + " ");
                    }//code ends
                }
            }//gbauth loop ends
            //get node GBReference_title
            NodeList nodeLst_gbref_title = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_title");
            Element nodeLst_gbref_titleElement = (Element)nodeLst_gbref_title.item(0);
            NodeList textnodeLst_gbref_title = nodeLst_gbref_titleElement.getChildNodes();
            out.println(" Document GBReference_title: " + ((Node)textnodeLst_gbref_title.item(0)).getNodeValue() + " ");
            //code ends for GBReference_title
            //get node GBReference_journal
            NodeList nodeLst_gbref_journal = fstNmElmnt_gbref_ref.getElementsByTagName("GBReference_journal");
            Element nodeLst_gbref_journalElement = (Element)nodeLst_gbref_journal.item(0);
            NodeList textnodeLst_gbref_journal = nodeLst_gbref_journalElement.getChildNodes();
            out.println(" Document GBReference_journal: " + ((Node)textnodeLst_gbref_journal.item(0)).getNodeValue() + " ");
            //code ends for GBReference_journal
            //out.println("Features - Location/Qualifiers");
            //features code starts GBSeq_feature-table
            //Element fstElmnt_gbref = (Element) fstNode;
            NodeList fstNmElmntLst_gbfeattab =
doc.getElementsByTagName("GBSeq_feature-table"); for (int z = 0; z < fstNmElmntLst_gbfeattab.getLength(); z++) //gets all the GBSeq_feature-tabletag nodes { Node fstNode_GbSeqRefNode7 = fstNmElmntLst_gbfeattab.item(z); if (fstNode_GbSeqRefNode7.getNodeType() == Node.ELEMENT_NODE) //if condition forGBSeq_feature-table { //out.println(" fst_Node_GbSeqRefNode7 features table second for loop "); //out.println(" fstNmElmntLst_gbfeattab loop "); Element fstNmElmnt_gbfeat = (Element) fstNode_GbSeqRefNode7; //get node GB feat tab NodeList nodeLst_gbfeat = fstNmElmnt_gbfeat.getElementsByTagName("GBFeature"); //loop for all records. //out.println(" No of GBFeature =" + nodeLst_gbfeat.getLength() + " "); for (int y = 0; y < nodeLst_gbfeat.getLength(); y++) //gets all the gbfeature tag nodes { Node fstNode_GbSeqRefNode8 = nodeLst_gbfeat.item(y); out.println(" GBFeature No=" + y + "=" +nodeLst_gbfeat.item(y).getChildNodes().item(0).getNodeValue() + " "); if (fstNode_GbSeqRefNode8.getNodeType() == Node.ELEMENT_NODE) { //out.println(" fst_Node_GbSeqRefNode8 features table second for loop "); Element fstNmElmnt_gbfeat2 = (Element) fstNode_GbSeqRefNode8; NodeList textnodeLst_gbfeatkey = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_key"); out.println("Document GBFeature_key: " + ((Node)textnodeLst_gbfeatkey.item(0)).getChildNodes().item(0).getNodeValue() + " "); out.println(" "); //GBFeature_location NodeList textnodeLst_gbfeatloc = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_location"); out.println("Document GBFeature_location: " + ((Node)textnodeLst_gbfeatloc.item(0)).getChildNodes().item(0).getNodeValue() + " "); //code ends out.println(" "); //GBFeature_intervals NodeList textnodeLst_gbfeatint = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_intervals"); //gets all the gbreference tag nodes for (int x = 0; x < textnodeLst_gbfeatint.getLength(); x++) { Node fstNode_GbSeqRefNode9 = textnodeLst_gbfeatint.item(x); if (fstNode_GbSeqRefNode9.getNodeType() == 
Node.ELEMENT_NODE) { Element fstNmElmnt_gbfeatint2 = (Element) fstNode_GbSeqRefNode9; NodeList textnodeLst_gbintfrom = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_from"); out.println("Document GBFeature_Interval From: " + ((Node)textnodeLst_gbintfrom.item(0)).getChildNodes().item(0).getNodeValue() + " "); //GBInterval_to NodeList textnodeLst_gbintto = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_to"); out.println("Document GBFeature_Interval To: " + ((Node)textnodeLst_gbintto.item(0)).getChildNodes().item(0).getNodeValue() + " "); //GBInterval_accession //GBInterval_to NodeList textnodeLst_gbintAcc = fstNmElmnt_gbfeatint2.getElementsByTagName("GBInterval_accession"); out.println("Document GBFeature_ GBInterval_accession: " + ((Node)textnodeLst_gbintAcc.item(0)).getChildNodes().item(0).getNodeValue() + " "); } } out.println(" "); //GBFeature_intervals code ends //GBFeature_quals NodeList textnodeLst_gbfeatquals = fstNmElmnt_gbfeat2.getElementsByTagName("GBFeature_quals"); //out.println("Length="+textnodeLst_gbfeatquals.getLength() + " "); for ( int x = 0; x < textnodeLst_gbfeatquals.getLength(); x++) //gets all the gbreference tagnodes { Node fstNode_GbSeqRefNode10 = textnodeLst_gbfeatquals.item(x); if (fstNode_GbSeqRefNode10.getNodeType() == Node.ELEMENT_NODE) { Element fstNmElmnt_gbfeatint2 = (Element) fstNode_GbSeqRefNode10; //GBQualifier_name NodeList textnodeLst_gbintfrom = fstNmElmnt_gbfeatint2.getElementsByTagName("GBQualifier_name"); out.println("GBQualifier_name Length="+textnodeLst_gbintfrom.getLength() + " "); //GBQualifier_value NodeList textnodeLst_gbqualval = fstNmElmnt_gbfeatint2.getElementsByTagName("GBQualifier_value"); //out.println("GBQualifier_name Length="+textnodeLst_gbqualval.getLength() + " "); out.println(" "); //gets all the gbreference tag nodes for ( int d = 0; d < textnodeLst_gbintfrom.getLength(); d++) { out.println("DocumentGBSeqRefNode10: " + ((Node) 
textnodeLst_gbintfrom.item(x)).getChildNodes().item(d).getNodeValue() + " "); out.println("GBQualifier_name No= " + d + "=" +textnodeLst_gbintfrom.item(d).getChildNodes().item(0).getNodeValue() + " "); out.println("GBQualifier_value No= " + d + "=" +textnodeLst_gbqualval.item(d).getChildNodes().item(0).getNodeValue() + " "); } } } //GBFeature_quals code ends } }//gbfeature tag nodes code ends }//if condition for GBSeq_feature-table ends here.. }//GBSeq_feature-table code ends here@ for loop }////if condition -gets all the gbreference tag nodes ends }////condition -gets all the gbreference tag nodes for ends //NodeList fstNmElmntLst_gbsequence = doc.getElementsByTagName("GBSeq_sequence"); NodeList nodeLst_gbsequence = doc.getElementsByTagName("GBSeq_sequence"); Element fstNmElmnt_gbsequence = (Element) nodeLst_gbsequence.item(0); NodeList fstNm_gbsequence = fstNmElmnt_gbsequence.getChildNodes(); out.println(" Document GBsequence : " + ((Node)fstNm_gbsequence.item(0)).getNodeValue() + " "); out.println(" "); //GBSeq_sequence code goes here. }////if condition for idlist nodes }//for ends@idlist nodes . //GBSeq_references ends } catch (Exception e) { e.printStackTrace(); } finally {} %>
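The listing above repeats the same steps for every tag: find the element, take its first child node, and read the node value. Any tag that is missing from the returned XML makes item(0) return null, and the whole page then ends in the catch block. The following is only a minimal sketch of how that pattern could be wrapped in one null-safe helper. The class name GbXmlText and the methods firstText and parse are hypothetical names chosen for this illustration and are not part of the program above; getElementsByTagName and getTextContent are standard org.w3c.dom calls. In a JSP page such a helper would have to live in a separate class or in a declaration block, not inside the scriptlet itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original listing.
class GbXmlText {

    // Returns the trimmed text of the first descendant element named tag,
    // or the fallback string when no such element exists under parent.
    static String firstText(Element parent, String tag, String fallback) {
        NodeList matches = parent.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return fallback;
        }
        String text = matches.item(0).getTextContent();
        return text == null ? fallback : text.trim();
    }

    // Parses an XML string into a DOM document (used here only for the demo).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written sample that only imitates two GBSeq tag names.
        String xml = "<GBSeq><GBSeq_organism>Homo sapiens</GBSeq_organism></GBSeq>";
        Element root = parse(xml).getDocumentElement();

        // Tag present: prints its text.
        System.out.println(firstText(root, "GBSeq_organism", "(missing)"));
        // Tag absent: prints the fallback instead of throwing.
        System.out.println(firstText(root, "GBSeq_taxonomy", "(missing)"));
    }
}
```

With such a helper, each four-line block in the listing could shrink to a single call, for example out.println(" Document GBSeq_source: " + GbXmlText.firstText(doc.getDocumentElement(), "GBSeq_source", "")).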
