EdoRoth.max Sigalov.poster

8/4/2019 EdoRoth.max Sigalov.poster


Edo Roth and Maxim Sigalov under the supervision of Prof. Rohit Bhargava and Dr. David Mayerich

University Laboratory High School and the Department of Bioengineering, College of Engineering, University of Illinois at Urbana-Champaign

Aim

Currently, a vast number of frequencies are used in this system of spectroscopy. Our purpose in this project is to reduce the number of frequencies used in classification, but still maintain a high classification accuracy. In fact, by reducing the number of data points used by the imaging system, the imaging speed is increased, the data takes up less space, and noise can be reduced, improving accuracy.

Also, the data that is generated needs to be effectively classified. As there is no intuitive way to classify tissue types from infrared images, some system must be used. This system needs to be efficient, accurate, and versatile (so that it performs well on a variety of data sets).

Finally, we must have a method that determines the accuracy of the tissue classifier. Without it, there is no way to judge how accurate our predictions are, and we cannot isolate the correct frequencies.

Introduction

Presently, tissue classification, especially in cancer tissue, can be a complicated process. Often, chemical tests are used for this classification, but the use of spectroscopy provides information about the tissue chemistry, allowing for more accurate classification.

Most current imaging techniques utilize visible light, with a red, a green, and a blue value associated with each pixel in the image. However, in Professor Rohit Bhargava's lab, a large number of values across the infrared spectrum are taken at each point. This provides much more detailed information for classifying tissue.

However, infrared imaging often results in data that is complex and takes up large amounts of space. Also, the method of determining tissue types from the spectrum is not obvious, and noise in the spectrum can lead to misclassification.

Method

Our project involved heavy use of MATLAB, a high-level programming language that works especially well with matrices. We use MATLAB in nearly every facet of our work, from creating the classifier to measuring its accuracy.

To classify spectra into tissue types, we primarily use neural networks. A neural network is an algorithm that classifies data matrices into result matrices. However, one does not write the algorithm, as it generates itself from sample data and target matrices. The algorithm returns a matrix with its predictions for each point of the sample.
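The idea that the network "generates itself" from sample data and targets can be illustrated with a minimal sketch. This is not the program described on this poster, which was written in MATLAB and is not shown here. It is a small Python stand-in under stated assumptions: the three-band mock spectra, the single hidden layer of three units, the learning rate, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, x):
    """Return (hidden activations, output probability) for one spectrum."""
    w_hidden, w_out = network
    # The last entry of each weight row is a bias term.
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hidden]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h)) + w_out[-1])
    return h, y


def train(samples, targets, hidden=3, epochs=2000, rate=0.5, seed=0):
    """Fit a one-hidden-layer network by stochastic gradient descent."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    network = (w_hidden, w_out)
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            h, y = forward(network, x)
            # Error signal at the output (cross-entropy loss, sigmoid output).
            d_out = y - t
            for j in range(hidden):
                # Propagate the error back to hidden unit j before updating w_out[j].
                d_hid = d_out * w_out[j] * h[j] * (1.0 - h[j])
                for i in range(n_in):
                    w_hidden[j][i] -= rate * d_hid * x[i]
                w_hidden[j][-1] -= rate * d_hid
                w_out[j] -= rate * d_out * h[j]
            w_out[-1] -= rate * d_out
    return network


# Hypothetical mock spectra (three bands each); target 1 = epithelium, 0 = stroma.
samples = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.2], [0.85, 0.25, 0.15],
           [0.2, 0.8, 0.7], [0.1, 0.9, 0.8], [0.15, 0.85, 0.75]]
targets = [1, 1, 1, 0, 0, 0]

network = train(samples, targets)
probs = [forward(network, x)[1] for x in samples]
labels = ["E" if p >= 0.5 else "S" for p in probs]
print(labels)
```

Nothing in the sketch states a rule for telling the two tissue types apart: the weights are adjusted only from the samples and their targets. The 0.5 cutoff used to turn a probability into an E or S label is likewise an assumption.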

To then determine the accuracy of the neural network, several techniques can be utilized. The most obvious is total accuracy, which is simply the fraction of points for which the classifier predicts the correct type of tissue. A method of gauging accuracy that is more commonly used in medicine, however, is the use of Receiver Operating Characteristic (ROC) curves. ROC curves show the trade-off between sensitivity and specificity as the decision threshold of the classifier is varied, and so give a graphical representation of its accuracy. The larger the area under the ROC curve, the more accurate the classifier.
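Both measures can be worked through on a small example. The sketch below is in Python rather than MATLAB and is not the program described on this poster. The six scores and labels are made-up mock data, the function names are hypothetical, and the 0.5 threshold used for total accuracy is an assumption.

```python
def total_accuracy(predicted, actual):
    """Fraction of points whose predicted class matches the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def roc_points(scores, actual):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest score down to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= threshold and a == 1)
        fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and a == 0)
        # True positive rate is sensitivity; false positive rate is 1 - specificity.
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Mock classifier outputs (higher = more likely epithelium) and true classes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
actual = [1, 1, 0, 1, 0, 0]

predicted = [1 if s >= 0.5 else 0 for s in scores]
acc = total_accuracy(predicted, actual)
points = roc_points(scores, actual)
auc = area_under_curve(points)
print(round(acc, 3), round(auc, 3))  # prints 0.667 0.889
```

Total accuracy depends on the single threshold chosen, whereas the area under the curve summarizes performance over every threshold. For this mock data, 8 of the 9 possible (epithelium, stroma) pairs are ranked correctly, which matches the area of 8/9.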

We wrote a program that generates an ROC curve from the output of a neural network. The example on the right shows two mock data sets of varying difficulty, how they were classified by the neural network, and their total accuracy. Each colored curve is an indicator of how well the classifier performed for that color of data.

Results

Using neural networks, we have developed a program that classifies epithelium and stroma tissue with approximately 95% accuracy. We expect this result to hold for other tissue samples, because the neural network program we created trains itself and can then be applied to any data set.

While the I-STEM period is over, we still have much work to do to finish our project. The next step is to look at another classification method, known as regression trees. Using this method, we will utilize the Random Forest algorithm to isolate certain frequencies which are most efficient for classifying tissue. Use of this algorithm will likely allow us to repeat these accurate results, but will also allow for a faster and more efficient classification.

Reflections

Working on this project thus far has been an amazing experience for us. Although it has been challenging, it has given us valuable tools that we can use in the future. An important skill that we both learned was the ability to use MATLAB. The language is very powerful, and through continued use of the software, we will be able to utilize it in many ways. Learning to create and evaluate neural networks was also an enjoyable skill to pick up. Most of all, this project allowed us to gain experience in a professional academic environment, and share ideas with prominent scientists in the field of bioengineering. We both hope to continue working on this project, and hope that we can help achieve further positive results.

Acknowledgments

[Figure: neural network diagram with inputs S0 to S6, nodes labeled N, and outputs E and S]

Evaluating and Improving Classifiers for Tissue Samples Imaged Using Mid-Infrared Spectroscopy

The figure below shows a neural network mapping spectra to a classification of either epithelium (E) or stroma (S).

The images below depict a slide of tissue in various stages of the classification process.

[Figure panels: Stained Image, IR Image, Classified Image. Legend: Stroma, Epithelium]

This demonstrates a current method of spectroscopy, known as FT-IR (Fourier transform infrared) spectroscopy.
