cancer detection - formal version.pptx

Transcript of cancer detection - formal version.pptx

  • 7/26/2019 cancer detection - formal version.pptx

    1/38

    Cancer detection

    By: gierminsahagun

    11337710

    This example demonstrates using a neural network to detect cancer from mass spectrometry data on protein profiles.

    What is cancer?

    Cancer is a term used for diseases in which abnormal cells divide without control and are able to invade other tissues.

    Cancer cells can spread to other parts of the body through the blood and lymph systems.

    Random facts about cancer

    The majority of cancer survivors (64%) were diagnosed 5 or more years ago.

    Nearly half (46%) of cancer survivors are 70 years of age or older.

    Tobacco use is the cause of about 22% of cancer deaths.

    Another 10% is due to obesity, a poor diet, lack of physical activity, and drinking alcohol. Other factors include certain infections, exposure to ionizing radiation, and environmental pollutants.

    Introduction

    Serum proteomic pattern diagnostics can be used to differentiate samples from patients with and without disease. Profile patterns are generated using surface-enhanced laser desorption and ionization (SELDI) protein mass spectrometry.

    This technology has the potential to improve clinical diagnostic tests for cancer pathologies.

    The Problem: Cancer Detection

    The goal is to build a classifier that can distinguish between cancer and control patients from the mass spectrometry data.

    The methodology followed in this example is to select a reduced set of measurements, or features, that can be used to distinguish between cancer and control patients using a classifier. These features will be ion intensity levels at specific mass/charge values.

    Formatting the Data

    To recreate the data in ovarian_dataset.mat used in this example, download and uncompress the raw mass-spectrometry data from the FDA-NCI web site. Create the data file OvarianCancerQAQCdataset.mat by either running the script msseqprocessing in Bioinformatics Toolbox (TM) or by following the steps in the example biodistcompdemo (Batch processing with parallel computing). The new file contains the variables Y, MZ and grp.

    Each column in Y represents measurements taken from a patient. There are 216 columns in Y representing 216 patients, out of which 121 are ovarian cancer patients and 95 are normal patients.

    Each row in Y represents the ion intensity level at a specific mass-charge value indicated in MZ. There are 15...

    The variable grp holds the index information as to which of these samples represent cancer patients and which ones represent normal patients.
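The layout described above (features in rows, patients in columns, labels in grp) can be sketched on a toy example. This is only an illustration: the data values are synthetic, and the label strings 'Cancer' and 'Normal' are assumptions about how grp is encoded. The two-row target matrix follows the class 1 = cancer, class 2 = normal convention used later in the deck.

```matlab
% Toy stand-in for the real data: 6 m/z rows, 5 patients (synthetic values).
rng(0);                                  % fixed seed so the toy data is repeatable
Y   = rand(6, 5);                        % rows = m/z values, columns = patients
MZ  = linspace(100, 600, 6)';            % one m/z value per row of Y
grp = {'Cancer'; 'Normal'; 'Cancer'; 'Cancer'; 'Normal'};  % assumed label strings

isCancer = strcmp('Cancer', grp);        % logical index of cancer samples
nCancer  = sum(isCancer);                % number of cancer patients
nNormal  = sum(~isCancer);               % number of normal patients

% Two-row target matrix: [1;0] marks cancer, [0;1] marks normal.
t = double(isCancer');
t = [t; 1 - t];

fprintf('%d m/z values, %d patients (%d cancer, %d normal)\n', ...
        size(Y, 1), size(Y, 2), nCancer, nNormal);
```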

    Ranking Key Features

    This is a typical classification problem in which the number of features is much larger than the number of observations, but in which no single feature achieves a correct classification. We therefore need to find a classifier that appropriately learns how to weight multiple features and at the same time produces a generalized mapping.

    A simple approach for finding significant features is to assume that each M/Z value is independent and compute a two-way t-test. rankfeatures returns an index to the most significant M/Z values, for instance 100 indices ranked by the absolute value of the test statistic.
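The ranking idea above can be illustrated without the toolbox. The following is a minimal sketch in base MATLAB, assuming synthetic data and an unequal-variance (Welch) form of the two-sample t statistic. It is a stand-in for the idea, not a reproduction of what rankfeatures computes internally, and the variable names are illustrative.

```matlab
% Toy data: 50 features (rows) by 20 samples (columns); values are synthetic.
rng(1);
Y = randn(50, 20);
isCancer = [true(1, 10), false(1, 10)];   % hypothetical class labels
Y(7, isCancer) = Y(7, isCancer) + 5;      % make feature 7 clearly discriminative

A = Y(:, isCancer);                       % cancer samples
B = Y(:, ~isCancer);                      % control samples

% Two-sample t statistic per row (unequal-variance form, an assumption).
tstat = (mean(A, 2) - mean(B, 2)) ./ ...
        sqrt(var(A, 0, 2) / size(A, 2) + var(B, 0, 2) / size(B, 2));

% Rank features by absolute t statistic, largest first, and keep the top k.
k = 5;
[~, order] = sort(abs(tstat), 'descend');
ind = order(1:k);
x = Y(ind, :);                            % reduced feature matrix (k-by-20)
```

Treating each row independently keeps the ranking cheap, but it ignores correlations between neighbouring M/Z values, which is why the selected features are then handed to a classifier rather than used one at a time.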

    To finish recreating the data from ovarian_dataset.mat, load OvarianCancerQAQCdataset.mat and use rankfeatures from Bioinformatics Toolbox to choose the 100 highest ranked measurements as inputs x.

    ind = rankfeatures(Y,grp,'CRITERION','ttest','NUMBER',100);

    The preprocessing steps from the script and example listed above are intended to demonstrate a representative set of possible pre-processing and feature selection procedures. Using different steps or parameters may lead to different and possibly improved results of this example.

    Classification Using a Feed Forward Neural Network

    Now that you have identified some significant features, you can use this information to classify the cancer and normal samples.

    setdemorandstream(672880951)

    Since the neural network is initialized with random initial weights, the results after training the network vary slightly every time the example is run. To avoid this randomness, the random seed is set to reproduce the same results every time. However, this is not necessary for your own applications.
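The effect of fixing the seed can be shown with the base MATLAB generator. This is a small sketch that uses rng rather than setdemorandstream, with an arbitrary seed of 42 and random matrices standing in for initial network weights. All of these choices are assumptions for illustration.

```matlab
seed = 42;               % arbitrary illustrative seed, not the one from the slides

rng(seed);               % seed the global random number generator
w1 = rand(3, 2);         % stand-in for randomly initialised weights

rng(seed);               % reseed with the same value
w2 = rand(3, 2);         % identical draw

rng(seed + 1);           % a different seed
w3 = rand(3, 2);         % different draw

sameSeedMatches  = isequal(w1, w2);   % true: same seed, same "weights"
otherSeedDiffers = ~isequal(w1, w3);  % true: different seed, different "weights"
```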

    A 1-hidden layer feed forward neural network with 5 hidden layer neurons is created and trained.

    The input and target samples are automatically divided into training, validation and test sets. The training set is used to teach the network.

    Training continues as long as the network continues improving on the validation set.

    The test set provides a completely independent measure of network accuracy.
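The automatic division into three sets can be pictured as a random partition of the sample indices. The sketch below assumes a 70/15/15 split and uses hypothetical variable names. The toolbox performs this step itself during training, so this is only meant to show what such a partition looks like.

```matlab
rng(0);
nSamples = 216;                          % number of patients in the dataset
perm = randperm(nSamples);               % random ordering of sample indices

% Assumed ratios: 70% training, 15% validation, 15% test.
nVal   = round(0.15 * nSamples);
nTest  = round(0.15 * nSamples);
nTrain = nSamples - nVal - nTest;

trainInd = perm(1:nTrain);                         % used to fit the weights
valInd   = perm(nTrain + 1 : nTrain + nVal);       % used to decide when to stop
testInd  = perm(nTrain + nVal + 1 : end);          % held out for final accuracy
```

Every sample lands in exactly one set, which is what makes the test set an independent check.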

    net = patternnet(5);

    view(net)

    The input and output have sizes of 0 because the network has not yet been configured to match our input and target data. This will happen when the network is trained.

    Now the network is ready to be trained. The samples are automatically divided into training, validation and test sets.

    The training set is used to teach the network. Training continues as long as the network continues improving on the validation set.

    The test set provides a completely independent measure of network accuracy.

    The Training Tool shows the network being trained and the algorithms used to train it.

    It also displays the training state during training, and the criteria which stopped training will be highlighted in green.
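The rule that training continues only while the validation set keeps improving can be sketched as a simple loop. The validation errors below are made up, and the patience of 6 consecutive non-improving epochs is an assumed setting, so this shows the stopping logic only and not the toolbox's training routine.

```matlab
% Hypothetical validation errors recorded after each training epoch.
valErr  = [0.50 0.40 0.33 0.30 0.31 0.29 0.30 0.32 0.33 0.35 0.36 0.37 0.38];
maxFail = 6;             % assumed patience: stop after 6 epochs without improvement

bestErr = inf;  bestEpoch = 0;  fails = 0;  stopEpoch = numel(valErr);
for epoch = 1:numel(valErr)
    if valErr(epoch) < bestErr
        bestErr   = valErr(epoch);   % new best validation error
        bestEpoch = epoch;
        fails     = 0;               % reset the failure counter
    else
        fails = fails + 1;           % validation error did not improve
    end
    if fails >= maxFail
        stopEpoch = epoch;           % stopping criterion met
        break
    end
end
```

In this toy run the best epoch is the one whose weights would be kept, while the stop epoch is simply where the loop gives up.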

    [net,tr] = train(net,x,t);

    The buttons at the bottom open useful plots which can be opened during and after training. Links next to the algorithm names and plot buttons open documentation on those subjects.

    The trained neural network can now be tested with the testing samples we partitioned from the main dataset.

    The testing data was not used in training in any way and hence provides an out-of-sample dataset to test the network on. This will give us a sense of how well the network will do when tested with data from the real world.

    testX = x(:,tr.testInd);
    testT = t(:,tr.testInd);
    testY = net(testX);
    testClasses = testY > 0.5

    The network outputs will be in the range 0 to 1, so we threshold them to get 1s and 0s.
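The thresholding step can be tried on a few made-up outputs. In this sketch the values of testY and testT are hypothetical (row 1 for the cancer class, row 2 for the normal class), and the accuracy calculation is an added illustration rather than part of the original example.

```matlab
% Hypothetical network outputs for 4 test samples (row 1: cancer, row 2: normal).
testY = [0.92 0.15 0.60 0.30;
         0.08 0.85 0.40 0.70];
testT = [1 0 0 0;
         0 1 1 1];                        % true classes, one column per sample

testClasses = testY > 0.5;                % logical matrix of 1s and 0s

% A sample counts as correct when both rows of its column match the target.
nCorrect = sum(all(testClasses == testT, 1));
accuracy = nCorrect / size(testT, 2);
```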

    One measure of how well the neural network has fit the data is the confusion plot. Here the confusion matrix is plotted across all samples.

    The confusion matrix shows the percentages of correct and incorrect classifications. Correct classifications are the green squares on the matrix diagonal.

    Incorrect classifications form the red squares.

    If the network has learned to classify properly, the percentages in the red squares should be very small, indicating few misclassifications.

    If this is not the case, then further training, or training a network with more hidden neurons, would be advisable.

    plotconfusion(testT,testY)
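The numbers behind a confusion plot can be computed by hand. The sketch below uses made-up targets and outputs, and it assumes a layout with predicted class in rows and true class in columns. It illustrates the counting only and is not the plotting function's implementation.

```matlab
% Hypothetical targets and outputs for 6 test samples (class 1 = cancer).
testT = [1 0 0 0 1 1;
         0 1 1 1 0 0];
testY = [0.92 0.15 0.60 0.30 0.80 0.45;
         0.08 0.85 0.40 0.70 0.20 0.55];

[~, trueClass] = max(testT, [], 1);      % true class index per sample
[~, predClass] = max(testY, [], 1);      % predicted class index per sample

% Count samples: rows = predicted class, columns = true class (assumed layout).
C   = accumarray([predClass' trueClass'], 1, [2 2]);
pct = 100 * C / sum(C(:));               % percentages of all samples
pctCorrect = sum(diag(pct));             % diagonal = correct classifications
```

The off-diagonal entries of C are the misclassifications that the red squares report.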

    Another measure of how well the neural network has fit the data is the receiver operating characteristic plot. This shows how the false positive and true positive rates relate as the thresholding of outputs is varied from 0 to 1.

    The farther left and up the line is, the fewer false positives need to be accepted in order to get a high true positive rate.

    The best classifiers will have a line going from the bottom left corner, to the top left corner, to the top right corner, or close to that.
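The threshold sweep behind such a curve can be sketched directly. The scores and labels here are invented, the threshold grid of 21 steps is arbitrary, and the area under the curve is an extra summary not mentioned in the slides.

```matlab
% Hypothetical cancer-class scores and true labels (true = cancer).
scores = [0.93 0.82 0.71 0.57 0.43 0.32 0.18 0.07];
labels = logical([1 1 0 1 0 1 0 0]);

thr = linspace(1, 0, 21);                % thresholds stepped from 1 down to 0
tpr = zeros(size(thr));                  % true positive rate per threshold
fpr = zeros(size(thr));                  % false positive rate per threshold
for i = 1:numel(thr)
    pred   = scores > thr(i);            % samples called "cancer" at this threshold
    tpr(i) = sum(pred & labels)  / sum(labels);
    fpr(i) = sum(pred & ~labels) / sum(~labels);
end

auc = trapz(fpr, tpr);                   % area under the ROC curve
```

A curve that hugs the top left corner gives an area close to 1, while a diagonal line gives 0.5.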

    Class 1 indicates cancer patients, class 2 normal patients.

    plotroc(testT,testY)

    This example illustrated how neural networks can be used as classifiers for cancer detection.

    One can also experiment with techniques like principal component analysis to reduce the dimensionality of the data used for building neural networks.
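As a rough illustration of the dimensionality reduction mentioned above, principal components can be obtained from a singular value decomposition in base MATLAB. The data is synthetic, the choice of 3 components is arbitrary, and the variable names are hypothetical.

```matlab
rng(2);
X = randn(10, 40);                       % 10 features (rows) by 40 samples (columns)
X(2, :) = X(1, :) + 0.01 * randn(1, 40); % make two features nearly redundant

Xc = X - mean(X, 2);                     % centre each feature across samples
[U, S, ~] = svd(Xc, 'econ');             % columns of U are principal directions
sv2 = diag(S).^2;
explained = sv2 / sum(sv2);              % fraction of variance per component

k = 3;                                   % number of components to keep (arbitrary)
Z = U(:, 1:k)' * Xc;                     % k-by-40 reduced inputs for a classifier
```

The reduced matrix Z keeps the samples in columns, so it could be passed to a network in place of the original inputs.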
