Kddcup Analysis

description

This document contains the classification procedure for the KDD Cup 99 data set used in the intrusion detection competition. The classification is done using the Dato GraphLab Create Python package.

Transcript of Kddcup Analysis

  • 2/21/2016 KddcupAnalysis

    http://localhost:8889/notebooks/Kddcup%20Analysis.ipynb 1/17

    In[1]:

    import graphlab

    In[2]:

    kdd = graphlab.SFrame('kddcup.csv')

    [INFO] This non-commercial license of GraphLab Create is assigned to bhanusharma.a3@gmail.com and will expire on February 16, 2017. For commercial licensing options, visit https://dato.com/buy/.

    [INFO] Start server at: ipc:///tmp/graphlab_server9792
    Server binary: C:\Users\dshar\Anaconda2\envs\datoenv\lib\site-packages\graphlab\unity_server.exe
    Server log: C:\Users\dshar\AppData\Local\Temp\graphlab_server_1456062829.log.0
    [INFO] GraphLab Server Version: 1.8.1

    PROGRESS: Finished parsing file C:\Users\dshar\kddcup.csv
    PROGRESS: Parsing completed. Parsed 100 lines in 0.962981 secs.
    Inferred types from first line of file as
    column_type_hints=[long,str,str,str,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,float,float,float,float,float,float,float,long,long,float,float,float,float,float,float,float,float,str]
    If parsing fails due to incorrect types, you can correct the inferred type list above and pass it to read_csv in the column_type_hints argument
    PROGRESS: Read 345187 lines. Lines per second: 285583
    PROGRESS: Finished parsing file C:\Users\dshar\kddcup.csv
    PROGRESS: Parsing completed. Parsed 494020 lines in 1.48128 secs.

    In[8]:

    graphlab.canvas.set_target('ipynb')
    kdd['normal.'].show(view='Categorical')

    Most frequent items from

    Value             Count    Percent
    smurf.            280,790  56.838%
    neptune.          107,201  21.7%
    normal.           97,277   19.691%
    back.             2,203    0.446%
    satan.            1,589    0.322%
    ipsweep.          1,247    0.252%
    portsweep.        1,040    0.211%
    warezclient.      1,020    0.206%
    teardrop.         979      0.198%
    pod.              264      0.053%
    nmap.             231      0.047%
    guess_passwd.     53       0.011%
    buffer_overflow.  30       0.006%
    land.             21       0.004%
    warezmaster.      20       0.004%
    imap.             12       0.002%
    rootkit.          10       0.002%
    loadmodule.       9        0.002%
    ftp_write.        8        0.002%
    multihop.         7        0.001%
    phf.              4        8.097e-4%
    perl.             3        6.073e-4%
    spy.              2        4.048e-4%
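The frequency table above shows a heavily skewed label distribution. As a rough illustration (plain Python 3, not part of the original Python 2 notebook), the counts from the table can be turned into shares to see how much of the data the largest classes cover. The variable names are illustrative.

```python
# Label counts copied from the "Most frequent items" table in this notebook.
label_counts = {
    "smurf.": 280790, "neptune.": 107201, "normal.": 97277, "back.": 2203,
    "satan.": 1589, "ipsweep.": 1247, "portsweep.": 1040, "warezclient.": 1020,
    "teardrop.": 979, "pod.": 264, "nmap.": 231, "guess_passwd.": 53,
    "buffer_overflow.": 30, "land.": 21, "warezmaster.": 20, "imap.": 12,
    "rootkit.": 10, "loadmodule.": 9, "ftp_write.": 8, "multihop.": 7,
    "phf.": 4, "perl.": 3, "spy.": 2,
}

total = sum(label_counts.values())

# Share of each class, largest first.
shares = sorted(
    ((label, count / total) for label, count in label_counts.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Combined share of the three largest classes.
top3_share = sum(share for _, share in shares[:3])

# Accuracy of a trivial classifier that always predicts the most common label.
majority_baseline = shares[0][1]

print("total rows:", total)
print("top-3 classes cover {:.1%} of rows".format(top3_share))
print("majority-class baseline accuracy: {:.1%}".format(majority_baseline))
```

Because three classes account for nearly all rows, overall accuracy alone probably says little about how well the rare attack types are recognised.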

    In[9]:

    kdd.head()

    Out[9]:

    0  tcp  http  SF  181  5450  0.1  0.2  0.3  0.4  0.5
    0  tcp  http  SF  239  486   0    0    0    0    0
    0  tcp  http  SF  235  1337  0    0    0    0    0
    0  tcp  http  SF  219  1337  0    0    0    0    0
    0  tcp  http  SF  217  2032  0    0    0    0    0
    0  tcp  http  SF  217  2032  0    0    0    0    0
    0  tcp  http  SF  212  1940  0    0    0    0    0
    0  tcp  http  SF  159  4087  0    0    0    0    0
    0  tcp  http  SF  210  151   0    0    0    0    0
    0  tcp  http  SF  212  786   0    0    0    1    0
    0  tcp  http  SF  210  624   0    0    0    0    0

    0.15  0.16  8   8.1  0.00  0.00.1  0.00.2  0.00.3  1.00
    0     0     8   8    0.0   0.0     0.0     0.0     1.0
    0     0     8   8    0.0   0.0     0.0     0.0     1.0
    0     0     6   6    0.0   0.0     0.0     0.0     1.0
    0     0     6   6    0.0   0.0     0.0     0.0     1.0
    0     0     6   6    0.0   0.0     0.0     0.0     1.0
    0     0     1   2    0.0   0.0     0.0     0.0     1.0
    0     0     5   5    0.0   0.0     0.0     0.0     1.0
    0     0     8   8    0.0   0.0     0.0     0.0     1.0
    0     0     8   8    0.0   0.0     0.0     0.0     1.0
    0     0     18  18   0.0   0.0     0.0     0.0     1.0

    0.00.7  0.00.8  0.00.9  0.00.10  ...
    0.0     0.0     0.0     0.0      ...
    0.0     0.0     0.0     0.0      ...
    0.0     0.0     0.0     0.0      ...
    0.0     0.0     0.0     0.0      ...
    0.0     0.0     0.0     0.0      ...
    0.04    0.0     0.0     0.0      ...
    0.04    0.0     0.0     0.0      ...

    0.04    0.0     0.0     0.0      ...
    0.05    0.0     0.0     0.0      ...

    In[10]:

    kdd.show()

    [Canvas summary view; only part of the output survived extraction]

    SF    dtype: str, num_unique (est.): 11, num_undefined: 0
          frequent items: SF, S0, REJ, RSTR, RSTO, SH, S1, S2, RSTOS0, S3, OTH
    181   dtype: int, num_unique (est.): 3,298, num_undefined: 0, min: 0, max: 6.934e+8, median: 520, mean: 3,025.616, std: 988,218.101
          distribution of values: (plot not recoverable)
    5450  (summary values not recoverable)

    In[41]:

    [each line of this cell is cut off at the right page margin; the complete mapping is shown in Out[15]]

    d = {'0': 'duration', 'tcp': 'protocol_type1', 'SF': 'flag', '181': 'src_bytes', '5450':
    '1': 'logged_in', '0.6': 'num_compromised', '0.7': 'root_shell', '0.10': 'num_file_creations'
    '0.12': 'num_shells', '0.13': 'num_access_files', '0.14': 'num_outbound_cmds', '0.15'
    '0.00.1': 'srv_serror_rate', '0.00.2': 'rerror_rate', '0.00.3': 'srv_rerror_rate', '1.00'
    '9.1': 'dst_host_srv_count', '1.00.1': 'dst_host_same_srv_rate', '0.00.6': 'dst_host_diff_srv_rate'
    '0.00.8': 'dst_host_serror_rate', '0.00.9': 'dst_host_srv_serror_rate', 'normal.': 'response'

    In[15]:

    d

    Out[15]:

    {'0': 'duration',
     '0.00': 'serror_rate',
     '0.00.1': 'srv_serror_rate',
     '0.00.10': 'dst_host_rerror_rate',
     '0.00.11': 'dst_host_srv_rerror_rate',
     '0.00.2': 'rerror_rate',
     '0.00.3': 'srv_rerror_rate',
     '0.00.4': 'diff_srv_rate',
     '0.00.5': 'srv_diff_host_rate',
     '0.00.6': 'dst_host_diff_srv_rate',
     '0.00.7': 'dst_host_srv_diff_host_rate',
     '0.00.8': 'dst_host_serror_rate',
     '0.00.9': 'dst_host_srv_serror_rate',
     '0.1': 'land',
     '0.10': 'num_file_creations',
     '0.11': 'dst_host_same_src_port_rate',
     '0.12': 'num_shells',
     '0.13': 'num_access_files',
     '0.14': 'num_outbound_cmds',
     '0.15': 'is_host_login',
     '0.16': 'is_guest_login',
     '0.2': 'wrong_fragment',
     '0.3': 'urgent',
     '0.4': 'hot',
     '0.5': 'num_failed_logins',
     '0.6': 'num_compromised',
     '0.7': 'root_shell',
     '0.8': 'su_attempted',
     '0.9': 'num_root',
     '1': 'logged_in',
     '1.00': 'same_srv_rate',
     '1.00.1': 'dst_host_same_srv_rate',
     '181': 'src_bytes',
     '5450': 'dst_bytes',
     '8': 'count',
     '8.1': 'srv_count',
     '9': 'dst_host_count',
     '9.1': 'dst_host_srv_count',
     'SF': 'flag',
     'http': 'protocol_type2',
     'normal.': 'response',
     'tcp': 'protocol_type1'}
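The keys of `d` ('0', 'tcp', 'http', 'SF', '181', '5450', ...) look like the values of a data record, which suggests that the CSV file has no header row and that its first record was consumed as column names. If that is the case, one alternative to renaming afterwards is to supply the names at read time. The sketch below shows the idea with Python's standard `csv` module on two shortened records taken from the `kdd.head()` output. It is an illustration under that assumption, not the notebook's method, and only the first six fields are included. In GraphLab Create itself, `SFrame.read_csv` has a `header` argument that should serve the same purpose; treat that as something to verify against the library documentation.

```python
import csv
import io

# First six fields of the first two records, as seen in the kdd.head() output.
# The real file has 42 fields per record; this sample is shortened for illustration.
raw = "0,tcp,http,SF,181,5450\n0,tcp,http,SF,239,486\n"

# Column names as used in the notebook's rename mapping.
names = ["duration", "protocol_type1", "protocol_type2",
         "flag", "src_bytes", "dst_bytes"]

# Default behaviour: the first line is treated as a header, so one record is lost
# and the columns end up named '0', 'tcp', 'http', ...
with_header = list(csv.DictReader(io.StringIO(raw)))

# Supplying fieldnames tells the reader that every line is data.
without_header = list(csv.DictReader(io.StringIO(raw), fieldnames=names))

print(len(with_header), "record(s) when the first line is taken as a header")
print(len(without_header), "record(s) when names are supplied explicitly")
print("src_bytes of the first record:", without_header[0]["src_bytes"])
```

Reading the file this way would also keep the first record in the data instead of losing it to the header.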

    In[42]:

    kdd.rename(d)

    Out[42]:

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 1.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    0.0 1.0 0.0 0.0

    dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate

    0.0 0.05 0.0

    0.0 0.03 0.0

    0.0 0.03 0.0

    0.0 0.02 0.0

    0.0 0.02 0.0

    0.0 1.0 0.04

    0.0 0.09 0.04

    0.0 0.12 0.04

    0.0 0.12 0.05

    0.0 0.06 0.05

    dst_host_rerror_rate ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    0.0 ...

    In[43]:

    kdd.show()

    [Canvas summary view; only part of the output survived extraction]

    duration        dtype: int, num_unique (est.): 2,501, num_undefined: 0, min: 0, max: 58,329, median: 0, mean: 47.979, std: 707.746
                    distribution of values: (plot not recoverable)
    protocol_type1  dtype: str, num_unique (est.): 3, num_undefined: 0
                    frequent items: icmp, tcp, udp
    protocol_type   (column name cut off; summary values not recoverable)
                    frequent items: ecr_i, private, http, smtp, other, domain_u, ftp_data, eco_i, ftp, finger, urp_i, telnet

    In[44]:

    train_data, test_data = kdd.random_split(.8, seed=0)
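`random_split(.8, seed=0)` sends roughly 80% of the rows to the training set and the rest to the test set, and the seed makes the split repeatable. A minimal sketch of that idea in plain Python follows. It assumes an independent random draw per row, which may differ from what GraphLab Create does internally, and the helper name `split_rows` is hypothetical.

```python
import random


def split_rows(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # dedicated generator so the split is repeatable
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))  # stand-in for the dataset rows
train, test = split_rows(rows, 0.8, seed=0)

# Sizes are close to 800/200 but not exact, because each row is drawn independently.
print(len(train), len(test))
```

With a per-row draw the split sizes are only approximately 80/20. That is in line with the training log later in the notebook, which reports 375394 examples once a further 5 percent has been held out for validation.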

    In[45]:

    train_data.head()

    Out[45]:

    0 tcp http SF 235 1337

    0 tcp http SF 219 1337

    0 tcp http SF 217 2032

    0 tcp http SF 217 2032

    0 tcp http SF 212 1940

    0 tcp http SF 159 4087

    0 tcp http SF 210 151

    0 tcp http SF 212 786

    0 tcp http SF 210 624

    logged_in num_compromised root_shell su_attempted num_root

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    1 0 0 0 0

    num_outbound_cmds is_host_login is_guest_login count srv_count

    0 0 0 8

    0 0 0 8

    0 0 0 6

    0 0 0 6

    0 0 0 6

    0 0 0 1

    0 0 0 5

    0 0 0 8

    0 0 0 8

    0 0 0 18

    0 0 0 18

    Creating model

    In[48]:

    kdd_model = graphlab.classifier.create(train_data, target='response', features=['protocol_type'

    PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
              You can set ``validation_set=None`` to disable validation tracking.

    PROGRESS: The following methods are available for this type of problem.
    PROGRESS: BoostedTreesClassifier, RandomForestClassifier, LogisticClassifier
    PROGRESS: The returned model will be chosen according to validation accuracy.
    PROGRESS: Boosted trees classifier:
    PROGRESS:
    PROGRESS: Number of examples: 375394
    PROGRESS: Number of classes: 23
    PROGRESS: Number of feature columns: 2
    PROGRESS: Number of unpacked features: 2
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: | Iteration | Elapsed Time | Training accuracy | Validation accuracy |
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: | 1         | 2.961036     | 0.981233          | 0.981515            |
    PROGRESS: | 2         | 4.742356     | 0.982621          | 0.983081            |
    PROGRESS: | 3         | 6.585209     | 0.982621          | 0.983081            |
    PROGRESS: | 4         | 8.430268     | 0.982621          | 0.983081            |
    PROGRESS: | 5         | 10.256491    | 0.982621          | 0.983081            |
    PROGRESS: | 6         | 12.089232    | 0.982621          | 0.983081            |
    PROGRESS: | 7         | 13.987530    | 0.982621          | 0.983081            |
    PROGRESS: | 8         | 15.854709    | 0.982621          | 0.983081            |
    PROGRESS: | 9         | 17.797652    | 0.982696          | 0.983283            |
    PROGRESS: | 10        | 19.734425    | 0.982696          | 0.983283            |
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: Random forest classifier:
    PROGRESS:
    PROGRESS: Number of examples: 375394
    PROGRESS: Number of classes: 23
    PROGRESS: Number of feature columns: 2
    PROGRESS: Number of unpacked features: 2
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: | Iteration | Elapsed Time | Training accuracy | Validation accuracy |
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: | 1         | 2.013409     | 0.973793          | 0.974949            |
    PROGRESS: | 2         | 3.754968     | 0.974483          | 0.975556            |
    PROGRESS: | 3         | 5.468433     | 0.974765          | 0.97596             |
    PROGRESS: | 4         | 7.223957     | 0.98208           | 0.982727            |
    PROGRESS: | 5         | 8.930002     | 0.981795          | 0.982071            |
    PROGRESS: | 6         | 10.593517    | 0.981795          | 0.982071            |
    PROGRESS: | 7         | 12.395478    | 0.981808          | 0.982273            |
    PROGRESS: | 8         | 14.178238    | 0.981811          | 0.982273            |
    PROGRESS: | 9         | 15.930915    | 0.981808          | 0.982273            |
    PROGRESS: | 10        | 17.570304    | 0.981808          | 0.982273            |
    PROGRESS: +-----------+--------------+-------------------+---------------------+
    PROGRESS: Logistic regression:
    PROGRESS:
    PROGRESS: Number of examples: 375394
    PROGRESS: Number of classes: 23
    PROGRESS: Number of feature columns: 2
    PROGRESS: Number of unpacked features: 2
    PROGRESS: Number of coefficients: 1496
    PROGRESS: Starting L-BFGS
    PROGRESS:
    PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
    PROGRESS: | Iteration | Passes | Step size | Elapsed Time | Training accuracy | Validation accuracy |
    PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
    PROGRESS: | 1         | 4      | 0.000013  | 0.922823     | 0.851716          | 0.851010            |
    PROGRESS: | 2         | 6      | 1.000000  | 1.834471     | 0.855115          | 0.854343            |
    PROGRESS: | 3         | 7      | 1.000000  | 2.363347     | 0.980221          | 0.980909            |
    PROGRESS: | 4         | 8      | 1.000000  | 2.924757     | 0.976286          | 0.977071            |
    PROGRESS: | 5         | 9      | 1.000000  | 3.594237     | 0.982440          | 0.983030            |
    PROGRESS: | 6         | 10     | 1.000000  | 4.106106     | 0.982131          | 0.982929            |
    PROGRESS: | 10        | 16     | 1.000000  | 6.497316     | 0.981185          | 0.981869            |
    PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
    PROGRESS: TERMINATED: Iteration limit reached.
    PROGRESS: This model may not be optimal. To improve it, consider increasing `max_iterations`.
    PROGRESS: Model selection based on validation accuracy:
    PROGRESS:

    PROGRESS: BoostedTreesClassifier: 0.983282828283
    PROGRESS: RandomForestClassifier: 0.982272727273
    PROGRESS: LogisticClassifier: 0.981869
    PROGRESS:
    PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.

    In[59]:

    con = kdd_model.evaluate(test_data, metric='confusion_matrix')

    In[66]:

    print con.viewvalues()

    dict_values([Columns:
        target_label     str
        predicted_label  str
        count            int

    Rows: 41

    Data:
    +---------------+-----------------+-------+
    | target_label  | predicted_label | count |
    +---------------+-----------------+-------+
    | back.         | normal.         | 413   |
    | normal.       | normal.         | 19083 |
    | warezclient.  | satan.          | 1     |
    | smurf.        | smurf.          | 56142 |
    | normal.       | smurf.          | 77    |
    | guess_passwd. | neptune.        | 8     |
    | normal.       | neptune.        | 76    |
    | portsweep.    | neptune.        | 131   |
    | satan.        | neptune.        | 34    |
    | portsweep.    | satan.          | 42    |
    +---------------+-----------------+-------+
    [41 rows x 3 columns]
    Note: Only the head of the SFrame is printed.
    You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.])

    In[67]:

    print_rows(num_rows=41, num_columns=3).con

    NameError Traceback (most recent call last)
    in ()
    ----> 1 print_rows(num_rows=41, num_columns=3).con

    NameError: name 'print_rows' is not defined

    In[74]:

    graphlab.SFrame.print_rows(num_rows=41, num_columns=3)

    TypeError Traceback (most recent call last)
    in ()
    ----> 1 graphlab.SFrame.print_rows(num_rows=41, num_columns=3)

    TypeError: unbound method print_rows() must be called with SFrame instance as first argument (got nothing instead)

    In[75]:

    con.show()

    AttributeError Traceback (most recent call last)
    in ()
    ----> 1 con.show()

    AttributeError: 'dict' object has no attribute 'show'

    In[]:
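The three failed cells all try to display the full confusion matrix. The `dict_values([...])` output suggests that `evaluate` returned a dict whose value is an SFrame, so the likely fix is to index the dict first and call `print_rows` on that SFrame, for example `con['confusion_matrix'].print_rows(num_rows=41, num_columns=3)`. The key name `'confusion_matrix'` is an assumption, since the transcript does not show the dict keys. Independently of GraphLab Create, the rows that were printed can be inspected in plain Python. The sketch below uses only the ten rows visible in the output (the full matrix has 41) and lists the misclassified pairs by count.

```python
# The ten confusion-matrix rows visible in the notebook output:
# (target_label, predicted_label, count). The full matrix has 41 rows.
rows = [
    ("back.", "normal.", 413),
    ("normal.", "normal.", 19083),
    ("warezclient.", "satan.", 1),
    ("smurf.", "smurf.", 56142),
    ("normal.", "smurf.", 77),
    ("guess_passwd.", "neptune.", 8),
    ("normal.", "neptune.", 76),
    ("portsweep.", "neptune.", 131),
    ("satan.", "neptune.", 34),
    ("portsweep.", "satan.", 42),
]

# Keep only rows where the prediction differs from the true label,
# ordered from most to least frequent.
errors = sorted(
    (row for row in rows if row[0] != row[1]),
    key=lambda row: row[2],
    reverse=True,
)

for target, predicted, count in errors:
    print("{:<14} -> {:<10} {:>6}".format(target, predicted, count))
```

Among the rows shown, the most frequent error is `back.` traffic predicted as `normal.`. Since 31 rows of the matrix are not visible in the transcript, this ranking is only a partial view.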