Kddcup Analysis
bhanusharmaa3
2/21/2016 Kddcup Analysis
http://localhost:8889/notebooks/Kddcup%20Analysis.ipynb
In [1]:
import graphlab

In [2]:
kdd = graphlab.SFrame('kddcup.csv')

[INFO] This non-commercial license of GraphLab Create is assigned to bhanusharma.a3@gmail.com and will expire on February 16, 2017. For commercial licensing options, visit https://dato.com/buy/.
[INFO] Start server at: ipc:///tmp/graphlab_server9792
Server binary: C:\Users\dshar\Anaconda2\envs\datoenv\lib\site-packages\graphlab\unity_server.exe
Server log: C:\Users\dshar\AppData\Local\Temp\graphlab_server_1456062829.log.0
[INFO] GraphLab Server Version: 1.8.1
PROGRESS: Finished parsing file C:\Users\dshar\kddcup.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.962981 secs.
Inferred types from first line of file as
column_type_hints=[long,str,str,str,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,long,float,float,float,float,float,float,float,long,long,float,float,float,float,float,float,float,float,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
PROGRESS: Read 345187 lines. Lines per second: 285583
PROGRESS: Finished parsing file C:\Users\dshar\kddcup.csv
PROGRESS: Parsing completed. Parsed 494020 lines in 1.48128 secs.
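The column names SFrame inferred ('0', 'tcp', 'http', 'SF', '181', '5450', '0.1', ..., visible in the kdd.head() output and in d below) are the values of a connection record, with suffixes such as .1 and .2 added to keep repeated values unique. This suggests kddcup.csv has no header line, so its first record was read as the header rather than as data. GraphLab's SFrame.read_csv has a header parameter, so reading with header=False and renaming afterwards would likely avoid this (not run here). The sketch below illustrates the same idea with Python's csv module: every line, including the first, is paired with an explicit list of names. KDD_COLUMNS holds the standard KDD Cup 1999 field names plus 'response' for the label, matching the notebook's target; read_headerless_csv is a hypothetical helper.

```python
import csv
import io

# Standard KDD Cup 1999 field names (41 features), plus 'response' for the
# trailing label so the name matches the notebook's target column.
KDD_COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "response",
]


def read_headerless_csv(lines, names):
    """Hypothetical helper: parse CSV lines that have no header row.

    Every non-blank line, including the first, becomes a data row (a dict
    keyed by `names`). Raises ValueError if a line has the wrong field count.
    """
    rows = []
    for lineno, fields in enumerate(csv.reader(lines), start=1):
        if not fields:
            continue  # csv.reader yields [] for blank lines
        if len(fields) != len(names):
            raise ValueError("line %d: expected %d fields, got %d"
                             % (lineno, len(names), len(fields)))
        rows.append(dict(zip(names, fields)))
    return rows


# Illustrative three-column input; no line is treated as a header.
sample = io.StringIO("0,tcp,http\n0,udp,private\n")
rows = read_headerless_csv(sample, ["duration", "protocol_type", "service"])
print(len(rows), rows[0]["service"])  # 2 http
```

With the real file, the call would pass the open file object and KDD_COLUMNS, so the record that became the header in the notebook would be kept as the first data row.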
In [8]:
graphlab.canvas.set_target('ipynb')
kdd['normal.'].show(view='Categorical')

Most frequent items from
Value               Count     Percent
smurf.            280,790     56.838%
neptune.          107,201       21.7%
normal.            97,277     19.691%
back.               2,203      0.446%
satan.              1,589      0.322%
ipsweep.            1,247      0.252%
portsweep.          1,040      0.211%
warezclient.        1,020      0.206%
teardrop.             979      0.198%
pod.                  264      0.053%
nmap.                 231      0.047%
guess_passwd.          53      0.011%
buffer_overflow.       30      0.006%
land.                  21      0.004%
warezmaster.           20      0.004%
imap.                  12      0.002%
rootkit.               10      0.002%
loadmodule.             9      0.002%
ftp_write.              8      0.002%
multihop.               7      0.001%
phf.                    4   8.097e-4%
perl.                   3   6.073e-4%
spy.                    2   4.048e-4%
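The label column is heavily imbalanced: smurf. and neptune. alone account for roughly 78.5% of the 494,020 rows, while six classes have fewer than ten examples each. A useful reference point is the majority-class baseline, the accuracy of always predicting smurf., which is about 56.8%; the validation accuracies of about 98% reported later in the notebook are best read against it. The sketch below, using a hypothetical helper built on collections.Counter with the counts copied from the table, reproduces the percentages and that baseline.

```python
from collections import Counter

# Label counts copied from the Categorical view above (all 23 classes).
LABEL_COUNTS = {
    "smurf.": 280790, "neptune.": 107201, "normal.": 97277, "back.": 2203,
    "satan.": 1589, "ipsweep.": 1247, "portsweep.": 1040,
    "warezclient.": 1020, "teardrop.": 979, "pod.": 264, "nmap.": 231,
    "guess_passwd.": 53, "buffer_overflow.": 30, "land.": 21,
    "warezmaster.": 20, "imap.": 12, "rootkit.": 10, "loadmodule.": 9,
    "ftp_write.": 8, "multihop.": 7, "phf.": 4, "perl.": 3, "spy.": 2,
}


def class_distribution(counts):
    """Return (label, count, percent) tuples, most frequent label first."""
    counter = Counter(counts)
    total = sum(counter.values())
    return [(label, n, 100.0 * n / total) for label, n in counter.most_common()]


dist = class_distribution(LABEL_COUNTS)
label, n, pct = dist[0]
# Always predicting the majority class is right pct% of the time.
print("majority class %s: %.3f%%" % (label, pct))  # majority class smurf.: 56.838%
```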
In [9]:
kdd.head()

Out[9]:
0 tcp http SF 181 5450 0.1 0.2 0.3 0.4 0.5
0 tcp http SF 239 486 0 0 0 0 0
0 tcp http SF 235 1337 0 0 0 0 0
0 tcp http SF 219 1337 0 0 0 0 0
0 tcp http SF 217 2032 0 0 0 0 0
0 tcp http SF 217 2032 0 0 0 0 0
0 tcp http SF 212 1940 0 0 0 0 0
0 tcp http SF 159 4087 0 0 0 0 0
0 tcp http SF 210 151 0 0 0 0 0
0 tcp http SF 212 786 0 0 0 1 0
0 tcp http SF 210 624 0 0 0 0 0
0.15 0.16 8 8.1 0.00 0.00.1 0.00.2 0.00.3 1.00
0 0 8 8 0.0 0.0 0.0 0.0 1.0
0 0 8 8 0.0 0.0 0.0 0.0 1.0
0 0 6 6 0.0 0.0 0.0 0.0 1.0
0 0 6 6 0.0 0.0 0.0 0.0 1.0
0 0 6 6 0.0 0.0 0.0 0.0 1.0
0 0 1 2 0.0 0.0 0.0 0.0 1.0
0 0 5 5 0.0 0.0 0.0 0.0 1.0
0 0 8 8 0.0 0.0 0.0 0.0 1.0
0 0 8 8 0.0 0.0 0.0 0.0 1.0
0 0 18 18 0.0 0.0 0.0 0.0 1.0
0.00.7 0.00.8 0.00.9 0.00.10 ...
0.0 0.0 0.0 0.0 ...
0.0 0.0 0.0 0.0 ...
0.0 0.0 0.0 0.0 ...
0.0 0.0 0.0 0.0 ...
0.0 0.0 0.0 0.0 ...
0.04 0.0 0.0 0.0 ...
0.04 0.0 0.0 0.0 ...
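0.04 0.0 0.0 0.0 ...
0.05 0.0 0.0 0.0 ...

In [10]:
kdd.show()

Canvas summary (partially preserved in the export; a stray fragment reads "str660"):
SF: dtype: str, num_unique (est.): 11, num_undefined: 0
  frequent items: SF S0 REJ RSTR RSTO SH S1 S2 RSTOS0 S3 OTH
181: dtype: int, num_unique (est.): 3,298, num_undefined: 0, min: 0, max: 6.934e+8, median: 520, mean: 3,025.616, std: 988,218.101
  distribution of values: (plot not preserved)
5450: dtype, num_unique (est.), num_undefined, min, max, median, mean and std values not preserved
  distribution of values: (plot not preserved)

In [41]:
# Keys are the column names SFrame inferred from the first line of kddcup.csv.
d = {'0': 'duration', 'tcp': 'protocol_type1', 'http': 'protocol_type2',
     'SF': 'flag', '181': 'src_bytes', '5450': 'dst_bytes',
     '0.1': 'land', '0.2': 'wrong_fragment', '0.3': 'urgent', '0.4': 'hot',
     '0.5': 'num_failed_logins', '1': 'logged_in', '0.6': 'num_compromised',
     '0.7': 'root_shell', '0.8': 'su_attempted', '0.9': 'num_root',
     '0.10': 'num_file_creations', '0.12': 'num_shells',
     '0.13': 'num_access_files', '0.14': 'num_outbound_cmds',
     '0.15': 'is_host_login', '0.16': 'is_guest_login',
     '8': 'count', '8.1': 'srv_count',
     '0.00': 'serror_rate', '0.00.1': 'srv_serror_rate',
     '0.00.2': 'rerror_rate', '0.00.3': 'srv_rerror_rate',
     '1.00': 'same_srv_rate', '0.00.4': 'diff_srv_rate',
     '0.00.5': 'srv_diff_host_rate', '9': 'dst_host_count',
     '9.1': 'dst_host_srv_count', '1.00.1': 'dst_host_same_srv_rate',
     '0.00.6': 'dst_host_diff_srv_rate', '0.11': 'dst_host_same_src_port_rate',
     '0.00.7': 'dst_host_srv_diff_host_rate', '0.00.8': 'dst_host_serror_rate',
     '0.00.9': 'dst_host_srv_serror_rate', '0.00.10': 'dst_host_rerror_rate',
     '0.00.11': 'dst_host_srv_rerror_rate', 'normal.': 'response'}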
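Because d is typed by hand, a cheap check before kdd.rename(d) is that every inferred column has an entry and no new name is used twice. The helper below is a hypothetical sketch in plain Python; in the notebook the column list would presumably come from kdd.column_names(), which is an assumption about usage, not something run here. One naming detail is worth checking at the same time: the column labelled 'protocol_type2' holds service values (http, smtp, ecr_i, private and so on, per the later kdd.show() summary), so 'service', the field's standard KDD Cup 1999 name, would describe it more accurately.

```python
def check_rename(columns, mapping):
    """Hypothetical helper: report problems in a column-rename mapping.

    Returns (missing, unknown, duplicates):
      missing    - columns with no entry in `mapping` (would keep old names)
      unknown    - mapping keys that match no column
      duplicates - new names assigned to more than one column
    """
    column_set = set(columns)
    missing = [c for c in columns if c not in mapping]
    unknown = [k for k in mapping if k not in column_set]
    seen, duplicates = set(), []
    for new_name in mapping.values():
        if new_name in seen and new_name not in duplicates:
            duplicates.append(new_name)
        seen.add(new_name)
    return missing, unknown, duplicates


# First few inferred column names and the matching part of the notebook's d.
cols = ["0", "tcp", "http", "SF"]
d = {"0": "duration", "tcp": "protocol_type1", "http": "protocol_type2", "SF": "flag"}
print(check_rename(cols, d))  # ([], [], [])
```

All three lists being empty means the mapping is a one-to-one rename of exactly the existing columns.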
In [15]:
d

Out[15]:
{'0': 'duration',
 '0.00': 'serror_rate',
 '0.00.1': 'srv_serror_rate',
 '0.00.10': 'dst_host_rerror_rate',
 '0.00.11': 'dst_host_srv_rerror_rate',
 '0.00.2': 'rerror_rate',
 '0.00.3': 'srv_rerror_rate',
 '0.00.4': 'diff_srv_rate',
 '0.00.5': 'srv_diff_host_rate',
 '0.00.6': 'dst_host_diff_srv_rate',
 '0.00.7': 'dst_host_srv_diff_host_rate',
 '0.00.8': 'dst_host_serror_rate',
 '0.00.9': 'dst_host_srv_serror_rate',
 '0.1': 'land',
 '0.10': 'num_file_creations',
 '0.11': 'dst_host_same_src_port_rate',
 '0.12': 'num_shells',
 '0.13': 'num_access_files',
 '0.14': 'num_outbound_cmds',
 '0.15': 'is_host_login',
 '0.16': 'is_guest_login',
 '0.2': 'wrong_fragment',
 '0.3': 'urgent',
 '0.4': 'hot',
 '0.5': 'num_failed_logins',
 '0.6': 'num_compromised',
 '0.7': 'root_shell',
 '0.8': 'su_attempted',
 '0.9': 'num_root',
 '1': 'logged_in',
 '1.00': 'same_srv_rate',
 '1.00.1': 'dst_host_same_srv_rate',
 '181': 'src_bytes',
 '5450': 'dst_bytes',
 '8': 'count',
 '8.1': 'srv_count',
 '9': 'dst_host_count',
 '9.1': 'dst_host_srv_count',
 'SF': 'flag',
 'http': 'protocol_type2',
 'normal.': 'response',
 'tcp': 'protocol_type1'}
In [42]:
kdd.rename(d)

Out[42]:
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 1.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 1.0 0.0 0.0
dst_host_diff_srv_rate dst_host_same_src_port_rate ... dst_host_srv_diff_host_rate ...
0.0 0.05 0.0
0.0 0.03 0.0
0.0 0.03 0.0
0.0 0.02 0.0
0.0 0.02 0.0
0.0 1.0 0.04
0.0 0.09 0.04
0.0 0.12 0.04
0.0 0.12 0.05
0.0 0.06 0.05
dst_host_rerror_rate ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...
0.0 ...

In [43]:
kdd.show()

Canvas summary (partially preserved in the export):
duration: dtype: int, num_unique (est.): 2,501, num_undefined: 0, min: 0, max: 58,329, median: 0, mean: 47.979, std: 707.746
  distribution of values: (plot not preserved)
protocol_type1: dtype: str, num_unique (est.): 3, num_undefined: 0
  frequent items: icmp tcp udp
protocol_type (rest of the column name cut off in the export): dtype, num_unique (est.) and num_undefined values not preserved
  frequent items: ecr_i private http smtp other domain_u ftp_data eco_i ftp finger urp_i telnet

In [44]:
train_data, test_data = kdd.random_split(.8, seed=0)
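The In [43] summary reports, for duration, num_unique 2,501, min 0, max 58,329, median 0, mean 47.979 and std 707.746: a median of 0 against a maximum of 58,329 means at least half of all connections have zero duration, with a long tail. The sketch below computes the same kinds of statistics exactly with Python's statistics module; summarize is a hypothetical helper, not GraphLab's implementation. The canvas marks num_unique as an estimate, and treating std as the population standard deviation is an assumption, so small differences from the canvas are possible.

```python
import statistics


def summarize(values):
    """Exact per-column summary in the spirit of the canvas view.

    None entries count as undefined and are excluded from the statistics.
    'std' is the population standard deviation (ddof=0), an assumption here.
    """
    values = list(values)
    defined = [v for v in values if v is not None]
    return {
        "num_unique": len(set(defined)),
        "num_undefined": len(values) - len(defined),
        "min": min(defined),
        "max": max(defined),
        "median": statistics.median(defined),
        "mean": statistics.mean(defined),
        "std": statistics.pstdev(defined),
    }


# Illustrative values only (not KDD data).
print(summarize([0, 0, 1, 5, None]))
```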
In [45]:
train_data.head()

Out[45]:
0 tcp http SF 235 1337
0 tcp http SF 219 1337
0 tcp http SF 217 2032
0 tcp http SF 217 2032
0 tcp http SF 212 1940
0 tcp http SF 159 4087
0 tcp http SF 210 151
0 tcp http SF 212 786
0 tcp http SF 210 624
logged_in num_compromised root_shell su_attempted num_root
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
num_outbound_cmds is_host_login is_guest_login count srv_count
0 0 0 8
0 0 0 8
0 0 0 6
0 0 0 6
0 0 0 6
0 0 0 1
0 0 0 5
0 0 0 8
0 0 0 8
0 0 0 18
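train_data and test_data come from kdd.random_split(.8, seed=0). The sketch below is a plain-Python illustration of a seeded per-row split, in which each row is independently kept for training with probability 0.8, so the two parts come out close to, but not exactly, 80/20. Whether GraphLab samples in exactly this way is an assumption, and random_split here is a hypothetical stand-in, not the SFrame method. For scale, the training log later reports 375,394 examples after a further 5% was held out for validation, i.e. roughly 395,000 training rows, close to 80% of 494,020.

```python
import random


def random_split(rows, fraction, seed=0):
    """Seeded Bernoulli split: each row goes to the first part with
    probability `fraction`, so the first part's size is only approximately
    fraction * len(rows). The same seed and input give the same split."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second


train, test = random_split(range(10000), 0.8, seed=0)
print(len(train), len(test))  # roughly 8000 and 2000
```

Fixing the seed is what makes the notebook's split reproducible from one run to the next.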
Creating model
In [48]:
kdd_model = graphlab.classifier.create(train_data, target='response', features=['protocol_type'
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: The following methods are available for this type of problem.
PROGRESS: BoostedTreesClassifier, RandomForestClassifier, LogisticClassifier
PROGRESS: The returned model will be chosen according to validation accuracy.
PROGRESS: Boosted trees classifier:
PROGRESS:
PROGRESS: Number of examples          : 375394
PROGRESS: Number of classes           : 23
PROGRESS: Number of feature columns   : 2
PROGRESS: Number of unpacked features : 2
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Elapsed Time | Training accuracy | Validation accuracy |
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2.961036     | 0.981233          | 0.981515            |
PROGRESS: | 2         | 4.742356     | 0.982621          | 0.983081            |
PROGRESS: | 3         | 6.585209     | 0.982621          | 0.983081            |
PROGRESS: | 4         | 8.430268     | 0.982621          | 0.983081            |
PROGRESS: | 5         | 10.256491    | 0.982621          | 0.983081            |
PROGRESS: | 6         | 12.089232    | 0.982621          | 0.983081            |
PROGRESS: | 7         | 13.987530    | 0.982621          | 0.983081            |
PROGRESS: | 8         | 15.854709    | 0.982621          | 0.983081            |
PROGRESS: | 9         | 17.797652    | 0.982696          | 0.983283            |
PROGRESS: | 10        | 19.734425    | 0.982696          | 0.983283            |
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: Random forest classifier:
PROGRESS:
PROGRESS: Number of examples          : 375394
PROGRESS: Number of classes           : 23
PROGRESS: Number of feature columns   : 2
PROGRESS: Number of unpacked features : 2
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Elapsed Time | Training accuracy | Validation accuracy |
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2.013409     | 0.973793          | 0.974949            |
PROGRESS: | 2         | 3.754968     | 0.974483          | 0.975556            |
PROGRESS: | 3         | 5.468433     | 0.974765          | 0.97596             |
PROGRESS: | 4         | 7.223957     | 0.98208           | 0.982727            |
PROGRESS: | 5         | 8.930002     | 0.981795          | 0.982071            |
PROGRESS: | 6         | 10.593517    | 0.981795          | 0.982071            |
PROGRESS: | 7         | 12.395478    | 0.981808          | 0.982273            |
PROGRESS: | 8         | 14.178238    | 0.981811          | 0.982273            |
PROGRESS: | 9         | 15.930915    | 0.981808          | 0.982273            |
PROGRESS: | 10        | 17.570304    | 0.981808          | 0.982273            |
PROGRESS: +-----------+--------------+-------------------+---------------------+
PROGRESS: Logistic regression:
PROGRESS:
PROGRESS: Number of examples          : 375394
PROGRESS: Number of classes           : 23
PROGRESS: Number of feature columns   : 2
PROGRESS: Number of unpacked features : 2
PROGRESS: Number of coefficients      : 1496
PROGRESS: Starting L-BFGS
PROGRESS:
PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes | Step size | Elapsed Time | Training accuracy | Validation accuracy |
PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 4      | 0.000013  | 0.922823     | 0.851716          | 0.851010            |
PROGRESS: | 2         | 6      | 1.000000  | 1.834471     | 0.855115          | 0.854343            |
PROGRESS: | 3         | 7      | 1.000000  | 2.363347     | 0.980221          | 0.980909            |
PROGRESS: | 4         | 8      | 1.000000  | 2.924757     | 0.976286          | 0.977071            |
PROGRESS: | 5         | 9      | 1.000000  | 3.594237     | 0.982440          | 0.983030            |
PROGRESS: | 6         | 10     | 1.000000  | 4.106106     | 0.982131          | 0.982929            |
PROGRESS: | 10        | 16     | 1.000000  | 6.497316     | 0.981185          | 0.981869            |
PROGRESS: +-----------+--------+-----------+--------------+-------------------+---------------------+
PROGRESS: TERMINATED: Iteration limit reached.
PROGRESS: This model may not be optimal. To improve it, consider increasing `max_iterations`.
PROGRESS: Model selection based on validation accuracy:
PROGRESS:
PROGRESS: BoostedTreesClassifier : 0.983282828283
PROGRESS: RandomForestClassifier : 0.982272727273
PROGRESS: LogisticClassifier     : 0.981869
PROGRESS:
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.

In [59]:
con = kdd_model.evaluate(test_data, metric='confusion_matrix')

In [66]:
print con.viewvalues()

dict_values([Columns:
    target_label     str
    predicted_label  str
    count            int

Rows: 41

Data:
+---------------+-----------------+-------+
| target_label  | predicted_label | count |
+---------------+-----------------+-------+
| back.         | normal.         | 413   |
| normal.       | normal.         | 19083 |
| warezclient.  | satan.          | 1     |
| smurf.        | smurf.          | 56142 |
| normal.       | smurf.          | 77    |
| guess_passwd. | neptune.        | 8     |
| normal.       | neptune.        | 76    |
| portsweep.    | neptune.        | 131   |
| satan.        | neptune.        | 34    |
| portsweep.    | satan.          | 42    |
+---------------+-----------------+-------+
[41 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.])

In [67]:
print_rows(num_rows=41,num_columns=3).con

NameError                                 Traceback (most recent call last)
in <module>()
----> 1 print_rows(num_rows=41,num_columns=3).con

NameError: name 'print_rows' is not defined
In [74]:
graphlab.SFrame.print_rows(num_rows=41,num_columns=3)

TypeError                                 Traceback (most recent call last)
in <module>()
----> 1 graphlab.SFrame.print_rows(num_rows=41,num_columns=3)

TypeError: unbound method print_rows() must be called with SFrame instance as first argument (got nothing instead)

In [75]:
con.show()

AttributeError                            Traceback (most recent call last)
in <module>()
----> 1 con.show()

AttributeError: 'dict' object has no attribute 'show'

In [ ]:
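The three failures above share one cause. kdd_model.evaluate(..., metric='confusion_matrix') returned a Python dict (the viewvalues() output shows a single SFrame inside it), so print_rows is not a global function (NameError), cannot be called on the SFrame class without an instance (TypeError), and does not exist on the dict (AttributeError). Indexing the dict first should work, e.g. con['confusion_matrix'].print_rows(num_rows=41, num_columns=3) or con['confusion_matrix'].show(); the key name 'confusion_matrix' is an assumption, and con.keys() would confirm it. The plain-Python sketch below shows what can be read off the full 41-row table once printed: overall accuracy and per-class recall from (target_label, predicted_label, count) rows. confusion_summary and print_all_rows are hypothetical helpers, and the sample rows are only the ten shown above, so the numbers they produce are not the model's test metrics.

```python
from collections import defaultdict


def confusion_summary(rows):
    """rows: iterable of (target_label, predicted_label, count) triples.

    Returns overall accuracy and per-class recall (share of each true
    class that was predicted correctly).
    """
    total = correct = 0
    per_class = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for target, predicted, count in rows:
        total += count
        per_class[target][1] += count
        if target == predicted:
            correct += count
            per_class[target][0] += count
    accuracy = correct / total if total else 0.0
    recall = {label: (c / t if t else 0.0) for label, (c, t) in per_class.items()}
    return accuracy, recall


def print_all_rows(rows):
    """Print every row, largest count first (no head-only truncation)."""
    for target, predicted, count in sorted(rows, key=lambda r: -r[2]):
        print("%-16s %-16s %7d" % (target, predicted, count))


# Only the 10 rows visible above (out of 41), so these numbers are NOT
# the model's real test accuracy or recall.
visible = [
    ("back.", "normal.", 413), ("normal.", "normal.", 19083),
    ("warezclient.", "satan.", 1), ("smurf.", "smurf.", 56142),
    ("normal.", "smurf.", 77), ("guess_passwd.", "neptune.", 8),
    ("normal.", "neptune.", 76), ("portsweep.", "neptune.", 131),
    ("satan.", "neptune.", 34), ("portsweep.", "satan.", 42),
]
print_all_rows(visible)
accuracy, recall = confusion_summary(visible)
print(round(recall["back."], 3))  # 0.0
```

Per-class recall matters here because overall accuracy is dominated by smurf. and neptune.; a rare attack class can be missed entirely while accuracy stays near 98%.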