Artificial Neural Networks: Deep or Broad? An Empirical Study
Nian Liu and Nayyar A. Zaidi
AI 2016: The 29th Australasian Joint Conference on Artificial Intelligence
Introduction

- Two significant trends in machine learning in the last 10 years:
  - Ever-growing quantities of training data – the advent of Big Data
  - The success of Deep Learning on many problems
- Lessons learned:
  - For big data we need low-bias models
  - Feature engineering is the main reason behind the success of deep learning
- Big Learning: Feature Engineering (low bias), Minimal Passes, Minimal Tuning Parameters, Dynamic Models
- Are feature engineering and low-bias models two new phenomena?
The Need for Low-Bias

- Much of machine learning has been conducted in the context of small datasets
- Variance dominates most of the error
- Low-bias models will lead to over-fitting
- Hence, lots of emphasis on regularization
- Big datasets require low-bias models
Low-Bias Models
- Bayesian Networks
- Higher-order Logistic Regression
  - Generalized Linear Models
- Artificial Neural Networks
  - Deep Learning
- Random Forests
  - Other ensemble-based and tree models
- Support Vector Machines
  - Kernel Engineering ≡ Feature Engineering
Low-Bias Models

- Bayesian Networks
  - Zaidi, N. A., Webb, G. I., Carman, M. J., Petitjean, F., Buntine, W., Hynes, M. and De Sterck, H. – Efficient Parameter Learning of Bayesian Network Classifiers, to appear in Machine Learning (2017)
  - Martinez, A. M., Chen, S., Webb, G. I. and Zaidi, N. A. – Scalable Learning of Bayesian Network Classifiers, Journal of Machine Learning Research, vol. 17, pp. 1-35 (2016)
- Higher-order Logistic Regression
  - Zaidi, N. A., Webb, G. I., Carman, M. J., Petitjean, F. and Cerquides, J. – ALR^n: Accelerated Higher-order Logistic Regression, Machine Learning, vol. 104, pp. 151-194 (2016)
- Artificial Neural Networks
  - Why broad? One-hidden-layer ANNs are universal function approximators
  - Why deep? Constant-depth circuits are less powerful than deep circuits, and deep networks need fewer parameters
  - Why not deep?
    - Architecture selection
    - Vanishing gradients
    - Solution: greedy layer-wise training
Low-Bias Models

- Bayesian Networks:

$$P_{\mathrm{BN}_k}(y \mid \mathbf{x}) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i), y)}{\sum_{c=1}^{C} P(c)\,\prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i), c)}$$

- Higher-order Logistic Regression:

$$P_{\mathrm{LR}_n}(y \mid \mathbf{x}) = \frac{\exp\!\left(\beta_y + \sum_{\alpha \in \mathcal{A}_n} \beta_{y,\alpha,x_\alpha}\right)}{\sum_{c \in \Omega_Y} \exp\!\left(\beta_c + \sum_{\alpha^* \in \mathcal{A}_n} \beta_{c,\alpha^*,x_{\alpha^*}}\right)}$$

- Artificial Neural Networks (a numeric sketch of this form follows):

$$P_{\mathrm{ANN}_{b,d}}(y = k \mid \mathbf{x}) = \frac{f_1\!\left[\beta_{k,0} + \sum_{j=1}^{n_H} w_{k,j}\, f_0\!\left(\beta_{j,0} + \boldsymbol{\beta}_j^{T}\mathbf{x}\right)\right]}{Z}$$
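To make the ANN form concrete, here is a minimal NumPy sketch of the one-hidden-layer case, with a sigmoid as f0 and an exponential with normalizer Z (i.e. a softmax) as f1. The weights are random placeholders standing in for learned parameters; all names are illustrative, not the authors' code.

```python
# Minimal sketch of P_ANN(y = k | x) for one hidden layer.
import numpy as np

rng = np.random.default_rng(0)
n, n_hidden, n_classes = 4, 2, 3           # input dim, hidden nodes, classes

W_hidden = rng.normal(size=(n_hidden, n))  # beta_j (row j): hidden weights
b_hidden = rng.normal(size=n_hidden)       # beta_{j,0}: hidden biases
W_out = rng.normal(size=(n_classes, n_hidden))  # w_{k,j}: output weights
b_out = rng.normal(size=n_classes)         # beta_{k,0}: output biases

def p_ann(x):
    h = 1.0 / (1.0 + np.exp(-(W_hidden @ x + b_hidden)))  # f0: sigmoid
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())      # f1 = exp, numerically stabilized
    return e / e.sum()                     # division by the normalizer Z

x = rng.normal(size=n)
print(p_ann(x))   # a distribution over the classes; sums to 1
```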
Observations and Motivations

Observations

- We know that:
  - a higher k leads to a lower-bias BN_k
  - a higher n leads to a lower-bias LR_n
- We do not know:
  - whether a higher b or d leads to a lower-bias ANN_{b,d}
  - whether b should be preferred over d, or vice versa
  - what the effect on convergence is

Motivations

- A comparative analysis of low-bias models warrants further investigation
- Efficient, low-bias and dynamic models are the key to solving the big-data enigma
Experimental Design: Broad vs. Deep ANN

- 73 datasets from the UCI repository
- 2-fold cross-validation
- 0-1 Loss, RMSE, Bias, Variance and convergence performance
- Bias and variance definitions of Kohavi and Wolpert (a minimal estimation sketch follows this list)
- Win-Draw-Loss (W-D-L) results are reported
- Separate analysis on Big datasets
  - 12 datasets with more than 10,000 instances
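A minimal sketch of how Kohavi-Wolpert bias and variance estimates can be computed from repeated 2-fold cross-validation. The slides do not give the procedure in detail, so this assumes deterministic labels, uses scikit-learn's MLPClassifier as a stand-in for the paper's ANN implementation, and picks an arbitrary repeat count.

```python
# Kohavi-Wolpert (1996) bias/variance estimated from repeated 2-fold CV.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

def kohavi_wolpert(X, y, hidden, repeats=10, seed=0):
    classes = np.unique(y)
    # votes[i, c]: how often instance i was predicted as class c across runs
    votes = np.zeros((len(y), len(classes)))
    for r in range(repeats):
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + r)
        for train, test in cv.split(X, y):
            clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                                random_state=seed + r).fit(X[train], y[train])
            for i, p in zip(test, clf.predict(X[test])):
                votes[i, np.searchsorted(classes, p)] += 1
    p_hat = votes / votes.sum(axis=1, keepdims=True)   # estimated P^(y|x)
    p_true = p_hat[np.arange(len(y)), np.searchsorted(classes, y)]
    sq = (p_hat ** 2).sum(axis=1)
    # bias^2 = 1/2 * sum_c (P*(c|x) - P^(c|x))^2, with P* degenerate at y
    bias2 = 0.5 * ((1 - p_true) ** 2 + (sq - p_true ** 2))
    variance = 0.5 * (1 - sq)                          # 1/2 * (1 - sum_c P^2)
    return bias2.mean(), variance.mean()
```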
Experimental Design: Broad vs. Deep ANN

- Deep models are denoted NN2, NN22, NN222, NN2222 and NN22222, representing 1, 2, 3, 4 and 5 hidden layers with two nodes each
- Broad models are denoted NN2, NN4, NN6, NN8 and NN10, representing 1 hidden layer with 2, 4, 6, 8 and 10 nodes
- For the sake of comparison we also include NN0, a zero-hidden-layer ANN that is equivalent to linear Logistic Regression (a configuration sketch follows)
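A minimal sketch of this naming scheme, assuming scikit-learn (the slides do not name the authors' implementation): each model name maps directly onto a hidden_layer_sizes tuple, with plain logistic regression playing the role of NN0.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Broad: one hidden layer of width w; Deep: d hidden layers of width 2.
broad = {f"NN{w}": MLPClassifier(hidden_layer_sizes=(w,)) for w in (2, 4, 6, 8, 10)}
deep = {"NN" + "2" * d: MLPClassifier(hidden_layer_sizes=(2,) * d) for d in range(1, 6)}
models = {"NN0": LogisticRegression(), **broad, **deep}  # NN2 belongs to both families
```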
Broad ANN – Bias, Variance Comparison

All datasets – Bias (W-D-L, p)

|      | vs. NN0 | vs. NN2 | vs. NN4 | vs. NN6 | vs. NN8 |
|------|---------|---------|---------|---------|---------|
| NN2  | 35/3/34, 1 | | | | |
| NN4  | 45/4/23, 0.010 | 49/7/16, <0.001 | | | |
| NN6  | 47/4/21, 0.002 | 47/5/20, 0.001 | 37/7/28, 0.321 | | |
| NN8  | 48/3/21, 0.002 | 44/5/23, 0.014 | 37/7/28, 0.321 | 36/11/25, 0.200 | |
| NN10 | 52/3/17, <0.001 | 47/5/20, 0.001 | 41/9/22, 0.023 | 43/10/19, 0.003 | 40/15/17, 0.003 |

All datasets – Variance (W-D-L, p)

|      | vs. NN0 | vs. NN2 | vs. NN4 | vs. NN6 | vs. NN8 |
|------|---------|---------|---------|---------|---------|
| NN2  | 20/2/50, <0.001 | | | | |
| NN4  | 21/2/49, 0.001 | 38/6/28, 0.268 | | | |
| NN6  | 27/3/42, 0.091 | 43/7/22, 0.013 | 40/8/24, 0.060 | | |
| NN8  | 32/2/38, 0.550 | 42/7/23, 0.025 | 44/8/20, 0.004 | 36/9/27, 0.314 | |
| NN10 | 30/3/39, 0.336 | 42/7/23, 0.025 | 43/9/20, 0.005 | 34/13/25, 0.298 | 33/10/29, 0.704 |

Table: A comparison of bias and variance of broad models in terms of W-D-L on All datasets. p is a two-tail binomial sign test; results are significant if p ≤ 0.05.
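The p values in these tables come from a two-tail binomial sign test over the win/loss counts. A minimal sketch follows; treating draws as carrying no sign information (and so excluding them) is my assumption about the procedure, not something the slides state.

```python
from scipy.stats import binomtest

def wdl_pvalue(wins: int, draws: int, losses: int) -> float:
    n = wins + losses  # draws excluded: they carry no sign information
    return binomtest(wins, n, p=0.5, alternative="two-sided").pvalue

print(wdl_pvalue(45, 4, 23))  # NN4 vs. NN0 on bias: ~0.010, as in the table
```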
Broad ANN – Error Comparison

All datasets – 0-1 Loss (W-D-L, p)

|      | vs. NN0 | vs. NN2 | vs. NN4 | vs. NN6 | vs. NN8 |
|------|---------|---------|---------|---------|---------|
| NN2  | 27/2/43, 0.072 | | | | |
| NN4  | 31/6/35, 0.712 | 50/9/13, <0.001 | | | |
| NN6  | 33/3/36, 0.801 | 49/3/20, <0.001 | 45/7/20, 0.003 | | |
| NN8  | 37/1/34, 0.813 | 50/5/17, <0.001 | 44/8/20, 0.004 | 31/14/27, 0.694 | |
| NN10 | 40/2/30, 0.282 | 51/4/17, <0.001 | 49/5/18, <0.001 | 38/9/25, 0.130 | 40/8/24, 0.060 |

Big datasets – 0-1 Loss (W-D-L, p)

|      | vs. NN0 | vs. NN2 | vs. NN4 | vs. NN6 | vs. NN8 |
|------|---------|---------|---------|---------|---------|
| NN2  | 6/0/6, 1.226 | | | | |
| NN4  | 7/0/5, 0.774 | 12/0/0, 0.011 | | | |
| NN6  | 7/0/5, 0.774 | 12/0/0, 0.001 | 11/0/1, 0.006 | | |
| NN8  | 8/0/4, 0.388 | 12/0/0, <0.001 | 9/0/3, 0.146 | 8/0/4, 0.388 | |
| NN10 | 8/0/4, 0.388 | 12/0/0, <0.001 | 10/0/2, 0.039 | 9/0/3, 0.146 | 9/0/3, 0.146 |

Table: A comparison of 0-1 loss of broad models in terms of W-D-L on All and Big datasets. p is a two-tail binomial sign test; results are significant if p ≤ 0.05.
Broad ANN – Geometric Averages

[Figure: four bar-chart panels – 0-1 Loss, RMSE, Bias and Variance – comparing NN0, NN2, NN4, NN6, NN8 and NN10 on All and Big datasets.]

Figure: Comparison (geometric average) of 0-1 Loss, RMSE, Bias and Variance for broad models on All and Big datasets. Results are normalized w.r.t. NN0.
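A minimal sketch of the normalization behind this figure, assuming "geometric average normalized w.r.t. NN0" means the geometric mean over datasets of per-dataset ratios against NN0 (the example scores below are hypothetical, not the paper's data).

```python
import numpy as np
from scipy.stats import gmean

def normalized_gmean(metric: np.ndarray, baseline: np.ndarray) -> float:
    """metric, baseline: per-dataset scores; geometric mean of their ratios."""
    return gmean(metric / baseline)

rmse = {"NN0":  np.array([0.30, 0.25, 0.40]),   # hypothetical values
        "NN10": np.array([0.24, 0.26, 0.31])}
print(normalized_gmean(rmse["NN10"], rmse["NN0"]))  # < 1 means better than NN0
```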
Deep ANN – Bias, Variance Comparison

All datasets – Bias (W-D-L, p)

|         | vs. NN0 | vs. NN2 | vs. NN22 | vs. NN222 | vs. NN2222 |
|---------|---------|---------|----------|-----------|------------|
| NN2     | 35/3/34, 1 | | | | |
| NN22    | 30/3/39, 0.336 | 28/4/40, 0.182 | | | |
| NN222   | 26/1/45, 0.032 | 21/3/48, 0.002 | 24/4/44, 0.021 | | |
| NN2222  | 5/0/67, <0.001 | 3/1/68, <0.001 | 3/2/67, <0.001 | 4/9/59, <0.001 | |
| NN22222 | 0/1/71, <0.001 | 0/1/71, <0.001 | 1/2/69, <0.001 | 1/9/62, <0.001 | 0/61/11, <0.001 |

All datasets – Variance (W-D-L, p)

|         | vs. NN0 | vs. NN2 | vs. NN22 | vs. NN222 | vs. NN2222 |
|---------|---------|---------|----------|-----------|------------|
| NN2     | 20/2/50, <0.001 | | | | |
| NN22    | 20/1/51, <0.001 | 27/6/39, 0.175 | | | |
| NN222   | 24/1/47, 0.009 | 34/3/35, 1 | 32/4/36, 0.905 | | |
| NN2222  | 34/1/37, 0.813 | 34/1/37, 0.813 | 36/2/34, 0.905 | 32/9/31, 1 | |
| NN22222 | 40/2/30, 0.282 | 38/1/33, 0.6353 | 39/2/31, 0.403 | 35/9/28, 0.450 | 8/61/3, 0.227 |

Table: A comparison of bias and variance of deep models in terms of W-D-L on All datasets. p is a two-tail binomial sign test; results are significant if p ≤ 0.05.
Deep ANN – Error Comparison

All datasets – 0-1 Loss (W-D-L, p)

|         | vs. NN0 | vs. NN2 | vs. NN22 | vs. NN222 | vs. NN2222 |
|---------|---------|---------|----------|-----------|------------|
| NN2     | 27/2/43, 0.072 | | | | |
| NN22    | 28/1/43, 0.096 | 24/5/43, 0.027 | | | |
| NN222   | 24/1/47, 0.009 | 25/5/42, 0.050 | 28/3/41, 0.148 | | |
| NN2222  | 7/0/65, <0.001 | 4/2/66, <0.001 | 4/2/66, <0.001 | 3/9/60, <0.001 | |
| NN22222 | 7/1/64, <0.001 | 5/1/66, <0.001 | 4/2/66, <0.001 | 3/9/60, <0.001 | 1/61/10, 0.012 |

Big datasets – 0-1 Loss (W-D-L, p)

|         | vs. NN0 | vs. NN2 | vs. NN22 | vs. NN222 | vs. NN2222 |
|---------|---------|---------|----------|-----------|------------|
| NN2     | 6/0/6, 1.226 | | | | |
| NN22    | 5/0/7, 0.774 | 4/0/8, 0.388 | | | |
| NN222   | 4/0/8, 0.388 | 2/0/10, 0.039 | 4/0/8, 0.388 | | |
| NN2222  | 2/0/10, 0.039 | 0/0/12, <0.001 | 1/0/11, 0.006 | 1/1/10, 0.012 | |
| NN22222 | 1/1/10, 0.012 | 0/0/12, <0.001 | 0/0/12, <0.001 | 0/1/11, <0.001 | 0/6/6, 0.031 |

Table: 0-1 Loss W-D-L on All and Big datasets. p is a two-tail binomial sign test; results are significant if p ≤ 0.05.
Deep ANN – Geometric Averages

[Figure: four bar-chart panels – 0-1 Loss, RMSE, Bias and Variance – comparing NN0, NN2, NN22, NN222, NN2222 and NN22222 on All and Big datasets.]

Figure: Comparison (geometric average) of 0-1 Loss, RMSE, Bias and Variance for deep models on All and Big datasets. Results are normalized w.r.t. NN0.
Convergence Analysis (Broad)

[Figure: six panels – Connect-4, Localization, Nursery, Letter-recog, Magic and Sign – plotting Mean Square Error against the number of iterations (10^0 to 10^3, log scale) for NN2, NN4, NN6, NN8 and NN10.]

Figure: Variation in Mean Square Error of NN2, NN4, NN6, NN8 and NN10 with increasing number of (optimization) iterations on sample datasets.
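A minimal sketch of how such convergence curves can be produced, again assuming scikit-learn's MLPClassifier (the slides do not name an implementation): train incrementally with partial_fit and record the mean square error of the predicted class probabilities against one-hot targets after each optimization pass, then plot the curve on a log-x axis.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def mse_curve(X, y, hidden, n_iters=1000, seed=0):
    classes = np.unique(y)
    onehot = np.eye(len(classes))[np.searchsorted(classes, y)]  # one-hot targets
    clf = MLPClassifier(hidden_layer_sizes=hidden, random_state=seed)
    curve = []
    for _ in range(n_iters):
        clf.partial_fit(X, y, classes=classes)        # one optimization pass
        p = clf.predict_proba(X)
        curve.append(np.mean((p - onehot) ** 2))      # mean square error
    return curve  # plot against iteration count on a log-scale x axis
```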
Convergence Analysis (Deep)

[Figure: six panels – Connect-4, Localization, Nursery, Letter-recog, Magic and Sign – plotting Mean Square Error against the number of iterations (10^0 to 10^3, log scale) for NN2, NN22, NN222, NN2222 and NN22222.]

Figure: Variation in Mean Square Error of NN2, NN22, NN222, NN2222 and NN22222 with increasing number of (optimization) iterations on sample datasets.
Conclusion

- The results warrant further investigation:
  - Deep versus Broad
  - Deep versus Shallow
- Q & A
- For further discussion:
  - @nayyar zaidi
  - nayyar zaidi
  - http://users.monash.edu.au/~nzaidi