Labels Unsupervised Learning - Semantic Scholar · 2017. 8. 3. · Efficient Estimation of Word...
6/26/17
1
Unsupervised Learning
Presenter: Yevgeny Shapiro
What is supervised learning?
Labels

What is unsupervised learning?
Labels
What is unsupervised learning?
• Looking for inner structure
Why use unsupervised learning?
• Tons of data, very few labels
• Can improve the starting point of supervised models
• Learn features without a specific task, for semi-supervised learning
• Self-supervision
• Closer to how the human brain works

Classic models
• Clustering
• Autoencoders
• RBMs
• Self-supervision
Unsupervised Learning in NLP
• Lots of data
• Highly structured
• Highly sparse
(Hebrew example, translated: "The dog that hid behind the tall pole in the street was white.")
NLP tasks
• Lemmatization: Dogs → dog, Drove → drive
• Part-of-speech tagging
• Translation
• Grammar correction
• Language models – The cat is [?]
• P(white | The cat is) = 0.1
• P(while | The cat is) = 0
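The language-model bullets above can be sketched as a toy bigram model; the corpus and the resulting probabilities here are made up purely for illustration:

```python
from collections import Counter

corpus = "the cat is white . the cat is small . the dog is white .".split()

# Count bigrams and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus[:-1])

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / context[prev] if context[prev] else 0.0

print(p("white", "is"))  # 2 of the 3 "is" continuations are "white"
print(p("while", "is"))  # unseen bigram -> probability 0
```

A real language model would smooth these counts; the point is only that valid continuations get mass and nonsense ones get (near) zero.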
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
Why learn vector representations?
• Traditionally, words are represented as labels: The [id443] cat [id23] is [id554] white [id444]
• This encoding is arbitrary
• "Dog" is similar to "Dogs"
• Knowledge transfer between similar words
What kind of similarity?
• Semantic regularities: Dog, Cat; Israel, Jerusalem; Mother, Father
• Syntactic regularities: work, works; Run, Running; Fast, Faster
Previous methods
• One-hot: very sparse; no knowledge transfer between similar words
• LSA (Latent Semantic Analysis): uses SVD for dimensionality reduction
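A minimal sketch of one-hot encoding and its knowledge-transfer problem (toy vocabulary, illustrative only):

```python
vocab = ["the", "cat", "is", "white", "dog", "dogs"]

def one_hot(word):
    """Sparse vector with a single 1 at the word's index."""
    v = [0] * len(vocab)
    v[vocab.index(word)] = 1
    return v

dog, dogs = one_hot("dog"), one_hot("dogs")

# The dot product between any two distinct words is 0:
# "dog" and "dogs" look no more similar than "dog" and "cat".
print(sum(a * b for a, b in zip(dog, dogs)))  # 0
```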
Previous methods
• LDA (Latent Dirichlet Allocation): Bayesian topic model; very expensive to compute
NNLM – neural network language model
• N previous words with 1-of-V coding
• First layer: linear embedding
• Second layer: hidden layer
• Output layer: softmax
Problem – computational complexity
• Simple models trained with lots of data beat complex models trained with less data
• Solution: try to reduce the model complexity! How? Start by analyzing previous neural-net models
Computational complexity – NNLM
• N previous words with 1-of-V coding
• First layer: linear embedding
• Second layer: hidden layer
• Output layer: softmax
• Total complexity: Q = N·D + N·D·H + H·V
• N – # previous words, D – size of input layer, H – size of hidden layer, V – vocabulary size
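The paper's per-example NNLM training complexity is Q = N·D + N·D·H + H·V. Plugging in illustrative sizes (the values below are hypothetical, chosen to match the magnitudes discussed in the paper) shows that the softmax output term dominates:

```python
# Per-example NNLM complexity: Q = N*D + N*D*H + H*V
N, D, H, V = 10, 500, 500, 1_000_000  # assumed illustrative sizes

projection = N * D       # copy N embeddings into the projection layer
hidden     = N * D * H   # dense layer from projection to hidden
output     = H * V       # softmax over the full vocabulary

Q = projection + hidden + output
print(Q)            # 502,505,000 operations
print(output / Q)   # the output layer is >99% of the work
```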
(Figure: the H·V softmax output term is the bottleneck; addressed with a hierarchical softmax over a Huffman coding of the vocabulary)
• Problem: the softmax over a very sparse vocabulary
• Solution: approximate the standard softmax – hierarchical softmax
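A back-of-the-envelope comparison of the per-prediction cost (the vocabulary size is an assumed example value):

```python
import math

V = 1_000_000  # assumed vocabulary size

# A full softmax must normalize over all V words per prediction;
# hierarchical softmax walks a binary (Huffman) tree instead,
# evaluating only about log2(V) binary decisions.
full = V
hierarchical = math.ceil(math.log2(V))

print(full, hierarchical)  # 1000000 vs 20
```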
Hierarchical Softmax

Computational complexity – RNNLM
• No limit on the number of previous words
• One hidden layer
• Total complexity: Q = H·H + H·V
• Word dimensionality = H; H – size of hidden layer, V – vocabulary size
Conclusions
• Most of the complexity is caused by the non-linear hidden unit
• To reduce complexity – remove the hidden unit!
First Model – Continuous Bag-of-Words
• Similar to NNLM
• The projection layer is shared
• Total complexity: Q = N·D + D·log2(V)
• N – # input words, D – size of input layer, V – vocabulary size
• NNLM, for comparison: Q = N·D + N·D·H + H·V
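CBOW's complexity is Q = N·D + D·log2(V). Plugging in the same illustrative sizes as before (all values hypothetical) shows the gap versus the NNLM:

```python
import math

N, D, H, V = 10, 500, 500, 1_000_000  # assumed illustrative sizes

# CBOW: no hidden layer, hierarchical softmax over the vocabulary.
q_cbow = N * D + D * math.log2(V)
# NNLM with the same sizes, for comparison.
q_nnlm = N * D + N * D * H + H * V

print(q_cbow)           # ~15k operations per example
print(q_nnlm / q_cbow)  # a speedup of four-plus orders of magnitude
```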
Second Model – Continuous Skip-gram
• Total complexity: Q = C·(D + D·log2(V))
• C – maximum distance of the words, D – size of input layer, V – vocabulary size
• Closer words are sampled more often
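The "sample closer words more" trick can be sketched as follows: picking a random window radius R ≤ C for each position means a word at distance d is paired with probability (C − d + 1)/C, so nearer words appear more often. This is a toy implementation, not the paper's code:

```python
import random

def skipgram_pairs(tokens, C=5, seed=0):
    """Emit (center, context) pairs with a random per-position radius R <= C."""
    rng = random.Random(seed)
    pairs = []
    for i, center in enumerate(tokens):
        R = rng.randint(1, C)  # smaller R is as likely as larger R,
        for j in range(max(0, i - R), min(len(tokens), i + R + 1)):
            if j != i:         # so distant words are included less often
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens)[:3])
```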
Training
• Adagrad – well suited for sparse data
• Small learning rate for common parameters/words
• Large learning rate for rare parameters/words
Results
• Different kinds of similarity
• Computing pairs
• King – Man + Woman = Queen
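The analogy test can be illustrated with hand-made 2-d vectors; these are not real word2vec embeddings, and the two axes ("royalty", "maleness") are invented for the example:

```python
vecs = {
    "king":  [1.0, 1.0],   # [royalty, maleness] -- invented toy axes
    "queen": [1.0, 0.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# vector("king") - vector("man") + vector("woman"), then nearest neighbor.
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```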
Results
Results

Context Encoders: Feature Learning by Inpainting
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
Motivation
• Visual data is structured, but in a more complex way than text.
• Humans can easily understand the structure.
Context encoder
• Plain autoencoder – might just learn ordinary compression
• Denoising autoencoder – learns to distinguish signal from noise, nothing semantic
Previous works
• Unsupervised Visual Representation Learning by Context Prediction – a discriminative model
• Doersch, Carl, Abhinav Gupta, and Alexei A. Efros
Previous works
• Unsupervised Learning of Visual Representations using Videos
• Xiaolong Wang, Abhinav Gupta – Robotics Institute, Carnegie Mellon University
Previous works
• Scene Completion Using Millions of Photographs
• Hays, James, and Alexei A. Efros
System Architecture
• 1. Encoder – similar to AlexNet, trained from scratch
• Input: 227x227; output: 6x6x256 (pool5 of AlexNet)
• 2. Connection – channel-wise fully connected layer
• Doesn't connect different feature maps, for computational reasons
• A stride-1 convolution is applied afterwards to propagate information across channels
• 3. Decoder – 5 up-convolutions
• Output – original target size
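The decoder's five up-convolutions can be sanity-checked with the transposed-convolution output-size formula. The kernel/stride/padding values below are assumptions chosen so each layer doubles the spatial size, not the paper's exact hyperparameters:

```python
def upconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution:
    out = (in - 1) * stride - 2 * pad + kernel."""
    return (size - 1) * stride - 2 * pad + kernel

# Starting from the 6x6 spatial size of the encoder output,
# five stride-2 up-convolutions double the resolution each time.
size = 6
for _ in range(5):
    size = upconv_out(size)
print(size)  # 6 -> 12 -> 24 -> 48 -> 96 -> 192
```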
Loss Function
• Reconstruction loss: L_rec(x) = ||M ⊙ (x − F((1 − M) ⊙ x))||²
• M – a binary mask corresponding to the dropped regions
• F(x) – the encoded (reconstructed) image
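A sketch of the masked reconstruction loss in pure Python, with a hypothetical constant-fill stub standing in for the real network F, purely to exercise the formula:

```python
# L_rec = || M * (x - F((1 - M) * x)) ||_2^2, with M = 1 on dropped pixels.
def rec_loss(x, mask, F):
    ctx = [xi * (1 - mi) for xi, mi in zip(x, mask)]  # context: hole zeroed out
    pred = F(ctx)
    return sum(mi * (xi - pi) ** 2 for xi, mi, pi in zip(x, mask, pred))

x    = [1.0] * 16                # flattened 4x4 all-white image
mask = [0.0] * 16
for i in (5, 6, 9, 10):          # 2x2 dropped center block
    mask[i] = 1.0

F = lambda ctx: [0.5] * len(ctx)  # hypothetical gray-fill "predictor"
print(rec_loss(x, mask, F))       # 4 pixels * (1 - 0.5)^2 = 1.0
```

Only the masked pixels contribute to the loss; the network is free on the visible context.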
Loss Function
• Adversarial loss
• To learn a generative model G (= F in our case), we also train a discriminative model D – log-likelihood for real samples
• D = 1 for real samples, and D = 0 for generated samples
• Total loss function: L = λ_rec·L_rec + λ_adv·L_adv
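The combined objective can be sketched as below. The 0.999/0.001 weighting follows the paper's reported setting, and the discriminator outputs are made-up numbers for illustration:

```python
import math

def adv_loss_d(d_real, d_fake):
    """Discriminator log-likelihood loss: push D(real)->1, D(fake)->0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def total_loss(l_rec, l_adv, lam_rec=0.999, lam_adv=0.001):
    """Joint objective: L = lam_rec * L_rec + lam_adv * L_adv."""
    return lam_rec * l_rec + lam_adv * l_adv

print(adv_loss_d(0.9, 0.1))  # small loss: D separates real from fake well
print(total_loss(1.0, 0.5))
```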
Region masks
• Central block
• Works well for inpainting
• Learns low-level boundary features which don't generalize well
• Random block
• Random rectangular blocks are removed
• Up to ¼ of the image
• The masks still have sharp boundaries which CNN features latch onto
• Random region
• Random masks from PASCAL VOC 2012, deformed
• Up to ¼ of the image
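A toy "random block" mask generator in the spirit of the slide: it drops random rectangles until roughly a quarter of the pixels are masked. The block-size limits and stopping rule are assumptions, not the paper's exact procedure:

```python
import random

def random_block_mask(h, w, max_frac=0.25, seed=0):
    """mask[i][j] = 1 marks a dropped pixel; rectangles accumulate until
    the dropped fraction reaches max_frac (it may slightly overshoot)."""
    rng = random.Random(seed)
    mask = [[0] * w for _ in range(h)]
    dropped = 0
    while dropped / (h * w) < max_frac:
        bh, bw = rng.randint(1, h // 4), rng.randint(1, w // 4)
        top, left = rng.randint(0, h - bh), rng.randint(0, w - bw)
        for i in range(top, top + bh):
            for j in range(left, left + bw):
                dropped += 1 - mask[i][j]  # count only newly dropped pixels
                mask[i][j] = 1
    return mask

mask = random_block_mask(32, 32)
frac = sum(map(sum, mask)) / (32 * 32)
print(frac)  # roughly a quarter of the pixels dropped
```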
Evaluation – Semantic Inpainting
• No noise in the adversarial setting
• The discriminator is not conditioned on the context
• No pooling layers
Evaluation – Semantic Inpainting

The Problem with PSNR
Evaluation – Feature learning
• The adversarial loss did not converge with AlexNet
• Nearest neighbors of the missing region using context features
• AlexNet was trained on a supervised classification problem
Evaluation
• Detection: Fast R-CNN framework
• Segmentation: FCN
Thank you!

RNN and LSTM
Character Level Model