Labels Unsupervised Learning - Semantic Scholar · 2017. 8. 3. · Efficient Estimation of Word...
6/26/17
1
Unsupervised Learning
Presenter: Yevgeny Shapiro
What is supervised learning?
Labels

What is unsupervised learning?
Labels
What is unsupervised learning?
• Looking for inner structure
Why use unsupervised learning?
• Tons of data, very few labels
• Can improve the starting point of supervised models
• Learn features without a specific task, for semi-supervised learning
• Self-supervision
• Closer to how the human brain works

Classic models
• Clustering
• Autoencoders
• RBMs
• Self-supervision
Unsupervised Learning in NLP
• Lots of data
• Highly structured
• Highly sparse
(Hebrew example, translated: "The dog that hid behind the tall pole in the street was white.")
NLP tasks
• Lemmatization: Dogs → dog, Drove → drive
• Part-of-speech tagging
• Translation
• Grammar correction
• Language models – The cat is [?]
• P(white | The cat is) = 0.1
• P(while | The cat is) = 0
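The language-model bullets above can be sketched as a toy bigram model; the corpus and the resulting probabilities here are made up purely for illustration:

```python
from collections import Counter

corpus = "the cat is white . the cat is small . the dog is white .".split()

# Count bigrams and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus[:-1])

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / context[prev] if context[prev] else 0.0

print(p("white", "is"))  # 2 of the 3 "is" continuations are "white"
print(p("while", "is"))  # unseen bigram -> probability 0
```

A real language model would smooth these counts; the point is only that valid continuations get mass and nonsense ones get (near) zero.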
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
Why learn vector representations?
• Traditionally, words are represented as labels: The [id443] cat [id23] is [id554] white [id444]
• This encoding is arbitrary
• "Dog" is similar to "Dogs"
• Knowledge transfer between similar words
What kind of similarity?
• Semantic regularities: Dog, Cat; Israel, Jerusalem; Mother, Father
• Syntactic regularities: work, works; Run, Running; Fast, Faster
Previous methods
• One-hot: very sparse; no knowledge transfer between similar words
• LSA (Latent Semantic Analysis): uses SVD for dimensionality reduction
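A minimal sketch of one-hot encoding and its knowledge-transfer problem (toy vocabulary, illustrative only):

```python
vocab = ["the", "cat", "is", "white", "dog", "dogs"]

def one_hot(word):
    """Sparse vector with a single 1 at the word's index."""
    v = [0] * len(vocab)
    v[vocab.index(word)] = 1
    return v

dog, dogs = one_hot("dog"), one_hot("dogs")

# The dot product between any two distinct words is 0:
# "dog" and "dogs" look no more similar than "dog" and "cat".
print(sum(a * b for a, b in zip(dog, dogs)))  # 0
```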
Previous methods
• LDA (Latent Dirichlet Allocation): Bayesian topic model; very expensive to compute
NNLM – neural network language model
• N previous words with 1-of-V coding
• First layer: linear embedding
• Second layer: hidden layer
• Output layer: softmax
Problem – computational complexity
• Simple models trained with lots of data beat complex models trained with less data
• Solution: try to reduce the model complexity! How? Start by analyzing previous neural-net models
Computational complexity – NNLM
• N previous words with 1-of-V coding
• First layer: linear embedding
• Second layer: hidden layer
• Output layer: softmax
• Total complexity: Q = N·D + N·D·H + H·V
• N – # previous words, D – size of input layer, H – size of hidden layer, V – vocabulary size
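The paper's per-example NNLM training complexity is Q = N·D + N·D·H + H·V. Plugging in illustrative sizes (the values below are hypothetical, chosen to match the magnitudes discussed in the paper) shows that the softmax output term dominates:

```python
# Per-example NNLM complexity: Q = N*D + N*D*H + H*V
N, D, H, V = 10, 500, 500, 1_000_000  # assumed illustrative sizes

projection = N * D       # copy N embeddings into the projection layer
hidden     = N * D * H   # dense layer from projection to hidden
output     = H * V       # softmax over the full vocabulary

Q = projection + hidden + output
print(Q)            # 502,505,000 operations
print(output / Q)   # the output layer is >99% of the work
```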
(Figure: the H·V softmax output term is the bottleneck; addressed with a hierarchical softmax over a Huffman coding of the vocabulary)
• Problem: the softmax over a very sparse vocabulary
• Solution: approximate the standard softmax – hierarchical softmax
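A back-of-the-envelope comparison of the per-prediction cost (the vocabulary size is an assumed example value):

```python
import math

V = 1_000_000  # assumed vocabulary size

# A full softmax must normalize over all V words per prediction;
# hierarchical softmax walks a binary (Huffman) tree instead,
# evaluating only about log2(V) binary decisions.
full = V
hierarchical = math.ceil(math.log2(V))

print(full, hierarchical)  # 1000000 vs 20
```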
Hierarchical Softmax

Computational complexity – RNNLM
• No limit on the number of previous words
• One hidden layer
• Total complexity: Q = H·H + H·V
• Word dimensionality = H; H – size of hidden layer, V – vocabulary size
Conclusions
• Most of the complexity is caused by the non-linear hidden unit
• To reduce complexity – remove the hidden unit!
First Model – Continuous Bag-of-Words
• Similar to NNLM
• The projection layer is shared
• Total complexity: Q = N·D + D·log2(V)
• N – # input words, D – size of input layer, V – vocabulary size
• NNLM, for comparison: Q = N·D + N·D·H + H·V
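CBOW's complexity is Q = N·D + D·log2(V). Plugging in the same illustrative sizes as before (all values hypothetical) shows the gap versus the NNLM:

```python
import math

N, D, H, V = 10, 500, 500, 1_000_000  # assumed illustrative sizes

# CBOW: no hidden layer, hierarchical softmax over the vocabulary.
q_cbow = N * D + D * math.log2(V)
# NNLM with the same sizes, for comparison.
q_nnlm = N * D + N * D * H + H * V

print(q_cbow)           # ~15k operations per example
print(q_nnlm / q_cbow)  # a speedup of four-plus orders of magnitude
```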
Second Model – Continuous Skip-gram
• Total complexity: Q = C·(D + D·log2(V))
• C – maximum distance of the words, D – size of input layer, V – vocabulary size
• Closer words are sampled more often
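The "sample closer words more" trick can be sketched as follows: picking a random window radius R ≤ C for each position means a word at distance d is paired with probability (C − d + 1)/C, so nearer words appear more often. This is a toy implementation, not the paper's code:

```python
import random

def skipgram_pairs(tokens, C=5, seed=0):
    """Emit (center, context) pairs with a random per-position radius R <= C."""
    rng = random.Random(seed)
    pairs = []
    for i, center in enumerate(tokens):
        R = rng.randint(1, C)  # smaller R is as likely as larger R,
        for j in range(max(0, i - R), min(len(tokens), i + R + 1)):
            if j != i:         # so distant words are included less often
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens)[:3])
```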
Training
• Adagrad – well suited for sparse data
• Small learning rate for common parameters/words
• Large learning rate for rare parameters/words
Results
• Different kinds of similarity
• Computing pairs
• King – Man + Woman = Queen
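The analogy test can be illustrated with hand-made 2-d vectors; these are not real word2vec embeddings, and the two axes ("royalty", "maleness") are invented for the example:

```python
vecs = {
    "king":  [1.0, 1.0],   # [royalty, maleness] -- invented toy axes
    "queen": [1.0, 0.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# vector("king") - vector("man") + vector("woman"), then nearest neighbor.
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```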
Results
Results

Context Encoders: Feature Learning by Inpainting
Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
Motivation
• Visual data is structured, but in a more complex way than text.
• Humans can easily understand the structure.
Context encoder
• Plain autoencoder – might just learn ordinary compression
• Denoising autoencoder – learns to distinguish signal from noise, nothing semantic
Previous works
• Unsupervised Visual Representation Learning by Context Prediction – a discriminative model
• Doersch, Carl, Abhinav Gupta, and Alexei A. Efros
Previous works
• Unsupervised Learning of Visual Representations using Videos
• Xiaolong Wang, Abhinav Gupta – Robotics Institute, Carnegie Mellon University
Previous works
• Scene Completion Using Millions of Photographs
• Hays, James, and Alexei A. Efros
System Architecture
• 1. Encoder – similar to AlexNet, trained from scratch
• Input: 227x227; output: 6x6x256 (pool5 of AlexNet)
• 2. Connection – channel-wise fully connected layer
• Doesn't connect different feature maps, for computational reasons
• A stride-1 convolution is applied afterwards to propagate information across channels
• 3. Decoder – 5 up-convolutions
• Output – original target size
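The decoder's five up-convolutions can be sanity-checked with the transposed-convolution output-size formula. The kernel/stride/padding values below are assumptions chosen so each layer doubles the spatial size, not the paper's exact hyperparameters:

```python
def upconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution:
    out = (in - 1) * stride - 2 * pad + kernel."""
    return (size - 1) * stride - 2 * pad + kernel

# Starting from the 6x6 spatial size of the encoder output,
# five stride-2 up-convolutions double the resolution each time.
size = 6
for _ in range(5):
    size = upconv_out(size)
print(size)  # 6 -> 12 -> 24 -> 48 -> 96 -> 192
```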
Loss Function
• Reconstruction loss: L_rec(x) = ||M ⊙ (x − F((1 − M) ⊙ x))||²
• M – a binary mask corresponding to the dropped regions
• F(x) – the encoded (reconstructed) image
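A sketch of the masked reconstruction loss in pure Python, with a hypothetical constant-fill stub standing in for the real network F, purely to exercise the formula:

```python
# L_rec = || M * (x - F((1 - M) * x)) ||_2^2, with M = 1 on dropped pixels.
def rec_loss(x, mask, F):
    ctx = [xi * (1 - mi) for xi, mi in zip(x, mask)]  # context: hole zeroed out
    pred = F(ctx)
    return sum(mi * (xi - pi) ** 2 for xi, mi, pi in zip(x, mask, pred))

x    = [1.0] * 16                # flattened 4x4 all-white image
mask = [0.0] * 16
for i in (5, 6, 9, 10):          # 2x2 dropped center block
    mask[i] = 1.0

F = lambda ctx: [0.5] * len(ctx)  # hypothetical gray-fill "predictor"
print(rec_loss(x, mask, F))       # 4 pixels * (1 - 0.5)^2 = 1.0
```

Only the masked pixels contribute to the loss; the network is free on the visible context.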
Loss Function
• Adversarial loss
• To learn a generative model G (= F in our case), we also train a discriminative model D – log-likelihood for real samples
• D = 1 for real samples, and D = 0 for generated samples
• Total loss function: L = λ_rec·L_rec + λ_adv·L_adv
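The combined objective can be sketched as below. The 0.999/0.001 weighting follows the paper's reported setting, and the discriminator outputs are made-up numbers for illustration:

```python
import math

def adv_loss_d(d_real, d_fake):
    """Discriminator log-likelihood loss: push D(real)->1, D(fake)->0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def total_loss(l_rec, l_adv, lam_rec=0.999, lam_adv=0.001):
    """Joint objective: L = lam_rec * L_rec + lam_adv * L_adv."""
    return lam_rec * l_rec + lam_adv * l_adv

print(adv_loss_d(0.9, 0.1))  # small loss: D separates real from fake well
print(total_loss(1.0, 0.5))
```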
Region masks
• Central block
• Works well for inpainting
• Learns low-level boundary features which don't generalize well
• Random block
• Random rectangular blocks are removed
• Up to ¼ of the image
• The masks still have sharp boundaries which CNN features latch onto
• Random region
• Random masks from PASCAL VOC 2012, deformed
• Up to ¼ of the image
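A toy "random block" mask generator in the spirit of the slide: it drops random rectangles until roughly a quarter of the pixels are masked. The block-size limits and stopping rule are assumptions, not the paper's exact procedure:

```python
import random

def random_block_mask(h, w, max_frac=0.25, seed=0):
    """mask[i][j] = 1 marks a dropped pixel; rectangles accumulate until
    the dropped fraction reaches max_frac (it may slightly overshoot)."""
    rng = random.Random(seed)
    mask = [[0] * w for _ in range(h)]
    dropped = 0
    while dropped / (h * w) < max_frac:
        bh, bw = rng.randint(1, h // 4), rng.randint(1, w // 4)
        top, left = rng.randint(0, h - bh), rng.randint(0, w - bw)
        for i in range(top, top + bh):
            for j in range(left, left + bw):
                dropped += 1 - mask[i][j]  # count only newly dropped pixels
                mask[i][j] = 1
    return mask

mask = random_block_mask(32, 32)
frac = sum(map(sum, mask)) / (32 * 32)
print(frac)  # roughly a quarter of the pixels dropped
```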
Evaluation – Semantic Inpainting
• No noise in the adversarial setting
• The discriminator is not conditioned on the context
• No pooling layers
Evaluation – Semantic Inpainting

The Problem with PSNR
Evaluation – Feature learning
• The adversarial loss did not converge with AlexNet
• Nearest neighbors of the missing region using context features
• AlexNet was trained on a supervised classification problem
Evaluation
• Detection: Fast R-CNN framework
• Segmentation: FCN
Thank you!

RNN and LSTM
Character Level Model