social networks analysis seminar introductory lecture #2

Social Networks Analysis Seminar, Introductory Lecture #2
Danny Hendler and Yehonatan Cohen
Advanced Topics in On-line Social Networks Analysis

description

Social networks analysis seminar, introductory lecture #2. Danny Hendler and Yehonatan Cohen. Advanced Topics in On-line Social Networks Analysis. Seminar schedule. Introductory lecture #1. 5/3/14. 10/3/14. Papers list published, students send their 3 preferences. 12/3/14.

Transcript of social networks analysis seminar introductory lecture #2

Slide 1: Social Networks Analysis Seminar, Introductory Lecture #2
Danny Hendler and Yehonatan Cohen
Advanced Topics in On-line Social Networks Analysis

Slide 2: Seminar schedule
Milestones on the timeline: Introductory lecture #1; No seminar (Purim!); Semester ends; Introductory lecture #2; Papers list published, students send their 3 preferences; 11 weeks of student talks; Student talks start; All students' preferences must be received.
Dates on the timeline: 5/3/14, 12/3/14, 14/3/14, 19/3/14, 10/3/14, 26/3/14.

Slide 3: Talk outline
- Nodes centrality
  - Degree
  - Closeness
  - Betweenness
- Machine learning

Slides 4-5: Nodes centrality
Name the most central/significant node.
[Figure: graph with nodes labelled 1-13]

Slide 6: Nodes centrality
What makes a node central?
- Number of connections
- It is central if it disconnects the graph
- High number of paths passing through the node
- Proximity to all other nodes
- A central node is one whose neighbors are central
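Three of the criteria listed above (number of connections, proximity to all other nodes, number of paths passing through the node) correspond to degree, closeness and betweenness centrality. The following is a minimal sketch of how they might be computed. The 5-node graph is hypothetical (the edges of the 13-node graph on the slides are not given in the transcript), closeness is taken here as (n - 1) divided by the sum of shortest-path distances, and betweenness is unnormalized; the "Reach" values on the slides may be defined differently.

```python
from collections import deque

# Hypothetical undirected graph, NOT the graph shown on the slides.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree centrality: number of connections of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []                    # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back towards s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair (s, t) was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


adj = build_adjacency(EDGES)
deg, clo, btw = degree(adj), closeness(adj), betweenness(adj)
```

On this toy graph node "c" ranks first under all three measures; on larger graphs the three rankings can disagree, which is the point of comparing them.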

Slide 7: Nodes centrality: Applications
- Detection of the most popular actor in a network (spamming / advertising)
- Network vulnerability (health care / epidemics)
- Clustering similar structural positions (recommendation systems)

Slide 8: Nodes centrality: Degree

Slide 9: Nodes centrality: Degree
Name the most central/significant node.
[Figure: graph with nodes labelled 1-9]

Slide 10: Nodes centrality: Degree
[Figure: graph with nodes labelled 1-13]
Node  Degree
4     4
6     3
7     3
8     3
9     3
10    3
11    2
12    2

Slide 11: Nodes centrality: Closeness (Reach)

Slide 12: Nodes centrality: Closeness (Reach)
[Figure: graph with nodes labelled 1-13]
Node  Degree  Reach
4     4       5.84
6     3       5.93
7     3       6.12
8     3       5.75
9     3       5.25
10    3       5.18
11    2
12    2

Slide 13: Nodes centrality: Betweenness

Slide 14: Nodes centrality: Betweenness
[Figure: graph with nodes labelled 1-13]
Node  Reach  Betweenness
4     5.84   60
6     5.93   78
7     6.12   72
8     5.75   43
9     5.25   15
10    5.18   41
11
12

Slide 15: Talk outline
- Nodes centrality
- Machine learning
  - The learning process
  - Classification
  - Evaluation

Slide 16: Machine Learning
Herbert Alexander Simon: "Learning is any process by which a system improves performance from experience."
Machine learning is concerned with computer programs that automatically improve their performance through experience.

Herbert Simon: Turing Award 1975, Nobel Prize in Economics 1978

Slide 17: Machine Learning
Learning = improving with experience at some task:
- Improve over task T,
- with respect to performance measure P,
- based on experience E.

Slide 18: Machine Learning
Example: Spam filtering
T: Identify spam emails
P: % of spam emails that were filtered; % of ham (non-spam) emails that were incorrectly filtered out
E: A database of emails that were labelled by users, i.e. feedback on emails: "Move to Spam", "Move to Inbox"

Slide 19: Machine Learning
Applications?

Slide 20: Machine Learning: The learning process

Model Learning, Model Testing

Slide 21: Machine Learning: The learning process
Email Server
- Content of the email
- Number of recipients
- Size of message
- Number of attachments
- Number of "re's" in the subject line
Model Learning, Model Testing

Slide 22: Machine Learning: The learning process
From e-mails to feature vectors:
Textual-based content features:
- The email is tokenized
- Each token is a feature
Meta-features:
- Number of recipients
- Size of message

Slide 23: Machine Learning: The learning process
Instances with binary vocabulary features; Email Type is the target attribute.
Email Type  Free  ...  Lottery  Viagra
Ham         0          1        0
Ham         1          0        1
Spam        0          0        0
Spam        1          1        1
Ham         0          0        0
Ham         1          1        0
Spam        0          0        1

Slide 24: Machine Learning: The learning process
Instances with input attributes of numeric, nominal and ordinal type; Email Type is the target attribute.
Email Type  Customer Type  Country (IP)  Email Length (K)  Number of new Recipients
Ham         Gold           Germany       2                 0
Ham         Silver         Germany       4                 1
Spam        Bronze         Nigeria       2                 5
Spam        Bronze         Russia        4                 2
Ham         Bronze         Germany       4                 3
Ham         Silver         USA           1                 0
Spam        Silver         USA           2                 4
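The "from e-mails to feature vectors" step (tokenize the email, one binary feature per vocabulary token, plus meta-features) might be sketched as follows. This is an illustration under assumptions, not the lecture's implementation: the three-word vocabulary reuses the column names shown on the slide, while the tokenizer, the function names and the sample email are hypothetical.

```python
import re

# Column names taken from the slide's vocabulary table; a real vocabulary
# would be much larger.
VOCABULARY = ["free", "lottery", "viagra"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simple,
    assumed tokenization rule)."""
    return re.findall(r"[a-z]+", text.lower())


def to_feature_vector(text, n_recipients, size_kb, vocabulary=VOCABULARY):
    """Binary presence feature per vocabulary token, followed by two
    meta-features: number of recipients and message size."""
    tokens = set(tokenize(text))
    binary = [1 if word in tokens else 0 for word in vocabulary]
    return binary + [n_recipients, size_kb]


# Hypothetical email used only to exercise the function.
vec = to_feature_vector("Win the LOTTERY - free tickets!", n_recipients=12, size_kb=3)
```

Here `vec` holds one entry per vocabulary word followed by the two meta-features, which is the row layout of the tables above without the target attribute.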

Slide 25: Machine Learning: Model learning
[Diagram labels: Learner, Classifier]

Slide 26: Machine Learning: Model testing
[Diagram labels: Database, Training Set, Learner]

Slides 27-38: Machine Learning: Decision trees
Training Data attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class).
Model: Decision Tree, built up one node per slide. Splitting attributes: Refund, MarSt, TaxInc.
Refund
- Yes: NO
- No: MarSt
  - Married: NO
  - Single, Divorced: TaxInc
    - < 80K: NO
    - > 80K: YES

Slide 39: Machine Learning: Classification
Binary classification
(Instances, class labels): $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where each $y_i \in \{1, -1\}$
Classifier: provides a class prediction $\hat{Y}$ for an instance
Outcomes for a prediction:
                     True class 1         True class -1
Predicted class 1    True positive (TP)   False positive (FP)
Predicted class -1   False negative (FN)  True negative (TN)

Slide 40: Machine Learning: Classification
$P(\hat{Y} = Y)$: accuracy
$P(\hat{Y} = 1 \mid Y = 1)$: true positive rate
$P(\hat{Y} = 1 \mid Y = -1)$: false positive rate
$P(Y = 1 \mid \hat{Y} = 1)$: precision
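The four quantities above can be estimated from the TP, FP, FN and TN counts. The sketch below assumes labels in {1, -1} as on the slides and non-zero denominators; the function names and the sample label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for labels in {1, -1}."""
    tp = fp = fn = tn = 0
    for y, y_hat in zip(y_true, y_pred):
        if y_hat == 1 and y == 1:
            tp += 1
        elif y_hat == 1 and y == -1:
            fp += 1
        elif y_hat == -1 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Empirical estimates of the four probabilities; assumes every
    denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),    # P(Y_hat = Y)
        "true_positive_rate": tp / (tp + fn),           # P(Y_hat = 1 | Y = 1)
        "false_positive_rate": fp / (fp + tn),          # P(Y_hat = 1 | Y = -1)
        "precision": tp / (tp + fp),                    # P(Y = 1 | Y_hat = 1)
    }


# Hypothetical true labels and predictions, for illustration only.
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, 1, -1, 1, -1, -1, -1, 1]
counts = confusion_counts(y_true, y_pred)
metrics = rates(*counts)
```

In the spam-filtering example from earlier in the lecture, with spam as the positive class, the two performance measures P correspond to the true positive rate (spam that was filtered) and the false positive rate (ham that was incorrectly filtered out).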

Slide 41: Machine Learning: Classification
Consider a diagnostic test for a disease.
The test has 2 possible outcomes:
- positive = suggesting presence of disease
- negative
An individual can test either positive or negative for the disease.

Slide 42: Machine Learning: Classification
[Figure: test results for individuals with the disease and individuals without the disease; axis: Test Result]

Slide 43: Machine Learning: Classification
[Figure: the same plot split into "Call these patients negative" and "Call these patients positive"; axis: Test Result]

Slides 44-47: Machine Learning: Classification
[Figure: the same plot with the groups "without the disease" and "with the disease"; highlighted in turn, one per slide: True Positives, False Positives, True Negatives, False Negatives]

Slide 48: Machine Learning: Cross-Validation
What if we don't have enough data to set aside a test dataset?
Cross-validation: each data point is used both as train and test data.
Basic idea:
- Fit the model on 90% of the data; test on the other 10%.
- Now do this on a different 90/10 split.
- Cycle through all 10 cases.
- 10 folds is a common rule of thumb.

Slide 49: Machine Learning: Cross-Validation
Divide the data into 10 equal pieces P1, ..., P10.
Fit 10 models, each on 90% of the data.
Each data point is treated as an out-of-sample data point by exactly one of the models.
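The 10-fold procedure described above can be sketched in a few lines. This is an illustrative sketch only: the majority-label learner is a hypothetical stand-in for a real model, the data is made up, and the folds are contiguous (in practice the data would usually be shuffled first).

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs so that every index in range(n)
    appears in exactly one test fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(xs, ys, fit, predict, k=10):
    """Fit k models, each on the data outside one fold, and return the
    fraction of out-of-sample predictions that were correct."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test)
    return correct / len(xs)


# Hypothetical stand-in learner: always predict the majority training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


# Made-up data: 20 instances, 14 labelled 1 and 6 labelled -1.
xs = list(range(20))
ys = [1] * 14 + [-1] * 6
folds = list(k_fold_indices(len(xs), k=10))
cv_accuracy = cross_validate(xs, ys, fit_majority, predict_majority, k=10)
```

Each of the 10 models sees 18 of the 20 instances, and every instance is scored by exactly one model, the one that did not train on it.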

Training Data (from the decision-tree slides)
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
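As a check, the decision tree from the lecture (Refund, then MarSt, then TaxInc with an 80K threshold) can be written as nested conditions and applied to the ten training records. This is a hand-coded sketch of the finished tree, not a tree-learning algorithm; the function name is hypothetical, and sending an income of exactly 80K to the "No" branch is an assumption, since the slides only label the branches "< 80K" and "> 80K".

```python
# Training records transcribed from the lecture's table:
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def classify(refund, marital_status, taxable_income_k):
    """Predict the Cheat class by walking the tree from the root."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K falls on the "No" side.
    return "Yes" if taxable_income_k > 80 else "No"


predictions = [classify(r, m, t) for _, r, m, t, _ in TRAINING_DATA]
training_accuracy = sum(
    p == rec[4] for p, rec in zip(predictions, TRAINING_DATA)
) / len(TRAINING_DATA)
```

The tree reproduces the Cheat label of every training record. That is accuracy on the training set only, so it says nothing yet about out-of-sample performance, which is what the test set and cross-validation are for.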
