Heterogeneous - Virginia Techpeople.cs.vt.edu/.../prashant-hetero-networks.pdf · Topic:...

Heterogeneous (Information) Networks CS 6604: Data Mining Large Networks and Time-Series Paper Presentation Prashant Chandrasekar 11/01/17

Heterogeneous (Information) Networks
CS 6604: Data Mining Large Networks and Time-Series
Paper Presentation

Prashant Chandrasekar
11/01/17

Overview
Topic: Heterogeneous Information Networks

Outline

- Paper introducing the field of HIN mining
- Two really cool applications of HIN

Objective/Takeaway: Pique your interest in the field and, more importantly, show how HINs can be part of your personal research, class project, or hobbies.

Heterogeneous Information Network Analysis

Paper: A Survey on Heterogeneous Information Network Analysis
Authors: Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu
IEEE Transactions on Knowledge and Data Engineering. 2017 Jan 1;29(1):17-37.

Background
Real systems have a large number of interactions between multi-typed components.

Information networks are a ubiquitous way to model and represent interacting components.

Mining such networks is related to work in link analysis, network analysis, network science and graph mining.

Most contemporary information network analyses are restricted to single-typed objects (nodes) and/or links (edges).

HIN: Allows fusing more information and gives a richer semantic representation.

Concepts and Definitions
Def 1: Information Network

- G = (V, E)
- Object type mapping function φ: V → A. An object belongs to only one type in the object type set A.
- Link type mapping function ψ: E → R. A link belongs to only one relation type in the relation type set R.
- If two links belong to the same relation type, they share the same starting and ending object types.

Concepts and Definitions
Def 2: Heterogeneous/Homogeneous Information Network

- G = (V, E)
- Object type mapping function φ: V → A. An object belongs to only one type in the object type set A.
- Link type mapping function ψ: E → R. A link belongs to only one relation type in the relation type set R.
- If two links belong to the same relation type, they share the same starting and ending object types.
- Heterogeneous if |A| > 1 or |R| > 1; otherwise homogeneous.
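
The definition can be made concrete with a small sketch. The class below is a hypothetical illustration, not code from the survey: the class name, method names, and the toy objects ("alice", "paper1", "kdd") are all assumptions. It stores the object type map and a relation label per link, then applies the |A| > 1 or |R| > 1 test.

```python
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """G = (V, E) with an object type map and a relation type on every link."""
    object_type: dict = field(default_factory=dict)  # object -> object type
    links: list = field(default_factory=list)        # (source, target, relation type)

    def add_object(self, obj, obj_type):
        self.object_type[obj] = obj_type

    def add_link(self, src, dst, relation):
        self.links.append((src, dst, relation))

    def object_types(self):
        # The set A of object types actually used.
        return set(self.object_type.values())

    def relation_types(self):
        # The set R of relation types actually used.
        return {relation for _, _, relation in self.links}

    def is_heterogeneous(self):
        return len(self.object_types()) > 1 or len(self.relation_types()) > 1

    def relations_are_well_typed(self):
        # Links of one relation type must share start and end object types.
        signature = {}
        for src, dst, relation in self.links:
            ends = (self.object_type[src], self.object_type[dst])
            if signature.setdefault(relation, ends) != ends:
                return False
        return True


# Hypothetical bibliographic fragment: three object types, two relation types.
bib = InformationNetwork()
for obj, obj_type in [("alice", "Author"), ("bob", "Author"),
                      ("paper1", "Paper"), ("kdd", "Venue")]:
    bib.add_object(obj, obj_type)
bib.add_link("alice", "paper1", "writes")
bib.add_link("bob", "paper1", "writes")
bib.add_link("paper1", "kdd", "published_in")

# Hypothetical friendship graph: one object type, one relation type.
friends = InformationNetwork()
friends.add_object("u1", "User")
friends.add_object("u2", "User")
friends.add_link("u1", "u2", "friend")

print(bib.is_heterogeneous(), friends.is_heterogeneous())
```

The `relations_are_well_typed` check mirrors the last condition of Def 1: reusing "writes" for a Paper-to-Venue link would make it return False.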

HIN Example: Bibliographic dataset

HIN: Meta-Paths
Key difference from homogeneous networks: Two objects can be connected via different paths. Each path can have its own meaning.

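Meta-Path Definition
Given network schema S = (A, R) (remember from previous slide)

Meta-path P is of the form $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which defines the

composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between object types $A_1$ and $A_{l+1}$.

If there are no multiple relations between the same pair of object types, the meta-path can be written using object types alone. For example, in bibliographic data the length-2 meta-path Author → Paper → Author is written “APA” for short.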
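
To see what a meta-path selects in practice, the sketch below enumerates the concrete paths (path instances) whose object types follow a given type sequence. It is an illustration under stated assumptions: the toy authors and papers are made up, the function name is hypothetical, and the inverse relation (Paper back to Author) is modeled by storing each link in both directions.

```python
def meta_path_instances(object_type, neighbors, type_sequence):
    """Enumerate concrete paths whose object types follow type_sequence."""
    paths = [[v] for v, t in object_type.items() if t == type_sequence[0]]
    for wanted in type_sequence[1:]:
        paths = [path + [n]
                 for path in paths
                 for n in neighbors.get(path[-1], ())
                 if object_type[n] == wanted]
    return paths


# Hypothetical toy data: three authors and three papers.
object_type = {"a0": "A", "a1": "A", "a2": "A", "p0": "P", "p1": "P", "p2": "P"}
edges = [("a0", "p0"), ("a0", "p1"), ("a1", "p1"), ("a1", "p2"), ("a2", "p2")]

# Store every link in both directions so the inverse relation can be followed.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

apa = meta_path_instances(object_type, neighbors, ["A", "P", "A"])
co_author_paths = [p for p in apa if p[0] == "a0" and p[-1] == "a1"]
print(co_author_paths)
```

Here "a0" and "a1" are linked by one APA instance (through "p1"), while "a0" and "a2" have none, which is exactly the co-authorship semantics the "APA" shorthand carries.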

Meta-Path: Bibliographic Dataset

Question/Challenge: Does a task's output depend on the meta-path?

For example: Finding similar authors. Would the result be different if we chose meta-path (a) rather than meta-path (b)?

Related network types
Homogeneous Network: |A| = 1, |R| = 1. Special case of HIN. Can be derived from an HIN through network projection. Analysis techniques for homogeneous networks are not directly applicable to HINs.

Multi-Relational Network: |A| = 1, |R| > 1. Special case of HIN.

Multi-Dimensional/Mode Network: Same as Multi-Relational Network

Composite Network: Users in the network have various relationships, behave differently in each subnetwork, and share latent variables. Same as Multi-Relational Network

Complex Network: Non-trivial topological features. Fields of study include math, physics, biology, CS, etc. Real-world networks (like social and biological networks) are complex networks. Real-world HINs might be complex networks.

Common HIN Network Schemas

Multi-Relational network with single-typed object: Facebook, Xiaonei, etc.

Bipartite: User-Item, Document-Word (extended to k-partite graphs)

Star-Schema: Bibliographic, movie data, US patent data. (Typically derived from DB tables. Most Popular)

Multi-Hub Network: Most commonly for Bioinformatics data

Complex HINs

Multiple HINs (studying connections across two social networks).

Schema-rich networks (based on ontologies written through semantic web standards), such as Knowledge Graph.

Summary of Research Work on HIN

Summary of Research Work on HIN mining

- 100 papers analyzed
- Seven main data mining tasks:
  - Similarity Measure
  - Clustering
  - Classification
  - Link Prediction
  - Recommendation
  - Others

HIN Data Mining: Similarity Measure
- Two approaches:

● Link-based (Personalized PageRank [54], SimRank [55], etc.)

● Attribute-based (feature value comparison using Jaccard coefficient, cosine similarity, etc.)

- Similarity on HIN: Considers the meta-path along with structural similarity.

● Two different meta-paths have different semantic meanings.

- Example: Find authors most similar to “Christos Faloutsos”.

● APA says his students are most similar
● APVPA (correctly) shows the most similar authors in the same field.

Other works:
- PathSim uses symmetric meta-paths [14]
- RelSim uses meta-paths to measure similarity in relations [59]
- HeteSim measures multi-typed object relevance using arbitrary meta-paths [13][62]
- Social influence using object similarity + influence in HIN [67]
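
The APA versus APVPA contrast can be reproduced on a toy example. The sketch below uses the PathSim score as defined in the PathSim paper, s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), where M counts path instances of a symmetric meta-path; this formula comes from that paper, not from the slide. The three authors, six papers, and two venues are hypothetical data chosen so that the two meta-paths disagree.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def pathsim(M, x, y):
    """PathSim score from a matrix M of path-instance counts."""
    return 2 * M[x][y] / (M[x][x] + M[y][y])


def most_similar(M, x, names):
    others = [y for y in range(len(names)) if y != x]
    return names[max(others, key=lambda y: pathsim(M, x, y))]


# Hypothetical toy data.
names = ["advisor", "student", "peer"]
W_AP = [                      # author x paper
    [1, 1, 1, 0, 0, 0],       # advisor wrote p0, p1, p2
    [1, 0, 0, 0, 0, 0],       # student co-wrote p0 with the advisor
    [0, 0, 0, 1, 1, 1],       # peer wrote p3, p4, p5 (no co-authorship)
]
W_PV = [                      # paper x venue
    [1, 0], [1, 0], [0, 1],   # p0, p1 at v0; p2 at v1
    [1, 0], [1, 0], [0, 1],   # p3, p4 at v0; p5 at v1
]

APA = matmul(W_AP, transpose(W_AP))     # counts of Author-Paper-Author paths
W_AV = matmul(W_AP, W_PV)               # author x venue publication counts
APVPA = matmul(W_AV, transpose(W_AV))   # counts of A-P-V-P-A paths

print(most_similar(APA, 0, names), most_similar(APVPA, 0, names))
```

Under APA the advisor's closest match is the co-authoring student; under APVPA it is the peer who publishes in the same venues at a similar volume, which matches the behavior described in the Faloutsos example.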

HIN Data Mining: Clustering
- Traditionally based on object features and done on homogeneous networks. Heterogeneity in object types makes the task harder.
- Example: Cluster a bibliographic dataset.

● Results in “sub-network” clusters, each pertaining to a particular research/CS domain.

● Clustering this way preserves information.

Rich information in an HIN helps clustering by integrating additional information and/or improving learning tasks.
- Attribute information integration using attribute incompleteness, vertex attributes, random fields, etc.
- Text information integration. Ex: topic models of contents, clusters based on topics.
- Integration with other mining tasks, such as ranking. Ex: ranking-based clustering, mutually enhancing ranking and clustering
- Other information: social-influence-based clustering using connections and social activities

HIN Data Mining: Classification

- Traditionally done on objects satisfying the IID assumption (which may not hold in an HIN)
- Classification in HIN:

● Can classify multiple types of objects simultaneously

● Meta-paths are widely used in HIN classification

- Example:
● 4 types of objects interlinked
● Classification = process of knowledge propagation.
● Deriving correlations among objects.

Other approaches
- Represent meta-paths in a latent space to label multiple nodes
- Modeling mutual influence for multi-label classification
- Mine multiple relationships for multi-label classification
- Meta-paths as feature generators (GNetMine [21], HetPathMine [99])
- Meta-path-based dependencies for collective classification

HIN Data Mining: Link Prediction
- Challenge with HIN: Links to be predicted are of different types. Need to predict multiple types of links collectively.
- Meta-path-based approaches

● Two-step process: 1) Extract meta-path-based features; 2) Train a regression/classification model to predict links. [23][24][110][111][112]

● PathPredict solves co-authorship prediction using meta-paths and logistic regression. [23]
● Path-based features to predict a company's organizational chart.

- Probabilistic-model-based approaches
● Predict links by modeling influence propagation between heterogeneous relationships.

- Other work includes link prediction across multiple HINs and dynamic link prediction, such as predicting the evolution of community membership
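
The two-step process can be sketched end to end on made-up numbers. Everything below is an assumption for illustration: each candidate author pair gets two hypothetical meta-path features (an APA path count and an APVPA path count), the labels are invented, and the model is a plain logistic regression trained with batch gradient descent. It is not the PathPredict implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Estimated probability that the candidate link will form."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Step 1 (hypothetical output): [APA count, APVPA count] per candidate pair.
features = [[2, 3], [1, 4], [3, 2], [0, 1], [0, 0], [1, 0]]
# Hypothetical ground truth: 1 if the pair later co-authored, else 0.
labels = [1, 1, 1, 0, 0, 0]

# Step 2: fit the classifier on the meta-path features.
w, b = train_logistic(features, labels)
for x, y in zip(features, labels):
    print(x, y, round(predict(w, b, x), 2))
```

In a real pipeline the feature vectors would come from path counts or similar measures computed on the HIN, one column per meta-path, so the learned weights indicate which meta-paths matter for the link type being predicted.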

HIN Data Mining: Recommendation

- The richer information and semantics of an HIN make for better recommendations.
- Constructing an HIN for recommendation helps fuse all the information that could be used for the task.
- Meta-paths are well suited to exploring relations between objects.

● HeteRecom finds similarities between movies based on semantic info on meta path. [43]

● SemRec is a personalized recommender system that builds a weighted HIN by using movie ratings on links. [48]

- Fusing heterogeneous information to help with recommendation

● Context-dependent matrix factorization models

HIN Data Mining: Others
Information Fusion:

● Process of merging information from heterogeneous sources with different conceptual, contextual and typographical representations.

● As seen in tasks such as data schema integration in data warehouses, protein-protein interaction networks, and ontology mapping in the semantic web.

● Related work includes social network matching and various solutions for the alignment problem.
● Intuition: Fusing HINs improves the previously covered tasks. (More contextual data)

Application System
● Create systems with designs based on HIN

○ System for exploring and analyzing a topical hierarchy constructed from an HIN
○ Online social media spam detection system for social network security
○ Malware detection (details in following paper)

Shortcomings / Future Mining
The field of HIN and HIN mining is relatively young. Future considerations include:

● Integrating attribute values to build weighted HINs: Real networks may contain attribute values on links, and these attribute values may contain important information.

● Dynamic HIN: To represent and model time-series data.
● Network construction for complex data: Semantically rich RDF-based graphs (management of objects and relations with so many types and meta-paths)
● HIN with more descriptive meta-paths
● Methods to optimize/rank the selection of meta-paths for data mining tasks

Heterogeneous Information Network to detect Malware in Android Apps

Paper: HinDroid: An Intelligent Android Malware Detection System Based on Structured Heterogeneous Information Network
Authors: Hou, S., Ye, Y., Song, Y., & Abdulhayoglu, M.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2017

Background
Example App: Locker.apk

Malicious: Once installed, the victim is locked out of the phone and is asked to pay a ransom.

Goal: Detect malicious apps published for Android.

Key methodology: Feature extraction and learning via HIN and meta-paths

Preliminary concepts
Android App

● Compiled and packaged as a single .apk file.
● Includes app code (.dex file), resources, assets and manifest file.
● The Dex (Dalvik executable) file format holds the compiled code. (Unreadable)
● Smali is a .dex assembler/disassembler
  ○ Provides the code as “Smali code”

Smali Code

Feature Extraction: Common APIs across Apps
For each app, note down the APIs called in its Smali code

- Parse smali code for API call extraction

Represent occurrence as a matrix, A:

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf
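
A minimal sketch of this extraction step, assuming API calls appear as `invoke-*` instructions in the Smali text. The regular expression, the two toy apps, and the choice to identify an API by its class and method name (dropping the argument and return types) are assumptions made here, not details confirmed by the slides. Entry (i, j) of the resulting matrix A is 1 when app i calls API j.

```python
import re

# Matches lines such as:
#   invoke-virtual {v0, v1}, Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;
# and captures "Ljava/lang/Runtime;->exec".
INVOKE_RE = re.compile(
    r"^\s*invoke-[\w/-]+\s+\{[^}]*\},\s*(L[^;]+;->[^\s(]+)", re.MULTILINE)


def extract_apis(smali_text):
    """Return the set of APIs called anywhere in one app's Smali code."""
    return set(INVOKE_RE.findall(smali_text))


def app_api_matrix(apps):
    """Build matrix A: one row per app, one column per API in the vocabulary."""
    called = {name: extract_apis(text) for name, text in apps.items()}
    vocab = sorted(set().union(*called.values()))
    names = sorted(apps)
    A = [[int(api in called[name]) for api in vocab] for name in names]
    return names, vocab, A


# Hypothetical Smali fragments for two toy apps.
apps = {
    "app1": """
.method public lock()V
    invoke-static {}, Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
    invoke-virtual {v0, v1}, Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;
.end method
""",
    "app2": """
.method public id()V
    invoke-static {}, Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
.end method
""",
}

names, vocab, A = app_api_matrix(apps)
for name, row in zip(names, A):
    print(name, row)
```

Sorting the vocabulary keeps the column order stable across runs, which matters once A is multiplied with the other relation matrices.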

FE: Relationship with APIs -> Code Block
Find APIs that occur in the same code block.

- Smali code block markup: “.method” ... “.end method”

Represent occurrence as a matrix, B:

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf
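
The code-block relation can be sketched the same way: split the Smali text into method bodies, collect the APIs invoked in each body, and mark two APIs as related when some body contains both. The regular expressions and the toy Smali text are assumptions, and setting the diagonal of B to 1 (an API trivially co-occurs with itself) is a choice made for this sketch rather than something the slide states.

```python
import re

# One code block runs from a ".method" line to the matching ".end method" line.
METHOD_RE = re.compile(r"^\s*\.method\b.*?^\s*\.end method",
                       re.MULTILINE | re.DOTALL)
INVOKE_RE = re.compile(
    r"^\s*invoke-[\w/-]+\s+\{[^}]*\},\s*(L[^;]+;->[^\s(]+)", re.MULTILINE)


def code_block_matrix(smali_text):
    """Build matrix B: entry (i, j) is 1 if APIs i and j share a code block."""
    blocks = [set(INVOKE_RE.findall(body)) for body in METHOD_RE.findall(smali_text)]
    apis = sorted(set().union(*blocks))
    B = [[int(a == c or any(a in blk and c in blk for blk in blocks))
          for c in apis]
         for a in apis]
    return apis, B


# Hypothetical Smali text with two code blocks.
SMALI = """
.method public lock()V
    invoke-static {}, Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
    invoke-virtual {v0, v1}, Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;
.end method

.method public id()V
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
.end method
"""

apis, B = code_block_matrix(SMALI)
for api, row in zip(apis, B):
    print(row, api)
```

Only `exec` and `getRuntime` end up related here, because `getDeviceId` lives in a different method body.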

FE: Relationship with APIs -> Package
Intuition: API calls belonging to the same package show similar intent.

- API-[1-4] is part of Package 1
- API-[5-8] is part of Package 2

Represent occurrence as a matrix, P:

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

FE: Relationship with APIs -> InvokeMethods
Intuition: API calls that use the same invoke method are related, like “words having the same part of speech”

Represent occurrence as a matrix, I:

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf
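
The package and invoke-method relations can both be expressed as "two APIs share a label". The sketch below is a hypothetical illustration: the package is taken to be the class descriptor minus its class name, `/range` instruction variants are folded into their base invoke kind, and the toy Smali text is made up. It builds P (same package) and I (same invoke method) over the same API vocabulary.

```python
import re

# Captures the invoke kind (e.g. "invoke-virtual") and the called API.
INVOKE_RE = re.compile(
    r"^\s*(invoke-[\w-]+)(?:/range)?\s+\{[^}]*\},\s*(L[^;]+;->[^\s(]+)",
    re.MULTILINE)


def package_of(api):
    """'Ljava/lang/Runtime;->exec' -> 'java/lang' (assumed package rule)."""
    class_name = api.split(";->")[0][1:]
    return class_name.rpartition("/")[0]


def relation_matrix(apis, labels):
    """Entry (i, j) is 1 when APIs i and j share at least one label."""
    return [[int(bool(labels[a] & labels[c])) for c in apis] for a in apis]


# Hypothetical Smali text.
SMALI = """
.method public lock()V
    invoke-static {}, Ljava/lang/Runtime;->getRuntime()Ljava/lang/Runtime;
    invoke-virtual {v0, v1}, Ljava/lang/Runtime;->exec(Ljava/lang/String;)Ljava/lang/Process;
    invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;
.end method
"""

calls = INVOKE_RE.findall(SMALI)                 # list of (kind, api) pairs
apis = sorted({api for _, api in calls})
packages = {api: {package_of(api)} for api in apis}
kinds = {api: {kind for kind, called in calls if called == api} for api in apis}

P = relation_matrix(apis, packages)   # same package
I = relation_matrix(apis, kinds)      # same invoke method
print(P)
print(I)
```

The two matrices disagree on purpose: `exec` and `getRuntime` share the `java/lang` package, while `exec` and `getDeviceId` share the `invoke-virtual` kind.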

HIN Construction
Rationale: A = {App, API}; R = {contains, codeblock, package, invokeMethod}

HIN

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Meta-paths: Revision

There can be multiple API calls satisfying this particular meta-path constraint

Computing Similarities: Commuting Matrices

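Commuting Matrices Example
Let $G_{App,API}$ be the matrix between apps and API calls (the matrix A from feature extraction).

For the meta-path $App \xrightarrow{contains} API \xrightarrow{contains^{-1}} App$, the commuting matrix of apps is

$M = G_{App,API} \, G_{App,API}^T$, which is equal to $AA^T$.

Therefore, given this matrix $AA^T$, the similarity between app $a_i$ and app $a_j$ is $a_i^T a_j$, where $a_i$ is the i-th row of A written as a column vector.

This is the dot product of two feature vectors.

Each feature vector for this meta-path is simply the “bag of APIs” for an app.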
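
A small numeric sketch of the commuting-matrix computation, using plain Python lists. The matrices A and B below are hypothetical toy values (two apps, three APIs), and the second product, A·B·Aᵀ, is included as an assumed example of a longer meta-path that passes through the code-block relation.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Hypothetical app x API matrix: entry (i, j) = 1 if app i calls API j.
A = [
    [0, 1, 1],
    [1, 0, 1],
]
# Hypothetical API x API code-block matrix (APIs 1 and 2 share a code block).
B = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

# Meta-path App -> API -> App: commuting matrix A A^T.
M_apps = matmul(A, transpose(A))

# Entry (0, 1) equals the dot product of the two apps' bag-of-APIs vectors.
dot_01 = sum(x * y for x, y in zip(A[0], A[1]))

# Meta-path App -> API -(same code block)-> API -> App: A B A^T.
M_block = matmul(matmul(A, B), transpose(A))

print(M_apps, dot_01, M_block)
```

The diagonal of A·Aᵀ counts how many APIs each app calls, and the off-diagonal entries count shared APIs, so the matrix can be read directly as a similarity table between apps.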

Possible Meta-Paths

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Experiment Setup
Data: Two datasets from Comodo Cloud Security Center

● Android apps from Jan 30 2017 to Feb 5 2017:
  ○ 1834 training samples (920 benign, 914 malicious)
  ○ 500 test samples (198 benign, 302 malicious)

● One month of data:
  ○ 30,000 Android apps
  ○ 50-50 split between benign and malicious

Experiments: 1) Evaluate performance of the proposed method; 2) Compare the system against other classification models; 3) Compare against commercial mobile security products; and 4) Evaluate on a large, real sample from industry

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Source:http://community.wvu.edu/~yaye/files/HinDroid_KDD2017_Slides_Ye.pdf

Impact
HinDroid has already been incorporated into the scanning tool of Comodo’s Mobile Security Product. HinDroid has been used to predict the daily sample collection from Comodo Cloud Security Center.

HinDroid has been deployed and tested based on the real daily sample collection for around half a year (about 2,700,000 Android apps in total have either been trained or tested).

In practice, an anti-malware analyst has to spend at least 8 hours to manually analyze 40 Android apps for malware detection. Using the developed system HinDroid, the analysis of about 15,000 file samples can be performed within minutes with multiple servers. Cost Effective!

Bottom Line
As stated in the paper:

“HinDroid we use more expressive representation for the data, and build the connection between the higher-level semantics of the data and the final results.”

and..

“...more labels is not as important as the need of more expressive representations of data.”

Heterogeneous Networks to Study Critical Infrastructure Failure Cascades

Paper: HotSpots: Failure Cascades on Heterogeneous Critical Infrastructure Networks
Authors: Chen, L., Xu, X., Lee, S., Duan, S., Tarditi, A. G., Chinthavali, S., & Prakash, B. A.
ACM International Conference on Information and Knowledge Management (CIKM) 2017

HN to study critical infrastructure failure cascades
Domain: Critical infrastructure (CI) systems that provide power/electricity, water, communication, etc., each of which depends on the others in some way.

Overall objective: Study how the failure of one such system cascades to the others (failures are more likely during crises such as blackouts, hurricanes, etc.)

Specific task: Find the k CI components whose failure would maximize failures across ALL CIs (or CI networks).

Problems with current efforts: 1) They work on one CI; 2) they don’t consider the dynamics of the system; 3) they use relatively simple models.

Study Design: Network Creation
Representing CI interdependencies as a heterogeneous network.

*CIs/components extracted from the HSIP and EIA datasets for power systems and natural gas systems. [1][3]

SEND GENERATED POWER VIA TRANSMISSION LINES

MOVE POWER TO SUBSTATIONS

DISTRIBUTE POWER TO LOCAL NATURAL GAS COMPRESSORS

SEND COMPRESSED NATURAL GAS VIA PIPELINES

DELIVER NATURAL GAS AS FUEL TO GENERATE POWER

Cascade Model: F-Cas
RULES OF CASCADE PER CI

- SUBSTATION: FAIL WHEN NO PATH TO POWER PLANT
- GAS COMPRESSORS: FAIL WHEN CONNECTED SUBSTATION FAILS
- POWER PLANTS: FAIL WHEN CONNECTED GAS COMPRESSORS FAIL
- PIPELINE: Connection between power plants and gas compressors. DON’T DEPEND ON ANYBODY. NO CASCADING EFFECTS IN FAILURES.
- TRANSMISSION:
  - NAIVE: BUILD CO-PARENT NETWORK, THEN IC (INDEPENDENT CASCADE) MODEL
  - REAL: PROB. OF FAIL BASED ON FAIL OF PARENT

VULNERABLE TO LOCAL FAILURES BEING AMPLIFIED SYSTEM-WIDE (FURTHER GREATER FAILURES)

PROBLEM: HARD TO MODEL BEHAVIOR
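
The deterministic rules can be simulated with a short fixed-point loop. The sketch below is a simplified illustration, not the F-Cas model itself: it omits the probabilistic transmission rule entirely, assumes a power plant fails only when all of its connected gas compressors have failed (the slide does not say "any" or "all"), and runs on a made-up five-component network with hypothetical names.

```python
from collections import deque


def simulate_cascade(power_links, plants, substations,
                     compressor_substation, plant_compressors, initial_failures):
    """Apply the substation, compressor, and plant rules until nothing changes."""
    adjacency = {}
    for u, v in power_links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        # Breadth-first search from working plants over working components.
        queue = deque(p for p in plants if p not in failed)
        powered = set(queue)
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in failed and nxt not in powered:
                    powered.add(nxt)
                    queue.append(nxt)
        # Substation rule: fail when there is no path to a power plant.
        for s in substations:
            if s not in failed and s not in powered:
                failed.add(s)
                changed = True
        # Gas compressor rule: fail when the connected substation fails.
        for c, s in compressor_substation.items():
            if c not in failed and s in failed:
                failed.add(c)
                changed = True
        # Power plant rule (assumption: ALL connected compressors have failed).
        for p, comps in plant_compressors.items():
            if p not in failed and comps and all(c in failed for c in comps):
                failed.add(p)
                changed = True
    return failed


# Hypothetical network: P1 is gas-fired and fueled by compressor C1,
# which draws its power from substation S2.
failed = simulate_cascade(
    power_links=[("P1", "t1"), ("t1", "S1"), ("P2", "t2"), ("t2", "S2")],
    plants={"P1", "P2"},
    substations={"S1", "S2"},
    compressor_substation={"C1": "S2"},
    plant_compressors={"P1": ["C1"]},
    initial_failures={"t2"},
)
print(sorted(failed))
```

Failing the single transmission node `t2` takes down S2, then C1, then the gas-fired plant P1, and finally S1, which shows how a loop between the power and gas systems amplifies a local failure.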

Problem Definition
Problem 1 (Max-Sub): Given a heterogeneous network G, the F-CAS model, and a value k:

Find the best set S* of k transmission nodes to fail, such that the expected number of substations that eventually fail is maximized.

$S^* = \arg\max_{|S| = k} E[\#s \mid S]$

#s = number of substations that would eventually fail, given initial failure set S.

Problem Definition
Problem 2 (Max-SubBus): Given a heterogeneous network G, the F-CAS model, and a value k:

Find the best set S* of k transmission nodes to fail, such that the expected number of substations and transmission nodes/lines that eventually fail is maximized.

$S^* = \arg\max_{|S| = k} E[\#s + \#t \mid S]$

#t = number of transmission nodes/lines that would eventually fail, given initial failure set S.

For both Max-Sub & Max-SubBus:

*Note: Max-Sub & Max-SubBus are NP-hard

Approach/Methodology
Two scenarios: 1) No loop in the failure cascade; 2) Loop in the failure cascade

An empirical estimate of Pr(s_i | S) is hard to optimize. Solution: Dominator Tree

Approach/Methodology: Scenario 1
Scenario 1: No loop in the failure cascade (power plant -> substation failure)

The estimate of Pr(s_i | S) is based on the probability that any transmission node on the dominator-tree path to s_i fails. If any such node t_i fails, s_i is bound to fail.

Given that,

Objective function for Max-Sub:

Objective function for Max-SubBus:

Main Contribution: With the dominator-tree-based estimate, the problem can be solved near-optimally using a greedy algorithm. (Otherwise, this is not possible.)
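
The dominator idea can be illustrated with a deliberately simplified, deterministic variant; it is not the HotSpots algorithm. Assumptions made here: seeds fail with certainty, failures do not spread between transmission nodes, a virtual source "SRC" feeds every power plant, and the toy network and all node names are hypothetical. A node d dominates a substation s when every path from the source to s passes through d, so failing d is guaranteed to cut s off. Counting guaranteed-failed substations is then a coverage function, which a greedy selection handles well.

```python
def dominators(succ, source):
    """Iterative dominator sets: dom[n] = {n} plus the intersection over predecessors."""
    nodes = set(succ) | {v for targets in succ.values() for v in targets}
    preds = {n: set() for n in nodes}
    for u, targets in succ.items():
        for v in targets:
            preds[v].add(u)
    dom = {n: set(nodes) for n in nodes}
    dom[source] = {source}
    changed = True
    while changed:
        changed = False
        for n in nodes - {source}:
            if not preds[n]:
                continue  # not reachable from the source
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def greedy_seeds(dom, transmission, substations, k):
    """Pick k transmission nodes, each time maximizing newly cut-off substations."""
    covers = {t: {s for s in substations if t in dom[s]} for t in transmission}
    chosen, covered = [], set()
    for _ in range(min(k, len(covers))):
        candidates = sorted(set(covers) - set(chosen))
        best = max(candidates, key=lambda t: len(covers[t] - covered))
        chosen.append(best)
        covered |= covers[best]
    return chosen, covered


# Hypothetical power-flow graph (directed, from the virtual source outward).
succ = {
    "SRC": ["P1", "P2"],
    "P1": ["t1"],
    "t1": ["s1", "t2"],
    "t2": ["s2", "s3"],
    "P2": ["t3", "t4"],
    "t3": ["s4"],
    "t4": ["s3"],          # s3 has two independent feeds
}
dom = dominators(succ, "SRC")
chosen, covered = greedy_seeds(dom, ["t1", "t2", "t3", "t4"],
                               ["s1", "s2", "s3", "s4"], k=2)
print(chosen, sorted(covered))
```

Substation s3 has no transmission dominator because it is fed through both t2 and t4, so no single seed is guaranteed to cut it off. This also shows the limit of the simplification: a set of failures can disconnect a substation even when no single member dominates it, so the count here is a lower bound on the deterministic damage.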

Experiment 1: Effectiveness
Dataset: HSIP Gold data and EIA data for states: TN, PA, FL, OH

Experiment 2: Scalability
How the method scales as the number of seeds k and the size of the network |V| change

Case study 1: Estimate/Predict damage of hurricane
Setup: Overlay the path of Hurricane Sandy on the heterogeneous network G.

Estimate: Immediate impact/damage and predicted damage.

The study's results, which predict cascading-loop trends, complement existing hurricane assessment tools by including the cascade effect.

Case study 2: 2003 NE Blackout
Background: An initial study showed that within a couple of hours of the first transmission line failure, many more lines failed, causing a cascade of failures throughout southeastern Canada and 8 NE states.

Case Study: Heterogeneous network G overlaid on Ohio to identify the top 5 vulnerable/critical nodes.

Case study 2: Results
Insights
- On the OH map, the top-right node identified was indeed truly critical
- Nodes identified should either be on large generation plants or on transmission lines
- As seen in the figure, the identified nodes correspond to areas with several converging lines or high-voltage lines.

Study Extendability via User Interface
- UI provided to run simulations for finding critical nodes in various other maps.
- This involves:
  - Generating the heterogeneous networks.
  - Running cascade simulations
  - Getting real-time failure statistics via visualizations.

Impact and Bottom Line
- The HotSpots algorithm, the HIN generation toolkit and the F-CAS model are a first attempt to analyze up to 5 different critical infrastructures.
- Adding additional components is easy.
- The methods capture path-based and neighbor-based failure conditions.
- Path-based failure cascading is not restricted to transmission networks and is applicable to a wide range of CI systems.

Reflections: Lessons Learnt
- What heterogeneous networks are: 1) Network structure and 2) Rich semantic meaning of structural types of objects and links

- Types of datasets that have been represented via HINs

- Various graph mining algorithms that have been designed for HINs

- More specifically, how heterogeneous representation has helped:

● Predict/Classify malicious Android apps, and
● Identify a subset of critical infrastructures, the failure of which would have the biggest catastrophic impact on availability of vital resources