Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics
Achille Fokoue, Mudhakar Srivatsa (IBM-US), Rob Young (dstl-UK)
ITA Bootcamp, July 12, 2010
Sources of Uncertainty: (Accuracy, Stochasticity and Beyond)
Decision Making under Uncertainty
• Coalition warfare
  – Ephemeral groups (special forces, local militia, Médecins Sans Frontières, etc.) with heterogeneous trust levels respond to emerging threats
• Secure Information Flows
  – Can I share this information with an (un)trusted entity?
  – Can I trust this piece of information?
Information Flow in Yahoo!
Limitations of traditional approaches
• Coarse-grained and static access control information
  – Rich security metadata [QoISN'08]
  – Semantic knowledgebase for situation awareness (e.g., need-to-share) [SACMAT'09]
• Fail to treat uncertainty as a first-class citizen
  – Scalable algorithms and meaningful query answering semantics (possible worlds model*) to reason over uncertain data [submitted]
• Lack of explanations
  – Provide dominant justifications to decision makers [SACMAT'09]
  – Use justifications for estimating information credibility [submitted]
[QoISN'08: IBM-US & RHUL]
[SACMAT'09: IBM-US, CESG & dstl]
[submitted (https://www.usukitacs.com/?q=node/5401): IBM-US & CESG]
[submitted: IBM-US & dstl]
Our approach in a nutshell
• Goal: more flexible and situation-aware decision support mechanisms for information sharing
• Key technical principles
  – Perform late binding of decisions (flexibility)
  – Shareability/trust in information is expressed as logical statements over rich security metadata and a semantic KB
    • Domain-specific concepts and relationships
    • Current state of the world
  – Logical framework that supports explanations that
    • Allow a sender to intelligently downgrade information (e.g., delete the participant list in a meeting)
    • Allow a recipient to judge the credibility of information
Architecture
• A Global Awareness Module continually maintains and updates a knowledge base encoding, in a BDL language, the relevant state of the world for our application (e.g., locations of allied and enemy forces)
• A hybrid reasoner is responsible for making decisions on information flows
  – The reasoner provides dominant explanation(s) over uncertain data that justify the decision
• This architecture is replicated at every decision center
[Architecture diagram: each decision center runs a Global Awareness module feeding a BDL KB; a BDL Reasoner combines the KB with rich metadata and rules & policy to produce decisions with justifications.]
DL: Semantic Knowledgebase [SACMAT’09: IBM-US, CESG, dstl]
• SHIN Description Logics (OWL)
  – Very expressive decidable subset of first-order logic
  – Reasoning is intractable in the worst case, but SHER (Scalable Highly Expressive Reasoner) has good scalability characteristics in practice
• A DL KB consists of:
  – TBox: terminology box. Description of the concepts and relations in the domain of discourse. Extension of the KANI ontology
  – ABox: extensional part. Description of instance information
[Diagram: extended KANI TBox layered over the ABox]
Traditional approaches to deriving trust from data
• Drawbacks of a pure DL-based approach [SACMAT'09]
  – Does not account for uncertainty
  – Trust in information and sources is given, not derived from data or the history of interactions
• Limitations of traditional approaches to deriving trust in data
  – Assume a pair-wise numeric (dis)similarity metric between two entities
    • e.g., eBay recommendations, Netflix ratings
  – Lack of support for conflicts spanning multiple entities, e.g.:
    • 3 sources: S1, S2, S3
    • Ax1 = all men are mortal
    • Ax2 = Socrates is a man
    • Ax3 = Socrates is not mortal
    • No pair of these axioms conflicts; the inconsistency only appears when all three are combined
  – Lack of support for uncertainty in information
Bayesian Description Logics (BDL)
• Challenge 1: How to scalably reason over an inconsistent and uncertain knowledgebase?
  – In our experimental evaluation, BDL on an open-source DL reasoner scaled up to 7.2 million probabilistic axioms
  – Pellet (a state-of-the-art DL reasoner) broke down at 0.2 million axioms
  – Pronto (a probabilistic reasoner) uses an alternate, richer formulation, but does not scale beyond a few dozen axioms
• Challenge 2: What is a meaningful query answering semantics for an uncertain knowledgebase?
  – Possible worlds model* (concrete definition in paper)
Bayesian Description Logics (BDL)
• Challenge 3: How to efficiently compute justifications over uncertain data?
  – Sampling
• Challenge 4: How to use justifications?
  – Assess the credibility of information sources (trust-based decision making)
  – Intelligently transform data to make it shareable [TBD]
Notation: Bayesian Network
• V: set of all random variables in a Bayesian network
  – e.g., V = {V1, V2}
• D(Vi): set of all values that Vi can take
  – e.g., D(V1) = D(V2) = {0, 1}
• v: assignment of all random variables to a possible value
  – e.g., v = {V1 = 0, V2 = 1}
• v|X (for some X ⊆ V): projection of v onto the random variables in X
  – e.g., v|{V2} = {V2 = 1}
• D(X) (for some X ⊆ V): Cartesian product of the domains D(Xi) for all Xi in X
Notation: BDL
• Probabilistic knowledge base K = (A, T, BN)
  – BN = Bayesian network over a set V of variables
  – T = { α : X = x }, where α is a classical TBox axiom annotated with X = x
    • X ⊆ V, x ∈ D(X)
    • e.g., Road ⊑ SlipperyRoad : Rain = true
  – A = { α : X = x }, where α is a classical ABox axiom
• α : p, where p ∈ [0, 1], directly assigns a probability value to a classical axiom α
  – Shorthand for α : Xnew = true, where Xnew is a new independent Boolean random variable
BDL: Simplified Example
• TBox:
  – SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition
  – Road ⊑ SlipperyRoad : Rain = true
• ABox:
  – Road(route9A)
  – OpenedRoad(route9A) : TrustSource = true
• BN has three variables: Rain, TrustSource, Source
  – PrBN(TrustSource = true | Source = Mary) = 0.8
  – PrBN(TrustSource = true | Source = John) = 0.5
  – PrBN(Rain = true) = 0.7
  – PrBN(Source = John) = 1
• Informally, the probability values computed through the Bayesian network are propagated to the DL side as follows:
BDL: Simplified Example
• Primitive event e: each assignment v of all random variables in BN (e.g., {Rain = true, TrustSource = false, Source = John}) corresponds to a primitive event e (also called a scenario or a possible world)
• Each primitive event e is associated with:
  – a probability value PrBN(V = v) through BN, and
  – a set Ke of classical DL axioms whose annotations are compatible with e (e.g., SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition, Road ⊑ SlipperyRoad, Road(route9A))
• Intuitively, the probability value associated with a statement φ (e.g., HazardousCondition(route9A)) is obtained by summing the probabilities of all primitive events e such that the classical KB Ke entails φ (see full definition in paper); the sketch below illustrates this computation on the example above
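To make this concrete, below is a minimal Python sketch (ours, not the authors' implementation) that enumerates the primitive events of the example BN and sums the probabilities of the worlds whose classical KB entails HazardousCondition(route9A). The entailment check is hard-coded for this toy KB; a real system would invoke a DL reasoner.

    from itertools import product

    # BN of the example: Source = John with probability 1, so only Rain and
    # TrustSource vary. Pr(Rain = true) = 0.7, Pr(TrustSource = true | John) = 0.5.
    P_RAIN = 0.7
    P_TRUST = 0.5

    def world_prob(rain, trust):
        # Probability of the primitive event {Rain = rain, TrustSource = trust}.
        return (P_RAIN if rain else 1 - P_RAIN) * (P_TRUST if trust else 1 - P_TRUST)

    def entails_hazardous(rain, trust):
        # Axioms active (compatible annotations) in this world:
        #   always:                SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition, Road(route9A)
        #   if Rain = true:        Road ⊑ SlipperyRoad
        #   if TrustSource = true: OpenedRoad(route9A)
        # route9A is hazardous exactly when it is both slippery and opened.
        return rain and trust

    prob = sum(world_prob(r, t)
               for r, t in product([True, False], repeat=2)
               if entails_hazardous(r, t))
    print(prob)  # 0.7 * 0.5 = 0.35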
Handling Inconsistent KBs
BDL: Query Answering Semantics
Scalable Query Answering
Experimental Evaluation
• SHER – a highly scalable, sound and complete reasoner for large OWL-DL KBs
  – Reasons over highly expressive ontologies
  – Reasons over data in relational databases
  – Highly scalable
    • Can scale to more than 60 million triples
    • Semantically indexed 300 million triples from the medical literature
  – Provides explanations
• PSHER – probabilistic extension of SHER using BDL
Scalability via Summarization (ISWC 2006)
[Figure: Original ABox – courses C1 and C2 are each taught by a person (P1, P2, both men) who likes a hobby (H1, H2). Summary ABox – the mapping f merges them into summary individuals C' = {C1, C2}, P', M', H', with the isTaughtBy and likes edges carried over. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby.]
• The summary mapping function f satisfies the constraints:
  – If an individual a is an explicit member of a concept C in the original ABox, then f(a) is an explicit member of C in the summary ABox.
  – If a ≠ b is explicitly in the original ABox, then f(a) ≠ f(b) is explicitly in the summary ABox.
  – If a relation R(a, b) exists in the original ABox, then R(f(a), f(b)) exists in the summary.
• If the summary is consistent, then the original ABox is consistent (the converse is not true); see the sketch below.
(TBox in the example: Functional(isTaughtBy), Disjoint(Man, Woman))
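A minimal sketch (with assumed data structures; not the SHER implementation) of building a summary ABox that satisfies these constraints, using the example's individuals:

    from collections import defaultdict

    # Toy ABox from the figure (assumed encoding).
    concept_assertions = {                      # individual -> set of asserted concepts
        "C1": {"Course"}, "C2": {"Course"},
        "P1": {"Person", "Man"}, "P2": {"Person", "Man"},
        "H1": {"Hobby"}, "H2": {"Hobby"},
    }
    role_assertions = [                         # (role, subject, object)
        ("isTaughtBy", "C1", "P1"), ("isTaughtBy", "C2", "P2"),
        ("likes", "P1", "H1"), ("likes", "P2", "H2"),
    ]

    # f maps every individual with the same asserted concept set to one summary
    # node, so constraint 1 (concept membership is preserved) holds by construction.
    canon = {}
    def f(ind):
        key = frozenset(concept_assertions[ind])
        return canon.setdefault(key, ind + "'")

    summary_concepts = defaultdict(set)
    for ind, concepts in concept_assertions.items():
        summary_concepts[f(ind)] |= concepts

    # Constraint 3: every role assertion is carried over through f.
    # (Constraint 2, explicit inequalities, would be carried over the same way.)
    summary_roles = {(r, f(a), f(b)) for r, a, b in role_assertions}
    print(sorted(summary_roles))  # [('isTaughtBy', "C1'", "P1'"), ('likes', "P1'", "H1'")]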
Results: Scalability
• UOBM benchmark data set (university data set)
• PSHER scales sub-linearly with the number of axioms
• Exact query answering (computing exact probabilities for ground substitutions) is very expensive
• A state-of-the-art reasoner (Pellet) broke down on UOBM-1
Results: Response Time
• PSHER performs well on threshold queries
  – 99.5% of answers were obtained within tens of seconds
• Further enhancements
  – PSHER is parallelizable
Traditional approaches to deriving trust from data
• Assume a pair-wise numeric (dis)similarity metric between two entities
  – e.g., eBay recommendations, Netflix ratings
• Lack of support for conflicts spanning multiple entities, e.g.:
  – 3 sources S1, S2, S3 asserting Ax1 = all men are mortal, Ax2 = Socrates is a man, and Ax3 = Socrates is not mortal, respectively
• Lack of support for uncertainty in information
Can I trust this information?
• At the command and control center, PSHER detects an inconsistency (justifications point to SIGINT vs. agent X)
• SIGINT is deemed more trusted by the decision maker, so trust in information source X is cautiously reduced
• The decision maker weighs the severity of a possible biological attack and performs "what if" analysis (What if X is compromised? What if the sensing device (SIGINT) had a minor glitch? Which information should be considered and which discarded?)
Courtesy: E.J. Wright and K. B. Laskey. Credibility Models for Multi-Source Fusion. In 9th International Conference on Information Fusion, 2006
Overview
• Encode information as axioms in a BDL KB
• Detect inconsistencies and weighted justifications using possible-world reasoning
• Use justifications to assess trust in information sources
  – Trust scoring mechanism: weighted scheme based on prior trust (belief) in information sources and the weight of the justification
Characteristics of the trust model
• Security:
  – Robust to shilling
  – Robust to bad-mouthing
• Scalability:
  – Scales with the volume of information and the number of information sources
• Security-scalability trade-off:
  – Cost of an exhaustive justification search
  – Cost of a perfectly random uniform sample
Trust Assessment: Degree of unsatisfiability
• Probabilistic Socrates example:
  – Axiom1 : p1, Axiom2 : p2, Axiom3 : p3
  – 8 possible worlds (the power set of the axioms)
    • Only one world is inconsistent: {Axiom1, Axiom2, Axiom3}
  – The probability measure of a possible world is derived from the joint probability distribution of the BN
    • Pr({Axiom1, Axiom2}) = p1 * p2 * (1 - p3)
  – Degree of unsatisfiability: DU = p1 * p2 * p3, the total probability mass of inconsistent worlds (see the sketch below)
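A minimal sketch (assuming independent axiom probabilities, with hypothetical values for p1, p2, p3) that computes the degree of unsatisfiability by enumerating the power set of axioms:

    from itertools import product

    probs = {"Ax1": 0.9, "Ax2": 0.8, "Ax3": 0.6}  # hypothetical p1, p2, p3

    def world_probability(active):
        # Probability of the world in which exactly the axioms in `active` hold.
        p = 1.0
        for ax, pr in probs.items():
            p *= pr if ax in active else 1.0 - pr
        return p

    def inconsistent(active):
        # Only the world with all three axioms is inconsistent:
        # "all men are mortal" + "Socrates is a man" + "Socrates is not mortal".
        return {"Ax1", "Ax2", "Ax3"} <= active

    du = 0.0
    for bits in product([False, True], repeat=len(probs)):
        active = {ax for ax, on in zip(probs, bits) if on}
        if inconsistent(active):
            du += world_probability(active)

    print(du)  # p1*p2*p3 = 0.9*0.8*0.6 = 0.432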
Trust Assessment: Justification Weight
• Trust value of a source S: Beta(α, β)
  – α (reward): function of non-conflicting interesting axioms
  – β (penalty): function of conflicting axioms
• Compute justifications of K = (A, T, BN)
  – J ⊆ (A, T)
  – (J, BN) is consistent to a degree d' < 1
  – For all J' s.t. J' ⊂ J, (J', BN) is consistent to degree 1 (i.e., J is a minimal inconsistent subset)
• How to assign penalties to the sources involved in a justification?
  – Probability measure weight(J) of a justification J: DU((J, BN))
  – Penalty(J) is proportional to weight(J)
  – Penalty(J) is distributed across the sources contributing axioms to J, inversely proportionally to their previous trust values (see the sketch below)
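A minimal sketch (assumed Beta-trust bookkeeping, not necessarily the paper's exact update rule) of distributing a justification's penalty across contributing sources, inversely proportional to their prior trust:

    def distribute_penalty(justification_weight, source_trust):
        # source_trust: source -> (alpha, beta) parameters of its Beta trust value.
        # Less-trusted sources (lower Beta mean) absorb a larger share of the blame.
        prior = {s: a / (a + b) for s, (a, b) in source_trust.items()}
        inv = {s: 1.0 / p for s, p in prior.items()}
        total = sum(inv.values())
        for s, (a, b) in source_trust.items():
            penalty = justification_weight * inv[s] / total
            source_trust[s] = (a, b + penalty)   # the penalty increments beta
        return source_trust

    # Hypothetical priors: SIGINT is highly trusted, agent X is not.
    trust = {"SIGINT": (9.0, 1.0), "AgentX": (2.0, 2.0)}
    print(distribute_penalty(0.432, trust))  # AgentX absorbs most of the 0.432 penalty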
Security-Scalability Tradeoff
• Impracticality of computing all justifications
  – Exhaustive exploration of the Reiter search tree
• Alternative approach: unbiased sampling
  – A malicious source cannot systematically hide conflicts
• Retaining the first K nodes of the Reiter search tree is not a solution:
  – The probability π(vd) of the node vd on the path < v0, v1, …, vd > being selected is π(vd) = ∏i (1/|vi|)
• Trade-off: select node vi with probability min(β/π(vi), 1), with β > 0 (see the sketch below)
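A minimal sketch (with a hypothetical node structure; not the paper's algorithm) of this sampling trade-off as a random descent of the search tree:

    import random

    class Node:
        """Hypothetical Reiter search tree node carrying a justification."""
        def __init__(self, justification, children=()):
            self.justification = justification
            self.children = list(children)

    def sampled_descent(root, beta=0.01):
        # One random root-to-leaf walk. A node at depth d is reached with
        # probability pi = prod_i 1/|v_i| (product of inverse branching factors
        # along the path); keeping it with min(beta/pi, 1) makes every node end
        # up in the sample with probability ~beta regardless of depth, so a
        # malicious source cannot hide conflicts by pushing them deep.
        sample, node, pi = [], root, 1.0
        while node is not None:
            if random.random() < min(beta / pi, 1.0):
                sample.append(node.justification)
            if not node.children:
                break
            pi *= 1.0 / len(node.children)
            node = random.choice(node.children)
        return sample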
Experimental evaluation
Summary
• Decision Support System for Secure Information Flows
  – Uncertainty: supports inconsistent KBs and reasons over uncertain information
  – Trust values derived from data
  – Flexibility: e.g., the sensitivity of tactical information decays with space, time and external events
  – Situation-awareness: e.g., encodes need-to-know based access control policies
  – Support for explanations: enables intelligent information downgrade and provides provenance data for "what if" analysis
THANKS!
Contact: Achille Fokoue, achille@us.ibm.com
Scenario
• Coalition: A & B
• Geo-locations G = {G1, …, G4}
• A's operations described in the table
Summarization effectiveness
Ontology   Instances   Role Assertions   I (after)   RA (after)
Biopax       261,149        582,655           81          583
UOBM-1        42,585        214,177          410       16,233
UOBM-5       179,871        927,854          598       35,375
UOBM-10      351,422      1,816,153          673       49,176
UOBM-30    1,106,858      6,494,950          765       79,845
NIMD       1,278,540      1,999,787           19           55
ST           874,319      3,595,132           21          183
I – instances after summarization; RA – role assertions after summarization
Filtering effectiveness
Ontology   Instances   Role Assertions   I (after)   RA (after)
Biopax       261,149        582,655           38           98
UOBM-1        42,585        214,177          280          284
UOBM-5       179,871        927,854          426          444
UOBM-10      351,422      1,816,153          474          492
UOBM-30    1,106,858      6,494,950          545          574
NIMD       1,278,540      1,999,787            2            1
ST           874,319      3,595,132           18           50
I – instances after filtering; RA – role assertions after filtering
Refinement (AAAI 2007)
• What if the summary is inconsistent? Either:
  – the original ABox has a real inconsistency, or
  – the ABox was consistent but the process of summarization introduced a spurious inconsistency in the summary
• Therefore, we follow a process of refinement to check for real inconsistency
  – Refinement = selectively decompress portions of the summary
  – Use justifications for the inconsistency to select the portion of the summary to refine
    • Justification = minimal set of assertions responsible for the inconsistency
  – Repeat the process iteratively until the refined summary is consistent or the justification is "precise"; a sketch of this loop follows
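A minimal sketch of the refinement loop; the helper functions passed in are hypothetical placeholders for the summarization, DL consistency check, justification extraction, and decompression steps described above:

    def consistent_via_refinement(abox, tbox, summarize, is_consistent,
                                  find_justification, is_precise, refine):
        # Start from the (cheap) summary and refine only where justifications point.
        summary = summarize(abox)
        while not is_consistent(summary, tbox):
            j = find_justification(summary, tbox)  # minimal inconsistent assertion set
            if is_precise(j, abox):
                return False   # the inconsistency maps back to real ABox assertions
            summary = refine(summary, j, abox)     # selectively decompress nodes in j
        return True            # consistent summary implies a consistent original ABox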
Refinement: Resolving inconsistencies in a summary
[Figure: Original ABox – courses C1, C2 taught by men P1, P2 and course C3 taught by a woman P3, each person liking a hobby. The initial summary merges all courses into C' = {C1, C2, C3}; with TBox Functional(isTaughtBy) and Disjoint(Man, Woman), the summary is inconsistent. After the 1st refinement, C' splits into Cx' = {C1, C2} and Cy' = {C3}, but the summary is still inconsistent. After the 2nd refinement, the person node splits into Px' = {P1, P2} and Py' = {P3}, and the summary is consistent. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby.]
Refinement: Solving Membership Query (AAAI 2007)
[Figure: The same refinement example applied to the sample membership query Q = PeopleWithHobby. The negated query Not(Q) is asserted at candidate summary nodes and refinement proceeds as before (TBox: Functional(isTaughtBy), Disjoint(Man, Woman)); the resulting inconsistency at Px' yields the solutions P1 and P2.]
Results: Consistency Check

Ontology   Instances   Role Assertions   Consistency check time (s)
Biopax       261,149        582,655        2.3
UOBM-1        42,585        214,177        2.9
UOBM-5       179,871        927,854        5.4
UOBM-10      351,422      1,816,153        5.1
UOBM-30    1,106,858      6,494,950        7.9
NIMD       1,278,540      1,999,787        0.8
ST           874,319      3,595,132        0.4
Results: Membership Query Answering
Ontology   Type Assertions   Role Assertions
UOBM-1         25,453            214,177
UOBM-10       224,879          1,816,153
UOBM-30       709,159          6,494,950

Reasoner   Dataset    Avg. Time (s)   St. Dev (s)   Range (s)
KAON2      UOBM-1          21              1         18 - 37
KAON2      UOBM-10        448             23        414 - 530
SHER       UOBM-1           4              4          2 - 24
SHER       UOBM-10         15             26          6 - 191
SHER       UOBM-30         35             63         12 - 391