Nguyễn Phạm Luân Tiến 50702449
Trần Đình Hương Trà 50702573
Dương Bách Tùng 50702839
Content
1. Introduction to OLAP Systems.
2. Security Requirements in OLAP Systems.
3. Some Security Issues.
Introduction to OLAP Systems
Nowadays, databases are used in two main contexts:
1. OLTP: On-Line Transaction Processing
2. OLAP / DS: On-Line Analytical Processing / Decision Support
OLTP vs OLAP
                     OLTP                            OLAP
Function             Constant transaction handling   Decision support
Database design      Application-oriented            Subject-oriented
Data                 Current, up-to-date, detailed   Historical, aggregated, multidimensional
Access               Read / write / index            Read many times
Unit of work         Short, single transactions      Complex queries
# Records accessed   k · 10                          k · 10^6
# Users              k · 10^3                        k · 10^2
Database size        100 MB – GB                     100 GB – TB
Data Warehouse
A data warehouse (DW) is a database used for reporting. The data is offloaded from the operational systems for reporting. A DW collects data in support of managers' decision-making process.
A data warehouse is:
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
Subject Oriented
Data is categorized and stored by business subject rather than by application.
[Figure: operational systems (Savings, Shares, Loans, Insurance, Equity Plans) feeding the data warehouse subject areas (Customer, Product, Sales Information)]
Integrated
[Figure: in the operational environment, the Savings, Current Accounts and Loans applications each hold their own data; in the data warehouse the data is integrated under Subject = Customer, with no application flavor]
Time Variant
Data is stored as a series of snapshots, each representing a period of time.
[Figure: data warehouse snapshots by time: 01/97 holds the data for January, 02/97 the data for February, 03/97 the data for March]
Non Volatile
Typically, data in the data warehouse is not updated or deleted.
[Figure: operational databases support INSERT, Read, UPDATE and DELETE; the warehouse database supports only Load and Read]
OLAP
In computing, online analytical processing (OLAP) is an approach to swiftly answering multidimensional analytical queries.
The OLAP database is usually updated in batch, often from multiple sources. What most people want from their OLAP applications is a consistently fast response time.
OLAP is a technology for processing business data. It performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.
OLAP SERVICES
OLAP ARCHITECTURES
Popular architectures of OLAP systems include ROLAP (relational OLAP) and MOLAP (multidimensional OLAP).
1) ROLAP provides a front-end tool that translates multidimensional queries into corresponding SQL queries to be processed by the relational backend.
2) MOLAP does not rely on the relational model but instead materializes the multidimensional views.
3) Using MOLAP for dense parts of the data and ROLAP for the others leads to a hybrid architecture, namely HOLAP (hybrid OLAP).
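The ROLAP idea of translating a multidimensional query into SQL can be sketched as follows. This is a minimal illustration, not the deck's own implementation: the table names, column names and the `sales_by_category` helper are hypothetical, the amounts are illustrative values in thousands of dollars, and SQLite stands in for the relational backend.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed by three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER, product_key INTEGER, store_key INTEGER,
    amount INTEGER,  -- illustrative sales, in thousands of dollars
    PRIMARY KEY (time_key, product_key, store_key)  -- composite key
);
INSERT INTO time_dim VALUES (1, 'Q1');
INSERT INTO product_dim VALUES
    (1, 'Electronics'), (2, 'Toys'), (3, 'Clothing'), (4, 'Cosmetics');
INSERT INTO store_dim VALUES (1, 'Store 1'), (2, 'Store 2');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 5200), (1, 2, 1, 1900), (1, 3, 1, 2300), (1, 4, 1, 1100),
    (1, 1, 2, 8900), (1, 2, 2, 750),  (1, 3, 2, 4600), (1, 4, 2, 1500);
""")


def sales_by_category(quarter):
    """Express 'sales by product for one quarter, over all stores' as SQL."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM sales_fact AS f
        JOIN time_dim AS t ON t.time_key = f.time_key
        JOIN product_dim AS p ON p.product_key = f.product_key
        WHERE t.quarter = ?
        GROUP BY p.category
        """,
        (quarter,),
    )
    return dict(rows)


print(sales_by_category("Q1"))
```

Rolling up over the Store dimension is simply a `GROUP BY` that omits the store key; a MOLAP engine would instead look the same totals up in a materialized cuboid.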
ROLAP
[Figure: a relational table made of rows and columns, with key values used to join tables]
KEY IN ROLAP
[Figure: the Time, Product and Store dimensions each have a single-column key (Time Key, Product Key, Store Key), which together form a composite key]
Star schema – 4 dimensions
Snowflake schema
MOLAP
[Figure: lattice of cuboids for the dimensions time, item, city and supplier]
0-D (apex) cuboid: all
1-D cuboids: time; item; city; supplier
2-D cuboids: (time, item); (time, city); (time, supplier); (item, city); (item, supplier); (city, supplier)
3-D cuboids: (time, item, city); (time, item, supplier); (time, city, supplier); (item, city, supplier)
4-D (base) cuboid: (time, item, city, supplier)
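The lattice of cuboids listed above is the set of all subsets of the dimensions. A short sketch that enumerates it (the function name `cuboid_lattice` is a hypothetical helper, not part of any OLAP product):

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "city", "supplier")


def cuboid_lattice(dimensions):
    """Map n to the list of n-dimensional cuboids (subsets of the dimensions)."""
    return {
        n: list(combinations(dimensions, n))
        for n in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(DIMENSIONS)
for n, cuboids in lattice.items():
    print(f"{n}-D cuboids:", cuboids)
```

With 4 dimensions and no hierarchies this gives 2^4 = 16 cuboids, from the apex (the empty tuple, shown as "all" on the slide) down to the base cuboid.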
[Figure: dimension hierarchies]
Geography: City, State, Country, All
Product: Item, Type, Category, All
Time: Day, Week, Month, Quarter, Year, All
[Figure: sales cube, year to date ($ millions), with dimensions Products, Time (Q1, Q2) and Store (Store 1, Store 2)]
Sales, year to date ($ millions), Q1
Product       Store 1   Store 2
Electronics   5.2       8.9
Toys          1.9       0.75
Clothing      2.3       4.6
Cosmetics     1.1       1.5
                      Relational       Multidimensional
Data representation   Two dimensions   Multiple dimensions
Data extraction       Specific rows    Specific dimensions
Computations          Functions        High-speed matrix
Results               Tool-specific    Matrix
HOLAP
OLAP ARCHITECTURES
                            MOLAP     ROLAP              HOLAP
Underlying data storage     Cube      Relational table   Relational table
Aggregated data storage     Cube      Relational table   Cube
Query performance           Fastest   Slowest            Fast
Storage space consumption   High      Low                Medium
Maintenance cost            High      Low                Medium
Security Requirements in OLAP Systems
OLAP systems depend heavily on aggregates of data.
They are therefore very vulnerable to indirect inferences of protected data.
Threat of Inferences
The threat is illustrated through 4 examples:
1. One-Dimensional Inference (1-d Inference).
2. Multi-Dimensional Inference (m-d Inference) with SUM only.
3. M-d Inference with MAX only.
4. M-d Inference with SUM, MAX and MIN.
One-Dimensional Inference (1-d Inference)
Suppose that the adversary:
• Cannot access the cuboid <Employee,Quarter> but is allowed to access <Department,Quarter>.
• Knows which cells are empty through outbound channels (that is, from sources outside the OLAP system).
Then he can infer that <Bob,Q1> has exactly the same value as <A1,Q1>.
[Figure: data cube over the Organization and Time dimensions]
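The 1-d inference above can be sketched in a few lines. The department membership (Bob and Alice in A1) and the empty cell <Alice,Q1> are assumptions made for illustration, since the slide's figure is not available; `infer_1d` is a hypothetical helper.

```python
def infer_1d(aggregate_value, members, known_empty):
    """Return (member, value) when the aggregate covers exactly one non-empty cell."""
    non_empty = [m for m in members if m not in known_empty]
    if len(non_empty) == 1:
        # A SUM over a single non-empty cell equals that cell.
        return non_empty[0], aggregate_value
    return None


# Assumed scenario: the adversary reads <A1,Q1> = 1500 and knows that
# <Alice,Q1> is empty, so the whole aggregate must come from <Bob,Q1>.
a1_q1 = 1500
leak = infer_1d(a1_q1, members=["Bob", "Alice"], known_empty={"Alice"})
print(leak)
```

If no cell were known to be empty, the same call would return `None`: the aggregate alone would not pin down a single employee's value.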
Multi-Dimensional Inference (m-d Inference) with SUM
Suppose that the adversary:
• Can access only <Department,Quarter> and <Employee,Year>.
• Knows which cells are empty through outbound channels.
An m-d inference is then possible as follows:
• He first sums the cells <Bob,Y1> and <Alice,Y1>, then subtracts the cells <A1,Q2> and <A1,Q3>. The final result yields a sensitive cell: <Bob,Q1> = 1500.
[Figure: data cube over the Time and Organization dimensions]
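The arithmetic of this m-d inference can be checked on a small cube. The core cuboid below is hypothetical (only the result <Bob,Q1> = 1500 comes from the slide); it assumes department A1 contains just Bob and Alice and that <Bob,Q4>, <Alice,Q1> and <Alice,Q4> are the empty cells known to the adversary.

```python
# Hypothetical core cuboid <Employee,Quarter> for department A1; None = empty cell.
core = {
    "Bob":   {"Q1": 1500, "Q2": 2000, "Q3": 1000, "Q4": None},
    "Alice": {"Q1": None, "Q2": 1800, "Q3": 2500, "Q4": None},
}
QUARTERS = ("Q1", "Q2", "Q3", "Q4")


def total(values):
    """SUM aggregation that skips empty cells."""
    return sum(v for v in values if v is not None)


# The only aggregates the adversary is allowed to read.
employee_year = {e: total(cells.values()) for e, cells in core.items()}
dept_quarter = {q: total(core[e][q] for e in core) for q in QUARTERS}

# <Bob,Y1> + <Alice,Y1> - <A1,Q2> - <A1,Q3> leaves only <Bob,Q1>,
# because every other remaining cell is known to be empty.
bob_q1 = (employee_year["Bob"] + employee_year["Alice"]
          - dept_quarter["Q2"] - dept_quarter["Q3"])
print(bob_q1)
```

No single answered query reveals the cell; the leak comes from combining aggregates taken along two different dimensions.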
Multi-Dimensional Inference (m-d Inference) with MAX
Now suppose the adversary does not know the values of the empty cells (the core cuboid is full of unknown values). The cube is then free of inferences under SUM aggregations, but the adversary can still make an m-d inference with MAX aggregations as follows:
- He obtains the MAX values of the cells <Janny,Y1> and <A1,Q4>, which are 6000 and 5000.
- From this, he can infer that one of the three cells <Janny,Q1>, <Janny,Q2> or <Janny,Q3> is 6000.
- Neither <Janny,Q2> nor <Janny,Q3> can be 6000.
Therefore <Janny,Q1> = 6000.
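The elimination step can be sketched as a filter over quarters. Only the values 6000 for <Janny,Y1> and 5000 for <A1,Q4> come from the slide; the MAX values for <A1,Q1>, <A1,Q2> and <A1,Q3> are assumed here, on the reading that Q2 and Q3 are ruled out because their department-level MAX is below 6000.

```python
JANNY_YEAR_MAX = 6000  # MAX of <Janny,Y1>, from the slide

# MAX of <A1,Qi>: Q4 is from the slide, the other three are assumed values.
dept_quarter_max = {"Q1": 6000, "Q2": 4500, "Q3": 4000, "Q4": 5000}

# Janny's largest quarterly value can only sit in a quarter whose
# department-wide MAX is at least as large as her yearly MAX.
candidates = [q for q, m in dept_quarter_max.items() if m >= JANNY_YEAR_MAX]
print(candidates)
```

When exactly one candidate quarter survives, the adversary learns that cell's value even though no individual cell was ever queried.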
Multi-Dimensional Inference (m-d Inference) with SUM, MAX and MIN
• Now suppose that the adversary can ask queries using SUM, MAX and MIN on the data cube.
• Following the last example, he can infer <Janny,Q1> = 6000.
• The SUM, MAX and MIN of <Janny,Y1> are 11000, 6000 and 5000.
• From this, he can infer that <Janny,Q2>, <Janny,Q3> and <Janny,Q4> must be 5000 and two zeros (but he does not know which cell holds which value).
• With the SUM, MAX and MIN of <A1,Q2>, <A1,Q3> and <A1,Q4>, he can conclude that <Janny,Q4> must be 5000 and the others are zero.
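The step "5000 and two zeros" can be confirmed by brute force. This sketch assumes that a zero stands for an empty cell that MIN ignores, and it searches a hypothetical grid of candidate values in steps of 500; only the figures 6000, 11000 and 5000 come from the slide.

```python
from itertools import product

JANNY_Q1 = 6000                      # inferred in the previous example
YEAR_SUM, YEAR_MAX, YEAR_MIN = 11000, 6000, 5000
GRID = range(0, 6001, 500)           # assumed candidate values for a cell


def consistent(q2, q3, q4):
    """True if the four quarterly cells match the SUM, MAX and MIN of <Janny,Y1>."""
    cells = [JANNY_Q1, q2, q3, q4]
    non_empty = [v for v in cells if v != 0]  # MIN is taken over non-empty cells
    return (sum(cells) == YEAR_SUM
            and max(cells) == YEAR_MAX
            and min(non_empty) == YEAR_MIN)


solutions = sorted(c for c in product(GRID, repeat=3) if consistent(*c))
print(solutions)
```

Three assignments remain, each placing 5000 in one quarter and zeros in the other two; the aggregates of <A1,Q2>, <A1,Q3> and <A1,Q4> are what let the adversary pick Q4 among them.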
A security solution for OLAP systems must combine access control and inference control to remove these threats.
Requirements
A practical solution must achieve a balance among the following objectives:
Security
- Sensitive data should be guarded from both unauthorized accesses and malicious inferences.
Applicability
- The solution should not rely on any unrealistic assumptions and should cover a wide range of scenarios without the need for significant modifications.
Efficiency
- Queries should be answered in a matter of seconds or minutes.
- The security mechanism must be computationally efficient, especially with respect to on-line overhead.
Availability
- Data should be available to legitimate users who have sufficient privileges.
Practicality
- The solution should not demand significant modifications to the existing infrastructure of an OLAP system.
The main challenge is the inherent tradeoff among the above objectives.
Some Security Issues
• Three-Tier Security Architecture
• Securing Data Cubes in OLAP Systems
  - Sum-Only Data Cubes
  - Generic Data Cubes
Three-Tier Security Architecture
Security in statistical databases usually has 2 tiers, with inference control between them:
• Sensitive data.
• Aggregation queries.
Inference control mechanisms check each aggregation query to decide whether it can be answered, because many protected data may be disclosed through the previously answered queries.
Applying the two-tier architecture to OLAP has some inherent drawbacks:
• Checking queries for inferences at run time may cause unacceptable delay in query processing.
• The complexity of this checking is usually high.
• Inference control methods cannot take advantage of the special characteristics of OLAP systems.
This architecture has:
• 3 tiers.
• 3 relations.
• 3 properties satisfied by the aggregation tier.
[Figure: three-tier security architecture: user queries, pre-defined aggregations and the data set, with access control and inference control between the tiers]
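A minimal sketch of how the three tiers might fit together is given below. The class name `ThreeTierCube`, the cell values and the aggregation names are all hypothetical, and the set of "safe" aggregations is simply assumed to have been checked for inferences offline; the sketch only shows that queries are answered from the aggregation tier and never from the data tier.

```python
class ThreeTierCube:
    """Toy model: data tier -> pre-defined aggregation tier -> query tier."""

    def __init__(self, data, safe_aggregations):
        # Data tier: sensitive core cells, never returned directly.
        self._data = dict(data)
        # Aggregation tier: SUMs computed ahead of time over cell groups that
        # are assumed to have passed inference control offline.
        self._aggregates = {
            name: sum(self._data[cell] for cell in cells)
            for name, cells in safe_aggregations.items()
        }

    def query(self, names):
        """Query tier: answer only sums of pre-defined aggregations."""
        missing = [n for n in names if n not in self._aggregates]
        if missing:
            raise PermissionError(f"not derivable from the aggregation tier: {missing}")
        return sum(self._aggregates[n] for n in names)


# Hypothetical data and pre-defined aggregations.
cube = ThreeTierCube(
    data={("Bob", "Q2"): 2000, ("Alice", "Q2"): 1800,
          ("Bob", "Q3"): 1000, ("Alice", "Q3"): 2500},
    safe_aggregations={
        "A1,Q2": [("Bob", "Q2"), ("Alice", "Q2")],
        "A1,Q3": [("Bob", "Q3"), ("Alice", "Q3")],
    },
)
print(cube.query(["A1,Q2", "A1,Q3"]))
```

Because the expensive inference check happens when the aggregation tier is built, answering a query at run time is only a lookup, which addresses the delay drawback of the two-tier design.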
Securing Data Cubes in OLAP Systems
SUM-Only Data Cubes
• As an inherited limitation of statistical databases, only SUMs are considered.
• Only the core cuboid is considered sensitive.
• 2 methods: the Cardinality-Based Method and the Parity-Based Method.
Cardinality-Based Method
This method is based on the number of empty cells.
1-d Inferences: the existence of 1-d inferences can only be determined in 2 cases:
• The core cuboid has no empty cell.
• The core cuboid has fewer non-empty cells than the given upper bound $2^{k-1} \cdot d_{\max}$.
M-d Inferences:
• The core cuboid has no empty cell.
• A data cube is free of inferences if it has fewer empty cells than the given upper bound.
• A data cube having more empty cells than the given bound always has inferences.
Upper bound: $2(d_u - 4) + 2(d_v - 4) - 1$, where $d_u$ and $d_v$ are the two smallest among the $d_i$ values, and $d_i$ is the number of values of the $i$-th attribute in the core cuboid.
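The two bounds quoted for the cardinality-based method are simple to evaluate. In this sketch the function names are hypothetical, `dims` is the list of $d_i$ values, and $k$ is assumed to be the number of dimensions, which the slides do not state explicitly.

```python
def one_d_bound(dims):
    """Bound on non-empty cells for 1-d inferences: 2^(k-1) * d_max.

    Assumes k is the number of dimensions of the core cuboid.
    """
    k = len(dims)
    return 2 ** (k - 1) * max(dims)


def md_empty_cell_bound(dims):
    """Bound on empty cells for m-d inferences: 2(d_u - 4) + 2(d_v - 4) - 1,

    where d_u and d_v are the two smallest values in dims.
    """
    if len(dims) < 2:
        raise ValueError("at least two dimensions are required")
    d_u, d_v = sorted(dims)[:2]
    return 2 * (d_u - 4) + 2 * (d_v - 4) - 1


dims = [5, 6, 7]  # hypothetical dimension sizes
print(one_d_bound(dims), md_empty_cell_bound(dims))
```

Both checks need only the dimension sizes and a count of empty cells, so they are cheap to run; the price is that they say nothing about cubes whose cell counts fall outside the stated cases.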
The above results can be used to compute inference-free aggregations based on the three-tier architecture:
• The data tier corresponds to the core cuboid.
• The aggregation tier corresponds to a collection of cells in aggregation cuboids that are free of inferences.
• The query tier includes any query that can be rewritten using the cells in the aggregation tier.
Parity-Based Method
This method is based on the simple fact that even numbers are closed under addition and subtraction.
Suppose that all the query sets include an even number of cells.
Adding and subtracting these sets to isolate a single cell then becomes more difficult.
[Figure: SUM queries over the cells X1, ..., X6]
Queries: X1+X2+X3+X4+X5+X6, X1+X2, X4+X5, X5+X6, X3+X5
X5 = <Q3,Alice> = 2500
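One way to see how the five queries above determine X5, even though every query covers an even number of cells, is to combine their answers. The cell values below are hypothetical except for X5 = 2500, which is the value shown on the slide.

```python
# Hypothetical cell values; only X5 = 2500 is taken from the slide.
x = {"X1": 1000, "X2": 2000, "X3": 1500, "X4": 3000, "X5": 2500, "X6": 500}


def answer(*cells):
    """SUM query over the named cells."""
    return sum(x[c] for c in cells)


# The five answers an adversary would receive.
q_all = answer("X1", "X2", "X3", "X4", "X5", "X6")
q12 = answer("X1", "X2")
q45 = answer("X4", "X5")
q56 = answer("X5", "X6")
q35 = answer("X3", "X5")

# (X4+X5) + (X5+X6) + (X3+X5) = X3 + X4 + X6 + 3*X5
# q_all - q12                 = X3 + X4 + X5 + X6
# The difference is 2*X5.
x5 = (q45 + q56 + q35 - (q_all - q12)) // 2
print(x5)
```

So restricting queries to even-sized sets makes inferences harder but does not rule them out by itself, which is why the method still has to test the collection of queries for inferences.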
• If a set of queries (set 2) is derivable from another set (set 1), then the answers to set 2 can be computed from the answers to set 1. If set 1 is free of inferences, then so is set 2.
• To detect inferences caused by sets of MDR (multi-dimensional range) queries (Q*), we find another collection of queries that is equivalent to Q* and whose inferences are easier to detect.
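Derivability between SUM queries can be tested with linear algebra: a query is derivable from a set when its 0/1 incidence vector lies in the span of the set's vectors, and a cell is inferable when its unit vector does. The sketch below uses a generic rank test over exact fractions; it is an illustration of the notion of derivability, not the parity-based detection procedure itself, and the helper names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        m[rk] = [v / m[rk][col] for v in m[rk]]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def derivable(target, queries):
    """True if `target` is a linear combination of the rows in `queries`."""
    return rank(queries + [target]) == rank(queries)


# Incidence vectors over cells X1..X6 for the five queries
# X1+...+X6, X1+X2, X4+X5, X5+X6 and X3+X5.
queries = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
]
x5_only = [0, 0, 0, 0, 1, 0]
x1_only = [1, 0, 0, 0, 0, 0]
print(derivable(x5_only, queries), derivable(x1_only, queries))
```

Here X5 is inferable while X1 is not, because X1 and X2 only ever appear together in the answered queries.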
This method can be enforced based on the three-tier inference control architecture described earlier:
• A partition of the core cuboid based on dimension hierarchies composes the data tier.
• The parity-based method is applied to each block in the partition to compute the aggregation tier.
• The query tier includes any query that is derivable from the aggregation tier.
Generic Data Cubes
A method that does not directly detect inferences, but first prevents m-d inferences and then removes 1-d inferences.
It is able to deal with data cubes with generic aggregation types. It has two parts:
• Access Control.
• Lattice-Based Inference Control.
Access Control
Limiting access control to the core cuboid is not always appropriate: values in aggregation cuboids may also carry sensitive information.
Describing an object:
• Function Below() partitions the data cube along the dependency lattice.
• Function Slice() partitions the data cube along dimensions.
• An object is simply the intersection of the two.
Example: Object(L,S), where L = <Year,Employee> and S includes all the cells in the first four quarters of the core cuboid <Quarter,Employee>.
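A possible reading of Below() is sketched here: given a cuboid L, it returns L together with every cuboid that is at the same or a finer level in each dimension. The two hierarchies and the function name `below` are assumptions for illustration, and Slice() is not modelled.

```python
from itertools import product

# Assumed dimension hierarchies, ordered from finest to coarsest level.
HIERARCHIES = {
    "time": ["quarter", "year", "all"],
    "organization": ["employee", "department", "all"],
}


def below(cuboid):
    """Cuboids at or below `cuboid` in the dependency lattice.

    `cuboid` gives one level per dimension, in the order of HIERARCHIES.
    """
    per_dimension = []
    for levels, level in zip(HIERARCHIES.values(), cuboid):
        # Keep this level and every finer level of the same dimension.
        per_dimension.append(levels[: levels.index(level) + 1])
    return set(product(*per_dimension))


print(sorted(below(("year", "employee"))))
```

For L = <Year,Employee> this yields <Year,Employee> and the core cuboid <Quarter,Employee>; intersecting those cuboids with the cells selected by Slice() would then give the protected object.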
Lattice-Based Inference Control
Given two sets of cells in a data cube (S and T):
• A cell c is redundant with respect to T if S includes c and its ancestors in any single cuboid.
• A cell c is non-comparable to T if, for every c' ∈ T, c is neither an ancestor nor a descendant of c'.
Consider an Object(L,S):
• This object is the union of the cuboids in Below(L).
• Let T be the object and S be its complement with respect to the data cube.
• To remove inferences from S to T, we find a subset of S that is free of m-d inferences to T.
After m-d inferences are prevented, 1-d inferences still need to be removed.
Procedure to remove 1-d inferences:
• Check each cell and add those that cause 1-d inferences to the object, so that they will be prohibited by access control.
• Control m-d inferences to this new object by applying the previous results.
• Repeat these steps until all 1-d inferences are removed.
• The final set of cells is free of inferences to the object.
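The repeat-until-stable structure of this procedure can be sketched as a fixed-point loop. The two callables `md_safe_subset` and `causes_1d` are hypothetical placeholders for the m-d prevention step and the 1-d test, which the slides do not spell out; the toy rules used in the example exist only to make the loop runnable.

```python
def remove_inferences(all_cells, obj, md_safe_subset, causes_1d):
    """Grow the protected object until the allowed cells cause no 1-d inference.

    md_safe_subset(candidates, obj) -> subset of candidates free of m-d inferences
    causes_1d(cell, allowed, obj)   -> True if `cell` leaks a protected value
    Both callables are assumptions supplied by the caller.
    """
    obj = set(obj)
    while True:
        allowed = md_safe_subset(all_cells - obj, obj)
        leaking = {c for c in allowed if causes_1d(c, allowed, obj)}
        if not leaking:
            return obj, allowed
        # Prohibit the leaking cells through access control and start over.
        obj |= leaking


# Toy stand-ins: every unprotected cell is m-d safe, and a cell numbered
# 3 or less leaks whenever its predecessor is protected.
protected, allowed = remove_inferences(
    all_cells={1, 2, 3, 4, 5, 6},
    obj={1},
    md_safe_subset=lambda candidates, obj: set(candidates),
    causes_1d=lambda c, allowed, obj: c <= 3 and (c - 1) in obj,
)
print(protected, allowed)
```

The loop terminates because the object only grows and the cube is finite; in the worst case every cell ends up protected.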
This method can be implemented based on the three-tier security model:
• The authorization object computed through the above process comprises the data tier.
• The complement of the object is the aggregation tier, because it does not cause any inferences to the data tier.
• Users are free to pose queries to the query tier.
THANK YOU !