Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML. 2015.11.27. 1


Multidimensional analysis model for a document warehouse that includes textual measures

KIM JEONG RAE

UOS.DML. 2015.11.27.


Introduction

Author : Martha Mendoza, Erwin Alegria, Manuel Maca, Carlos Cobos, Elizabeth Leon

Location : Information Technology Research Group (GTI), etc., Colombia

Title : Multidimensional analysis model for a document warehouse that includes textual measures

Document Type : Decision Support Systems 72 (2015) 44-59

Date : February 2015


Contents

Abstract

Analysis Model

Proposed document warehouse model

Multi-dimensional model

Textual measures and aggregation function

OLAP document visualization

Conclusion - Evaluation results


Abstract (1/2)

Motivation : Business systems are increasingly required to handle substantial quantities of unstructured textual information.

Problem : How to manage unstructured text data stored in data warehouses.

Approach : A new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy.

The textual measures that associate the topics with the text documents are generated by Probabilistic Latent Semantic Analysis, while the hierarchy is created automatically using a clustering algorithm.


Abstract (2/2)

Result : The model gained increasing acceptance with use, and the visualization of the model was also well received by users.

Contribution : This paper proposes a multidimensional model that incorporates textual measures.

The model allows documents to be queried using OLAP operations.


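Proposed document warehouse model

Four main processes : ① Topic hierarchy building, ② Probabilistic measures calculation, ③ ETL (Extract-Transform-Load), ④ Multidimensional cube building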


Proposed document warehouse model

Topic Hierarchy Building ① : a process using two algorithms

Cosme (step 1)

Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm)


Proposed document warehouse model

Topic Hierarchy Building ① : Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm), three levels


Proposed document warehouse model

Topic Hierarchy Building ① : IGBHSK algorithm [Ref.#2] for the topic hierarchy


Proposed document warehouse model

Probabilistic measures calculation ②


Proposed document warehouse model

Probabilistic measures calculation ② : PLSA (Probabilistic Latent Semantic Analysis) algorithm [Ref.#24]

A probability model for a given set of documents and the words they contain

P(d|z) : the probability of a document d given a topic z

P(w|z) : the probability of a word w given a topic z

Parameters are estimated with the EM (Expectation Maximization) algorithm [Ref.#6,17]
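To make the PLSA step more concrete, the following is a minimal pure-Python sketch of EM for the symmetric PLSA formulation, in which P(d,w) is the sum over topics z of P(z) P(d|z) P(w|z). It is not the implementation used in the paper: the function name `plsa`, the random initialization, the fixed iteration count, and the toy count matrix are all assumptions made for illustration.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit symmetric PLSA by EM (illustrative sketch, not the paper's code).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z, p_d_z, p_w_z) with p_d_z[z][d] = P(d|z) and
    p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive distribution of length n.
        raw = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(raw)
        return [x / total for x in raw]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z) P(d|z) P(w|z).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n_dw * joint[z] / norm
                    new_z[z] += weight
                    new_d[z][d] += weight
                    new_w[z][w] += weight
        total = sum(new_z)
        if total == 0.0:
            break
        # M-step: renormalize the expected counts into probabilities.
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [v / new_z[z] for v in new_d[z]]
                p_w_z[z] = [v / new_z[z] for v in new_w[z]]
            p_z[z] = new_z[z] / total
    return p_z, p_d_z, p_w_z


# Toy corpus (hypothetical): 4 documents over a 4-word vocabulary.
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z, p_d_z, p_w_z = plsa(counts, n_topics=2)
```

In a warehouse of this kind, values such as `p_d_z` and `p_w_z` would be the candidates for loading as textual measures; how the paper maps them onto its tables is not shown on this slide.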


Proposed document warehouse model

ETL (Extract-Transform-Load) ③


Multi-dimensional model

Relational DB schema


Multi-dimensional model

Standard dimensions

Document dimension : name, document type

Author dimension : name, email

Date dimension : publish date

Location dimension : city, country

Word dimension : all words from the stored document set

Topic dimension : topic hierarchy

M-M relationships : Author-Group Bridge, Topic-Document-Group Bridge, Topic-Word-Group Bridge

Measures of the fact table and of the topic and word dimension bridge tables

Topics_Probab_TM : average probability of the topics

Documents_TM : probabilities of a document within topics

Word_Probab_TM : probabilities of a word within topics
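The dimensions, bridges and measures listed on this slide can be sketched as a relational schema. The sketch below uses Python's standard `sqlite3` module. The table names, key columns, the `parent_topic_id` column used to express the topic hierarchy, and the placement of each measure are assumptions for illustration; only the attribute and measure names come from the slide.

```python
import sqlite3

# Hypothetical star schema; the layout is an assumption, not the paper's DDL.
SCHEMA = """
CREATE TABLE dim_document (document_id INTEGER PRIMARY KEY, name TEXT, document_type TEXT);
CREATE TABLE dim_author   (author_id   INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, publish_date TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_word     (word_id     INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE dim_topic (
    topic_id        INTEGER PRIMARY KEY,
    name            TEXT,
    parent_topic_id INTEGER REFERENCES dim_topic(topic_id)  -- assumed hierarchy link
);

-- Bridge tables resolve the many-to-many relationships.
CREATE TABLE bridge_author_group (
    author_group_id INTEGER,
    author_id       INTEGER REFERENCES dim_author(author_id),
    PRIMARY KEY (author_group_id, author_id)
);
CREATE TABLE bridge_topic_document_group (
    topic_document_group_id INTEGER,
    topic_id                INTEGER REFERENCES dim_topic(topic_id),
    documents_tm            REAL,  -- Documents_TM measure
    PRIMARY KEY (topic_document_group_id, topic_id)
);
CREATE TABLE bridge_topic_word_group (
    topic_word_group_id INTEGER,
    topic_id            INTEGER REFERENCES dim_topic(topic_id),
    word_id             INTEGER REFERENCES dim_word(word_id),
    word_probab_tm      REAL,  -- Word_Probab_TM measure
    PRIMARY KEY (topic_word_group_id, topic_id, word_id)
);

CREATE TABLE fact_document (
    document_id             INTEGER REFERENCES dim_document(document_id),
    date_id                 INTEGER REFERENCES dim_date(date_id),
    location_id             INTEGER REFERENCES dim_location(location_id),
    author_group_id         INTEGER,
    topic_document_group_id INTEGER,
    topic_word_group_id     INTEGER,
    topics_probab_tm        REAL  -- Topics_Probab_TM measure
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The group keys in the fact table point at sets of bridge rows, which is one common way to attach several authors or topics to a single fact row.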


Proposed document warehouse model

Multidimensional cube building ④


Textual measures and aggregation function

Topics_Probab_TM Measure

R : the number of documents recovered by the query

A : the total number of distinct topics in the documents recovered in AM
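The aggregation formula itself is not reproduced in this transcript. As one plausible reading of "average probability of the topics", the sketch below averages each topic's probability over the R documents recovered by a query and returns one value per distinct topic (A values in total). The function name `average_topic_probability` and the row layout are assumptions, and the paper's actual aggregation function may differ.

```python
from collections import defaultdict


def average_topic_probability(rows):
    """Average each topic's probability over the recovered documents.

    rows: iterable of (document_id, topic, probability) tuples returned
    by a query (hypothetical layout). Returns {topic: average probability}.
    """
    documents = set()
    totals = defaultdict(float)
    for document_id, topic, probability in rows:
        documents.add(document_id)
        totals[topic] += probability
    r = len(documents)  # R: number of documents recovered by the query
    if r == 0:
        return {}
    # One entry per distinct topic, i.e. A entries in total.
    return {topic: total / r for topic, total in totals.items()}


# Hypothetical query result: two documents, two topics.
rows = [
    ("d1", "olap", 0.75), ("d1", "text mining", 0.25),
    ("d2", "olap", 0.5), ("d2", "text mining", 0.5),
]
averages = average_topic_probability(rows)
```

With this toy input, R is 2 and A is 2, so `averages` holds one averaged probability for each of the two topics.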


Textual measures and aggregation function

Documents_TM Measure

: each row in the query

B : the total number of distinct documents recovered in the query

m : the number of topics in the Topic dimension


Textual measures and aggregation function

Word_Probab_TM Measure

: each row in the query

B : the total number of distinct words recovered in the query

m : the number of topics in the Topic dimension


OLAP document visualization

Topics_Probab_TM : Document dimension - Type of Document


OLAP document visualization

Topics_Probab_TM : Date Dimension - year


OLAP document visualization

Topics_Probab_TM : Document type(rows) and year attribute(columns)


OLAP document visualization

Topics_Probab_TM : year attribute and document type, slice operation on “Journal Article”


OLAP document visualization

Topics_Probab_TM : year attribute, document type and author name, dice operation
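The slice and dice operations named on these slides can be illustrated on a small in-memory fact list. This is a sketch only: the row layout, the helper names `slice_cube` and `dice_cube`, and the sample rows are hypothetical, and an OLAP server would evaluate such queries against the cube rather than a Python list.

```python
def slice_cube(facts, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [fact for fact in facts if fact[dimension] == value]


def dice_cube(facts, criteria):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        fact for fact in facts
        if all(fact[dimension] in allowed
               for dimension, allowed in criteria.items())
    ]


# Hypothetical fact rows: one row per (document, topic) pair.
facts = [
    {"year": 2013, "document_type": "Journal Article", "author": "A. Author",
     "topic": "olap", "probability": 0.75},
    {"year": 2014, "document_type": "Journal Article", "author": "B. Author",
     "topic": "text mining", "probability": 0.5},
    {"year": 2014, "document_type": "Conference Paper", "author": "A. Author",
     "topic": "olap", "probability": 0.25},
    {"year": 2015, "document_type": "Journal Article", "author": "A. Author",
     "topic": "text mining", "probability": 0.5},
]

# Slice on document type, as in the "Journal Article" example.
journal_articles = slice_cube(facts, "document_type", "Journal Article")

# Dice on year, document type and author name together.
diced = dice_cube(facts, {
    "year": {2013, 2014},
    "document_type": {"Journal Article"},
    "author": {"A. Author"},
})
```

A slice fixes one dimension and keeps the others free, while a dice narrows several dimensions at once, which is why the dice result here is a subset of the slice result.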


OLAP document visualization

Documents_TM : each topic and document


OLAP document visualization

Documents_TM : each topic, year and document


Conclusion - Evaluation results

Execution time results


Conclusion - Evaluation results

Execution time results


Conclusion - Evaluation results

User satisfaction results : statistical frequency analysis


Conclusion - Evaluation results

User satisfaction results : multivariate analysis


Thank you


Proposed document warehouse model

Results of Cosme : XML file (metadata)


Proposed document warehouse model

Results of IGBHSK