Text Mining in Business Intelligence by Assoc. Prof. Dr. Ohm Sornil
![Page 1: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/1.jpg)
The First NIDA Business Analytics and Data Sciences Contest/Conference
1-2 September 2016, Navamindradhiraj Building, National Institute of Development Administration (NIDA)
https://businessanalyticsnida.wordpress.com
https://www.facebook.com/BusinessAnalyticsNIDA/
By Assoc. Prof. Dr. Ohm Sornil, Data Science Program, Graduate School of Applied Statistics, National Institute of Development Administration
Text Mining in Business Intelligence
How is text mining done, and what are its principles? Can text mining be applied to Thai-language text?
How can we apply text mining to business? Do you need to know how to program to do text mining?
What knowledge can be gained from text mining?
Navamindradhiraj 3003, 1 September 2016, 9:30-10:00 a.m.
![Page 2: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/2.jpg)
TEXT MINING IN BUSINESS INTELLIGENCE
OHM SORNIL, Ph.D. Department of Computer Science, NIDA
![Page 3: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/3.jpg)
BUSINESS INTELLIGENCE
“the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”
(H. P. Luhn, 1958)
“a set of techniques and tools for the acquisition and transformation of raw data into meaningful and useful information for business analysis purposes.”
(D. M. Turner, 2016)
![Page 4: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/4.jpg)
UNSTRUCTURED DATA
◉ Unstructured data includes text, video, and voice recordings of customer service transactions
◉ A generally accepted maxim is that structured data represents only about 20% of data; the rest is unstructured.
◉ If it can be counted, it can be analyzed.
◉ If it can be analyzed, it can be interpreted.
![Page 5: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/5.jpg)
Source: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode
![Page 6: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/6.jpg)
JUST MARKETING TERMS
◉ Text mining = Text analytics = Natural language processing (NLP)
◉ A move from university research to real-world business problems
![Page 7: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/7.jpg)
SOURCES OF TEXTUAL DATA
Internal
◉ Company documents
◉ Emails
◉ Reports
◉ Media releases
◉ Customer records and communication
External
◉ News
◉ Websites
◉ Blogs
◉ Social media posts
![Page 8: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/8.jpg)
CHALLENGES
◉ Text is generally unstructured
◉ Large quantities, increasing rapidly
◉ Noisy (e.g., typos, slang, informal words)
◉ Synonymy and polysemy
![Page 9: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/9.jpg)
TEXT MINING
◉ Process of extracting interesting information or patterns from unstructured text
◉ An interdisciplinary field: computational linguistics, statistics, and machine learning
◉ Can lead to the development of new opportunities in business
![Page 10: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/10.jpg)
Business Applications
![Page 11: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/11.jpg)
CUSTOMER RELATIONSHIP MANAGEMENT (CRM)
Input
◉ Text documents produced from a variety of sources in contact centers
Output
◉ Contents of clients' messages
◉ Routing specific requests to the appropriate service
◉ Supplying immediate answers to the most frequently asked questions
![Page 12: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/12.jpg)
OPINION ANALYSIS
Input
◉ Customers' messages on websites, blogs, Twitter, Facebook, etc.
Output
◉ The frequency with which a word is mentioned is an indicator of concept salience, e.g., “unbreakable”, “fragile”
◉ The frequency of co-occurrence represents the strength of connection in the customer's mind, e.g., <“Samsung”, “camera”>, <“iPhone”, “expensive”>
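The two counts described on this slide can be sketched in a few lines of Python. This is only an illustration: the messages are invented, tokens are split on whitespace, and co-occurrence is assumed to mean "both words appear in the same message", which the slide does not specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer messages, invented for illustration only.
messages = [
    "samsung camera is great",
    "iphone is expensive",
    "samsung camera feels fragile",
    "iphone camera is great but expensive",
]

word_freq = Counter()   # how often each word is mentioned (salience)
pair_freq = Counter()   # in how many messages two words appear together

for message in messages:
    tokens = message.split()
    word_freq.update(tokens)
    # Count each unordered word pair once per message.
    for pair in combinations(sorted(set(tokens)), 2):
        pair_freq[pair] += 1

print(word_freq["camera"])                 # 3
print(pair_freq[("camera", "samsung")])    # 2
print(pair_freq[("expensive", "iphone")])  # 2
```

Pairs are stored in alphabetical order so that <“Samsung”, “camera”> and <“camera”, “Samsung”> are counted as the same connection.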
![Page 13: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/13.jpg)
MEDICAL RECORD ANALYSIS
Input
◉ Doctors’ comments
Output
◉ An early warning regarding specific diseases
If “lungs” or “breathing” appears more than 45 times in the last 30 days for a given ZIP code or region, it can be a clue that environmental conditions are causing respiratory problems. A proactive intervention can then be activated to remedy the situation.
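The threshold rule above could be sketched as follows. The keywords, the 45-mention threshold, and the 30-day window come from the slide; the record layout (date, ZIP code, comment text), the function names, and the sample data are assumptions made for this example.

```python
from datetime import date, timedelta

KEYWORDS = {"lungs", "breathing"}   # terms named on the slide
THRESHOLD = 45                      # "more than 45" mentions
WINDOW_DAYS = 30                    # "in the last 30 days"

def keyword_count(notes, region, today):
    """Count keyword mentions in one region's notes from the last WINDOW_DAYS days."""
    start = today - timedelta(days=WINDOW_DAYS)
    total = 0
    for note_date, note_region, text in notes:
        if note_region != region or not (start < note_date <= today):
            continue
        words = (w.strip(".,;:!?") for w in text.lower().split())
        total += sum(1 for w in words if w in KEYWORDS)
    return total

def needs_alert(notes, region, today):
    """True when the keyword count exceeds the slide's threshold."""
    return keyword_count(notes, region, today) > THRESHOLD

# Hypothetical records: (date, ZIP code, doctor's comment).
today = date(2016, 9, 1)
notes = [(date(2016, 8, 20), "10240", "Patient reports trouble breathing.")] * 46
notes += [(date(2016, 6, 1), "10240", "Lungs clear.")] * 10   # outside the window
notes += [(date(2016, 8, 25), "10110", "Breathing normal, lungs clear.")] * 5

print(needs_alert(notes, "10240", today))   # True: 46 mentions in the window
print(needs_alert(notes, "10110", today))   # False: 10 mentions
```

A raw count like this does not tell "trouble breathing" apart from "breathing normal"; a real system would need to handle context and negation.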
![Page 14: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/14.jpg)
SENTIMENT ANALYSIS
Input
◉ Customers’ messages on websites, blogs, Twitter, Facebook, etc.
Output
◉ Positive, negative or neutral opinions/feelings (polarity) expressed by a writer in a document collection
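The slide does not say how polarity is computed. One simple approach, shown here only as a sketch, is to count words from a positive and a negative word list. Both lists and the function name `polarity` are hypothetical, and this toy version ignores negation ("not good") and sarcasm.

```python
# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "expensive"}

def polarity(message):
    """Label a message by comparing counts of positive and negative words."""
    tokens = [t.strip(".,!?").lower() for t in message.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this phone, great camera!"))   # positive
print(polarity("Too expensive and poor battery."))    # negative
print(polarity("It arrived on Monday."))              # neutral
```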
![Page 15: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/15.jpg)
SENTIMENT ANALYSIS (FEATURE-BASED)
![Page 16: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/16.jpg)
EMOTIONAL STATE CLASSIFICATION
SOURCE: http://emotion-research.net/toolbox/toolboxlabellingtool.2006-09-26.9095478150
https://annaszymanska1324161.wordpress.com/2014/04/28/very-emotional-research/
![Page 17: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/17.jpg)
HUMAN RESOURCE MANAGEMENT
Input
◉ Staff’s opinions
◉ CVs from applicants
Output
◉ Level of employee satisfaction
◉ Selection of new personnel
![Page 18: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/18.jpg)
INSURANCE CLAIM DIAGNOSIS
Input
◉ Notes of all the details related to the claim/health issues, in the form of a brief description
Output
◉ Identification of common groups of problems
![Page 19: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/19.jpg)
CORPORATE FINANCE
Input
◉ Publicly available descriptions of startups' businesses: products/services, investors, and social links between individuals in two firms
Output
◉ Targets for mergers and acquisitions
Source: http://phys.org/news/2016-07-text-mining-intelligence-startups.html#jCp
![Page 20: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/20.jpg)
INVESTMENT
Input
◉ Securities-related news feeds
Output
◉ A model to predict market movements for everything from government bonds to commodities
![Page 21: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/21.jpg)
MEANING
The key is to capture the meaning of text.
![Page 22: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/22.jpg)
TEXT MINING PROCESS
Text Sources → Preprocessing → Modeling → Presentation (Visualization/Browsing)
![Page 23: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/23.jpg)
COMMON PREPROCESSING
◉ Extracting text
◉ Tokenization
◉ Stopword elimination: is, am, are, the, of, for, … (http://www.ranks.nl/stopwords/thai-stopwords)
◉ Stemming: run, runs, ran, running → run
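The preprocessing steps listed above can be sketched for English text as follows. The stopword set is just the slide's own examples, and `stem` is a deliberately crude, hypothetical suffix stripper (not the Porter algorithm); the irregular form "ran" needs a lookup table because no suffix rule maps it to "run". Thai is written without spaces between words, so a Thai pipeline would need a word-segmentation step in place of the regular expression used here.

```python
import re

STOPWORDS = {"is", "am", "are", "the", "of", "for"}   # the slide's examples
IRREGULAR = {"ran": "run"}                            # hypothetical lookup for irregular forms

def stem(word):
    """Toy stemmer: handles '-ing' and plural '-s' only."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]          # running -> runn -> run
        return word
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]              # runs -> run
    return word

def preprocess(text):
    """Tokenize, drop stopwords, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("run runs ran running"))            # ['run', 'run', 'run', 'run']
print(preprocess("The cost of the phone is high"))   # ['cost', 'phone', 'high']
```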
![Page 24: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/24.jpg)
TEXT REPRESENTATION FOR MINING
![Page 25: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/25.jpg)
INVERSE DOCUMENT FREQUENCY
SOURCE: http://nlp.stanford.edu/IR-book/pdf/06vect.pdf
![Page 26: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/26.jpg)
TF-IDF TERM WEIGHTING
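The slide's formula is not preserved in this transcript. The sketch below assumes the common definition from the IR book chapter cited on the previous slide: the weight of term t in document d is tf(t, d) × log10(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The tiny corpus and the function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["phone", "camera", "great"],
    ["phone", "expensive"],
    ["camera", "fragile", "camera"],
]

def idf(term, docs):
    """Inverse document frequency: log10(N / df); 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / df) if df else 0.0

def tf_idf(doc, docs):
    """Map each term of one document to its tf-idf weight."""
    tf = Counter(doc)
    return {term: count * idf(term, docs) for term, count in tf.items()}

weights = tf_idf(docs[2], docs)
for term, weight in sorted(weights.items()):
    print(term, round(weight, 3))
```

In the third document, "fragile" occurs once but in no other document, while "camera" occurs twice but is also found in the first document, so the rarer term ends up with the higher weight.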
![Page 27: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/27.jpg)
REAL-VALUED VECTOR
![Page 28: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/28.jpg)
COSINE SIMILARITY BETWEEN 2 VECTORS
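The slide's formula is not preserved in this transcript. The standard definition is the dot product of the two vectors divided by the product of their lengths. A minimal sketch, with a function name chosen for this example and a zero vector treated as having similarity 0:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # convention chosen here for empty documents
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [0, 1]))        # 0.0 (no terms in common)
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # close to 1 (same direction)
```

Because the measure depends only on direction, a long document and a short one with the same term proportions are treated as highly similar.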
![Page 29: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/29.jpg)
WORD CO-OCCURRENCE STRENGTH
◉ Mutual Information (MI) between words x and y
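The slide's formula is not preserved in this transcript. The sketch below assumes the pointwise form commonly used for word association, MI(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities estimated as the fraction of documents containing the word or the pair. The documents and the function name are invented for illustration.

```python
import math

# Hypothetical documents, each reduced to its set of words.
docs = [
    {"samsung", "camera"},
    {"samsung", "camera"},
    {"iphone", "expensive"},
    {"iphone", "camera"},
]

def mutual_information(docs, x, y):
    """Pointwise MI of words x and y, estimated from document counts."""
    n = len(docs)
    p_x = sum(1 for d in docs if x in d) / n
    p_y = sum(1 for d in docs if y in d) / n
    p_xy = sum(1 for d in docs if x in d and y in d) / n
    if p_xy == 0:
        return float("-inf")   # the words never co-occur
    return math.log2(p_xy / (p_x * p_y))

print(mutual_information(docs, "iphone", "expensive"))   # positive: co-occur more than chance
print(mutual_information(docs, "samsung", "expensive"))  # -inf: never co-occur
```

A value above 0 means the two words appear together more often than they would if they were independent.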
![Page 30: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/30.jpg)
ADD-ON COMPONENTS
◉ WordNet
◉ Feature selection/reduction
![Page 31: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/31.jpg)
WordNet
◉ WordNet is essentially a dictionary plus a thesaurus
◉ Relations: hyponymy, meronymy, antonymy
![Page 32: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/32.jpg)
TASK SPECIFIC COMPONENTS
◉ Part-of-Speech (POS) tagging
◉ SentiWordNet: the result of automatically annotating all WordNet synsets according to the notions of “positivity”, “negativity” and “neutrality”
◉ Emoticons
![Page 33: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/33.jpg)
MINING ALGORITHMS
◉ General machine learning algorithms are applicable
Classification: Naïve Bayes, Support Vector Machine, Bayesian Network, Neural Network, Logistic Regression, etc.
Clustering: K-means, Fuzzy C-means, Hierarchical Clustering, Self-Organizing Map, etc.
Association Analysis and Sequence Analysis: Apriori, Generalized Rule Induction, Influential Apriori, FP-Growth, etc.
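As one illustration of a general-purpose classifier applied to text, here is a from-scratch sketch of multinomial Naïve Bayes with add-one smoothing. It is a toy under stated assumptions, not the speaker's implementation: the training messages, labels, and function names are invented, and words never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (tokens, label). Returns the counts the classifier needs."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in examples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(tokens, model):
    """Return the class with the highest log-probability for the tokens."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)   # class prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t in vocab:   # skip words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages.
model = train([
    ("great camera".split(), "positive"),
    ("love this phone".split(), "positive"),
    ("poor battery".split(), "negative"),
    ("too expensive".split(), "negative"),
])

print(classify("great phone".split(), model))        # positive
print(classify("expensive battery".split(), model))  # negative
```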
![Page 34: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/34.jpg)
Analysis Tasks
![Page 35: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/35.jpg)
GENERAL DATA MINING TASKS
◉ Classification
◉ Clustering
◉ Association Analysis
◉ Prediction
◉ Sequence Analysis
![Page 36: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/36.jpg)
INFORMATION EXTRACTION
![Page 37: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/37.jpg)
Analytics Tools with Text Mining Capabilities
![Page 38: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/38.jpg)
OPEN-SOURCE SOFTWARE
SOURCE: http://www.predictiveanalyticstoday.com/top-free-software-for-text-analysis-text-mining-text-analytics/
R package tm
![Page 39: Text Mining in Business Intelligence โดย รศ.ดร.โอม ศรนิล](https://reader031.fdocuments.net/reader031/viewer/2022021417/58754ba41a28abb8208b75f7/html5/thumbnails/39.jpg)
COMMERCIAL SOFTWARE
SOURCE: http://www.predictiveanalyticstoday.com/top-free-software-for-text-analysis-text-mining-text-analytics/