Prof. Dr. Bernhard Humm, Darmstadt University of Applied Sciences. www.fbi.h-da.de/~b.humm. 18.11.2014 1
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt – University of Applied Sciences
Retrospective Natural Language Processing
• Name and explain different areas of NLP
• What are the “7 levels of language understanding”?
• What are tokenizing, sentence splitting, POS tagging, and parsing?
• What do language resources offer to NLP? Give examples
• What do NLP frameworks offer? Give examples
• What do NLP services offer? Give examples
Agenda
• Overview
• ML Applications
• ML Tasks
• ML Approaches
• ML Tools
• Services / Product Map
What is Machine Learning (ML)?
Generating a model based on inputs and using it for making decisions or predictions (rather than programming instructions explicitly)
Applications of ML: Spam filtering
• Task: classify new e-mails as spam or not spam
Figure (diagram) labels: Spam filter; New e-mails; Automatically classified; Manually classified; Corrections; ML input
Stock market analysis
• Task: make recommendations on buying and selling stocks
Figure (diagram) labels: Prediction; Current stock values; History of stock values; ML input; Recommendation; Decision
Image source: Wikimedia
Detecting credit card fraud
• Task: Detect fraud in credit card payments
Figure (diagram) labels: Fraud detection; CC payments; Automatically classified; Manually classified; Corrections; ML input
Recommender systems
• Task: Recommending customers suitable products
Figure (diagram) labels: Recommender system; Order; Recommendation of related products; ML input; Purchasing behaviour of other customers or customer groups
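A minimal sketch of the recommender idea: related products are suggested by counting how often they appear together with the ordered product in other customers' orders. The order data and the function name `recommend` are hypothetical, and co-occurrence counting is only one simple approach chosen for illustration.

```python
from collections import Counter

# Hypothetical purchase history: one set of products per past order.
past_orders = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"stove", "gas cartridge"},
]

def recommend(product, orders, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    co_counts = Counter()
    for order in orders:
        if product in order:
            co_counts.update(order - {product})
    # Sort by count (descending), then by name so that ties are deterministic.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

print(recommend("tent", past_orders))
```

A real system would typically also weight by customer group and filter out products the customer already owns.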
Categories of ML tasks
• P.S. Other categorizations / groupings are possible
• Machine Learning Task
- Supervised Learning: Classification, Regression
- Unsupervised Learning: Clustering, Feature selection / extraction, Topic modeling
- Reinforcement Learning
Categories of ML tasks
Supervised learning
• Given: Example inputs and desired outputs
• Goal: Learn a general rule that maps inputs to outputs
Unsupervised learning
• Given: Data inputs (e.g., documents)
• Goal: Find structure in the inputs
Reinforcement learning
• Setting: An agent interacts with a dynamic environment in which it must achieve a goal
• Goal: Improve the agent's behaviour
Supervised learning subcategories
Classification
• Given: Training inputs (records) that are divided into two or more classes
• Goal: Produce a model to classify new inputs
• Examples: spam filter, fraud detection, …
Regression
• Given: Training data (records) with continuous (not discrete) output values
• Goal: Produce a model to predict output values for new inputs
• Example: stock value prediction
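The difference between the two subcategories can be sketched in a few lines. The techniques used here (a 1-nearest-neighbour classifier and a least-squares line fit) are illustrative choices, and all data values and function names are made up for the example.

```python
def nearest_neighbour_classify(train, x):
    """Classification: return the class label of the closest training input."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def fit_line(points):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up training data: (number of suspicious words, class label).
emails = [(0, "not spam"), (1, "not spam"), (7, "spam"), (9, "spam")]
print(nearest_neighbour_classify(emails, 8))

# Made-up training data: (day, stock value).
prices = [(1, 12.0), (2, 14.0), (3, 16.0)]
slope, intercept = fit_line(prices)
print(slope * 4 + intercept)  # predicted value for day 4
```

The classifier returns a discrete class, while the regression model returns a continuous value for an input it has not seen.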
Unsupervised learning subcategories
Clustering
• Given: Set of input records
• Goal: Identify clusters (groups of similar records)
• Example: Customer grouping
Feature selection / extraction
• Given: Set of input records with attributes (“features”)
• Goal: Find a subset of the original attributes that is equally well suited for classification / clustering tasks
Topic modeling
• Given: Set of text documents
• Goal: Find abstract topics that occur in several documents and classify documents accordingly
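Clustering can be sketched with k-means, one common clustering algorithm (the slides do not prescribe a specific one). The customer spending values, the initial centroids, and the function name `k_means` are assumptions made for this example.

```python
def k_means(points, centroids, iterations=10):
    """Cluster 1-D points around the given initial centroids (k-means)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up data: yearly spending of six customers.
spending = [100, 120, 110, 900, 950, 1000]
centroids, clusters = k_means(spending, centroids=[0, 500])
print(centroids)
print(clusters)
```

No class labels are given; the two customer groups emerge from the similarity of the records alone.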
Decision Tree Learning
• Used for supervised learning (classification, regression)
• Training input: Training data (records) with output values (discrete or continuous)
• Learning result: Decision tree that allows classifying / predicting output values of new data records
• Example (figure): Decision tree for classifying passengers on the Titanic as survived / died
Image source: Wikipedia
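As a hedged illustration of how a tree is learned from records, the sketch below learns only a one-level tree (a "decision stump") by picking the attribute whose split has the lowest weighted Gini impurity. Full decision tree learners apply this step recursively. The passenger records are made up and are not the real Titanic data; `gini` and `learn_stump` are hypothetical names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def learn_stump(records, attributes, target):
    """Learn a one-level decision tree: pick the attribute with the purest split."""
    best = None
    for attr in attributes:
        groups = {}
        for r in records:
            groups.setdefault(r[attr], []).append(r[target])
        impurity = sum(len(g) / len(records) * gini(g) for g in groups.values())
        if best is None or impurity < best[0]:
            # Each branch predicts the majority class of its group.
            branches = {v: Counter(g).most_common(1)[0][0] for v, g in groups.items()}
            best = (impurity, attr, branches)
    return best[1], best[2]

# Made-up passenger records, not the real Titanic data.
passengers = [
    {"sex": "female", "class": "1st", "outcome": "survived"},
    {"sex": "female", "class": "3rd", "outcome": "survived"},
    {"sex": "male", "class": "1st", "outcome": "died"},
    {"sex": "male", "class": "3rd", "outcome": "died"},
    {"sex": "female", "class": "3rd", "outcome": "survived"},
    {"sex": "male", "class": "1st", "outcome": "survived"},
]
attribute, branches = learn_stump(passengers, ["sex", "class"], "outcome")
print(attribute, branches)
```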
Artificial Neural Networks (ANN)
• Inspired by the brain / nervous system:
- Neurons connected via dendrites
- Reduced resistance if fired repeatedly
• Artificial neuron:
- Weighted inputs
- Function, e.g., weighted sum
- Filter, e.g., threshold output
• Artificial Neural Network (ANN):
- Input layer, output layer, and possibly intermediate layers of neurons
- Training phase: weights are adjusted via known cases
- Recognition phase: output is produced for new cases
Source: Ivan Galkin, U. MASS Lowell (http://ulcar.uml.edu/~iag/CS/Intro-to-ANN.html)
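The artificial neuron described above (weighted inputs, weighted sum, threshold filter) and the two phases can be sketched with a single neuron. The perceptron learning rule used for training is an assumption chosen for illustration, as are the learning rate, the number of epochs, and the logical AND example.

```python
def neuron(weights, bias, inputs):
    """Artificial neuron: weighted sum followed by a threshold filter."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0

def train(cases, epochs=20, rate=1.0):
    """Training phase: adjust weights using known cases (perceptron rule)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, expected in cases:
            error = expected - neuron(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Known cases for the logical AND function: (inputs, expected output).
and_cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_cases)

# Recognition phase: produce an output for each input pair.
print([neuron(weights, bias, inputs) for inputs, _ in and_cases])
```

A single neuron can only learn linearly separable functions such as AND; intermediate layers are needed for functions such as XOR.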
Bayesian Networks
• Directed acyclic graph (DAG) with:
- Nodes: random variables + probability function
- Edges: conditional dependencies
• Example:
- Probability of rain
- Sprinkler is turned on if it hasn't rained for a while
- Grass is wet if it is raining or the sprinkler is turned on
• Bayesian network inference allows answering questions like:
- What is the probability that it is raining, given the grass is wet?
- What is the impact of turning the sprinkler on?
Source: http://en.wikipedia.org/wiki/Bayesian_network
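The first question can be answered by enumeration: sum the joint probability over the unobserved variable (the sprinkler) and normalise by the probability of the evidence. The probability values below are illustrative assumptions and are not given on the slide; the function names are hypothetical.

```python
from itertools import product

# Illustrative probabilities (assumptions, not given on the slide).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) as the product of the node probabilities."""
    p = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p *= p_s if sprinkler else 1 - p_s
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    p *= p_w if wet else 1 - p_w
    return p

def p_rain_given_wet():
    """P(rain | grass wet) by summing the joint over the hidden variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

print(round(p_rain_given_wet(), 4))
```

Enumeration is exponential in the number of variables, so practical tools use more efficient inference algorithms for larger networks.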
Inductive Logic Programming
• Given:
- Set of logic facts (background knowledge), e.g., male(Tom), female(Eve), parent(Tom, Eve)
- Positive and / or negative examples, e.g., daughter(Eve, Tom)
• Learning goal:
- General rules that are consistent with the examples and the background knowledge, e.g., parent(p1, p2) and female(p2) ⇒ daughter(p2, p1)
Figure (family tree) labels: George, Tom, Mary, Helen, Nancy, Eve; parent; male, female
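The sketch below does not learn a rule; it only checks whether the candidate rule from the slide is consistent with the background knowledge and the examples, which is the test an ILP system applies to each hypothesis it generates. The negative example and the function names are hypothetical.

```python
# Background knowledge as sets of facts, following the slide's example.
male = {"Tom"}  # not needed by this particular rule
female = {"Eve"}
parent = {("Tom", "Eve")}  # parent(Tom, Eve)

# Positive example from the slide; the negative example is hypothetical.
positive = {("Eve", "Tom")}  # daughter(Eve, Tom)
negative = {("Tom", "Eve")}  # not daughter(Tom, Eve)

def candidate_rule(parent_facts, female_facts):
    """Derive daughter(p2, p1) from parent(p1, p2) and female(p2)."""
    return {(p2, p1) for (p1, p2) in parent_facts if p2 in female_facts}

def is_consistent(derived, positives, negatives):
    """A rule is consistent if it covers all positives and no negatives."""
    return positives <= derived and not (negatives & derived)

derived = candidate_rule(parent, female)
print(derived, is_consistent(derived, positive, negative))
```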
WEKA
http://www.cs.waikato.ac.nz/ml/weka/
Tasks supported by WEKA
• Numerous approaches for supervised and unsupervised learning
• Preprocess: Choose and modify the data being acted on
• Classify: Train and test learning schemes that classify or perform regression
• Cluster: Learn clusters for the data
• Associate: Learn association rules for the data
• Select attributes: Select the most relevant attributes in the data
• Visualize: View an interactive 2D plot of the data
WEKA Datasets
• A dataset is a collection of examples (instances)
• Each instance consists of attributes
• Attribute types:
- Nominal (enumeration)
- Numeric (real or integer number)
- String
• Example:
@relation golfWeatherMichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,80,yes
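To show how the header and data sections relate, here is a minimal sketch of a reader for a small subset of this format (nominal and numeric attributes, comma-separated data rows). It is not WEKA's own loader; the function name `parse_arff` is hypothetical, and quoting, sparse data, and missing values are not handled.

```python
ARFF_TEXT = """\
@relation golfWeatherMichigan_1988/02/10_14days
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute temperature real
@attribute humidity real
@attribute play {yes, no}
@data
sunny,FALSE,85,85,no
sunny,TRUE,80,90,no
overcast,FALSE,83,86,yes
rainy,FALSE,70,96,yes
rainy,FALSE,68,80,yes
"""

def parse_arff(text):
    """Parse a small ARFF subset: nominal and numeric attributes, dense rows."""
    attributes, instances, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            instance = {}
            for (name, kind), value in zip(attributes, values):
                numeric = kind.lower() in ("real", "numeric", "integer")
                instance[name] = float(value) if numeric else value
            instances.append(instance)
    return attributes, instances

attributes, instances = parse_arff(ARFF_TEXT)
print(len(instances), instances[0])
```

Each instance becomes a dictionary keyed by attribute name, with numeric attributes converted to floats and nominal values kept as strings.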
WEKA GUI
ML Services Map
• ML libraries: Algorithms for classification, regression, clustering, feature selection / extraction, topic modelling, etc. using different approaches, e.g., decision tree learning, Artificial Neural Networks, Bayesian networks, inductive logic programming, Support Vector Machines, Hidden Markov Models, etc.
• ML development environments / frameworks: IDEs and frameworks for experimenting with different ML approaches and configuring solutions
• ML services: Web services for experimenting with different ML approaches and configuring solutions
ML Product Map
• ML libraries: Eblearn, OpenNN, aisolver, CURRENNT, …
• ML development environments / frameworks: WEKA, Orange, Shogun, scikit-learn, …
• ML services: bigml, wise.io, procog, ersatz, …
ML product map (table)
Product ML library ML development environment / framework ML service
Java Neural Network Framework Neuroph x x
Fast Artificial Neural Network Library x
eblearn x
Jaden x x
OpenNN - Open Neural Networks Library x
aisolver x
CURRENNT x
WEKA x x
Orange x x
Shogun x x
scikit-learn x x
bigml x
wise.io x
procog x
ersatz x