2012: Natural Computing - The Grand Challenges and Two Case Studies
Natural Computing: The Grand Challenges and
Two Case Studies
Leandro Nunes de Castro
@lndecastro
Computing and Informatics Faculty &
Graduate Program in Electrical Engineering
Natural Computing Laboratory (LCoN)
www.mackenzie.br/lcon.html
1
• Natural Computing
– An Overview
– The Grand Challenges in Natural Computing Research
• Case Studies
– Social Media Mining
– Mining Association Rules for Recommender Systems
• Discussion
2
Summary
Natural Computing
An Overview*
3
* de Castro, L. N. (2007), “Fundamentals of Natural Computing: An Overview”, Physics of Life Reviews, 4(1), pp. 1-36.
• 1940s: Study of automatic computing;
• 1950s: Study of information processing;
• 1960s: Study of phenomena surrounding computers;
• 1970s: Study of what can be automated;
• 1980s: Study of computation;
• 2000s: Study of information processes, both natural and artificial.
4
Computing: Yesterday, Today and Tomorrow*
* Denning, P. (2008), “Computing Field: Structure”, In B. Wah (Ed.), Wiley Encyclopedia of Computer Science and Engineering, Wiley Interscience.
5
Since the early days of computer science in the 1940s, researchers have been interested in drawing parallels with nature and in designing computational models and abstractions of natural phenomena.
The Grand Challenges (GCs) define research questions likely to remain important in the long term, identifying and characterizing potential grand research problems. They may lead to projects capable of producing major scientific advances with practical applications for society and technology. The emphasis is on advancing science, on a vision that goes beyond specific projects, on clear and objective criteria for evaluating success, and on great ambition.
6
The Grand Challenges (GCs)
Natural Computing: The Old View
[Diagram: Natural Computing, with the labels Theoretical Works, Empirical Works, Mathematical Models, Bioinspiration, Computational Synthesis of Natural Phenomena, and Computing with Natural Materials]
Natural Computing: The New Perspective
[Diagram: Natural Computing, with the labels Computer Modeling of Nature, Nature-Inspired Computing, Computer Synthesis of Natural Phenomena, and Computing with New Materials]
Natural computing is a science concerned with the investigation and design of information processing in natural and computational systems.
Natural Computing
The Grand Challenges*
9
* de Castro, L. N.; Xavier, R. S.; Pasti, R.; Maia, R. D.; Szabo, A.; Ferrari, D. G. (2012), "The Grand Challenges in Natural Computing Research: The Quest for a New Science", Int. J. Nat. Comp. Res., 2(4), p. 16.
10
[Two diagrams relating Natural Computing to Biology, Physics, Chemistry and Computer Science, labeled Multidisciplinarity and Interdisciplinarity]
11
[Diagram relating Natural Computing to Biology, Physics, Chemistry and Computer Science]
GC 1: How to transpose Natural Computing into a transdisciplinary context?
12
“Computer science differs from physics in that it is not actually a science. It does not study natural objects. Neither is it mathematics. It’s like engineering – about getting to do something, rather than dealing with abstractions”.*
“Biology is today an information science”.**
* Feynman, R. P. (1996), “The Feynman Lectures on Computation”, In A. J. G. Hey and R. W. Allen (Ed.), (Reading, MA: Addison-Wesley).
** Denning, P. J. (2001) (Ed.), The Invisible Future: The Seamless Integration of Technology in Everyday Life, McGraw-Hill.
13
GC 2: What is the Natural Computing role in this Informational Natural Sciences Era?
Overcoming this challenge will bring two important benefits to Computing and Nature:
• A Rethinking (and probably Redesign) of Computing
• A New Form of Interacting With and Using Nature
14
Natural systems are open systems that communicate with their environment and exhibit complex, emergent behavior. Complex biological systems must be modeled as self-referential, self-organizing, and auto-generative systems whose computational behavior goes far beyond the Turing machine/von Neumann (TM/VN) paradigm. The system restructures itself through a non-dissociable hardware-software interaction: the hardware defines the software, and the software defines the hardware.
15
Are there standards to design (engineer) natural computing systems?*
GC 3: To what degree is defining standards for the engineering of Natural Computing systems a limiting factor for the creative development of the field?
* Brueckner, S. A.; Serugendo, G. D. M.; Karageorgos, A.; Nagpal, R. (2005), Engineering Self-Organizing Systems, Lecture Notes in Artificial Intelligence, 3464, Springer.
* de Castro, L. N. (2001), Immune Engineering: Development and Application of Computational Tools Inspired by Artificial Immune Systems, Ph.D. Thesis presented at the Computer and Electrical Engineering School, Unicamp, Brazil.
* Fernandez-Marquez, J. L.; Serugendo, G. D. M.; Montagna, S.; Viroli, M.; Arcos, J. L. (2012), “Description and Composition of Bio-Inspired Design Patterns: A Complete Overview”, Natural Computing, Online, DOI 10.1007/s11047-012-9324-y.
* Nagpal, R.; Mamei, M. (2004), “Engineering Amorphous Computing Systems”, Multiagent Systems, Artificial Societies, and Simulated Organizations, 11, Part V, pp. 303-320.
Case Studies
Applied Research
16
Web Mining
Social Media
17
18
110 billion minutes spent in social networks (Nielsen, 2011)
13 years = 50 million people (Alé, 2012)
9 months = 100 million users (Alé, 2012)
250 million tweets/day (Datasift, 2012)
Data and Social Media
19
Qualitative analysis of tweets.
Methodology based on text mining, natural language processing and ontologies for Sentiment Analysis (SA).
Word Sense Disambiguation (WSD).
Research Focus
Social Media Analysis Tool
Text Mining; NLP; Web Semantics
Context Twitter
20
Social media and microblog.
Messages (tweets) of up to 140 characters.
Encourages simultaneous activities.
Informal; allows new terms, slang, mixed languages and irony.
Twitter Features
21
Text Mining
Semi- or unstructured data
Data Mining
Structured Data
Unstructured Data Analysis
• Tokens • Stopword removal • Stemming • Representation • Term (feature) selection
• Association • Classification • Clustering
• APIs • Crawlers
• Confusion Matrix • Accuracy • Precision • Recall • F-measure
22
Text Analysis
       t1    t2    ...   tc
d1     w11   w12   ...   w1c
d2     w21   w22   ...   w2c
...    ...   ...   ...   ...
dN     wN1   wN2   ...   wNc
Vector Space Model
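The document-term matrix above can be sketched in a few lines. This is a minimal illustration, assuming raw term counts as the weights w_ij (the slide does not say which weighting is used), whitespace tokenization, and a tiny hypothetical stopword list; the example documents are invented.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "is", "of"}

def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

def vector_space_model(docs):
    """Return (terms, matrix) where matrix[i][j] is the count of term j in document i."""
    tokenized = [tokenize(d) for d in docs]
    terms = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[t] for t in terms])
    return terms, matrix

# Invented example documents.
docs = ["the door is open", "open the door of the house", "a president of Venezuela"]
terms, W = vector_space_model(docs)
```

Each row of `W` is one document d_i and each column one term t_j, matching the layout of the table. Stemming and term selection, listed among the preprocessing steps, are left out of this sketch.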
23
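[Figure: semantic graph for the polysemic word "Chaves", with node labels in Portuguese. The labels appear to fall into three senses:
– Object (keys): Objeto (object), Entrar (enter), Trancar (lock), Porta (door), Molho (bunch), Guardar (put away), Abrir (open).
– Person (Hugo Chávez): Pessoa (person), Presidente (president), Ditador (dictator), Hugo, Venezuela.
– TV show: Pessoa (person), SBT, Madruga, Kiko, Chiquinha, Bruxa do 71, Girafales, TV.]
In Portuguese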
24
Sentiment Analysis:
Text classification based on the author’s opinion.
Word Sense Disambiguation:
Polysemic word: different meanings in different contexts.
Word Sense Disambiguation: assigning the appropriate meaning to polysemic words in a text.
WSD: words are classified according to a predefined set of meanings.
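As a rough illustration of classifying a polysemic word against a predefined set of meanings, the sketch below scores each sense by how many of its context words occur in the text. It is a simplified overlap count, not the method used in the talk; the sense names and the grouping of context words are assumptions drawn from the "Chaves" semantic graph shown earlier.

```python
# Hypothetical sense inventory for the word "chaves" (grouping is an assumption).
SENSES = {
    "object": {"porta", "abrir", "trancar", "guardar", "molho", "entrar"},
    "person": {"presidente", "ditador", "hugo", "venezuela"},
    "tv": {"sbt", "madruga", "kiko", "chiquinha", "girafales", "tv"},
}

def disambiguate(text, senses=SENSES):
    """Return the sense whose context words overlap most with the text, or None."""
    tokens = set(text.lower().split())
    scores = {sense: len(tokens & context) for sense, context in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

tv_sense = disambiguate("assistindo chaves no sbt com o kiko")
object_sense = disambiguate("perdi as chaves da porta")
no_context = disambiguate("chaves")
```

A text with no context words returns `None`, which corresponds to leaving the word unclassified rather than guessing a sense.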
Research Focus
25
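                    Predicted Positive   Predicted Negative
Correct Positive    TP                   FN
Correct Negative    FP                   TN

$TPR = \frac{TP}{P} = \frac{TP}{TP + FN}$
$FPR = \frac{FP}{N} = \frac{FP}{FP + TN}$
$ACC = \frac{TP + TN}{TP + FP + TN + FN}$
$Pr = \frac{TP}{TP + FP}$
$Re = \frac{TP}{TP + FN}$
$\text{Precision} = \frac{|\text{Relevant} \cap \text{Recovered}|}{|\text{Recovered}|}$
$\text{Recall} = \frac{|\text{Relevant} \cap \text{Recovered}|}{|\text{Relevant}|}$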
Interest Measures
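The measures on this slide follow directly from the four confusion-matrix counts. The sketch below is one way to compute them, using the standard harmonic-mean definition for the F-measure (listed among the evaluation measures earlier). The function name is hypothetical, and the example counts are the ones reported in the Social TV results for the run without the neutral class.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the slide's measures from confusion-matrix counts.

    Assumes every denominator is non-zero; no guard is included in this sketch.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "TPR": recall,
        "FPR": fp / (fp + tn),
        "Precision": precision,
        "Recall": recall,
        "F-measure": 2 * precision * recall / (precision + recall),
    }

# Counts taken from the "Without the Neutral Class" results table.
metrics = classification_metrics(tp=2877, fn=0, fp=133, tn=164)
```

With these counts the positive-class values agree with the reported table to within rounding in the last digit (for example, precision of about 0.9558 and FPR of about 0.4478).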
26
Context-Based Word Sense Disambiguation (CBWSD):
Polysemic words: e.g. Chaves, Estrelas, Na Brasa, Agora é tarde.
Context (semantic graph): OntoGeneral; OntoSpecific.
Classification based on the semantic graph.
Sentiment analysis based on Emoticons, Ontologies and Natural Computing:
Need to train the classifier.
Emoticon: graphic representation of a facial expression.
Example: :) :( :| :D
Ontology: concepts and their relations within a domain.
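Since the classifier needs labeled training data, one common way to use emoticons is as noisy labels for tweets. The sketch below is a minimal illustration of that idea, not the procedure from the talk: the polarity assigned to each emoticon is an assumption, and tweets with no emoticon or with conflicting emoticons are left unlabeled.

```python
# Assumed polarity for the emoticons listed on the slide.
EMOTICON_POLARITY = {
    ":)": "positive",
    ":D": "positive",
    ":(": "negative",
    ":|": "neutral",
}

def label_by_emoticon(tweet):
    """Return a polarity label if the tweet's emoticons agree, otherwise None."""
    found = {polarity for emoticon, polarity in EMOTICON_POLARITY.items() if emoticon in tweet}
    return found.pop() if len(found) == 1 else None

labels = [
    label_by_emoticon("Adorei o programa :)"),
    label_by_emoticon("que chato :("),
    label_by_emoticon("sem emoticon"),
    label_by_emoticon("bom :) ruim :("),
]
```

The labeled tweets could then serve as a training set for a text classifier built on the vector space representation.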
Case Study: Social TV
27
Materials and Methods: CBWSD
Tweets about “Agora é tarde”:
Total: 6030 tweets
Period: 6-7 July 2012 (24 hours).
Generation of the Semantic Graph.
Case Study: Social TV
28
Partial Results Without the Neutral Class
                    Predicted Positive   Predicted Negative
Correct Positive    2877                 0
Correct Negative    133                  164

Measure   Result
ACC       0.9580
TPR       1
FPR       0.4478

Measure     Positive   Negative
Precision   0.9558     0.0544
Recall      1          0.5521
F-measure   0.9774     0.0991

Total: 142766 ms - Per tweet: 36 ms

Neutral as Positive
                    Predicted Positive   Predicted Negative
Correct Positive    5015                 33
Correct Negative    133                  164

Measure   Result
ACC       0.9689
TPR       0.9934
FPR       0.4478

Measure     Positive   Negative
Precision   0.9741     0.0318
Recall      0.9934     0.5521
F-measure   0.9837     0.0602

Total: 118310 ms - Per tweet: 30 ms
Mining Association Rules for Recommender Systems
Artificial Immune Systems
29
• Discovery of association relations between items (attributes) in transactional databases.
30
Association Rules
Milk Bread
Cereals Butter
Milk Biscuit
Cereals Chocolate
Bread Coffee
Eggs Sugar Bread Coffee
Yogurt Sweetener
• Given a set of transactions, where each transaction is a set of items, an association rule is a rule $X \Rightarrow Y$ in which X and Y are itemsets.
• Concepts:
– Coverage or support: number of transactions for which the prediction rule is correct.
– Accuracy or confidence: number of objects that the rule predicts correctly, as a proportion of the instances to which it applies.
$\text{support}(A \Rightarrow B) = P(A \cup B) = \frac{\text{Freq. of } A \text{ and } B}{\text{Total of } T}$
$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{Freq. of } A \text{ and } B}{\text{Freq. of } A}$
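The two definitions translate directly into code. The sketch below is a minimal illustration in which each transaction is a Python set; the function names are hypothetical and the four baskets are invented, reusing item names from the example slide.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A); assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Invented baskets for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"milk", "biscuit"},
]
sup = support(baskets, {"milk", "bread"})        # 2 of 4 baskets
conf = confidence(baskets, {"milk"}, {"bread"})  # 2 of the 3 baskets with milk
```

Here the rule milk ⇒ bread has support 0.5 and confidence 2/3: it holds in half of all baskets and in two of the three baskets that contain milk.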
31
Association Rules
The problem of mining association rules corresponds to finding all the rules that satisfy minimum support and confidence thresholds.
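To make the problem statement concrete, the sketch below enumerates every candidate rule by brute force and keeps those that meet both thresholds. It is only an illustration of the definition, feasible for a handful of items; it is not Apriori or any of the algorithms evaluated in the talk, and the function name and baskets are hypothetical.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for every qualifying rule."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def freq(itemset):
        # Number of transactions containing the whole itemset.
        return sum(itemset <= t for t in transactions)

    rules = []
    for size in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, size)):
            count = freq(itemset)
            if count == 0 or count / n < min_support:
                continue
            # Split the frequent itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in map(frozenset, combinations(sorted(itemset), k)):
                    conf = count / freq(antecedent)
                    if conf >= min_confidence:
                        rules.append((set(antecedent), set(itemset - antecedent), count / n, conf))
    return rules

# Invented baskets for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"milk", "biscuit"},
]
rules = mine_rules(baskets, min_support=0.5, min_confidence=0.6)
```

The number of candidate itemsets grows exponentially with the number of items, which is why search heuristics such as evolutionary and immune algorithms are of interest for this problem.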
32
33
Evolutionary Design of ARs
• Approaches:
– Pittsburgh: each individual represents the whole set of rules.
– Michigan: each individual represents a single rule, and the whole population composes the set of rules.
• Encoding scheme:
A  B  C  D  E  F  G  H
11 00 01 10 00 11 10 00
00: antecedent; 11: consequent; 01 or 10: not part of the rule
• Comprehensibility:
• Interestingness:
• Operators:
– Binary encoding allows the use of standard operators, such as single-point mutation and crossover.
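The encoding scheme can be read back into a rule with a short decoder. The sketch below is a minimal illustration that assumes the chromosome is written as space-separated two-bit genes, one per attribute, exactly as on the slide; the function name is hypothetical.

```python
ATTRIBUTES = "ABCDEFGH"

def decode(chromosome, attributes=ATTRIBUTES):
    """Map two-bit genes to a rule: 00 = antecedent, 11 = consequent, 01/10 = unused."""
    genes = chromosome.split()
    if len(genes) != len(attributes):
        raise ValueError("expected one two-bit gene per attribute")
    antecedent = [a for a, g in zip(attributes, genes) if g == "00"]
    consequent = [a for a, g in zip(attributes, genes) if g == "11"]
    return antecedent, consequent

# The example individual from the slide.
antecedent, consequent = decode("11 00 01 10 00 11 10 00")
```

For the slide's example this yields the rule B, E, H ⇒ A, F, with C, D and G left out of the rule.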
34
Interest Measures and Operators
$C_1(R) = \frac{\log(1 + |C|)}{\log(1 + |A \cup C|)}$
$I(R) = \frac{|A \cup C|}{|A|} \cdot \frac{|A \cup C|}{|C|} \cdot \left(1 - \frac{|A \cup C|}{|D|}\right)$
$C_2(R) = \log(1 + |C|) + \log(1 + |A \cup C|)$
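The first two measures can be sketched as plain functions. The slide does not define its symbols, so the reading used here is an assumption: in C1, |C| and |A ∪ C| are taken as numbers of attributes in the consequent and in the whole rule, while in I the terms |A|, |C| and |A ∪ C| are taken as numbers of transactions covered and |D| as the dataset size. The function names and example counts are hypothetical.

```python
import math

def comprehensibility_1(n_consequent, n_rule_attributes):
    """C1(R) under the assumption that both arguments are attribute counts."""
    return math.log(1 + n_consequent) / math.log(1 + n_rule_attributes)

def interestingness(count_rule, count_antecedent, count_consequent, n_transactions):
    """I(R) under the assumption that the first three arguments are transaction counts."""
    return (
        (count_rule / count_antecedent)
        * (count_rule / count_consequent)
        * (1 - count_rule / n_transactions)
    )

# A rule with 2 consequent attributes out of 5 in total, e.g. B, E, H => A, F.
c1 = comprehensibility_1(n_consequent=2, n_rule_attributes=5)
# Invented counts: the rule covers 20 of 100 transactions.
i_r = interestingness(count_rule=20, count_antecedent=40, count_consequent=25, n_transactions=100)
```

Under this reading, the last factor of I(R) penalizes rules that cover most of the dataset, on the grounds that such rules tend to be obvious.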
35
Algorithms Evaluated
Evolutionary:
procedure [P] = eGA(pc,pm,pe,D)
  initialize P
  t := 1;
  f := evaluate(P,D);
  P := select(P,f,pe);
  while not_stopping_criterion do,
    P := reproduce(P,f,pc);
    P := variate(P,pm);
    f := evaluate(P,D);
    P := select(P,f,pe);
    t := t + 1;
  end while
end procedure
Immune:
procedure [P] = CLONALG1-2(D,max_it,n1,n2)
  initialize P
  t := 1;
  while t <= max_it do,
    f := evaluate(P,D);
    P1 := select(P,n1,f);
    C := clone(P1,f);
    C1 := mutate(C,f);
    f1 := evaluate(C1,D);
    P1 := select(C1,n1,f1);
    P := replace(P,n2);
    t := t + 1;
  end while
end procedure
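The immune procedure can be made concrete with a small runnable sketch. This is a simplified clonal-selection loop on bit strings, not the CLONALG1 or CLONALG2 variants evaluated in the talk. Several choices are assumptions made for illustration: the toy fitness (number of ones), a fixed number of clones per cell, a rank-based mutation rate, and keeping the selected parents alongside their clones when choosing survivors.

```python
import random

def clonalg(fitness, n_bits=16, pop_size=20, n1=5, n2=3, n_clones=4, max_it=30, seed=0):
    """Simplified clonal selection; returns the best cell and the best fitness per iteration."""
    rng = random.Random(seed)

    def random_cell():
        return [rng.randint(0, 1) for _ in range(n_bits)]

    def mutate(cell, rate):
        # Flip each bit independently with probability `rate`.
        return [1 - b if rng.random() < rate else b for b in cell]

    P = [random_cell() for _ in range(pop_size)]
    history = []
    for _ in range(max_it):
        P.sort(key=fitness, reverse=True)
        P1 = P[:n1]                                   # select the n1 best cells
        C1 = []
        for rank, cell in enumerate(P1):
            rate = (rank + 1) / n_bits                # less fit cells mutate more
            C1.extend(mutate(cell, rate) for _ in range(n_clones))
        survivors = sorted(P1 + C1, key=fitness, reverse=True)[:n1]
        P = survivors + P[n1:]
        P.sort(key=fitness, reverse=True)
        P[pop_size - n2:] = [random_cell() for _ in range(n2)]  # replace the n2 worst
        history.append(fitness(P[0]))
    return P[0], history

# Toy fitness: count of ones in the bit string (an assumption, not rule quality).
best, history = clonalg(fitness=sum)
```

Because the selected parents compete with their own clones, the best fitness recorded in `history` never decreases. For rule mining, `fitness` would instead score a decoded rule, for example by combining support, confidence and the interest measures.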
• SPECT Heart database from UCI.
36
Case Study: Recommendation for a Synthetic Dataset
                      Apriori        eGA            CLONALG1       CLONALG2
Support               0.35 ± 0.04    0.37 ± 0.03    0.46 ± 0.02    0.37 ± 0.02
Confidence            0.65 ± 0.16    0.86 ± 0.05    0.94 ± 0.01    0.92 ± 0.01
Comprehensibility 1   0.54 ± 0.06    0.50 ± 0.05    0.50 ± 0.01    0.46 ± 0.02
Comprehensibility 2   0.14 ± 0.03    0.14 ± 0.01    0.13 ± 0.00    0.14 ± 0.01
Interestingness       0.35 ± 0.08    0.35 ± 0.08    0.30 ± 0.00    0.26 ± 0.03
Unique Rules          17 ± 0.00      1.60 ± 0.60    1.50 ± 1.50    6.40 ± 2.30
Processing Time       6.5s ± 0.00    4.5s ± 1.01    9.3s ± 1.13    9.3s ± 1.16
37
Case Study: Recommendation for E-Commerce
                      Apriori     eGA                             CLONALG1                        CLONALG2
Support               0.024       0.009 ± 0.002 (0.006; 0.014)    0.013 ± 0.002 (0.011; 0.016)    0.012 ± 0.003 (0.007; 0.016)
Confidence            1.000       1.000 ± 0.000 (1.000; 1.000)    1.000 ± 0.000 (1.000; 1.000)    1.000 ± 0.000 (1.000; 1.000)
Comprehensibility 1   0.800       0.770 ± 0.028 (0.744; 0.826)    0.787 ± 0.021 (0.747; 0.822)    0.811 ± 0.022 (0.774; 0.843)
Comprehensibility 2   0.030       0.684 ± 0.001 (0.682; 0.685)    0.087 ± 0.030 (0.035; 0.136)    0.110 ± 0.024 (0.059; 0.139)
Interestingness       0.994       0.997 ± 0.000 (0.997; 0.997)    0.982 ± 0.018 (0.941; 0.997)    0.997 ± 0.000 (0.997; 0.997)
Processing Time       639.026 s   82.281 s                        112.636 s                       99.116 s
Discussion
Natural Computing: The Past, Present and Future
38
• Focus on:
– Designing novel nature-inspired algorithms.
– Synthesizing natural phenomena.
– Using natural materials for computing.
• The value of real-world applications is unquestionable, but the field seems to be stuck on the same types of algorithms.
• Researchers are making efforts to examine and formalize information processing in natural and computational systems.*
39
The Past and Present
* Zenil, H. (2012) (Ed.), A Computable Universe: Understanding Computation & Exploring Nature as Computation, World Scientific.
• Grand Challenges for the field:
– Transforming Natural Computing into a Transdisciplinary Discipline.
– Unveiling and Harnessing Information Processing in Natural Systems.
– Engineering Natural Computing Systems.
40
And the Future?
Thank You! Questions? Comments?
Leandro Nunes de Castro
http://slideshare.net/lndecastro
@lndecastro
www.mackenzie.br/lcon.html
www.computacaonatural.com.br
41