Title: Agent Intelligence through Data Mining · Title: Agent Intelligence through Data Mining ......
Transcript of Title: Agent Intelligence through Data Mining · Title: Agent Intelligence through Data Mining ......
Title: Agent Intelligence through Data Mining
Presenters:
Andreas L. Symeonidis Pericles A. Mitkas Department of Electrical and Computer Engineering
Aristotle University of Thessaloniki
& Laboratory of Intelligent Systems and Software Engineering,
Informatics and Telematics Institute / CERTH
Thessaloniki, Greece Contact Info:
Department of Electrical and Computer Engineering
Aristotle University of Thessaloniki -54124
Thessaloniki, Greece
E-mail: [email protected]
Tel.: +30 2310 99 6399
Fax: +30 2310 99 6398
E-mail: [email protected]
Tel.: +30 2310 99 6390
Fax: +30 2310 99 6398
THE 17TH EUROPEAN CONFERENCEON MACHINE LEARNING
AND THE 10TH EUROPEAN CONFERENCE ON PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES
~ECML/PKDD 2006~
September 18th, 2006, Berlin, Germany
References
1. Ackley D.H. & M.L. Littman, 1992. “Learning from Natural Selection in an Artificial Environment” in Artificial Life II Video Proceedings, C.G. Langton, Ed.. Redwood City, California, Addison-Wesley..
2. Adriaans P. & D. Zantige, 1996. Data Mining. Addison-Wesley. 3. Agent Working Group, 2000. Agent Technology Green Paper, Object Management
Group. 4. Agrawal R., C. Aggarwal, & V. Prasad, 1999. "A tree projection algorithm for
generation of frequent itemsets", in Proceedings of High Performance Data Mining Workshop, Puerto Rico.
5. Agrawal R. & R. Srikant, 1994. "Fast algorithms for mining association rules", in Proceedings of the 20th VLDB Conference, Santiago, Chile, pp. 487-499.
6. Agrawal R. & R. Srikant, 1995. "Mining Sequential Patterns", in Proceedings of the International Conference on Data Engineering (ICDE), Taipei, Taiwan.
7. Amir A., R. Feldman, & R. Kashi, 1997. "A new and versatile method for association generation", Information Systems, vol. 22, no. 6-7, pp. 333-347.
8. Arthur B.W., 1994. "Inductive Reasoning and Bounded Rationality", American Economic Review, vol. 84, no. 2, pp. 406-411.
9. Athanasiadis I.N. & P.A. Mitkas, 2004. "An agent-based intelligent environmental monitoring system", Management of Environmental Quality, vol. 15, no. 3, pp. 229-237.
10. Bellifemine F., A. Poggi, & G. Rimassa, 2000. "Developing multi-agent systems with JADE", in Seventh International Workshop on Agent Theories, Architectures, and Languages, Boston MA.
11. Bigus J.P., 1996. Data Mining with Neural Networks Solving Business Problems from Application Development to Decision Support. Mc Graw-Hill.
12. Booker L., D.E. Goldberg, & J.H. Holland, 1989. "Classifier systems and genetic algorithms ", Artificial Intelligence, vol. 40, no. 1-3, pp. 235-282.
13. Bossel H., 1977. "Orientors of Nonroutine Behavior" in Concepts and Tools of Computer-Assisted Policy Analysis, H. Bossel Ed. pp. 227-265. Basel: Birkhauser, Verlag.
14. Bousquet F., C. Cambier, & P. Morand, 1994. "Distributed Artificial Intelligence and Object-Oriented Modelling of a Fishery", Mathematical Computation Modelling, vol. 20, no. 8, pp. 97-107.
A
B
15. Caglayan A., C. Harrison, & C.G. Harrison, 1997. Agent Sourcebook: A Complete Guide to Desktop, Internet, and Intranet Agents. John Wiley & Sons.
16. Carlsson C. & E.Turban, 2002. "DSS: directions for the next decade", Decision Support Systems, vol. 33, pp. 105-110.
17. Caswell H., 1989. Matrix population models: Construction, analysis, and interpretation. Sunderland, MA: Sinauer Associates.
18. Chen M.S., J. Han, & P.S. Yu, 1996. "Data Mining: An Overview from a Database Perspective", IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883.
19. Chen Z., 1999. Computational Intelligence for Decision Support. CRC Press, Boca Raton.
20. Choy K.L., W.B. Lee, & V. Lo, 2002. "Development of a case based intelligent customer-supplier relationship management system", Expert Systems with Applications, vol. 23, no. 3, pp. 281-297.
21. Choy K.L., W.B. Lee, & V. Lo, 2003. "Design of an intelligent supplier relationship management system: a hybrid case based neural network approach", Expert Systems with Applications, vol. 24, no. 2, pp. 225-237.
22. Crist T.O. & J.W. Haefner, 1994. "Spatial Model of Movement and Foraging in Harvester Ants (Pogonomyrmex) (II): The Roles of Environment and Seed Dispersion", Journal of Theoretical Biology, vol. 166, pp. 315-323.
23. Davenport T.H., 2000. "The future of enterprise system-enabled organizations", Information Systems Frontiers, vol. 2, no. 2, pp. 163-180.
24. Dean J., 1998. "Animats and what they can tell us", Trends in Cognitive Sciences, vol. 2, no. 2, pp. 60-67.
25. DeAngelis D. L.& L.J. Gross, 1992. Individual-based models and approaches in ecology: Populations, communities and ecosystems. Chapman and Hill, New York.
26. Durrett R. & S.A. Levin, 1994. "Stochastic spatial models: A user's guide to ecological applications", Philosophical Transactions of the Royal Society of London, vol. 343, (B), pp. 329-350.
27. Epstein J.M. & R.L. Axtell, 1996. Growing Artificial Societies: Social Science from the Bottom Up. The MIT Press, Washington.
28. Farquhar A., R. Fikes, & J. Rice, 1996. "The Ontolingua Server: A tool for Collaborative Ontology Construction", Knowledge Systems Laboratory, Stanford University, Technical Report KSL-96-26.
C
D
E
F
29. Fayyad U., 1996. "Mining Databases: Towards Algorithms for Knowledge Discovery", Bulletin of the Technical Committee on Data Engineering, vol. 21, no. 1, pp. 39-48.
30. Fayyad U., G. Piatetsky-Shapiro, & P. Smyth, 1996. "Knowledge Discovery and Data Mining: Towards a unifying framework", in Proceedings of The Second International Conference on Knowledge Discovery and Data Mining, Portland, USA, pp. 82-88.
31. Ferber J., 1999. Multi-Agent Systems – An introduction to Distributed Artificial Intelligence. Addison-Wesley, London.
32. Fernandes A.A.A., 2000. "Combining Inductive and Deductive Inference in Knowledge Management Tasks", in Proceedings of the 11th International Workshop on Database and Expert Systems Applications - TAKMA 2000, IEEE Computer Society, pp. 1109-1114.
33. Freitas A.A., 1999. "On Rule Interestingness measures", Knowledge-Based Systems, vol. 12, no. 5-6, pp. 309-315.
34. Friedman-Hill E.J., 2003. Jess, The Expert System Shell for the Java Platform, version 6.1. Available: http://herzberg.ca.sandia.gov/jess.
35. Galitsky B. & R. Pampapathi, 2003. "Deductive and inductive reasoning for processing the claims of unsatisfied customers", in Proceedings of the 16th Int. Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE 2003), Springer-Verlag, Heidelberg, pp. 21-30.
36. Ganti V., J. Gehrke, & R. Ramakrishman, 1999. "Mining Very Large Databases", Computer Magazine, vol. 32, no. 8, pp. 38-45.
37. Gasser L., 1991. "Social Conceptions of Knowledge and Action: DAI Foundations and Open Systems Semantics", Artificial Intelligence, vol. 47, pp. 107-138.
38. Genesereth M.R. & S. Ketchpel, 1994. "Software agents", Communications of the ACM, vol. 37, no. 7, pp. 48-53.
39. Goldberg D.E., 1989. Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Massachusetts.
40. Haeckel S.H. & R. Nolan, 1994. "Managing by wire", Harvard Business Review. 41. Haefner J.W. & T.O. Crist, 1994. "Spatial Model of Movement and Foraging in
Harvester Ants (Pogonomyrmex) (I): The Roles of Memory and Communication", Journal of Theoretical Biology, vol. 166, pp. 299-313.
42. Han J. & M. Kamber, 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann, Burnaby.
43. Hillbrand E. & J. Stender, 1994. Many-Agent simulation and Artificial Life. IOS Press. 44. Holland J.H., 1975. Adaptation in Natural and Artificial Systems. The University of
Michigan Press, Ann Arbor. 45. Holland J.H., 1987. "Genetic Algorithms and Classifier Systems: Foundations and
Future Directions", in Proceedings of the second international conference on genetic algorithms and their applications, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 82-89.
G
H
46. Holland J.H., 1995. Hidden order: How adaptation builds complexity. Addison-Wesley, Reading, MA.
47. Holsapple C.W.& M.P. Sena, 2004. "ERP plans and decision-support benefits", Decision Support Systems, to be published.
48. Hraber P.T., T. Jones, & S. Forrest, 1997. "The Ecology of Echo" in Artificial Life III, C.G. Langton Ed. Longman, Addison Wesley, pp. 165-190.
49. Information Discovery Inc 1999. Datamines for Data Warehousing.
50. Jennings N.R., 1993. "Commitements and Conventions: The Foundation of Coordination in Multi-Agent Systems", The Knowledge Engineering Review, vol. 2, no. 3, pp. 223-250.
51. Jennings N.R., J. Corera, I. Laresgoiti, E.H. Mamdani, F. Perriolat, P. Sharek, & L.Z. Varga 1996. "Using ARCHON to develop real-world DAI applications for electricity transportation management and particle accelarator control", IEEE Expert.
52. Jennings N.R., K. Sycara, & M.J. Wooldridge, 1998. "A roadmap of agent research and development", International Journal of Autonomopus Agents and Multi-Agent Systems, vol. 1, pp. 7-38.
53. Kaelbling L.P. & S.J.Rosenschein, 1990. Action and planning in embedded agents. The MIT Press, Cambridge.
54. Kargupta H., I. Hamzaoglou, & B. Stafford, 1996. "PADMA: PArallel Data Mining Agents for scalable text classification" in the Proceedings of High Performance Computing.
55. Kero B., L. Russell, S. Tsur, & W.M. Shen, 1995. "An Overview of Data Mining Technologies", in the KDD Workshop in the 4th International Conference on Deductive and Object-Oriented Databases, Singapore.
56. Knapik M. & J.Johnson, 1998. Developing Intelligent Agents for Distributed Systems. McGraw Hill.
57. Kodratoff Y., 1988. Introduction to Machine Learning. Pitman Publishing, London. 58. Koonce D.A., C-H. Fang, & S-C. Tsai, 1997. "A Data Mining tool for Manufacturing
Systems", Computers ind.Engineering, vol. 33, no. 1-2, pp. 27-30. 59. Krebs F. & H. Bossel, 1996. "Emergent value orientation in self-organization of an
animat", Ecological Modelling, vol. 96, pp. 143-164. 60. Kwon O.B. & J.J. Lee, 2001. "A multi agent intelligent system for efficient ERP
maintenance", Expert Systems with Applications, vol. 21, pp. 191-202.
I
J
K
61. Langton C.G., 1994. Personal Communication. 62. Lee C., 1961. "An algorithm for path connections and its applications", IRE Trans
Electron.Computers, vol. 10, pp. 346-365. 63. Levi S.D., P. Kaminsky, & S.E. Levi, 2000. Designing and managing the supply chain.
McGraw-Hill, Illinois. 64. Looney C.G., 1997. Pattern Recognition Using Neural Networks: Theory and
Algorithms for Engineers and Scientists. Oxford University Press.
65. MacQueen J., 1967. "Some methods for classification and analysis of multivariate observations", in Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, pp. 281-297.
66. Mahalingam K. & M.N. Huhns, 1997. "An Ontology Tool for Distributed Information Environments", IEEE Computer, vol. 30, no. 6, pp. 80-83.
67. Malone T.W., 1998. "Inventing the organizations of the twentieth first century: control, empowerment and information technology", in Sense and Respond: Capturing Value in the Network Era, S.P. Bradley & R. Nolan, Eds. Harvard Business School Press, Boston MA, pp. 263-284.
68. May R.M., 1973. Stability and Complexity in model ecosystems Princeton University Press, Princeton, N. J.
69. Mitkas P.A., A.L. Symeonidis, D. Kehagias, & I. Athanasiadis, 2002. "An agent framework for dynamic agent retraining: Agent academy", in Challenges and Achievements in e-business and e-work Prague, pp. 757-764.
70. Mitkas P.A., D. Kehagias, A.L. Symeonidis, & I. Athanasiadis, 2003. "A Framework for Constructing Multi-Agent Applications and Training Intelligent Agents", in Proceedings of the 4th International Workshop on Agent-Oriented Software Engineering (AOSE-2003), Springer-Verlag, Melbourne, Australia, pp. 1-16.
71. Mobasher B., 1999. "A Web personalization engine based on user transaction clustering", in Proceedings of the 9th Workshop on Information Technologies and Systems (WITS'99).
72. Mobasher B., R. Cooley, & J. Srivastava, 1999. "Creating adaptive web sites through usage-based clustering of URLs" in IEEE Knowledge and Data Engineering Workshop (KDEX'99).
73. Mobasher B., R. Cooley, & J. Srivastava, 2000. "Automatic personalization based on Web usage mining", Communications of the ACM, vol. 43, no. 8.
74. Mohammadian M., 2004. Intelligent Agents for Data Mining and Information Retrieval. Idea Group Inc..
75. Murrel D.J., J.M.J. Travis, & C. Dytham, 2002. "The evolution of dispersal distance in spatially-structured populations", Oikos, vol. 97, pp. 229-236.
L
M
76. Nwana H.S., 1995. "Software Agents: An Overview", The Knowledge Engineering Review, vol. 11, no. 3, pp. 205-244.
77. O' Conner M. & J. Herlocker, 1999. "Clustering items for collaborative filtering", in Proceedings of the ACM SIGIR Workshop on Recommender Systems, Berkeley, CA.
78. Papoulis A. 1991. Probability, Random Variables, and Stochastic Processes. McGraw-Hill.
79. Pecala S.W., 1986. "Neighborhood models of plant population dynamics. 2. Multispecies models of annuals", Theoretical Population Biology, vol. 29, pp. 262-292.
80. Peng Y., T. Finin, Y. Labrou, B. Chu, W. Tolone, & A. Boughannam, 1999. "A multi agent system for enterprise integration", Applied Artificial Intelligence, vol. 13, no. 1-2, pp. 39-63.
81. Perkowitz M. & O. Etzioni, 1998. "Adaptive Web sites: automatically synthesizing Web pages", in Proceedings of Fifteenth National Conference on Artificial Intelligence, Madison, WI.
82. Pilot Software 1999, White Paper: An introduction to Data Mining.
Q 83. Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. San Mateo, Morgan
Kaufmann.
R 84. Ray T.S., 1992. "An approach to the synthesis of life" in Artificial Life II, C.G.
Langton, C. Taylor, J.D. Farmer, and S. Rasmussen, Eds. Redwood City, CA, Addison-Wesley pp. 371-408.
85. Rosenschein J.S. & G. Zlotkin, 1994. "Designing Conventions for Automated Negotiation", AI Magazine, pp. 29-46.
86. Rousset F. & S. Gandon, 2002. "Evolution of the distribution of dispersal distance under distance-dependent cost of dispersal", Journal of Evolutionary Biology, vol. 15, pp. 515-523.
87. Rust R.T., V.A. Zeithaml, & K. Lemon, 2000. Driving customer Equity: How customer lifetime value is reshaping corporate strategy. The Free Press, New York.
88. Rygielsky C., J.C. Wang, & D.C. Yen, 2002. "Data mining techniques for customer relationship management", Technology in Society, vol. 24, no. 4, pp. 483-502.
N
O
P
S 89. Shahabi C., A. Zarkesh, J. Adibi, & V. Shah, 1997. "Knowledge discovery from users
Web-page navigation", in Proceedings of Workshop on Research Issues in Data Engineering, Birmingham, England.
90. Shapiro J.,1999. "Bottom-up vs. top-down approaches to supply chain modeling", in Quantitative models for supply chain management, S. Tayur, R. Ganeshan, and M. Magazine Eds. Kluwer Publishing, pp. 737-759.
91. Simon H., 1996. The Sciences of the Artificial. MIT Press, MA, Cambridge. 92. Singh M.P., 1997. "Considerations on Agent Communication", in FIPA Workshop,
FIPA97. 93. Spiliopoulou M. & L.C. Faulstich, 1999. "WUM: A Web Utilization Miner", in
Proceedings of EDBT Workshop WebDB98, Valencia, Spain. 94. Spiliopoulou M., C. Pohle, & L.C. Faulstich, 1999. "Improving the effectiveness of a
Web site with Web usage mining", in Workshop on Web Usage Analysis and User Profiling (WebKKD99), San Diego.
95. Stolfo S.J., A.L. Prodromidis, S. Tselepis, W. Lee, D.W. Fan, & P.K. Chan, 1997. "Jam: Java agents for meta-learning over distributed databases", in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAI Press Publisher, Newport Beach, CA , pp. 74-81.
96. Symeonidis A.L., D. Kehagias, & P.A. Mitkas, 2003. "Intelligent policy recommendations on enterprise resource planning by the use of agent technology and data mining techniques", Expert Systems with Applications, vol. 25, no. 4, pp. 589-602.
97. Symeonidis A.L., P.A. Mitkas, & D. Kehagias, 2002. "Mining patterns and rules for improving agent intelligence through an integrated multi-agent platform", in 6th IASTED International Conference, Artificial Intelligence and Soft Computing, Banff, Alberta, Canada.
T 98. Talavera L. & U. Cortes, 1997. "Inductive hypothesis validation and bias selection in
unsupervised learning", in Proceedings of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems, Leuven, Belgium, pp. 169-179.
99. The Data Mining Group, 2001. Predictive Model Markup Language Specifications (PMML), ver. 2.0. Available: http://www.dmg.org.
100. The FIPA Foundations, 2000. Foundation for Intelligent Physical Agents Specifications. Available: http://www.fipa.org.
101. The FIPA Foundations. FIPA-SL Specifications, 2000. FIPA SL Content Language Specification. Available: http://www.fipa.org/specs/fipa00008/SC00008I.html.
102. Turney P.D., 1993. "Robust Classification With Context-Sensitive Features", in 6th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp. 268-276.
U 103. UCI Group, 2004. UCI Machine Learning Repository. Available:
http://www.ics.uci.edu/~mlearn/MLRepository.html.
W 104. Webopedia, 2003. Online dictionary for computer and internet terms. Available:
http://www.webopedia.com. 105. Weiss G., 2000. Multiagent Systems: A Modern Approach to Artificial Intelligence. The
MIT Press, Massachuetts, USA. 106. Werner G.M. & M.G. Dyer, 1994. "Bioland: A Massively Parallel Simulation
Environment for Evolving Distributed Foms of Intelligent Behavior", in Massively Parallel Artificial Intelligence, H. Kitano and J.A. Handler Eds. Menlo Park, California, AAAI Press/MIT Press.
107. Westerberg L. & U. Wennergren, 2003. "Predicting the spatial distribution of a population in a heterogeneous landscape", Ecological Modelling, vol. 166, pp. 53-65.
108. Wilson S.W., 1987. "Classifier Systems and the Animat Problem", Machine Learning, vol. 2, pp. 199-228.
109. Wilson S.W., 1991. "The Animat Path to AI", in From Animals to Animats: Proceedings of the First International Conference on the Simulation of Adaptive Behavior, J.A. Meyer and S.W. Wilson Eds. Cambridge, Massachussets, The MIT Press/Bradford Books.
110. Wilson S.W. & D.E. Goldberg, 1989. "A Critical Review of Classifier Systems", Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, Los Altos, California, pp. 244-255.
111. Witten I.H. & E. Frank, 1999. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufman, New Zealand.
112. Wooldridge M. & N.R. Jennings, 1995. "Intelligent agents: Theory and practice.", The Knowledge Engineering Review, vol. 10, no. 2, pp. 115-152.
113. Wooldridge M., 1999. "Intelligent Agents". In Multiagent Systems, G. Weiss Ed. The MIT Press.
114. Worley J.H., G.R. Castillo, L. Geneste, & B. Grabot, 2002. "Adding decision support to workflow systems by reusable standard software components", Computers in Industry, vol. 49, pp. 123-140.
Y 115. Yeager L., 1994. "Computational Genetics, Physiology, Metabolism, Neural Systems,
Learning, Vision, and Behavior, or Polyworld: Life in a New Context", in Artificial Life III, C.G.Langton Ed. Redwood City, California, Addison-Wesley.
Z 116. Zhang Z., C. Zhang, & S. Zhang, 2003. "An agent-based hybrid framework for database
mining", Applied Artificial Intelligence, vol. 17, pp. 383-398.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 1
Agent Intelligence through Data Mining
18/09/2006 1ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Intelligence Agent Intelligence through through
Data MiningData MiningAndreas L. Symeonidis
Research AssociateInformatics & Telematics InstituteCenter for Engineering Research and Technology – Hellas (CERTH) Email:[email protected]
Pericles A. MitkasAssociate Professor
Electrical and Computer EngineeringAristotle Univ. of Thessaloniki (AUTh)
and CERTHEmail: [email protected]
Thessaloniki, GREECE
Agent Intelligence through Data Mining
18/09/2006 2ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The goal of the tutorialThe goal of the tutorial……
Reasoning Agent
Software Agent Paradigm
Knowledge Discovery Process
Historical Data Knowledge Model
ApplicationDomain
Agent Modelling
AgentType
InferenceStructure
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 2
Agent Intelligence through Data Mining
18/09/2006 3ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Related TechnologiesRelated Technologies
Data Mining (DM): the extraction of interesting non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.
Software Agent (SA): a software entity that acts autonomously (on behalf of another entity) in a goal-oriented manner. It is able to perceive its environment through sensors and act on it through effectors.
Multi-agent Systems (MAS): a Software Engineering methodology for developing applications with the deployment of agents and agent primitives.
Agent Intelligence through Data Mining
18/09/2006 4ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Merging TechnologiesMerging Technologies (2/2)(2/2)
Software agents have been repeatedly used for executing DM tasks, but
DM results have not, yet, been dynamically incorporated to MAS
The reason: The inductive nature of DM and the lack of the appropriate tools hinders the unflustered
incorporation of knowledge to MAS
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 3
Agent Intelligence through Data Mining
18/09/2006 5ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Presentation Outline (1/3)Presentation Outline (1/3)
1 Basic Primitives of DM technology
Data Preprocessing
DM Techniques:
Cluster Analysis – Unsupervised Learning
Classification – Supervised Learning
Association Rules
2 Data Mining & Semantics
3 Embedding Domain Knowledge into DM
Agent Intelligence through Data Mining
18/09/2006 6ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Presentation Outline (2/3)Presentation Outline (2/3)
4 Data Mining Applications & Trends
5 Intelligent Agents
Definitions
Attributes & Communication
6 Agent Intelligence Infusion
The Levels of Intelligence Infusion
Tools
Methodologies
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 4
Agent Intelligence through Data Mining
18/09/2006 7ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Presentation Outline (3/3)Presentation Outline (3/3)
7 MAS exploiting DM extracted intelligence
An ERP add-on for intelligent CRM/SRM
A near real-time EMS
A decentralized maintenance management system
A self-organizing MAS, “in danger”
An agent-based, e-auction system
8 Open Issues
Agent Intelligence through Data Mining
18/09/2006 8ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 1 Part 1 -- IntroductionIntroduction
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 5
Agent Intelligence through Data Mining
18/09/2006 10ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
MotivationMotivationData explosion problemAutomated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
Data warehousing and data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Knowledge StarvationThe need to see through and interpret all this “useless” data
Parts of the KDDKDD process
Agent Intelligence through Data Mining
18/09/2006 11ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The KDD processThe KDD process
What is KDD?
Knowledge Discovery in Databases (KDD) is the extraction of interesting non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.
What is Data Mining?Data Mining is the most important step in the KDD process, consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 6
Agent Intelligence through Data Mining
18/09/2006 12ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining related technologiesData Mining related technologies
Machine LearningMachine LearningKnowledge Extraction Knowledge Extraction –– Extended retrievalExtended retrievalData/Pattern AnalysisData/Pattern AnalysisStatistical Analysis Statistical Analysis Business IntelligenceBusiness Intelligence
(Deductive) query processing. (Deductive) query processing. Expert systems or small ML/statistical programsExpert systems or small ML/statistical programs
What is not data mining
Agent Intelligence through Data Mining
18/09/2006 13ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining vs. Machine LearningData Mining vs. Machine Learning
The size of the dataset is different:The size of the dataset is different:For machine learning, datasets are loaded to main memory – thus smallFor data mining, there is no such restriction (usually large datasets)
The objective is different:The objective is different:ML focuses on the inference mechanisms involved in the learning processDM focuses on the business exploitation of extracted results
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 7
Agent Intelligence through Data Mining
18/09/2006 14ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining vs. Extended RetrievalData Mining vs. Extended Retrieval
The approach is different: The approach is different: Extended Retrieval (ER) is based on individual examples (retrieved and stored as analogs), whereas DM is ONLYinterested in a flood of data.
The extracted knowledge is different:The extracted knowledge is different:DM generated knowledge contains the condensed information extracted from structured databases. In the case of ER knowledge comes through the mapping of structure information
ER and DM are complementary approachesER and DM are complementary approaches
Agent Intelligence through Data Mining
18/09/2006 15ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining vs. Statistical AnalysisData Mining vs. Statistical Analysis
The objective of statistical data analysis:The objective of statistical data analysis:To model the underlying structures in order to lead to evaluation hypothesis.
For SA Computational efficiency is not a For SA Computational efficiency is not a concernconcern
Data for SA are static and clean and datasets Data for SA are static and clean and datasets are smallare smallThe inference procedure is different:The inference procedure is different:
SA procedure involves repeated sampling under a given statistical model from an unknown distribution of the dataDM seeks to identify modeling procedures with a high probability of near-optimality over all possible distributions of data.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 8
Agent Intelligence through Data Mining
18/09/2006 16ECML/PKDD 2006, Berlin, Germany
AUTHAUTHData Mining: Confluence of Multiple Data Mining: Confluence of Multiple DisciplinesDisciplines
Data Mining
Database Technology Statistics
OtherDisciplines
InformationScience
MachineLearning Visualization
Agent Intelligence through Data Mining
18/09/2006 17ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why data Mining?Why data Mining?
Data mining is a Data mining is a computercomputer--drivendriven application, as it application, as it is performed by computers and not by humans.is performed by computers and not by humans.
Data mining solves the Data mining solves the ““query formulationquery formulation””problem.problem.
Data mining confronts the visualisation and Data mining confronts the visualisation and understanding of large data sets.understanding of large data sets.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 9
Agent Intelligence through Data Mining
18/09/2006 18ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining
Proactive and summarizing data analysis
0000’’ss
Data Warehousing & DSS
Multiple-level dynamic and summarizing data analysis
9090’’ss
Data Access
Dynamic and summarizingdata analysis
8080’’ss
The evolutionThe evolution……
Data Collection
Static and summarizingdata analysis
6060’’ss
Agent Intelligence through Data Mining
18/09/2006 19ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The KDD process in detail (1/3)The KDD process in detail (1/3)
1. Identify the goal of the KDD process:Develop an understanding of the application domain and the relevant prior knowledge.
2. Create a target data set:Select a data set, or focus on a subset of variables or data samples, on which discovery will be performed.
3. Clean and pre-process data:Remove noise, handle missing data fields, account for time sequence information and known changes.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 10
Agent Intelligence through Data Mining
18/09/2006 20ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
4. Reduce and project data:Find useful features to represent the data depending on the goal of the task.
5. Identify data mining method:Match the goals of the KDD process to a particular data mining method: e.g. summarization, classification, regression, clustering, etc.
6. Choose a data mining algorithm:Select method(s) to be used for searching for patterns in the data.
7. Apply data mining
The KDD process in detail (2/3)The KDD process in detail (2/3)
Agent Intelligence through Data Mining
18/09/2006 21ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
8. Evaluate data mining results:Interpret mined patterns, possibly return to steps 1-7 for further iteration.
9. Consolidate discovered knowledge:Incorporate this knowledge into another system for further action, or simply document it and report it to interested parties.
The KDD process in detail (3/3)The KDD process in detail (3/3)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 11
Agent Intelligence through Data Mining
18/09/2006 22ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
TaskTask--relevant relevant
DataData
Pattern EvaluationPattern Evaluation
Data IntegrationData Integration
Unformatted Unformatted datadata
Data Data WarehouseWarehouse
Data CleaningData Cleaning
SelectionSelection
A graphical representation of the A graphical representation of the KDD processKDD process
KKnnoowwlleeddggee
Data MiningData Mining
Agent Intelligence through Data Mining
18/09/2006 23ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Relational databases
Data warehouses
Transactional databases
Object-oriented and object-relational databases
Spatial databases
Data Mining: On What Kind of Data? (1/2)Data Mining: On What Kind of Data? (1/2)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 12
Agent Intelligence through Data Mining
18/09/2006 24ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining: On What Kind of Data? (2/2)Data Mining: On What Kind of Data? (2/2)
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
Genomics databases
World Wide Web
Agent Intelligence through Data Mining
18/09/2006 26ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (1/6)Data Mining Functionalities (1/6)
Concept descriptionData characterization: Summarize the features of the class under study (target class) in general terms.
E.g. Summarize the characteristics of customers who spent more than $1000 during 2003.
Data discrimination: Compare the feature of the target class with one or a set of comparative classes (contrasting classes).
E.g. Create a comparative profile of customers that shop often to customers that shop rarely in our store
Usual concept description outputs are:Characteristic rulesDiscriminant rules
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 13
Agent Intelligence through Data Mining
18/09/2006 27ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (2/6)Data Mining Functionalities (2/6)Association (correlation and causality)
The discovery of association rules showing attribute-value conditions that frequently occur
Widely used for market-basket analysis and transaction data analysis
Example:If the age of a customer is between 20 and 29 and his income is between $20.000 and $29.000, then he buys a PC with a certainty of 60%:
Age(“20-29”) ^ Income(“$20..$29K”) ⇒Buys(“PC”)
[support = 2%, confidence = 60%]
Agent Intelligence through Data Mining
18/09/2006 28ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (3/6)Data Mining Functionalities (3/6)
Cluster analysisThe identification of a finite set of categories or clusters to
describe the data.
Class label is unknown: The training data does not have any!!!
Clustering principle: Objects that belong to the same cluster
must be similar to each other, while objects that belong to
different clusters must be dissimilar to each other.
ExampleIdentify homogeneous subpopulations of customers.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 14
Agent Intelligence through Data Mining
18/09/2006 29ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (4/6)Data Mining Functionalities (4/6)
Classification and Prediction The process of finding a set of models (or functions) that describe and distinguish data classes or concepts.The derived model is based on the analysis of training data, whose class label is known.The derived model can be presented as:
Classification rules (If-Then rules)Decision tree (A flow-chart-like tree structure) Mathematical formulaeNeural Networks
ExamplesClassify countries based on climateClassify cars based on gas mileage
Agent Intelligence through Data Mining
18/09/2006 30ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (5/6)Data Mining Functionalities (5/6)
Outlier analysisIdentify outliers in a set of data.
Outlier: a data object that does not comply with the general
behavior of the data
It can be considered as noise or exception but is quite useful in
fraud detection, rare events analysis
Example
Discovery of fraudulent usage of credit cards.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 15
Agent Intelligence through Data Mining
18/09/2006 31ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Functionalities (6/6)Data Mining Functionalities (6/6)
Trend and evolution analysisTrend and deviation: regression analysis
Sequential pattern mining, periodicity analysis
Similarity-based analysis
Other pattern-directed or statistical analysis
Agent Intelligence through Data Mining
18/09/2006 32ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Are all the Are all the ““DiscoveredDiscovered”” Patterns Interesting?Patterns Interesting?
A data mining system/query may generate thousands of patterns. Not all of them are interesting!!!
InterestingnessA pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 16
Agent Intelligence through Data Mining
18/09/2006 33ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Are all the Are all the ““DiscoveredDiscovered”” Patterns Interesting?Patterns Interesting?
Different types of InterestingnessObjective: based on statistics and the structure of discovered patterns
Support,
Confidence, etc.
Subjective: based on user’s belief in the data
Unexpected,
Novel,
Actionable, etc.
An interesting pattern represents knowledge
Agent Intelligence through Data Mining
18/09/2006 34ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Major Issues in Data Mining (1/3)Major Issues in Data Mining (1/3)
Mining methodology and user interactionMining different kinds of knowledge in databasesInteractive mining of knowledge at multiple levels of abstractionIncorporation of background knowledgeData mining query languages and ad-hoc data miningExpression and visualization of data mining resultsHandling noise and incomplete dataPattern evaluation: the interestingness problem
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 17
Agent Intelligence through Data Mining
18/09/2006 35ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Major Issues in Data Mining (2/3)Major Issues in Data Mining (2/3)
Performance and scalabilityEfficiency and scalability of data mining algorithmsParallel, distributed and incremental mining methods
Issues relating to the diversity of data typesHandling relational and complex types of dataMining information from heterogeneous databases and global information systems (WWW)
Agent Intelligence through Data Mining
18/09/2006 36ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Major Issues in Data Mining (3/3)Major Issues in Data Mining (3/3)
Issues related to applications and social impactsApplication of discovered knowledge
Domain-specific data mining tools
Intelligent query answering
Process control and decision making
Integration of the discovered knowledge with existing knowledge: A knowledge fusion problem
Protection of data security, integrity, and privacy
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 18
Agent Intelligence through Data Mining
18/09/2006 38ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
SummarySummary
Mining can be performed on a variety of information repositoriesData mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.Classification of data mining systemsMajor issues in data mining
Agent Intelligence through Data Mining
18/09/2006 39ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 2Part 2--Data Data PreprocessingPreprocessing
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 19
Agent Intelligence through Data Mining
18/09/2006 41ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction
5. Data discretization
Major Tasks in Data PreprocessingMajor Tasks in Data Preprocessing
Agent Intelligence through Data Mining
18/09/2006 42ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data cleaning tasks
Fill in missing values
Identify outliers and smooth out noisy data
Correct inconsistent data
1. Data Cleaning1. Data Cleaning
Real-world data tend to be “dirty” and incomplete…
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 20
Agent Intelligence through Data Mining
18/09/2006 43ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data is not always availableMany tuples have no recorded value for several attributes,
i.e. customer income in sales data
Missing data may be due to:Equipment malfunction
Inconsistency with other recorded data and thus deleted
Data not entered due to misunderstanding
Certain data may not be considered important at the time of entry
History or changes of the data may not be registered
Missing data may need to be inferred.
Missing DataMissing Data
Agent Intelligence through Data Mining
18/09/2006 45ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Noise: Random error or Variance in a measured variable.
Incorrect attribute values may be due to:Faulty data collection instrumentsData entry problemsData transmission problemsTechnology limitationInconsistency in naming convention
Other data problems which require data cleaning:Duplicate recordsIncomplete dataInconsistent data
Noisy DataNoisy Data
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 21
Agent Intelligence through Data Mining
18/09/2006 47ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data integration: Integration of multiple databases, data cubes, or files. Data integration combines data from multiple sources into a coherent store.
Schema integrationIntegrate metadata from different sourcesEntity identification problem: identify real world entities from
multiple data sources, e.g., A.cust-id ≡ B.cust-id#
Detecting and resolving data value conflictsFor the same real world entity, attribute values from different
sources are differentPossible reasons: different representations, different scales, e.g.,
metric vs. British units
2. Data Integration2. Data Integration
Agent Intelligence through Data Mining
18/09/2006 49ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Smoothing: remove noise from data
Aggregation: summarization, data cube construction
Generalization: concept hierarchy climbing
Normalization:scaled to fall within a small, specified range:
min-max normalization
z-score normalization
normalization by decimal scaling
Attribute/feature construction: New attributes constructed from the given ones
3. Data Transformation3. Data Transformation
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 22
Agent Intelligence through Data Mining
18/09/2006 51ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data reduction: Obtains a reduced representation of the data set that is much smaller in volume, but yet produces the same (or almost the same) analytical results
Data reduction strategies
Data cube aggregation
Dimensionality reduction
Numerosity reduction
Discretization and concept hierarchy generation
4. Data Reduction Strategies4. Data Reduction Strategies
Agent Intelligence through Data Mining
18/09/2006 52ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
5. Data 5. Data DiscretizationDiscretization (1/2)(1/2)
Part of data reduction but with particular importance
Discretization methods are applied both on numericand categorical data
Discretization methods for categorical data
Specification of a partial ordering of attributes explicitly at the schema level by users or experts
Specification of a portion of a hierarchy by explicit data grouping
Specification of a set of attributes, but not of their partial ordering
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 23
Agent Intelligence through Data Mining
18/09/2006 54ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Mainstream DM techniquesMainstream DM techniques
Clustering
Classification
Association Rules
Agent Intelligence through Data Mining
18/09/2006 55ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 3 Part 3 –– ClusteringClustering(Unsupervised Learning)(Unsupervised Learning)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 24
Agent Intelligence through Data Mining
18/09/2006 56ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Cluster AnalysisCluster Analysis
The identification of a finite set of categories or clusters to describe the data.
Agent Intelligence through Data Mining
18/09/2006 57ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
What is a Cluster?What is a Cluster?
Cluster is the collection of data objects that are:
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Clustering: The BasicsClustering: The Basics……
Clustering as learning:Clustering as learning:
Clustering is unsupervised learning, which means that there are no predefined classes and no examplesthat would show what kind of desirable relations should be valid among the data.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 25
Agent Intelligence through Data Mining
18/09/2006 58ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Clustering criteriaClustering criteria
High intra-cluster similarityLow inter-cluster similarity
Cluster 1
x
x
Cluster 3
Cluster 2
x
Inter-class similarity
Intra-classsimilarity
Agent Intelligence through Data Mining
18/09/2006 59ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Requirements of Clustering in Requirements of Clustering in Data MiningData Mining
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input parameters
Able to deal with noise and outliers
Insensitive to order of input records
High dimensionality
Interpretability and usability
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 26
Agent Intelligence through Data Mining
18/09/2006 60ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Steps to develop a clustering process Steps to develop a clustering process
1 1 Feature selectionFeature selectionSelect properly the features on which clustering is to be performed
2 2 Clustering algorithmClustering algorithmProximity measure. Clustering criterion
3 3 Validation of the resultsValidation of the resultsThe correctness of clustering algorithm results is verified using appropriate criteria and techniques
4 4 Interpretation of the resultsInterpretation of the results
Agent Intelligence through Data Mining
18/09/2006 61ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Technique usedTechnique used in order to define clustersPartitional MethodsHierarchical MethodsDensity-Based MethodsGrid-Based Methods
The type of variablestype of variablesStatistical – Numerical DataConceptual- Categorical Data
TheoryTheory used in order to extract clustersFuzzy ClusteringCrisp ClusteringKohonen net clustering
A Categorization of Clustering MethodsA Categorization of Clustering Methods
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 27
Agent Intelligence through Data Mining
18/09/2006 62ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
PartitionalPartitional method:method: Decompose the data set into a set of k disjoint clusters.
Problem DefinitionProblem Definition
Given an integer k, find a partition of k clusters that optimizes the chosen partitioning criterion
Partitioning Algorithms: Basic ConceptPartitioning Algorithms: Basic Concept
ℜ: Best partitioning algorithm representative:
The KK--MeansMeans Method
Agent Intelligence through Data Mining
18/09/2006 63ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
E x mK k c xk
k= −∑ ( )
2
The KThe K--Means Clustering MethodMeans Clustering Method
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 28
Agent Intelligence through Data Mining
18/09/2006 64ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AdvantagesAdvantagesRelatively efficient: O(tkn), where n is the number of objects,
k is the number of clusters, and t is the number of iterations.
Normally, k, t << n.
Often terminates at a local optimum.
ProblemsProblemsApplicable only to numerical data sets
Need to specify the number of clusters in advance
Unable to handle noisy data and outliers
Not suitable to discover clusters with non-convex shapes
Comments on the KComments on the K--Means MethodMeans Method
Agent Intelligence through Data Mining
18/09/2006 65ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A way of overcoming the K-Means Method problems is the K-Medoids method
The KThe K--MedoidsMedoids Clustering MethodClustering Method
Medoid: a representative object in clusters.
Most Known K-Medoids Algorithms:
PAMPAM (Kaufmann & (Kaufmann & RousseeuwRousseeuw, 1987), 1987)
CLARACLARA (Kaufmann & (Kaufmann & RousseeuwRousseeuw, 1990), 1990)
CLARANSCLARANS (Ng & Han, 1994): Randomized sampling(Ng & Han, 1994): Randomized sampling
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 29
Agent Intelligence through Data Mining
18/09/2006 66ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Hierarchical Clustering AlgorithmsHierarchical Clustering Algorithms
BIRCH (1996):BIRCH (1996): uses CF-tree and incrementally adjusts the quality of sub-clusters
CURE (1998):CURE (1998): is robust to outliers and identifies clusters of non-spherical shapes.
ROCK (1999):ROCK (1999): is a robust clustering algorithm for Boolean and categorical data. It introduces two new concepts, that is a point's neighboursneighbours and linkslinks.
Agent Intelligence through Data Mining
18/09/2006 68ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
BIRCH BIRCH -- CFCF
What is the CF? A triplet summarizing information about subclusters of objects. The CF of a subcluster is defined as:
CF = (N, LS, SS)
N: Number of data points
LS: ∑Ni=1=Xi SS: ∑N
i=1=Xi2
Scales linearly:Scales linearly: Finds a good clustering with a single scan and improves the quality with a few additional scans.
Problem:Problem: Handles only numeric data, and is sensitive to the order of the data record.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 30
Agent Intelligence through Data Mining
18/09/2006 70ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ClusteringClustering based on density (local cluster criterion), such as density-connected points.
Major features:Discover clusters of arbitrary shapeHandle noiseNeed density parameters as termination condition
Representative algorithms:
DBSCANDBSCAN: Ester, et al. (KDD’96)DENCLUEDENCLUE: Hinneburg & D. Keim (KDD’98)
DensityDensity--Based Clustering AlgorithmsBased Clustering Algorithms
Agent Intelligence through Data Mining
18/09/2006 71ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Relies on a densitydensity--basedbased notion of cluster: A clusterclusteris defined as a maximal set of density-connected points
Discovers clusters of arbitrary shape in spatial databases with noise
Core
Border
Outlier
Eps = 1cm
MinPts = 5
DBSCANDBSCAN
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 31
Agent Intelligence through Data Mining
18/09/2006 72ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
1. Arbitrary select a point p
2. Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
3. If p is a core point, a cluster is formed.
4. If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.
5. Continue the process until all points are processed.
DBSCAN: The AlgorithmDBSCAN: The Algorithm
Agent Intelligence through Data Mining
18/09/2006 73ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 4 Part 4 -- ClassificationClassification(Supervised Learning)(Supervised Learning)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 32
Agent Intelligence through Data Mining
18/09/2006 74ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ClassificationClassification can be described as a function that maps (classifies) a data item into one of the several predefined classes.
RequirementsRequirementsA well-defined set of classesA training set of pre-classified examples characterize
the classification.
Classification Classification
GoalGoalInduce a model that can be used to classify future
data items whose classification is unknown.
Agent Intelligence through Data Mining
18/09/2006 75ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Bayesian classification
Decision Trees
Neural Networks
Classification MethodsClassification Methods
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 33
Agent Intelligence through Data Mining
18/09/2006 76ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Bayesian classificationBayesian classification
Aim: To classify a sample x to one of the given classes c1, c2,.., cNusing a probability model defined according to Bayes theory
RequirementsRequirementsA priori probability for each class ci. Conditional probability density function p(x/ci)∈[0,1]
⇓⇓ BayesBayes FormulaFormula
Posterior probability
A pattern is classified in the class with the highest posterior probability.
ProblemProblem:: Complete knowledge of probability laws is necessary in order to perform the classification
( ) ( )( ) ( )
q c xp x c p c
p x c p ci
i i
i ij
C( / )/
/=
=∑
1
Agent Intelligence through Data Mining
18/09/2006 77ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
RequirementsRequirementsClusters (categories)A training set of pre-classified data.
One of the most widely used techniques for classification and prediction.
Decision TreesDecision Trees
CharacteristicsCharacteristics
Internal node: A test of an attribute
Branch descending of a node: One of the possible values for this attribute
Leaf: One of the defined classes
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 34
Agent Intelligence through Data Mining
18/09/2006 78ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Decision TreesDecision Trees
A decision tree example:
ooututlooklook
sunny
humidityhumidityovercast rain
windwind
PPhigh normal
PPNN
true false
NN PP
The mechanismBuilding phase: The training data set is recursively partitioned until
all the instances in a partition have the same class
Pruning phase: Nodes are pruned to prevent over fitting and to obtain a tree with higher accuracy
Agent Intelligence through Data Mining
18/09/2006 79ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Decision tree algorithms differ on the test criterion for partitioning a set of records
ID3, C4.5:ID3, C4.5: Information gain algorithms
CLS:CLS: Examines the solution space of all possible decision trees to some fixed depth. It selects a test that minimizes the computational cost of classifying a record.
SLIQ, SPRINT:SLIQ, SPRINT: select the attribute to test, based on the GINI index
Decision Tree AlgorithmsDecision Tree Algorithms
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 35
Agent Intelligence through Data Mining
18/09/2006 80ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The mechanismStep 1:
If all instances in C are positive, then create YES node and halt.If all instances in C are negative, create a NO node and halt.Otherwise select an attribute, A with values v1, ..., vn and create
a decision node.
Step 2:Partition the training instances in C into subsets C1, C2,...,Cn
according to the values of A.
Step 3: Apply the algorithm recursively to each of the sets Ci.
ID3 : The AlgorithmID3 : The Algorithm
Agent Intelligence through Data Mining
18/09/2006 81ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A statistical property, called information gaininformation gain, is used.
The information needed to identify the class of an element of S, called Entropy of SEntropy of S, is:
ID3: Definitions (1/2)ID3: Definitions (1/2)
ℜℜ: : How does ID3 decide which How does ID3 decide which attribute is the best?attribute is the best?
where p(I) is the proportion of S belonging to class I.
2( ) ( )log ( )Info S p I p I=∑
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 36
Agent Intelligence through Data Mining
18/09/2006 82ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ID3: Definitions (2/2)ID3: Definitions (2/2)
The information needed to identify the class of an element of S, Info (S,A)Info (S,A), , after we partition S on basis of the value of an attribute A into sets Sv:
Gain(S, A) = Info(S) – Info ( S,A)
Gain(S, A)Gain(S, A) is information gain of example set S on attribute A.
( , ) [( / ) ( )]v vInfo S A S S Entropy S= ×∑
Agent Intelligence through Data Mining
18/09/2006 83ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Outlook Temperature Humidity Wind Play_ballSunny Hot High Weak NoSunny Hot High Strong NoOvercast Hot High Weak YesRain Mild High Weak YesRain Cool Normal Weak YesRain Cool Normal Strong NoOvercast Cool Normal Strong YesSunny Mild High Weak NoSunny Cool Normal Weak YesRain Mild Normal Weak YesSunny Mild Normal Strong YesOvercast Mild High Strong YesOvercast Hot Normal Weak YesRain Mild High Strong No
ID3: ExampleID3: Example
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 37
Agent Intelligence through Data Mining
18/09/2006 84ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
OutlookOutlook
Sunny RainyOvercast
?
Which attribute will be the root node ?Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
ID3 : Attribute SelectionID3 : Attribute Selection
Gain(Sunny, Humidity) = 0.970
Gain(Sunny, Temperature) = 0.570
Gain(Sunny, Wind) = 0.019
NoNo
OutlookOutlook
Sunny RainyOvercast
HumidityHumidity
High Normal
YesYes
Agent Intelligence through Data Mining
18/09/2006 85ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
OutlookOutlook
Sunny RainOvercast
HumidityHumidity
High Normal
NoNo YesYes
YesYesWindWind
Strong Weak
NoNo YesYes
ID3 Example : Final Decision TreeID3 Example : Final Decision Tree
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 38
Agent Intelligence through Data Mining
18/09/2006 86ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
C4.5C4.5 is an extension of ID3extension of ID3 that accounts for:
Unavailable values
Continuous attribute values ranges
Pruning of decision trees
Rule derivation
C4.5C4.5
Agent Intelligence through Data Mining
18/09/2006 87ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
PART 5 PART 5 ––Association RulesAssociation Rules
SS1 1 ⇒⇒ SS22
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 39
Agent Intelligence through Data Mining
18/09/2006 88ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The problem:
Given:1) database of transactions, 2) each transaction is a list of items (purchased by a customer in a visit)
Find: All rules that correlate the presence of one set of items with that of another set of items
Association mining: The ideaAssociation mining: The idea……
Agent Intelligence through Data Mining
18/09/2006 89ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Association miningFinding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
Association mining: The solutionAssociation mining: The solution……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 40
Agent Intelligence through Data Mining
18/09/2006 90ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The basic concepts on rule miningThe basic concepts on rule mining……
Support:Support: There is enough support for the rule S1=>S2 if the number of records whose attributes include S1 or S2 is at least minsupp.
Confidence: Confidence: We have enough confidence in an association rule if the ratio of records having attributes that include S1 or S2 over records having attributes that include S1 is at least minconf.
Agent Intelligence through Data Mining
18/09/2006 91ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Association Rule: a definitionAssociation Rule: a definition
An An association ruleassociation rule is an expression of the form is an expression of the form S1=>S2 where where S1, S2 are sets of attributes with are sets of attributes with
sufficient support and confidence.sufficient support and confidence.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 41
Agent Intelligence through Data Mining
18/09/2006 92ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
An association rule A=>BA=>B is thought to be interestinginteresting when:
P(A∩B)P(A) - P(B)>d
where dd is an appropriate value
When to pick an association rule?When to pick an association rule?
Agent Intelligence through Data Mining
18/09/2006 93ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
TransactionID Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F
Customerbuys diaper
Customerbuys both
Customerbuys beer
What is the support and confidence for the rules:1) A ⇒ C 2) C ⇒ A
1) A and C appear in 2 of the 4 transactions, so support is 50%. In addition, A appears in 3 transactions, two of which contain C too, so confidence is 2/3=66,6%.
2) A and C appear in 2 of the 4 transactions, so support is 50%. In addition, A appears in all the transactions that C appears, so confidence is 2/2=100%.
Counting Confidence and SupportCounting Confidence and Support……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 42
Agent Intelligence through Data Mining
18/09/2006 94ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Frequent itemsets: Sets of items that have minimum support
A subset of a frequent itemset must also be a frequent itemset
i.e., if A B is a frequent itemset, both A and Bshould be frequent itemsets
Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)
Use the frequent itemsets to generate association rules.
Mining Frequent Mining Frequent ItemsetsItemsets::the Key Stepthe Key Step
Agent Intelligence through Data Mining
18/09/2006 95ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Association rule algorithmsAssociation rule algorithms
AprioriApriori
DHPDHP
TrieTrie Data StructureData Structure
Iceberg QueriesIceberg Queries
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 43
Agent Intelligence through Data Mining
18/09/2006 96ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Procedure Synopsis:Procedure Synopsis:Constructs a set of large itemsCounts the number of each set’s appearancesDetermines large itemsets on a predetermined minsupp
The major StepsThe major Steps……1) Join Step: Ck is generated by joining Lk-1 with itself
2) Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
The The AprioriApriori AlgorithmAlgorithm
Agent Intelligence through Data Mining
18/09/2006 97ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Candidate generationCandidate generation--Pseudo CodePseudo Code
In order to estimate Ck, Apriori uses Lk-1× Lk-1. Then Ck consists of all (k-1)-itemsets.
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 : frequent items;
1
2
L⎡ ⎤⎢ ⎥⎣ ⎦
Pseudo-code:for (k = 1; Lk !=∅; k++) do beginCk+1 = candidates generated from Lk;for each transaction t in database doincrement the count of all candidates in Ck+1 that are
contained in tLk+1 = candidates in Ck+1 with min_support
endreturn ∪k Lk;
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 44
Agent Intelligence through Data Mining
18/09/2006 98ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
1. L3=abc, abd, acd, ace, bcd
2. Self-joining: L3 × L3
abcd from abc and abd
acde from acd and ace
3. Pruning:
acde is removed because ade is not in L3
4. C4=abcd
Candidate Generation ExampleCandidate Generation Example……
Agent Intelligence through Data Mining
18/09/2006 99ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Association Rule ExtractionAssociation Rule ExtractionAn ExampleAn Example……
TID Items100 1 3 4200 2 3 5300 1 2 3 5400 2 5
itemset sup.1 22 33 34 1
5 3
itemset sup.
1 22 33 3
5 3
Scan D
C1 L1
itemset
1 21 31 52 3
2 53 5
itemset sup1 2 11 3 21 5 12 3 2
2 5 33 5 2
itemset sup1 3 22 3 22 5 33 5 2
L2C2
C2
Scan D
C3 L3
itemset2 3 5 Scan D
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 45
Agent Intelligence through Data Mining
18/09/2006 100ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent
Transaction reduction: A transaction that does not contain any frequent k-itemset is useless in subsequent scans
Partitioning: Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
Sampling: Mine on a subset of given data, lower support threshold + a method to determine the completeness
Dynamic itemset counting: Add new candidate itemsetsonly when all of their subsets are estimated to be frequent
Methods to Improve Methods to Improve AprioriApriori’’ss EfficiencyEfficiency
Agent Intelligence through Data Mining
18/09/2006 101ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
DHPDHP
Creates a hashing table that controls the legitimacy of k-itemsets and reduces dramatically the size of C2 and the itemset database in result.
FTDAFTDA It is used in order to extract association rules on
quantitative values. Fuzzy methods are applied and fuzzy rules are extracted.
Integration Algorithms (1/2)Integration Algorithms (1/2)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 46
Agent Intelligence through Data Mining
18/09/2006 102ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
PartitionPartition
Divides the itemset database and applies only two scans in order to execute the algorithm.
DICDIC
Optimization through dynamic itemset counting.
TRIE Data StructureTRIE Data Structure Used to create covers: itemsets that have greater than or
equal to a specified minimum support threshold. Extracts exclusive association rules.
Integration Algorithms (2/2)Integration Algorithms (2/2)
Agent Intelligence through Data Mining
18/09/2006 103ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Exclusive Association RulesExclusive Association Rules
Definition:We say that, is an excluding association, if S1, S2 and S3 are mutually disjoint and the following conditions hold:
S1 ∪ S2 => S3
supp(S1 ∪S3)supp(S1) <minconf1.
2. supp(S1 ∪S3) - supp(S1 ∪S2∪S3) ≥ minsupp
3. supp(S1 ∪S3) - supp(S1 ∪S2∪S3)
supp(S1) - supp(S1 ∪S2)≥ minconf
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 47
Agent Intelligence through Data Mining
18/09/2006 105ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 7 Part 7 --Data Mining and SemanticsData Mining and Semantics
Agent Intelligence through Data Mining
18/09/2006 106ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The different perspectives on The different perspectives on SemanticsSemantics
The DM community refers to semantics as a way of representing knowledge by the use of “some” unified language
The AI community deals with semantics as a way of creating meta-knowledge through the refinement of decision structures
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 48
Agent Intelligence through Data Mining
18/09/2006 107ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
DM Perspective DM Perspective –– The ReasonsThe Reasons……
Each DM tool uses a different way of describing knowledge (same metrics, same concepts)
Not all tools provide all dm techniques
In order to incorporate results in your system, reusability is now mandatory
Agent Intelligence through Data Mining
18/09/2006 108ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The SolutionThe Solution……
The development of a modeling language, that would allow users to:
Develop models within one vendor's applicationUse other vendors' applications to test, visualize, analyze, evaluate or otherwise use the models
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 49
Agent Intelligence through Data Mining
18/09/2006 109ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The LanguageThe Language……
PMMLPMMLPredictive Markup Model Language
http://www.dmg.org
Developed by the DMG (Oracle, SAS, SPSS, MineIt, etc.)XML-based language, with its own DTD, mistakes are difficult to occurIt has a stable version (2.0)
Agent Intelligence through Data Mining
18/09/2006 110ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A PMML document exampleA PMML document example
The Play_Golf model: A model that decides whether the golf game should be played, depending on humidity, wind, temperature and weather outlook.
After performing data mining on the Play_Golf dataset, the resulting decision tree is:
And the resulting PMML document is :
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 50
Agent Intelligence through Data Mining
18/09/2006 111ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AI perspective AI perspective ––The ReasonsThe Reasons……
In order to incorporate meta-knowledge into data mining, new interestingness measures are applied, in order to indicate the validity of the knowledge extracted, according to domain understanding.
Fuzzy weights
Penalty matrices
Agent Intelligence through Data Mining
18/09/2006 112ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 8 Part 8 --New data mining algorithms: New data mining algorithms:
the AI waythe AI way……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 51
Agent Intelligence through Data Mining
18/09/2006 113ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The new agentThe new agent--oriented oriented algorithmalgorithm……
The need to develop an algorithm that deals with agent actions has led us to the development of κ-Profile.
κ-Profile is a data mining mechanism that was first introduced for the dynamic segregation of web roaming attitudes.
It is now being adapted in order to be able to predict agent behaviors and actions.
Agent Intelligence through Data Mining
18/09/2006 114ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The web roaming mechanismThe web roaming mechanism……
OfflineData extractionData preprocessingTransaction identificationPageview identificationClusteringProfiling
OnlineActive sessionRecommendation
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 52
Agent Intelligence through Data Mining
18/09/2006 115ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Techniques and Algorithms usedTechniques and Algorithms used
Weighted vectorsFIS (Fuzzy Inference System)
ClusteringSimple k-means Maximin
Profiling Profiling The k-Profiler
RecommendationRecommendation
EvaluationEvaluation
Agent Intelligence through Data Mining
18/09/2006 116ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data PreprocessingData Preprocessing
Transaction vectorsOutliersRobots, spiders, etc.
Pageview vectorsHomepageBanners, etc.Very frequent pagesVery rare pages
Fuzziness
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 53
Agent Intelligence through Data Mining
18/09/2006 117ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ClusteringClustering andand ProfilingProfiling
MaximinDecides on the number of clustersFinds primary cluster centers
Simple k-meansCreates transaction clusters
K-ProfilerCreates the transaction profiles
Agent Intelligence through Data Mining
18/09/2006 118ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
κκ--Profiler Profiler –– From theoryFrom theory......T = t1, t2, …, tm – Transaction Set
P = p1, p2, …, pn – Pageview Set
t = <w(p1,t), …, w(pn,t)> – Transaction Vector
TC = c1, c2, ... , ck – Transaction Groups
prc = <p,weight(p, prc)> | p∈P, weight(p, prc) ≥ μ– User Profiles
weight(p, prc) =1
( , )| | t C
w p tc ∈⋅ ∑
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 54
Agent Intelligence through Data Mining
18/09/2006 119ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ProfileProfile 11
GroupGroup 11t1 = p1, p2
t2 = p3, p7
t3 = p1, p7X
X
X
p1: (1+1)/3 = 66,67%
p2: 1/3 = 33,33%
p3: 1/3 = 33,33%
p7: (1+1)/3 = 66,67%
k-Profilerfilter >50%
(p1, 0.6667)
(p7, 0.6667)
... ... to practiceto practice
Agent Intelligence through Data Mining
18/09/2006 120ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The Recommendation engineThe Recommendation engine
User profile C = w , w , …, w
where w =
match(S, C) =
Rec(S, p) =
UREC(S) = w | C∈UP, and Rec(S, w) ≥ ρ
1C
2C C
n
( , ),
0,i iweight p C p C
otherwise
∈⎧⎨⎩
2 2( ) ( )
Ck k
k
Ck k
k k
w S
S w
⋅
×
∑
∑ ∑
( , ) ( , )weight p C match S C⋅
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 55
Agent Intelligence through Data Mining
18/09/2006 121ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Work on agentsWork on agents
Modeling agent actions according to the ‘κ-Profile’ paradigmImplementation of the agent behavior that shall incorporate the recommendation engine into the agentsTesting and dissemination…
Agent Intelligence through Data Mining
18/09/2006 122ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 9 Part 9 --Data Mining Applications Data Mining Applications
and Trendsand Trends
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 56
Agent Intelligence through Data Mining
18/09/2006 123ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web miningBiomedical and DNA data analysisFinancial data analysisRetail industryTelecommunication industry
Market analysis and management
Risk analysis and management
Fraud detection and management
Basic Data Mining application domainsBasic Data Mining application domains
Agent Intelligence through Data Mining
18/09/2006 124ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web mining Web mining -- Data on the WebData on the Web
Primary data (Web content):Mainly text, with some multimedia content and mark-up commands.Underlying databases (not directly accessible).
Secondary data (Web usage):Access logs collected by a Web server.A variety of navigational information collected by Web clients.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 57
Agent Intelligence through Data Mining
18/09/2006 125ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Approaches to Web miningApproaches to Web mining
Web content mining:Pattern discovery in Web content data.Mainly mining unstructured textual data.
Web structure miningPattern discovery in the Web graph.The graph is defined by the hyperlinks.
Web usage miningDiscovery of interesting usage patterns.Mainly in server logs.
Agent Intelligence through Data Mining
18/09/2006 126ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web content miningWeb content mining
Information AccessDocument category modelingConstruction of thematic hierarchies
Fact ExtractionExtraction of product information, presented in different
formats
Information ExtractionStructured “event” summaries from large textual corpora
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 58
Agent Intelligence through Data Mining
18/09/2006 127ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web structure mining (1/2)Web structure mining (1/2)
Information retrieval can be improved by:Identifying authoritative pages.Identifying resource index pages.Summarizing common references.
Linked pages often contain complementary information
(e.g. product offers).
Structural analysis of a Web site facilitates its
improvement.
Agent Intelligence through Data Mining
18/09/2006 128ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web structure mining (2/2)Web structure mining (2/2)
Social network analysis:Nodes with large fan-in (authorities) provide high quality information.Nodes with large fan-out (hubs) are good starting points.
Disconnected subgraphs correspond to different
social (e.g. research) communities.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 59
Agent Intelligence through Data Mining
18/09/2006 129ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web Usage Mining (1/2)Web Usage Mining (1/2)
Personalization:Better service for the user:
Reduction of the information overload.More accurate information retrieval and extraction.
Customer relationship management:Customer segmentation and targeted advertisement.Customer attraction and retention strategy.Service improvement (site structure and content).
Agent Intelligence through Data Mining
18/09/2006 130ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Web Usage Mining (2/2)Web Usage Mining (2/2)
Data collection
Data pre-processing
Pattern discovery
Knowledge post-processing
Collection of usage data by the server and the client.
Data cleaning, user identification, session identification
Construction of user models
Report generation, visualization, personalization module.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 60
Agent Intelligence through Data Mining
18/09/2006 131ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
DNA sequences: 4 basic building blocks (nucleotides):
adenine (A) cytosine (C) guanine (G)thymine (T)
Gene: a sequence of hundreds of individual nucleotides arranged in a particular order
Biomedical Data Mining and DNA Biomedical Data Mining and DNA Analysis (1/2)Analysis (1/2)
Agent Intelligence through Data Mining
18/09/2006 132ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Tremendous number of ways that the nucleotides can be ordered and sequenced to form distinct genesSemantic integration of heterogeneous, distributed genome databases
Current: highly distributed, uncontrolled generation and use of a wide variety of DNA dataData cleaning and data integration methods developed in data mining will help
Biomedical Data Mining and DNA Biomedical Data Mining and DNA Analysis (2/2)Analysis (2/2)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 61
Agent Intelligence through Data Mining
18/09/2006 133ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining for Financial Data Analysis (1/2)Data Mining for Financial Data Analysis (1/2)
Financial data collected in banks and financial institutions areoften relatively complete, reliable, and of high quality
Design and construction of data warehouses for multidimensional data analysis and data mining
View the debt and revenue changes by month, by region, by sector, and by other factorsAccess statistical information such as max, min, total, average, trend, etc.
Loan payment prediction/consumer credit policy analysis
feature selection and attribute relevance rankingLoan payment performanceConsumer credit rating
Agent Intelligence through Data Mining
18/09/2006 134ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Classification and clustering of customers for targeted marketing
Multi-dimensional segmentation by nearest-neighbor, classification, decision trees, etc. to identify customer groupsor associate a new customer to an appropriate customer group
Detection of money laundering and other financial crimes
Integration of data from multiple DBs (e.g., bank transactions, federal/state crime history DBs)Tools: Data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools (find unusual access sequences)
Data Mining for Financial Data Analysis (2/2)Data Mining for Financial Data Analysis (2/2)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 62
Agent Intelligence through Data Mining
18/09/2006 135ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Retail industry: Huge amounts of data on sales, customer shopping history, etc.
Applications of retail data mining:Identify customer buying behaviors
Discover customer shopping patterns and trends
Improve the quality of customer service
Achieve better customer retention and satisfaction
Enhance goods consumption ratios
Design more effective goods transportation and distribution policies
Data Mining for Retail IndustryData Mining for Retail Industry
Agent Intelligence through Data Mining
18/09/2006 136ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A rapidly expanding and highly competitive industry and a great demand for data mining
Understand the business involved
Identify telecommunication patterns
Catch fraudulent activities
Make better use of resources
Improve the quality of service
Multidimensional analysis of telecommunication dataIntrinsically multidimensional: calling-time, duration, location of caller, location of callee, type of call, etc.
Data Mining for TelecommunicationsData Mining for Telecommunications
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 63
Agent Intelligence through Data Mining
18/09/2006 137ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data types:Relational
Transactional
Text
Time sequence
Spatial, etc.
Data sources:ASCII text files, multiple relational data sourcessupport ODBC connections (OLE DB, JDBC)?
Need for multiple dimensional views in selection
Data Mining Tools Data Mining Tools -- Criteria (1/3)Criteria (1/3)
Agent Intelligence through Data Mining
18/09/2006 138ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Tools Data Mining Tools -- Criteria (2/3)Criteria (2/3)
Data mining functions and methodologies:One vs. multiple data mining functionsOne vs. variety of methods per function
Coupling with DB and/or data warehouse systems:Four forms of coupling: no coupling, loose coupling, semi-tight coupling, and tight coupling.
ScalabilityRow (or database size) scalabilityColumn (or dimension) scalabilityCourse of dimensionality: it is much more challenging to make a
system column scalable that row scalable
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 64
Agent Intelligence through Data Mining
18/09/2006 139ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Visualization tools“A picture is worth a thousand words”Visualization categories: data visualization, mining result
visualization, mining process visualization, and visual data mining
System issuesRunning on only one or on several operating systems?Client/server architecture?Provide Web-based interfaces and allow XML data as input
and/or output?
Data Mining Tools Data Mining Tools -- Criteria (3/3)Criteria (3/3)
Agent Intelligence through Data Mining
18/09/2006 140ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The Generations of DM systemsThe Generations of DM systems
First generation: Systems that performed classification or clustering and were based on a certain algorithm
Second generation: Systems that provide better support throughout the whole KDD process
Third generation: Systems that deal with the business end user rather than advanced data analysis
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 65
Agent Intelligence through Data Mining
18/09/2006 143ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 10 Part 10 ––Intelligent Agent Intelligent Agent
TechnologyTechnology
Agent Intelligence through Data Mining
18/09/2006 144ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Positioning agents in the software Positioning agents in the software development processdevelopment process
Agent technology is the next step in object-oriented programming.
It satisfies all the requirements, while it supports major key properties, since agents are:
autonomous, goal-oriented, cooperative, communicative, adaptive…
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 66
Agent Intelligence through Data Mining
18/09/2006 145ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agents: A systemAgents: A system--building paradigmbuilding paradigm
RelatedTechnologies
Distributed SystemsDistributed Systems
Mobile CodeMobile Code
Database &Database &Knowledge base Knowledge base
TechnologyTechnologyMachine LearningMachine Learning
AI & CognitiveAI & CognitiveScienceScience
Information RetrievalInformation Retrieval
agents2004
objects1982
structured programming
1974= =
Agent Intelligence through Data Mining
18/09/2006 146ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Intelligent Agents: An IntroIntelligent Agents: An Intro……
Intelligent Agents (IAs) are considered to be very important because they promise to change the way that people interact with computers.
The underpinning concepts of IAs can be traced back to the early years of Artificial Intelligence -40 years ago
Research on IAs is considered to be in its first stages
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 67
Agent Intelligence through Data Mining
18/09/2006 147ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Definitions of AgentsDefinitions of Agents
There is not a consensus definition of IAs!!!
There are several operational definitions:Dictionary DefinitionCommon Definition PatternsDefinition of IAs based on their characteristics
Agent Intelligence through Data Mining
18/09/2006 148ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Dictionary DefinitionDictionary Definition
“An agent is one who acts for or in An agent is one who acts for or in the place of another by authority from the place of another by authority from
himhim…… “
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 68
Agent Intelligence through Data Mining
18/09/2006 149ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Common Definition PatternsCommon Definition Patterns
An agent is a software entity that…Acts autonomously in a goal-oriented mannerFunctions continuouslyActs on behalf of another entity (human, IA)Is able to perceive its environment through sensors and act on it through effectorsEmploys some degree of knowledge
Agent Intelligence through Data Mining
18/09/2006 150ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Attributes of an Agent (1/2)Attributes of an Agent (1/2)
AutonomyInteractivity
ReactivityPro-activity
AdaptivitySociabilityCollaborative Behavior
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 69
Agent Intelligence through Data Mining
18/09/2006 151ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Attributes of an Agent (2/2)Attributes of an Agent (2/2)
CompetitivenessTemporal ContinuityPersonalityMobilityLearning
Agent Intelligence through Data Mining
18/09/2006 152ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Taxonomies of Agents (1/2)Taxonomies of Agents (1/2)
Nwana’s 3 dimensional classification:
Collaborative Collaborative Learning AgentsLearning AgentsSmartSmart
AgentsAgents
Collaborative Collaborative AgentsAgents Interface AgentsInterface Agents
LearnCooperate
Autonomous
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 70
Agent Intelligence through Data Mining
18/09/2006 153ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Taxonomies of Agents (2/2)Taxonomies of Agents (2/2)
We can also classify agents by:What they do (Web Search, Information Filtering, Notification, Financial Services, Entertainment, etc.)
Where they act (Desktop, Internet, Intranet)
The degree of the characteristics they enjoy(Mobile, Collaborative, Reactive, etc.)
A Combination of the above (Hybrid Agents)
Agent Intelligence through Data Mining
18/09/2006 154ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Underlying TechnologiesUnderlying Technologies
Technologies for Developing AgentsInternal InfrastructureProgramming LanguagesStandards for distributed computing
Agents Communication
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 71
Agent Intelligence through Data Mining
18/09/2006 155ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Internal InfrastructuresInternal Infrastructures
Knowledge-based expert systems
Agents and Objects
AI technologies (Neural Networks, Genetic Algorithms, Fuzzy Systems)
Agent Intelligence through Data Mining
18/09/2006 156ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Programming LanguagesProgramming Languages
TelescriptSmalltalkJAVAC++Other languages and development frameworks
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 72
Agent Intelligence through Data Mining
18/09/2006 157ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Standards for Distributed SystemsStandards for Distributed Systems
CORBA (Common Object Request Broker Architecture)
DCOM (Distributed Component Object Model)
JAVA / RMI (Remote Method Invocation)
Agent Intelligence through Data Mining
18/09/2006 158ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent CommunicationAgent Communication
KQML (Knowledge and Querying Manipulation Language)
KIF (Knowledge Interchange Format)
FIPA (Agent Communication Language)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 73
Agent Intelligence through Data Mining
18/09/2006 159ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Some Important Issues of Agent DesignSome Important Issues of Agent Design
Multi-agent architecture vs. Single-agent architecture
Mobility
Security
Agent Intelligence through Data Mining
18/09/2006 160ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why MAS instead of a Single Agent? (1/2)Why MAS instead of a Single Agent? (1/2)
A single Agent that handles a great amount of tasks lacks performance, reliability, maintainability, etc.
MAS can provide modularity, flexibility, modifiability, extensibility, due to Distributed Environments
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 74
Agent Intelligence through Data Mining
18/09/2006 161ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A single Agent cannot obtain extensive and specialized knowledge
A MAS can access more knowledge resources
Applications requiring Distributed Computing are better supported by MAS
Intelligence, in neuroscience terms, can be approached by a multi-processing system such as a wide distributed environment of MAS
Why MAS instead of a Single Agent? (2/2)Why MAS instead of a Single Agent? (2/2)
Agent Intelligence through Data Mining
18/09/2006 162ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
MobilityMobility
Mobility: The transportation of an Agent to remote services resourcesRationale for mobility: Improved performance
New requirements are introduced:Presence of an agent server
Security
Common standards for IAs Communication
Complexities about system maintenance and IAsidentification in a distributed environment
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 75
Agent Intelligence through Data Mining
18/09/2006 163ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
SecuritySecurity
Unauthorized disclosure
Unauthorized alteration
Unauthorized damage
Unauthorized copy and replay
Denial of Service
Repudiation
Spoofing and Masquerading
Agent Intelligence through Data Mining
18/09/2006 164ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Current Status of Agent Applications (2/2)Current Status of Agent Applications (2/2)
Electronic Commerce
Information Retrieval and Knowledge Management
Mobile Computing
Planning and Scheduling
Scientific Applications
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 76
Agent Intelligence through Data Mining
18/09/2006 165ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Distributed Project Management
Manufacturing
Networking
Other applications (economics, business, military, etc.)
Tools for Agent Development
Current Status of Agent ApplicationsCurrent Status of Agent Applications
Agent Intelligence through Data Mining
18/09/2006 166ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Standards and OrganizationsStandards and Organizations
AgentCities
AgentLink
FIPA
Knowledge Sharing Effort
OMG
Other Organizations (Agentx Working Group, International Foundation for MAS, Ontology.Org)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 77
Agent Intelligence through Data Mining
18/09/2006 168ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 11 Part 11 ––Agent Intelligence & Data Agent Intelligence & Data
MiningMining
Agent Intelligence through Data Mining
18/09/2006 169ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The main problemThe main problem……
Data Mining Intelligent Agents
Do they need each other?Do they need each other?
Symbiosis?Symbiosis?
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 78
Agent Intelligence through Data Mining
18/09/2006 170ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Reasoning Agent
Software Agent Paradigm
Knowledge Discovery Roadmap
AgentAgent--oriented Software Engineeringoriented Software Engineering
Historical Data Knowledge Model
ApplicationDomain Agent Modelling
AgentType
InferenceStructure
Agent Intelligence through Data Mining
18/09/2006 171ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The main ideaThe main idea......
Data / Problem model Knowledge Extraction
Knowledge Model 1
Knowledge Model 2
Knowledge Model n
…….
Agent Modeling
..…
Agent Type 1
Agent Type 2
Agent Type m
Agent A Agent B
++
Agent EAgent D
Agent C
++
Agent F
++
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 79
Agent Intelligence through Data Mining
18/09/2006 172ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The main goalThe main goal
The development of a unified methodology that:Takes logic limitations into accountIs supported by the appropriate toolsHas been applied to a satisfactory number of applications, in order to:
Provide the capability of dynamically incorporating knowledge to SAs and MAS. This knowledge has been
extracted with the use of DM techniques
Agent Intelligence through Data Mining
18/09/2006 173ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Knowledge diffusion levelsKnowledge diffusion levels
Extracting knowledge on a MAS application level:Perform data mining techniques on application data, in order to discover useful rules – associations – patterns.
Extracting knowledge on a MAS behavioral level:Perform data mining techniques on agent behavior data, in order to predict their behaviors and, thus, reduce system work load. The extracted knowledge is related to agent actions.
Extracting knowledge on evolutionary agent communities:Deploy evolutionary DM techniques, in order to study societal issues. It has to do with the satisfaction of the goal of a community, which evolves and learns through interaction.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 80
Agent Intelligence through Data Mining
18/09/2006 174ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Defining trainingDefining training
Training: The process of dynamically incorporating DM-extracted knowledge models
to SAs and MAS.
Agent Intelligence through Data Mining
18/09/2006 175ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The MethodologyThe Methodology
Common steps
BehaviorLevel
ApplicationLevel
EvolutionaryCommunities
Multi-agent system
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 81
Agent Intelligence through Data Mining
18/09/2006 176ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The Common stepsThe Common steps
Create agents
Create agent types
Create agent behaviors
Extract knowledge model
Apply data mining techniques
Instantiate application
Agent Monitoring
CreateOntology
Retrain agents
Agent Intelligence through Data Mining
18/09/2006 177ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ApplicationApplication--oriented DMoriented DM……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 82
Agent Intelligence through Data Mining
18/09/2006 178ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ApplicationApplication--oriented DMoriented DM……
Data mining (clustering, classification…) is performed on application-specific data
For example, develop a MAS that decides whether a golf game will be played, depending on humidity, outlook, wind, etc.
The decision is extracted through data mining
Agent Intelligence through Data Mining
18/09/2006 179ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
A significant data volume must be available (the bigger the better)
The DM expert has to decide on the best way to exploit the resulting knowledge
The multi-agent system architecture has to seriously take into consideration specific limitations, such as the safety and soundness of the application.
PrerequisitesPrerequisites……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 83
Agent Intelligence through Data Mining
18/09/2006 180ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AdvantagesAdvantages……
The size of knowledge bases of intelligent agents can dramatically affect their performance.
Agent retraining on new data is feasible
The retraining process can be automatedand dynamic, providing versatility to MAS implementation.
Agent Intelligence through Data Mining
18/09/2006 181ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
An integrated tool for embedding data mining extracted intelligence into agents
is Agent Academy (AA):http://sourceforge.net/projects/agentacademy
Tools and techniquesTools and techniques……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 84
Agent Intelligence through Data Mining
18/09/2006 182ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The AA hypothesisThe AA hypothesis……
Using Agent Academy and its data-mining capabilities, the MAS developer can reduce the effort required to:
a. Develop new or expand existing applications with intelligent agents, and
b. Upgrade them, as needed, by retraining the agents deployed
Agent Intelligence through Data Mining
18/09/2006 183ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The AAThe AA--1.0 architecture1.0 architecture……
Agent Factory
Data A
cquisition
AgentTrainingModule
Agent UseRepository
Agent Request
Agent AcademyAgent Academy
Agent Tracking
DataWEB
Data Miner
Application Data
Agent Behavior Data
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 85
Agent Intelligence through Data Mining
18/09/2006 184ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Components of the AAComponents of the AA--1.0 System1.0 System
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Application Data
Agent Intelligence through Data Mining
18/09/2006 185ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent FactoryAgent Factory
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Application Data
1. Allows developers to design a complete agent-based scenario through GUIs
2. Agent behaviors are modeled as workflow processes and designed through a GUI
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 86
Agent Intelligence through Data Mining
18/09/2006 186ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Use RepositoryAgent Use Repository
instantiated MAS
Agent Academy
DMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
AUR
Application Data1. A database for all the information
• Related with DMM:
i. Application and tracking data
ii. Decision structures
• Related with AF and ATM:
i. Ontologies
ii. Behavior types
iii.Agent types
iv.Agent instances
v. And more…
2. Application data stored in a generic way
3. Data structures defined by the ontologies
Agent Intelligence through Data Mining
18/09/2006 187ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data Mining Module (DMM)Data Mining Module (DMM)
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Application Data
1. DMM applies data mining techniques on “application data” and “tracking data”
2. “Application data” describe the application domain
3. “Tracking data” reveal agent-specific interaction patterns (mainly used for retraining)
4. Selection from a certain data set
5. Choice of specific DM algorithms
6. Allows the user to evaluate, store, visualize data mining results
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 87
Agent Intelligence through Data Mining
18/09/2006 188ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Training Module (ATM)Agent Training Module (ATM)
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Application Data
1. ATM keeps track of
1. Which agent implements which behaviors
2. Which behavior uses which decision structures and physical assets
2. ATM informs the related agents
1. Whenever there is an update in the physical assets or modifications in the behaviors
2. When a change in a decision structure is reported by DMM
Agent Intelligence through Data Mining
18/09/2006 189ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The Integrated PlatformThe Integrated Platform
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
XMLApplication Data
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Extraction of the decision
model
Insertion of the decision model
into agents
Instantiation of a new
MAS Agent with reasoning
Dummy agents
Data storage
Design of agent application
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 88
Agent Intelligence through Data Mining
18/09/2006 190ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
MultiMulti--Agent Scenario CreationAgent Scenario Creation
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
Instantiation of a new
MAS
Agent with reasoning
Dummy agents
DSsPAsOntologies
Insertion of the decision model
into agents
ACL
RMI
Agent Intelligence through Data Mining
18/09/2006 191ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent RetrainingAgent Retraining
Application Developer
AA PlatformAA Platform
Application Owner
Knowledge-base
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 89
Agent Intelligence through Data Mining
18/09/2006 192ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Retraining through DMMRetraining through DMM
instantiated MAS
Agent Academy
AURDMM
ATM
Agent Factory (main GUI)
Protégé 2000-
Agent Types Definition
Behaviour Types Definition
MAS Creation Tool
Ontology definition
JESS Rules new
decision model
decision structureid
RMI
RetrainedAgent
XMLApplication Data1. Decision structures are
modified by DMM
2. DMM passes the id of the modified DS to ATM
3. ATM obtains the content of the modified DS from AUR
4. ATM converts the DS into JESS rules
5. ATM informs the related task agents
Agent Intelligence through Data Mining
18/09/2006 193ECML/PKDD 2006, Berlin, Germany
AUTHAUTHAAAA--1.01.0……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 90
Agent Intelligence through Data Mining
18/09/2006 194ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
NeverthelessNevertheless……
Although ambitious, AA-1.0 was not very easy to use. No software code is produced (agent code is not transparent)Training and retraining is very complicatedAA-1.0 is a MAS itself, inserting limitations on module communicationThis is why AA-2.0 has been released
Agent Intelligence through Data Mining
18/09/2006 195ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
22
AAAA--2.0 Main Frame2.0 Main Frame
Behavior Design Tool
Agent Design Tool
MAS Design Tool
Project Management
(Clean – Build-Run
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 91
Agent Intelligence through Data Mining
18/09/2006 196ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
22
AAAA--2.0 Project Actions2.0 Project Actions
Project Build
Successful
Project Notepad
Behaviors, Types, Agents
etc
Agent Intelligence through Data Mining
18/09/2006 197ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 Behavior Design Tool2.0 Behavior Design ToolStructured
Source Code
Editing
Method Blocks
Behavior Tools`
Generated Source Code
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 92
Agent Intelligence through Data Mining
18/09/2006 198ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 Agent Design Tool2.0 Agent Design Tool
Generated Source Code
Behavior Execution
Debug Tool
Data MiningTool
Agent Behaviors Tab
Agent Intelligence through Data Mining
18/09/2006 199ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 Data Mining Steps2.0 Data Mining Steps
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 93
Agent Intelligence through Data Mining
18/09/2006 200ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 DM Integration2.0 DM Integration
Newly created AI Behavior ready to
be imported
Add AI Behavior Button
Agent Intelligence through Data Mining
18/09/2006 201ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 MAS Design Tool2.0 MAS Design Tool
Add Agents into MAS
Run Agent Scenario
Agents in MAS
Agent State
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 94
Agent Intelligence through Data Mining
18/09/2006 202ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
AAAA--2.0 Running Scenarios2.0 Running Scenarios
Agent is printing a message (Cyclic
Behavior)
Agents is active
Agent present in JADE Platform
Agent Intelligence through Data Mining
18/09/2006 203ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent behaviorAgent behavior--oriented DMoriented DM……
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 95
Agent Intelligence through Data Mining
18/09/2006 204ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why BehaviorWhy Behavior--oriented DM (1/2)oriented DM (1/2)
Interaction among agents is crucial for the efficiency of MAS
New agents enter into the system without the necessary knowledge and skills New agents are not able to learn from the others’behavior It is not possible to define and represent a priori the relevant knowledge the agents need for the interaction
Agent Intelligence through Data Mining
18/09/2006 205ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why BehaviorWhy Behavior--oriented DM (2/2)oriented DM (2/2)
In order to improve its behavior, a new agent should act consistently with the knowledge and the behaviors (culture) of the other agents.
A way for supporting multi-agent interaction based on the idea of
Implicit Culture (IC)http://www.science.unitn.it/~pgiorgio/ic/
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 96
Agent Intelligence through Data Mining
18/09/2006 206ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The main idea of Implicit CultureThe main idea of Implicit Culture……
A situation of the agent environment is represented as a set:
<a, <a, σσ, , tt>>where:
a: set of agentsσ: set of scenes (environment + actions)t: time slot
The goal is to predict the next executed action IC is realized through SICS…
Agent Intelligence through Data Mining
18/09/2006 207ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Systems for Implicit Culture Support (SICS)Systems for Implicit Culture Support (SICS)
SICS main goals are to:Establish the implicit culture phenomenon and
Propose the next expected situated actions based on the already “played” scenes.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 97
Agent Intelligence through Data Mining
18/09/2006 208ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
a
σtα
b
σ’tβ c
σ”tγ
G’G
ε
The SICS architecture...The SICS architecture...
Observer DB
Observer stores in a data base the situated executedactions of the agents of G.
Σ0InductiveModuleΣ
Inductive Module that,using the data of the DB and the a priori theory Σo, induces a cultural constraint theory Σ.
Composer
Composer proposes to a group G’ a set of scenes such that the expected situated actions satisfy Σ.
Agent Intelligence through Data Mining
18/09/2006 209ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
a
bc
G’G
ε
The SICS architecture...The SICS architecture...
Observer DB
Observer stores in a data base the situated executedactions of the agents of G.
Σ0InductiveModuleΣ
Inductive Module that,using the data of the DB and the a priori theory Σo, induces a cultural constraint theory Σ.
Composer
Composer proposes to a group G’ a set of scenes such that the expected situated actions satisfy Σ.
σt+1
σ’t+1
σ”t+1
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 98
Agent Intelligence through Data Mining
18/09/2006 210ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The methodologyThe methodology
CommonSteps
ApplicationLevel
Evolutionary Communities
BehaviorLevel
Multi-agent system
Agent Intelligence through Data Mining
18/09/2006 211ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 99
Agent Intelligence through Data Mining
18/09/2006 212ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Online processOffline process
Behavior level: The training Behavior level: The training frameworkframework
Recommendation Engine
Compare current action sequence
with stored profiles
Produce recommendationsfor agent actions
Agent action monitoring
Agent action dataset processing
Agent action monitoring
Profile storing
Apply Data Mining techniques
Agent action dataset processing
Agent Intelligence through Data Mining
18/09/2006 213ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
κ-Profile is a mechanism, which:
Represents agent actions, Creates agent action profiles, Produces recommendations, andEvaluates them.
The developed DM algorithmThe developed DM algorithm
Apply Maximin
Apply K-Means
Create profiles
Produce recommendations
Evaluate recommendations
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 100
Agent Intelligence through Data Mining
18/09/2006 214ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The parameters to determineThe parameters to determine......
In order to develop a MAS with recommendation abilities (that follows the presented methodology), the parameters that have to be specified are:
1. All possible agent actions.
2. The specific MAS goal.
3. The fuzzy inference engine parameters.
Agent Intelligence through Data Mining
18/09/2006 215ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Knowledge extraction Knowledge extraction in agent communitiesin agent communities
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 101
Agent Intelligence through Data Mining
18/09/2006 216ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The orientationThe orientation……
Self-organization is 100% pure AI (Artificial Intelligence) orientedA lot of research work on this topic…
Agent Intelligence through Data Mining
18/09/2006 217ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The goal of selfThe goal of self--organizing MASorganizing MAS……
To achieve a certain MAS goal, through “self-learning”Self-learning has to do with the penalties or awards the MAS receives whenever it performs an action.The most difficult part in developing a self-organizing MAS is to find the best award/penalty function…
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 102
Agent Intelligence through Data Mining
18/09/2006 218ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The main technologiesThe main technologies……
Neural NetworksMainly used for classification (the output is categorical)Really efficient when confronted with a specific problem
Genetic AlgorithmsMainly used for optimization problems, where dynamic programming cannot be usedThe correct representation of the problem is crucial
Agent Intelligence through Data Mining
18/09/2006 219ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Communities Agent Communities (1/(1/22))
Agent communities simulate problems that are complicated, heterogeneous and non-linear.
The internal structure of such systems is inherently complicated, due to continually varying interactions.
The goal of agent communities is collective, and its evaluation is done through indicators.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 103
Agent Intelligence through Data Mining
18/09/2006 220ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Communities Agent Communities (2/(2/22))
Agent communities do not offer historical data, for DM techniques to be applied on and knowledge models to be extracted.
The presented approach is based on way evolutionary techniques can be exploited in order to augment agent community intelligence.
The knowledge extraction mechanism employed evaluates agent decisions (rewards or punishes) and updates the agent decision model.
Agent Intelligence through Data Mining
18/09/2006 221ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The methodologyThe methodology
CommonSteps
ApplicationLevel
BehaviorLevel
Evolutionary Communities
Multi-agent System
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 104
Agent Intelligence through Data Mining
18/09/2006 222ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Evolutionary communities Evolutionary communities ––training frameworktraining framework
Genetic AlgorithmMechanism
Decision Evaluation Mechanism
Agent action monitoringAgent
Knowledge Model
Agent action
AgentDecision
EvaluationKnowledge
ModelUpdate
Agent Intelligence through Data Mining
18/09/2006 223ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 12 –MAS exploiting
DM-extracted intelligence
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 105
Agent Intelligence through Data Mining
18/09/2006 224ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The presented systemsThe presented systems……
An ERP add-on, providing intelligent policy recommendations on customer and supplier management.An intelligent environmental monitoring system.A decentralized maintenance management system.A self-organizing agent community “in danger”.An agent-based, e-auction system.
Agent Intelligence through Data Mining
18/09/2006 225ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The ERP addThe ERP add--onon
Provides an Intelligent Shell on-top of an ERP, working in close cooperation with it.Incorporates intelligence inside a business process: the handling of a customer’s order.Makes a specific recommendation for a specific customer and a specific order.Manipulates important ERP data with efficient algorithms for reducing information overload.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 106
Agent Intelligence through Data Mining
18/09/2006 226ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Main GoalsMain Goals (2/2)(2/2)
Order Recommendation
SupplierRelationshipManagement
(SRM)
DemandForecasting
MarketBasket
Analysis
CustomerRelationshipManagement
(CRM)
CustomersInventory
Suppliers
Agent Intelligence through Data Mining
18/09/2006 227ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why do it with agents?Why do it with agents?
Multiple Loci of Control:Changes in the goals and behaviors of one agent adequate to adjust the whole system.
Software & Business Engineering Perspective:
Coordination is done easily both in the design and the development phase.
Adoption of Agents - no questions to us
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 107
Agent Intelligence through Data Mining
18/09/2006 228ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Why do it with data mining?Why do it with data mining?
Has been widely used for CRMExtended to SRMWith Clustering
Deal with noisy data (missing, unknown or erroneous)Find non-trivial, unknown, implicit, potentially useful patternsLearn and provide quick responses
Market Basket Analysis and Association RulesDiscover customer oriented buying habitsDiscover market oriented buying habits
Agent Intelligence through Data Mining
18/09/2006 229ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Responsible for producing product profiles based on
historical data
CustomerCustomer
OrderAgent
RecommendationAgent
CustomerPattern
IdentificationAgent
InventoryPattern
IdentificationAgent
SupplierPattern
IdentificationAgent
ERPAgent
CustomerData
SupplierData
InventoryData
MAS
ERP
GUI agent for accepting orders and displaying the final recommendation.
Gets the order preferences
and request the profiles of the
involved entities from the PIAs
Aggregates and manipulates the resulted profiles
into a single recommendation
Responsible for producing customer
profiles based on historical data
Responsible for producing supplier profiles based on
historical data
Transducer from XML to ACL.
Responsible for retrieving up-to-
date data.
ProfileRepository
Architectural DiagramArchitectural Diagram
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 108
Agent Intelligence through Data Mining
18/09/2006 230ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Final RecommendationFinal RecommendationProductProfiles
Availabilities
Ordered Quantities
Split Policy
sS Point
Customer & SupplierProfiles
OrderPreferences
RA
Static BusinessRules
Delivery Quantities
Quantities to Procure
Delivery Days
Due Days
Additional Discounts(Positive, Negative)
Agent Intelligence through Data Mining
18/09/2006 231ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Conclusions (1/2)Conclusions (1/2)
Added value without rebuilding or reengineering the systemReduction of work effort through adaptability of business rules and automatic generation of the recommendationReusable and customizable system (match different ERP platforms with minimum effort)
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 109
Agent Intelligence through Data Mining
18/09/2006 232ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Conclusions (2/2)Conclusions (2/2)
YesNoAutonomy
YesNoCustomer Management &Pricing Policy
YesNoSupplier Management
HighLowAdaptability
AutomaticallyIndirectly, through reports
Recommendations
YesNoMarket Basket Analysis
ERP + DKEClassic ERPImprovement
Agent Intelligence through Data Mining
18/09/2006 235ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The systemThe system……
Real-time Environmental MonitoringAgents detect ozone alarms and notify interested partiesThe application domain
Real–time systems managementEnvironmental assessmentEmergency real-time systemsReal time applicationsDecision support systems
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 110
Agent Intelligence through Data Mining
18/09/2006 236ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The architectural diagramThe architectural diagram
Early Alarms
Alarms
Distribution Agent
Diagnosis Agent1
Sensor1
Diagnosis Agent2
Sensor2
Diagnosis Agentn
Sensor n
Alarm Agent
D.B.
Distribution Agent
Semantic
Representation
FeedBack Agent
Human Expert
Human Expert
DISTRIBUTIONSYSTEM
MANAGEMENTSYSTEM
DISTRIBUTION SYSTEM
Measurements
Alarms
Alarms
Short Forecast AgentRequest
Short TimePrediction
Agent Intelligence through Data Mining
18/09/2006 237ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Diagnosis AgentsDiagnosis Agents
Diagnosis Agents:Get measurements at real-time from sensorsVerify that the sensors operate properly
Inductive reasoning enginesValidate incoming measurementsPredict a missing ozone measurement level
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 111
Agent Intelligence through Data Mining
18/09/2006 238ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Knowledge ExtractionKnowledge Extraction
From raw data to data-driven reasoning Quinlan’s C4.5 for decision tree inductionTwo reasoning engines in Diagnosis Agent:1. Incoming Data Validation2. Missing Measurement Estimation
Agent Intelligence through Data Mining
18/09/2006 239ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Validation Reasoning EngineValidation Reasoning EngineAttributesO3 The current ozone valueO3_30 The ozone value 30 min agoO3_90 The ozone value 90 min agoMinMax60 The difference between the maximum
and the minimum ozone value in the last 60 minMinMax150 The difference between the maximum
and the minimum ozone value in the last 150 minO3val The corresponding validation tag (valid/erroneous)
Confusion MatrixValidation Decision ModelRecords classified as : valid erroneousNo. records in class ‘valid’: 34,454 21No. records in class ‘erroneous’: 63 420
Decision Tree ParametersSize of Decision Tree: 29 (15 Leaves)Correctly classified records: 99.71%
C4.5
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 112
Agent Intelligence through Data Mining
18/09/2006 240ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Estimation Reasoning EngineEstimation Reasoning EngineAttributesNO concurrent value of NO concentrationNO2 concurrent value of NO2 concentrationNOX concurrent value of NOx concentrationTEM concurrent value of TemperatureHR concurrent value of Relative HumidityO3_15 ozone value 15 min agoO3_30 ozone value 30 min agoO3Class missing ozone value level (low/med)
Confusion MatrixEstimation Decision ModelRecords classified as : low medNo. Records in class ‘low’: 9905 2,351No. Records in class ‘med’: 752 4,384
Decision Tree ParametersSize of Decision Tree: 29 (15 Leaves)Correctly classified records: 93.80%
C4.5
Agent Intelligence through Data Mining
18/09/2006 241ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Conclusions on EMSConclusions on EMS
In this scenario, agents act as mediators, delivering validated information in a distributed environment.
Inductive Reasoning Agents seem suitable for building Intelligent Environmental Software Applications
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 113
Agent Intelligence through Data Mining
18/09/2006 242ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Distributed Service ManagementDistributed Service Management
Customer ofCompany XY
CallCenter Field
ServiceTechnician
Company XY Subcontractorof Company XY
ControllingDepartment
Service contract
Subcontractorcontract
1. Problemdescription
2. Info andorder
3. Executionof order
4. Invoice
5. Invoice
Management of physical (maintenance) service to be carried out at the customer’s location.
Web based applicationWeb based applicationProcess monitoringProcess monitoringAutomated helpdeskAutomated helpdeskDecision supportDecision supportProactive event predictorProactive event predictorAdaptabilityAdaptabilityCustomizabilityCustomizability
Agent Intelligence through Data Mining
18/09/2006 244ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Subcontractor
<<institutional>>
Petrol Station
<<institutional>>
Union Technik<<institutional>>
Field servicetechnician
<<human>>
Call CenterEmployee
<<human>>SLIM
<<artificial>>
BUISY
<<artificial>> Controllingdepartmentemployee
<<human>>
Petrol stationleaseholder
<<human>>
Subcontracting Contract
BreakdownManagement
BreakdownFinancialControl
Maintenance of Incidental Breakdowns
GUI Agent<<artificial>>
AuthenticationAgent
<<artificial>>
PersonalAgent
<<artificial>>
Broker
<<artificial>>
DirectoryServer
<<artificial>>
User
<<human>>
authentication
is-a
is-a
is-a
is-a
AuthenticationSystem
<<artificial>>
customization
system-userinterface
EENEXPlatform
<<artificial>>
developmentsolution
developmentsolution
Plausibilitycheck
<<artificial>>
TechnicianAssignment
<<artificial>>
ProblemSolver
<<artificial>>
RequestAssistant
<<artificial>>
Architectural DiagramArchitectural Diagram
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 114
Agent Intelligence through Data Mining
18/09/2006 245ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
EmbeddingEmbedding Agents Agents intointo a Legacy a Legacy ArchitectureArchitecture
AMADEEWorkflow Control
AMADEEConnector
AGENT ACADEMYAgent Action
AMADEEConnector
AGENT ACADEMYAgent Action
Web Interface
-Full process Control through AMADEE Workflow Control-Seamless embedding in client‘s and AA architecture
AMADEEConnector
MS SQL Database
AMADEEConnector
BUISY system Action
… … …
Agent Intelligence through Data Mining
18/09/2006 246ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Agent CommunicationCommunication FlowFlow((forfor PlausibilityPlausibility checkschecks))
AMADEEWorkflow Control
AMADEEConnector
AGENT ACADEMYAgent ActionWeb Interface
<Request Environment="Development"><Message Type="plausibility">
<FacilityID>74143</FacilityID><Amount>50</Amount><Duration>10</Duration><UnitID>47085</UnitID><SpareParts>
<SparePart><ID>9</ID><Name>Abdeckband 80 x 2500 x 1,8 mm</Name><Supplier>ETC</Supplier><Price>28.12</Price>
</SparePart><SparePart>
<ID>12</ID><Name>Abdeckband rot, neue Ausf. f. H105</Name><Supplier>ETG</Supplier><Price>33.75</Price>
</SparePart></SpareParts><SparePartID/><Action>Check</Action><UnitName>1/2 2306</UnitName>
</Message></Request>
<Message><Plausible>True</Plausible>
</Message>
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 115
Agent Intelligence through Data Mining
18/09/2006 247ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Biotope: A selfBiotope: A self--organizing organizing agent community agent community
Simulates a parametrical ecosystem, with food, traps and obstacles.The living organisms of this ecosystem are intelligent agents.The agents live in an unknown, probably “hostile” environment.
X
Y
00
00
00
0 0 0
0 0
0
0 0 0 00
00 0 0
0 00 0
00 0
0
00 0
0
00
00
0 000
00 0
0 00
00
000
0
0 0 00
00
00 00
0 000
000
00
00
00
000
00
000
000
000
0
000 0
0
000
00 0
000
0
00
0
0 0
00
00
0 000
0
0
00
00
0
000
0
0
00
00
0
000
0
0
00
00
0
0 00
0
00
0
0 00
000
000 0
0
000
0 0
00
0
00
000
000 0
0
0
0 0
0
0
0
1
1
1
1
1
1
1
1
1
11
2
2
2
3
3
3
33 3
3
3 3
3
3 3 3
33
3
3
33 3
3
3
4
Agent Intelligence through Data Mining
18/09/2006 248ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
00
00
00
0 0 00 0
0
0 0 0 00
00 0 00 00 000 0
000 0
000
000 0000
0 0
0 0
0
0
0
0
00
00 0 0
0000
0 0
00 0
00
0
0
000
0
0
0
0
000
0
0
0
0
0
0
00
0
00
0000 0
0
00
0
00 0
0
000000
0 00
000
0 0000
0
00000
0000
0
00000
0000
0
00000
0 00
0000
0 00
00
0
000 0
0
00
0
0 0
0
0
000
000
000 0
000 0
0
0
0
1
1
1
1
1
1
1
1
1
11
2
2
2
3
3
3
33 3
33 3
3
3 3 3
33
33
33 3
33
4
00
00
00
0 0 00 0
0
0 0 0 00
00 0 00 00 000 0
000 0
000
000 0000
0 0
0 0
0
0
0
0
00
00 0 0
0000
0 0
00 0
00
0
0
000
0
0
0
0
000
0
0
0
0
0
0
00
0
00
0000 0
0
00
0
00 0
0
000000
0 00
000
0 0000
0
00000
0000
0
00000
0000
0
00000
0 00
0000
0 00
00
0
000 0
0
00
0
0 0
0
0
000
000
000 0
000 0
0
0
0
1
1
1
1
1
1
1
1
1
11
2
2
2
3
3
3
33 3
33 3
3
3 3 3
33
33
33 3
33
400
00 0
000000 0
00
0 0 001 1
2 333
4
00
00 0
000000 0
00
0 0 001 1
2 333
4
Agent SightAgent Sight
0 00 0 0
001 13
2 3 0 3
0 0 000
0 0 000
0 00 0 0 001 13 2 3 0 3 0 0 000 0 0 000
Vision Field
Vision Vector
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 116
Agent Intelligence through Data Mining
18/09/2006 249ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent MovementAgent Movement
00
00 0
000000 0
00
0 0 001 1
2 333
4
1 2 3 4 5
6 7 8 9 10
11 12 13 14
15 16 17 18 19
20 21 22 23 24
0 00 0 00 000 0 03001 1 2 330 00 0 0 10
0 00 0 00 000 0 03001 1 2 330 00 0 0 100 00 0 00 000 0 03001 1 2 330 00 0 0 10
10
00
00 0
000040 0
00
0 0 001 1
2 333
0
1 2 3 4 5
6 7 8 9 10
11 12 13 14
15 16 17 18 19
20 21 22 23 24
00
00 0
000000 0
00
0 0 001 4
2 333
0
1 2 3 4 5
6 7 8 9 10
11 12 13 14
15 16 17 18 19
20 21 22 23 24
Agent Intelligence through Data Mining
18/09/2006 250ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent IntelligenceAgent Intelligence
They perceive part of the environment (according to their vision capabilities), and learn what to eat and what to avoid.
They communicate with each other (when “in sight”) and exchange knowledge
0 00 0 0 001 13 2 3 0 3 0 0 000 0 0 000
Vision Vector
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 117
Agent Intelligence through Data Mining
18/09/2006 251ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The scope of BiotopeThe scope of Biotope
To use the simulating environment for evolution experiments.To develop the suitable tools for monitoring the evolution of the ecosystem.To monitor and model agent behaviors.To enhance agent intelligence by the use of genetic algorithms
Agent Intelligence through Data Mining
18/09/2006 252ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent ReproductionAgent ReproductionReproduction condition: Energy > 800 unitsReproduction Outcome
1 offspringInherits part of the parent’s Classifier set
Dispersal distance dsDistance: Exp(ds)~(1/m).exp(-ds/m)Orientation: θ~(0,2π)
Energy variationParent: exponential decrease 400 --> 0Offspring: exponential increase 200 --> 600
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 118
Agent Intelligence through Data Mining
18/09/2006 253ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Intelligence through Data Mining
18/09/2006 254ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Rule evaluationRule evaluation
0 00 0 00 000 0 03001 1 2 330 00 0 0 10
0 00 0 00 000 0 03001 1 2 330 00 0 0 21
0 00 1 03 300 0 00330 1 0 020 10 0 0 7
StrengthsClassifier Set
176
210
00
00 0
0000
00 0
00
0 0 001 1
2 33
3
4240
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 119
Agent Intelligence through Data Mining
18/09/2006 255ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
00
00 0
0000
00 0
00
0 0 001 1
2 33
3
4
00
00 0
0000
00 0
00
0 0 001 4
2 33
3
0
Rule evaluationRule evaluation
0 00 0 00 000 0 03001 1 2 330 00 0 0 10
0 00 0 00 000 0 03001 1 2 330 00 0 0 21
0 00 1 03 300 0 00330 1 0 020 10 0 0 7
StrengthsClassifier Set
176
210
216
200
+8
224 24
Agent Intelligence through Data Mining
18/09/2006 256ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Different Environments Different Environments --Conclusions (1/2)Conclusions (1/2)……
tra p c ollision rate
steps
0.02
0.03
0.04
0.01
2000 400 600 800 1000 1200
ΠΡΟΣΟΜΟΙΩΣΗ 1
ΠΡΟΣΟΜΟΙΩΣΗ 2
effe ctiveness
steps
2
3
4
1
2000 400 600 800 1000 1200
ΠΡΟΣΟΜΟΙΩΣΗ 1
ΠΡΟΣΟΜΟΙΩΣΗ 2
2020Traps
240240Obstacles
140100Food
301Vision Error Percentage
3030Food refresh Steps
SIMULATION 2SIMULATION 1
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 120
Agent Intelligence through Data Mining
18/09/2006 257ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
120Exchanged Rules Percentage
200200Communication Step
1000200Genetic Algorithm Step
5003000Number of Rules
30003000Ageing Step
500500Energy
SIMULATION 2SIMULATION 1
effectiveness
steps
2
3
4
1
2000 400 600 800 1000 1200
ΠΡΟΣΟΜΟΙΩΣΗ 1
ΠΡΟΣΟΜΟΙΩΣΗ 2
unknown situa tion rate
steps
0,4
0,6
0,8
0,2
2000 400 600 800 1000 1200
ΠΡΟΣΟΜΟΙΩΣΗ 1
ΠΡΟΣΟΜΟΙΩΣΗ 2
Different Agents Different Agents -- Conclusions (2/2)Conclusions (2/2)……
Agent Intelligence through Data Mining
18/09/2006 258ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Environmental IndicatorsEnvironmental Indicators
Resource availabilityEnvironmental varietyEnvironmental reliability
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 121
Agent Intelligence through Data Mining
18/09/2006 259ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Agent Performance IndicatorsAgent Performance Indicators
Energy EffectivenessAgingFood Consumption RateTrap Collision RateReproduction RateUnknown Situation Rate
Agent Intelligence through Data Mining
18/09/2006 260ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
ConclusionsConclusions & Future & Future DirectionsDirections
In agent communities, the emphasis is on how to model the common problem.
The combination of Genetic Algorithms, a classifier system, and an advanced agent communication framework, proves capable of handling dynamic and complex problems.
The Biotope infrastructure can be used to model and simulate distributed computational systems, where the agents are the computational entities, food represents the resources of the system, traps resource losses, and obstacles represent system incompatibilities.
A test case could be a community of agents roaming the Web, either collaborating or competing over its digital resources, while fragmentarily perceiving their environments.
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 122
Agent Intelligence through Data Mining
18/09/2006 261ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
An agentAn agent--based ebased e--auction auction environmentenvironment
Protocol resembles to English double auctionsMany different strategies deployed by agentsAgents: multiple Buyers and multiple Sellers that trade over many goods
Agent Intelligence through Data Mining
18/09/2006 262ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
The auction environmentThe auction environment
sellerj
selleri
Auction workspaceAuction workspacebuyersbuyers
bb11
bb22
bbNN
……
…………
Hello!Hello!
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 123
Agent Intelligence through Data Mining
18/09/2006 263ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Negotiating agentsNegotiating agents
Negotiations take place between a buyer and a seller in order to reach an agreement
buyerbuyer
PROPOSAL
bid1
ask1
…
ACCEPT
bidn
seller
n transactions
Agent Intelligence through Data Mining
18/09/2006 264ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Improving the behavior of a Improving the behavior of a buyerbuyer
Transactional Transactional Historical DataHistorical Data
Strategy
Rule Base
Data Mining
Buyer Agent
Trend Trend AnalysisAnalysis
buyerbuyer
PROPOSAL
bid1
ask1
…
ACCEPT
bidm
seller
m < ntransactions
Reducing communication by training
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 124
Agent Intelligence through Data Mining
18/09/2006 265ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
How can DM improve How can DM improve decisions in auctions?decisions in auctions?
The number of messages in a negotiation can be decreasedDM can provide a tool for identifying hidden patterns (trends) in the history of previous transactionsEvaluation criteria can be used to update-improve an agent’s rule base (e.g. fuzzy criteria)
Agent Intelligence through Data Mining
18/09/2006 266ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Part 13 Part 13 –– Open IssuesOpen Issues
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 125
Agent Intelligence through Data Mining
18/09/2006 267ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data mining and intelligent agents (1/2)Data mining and intelligent agents (1/2)
How to determine safety and soundness in multi-agent systemsHow to specify a methodology for developing intelligent (through data mining) multi-agent applicationsWhen and how to perform agent retraining
Agent Intelligence through Data Mining
18/09/2006 268ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Data mining and intelligent agents (2/2)Data mining and intelligent agents (2/2)
How to develop self-improving agents through dm How to develop tools and techniques for data mining agent behaviorHow to specify dm metrics that take semantics seriously into accountHow to evaluate agent “intelligence”
Agent Intelligence through Data Mining 18/9/2006
ECML/PKDD 2006 - Berlin 126
Agent Intelligence through Data Mining
18/09/2006 269ECML/PKDD 2006, Berlin, Germany
AUTHAUTH
Thank youThank youPericles A. Mitkas
Associate ProfessorElectrical and Computer Engineering
Aristotle Univ. of ThessalonikiEmail: [email protected]
Andreas L. Symeonidis
Research AssociateInformatics & Telematics InstituteCenter for Engineering Research and Technology – Hellas (CERTH) Email:[email protected]