Date posted: 19-Dec-2015
Computing & Information Sciences, Kansas State University
CIS 690: Data Mining in Mobile and Cloud Computing Environments
William H. Hsu, Computing and Information Sciences
Shih-Hsiung Chou, Industrial and Manufacturing Systems Engineering
Kansas State University
KSOL course page: http://bit.ly/a68KuL
Course web site: http://www.kddresearch.org/Courses/CIS690
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Syllabus and Introductory Handouts
Instructions for Labs 0 – 1
Han & Kamber 2e, Sections 1.1 – 1.4.3 (pp. 1 – 25), 6.1 (pp. 285 – 289)
Data Mining in Mobile and Cloud Computing Environments:
Course Organization and Survey
Lecture 0 of 27: Part A – Course Organization
Course Administration
Course Page (KSOL): http://bit.ly/a68KuL
Class Web Page: www.kddresearch.org/Courses/CIS690
Instructional E-Mail Addresses – Best Way to Reach Instructor
[email protected] (always use this to reach instructor and TA)
[email protected]
Instructor: William Hsu, Nichols 324C
Office phone: +1 785 532 7905; home phone: +1 785 539 7180
IM: AIM/MSN/YIM hsuwh/rizanabsith, ICQ 28651394/191317559, Google banazir
Office hours: after class Mon/Wed/Fri; other times by appointment
Graduate Teaching Assistant: To Be Announced
Office location: Nichols 124 (CIS Visualization Lab) & Nichols 218
Office hours: to be announced on class web board
Grading Policy: Overview
Midterm exam: 15%
Homework: 15%
Term project: 50%
Labs: 20% (1% each; see calendar)
Course Policies
Letter Grades
15% gradations (85+%: A, 70+%: B, etc.)
Cutoffs may be more lenient, but a) never higher and b) seldom much lower
Grading Policy
Exams: midterm (in-class, open-book/notes): 15%
Homework: 15% (2 written, 2 programming, 2 mixed; drop lowest 2, 3% each)
Term project (including proposal, interim, final reports): 50%
Labs (upload solutions to K-State On-Line file dropbox): 20%
Late Homework Policy
Allowed only in case of medical excusal
All other late homework: see drop policy
Attendance Policy
Absence due to travel or personal reasons: e-mail CIS690TA-L in advance
See instructor, Office of the Dean of Student Life as needed
Honor System Policy: http://www.ksu.edu/honor/
On plagiarism: cite sources and use quotes if verbatim; this includes textbooks
OK to discuss work, but turn in your own work only
When in doubt, ask instructor
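The grading weights and letter cutoffs above can be combined in a short calculation. The following is a minimal sketch, not an official grade calculator: the weights and the A and B cutoffs come from the policy as stated, while the C and D cutoffs (55 and 40), the F fallback, and the sample scores are assumptions extrapolated from the "15% gradations" wording.

```python
# Component weights as fractions of the course grade (from the grading policy).
WEIGHTS = {"midterm": 0.15, "homework": 0.15, "project": 0.50, "labs": 0.20}

# Letter cutoffs at 15% gradations. A and B are stated in the policy;
# C, D, and the F fallback are ASSUMPTIONS extrapolated from "etc."
CUTOFFS = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]


def course_score(scores):
    """Return the weighted total (0-100) from per-component percentages."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a weighted total to a letter using the cutoffs above."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Hypothetical student: component scores are percentages, invented for illustration.
example = {"midterm": 80, "homework": 90, "project": 88, "labs": 100}
total = course_score(example)
print(round(total, 1), letter_grade(total))
```

Because the policy says cutoffs may be more lenient but never higher, the letter returned here should be read as a lower bound on the final grade.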
Course Content Management System (CMS): http://www.kddresearch.org/Courses/CIS690
Lecture notes (MS PowerPoint 97-2010, PDF)
Homework assignments (MS Word 97-2010, PDF)
Exam and homework solutions (MS PowerPoint 97-2010, PDF)
Class announcements (students’ responsibility) and grade postings
Course Notes Online and at Copy Center (Required)
Mailing List (Automatic): [email protected]
Homework/exams (before uploading to CMS, KSOL), sample data, solutions
Class participation
Project info, course calendar reminders
Dated research announcements (seminars, conferences, calls for papers)
LISTSERV Web Archive: http://listserv.ksu.edu/archives/cis690-l.html
Stores e-mails to class mailing list as browsable/searchable posts
Class Resources
Recommended Text
Witten, I. H. & Frank, E. (2006). Data Mining: Practical Machine Learning Tools and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.
Other References [on Reserve in Main or CIS Library]
Han, J. & Kamber, M. (2006). Data Mining: Concepts and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.
Mitchell, T. M. (1997). Machine Learning. New York, NY, USA: McGraw-Hill.
Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Reading, MA, USA: Addison-Wesley.
Textbook and Recommended References
[Book cover images: Mitchell (1997); Witten & Frank 2e; Tan et al. (2006); Han & Kamber 1st edition (outdated) and 2nd edition]
Both Courses
Proficiency in high-level programming language (C++/C#, Java, Python, etc.)
Required: course in data structures
Recommended: discrete mathematics, probability
At least 80 hours for semester (up to 120 depending on term project)
Textbook: Data Mining: Concepts and Techniques, 2e, Han & Kamber (2006)
Reserve texts: Mitchell’s Machine Learning, several other outside references
CIS 690 Data Mining in Mobile and Cloud Computing Environments
Fresh background in symbolic logic, discrete math (sets, relations, counting)
Some background assumed in linear algebra, calculus
New topics: classification/regression, association, optimization, clustering
“Mathematical maturity”: ready to learn more
CIS 798 Topics in Computer Science
Recommended: two programming courses
Read up on heuristic search, games, constraints, knowledge representation
AI programming experience helps (background lectures as needed)
Watch advanced topics lectures; see list before choosing project topic
Background Expected
Syllabus [1]: First Half of Course
Syllabus [2]: Second Half of Course
Basics: First Two Weeks (Hours 2 – 9 of Course)
Review of mathematical foundations: set theory, discrete math, probability
Types of machine learning algorithms
Combinatorial analysis: mappings and counting
Bayesian classification
Bayesian Inference
Hour 3: association rules, statistical evaluation
Hours 6 – 10: Naïve Bayes, classification in R
Hours 15 – 18: clustering, Expectation-Maximization (EM)
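To give a feel for the Naïve Bayes topic listed above, here is a minimal sketch of a categorical Naïve Bayes classifier in plain Python. It is an illustration under stated assumptions, not the course's lab code (which the schedule says uses R): the function names `train_naive_bayes` and `predict`, the add-one (Laplace) smoothing, and the toy weather-style data are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class maximizing log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen (value, class) pairs from zeroing the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for illustration: (outlook, humidity) -> play?
rows = [("sunny", "high"), ("sunny", "high"), ("overcast", "high"),
        ("rainy", "normal"), ("rainy", "normal"), ("sunny", "normal")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "high")))
print(predict(model, ("rainy", "normal")))
```

Working in log space avoids numeric underflow when many features are multiplied together; the "naïve" part is the assumption that features are conditionally independent given the class.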
Other Math Topics to be Covered
Information theory: decision tree induction, rule induction
Basic statistical hypothesis testing
Frequent itemsets: association rule mining
Convex optimization: constraints, linear and quadratic programming (QP)
Distance measures: clustering
Logic: propositional, first-order, resolution
Math Background To Be Covered
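As an illustration of the frequent itemsets topic listed above, the sketch below computes support and confidence for association rules over a small set of transactions. It uses brute-force enumeration of candidate itemsets rather than the Apriori algorithm, so it is only suitable for tiny examples; the function names and the market-basket data are assumptions made for this illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # rule never applies; treated as zero confidence here
    return support(set(antecedent) | set(consequent), transactions) / base


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search: test every itemset up to max_size items."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Toy market-basket data invented for illustration.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
)]

print(support({"bread", "milk"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
print(sorted(sorted(s) for s in frequent_itemsets(transactions, 0.6)))
```

A rule such as diapers → beer is usually reported only when both its support and its confidence clear user-chosen thresholds.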
Computing Platform: Mobile/Cloud Environments
Android
Operating system: modified Linux
For mobile devices (Motorola Droid, HTC Incredible, etc.)
Android, Inc. & Open Handset Alliance
Software development kit: download from http://developer.android.com/sdk/
Software Environment for the Advancement of Scholarly Research (SEASR)
Originally developed for compute clusters
Adapted for cloud computing environments
SEASR – overall environment: http://seasr.org
Meandre – data mining flows: http://seasr.org/meandre/
© 2005 – present, National Center for Supercomputing Applications (NCSA)
Computing Platform: Data Mining Software
Waikato Environment for Knowledge Analysis (WEKA)
Data mining package
Among the most popular machine learning and data mining software packages at present
Download from http://www.cs.waikato.ac.nz/ml/weka/
R Interpreter
R: popular programming language for computational statistics
Used for data mining implementations
Comprehensive R Archive Network (CRAN): http://cran.r-project.org
Apache Hadoop
Java software framework
Data-intensive distributed applications
Inspired by Google MapReduce and Google File System (GFS)
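The MapReduce programming model mentioned above can be sketched in a few lines. The example below is a single-process simulation in plain Python, not Hadoop's Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real Hadoop job would distribute these phases across cluster nodes and read its input from a distributed file system.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["data mining", "mining data streams"]))
```

The key design point is that the map and reduce functions hold no shared state, which is what lets a framework run many copies of each in parallel on different parts of the data.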
About Project Proposals
Proposals
About 1-2 pages; due at end of second week of course, one revision allowed
Team projects: up to 2 people
Contents: at least one paragraph on each of
– 1. Problem statement: describe task, objectives, purpose
– 2. Background: survey related work and applicable approaches
– 3. Methodology: describe planned approach
– 4. Evaluation criteria: how will performance be assessed?
– 5. Milestones: what will be done, when?
Post Questions and Drafts to Class Mailing List