Sessions 2 & 3: A.I. Indonesia Academy Surabaya Batch #1
Uploaded by bayu-aldi-yansyah
Sessions 2 & 3: Applying the Concepts & Evaluation
Agenda
• Review & questions from Slack
• Applying supervised machine learning concepts (full coding)
• Hands-on feature engineering
• Basic intuition, step by step
• Algorithm -> Python program
• Case study
• Production-grade machine learning with Spark + HDFS
• Final project
• Evaluation
Review & Questions from Slack
• Machine learning models
• Classification vs. clustering
What Is a Model?
Training data (input, output) → Algorithm → Machine learning model
New data (input only, output unknown) → Model → Output
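Example Model: Logistic Regression
Goal: find the weight vector $w$ that minimizes the objective function
$$f(w) = \lambda R(w) + \frac{1}{n}\sum_{i=1}^{n} L(w; x_i, y_i)$$
• $w$: weight vector
• $x_i$: training data (the $i$-th training example)
• $y_i$: class of the $i$-th training example
• $\lambda R(w)$: regularization term
• $f(w)$: objective function
• $L(w; x_i, y_i)$: loss function; for logistic regression, $L(w; x, y) = \log\left(1 + e^{-y\, w^\top x}\right)$ with $y \in \{-1, +1\}$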
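The objective above can be made concrete with a small from-scratch sketch. It assumes binary labels $y_i \in \{-1, +1\}$, an L2 regularizer $R(w) = \tfrac{1}{2}\lVert w\rVert^2$, plain batch gradient descent, and a constant feature 1.0 in each $x_i$ acting as a bias; the values of `lam`, `lr`, and `epochs` are illustrative, not taken from the slides. For the four response classes in the case study, one common extension is one-vs-rest: train one such model per class.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def sigmoid(z):
    # numerically stable logistic function 1 / (1 + e^-z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    # log(1 + e^z) computed without overflow
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def objective(w, X, y, lam):
    """f(w) = lam * R(w) + (1/n) * sum_i L(w; x_i, y_i),
    with R(w) = 0.5 * ||w||^2 and L(w; x, y) = log(1 + exp(-y * w.x))."""
    n = len(X)
    loss = sum(softplus(-yi * dot(w, xi)) for xi, yi in zip(X, y)) / n
    return lam * 0.5 * dot(w, w) + loss

def gradient(w, X, y, lam):
    # lam * w + (1/n) * sum_i -y_i * sigmoid(-y_i * w.x_i) * x_i
    n = len(X)
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        coef = -yi * sigmoid(-yi * dot(w, xi)) / n
        for j, xij in enumerate(xi):
            grad[j] += coef * xij
    return grad

def train(X, y, lam=0.01, lr=0.5, epochs=500):
    # batch gradient descent starting from the zero vector
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        w = [wj - lr * gj for wj, gj in zip(w, gradient(w, X, y, lam))]
    return w

def predict(w, x):
    # class +1 if w.x >= 0 (probability >= 0.5), otherwise -1
    return 1 if dot(w, x) >= 0 else -1
```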
Visualization of a 4-class logistic regression model
Classification vs. Clustering

| | Classification | Clustering |
|---|---|---|
| Data classes | Already known | Not yet known |
| Training data | Available | Not available |
| Method | Supervised | Unsupervised |
| Goal | Decide which class new data belongs to | Find patterns and relationships among the data |
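To make the table concrete, here is a hypothetical one-dimensional sketch: a nearest-centroid classifier needs the class labels (supervised), while k-means receives only the points and finds the groups itself (unsupervised). Both algorithms are illustrative choices, not ones named in the slides.

```python
def fit_nearest_centroid(points, labels):
    """Classification (supervised): labels are known, so compute one centroid per class."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + p
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_nearest_centroid(centroids, p):
    """Assign a new point to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(p - centroids[label]))

def kmeans_1d(points, k, iterations=20):
    """Clustering (unsupervised): no labels; k-means discovers k groups on its own."""
    ordered = sorted(points)
    # deterministic initialisation: k points spread across the sorted data
    centers = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # keep the old center if a group ends up empty
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)
```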
Case Study: Lestari
We will train Lestari to help Pak Jokowi, the President of Indonesia, analyze public responses.
https://artificialintelligence.id/model-machine-learning-untuk-membantu-pak-presiden-jokowi-menganalisa-respon-publik-63cc89a098ed
Remember This?
Training data (input, output) → Algorithm → Machine learning model
New data (input only, output unknown) → Model → Output
Training Data
Training data (input → output)
• Input: public responses on Facebook
• Output: the response type
• -1 = spam
• 0 = neutral
• 1 = expresses hope, defense, or suggestions
• 2 = asks for clarification, or files a complaint
Algorithm & Model
Algorithm → Machine learning model
• Logistic Regression
• Decision Trees
• Random Forests
• Naïve Bayes
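As an example of turning one algorithm from this list into a Python program, below is a minimal multinomial Naïve Bayes sketch with Laplace smoothing (`alpha`). The letters-only tokenizer and the choice to ignore words never seen in training are assumptions made for brevity; labels can be any hashable value, such as the response types -1, 0, 1, 2.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # assumed tokenizer: lowercase, keep runs of letters a-z
    return re.findall(r"[a-z]+", text.lower())

def train_nb(docs, labels, alpha=1.0):
    class_counts = Counter(labels)        # number of documents per class
    term_counts = defaultdict(Counter)    # class -> term -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab, alpha

def predict_nb(model, doc):
    class_counts, term_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in class_counts:
        total = sum(term_counts[c].values())
        score = math.log(class_counts[c] / n_docs)  # log prior
        for t in tokenize(doc):
            if t not in vocab:
                continue  # ignore words never seen in training
            # smoothed log likelihood of term t given class c
            score += math.log((term_counts[c][t] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```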
End Goal
New data (input only, output unknown) → Model → Output
Intuition: a new response comes in. Does it express support or hope for Pak Jokowi, is it spam, or is it actually a complaint addressed to Pak Jokowi?
Feature Engineering
• Intuition: how can a machine tell text documents apart and find similarities between them?
Use vectors! (demo in MATLAB)
• Goal: represent the input/training data in a form that machine learning algorithms can use
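The slides demo the vector idea in MATLAB. A Python equivalent might compare two document vectors with cosine similarity, which is one common similarity measure for text; the slides do not say which measure the demo used.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 = same direction, 0.0 = nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with an all-zero vector as 0
    return dot / (norm_a * norm_b)
```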
Feature Engineering
• Common representations:
• Bag of words (en.m.wikipedia.org/wiki/Bag-of-words_model)
• TF-IDF (www.tfidf.com)
• Demo with simple text!
• Key terms:
• Corpus: a collection of documents
• Document: a single text document (one comment)
• Term: a single word in a document
Feature Engineering: Bag of Words
• Corpus:
• “Mantaaap....Pak Presiden..habisi para pencuri ikan diwilayah kita......jangan kasi ampun.....sanksi keras akan membuat mereka jera!”
• “Insya alloh indonesia akan di sgani dan menjadi macan asia.. Kalau pemimpin ny sprti bapa presiden kita skarang. Lanjutkan pa kami alloh slalu brsma mu.. Amiin”
Feature Engineering: Bag of Words
• Dictionary:
mantaaap, pak, presiden, habisi, para, pencuri, ikan, diwilayah, kita, jangan, kasi, ampun, sanksi, keras, akan, membuat, mereka, jera, insya, alloh, indonesia, di, sgani, dan, menjadi, macan, asia, kalau, pemimpin, ny, sprti, bapa, skarang, lanjutkan, pa, kami, slalu, brsma, mu, amiin
• Corpus representation:
• [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
• [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
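The bag-of-words vectors above can be reproduced with a short sketch. It assumes a tokenizer that lowercases the text and keeps runs of the letters a to z, so "...." acts as a separator; under that assumption the dictionary (terms in order of first occurrence across the corpus) and both count vectors match the slide.

```python
import re

def tokenize(document):
    # assumed tokenizer: lowercase, keep runs of letters a-z
    return re.findall(r"[a-z]+", document.lower())

def build_dictionary(corpus):
    # global dictionary: every distinct term, in order of first occurrence
    dictionary, seen = [], set()
    for doc in corpus:
        for term in tokenize(doc):
            if term not in seen:
                seen.add(term)
                dictionary.append(term)
    return dictionary

def bag_of_words(document, dictionary):
    # count vector aligned with the dictionary order
    counts = {}
    for term in tokenize(document):
        counts[term] = counts.get(term, 0) + 1
    return [counts.get(term, 0) for term in dictionary]

corpus = [
    "Mantaaap....Pak Presiden..habisi para pencuri ikan diwilayah kita......jangan kasi ampun.....sanksi keras akan membuat mereka jera!",
    "Insya alloh indonesia akan di sgani dan menjadi macan asia.. Kalau pemimpin ny sprti bapa presiden kita skarang. Lanjutkan pa kami alloh slalu brsma mu.. Amiin",
]
dictionary = build_dictionary(corpus)
vectors = [bag_of_words(doc, dictionary) for doc in corpus]
```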
Feature Engineering: TF-IDF
• Corpus:
• “Mantaaap....Pak Presiden..habisi para pencuri ikan diwilayah kita......jangan kasi ampun.....sanksi keras akan membuat mereka jera!”
• “Insya alloh indonesia akan di sgani dan menjadi macan asia.. Kalau pemimpin ny sprti bapa presiden kita skarang. Lanjutkan pa kami alloh slalu brsma mu.. Amiin”
Feature Engineering: TF-IDF
• Dictionary:
mantaaap, pak, presiden, habisi, para, pencuri, ikan, diwilayah, kita, jangan, kasi, ampun, sanksi, keras, akan, membuat, mereka, jera, insya, alloh, indonesia, di, sgani, dan, menjadi, macan, asia, kalau, pemimpin, ny, sprti, bapa, skarang, lanjutkan, pa, kami, slalu, brsma, mu, amiin
• Corpus representation:
• [0.03850817669777474, 0.03850817669777474, 0.0, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.0, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0.0, 0.03850817669777474, 0.03850817669777474, 0.03850817669777474, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
• [0, 0, 0.0, 0, 0, 0, 0, 0, 0.0, 0, 0, 0, 0, 0, 0.0, 0, 0, 0, 0.026659506944613283, 0.053319013889226566, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283, 0.026659506944613283]
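The slide does not state which TF-IDF variant produced these numbers. They are consistent with tf = (term count) / (number of terms in the document) and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term: ln 2 / 18 ≈ 0.0385 for the first comment (18 terms), ln 2 / 26 ≈ 0.0267 and 2 ln 2 / 26 ≈ 0.0533 (for "alloh") for the second comment (26 terms), and 0 for "presiden", "kita", and "akan", which appear in both comments, because ln(2/2) = 0. A sketch under that assumption, reusing the letters-only tokenizer assumed for bag of words:

```python
import math
import re

def tokenize(document):
    # assumed tokenizer: lowercase, keep runs of letters a-z
    return re.findall(r"[a-z]+", document.lower())

def tf_idf(corpus):
    """Return (dictionary, vectors) with tf = count / doc length and idf = ln(N / df)."""
    docs = [tokenize(doc) for doc in corpus]
    # global dictionary in first-occurrence order
    dictionary = list(dict.fromkeys(term for tokens in docs for term in tokens))
    n_docs = len(docs)
    # df: number of documents that contain each term
    df = {term: sum(1 for tokens in docs if term in tokens) for term in dictionary}
    vectors = []
    for tokens in docs:
        vectors.append([
            (tokens.count(term) / len(tokens)) * math.log(n_docs / df[term]) if tokens else 0.0
            for term in dictionary
        ])
    return dictionary, vectors

corpus = [
    "Mantaaap....Pak Presiden..habisi para pencuri ikan diwilayah kita......jangan kasi ampun.....sanksi keras akan membuat mereka jera!",
    "Insya alloh indonesia akan di sgani dan menjadi macan asia.. Kalau pemimpin ny sprti bapa presiden kita skarang. Lanjutkan pa kami alloh slalu brsma mu.. Amiin",
]
dictionary, vectors = tf_idf(corpus)
```

Library implementations often use a different variant; scikit-learn's `TfidfVectorizer`, for instance, smooths the idf and L2-normalizes each row by default, so its values will not match the slide.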
Feature Engineering
Example training pair (comment, response type): (“bravo pak jokowi! klo ….”, “usulan”), where “usulan” means “suggestion”.
Feature Engineering (Additional Notes)
• A popular data representation is the LIBSVM format:
label index1:value1 index2:value2 ...
1 1:0.0953796017474 4:0.227945493411 ...
2 16:0.178174397043 27:0.111566195021 ...
• Index: the index of the word in the global dictionary
• Value: the word's frequency (bag of words) or its TF-IDF value
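A small sketch for reading and writing one line of this format. It assumes integer class labels (such as -1, 0, 1, 2) and writes feature indices in ascending order, which the LIBSVM tools require; both function names are hypothetical.

```python
def parse_libsvm_line(line):
    """'label i1:v1 i2:v2 ...' -> (label, {index: value})."""
    parts = line.split()
    label = int(parts[0])  # assumes integer class labels
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features

def format_libsvm_line(label, features):
    """(label, {index: value}) -> 'label i1:v1 i2:v2 ...' with ascending indices."""
    items = [f"{index}:{value}" for index, value in sorted(features.items())]
    return " ".join([str(label)] + items)
```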
Feature Engineering
label index1:value1 index2:value2 ...
1. Build a global dictionary
• From all documents in the corpus
• Used as the index of each word
2. Represent each document as a vector
• Index: taken from the global dictionary
• Value: the TF-IDF value of each word
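The two steps can be combined into one sketch that turns (text, label) pairs into LIBSVM lines. Assumptions: the same letters-only tokenizer and the unsmoothed TF-IDF variant (tf = count / doc length, idf = ln(N / df)) that matches the earlier slide, 1-based indices assigned in first-occurrence order, and zero values omitted because the format is sparse. `to_libsvm` is a hypothetical helper name.

```python
import math
import re
from collections import Counter

def tokenize(document):
    # assumed tokenizer: lowercase, keep runs of letters a-z
    return re.findall(r"[a-z]+", document.lower())

def to_libsvm(labeled_corpus):
    """labeled_corpus: list of (text, label). Returns (dictionary, one LIBSVM line per document)."""
    docs = [(Counter(tokenize(text)), label) for text, label in labeled_corpus]
    # Step 1: global dictionary, term -> 1-based index in first-occurrence order
    dictionary = {}
    for counts, _ in docs:
        for term in counts:
            dictionary.setdefault(term, len(dictionary) + 1)
    # document frequency of each term
    df = Counter(term for counts, _ in docs for term in counts)
    n_docs = len(docs)
    # Step 2: each document becomes index:TF-IDF pairs, zeros omitted
    lines = []
    for counts, label in docs:
        length = sum(counts.values())
        pairs = []
        for term, count in counts.items():
            value = (count / length) * math.log(n_docs / df[term])
            if value != 0.0:
                pairs.append((dictionary[term], value))
        pairs.sort()  # LIBSVM expects ascending indices
        lines.append(" ".join([str(label)] + [f"{i}:{v}" for i, v in pairs]))
    return dictionary, lines
```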
Feature Engineering: Time to Practice!
Data: data_3k_comments.csv
Output: TF-IDF
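A possible first step for the exercise is loading the CSV. The slides do not describe the layout of data_3k_comments.csv, so the header row and the column names passed in are assumptions, as is storing the labels as the integers -1, 0, 1, 2; `load_labeled_comments` is a hypothetical helper.

```python
import csv

def load_labeled_comments(file_obj, text_column, label_column):
    """Read (text, label) pairs from a CSV file with a header row.
    The column names are NOT given in the slides; pass whatever the real file uses."""
    reader = csv.DictReader(file_obj)
    return [(row[text_column], int(row[label_column])) for row in reader]
```

For example, `with open("data_3k_comments.csv", newline="", encoding="utf-8") as f: pairs = load_labeled_comments(f, "comment", "label")`, where "comment" and "label" are hypothetical column names; the resulting pairs can then go through the TF-IDF steps above.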
ML in Production Notes
• With the volume of data we process, we need a framework to speed up the analysis
• Solutions: Hadoop MapReduce, Spark
• With large amounts of data, we need a scalable storage engine
• Solution: HDFS
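To illustrate the programming model that Hadoop MapReduce and Spark distribute across a cluster, here is a single-machine sketch of the map, shuffle, and reduce phases computing document frequency (the df in TF-IDF). It is plain Python, not the Hadoop or Spark API; in PySpark's RDD API the same idea would roughly be a `flatMap` followed by `reduceByKey`.

```python
import re
from collections import defaultdict

def tokenize(document):
    # assumed tokenizer: lowercase, keep runs of letters a-z
    return re.findall(r"[a-z]+", document.lower())

def map_phase(doc_id, document):
    # emit (term, doc_id) once per distinct term in the document
    return [(term, doc_id) for term in set(tokenize(document))]

def shuffle(mapped):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(term, doc_ids):
    # document frequency = number of distinct documents containing the term
    return term, len(set(doc_ids))

def document_frequency(corpus):
    # in a cluster, each map call and each reduce call could run on a different worker
    mapped = [pair for doc_id, doc in enumerate(corpus) for pair in map_phase(doc_id, doc)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())
```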
Hands-on: Spark + HDFS
Distributed File System
Evaluation
• Important metrics:
• Precision
• Recall
• F1-score
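A sketch of computing these metrics per class from true and predicted labels: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The macro average (unweighted mean of per-class F1) is an added convenience for the four-class case, not something the slide specifies, and a score whose denominator is zero is set to 0.0 by assumption.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores

def macro_f1(scores):
    # unweighted mean of the per-class F1 scores
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```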