Min Sun (孫民) / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing
Date posted: 12-Jan-2017
Transcript of Min Sun (孫民) / Artificial Intelligence from a Computer Vision Perspective: The Next Big Thing
![Page 1: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/1.jpg)
Artificial Intelligence: The Next Big Thing
from a computer vision perspective
VSLab
Min Sun (孫民), Department of Electrical Engineering, National Tsing Hua University
![Page 2: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/2.jpg)
What’s the Next Big Thing?
http://research.microsoft.com/en-us/um/redmond/events/fs2015
![Page 3: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/3.jpg)
Goal
“big data being the source, machine learning being the technique, and AI being the outcome” by Prof. Hsuan-Tien Lin at IEEE BigData 2016
Many kinds of source (data) and outcomes (AI tasks) can be trained end-to-end using Deep Learning (DL)
![Page 4: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/4.jpg)
Classical AI Tests: Turing Test
by Alan Turing in 1950
![Page 5: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/5.jpg)
Chatbot@F8
https://developers.facebook.com/videos/f8-2016/keynote/
![Page 6: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/6.jpg)
Classical AI Tests: CAPTCHA
![Page 7: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/7.jpg)
Breaking CAPTCHA
by vicarious.com
![Page 8: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/8.jpg)
AlphaGo
2016 by Google DeepMind
Is this what AI is all about?
![Page 9: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/9.jpg)
2014 Subfields of AI
![Page 10: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/10.jpg)
2015
Artificial General Intelligence (AGI)
![Page 11: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/11.jpg)
Deep Learning (DL)
• Data
• GPU Computing
• Talents
![Page 12: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/12.jpg)
DL Fuses AI Subfields
• Vision and Language
http://mscoco.org/
• Vision and Control
Atari Breakout game & AlphaGo, DeepMind.
-> AGI
• Multiple Encoding and Decoding
![Page 13: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/13.jpg)
Image Captioning
f(image) = "The man at bat is ready to swing at the pitch"
Vision: Convolutional Neural Network (CNN), convolutions (credit: wiki)
Language: Recurrent Neural Network (RNN) (credit: Nature)
![Page 14: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/14.jpg)
Image Question Answering
http://visualqa.org/
![Page 15: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/15.jpg)
Zhen et al. ECCV 2016 from VSLab and Stanford AI Lab
![Page 16: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/16.jpg)
Big Video Data with Titles
• Pairs of raw video and title
[Figure: raw video frames, each passed through a CNN, paired with a title]
![Page 17: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/17.jpg)
Viral Videos
Google for “viral video company”
![Page 18: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/18.jpg)
Large Video Repository
Currently 28,740 videos and still growing
![Page 19: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/19.jpg)
DL Fuses AI Subfields
• Vision and Language
http://mscoco.org/
• Vision and Control
Atari Breakout game & AlphaGo, DeepMind.
-> AGI
• Multiple Encoding and Decoding
![Page 20: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/20.jpg)
Vision and Control
https://gym.openai.com/
• Learning to play games with weak supervision: Reinforcement Learning (RL)
![Page 21: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/21.jpg)
Where It All Begins …
Playing Atari with Deep Reinforcement Learning
by DeepMind, NIPS 2013 Deep Learning Workshop
slides by Yen-Chen Lin
![Page 22: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/22.jpg)
Control: Learning to Act
Playing Breakout amounts to:
• Input: screen images
• Output: actions (do nothing | left | right)
i.e., Supervised Classification
![Page 23: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/23.jpg)
Supervised Solution
• Training data: recorded game sessions of experts
• Target label: the action the expert takes at every step
Problems:
• What if there’s no expert?
• This is not how humans learn
![Page 24: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/24.jpg)
How Humans Learn
• We don’t need somebody to tell us a million times which move to choose at each screen
• We just need occasional feedback that we did the right thing
![Page 25: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/25.jpg)
Reinforcement Learning
• Somewhere between supervised and unsupervised learning
• Sparse and time-delayed labels (rewards)
Based only on those rewards, the agent has to learn to behave in the environment. A rational agent should optimize total reward.
![Page 26: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/26.jpg)
RL in A Nutshell
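The agent-environment loop behind this slide can be sketched in a few lines. This is a minimal illustration, not the deck's code: the `Corridor` environment, its `reset`/`step` methods (loosely modeled on the interface style used by OpenAI Gym) and the two policies are all hypothetical.

```python
import random

class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal at cell 4."""
    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action 0 = move left, action 1 = move right (clipped at the walls)
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1 if done else 0  # sparse, delayed reward: only at the goal
        return self.position, reward, done

def run_episode(env, policy, max_steps=100):
    """Run the observe -> act -> reward loop once and return the total reward."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                   # agent picks an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward

rng = random.Random(0)
random_policy = lambda state: rng.choice([0, 1])
always_right = lambda state: 1

print(run_episode(Corridor(), always_right))   # reaches the goal in 4 steps
print(run_episode(Corridor(), random_policy))  # 0 or 1, depending on the walk
```

The only feedback the agent gets is the reward returned by `step`, which is the "occasional feedback" the earlier slides describe.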
![Page 27: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/27.jpg)
Markov Decision Process
• State
• Action
• Reward
The probability of the next state $s_{i+1}$ depends only on the current state $s_i$ and action $a_i$.
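To make the Markov property concrete, here is a small sketch in which the next state is sampled from a table indexed only by the current state and action. The two-state MDP, its state and action names, and the probabilities are invented for illustration.

```python
import random

# Hypothetical two-state MDP: transitions[(state, action)] is a list of
# (probability, next_state, reward) triples.
transitions = {
    ("low", "wait"):    [(1.0, "low", 0)],
    ("low", "charge"):  [(0.8, "high", 0), (0.2, "low", 0)],
    ("high", "wait"):   [(1.0, "high", 1)],
    ("high", "charge"): [(1.0, "high", 0)],
}

def sample_next(state, action, rng):
    """Sample (next_state, reward) using only the current state and action."""
    outcomes = transitions[(state, action)]
    threshold = rng.random()
    cumulative = 0.0
    for probability, next_state, reward in outcomes:
        cumulative += probability
        if threshold < cumulative:
            return next_state, reward
    # Guard against floating-point rounding in the cumulative sum.
    return outcomes[-1][1], outcomes[-1][2]

rng = random.Random(0)
print(sample_next("low", "charge", rng))
```

Nothing about earlier states or actions is passed to `sample_next`; that is exactly what the Markov assumption allows.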
![Page 28: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/28.jpg)
Episode
One episode of this process (e.g. one game) forms a finite sequence of states, actions and rewards.
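In one common notation, such an episode can be written as the sequence below. The indexing convention (rewards starting at $r_1$, terminal state $s_n$) is an assumption here, since the slide's own formula is not in the transcript.

```latex
s_0, a_0, r_1, \; s_1, a_1, r_2, \; \dots, \; s_{n-1}, a_{n-1}, r_n, \; s_n
```

Here $s_i$ is the state, $a_i$ the action taken in it, $r_{i+1}$ the reward received afterwards, and $s_n$ the terminal state (e.g. the "game over" screen).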
![Page 29: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/29.jpg)
Example: Breakout
• State: game screen
• Action: 1. do nothing 2. left 3. right
• Reward: game score
![Page 30: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/30.jpg)
Example: Breakout
• State: successive game screens
• Action: 1. do nothing 2. left 3. right
• Reward: game score
![Page 31: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/31.jpg)
Reward
• To perform well, we should also take future rewards into account. How do we do that?
Total reward: $R = r_1 + r_2 + \dots + r_n$
Total future reward: $R_t = r_t + r_{t+1} + \dots + r_n$
![Page 32: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/32.jpg)
Discounted Future Reward
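• However, since the environment is stochastic, rewards far in the future are less certain, so intuitively one should earn reward as soon as possible
Total discounted future reward: $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n$, with discount factor $0 \le \gamma \le 1$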
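A short sketch of how a discounted future reward can be computed from a list of rewards. The function name, the discount factor symbol `gamma` and the example reward list are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for a reward list."""
    total = 0.0
    # Work backwards so each step applies R_t = r_t + gamma * R_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

rewards = [0, 0, 1]                      # a single reward, two steps away
print(discounted_return(rewards, 1.0))   # no discounting
print(discounted_return(rewards, 0.5))   # the delayed reward counts for less
```

With `gamma` below 1, the same reward is worth less the later it arrives, which is why a discounting agent prefers to earn it sooner.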
![Page 33: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/33.jpg)
Q Function
• Q(s, a): the maximum discounted future reward when we perform action a in state s, and continue optimally from that point on.
It represents the “quality” of a certain action in a given state.
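The definition can be written recursively. The identity below is the standard Bellman equation for Q-learning; it is not stated on the slide, and the symbols $r$ (immediate reward), $s'$ (next state), $a'$ (next action) and $\gamma$ (discount factor) are assumed notation.

```latex
Q(s_t, a_t) = \max R_{t+1}
\qquad\Longrightarrow\qquad
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
```

In words: the value of taking action $a$ in state $s$ is the immediate reward plus the discounted value of the best action available in the next state.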
![Page 34: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/34.jpg)
How to Choose an Action?
If we know the Q function, we pick the action with the highest Q-value in each state: $\pi(s) = \arg\max_a Q(s, a)$
Here π represents the policy, the rule for how we choose an action in each state.
![Page 35: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/35.jpg)
Q Function Implementation

| | action 0 | action 1 | action 2 |
| --- | --- | --- | --- |
| state 0 | -2 | -1 | 5 |
| state 1 | 3 | 2 | 3 |
| state 2 | 5 | 6 | -6 |
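With the Q function stored as a table, choosing an action is a row-wise argmax. The sketch below uses the values from the slide; the helper name `greedy_action` and the tie-breaking rule (first maximum wins) are assumptions.

```python
# Q-table values from the slide: q_table[state][action]
q_table = [
    [-2, -1, 5],
    [3, 2, 3],
    [5, 6, -6],
]

def greedy_action(q_table, state):
    """Return the action index with the highest Q-value in the given state."""
    q_values = q_table[state]
    # max() keeps the first maximum, so ties go to the lower action index.
    return max(range(len(q_values)), key=lambda action: q_values[action])

policy = [greedy_action(q_table, state) for state in range(len(q_table))]
print(policy)  # [2, 0, 1]
```

State 1 has a tie between action 0 and action 2; a real agent might break such ties randomly instead.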
![Page 36: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/36.jpg)
If We Use Pixels as State
1. Resize images to 84x84
2. Convert to grayscale with 256 levels
3. Use last 4 frames to represent state
$256^{84 \times 84 \times 4} \approx 10^{67970}$ possible game states
We can never cover all the cases!
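The size of this state space can be checked with a quick calculation. Working with the base-10 logarithm avoids building the huge integer itself; the variable names are illustrative.

```python
import math

pixels_per_state = 84 * 84 * 4   # 84x84 image, last 4 frames
levels = 256                     # grayscale levels per pixel

# log10(256 ** pixels_per_state) = pixels_per_state * log10(256)
exponent = pixels_per_state * math.log10(levels)

print(pixels_per_state)          # 28224
print(math.floor(exponent))      # 67970, i.e. roughly 10^67970 states
```

A Q-table with one row per state is therefore out of the question, which motivates replacing the table with a function approximator.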
![Page 37: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/37.jpg)
Vision & Control: Deep Q Network
We use a CNN to represent the Q function, which takes:
• Input: the state (4 game screens) and action
• Output: Q-values of different actions a (i.e., Q(s,a))
$\pi(s) = \arg\max_a Q(s, a)$, where s is the current stack of game screens
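The learning rule behind a Deep Q Network can be illustrated without a neural network. The sketch below deliberately swaps the CNN for a small table (plain tabular Q-learning) so that it runs with the standard library only; a DQN would instead take a gradient step on the squared difference between Q(s, a) and the same target. The corridor environment and all hyperparameters are hypothetical.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4   # hypothetical corridor: cells 0..4

def step(state, action):
    """Action 0 moves left, action 1 moves right; reward 1 on reaching GOAL."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit argmax_a Q(s, a), sometimes explore
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                # ties are broken at random so early episodes are unbiased
                action = max(range(N_ACTIONS),
                             key=lambda a: (q[state][a], rng.random()))
            next_state, reward, done = step(state, action)
            # target = r + gamma * max_a' Q(s', a'); no bootstrap at the end
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
greedy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)  # [1, 1, 1, 1]: always move right, toward the goal
```

The learned greedy policy is the same argmax rule shown on the slide, applied to values that were learned from rewards alone.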
![Page 38: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/38.jpg)
Fusing Multiple Sensors
[Figure labels: Kettle, Medium-wrap, thumb-4-finger, Manipulation Region, Side-view]
Chan et al. ECCV 2015 from VSLab
![Page 39: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/39.jpg)
[Figure: Left Hand, Head, Right Hand; Lab, Office, Home]
![Page 40: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/40.jpg)
[Figure: Left Hand, Head, Right Hand; Lab, Office, Home]
![Page 41: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/41.jpg)
Recognition from Wearable Cameras
[Figure: Pred vs. GT for Gesture Recognition and Object Category Recognition]
![Page 42: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/42.jpg)
Real-time Wearable Demo
Fisheye camera, NVIDIA TK1
![Page 43: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/43.jpg)
Real-time Wearable Demo
cellphone, bottle, keyboard, mouse, free hand
![Page 44: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/44.jpg)
Take-Home Message
• Encoding Source (data)
– N-D observation
– N-D sequence of observations
• Decoding Outcome (AI tasks)
– N-D single output
– N-D open-ended sequence as output
• Multiple Encoding and Decoding
• If each module is differentiable/approximately differentiable -> End-to-End Learning
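Why differentiability is the key condition can be seen from the chain rule. As a sketch, assume an encoder $h = f(x; \theta_f)$, a decoder $\hat{y} = g(h; \theta_g)$ and a loss $L(\hat{y}, y)$; these symbols are illustrative and do not appear on the slide.

```latex
\frac{\partial L}{\partial \theta_g}
  = \frac{\partial L}{\partial \hat{y}} \, \frac{\partial g}{\partial \theta_g},
\qquad
\frac{\partial L}{\partial \theta_f}
  = \frac{\partial L}{\partial \hat{y}} \, \frac{\partial g}{\partial h} \, \frac{\partial f}{\partial \theta_f}
```

The gradient for the encoder passes through the decoder via $\partial g / \partial h$, so every module in the chain must be (at least approximately) differentiable for the whole pipeline to be trained jointly.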
We get many tools to tackle Ar#ficial General Intelligence
Just Try!
Worst Thing: Do Nothing
![Page 45: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/45.jpg)
My Two Cents for Taiwan
![Page 46: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/46.jpg)
Questions
• Can I simply ask my engineers to use open source deep learning tools to create new products?
Answer: Yes and not really.
Yes – if you want to complete a well-known task. But Google’s MLaaS product will almost always beat you.
Not really – if you want to solve your own problem, with your own data. You need talents, or you need to make your engineers unafraid of failure.
![Page 47: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/47.jpg)
Where can I find talents?
• Most talents are PhD students or young professionals in the US and EU.
http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their
How can we compete?
![Page 48: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/48.jpg)
Local Students
• Our students know deep learning is HOT!
[Deep Learning Workshop, Academia Sinica] 500 participants
![Page 49: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/49.jpg)
Case Study: NTHU@TW Undergraduate
https://github.com/yenchenlin1994/DeepLearningFlappyBird
![Page 50: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/50.jpg)
Case Study: UNIST@Korean Undergraduate
![Page 51: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/51.jpg)
To-Do for Local Students
• We need more students to work on
– realistic deep learning projects with
– enough computing resources
• We need some of them to stay in our local industry
Advanced Deep Learning Course at NTHU (academic year 105)
1. Taught by a group of professors
2. Topics include the latest DNN models, distributed training, and DL for embedded systems
3. Sponsored by MTK and the ITRI Big Data Center (巨資中心)
4. More sponsors are welcome!
![Page 52: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/52.jpg)
For Talents Abroad
Get in the Talents Race!
http://cvpr2016.thecvf.com/exhibit/industry_expo
![Page 53: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/53.jpg)
For Talents Abroad
Most of them are fresh PhDs
USD 1 Billion Pledged
![Page 54: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/54.jpg)
For Talents Abroad
![Page 55: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/55.jpg)
AI Is Happening Fast
![Page 56: 孫民/從電腦視覺看人工智慧 : 下一件大事](https://reader033.fdocuments.net/reader033/viewer/2022061615/587789dc1a28abc85f8b70d7/html5/thumbnails/56.jpg)
Thanks!