2010-09-20, Prof. 張志威 (Edward Chang): Machine Learning to Human Innovation


Page 1: 2010 09-20-張志威老師-機器學習到人類創新

2010-9-20 @ NCCU

Machine Learning to Human Innovation (機器學習到人類創新)

Edward Chang (张智威), Director of Research

Page 2

Learning and Innovation

• Learning

Mencius (孟子)

Page 3

Mencius's mother moved house three times (孟母三遷)

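Page 4

Mencius, Teng Wen Gong chapter (孟子 滕文公章): the Way of the Sages (聖人之道)

Mencius said to Dai Busheng: "Do you wish your king to be good? Let me tell you plainly. Suppose a high official of Chu wants his son to learn the speech of Qi. Would he have a man of Qi tutor him, or a man of Chu?"

He replied: "He would have a man of Qi tutor him."

Mencius said: "With one man of Qi tutoring him and a crowd of Chu people clamoring around him, you could beat him every day to make him speak Qi and still not succeed. Take him to live for some years in Zhuang and Yue, in the heart of Qi, and you could beat him every day to make him speak Chu and still not succeed."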

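Page 5

Learning and Innovation

• Learning

• Stay near vermilion and you turn red; stay near ink and you turn black; when the form is upright, the shadow is straight (近朱者赤,近墨者黑,形正则影直)

• If the upper beam is crooked, the lower beams will be crooked too (上樑不正下樑歪)

• Innovation

• Should one always stay near vermilion and avoid ink? (近朱避墨吗?)

• Must the upper and lower beams always be straight? (下樑上樑必须正吗?)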

Page 6

Page 7

Page 8

Outline

• Machine learning

• Machine learning → human learning

• Human learning → human innovation

• Concluding remarks

Page 9

Definition of Machine Learning

Program the computers to learn!

Computers improve performance with experience at some task

Example #1:

Task: playing Go

Performance: win rate

Experience: learning from experts

Example #2:

Task: gene classification

Performance: disease-prediction accuracy

Experience: genetic case records

Page 10

Gene classification: D = 4026 genes, L = 3, N = 59 cases

Page 11

Supervised Learning (監督學習)

X: Samples

U: Unlabeled data

L: Labeled data

Φ: Learning algorithm

f = Φ(L): Implied hypothesis

Minimize some error function

Regularize parameters to prevent overfitting (過擬合)

ŷ = f(u ∈ U)
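The notation on this slide can be made concrete with a minimal sketch. It assumes a one-dimensional linear hypothesis f(x) = w·x and a squared-error loss with an L2 penalty (ridge regression). Neither choice comes from the talk, and the function name `fit_ridge_1d`, the toy data, and the penalty weight `lam` are illustrative assumptions.

```python
def fit_ridge_1d(labeled, lam=1.0):
    """Phi: return f(x) = w * x minimizing sum((y - w*x)^2) + lam * w^2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in labeled)
    sxx = sum(x * x for x, _ in labeled)
    w = sxy / (sxx + lam)
    return lambda x: w * x


# Hypothetical toy data: L is the labeled set, U the unlabeled set.
L = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
U = [4.0, 5.0]

f = fit_ridge_1d(L, lam=0.0)         # f = Phi(L), no regularization
y_hat = [f(u) for u in U]            # y_hat = f(u) for u in U

f_reg = fit_ridge_1d(L, lam=14.0)    # same data, strong regularization
y_hat_reg = [f_reg(u) for u in U]    # predictions shrink toward zero
```

With `lam = 0` the fit follows the training data exactly; a larger `lam` shrinks `w` toward zero, trading training error for a simpler hypothesis.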

Page 12

Classic Learning Algorithms Φ (经典學習算法)

Linear Model (线性模型)

Nearest Neighbors (近鄰法)

Neural Networks (神經網路)

Decision Trees (決策樹)

Kernel Methods (核方法)

Support Vector Machines (支持向量機)

etc.

Page 13

Scalability Issue (延展问题)

f = Φ(L)

D = 4026 genes, L = 3, N = 59 cases

Scarce labeled data: too little training data

f = Φ(L* + U)

L*: Collect the most useful labeled data

U: Use unlabeled data

Page 14

Human Learning

Supervised Learning (監督學習): being taught, e.g., by teachers and parents

Unsupervised Learning (無監督學習): surfing the Web, watching TV

Active Learning (主動學習): asking questions

Reinforcement Learning (强化學習): taking exams

Page 15

Unified Learning Machines (统一學習机器) (KDD06)

Semi-supervised learning

Reinforcement learning

Active learning

Page 16

Information Retrieval: a Machine Learning Problem

Presentation Notes:
This figure shows a 200 × 200 checkerboard divided into four quadrants. The top-left and bottom-right quadrants are occupied by majority instances, shown as red crosses. The remaining two quadrants are occupied by minority instances, shown as blue circles. Within each quadrant, instances are uniformly distributed. We define the ratio of a checkerboard dataset as the number of majority instances divided by the number of minority instances; in this figure, the ratio is 10 to 1. At each step, we keep the minority instances unchanged and uniformly add new majority instances to the top-left and bottom-right quadrants, producing another checkerboard dataset with a different ratio.

Page 17

Page 18

Page 19

Text-based image search limitations...

Page 20

VIMA Visual Search

Page 21

Step #1 Acquire Labels

Page 22

Step #2 Compute Boundary

Page 23

Step #3 Identify Useful Samples

Page 24

Step #4 Acquire Labels

Page 25

Step #5 Refine Boundary

Page 26

Step #6 Return Results
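
Steps #1 through #6 can be sketched as a loop. This is a toy, one-dimensional illustration of pool-based active learning with a threshold classifier, not the method behind the image-search demo in the talk. The names `boundary_of`, `active_learn`, and `oracle`, the integer pool, and the hidden threshold 37 are all hypothetical.

```python
def boundary_of(labeled):
    """Place the boundary midway between the closest opposite-label points."""
    largest_negative = max(x for x, y in labeled.items() if y == 0)
    smallest_positive = min(x for x, y in labeled.items() if y == 1)
    return (largest_negative + smallest_positive) / 2.0


def active_learn(pool, oracle, seeds, rounds):
    """Query the oracle only for the samples closest to the current boundary."""
    labeled = {x: oracle(x) for x in seeds}           # Step 1: acquire labels
    unlabeled = [x for x in pool if x not in labeled]
    boundary = boundary_of(labeled)                   # Step 2: compute boundary
    for _ in range(rounds):
        if not unlabeled:
            break
        # Step 3: the most useful sample is the one nearest the boundary.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled[query] = oracle(query)                # Step 4: acquire its label
        boundary = boundary_of(labeled)               # Step 5: refine boundary
    return boundary, labeled                          # Step 6: return results


# Hypothetical setup: the oracle (a human labeler) knows the true threshold.
pool = list(range(0, 101))
oracle = lambda x: 1 if x >= 37 else 0
boundary, labeled = active_learn(pool, oracle, seeds=[0, 100], rounds=7)
```

In this toy run, nine labels out of 101 candidates are enough to place the boundary between 36 and 37, because each query is chosen where the current hypothesis is least certain. The seeds must include one example of each class, otherwise `boundary_of` has nothing to separate.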

Page 27

Midpoint Observations

Find good training instances

Find diversified training instances

Is a linear model sufficient? Is it too simple-minded (頭腦簡單)?

Page 28

Information Retrieval: a Machine Learning Problem

Page 29

Nonlinear Boundary

Page 30

Linear Model Fits All Data?

Page 31

Connecting Dots (连接点)

Page 32

A Connect-the-Dots Algorithm: Nearest Neighbors (NN, 近鄰法)

$Y(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$

$k = 1$

Page 33

Nearest Neighbors with k = 1

Page 34

Nearest Neighbors (近鄰法)

Four Things Make a Nearest Neighbor Model

A distance function to measure nearness?

k: number of neighbors to consider?

A weighting function (optional)?

How to fit with the local points?
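
The four ingredients map directly onto the parameters of a small function. The sketch below assumes Euclidean distance and a plain (or optionally weighted) average as the local fit; the names `euclidean` and `knn_predict`, the inverse-distance weighting, and the toy data are illustrative assumptions rather than anything specified in the talk.

```python
import math


def euclidean(a, b):
    """Ingredient 1: a distance function to measure nearness."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(x, data, k=1, distance=euclidean, weight=None):
    """Predict y at x from (point, y) pairs using the k nearest points."""
    # Ingredient 2: keep only the k nearest training points, N_k(x).
    neighbors = sorted(data, key=lambda item: distance(x, item[0]))[:k]
    if weight is None:
        # Ingredient 4: fit locally with a plain average of the neighbors' y.
        return sum(y for _, y in neighbors) / len(neighbors)
    # Ingredient 3 (optional): let closer neighbors count for more.
    weights = [weight(distance(x, xi)) for xi, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


# Hypothetical toy data: three points labeled 0 near the origin, two labeled 1.
data = [
    ((0.0, 0.0), 0.0), ((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
    ((5.0, 5.0), 1.0), ((6.0, 5.0), 1.0),
]
near_origin = knn_predict((0.5, 0.5), data, k=1)
in_between = knn_predict((4.0, 4.0), data, k=3)
weighted = knn_predict((4.0, 4.0), data, k=3,
                       weight=lambda d: 1.0 / (d + 1e-9))
```

With `weight=None` the function computes the unweighted average of the k nearest labels; passing a weighting function pulls the prediction toward the closest neighbors.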

Page 35

NN with k = 1

Page 36

Problems of k = 1

Fitting noise (overfitting, 過擬合)

Jagged boundaries

Remedy: pick a larger k
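
A small example shows why a larger k helps. It assumes one-dimensional inputs, binary labels, and majority voting; the function name `knn_classify` and the training set, which contains one deliberately mislabeled point at x = 3.0, are hypothetical.

```python
def knn_classify(x, data, k):
    """Majority vote among the k training points nearest to x (1-D inputs)."""
    neighbors = sorted(data, key=lambda item: abs(x - item[0]))[:k]
    votes_for_one = sum(label for _, label in neighbors)
    return 1 if votes_for_one * 2 > len(neighbors) else 0


# Class 0 lives around 0..6 and class 1 around 10..12.
# The point (3.0, 1) is label noise inside the class-0 region.
train = [
    (0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 0), (6.0, 0),
    (10.0, 1), (11.0, 1), (12.0, 1),
]

with_k1 = knn_classify(3.1, train, k=1)   # copies the noisy neighbor's label
with_k5 = knn_classify(3.1, train, k=5)   # the noisy point is outvoted
```

With k = 1 the prediction near x = 3 reproduces the noise; with k = 5 the four correctly labeled neighbors outvote it. Pushing k too high has its own cost, since distant points from the other class start to vote.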

Page 37

NN with k = 15

Page 38

Page 39

Page 40

Linear Model (LM, 線性) vs. Nearest Neighbors (NN, 近鄰法)

Linear model (k = ∞): stable, low accuracy

k = 1: unstable, low accuracy

Moderate k: stable, accurate

Page 41

Outline

• Machine learning

• Machine learning → human learning

• Human learning → human innovation

• Concluding remarks

Page 42

Machine Learning → Human Learning

Four Conjectures (四个推理)

Page 43

Conjecture #1: Find good training instances

Good role models (e.g., good mentors)

Good learning environment

Stay near vermilion and you turn red (近朱者赤)

Page 44

Conjecture #2: Find diversified training instances

Repetition improves only speed

Repetition does not improve intuition

Near vermilion one turns red; near blue one turns bright; near white one turns pure (近朱者赤,近青者明,近白者潔)

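Page 45

Conjecture #3: Do not overfit

Being able to generalize is the only thing that counts

Do not memorize materials that do not contribute to generalization

Some overfitting examples: Which provinces of China produce cotton? In what year was the Kangxi Emperor born? Which year of the Western calendar was the first year of the First Emperor of Qin?

It is staying near vermilion that turns one red, not being soaked in it (近朱者赤,非浸朱者赤)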

Page 46

Conjecture #4: Exploring vs. Exploiting (探索 vs. 開發)

Explore beyond the nearest neighborhood of positive instances

Look at things from different perspectives

Find real boundaries

Stay near red and come to know blue from red and black from white (近红諳知青红皂白)

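Page 47

Four Conjectures (四个推理)

• Stay near vermilion and you turn red (近朱者赤)

• Near blue one turns bright; near white one turns pure (近青者明, 近白者潔)

• It is staying near vermilion that turns one red, not being soaked in it (近朱者赤,非浸朱者赤)

• Stay near red and come to know blue from red and black from white (近红諳知青红皂白)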

Page 48

Outline

• Machine learning

• Machine learning → human learning

• Human learning → human innovation

• Concluding remarks

Page 49

Foundations of Innovation (創新的基础)

Talent

Excellence in learning

Diligence at work

Team spirit

Innovation

Passion

Prepared mind

Perseverance

Revolutionary thinking

Page 50

Michelangelo

Page 51

Page 52

Renaissance

Page 53

Last Supper

Page 54

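Page 55

Linear Perspective

Filippo Brunelleschi, 1377-1446

Experimented with optical effects

Showed that objects look larger when near and smaller when far

Confirmed this via a pinhole experiment

Put it into practice in a flat panel painting of the Baptistery

Published by Alberti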

Page 56

3D → 2D Projections

Angel, Figure 5.3
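
The idea behind a 3D-to-2D perspective projection can be written in a few lines. This sketch assumes an idealized pinhole camera at the origin looking down the +z axis with the image plane at distance `focal_length`; the function name `project` and the sample points are hypothetical and are not taken from the cited figure.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3-D point (x, y, z) onto the image plane.

    By similar triangles, image coordinates scale by focal_length / z,
    so the same object appears smaller as its depth z grows.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# The top of a pole of height 1, seen at depth 2 and again at depth 4.
near_top = project((0.0, 1.0, 2.0))
far_top = project((0.0, 1.0, 4.0))
```

Doubling the depth halves the projected height, which is the near-large, far-small effect that linear perspective formalizes.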

Page 57

Color Principles: Aristotelian

Color exists as a property of the surfaces of an object

Seven colors: white, black, yellow, red, purple, green, and blue

White: water and air

Yellow: fire and the sun

Black: results from elements in transition

Rainbow colors: red, green, and purple

Page 58

Color Principles: Leonardo

Percussions and rebounds (直擊反射) of light! They should be analyzed with mathematical precision

Light: various colors best reveal their beauty at different light levels... yellow (light), blue (saturated)

Shadows: the variety of colors in a shadow must be as great as that of the colors of the objects in that shadow

Colors in motion: light shimmering through moving leaves, water flowing along a spring brook

Page 59

Page 60

Color Principles: Rubens

Optical mixture

Colors should not be tormented; no more than two pigments should be mixed

Rather, colors should be applied simply, directly, and separately onto the canvas

Page 61

Georges Seurat, 1859-91

Page 62

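Page 63

Wang Wei (王维)

Su Shi (苏轼) said: "Savor Mojie's poems and there are paintings in them; look at Mojie's paintings and there are poems in them." (Mojie is Wang Wei's courtesy name.)

明月松间照,清泉石上流。 (The bright moon shines among the pines; the clear spring flows over the stones.)

大漠孤烟直,长河落日圆。 (In the vast desert a lone column of smoke rises straight; over the long river the setting sun hangs round.)

Ink-wash shading (水墨渲淡)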

Page 64

Wang Wei (王维)

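Page 65

Dong Yuan (董源)

Three-dimensional structure in ink wash, built from "points, lines, and planes"

Lines: hemp-fiber texture strokes (披麻皴)

Points: dotted texture strokes (點错皴)

Planes: chopped, layered texture strokes (斫垛皴)

Uses lightness to convey light and shade, sparseness and density, distance and nearness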

Page 66

Dong Yuan (董源)

Page 67

Elements of Poetry (诗 要素)

Structure and form

Parallelism (对仗)

Rhyme (韵), rhythm (律), cadence (節奏)

Comparison (比喻): simile (明喻), metaphor (暗喻)

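Page 68

"On a Mission to the Frontier" (《使至塞上》), Wang Wei (王維)

單車欲問邊, 屬國過居延。 (A single carriage sets out to visit the frontier; past the dependent states, beyond Juyan.)

征蓬出漢塞, 歸雁入胡天。 (A tumbleweed drifts out of the Han fortress; a returning wild goose enters the northern sky.)

大漠孤煙直, 長河落日圓。 (In the vast desert a lone column of smoke rises straight; over the long river the setting sun hangs round.)

蕭關逢候騎, 都護在燕然。 (At Xiao Pass I meet a mounted scout; the Protector-General is at Mount Yanran.)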

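Page 69

"Immortal by the River" (《臨江仙》), Su Shi (蘇軾)

夜飲東坡醒复醉, (Drinking by night at East Slope, I sober up and get drunk again,)

歸來彷彿三更。 (and come home at what seems the third watch.)

家童鼻息已雷鳴。 (The houseboy's snores already rumble like thunder.)

敲門都不應, (No one answers my knocking at the door,)

倚杖聽江聲。 (so I lean on my staff and listen to the river.)

長恨此身非我有, (I have long regretted that this body is not my own;)

何時忘卻營營? (when shall I forget all this worldly striving?)

夜闌風靜縠紋平。 (The night is late, the wind is still, the ripples lie smooth.)

小舟從此逝, (In a small boat I would slip away from here)

江海寄馀生。 (and entrust the rest of my life to the rivers and seas.)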

Page 70

La Belle Dame Sans Merci

John Keats

Oh what can ail thee, knight-at-arms,

Alone and palely loitering?

The sedge has withered from the lake,

And no birds sing.

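Page 71

"Waiting" (等), November 5th, 1993; by Ed

I have paced across every stone brick in front of MJH

waiting for you to appear

The blossoms vying in beauty in the quad garden

cannot hold my wandering gaze

which, like two arrows,

questions every face that resembles yours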

Page 72

"Waiting" (等), continued

If only that gaze could

turn the corner where the long corridor ends

or dart up to the eaves for a look, like a light swallow;

then I would not be left in such suspense

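Page 73

"Waiting" (等), Second Stanza

The shape of the sky scatters in all four directions

Gust of wind, how many times now have you changed direction?

Behind me, a sound that might be you approaching

I look back and see only a stretch of autumn chill

Ah, the colors of this scenery

have, I fear, faded into a certain weariness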

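Page 74

"Waiting" (等), Third Stanza

And then at last I found your familiar brows and eyes

coming slowly toward me, touched with the shyness of an October hibiscus

You said you had waited by mistake at the bookstore behind the church

and kept on apologizing

I took over and said, "I'm so glad to see you again"

Even the autumn wind could not ruffle your smiling face

which repaid the stubborn wish I had held this whole hour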

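Page 75

Four Seasons of Life (生命四季)

First Season: Youth

You and I are like two clouds

A wind rises, and we drift apart, east and west

Another wind rises

and we come together again

I hope the wind will grant

that we be blown together

and when the rain falls

dissolve into a field of green grass

Second Season: Hardship

Even if I fall in a foreign land

and flow through unfamiliar lakes and waterfalls

I will not be afraid

I know you will look for me

at every river mouth and valley

You will turn back to look for me

Where the waters meet the sea

you will always be there waiting for me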

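Page 76

Four Seasons of Life (生命四季)

Third Season: Growth

The sudden shower has passed

The remaining wisps of cloud cannot hide

the whole sky and its few ribbons of color (几带彩鱼)

You who are leaving the harbor

hoist your sails quickly

We are setting off!

Fourth Season: Mission

Thank you for guiding me straight and true through the many obstacles in my path. And for keeping me resolute when all around seemed lost.

--- From The Book of Eli

to be continued...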

Page 77

Outline

• Machine learning

• Machine learning → human learning

• Human learning → human innovation

• Concluding remarks

Page 78

Renaissance artists: striving to introduce mathematical rigor into perfecting art

From Lessing, Wieland, Schiller, and Herder to Goethe: setting the aesthetic vision for Germany! What is beauty? What is your ideal beauty?

William Blake: "I will not cease from mental fight, / Nor shall my sword sleep in my hand / Till we have built Jerusalem / In England's green and pleasant land."

You & I?

A Culture of Innovation (創新文化)

I will not cease from mental fight, nor shall my sword sleep in my hand, till we have built a culture of innovation in Taiwan's green and pleasant land.

Page 79

Innovation

Passion

Prepared mind

Perseverance

Revolutionary thinking