Entertaining with Word Play- · 2018. 12. 8. · Entertaining with Word Play-Computer Couplet,...
Entertaining with Word Play-Computer Couplet, Poetry, Lyric and Riddle
以字为乐:电脑对联、诗词和谜语
Dr. Ming Zhou(周明博士)
Microsoft Research Asia(微软亚洲研究院)
ROCLING 2016
Agenda
• Introduction
• Computer couplets
• Computer poetry
• Computer riddle
• Computer lyrics(new)
• Conclusions
Chinese Couplets (对联)
创大业一帆风顺
展宏图万事胜意
Chinese Classic Poetry(古典诗)
床 前 明 月 光 In front of my bed the moonlight is very bright.
Chuang Qian Ming Yue Guang
疑 是 地 上 霜 I wonder if that can be frost on the floor?
Yi Shi Di Shang Shuang
举 头 望 明 月 I lift up my head and look at the full moon, the dazzling moon.
Ju Tou Wang Ming Yue
低 头 思 故 乡 I drop my head, and think of the home of old days.
Di Tou Si Gu Xiang
Night Thoughts (静夜思) by Li Bai (李白), Translated by Amy Lowell
Input: keywords or NL expressions
Output: a classical poem
Lyrics (歌词)
又见炊烟升起 暮色罩大地
想问阵阵炊烟 你要去哪里
夕阳有诗情 黄昏有画意
诗情画意虽然美丽 我心中只有你
又见炊烟升起 勾起我回忆
愿你变作彩霞 飞到我梦里
夕阳有诗情 黄昏有画意
诗情画意虽然美丽 我心中只有你
Input: keywords and a lyrics template (词牌或者模板)
Output: a set of lyrics (词或者歌词)
Character Riddles (字谜)
儿女双全
好
思
一心入画中三星伴月似画里
Solving Riddles(猜谜底)
Generating Riddles(出谜面)
Our Journey of Play with the Word
• Milestones
– Couplets (2005)
– Poetry (2010)
– Riddle (2014)
– Lyrics (2016)
• Approach
– Model: data-driven, statistical ML and deep learning
– App: hybrid of machine intelligence and human intelligence
• Research results
– COLING 2008, AAAI 2012, EMNLP 2016, PACLIC 2009, 中文信息学报(2010), 计算机学报(2016)
– Award of “Best Innovations of MSRA 10 Years” (2008)
– IP licensed to Sina (2008) and NetEase (2016) for mobile gaming
– Featured on CCTV more than three times in 2006
• Refer to website: http://duilian.msra.cn
Agenda
• Introduction
• Computer couplets
• Computer poetry
• Computer riddle
• Computer lyrics
• Conclusions
Problem Definition
• Given the FS, generate the SS so that the two sentences form a qualified Chinese couplet
FS: 海(hai) 阔(kuo) 凭(ping) 鱼(yu) 跃(yue)
sea wide allow fish jump
| | | | |
SS: 天(tian) 高(gao) 任(ren) 鸟(niao) 飞(fei)
sky high permit bird fly
FS: first sentence (上联)
SS: second sentence (下联)
Regulation 1
• The FS and SS should be identical in length in terms of char and words, and keep the same manner of word segmentation
– FS: 知识能致富 (knowledge can bring richness)
– SS: 勤劳可兴家 (work can raise family)
Regulation 2
• Corresponding words in FS and SS should agree in their part of speech
海 阔 凭 鱼 跃
sea wide allow fish jump
天 高 任 鸟 飞
sky high permit bird fly
noun adjective conjunction noun verb
Regulation 3
• The contents of the FS and SS should be related but normally the characters in FS cannot be duplicated in SS
– FS: 海阔凭鱼跃 (sea wide allow fish jump)
– SS: 天高任鸟飞 (sky high permit bird fly)
Regulation 4
• The last character of the FS should be pronounced in “仄”(Ze) tone and the last character of the SS should be in “平”(Ping) tone
– FS: 海阔凭鱼跃 (sea wide allow fish jump)
– SS: 天高任鸟飞 (sky high permit bird fly)
• 跃 -> “仄” (Ze)
• 飞 -> “平” (Ping)
In stricter cases, other characters in the FS/SS should also conform to this regulation; see 马蹄韵: http://baike.baidu.com/view/873689.htm
Regulation 5
• The writing styles of the FS and SS should be identical in
– Character (or word) repetition (or not)
– Pronunciation repetition (or not)
– Character decomposition (or not)
– And other aspects…
FS and SS Share the Same Style
风 (wind)----------------水 (water)
吹 (blow) ---------------使 (make)
荞 (buckwheat) --------舟 (ship)
动(wave)----------------流 (go)
桥 (bridge) -------------洲 (island)
未 (not) -----------------不 (not)
动(wave) ---------------流(go)
Repetition of pronunciations(音韵联)
FS and SS Share the Same Style
有 (have)----------------- 缺 (lack)
子 (son) -------------------鱼 (fish)
有 (have) ------------------缺 (lack)
女 (daughter)-------------羊 (mutton)
方 (so) ---------------------敢 (dare)
称 (call) --------------------叫 (call)
好(good) -------------------鲜(fresh)
Decomposition of characters (拆字联)
好 = 女 + 子
鲜 = 鱼 + 羊
FS and SS Share the Same Style
板桥(Banqiao)---------------- 东坡 (Dongpo)
造(produce) -------------------居 (live)
桥(bridge) ---------------------坡 (mountain)
板(board)----------------------东(east)
Person name
(人名联)
Palindrome(回文联)
• Banqiao (板桥) and Dongpo (东坡) are famous litterateurs
• Each line reads the same from top to bottom as from bottom to top
MT vs. SS Generation
• Machine translation
– He sent her a bunch of flowers .
– 他给她送了一束花。
• SS generation (monotone decoder)
– FS: 海阔凭鱼跃 (sea wide allow fish jump)
– SS: 天高任鸟飞 (sky high permit bird fly)
– No word insertion, deletion or reordering
– However, conforming to the linguistic regulations is difficult
Couplet Generation Model
SS Generation Approach
• A multi-phase SMT approach
– Phase1: a phrase-based log-linear model
– Phase2: some linguistic filters
– Phase3: a Ranking SVM
[Figure: FS input → phrase-based log-linear model → N-best candidates → linguistic filters → Ranking SVM model → SS output]
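SS Generation Process
[Figure: decoding example for the FS 海 阔 凭 鱼 跃 (sea wide allow fish jump)]
Candidate translations per FS position:
– 海: 山 (hill), 天 (sky)
– 阔: 高 (high), 深 (deep)
– 凭: 任 (permit), 倚 (depend)
– 鱼: 虫 (insect), 鸟 (bird), 虎 (tiger)
– 跃: 飞 (fly), 舞 (dance), 鸣 (tweedle)
– Phrases: 天高 (sky high), 山高 (hill high), 鸟飞 (bird fly), 虎啸 (tiger roar)
SMT decoding: 山高任鸟飞, 天高任鸟鸣, 天高任鸟飞, 山高靠虎啸, 山高任虎啸, 山深任鸟飞, 天高任花香, ……
Linguistic filtering: 山高任鸟飞, 天高任鸟鸣, 天高任鸟飞, 山深任鸟飞, 天高任花香, 天高任鸟舞, 山高任花香, ……
Reranking: 天高任鸟飞, 山高任鸟飞, 天高任鸟鸣, 天高任鸟舞, 山深任鸟飞, 山高任花香, 天高任花香, ……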
Phrase-based Log-linear Model
• Given a FS denoted as F = {f1, f2, …, fn}, seek a SS denoted as S* = {s1, s2, …, sn} that satisfies
$$S^* = \arg\max_{S} \sum_{i=1}^{M} \lambda_i \log h_i(S, F)$$
• Where fi and si are Chinese characters, $h_i$ are the feature functions and $\lambda_i$ their weights
• Five feature functions
– Phrase translation model (PTM)
– Inverted PTM
– Character translation model (CTM)
– Inverted CTM
– Language model
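As a rough illustration of the log-linear selection above, the sketch below scores each candidate SS by the weighted sum of its log feature values and keeps the best one. The feature values, the uniform weights and the function names are invented for illustration; they are not taken from the system described in the talk.

```python
import math

def log_linear_score(features, weights):
    """Return sum_i lambda_i * log h_i(S, F) for one candidate."""
    return sum(w * math.log(h) for h, w in zip(features, weights))

def best_candidate(candidates, weights):
    """candidates maps each SS string to its feature values h_1..h_M (all > 0)."""
    return max(candidates, key=lambda s: log_linear_score(candidates[s], weights))

# Toy feature values (PTM, inverted PTM, CTM, inverted CTM, LM); invented numbers.
candidates = {
    "天高任鸟飞": [0.30, 0.25, 0.20, 0.22, 0.010],
    "山高任虎啸": [0.28, 0.20, 0.15, 0.18, 0.004],
}
weights = [1.0, 1.0, 1.0, 1.0, 1.0]  # assumed uniform; real weights would be tuned
best = best_candidate(candidates, weights)
```

In a real decoder the arg max is searched over a lattice rather than an explicit candidate list; the explicit dictionary here only keeps the sketch short.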
Word vs. Character
We choose the character-based model because:
– Word-based
• Suffers from segmentation mistakes
• OOV is a big problem
• No proper word segmentation tool for classic Chinese poems
– Character-based
• Free from segmentation mistakes
• Free of the OOV problem
• Classic couplets are mostly character-based
Feature Functions
• Phrase translation model and inverted PTM
– FS: [海阔] 凭 [鱼跃] ([sea wide] allow [fish jump])
– SS: [天高] 任 [鸟飞] ([sky high] permit [bird fly])
• Character translation model and inverted CTM
– FS: 海阔凭鱼跃 (sea wide allow fish jump)
– SS: 天高任鸟飞 (sky high permit bird fly)
• Language model
– Character-based trigram model
– p(海阔凭鱼跃) = p(海|START) * p(阔|START 海) * p(凭|海 阔) * p(鱼|阔 凭) * p(跃|凭 鱼)
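The trigram factorisation above can be sketched in a few lines. This is a toy version under stated assumptions: the three-sentence corpus, the add-alpha smoothing and the two-symbol START padding are choices made here for illustration, not details given in the slides.

```python
from collections import Counter

START = "<s>"

def train_trigram(sentences):
    """Count character trigrams and their bigram contexts, padding with START."""
    tri, ctx = Counter(), Counter()
    for sent in sentences:
        chars = [START, START] + list(sent)
        for i in range(2, len(chars)):
            tri[tuple(chars[i - 2:i + 1])] += 1
            ctx[tuple(chars[i - 2:i])] += 1
    return tri, ctx

def trigram_prob(sentence, tri, ctx, vocab_size, alpha=1.0):
    """p(sentence) as a product of add-alpha smoothed trigram probabilities."""
    chars = [START, START] + list(sentence)
    prob = 1.0
    for i in range(2, len(chars)):
        context = tuple(chars[i - 2:i])
        gram = tuple(chars[i - 2:i + 1])
        prob *= (tri[gram] + alpha) / (ctx[context] + alpha * vocab_size)
    return prob

# Toy corpus; the real LM data is the couplet and poem collection.
corpus = ["海阔凭鱼跃", "天高任鸟飞", "天高任鸟鸣"]
tri, ctx = train_trigram(corpus)
vocab_size = len({c for s in corpus for c in s})
p_seen = trigram_prob("天高任鸟飞", tri, ctx, vocab_size)
p_unseen = trigram_prob("天高任鱼飞", tri, ctx, vocab_size)
```

A sentence whose trigrams occur in the corpus (`p_seen`) receives a higher probability than one containing unseen trigrams (`p_unseen`).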
Training Data
• Couplet Data
– Classic Chinese couplets
• From books and from the web
– Extract sentence pairs from ancient Chinese poems
– From couplet forum websites
– Finally, 970,000 couplets obtained
• LM data
– Couplets data enriched by Chinese poems (1,600,000 sentences)
Couplet in Poems
• Regulated verse
春望 (唐 杜甫)
国破山河在,
城春草木深。
感时花溅泪,
恨别鸟惊心。
烽火连三月,
家书抵万金。
白头搔更短,
浑欲不胜簪。
These two pairs of sentences are considered to have nearly the same style as Chinese couplets
Linguistic Filters
• The SMT model cannot ensure that the SS has
– The same writing styles as the FS
– Correct tone for the last character
• Use linguistic filters
– Character repetition filter
– Pronunciation repetition filter
– Character decomposition filter
– Phonetic harmony filter
Linguistic Filters 1
• Character repetition filter
– The FS
• “有女有子方称好” (have daughter have son so call good)
– SS candidates
• “缺鱼缺羊敢叫鲜” (lack fish lack mutton dare call delicious) √
• “缺鱼少羊敢叫鲜” (lack fish miss mutton dare call delicious) ×
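One way to realise the character repetition filter is to compare the repetition patterns of the two sentences. The sketch below is an assumption about how such a filter could work, not the talk's implementation: each character is mapped to the position of its first occurrence, and the FS and SS must give the same sequence.

```python
def repetition_pattern(sentence):
    """Map each character to the index of its first occurrence."""
    first_seen = {}
    return [first_seen.setdefault(ch, i) for i, ch in enumerate(sentence)]

def passes_repetition_filter(fs, ss):
    """True when SS repeats characters at exactly the positions where FS does."""
    return len(fs) == len(ss) and repetition_pattern(fs) == repetition_pattern(ss)

fs = "有女有子方称好"
accepted = passes_repetition_filter(fs, "缺鱼缺羊敢叫鲜")
rejected = passes_repetition_filter(fs, "缺鱼少羊敢叫鲜")
```

The same pattern comparison would apply to the pronunciation repetition filter if each character were first replaced by its syllable, which requires a pronunciation dictionary not shown here.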
Linguistic Filters 2
• Pronunciation repetition filter
– The FS
• “风吹荞动桥未动” (wind blow buckwheat wave bridge not wave)
– SS candidates
• “水使舟流洲不流” (water make ship move island not move) √
• “水使舟流岛不流” (water make ship move island not move) ×
Linguistic Filters 3
• Character decomposition filter
– The FS
• “有女有子方称好” (have daughter have son so call good)
– SS candidates
• “缺鱼缺羊敢叫鲜” (lack fish lack mutton dare call delicious) √
• “缺鱼缺牛敢叫鲜” (lack fish lack beef dare call delicious) ×
Linguistic Filters 4
• Phonetic harmony filter
– The FS
• “海阔凭鱼跃” (sea wide allow fish jump); 跃: Ze
– SS candidates
• “天高任鸟飞” (sky high permit bird fly); 飞: Ping √
• “山高任虎啸” (mountain high permit tiger roar); 啸: Ze ×
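The phonetic harmony check on the last characters can be sketched as a dictionary lookup. The `TONE` table below is hypothetical and covers only the three characters from this slide; a real filter would need a full ping/ze dictionary.

```python
# Hypothetical tone lookup covering only the characters on this slide.
TONE = {"跃": "ze", "飞": "ping", "啸": "ze"}

def passes_phonetic_harmony(fs, ss, tone=TONE):
    """FS must end in a Ze tone and SS must end in a Ping tone."""
    return tone.get(fs[-1]) == "ze" and tone.get(ss[-1]) == "ping"

ok = passes_phonetic_harmony("海阔凭鱼跃", "天高任鸟飞")
bad = passes_phonetic_harmony("海阔凭鱼跃", "山高任虎啸")
```

Characters missing from the table make the check fail, which is a conservative choice for this sketch.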
Improve Cohesiveness by Candidate Re-ranking
• Ranking SVM for re-ranking SS candidates
– To leverage long-distance features
$$f_{\vec{w}}(\vec{x}) = \langle \vec{w}, \vec{x} \rangle$$
• Two more features
– Mutual information (MI)
– MI-based structural similarity (MISS)
• Parameter estimation
– Tool: SVM Light
– Training data: 200 FSs, each with 50 SSs labeled as positive or negative by humans; 10,000 pairs in total, annotated with (+1, -1)
Candidate Re-ranking (con’t)
• Mutual information (MI)
– Motivation
• Candidate 1: “天高任鸟飞” (sky high permit bird fly)
• Candidate 2: “天高任狗叫” (sky high permit dog bark)
• Candidate 1 is better
– MI(天,鸟) > MI(天,狗) ( MI(sky, bird) > MI(sky, dog) )
– To measure the semantic consistency of words in a candidate SS
$$MI(S) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(s_i, s_j) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \log \frac{p(s_i, s_j)}{p(s_i)\, p(s_j)}$$
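The MI score can be sketched with simple co-occurrence counts. Everything below is a toy setup assumed for illustration: the four-sentence corpus, the sentence-level co-occurrence definition and the `eps` floor for unseen events are not specified in the slides.

```python
import math
from collections import Counter
from itertools import combinations

def train_counts(sentences):
    """Count characters and unordered within-sentence character pairs."""
    uni, pair = Counter(), Counter()
    for sent in sentences:
        chars = sorted(set(sent))
        uni.update(chars)
        pair.update(combinations(chars, 2))
    return uni, pair, len(sentences)

def pmi(a, b, uni, pair, n, eps=1e-6):
    """log p(a, b) / (p(a) p(b)), with eps standing in for unseen events."""
    key = tuple(sorted((a, b)))
    p_ab = max(pair[key] / n, eps)
    p_a = max(uni[a] / n, eps)
    p_b = max(uni[b] / n, eps)
    return math.log(p_ab / (p_a * p_b))

def mi_score(sentence, uni, pair, n):
    """Sum the pairwise scores over all character pairs i < j of the candidate."""
    return sum(pmi(sentence[i], sentence[j], uni, pair, n)
               for i in range(len(sentence) - 1)
               for j in range(i + 1, len(sentence)))

corpus = ["天高任鸟飞", "天高云淡", "鸟飞天", "狗叫门"]  # toy data
uni, pair, n = train_counts(corpus)
good = mi_score("天高任鸟飞", uni, pair, n)
bad = mi_score("天高任狗叫", uni, pair, n)
```

With these counts the candidate whose characters co-occur in the corpus (`good`) scores higher than the one mixing unrelated characters (`bad`), mirroring the sky/bird versus sky/dog motivation.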
Candidate Re-ranking (con’t)
• MI-based structural similarity (MISS)
– Motivation: structural similarity
• 海阔凭鱼跃 (sea wide allow fish jump)
• 天高任鸟飞 (sky high permit bird fly)
– To measure the structural similarity
• Given the FS or SS $\{w_1, w_2, \ldots, w_n\}$, its MI vector is
$$V = \{MI_{12}, MI_{13}, \ldots, MI_{1n}, MI_{23}, \ldots, MI_{n-1,n}\}$$
$$MISS(F, S) = \cos(V_f, V_s) = \frac{V_f \cdot V_s}{\|V_f\|\,\|V_s\|}$$
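A minimal sketch of MISS, assuming a pairwise MI function is already available: build the MI vector of each sentence over all character pairs and take the cosine. The `TOY_MI` values and the three-character sentences are invented to keep the example small.

```python
import math

def mi_vector(sentence, mi):
    """[MI_12, MI_13, ..., MI_1n, MI_23, ..., MI_(n-1)n] for one sentence."""
    n = len(sentence)
    return [mi(sentence[i], sentence[j]) for i in range(n - 1) for j in range(i + 1, n)]

def miss(fs, ss, mi):
    """Cosine similarity between the MI vectors of FS and SS."""
    vf, vs = mi_vector(fs, mi), mi_vector(ss, mi)
    dot = sum(a * b for a, b in zip(vf, vs))
    norm = math.sqrt(sum(a * a for a in vf)) * math.sqrt(sum(b * b for b in vs))
    return dot / norm if norm else 0.0

# Invented MI values for a three-character illustration.
TOY_MI = {("海", "阔"): 2.0, ("阔", "凭"): 0.5, ("海", "凭"): 0.1,
          ("天", "高"): 1.8, ("高", "任"): 0.6, ("天", "任"): 0.2,
          ("山", "任"): 1.5, ("山", "高"): 0.1}

def toy_mi(a, b):
    """Symmetric lookup into the toy table; unknown pairs score 0."""
    return TOY_MI.get((a, b), TOY_MI.get((b, a), 0.0))

similar = miss("海阔凭", "天高任", toy_mi)
dissimilar = miss("海阔凭", "山高任", toy_mi)
```

The first pair has its strongest association between characters 1 and 2 in both sentences, so the cosine is close to 1; in the second pair the strong association sits elsewhere and the similarity drops.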
Banner (横批) Generation
• Matching between couplets and banners
1. Build a banner database
2. Each banner is represented as a vector over the words used in the couplets that carry it as their banner
3. For a couplet, search for the best-matching banners in the database
[Figure: input couplet → matching algorithm (against the banner database) → N-best banners]
Banner database (excerpt):
…
万民欢腾: 睦 0.36 宵 0.31 庆 0.29 闹 0.28 锣 0.24 …
九州共乐: 疆 0.38 瑞 0.35 春 0.33 努 0.27 私 0.26 …
五谷丰盈: 娆 0.48 姿 0.31 凌 0.28 顺 0.27 壮 0.26 …
…
Example
– Input couplet: 五湖四海迎盛世 / 千家万户庆丰年
– N-best banners: 景泰新春, 九州同乐, 华夏皆春, 华夏腾飞, 安生乐业, …
Experimental Results of Couplet Generation
Automatic Evaluation of SS
• BLEU for SS evaluation
$$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$
• N = 3; BP = 1
• $p_n$ is position-sensitive
• Data set
– 1051 FSs and their SSs mined from couplet forums
• 24 references for each FS on average
– 600 as development set and 451 as test set
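The sketch below shows one plausible reading of this metric. It assumes that "position-sensitive" means an n-gram only counts when some reference has the same n-gram at the same position, and that the weights $w_n$ are uniform (1/N); neither detail is spelled out on the slide. The `floor` value avoids log(0) and is also an assumption.

```python
import math

def position_sensitive_precision(candidate, references, n):
    """Share of candidate n-grams matched by some reference at the same position."""
    total = len(candidate) - n + 1
    if total <= 0:
        return 0.0
    hits = sum(
        any(ref[k:k + n] == candidate[k:k + n] for ref in references)
        for k in range(total)
    )
    return hits / total

def couplet_bleu(candidate, references, max_n=3, floor=1e-9):
    """BLEU = BP * exp(sum_n w_n log p_n) with BP = 1 and uniform w_n."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        p_n = max(position_sensitive_precision(candidate, references, n), floor)
        log_sum += math.log(p_n) / max_n
    return math.exp(log_sum)

references = ["弹曲却因情", "踏雪只因梅", "醉酒总关情"]
exact = couplet_bleu("弹曲却因情", references)
partial = couplet_bleu("踏雪却因情", references)
```

A candidate identical to a reference scores 1, while a candidate that mixes pieces of several references keeps full unigram precision but loses bigram and trigram matches.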
Reference Examples
• FS
– 品茶不为渴 (degust tea not because thirstiness)
• References:
–弹曲却因情 (play zither but for feeling)
– 踏雪只因梅 (trample snow only for plum)
– 醉酒总关情 (drink wine always relate feeling)
– …
Feature Evaluation
Incrementally adding new features; same linguistic filters for all settings

Model | Features | BLEU
Phrase-based SMT model | Baseline: Phrase TM (PTM) + LM | 0.276
Phrase-based SMT model | + Inverted PTM | 0.282
Phrase-based SMT model | + Character TM (CTM) | 0.315
Phrase-based SMT model | + Inverted CTM | 0.348
Ranking SVM | + Mutual information (MI) | 0.356
Ranking SVM | + MI-based structural similarity | 0.361
Overall Performance
• By human evaluation
– 100 FSs
– Output the 10 best SSs from our best system for each FS
– Human labeling: acceptable or not
– Metric
• Top-n inclusion rate: the percentage of test sentences whose top-n outputs contain at least one acceptable SS

Metric | Top-1 | Top-10
Top-n inclusion rate | 0.21 | 0.73
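The top-n inclusion rate is straightforward to compute once the human labels are available. In the sketch below, the data layout (one ranked list of booleans per FS) and the four toy judgements are assumptions for illustration only.

```python
def top_n_inclusion_rate(labels_per_fs, n):
    """labels_per_fs: one ranked list per FS, True where the SS was judged acceptable."""
    hits = sum(any(labels[:n]) for labels in labels_per_fs)
    return hits / len(labels_per_fs)

# Toy judgements for four FSs with three ranked SS candidates each.
judgements = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]
top1 = top_n_inclusion_rate(judgements, 1)
top3 = top_n_inclusion_rate(judgements, 3)
```

The rate can only grow with n, which is why the reported Top-10 figure is much higher than the Top-1 figure.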
Examples
• FS: 月落乌啼霜满天 (from Tang poem)
• SS: 风吹雁过雨连宵
• FS: 千江有水千江月 (from Buddhist words)
• SS: 万里无云万里星
• FS: 秦淮河桨声灯影 (Inspired from Zhu Ziqing’s essay)
• SS: 松花江水色月光
• FS: 此木为柴山山出 (此+木=柴;山+山=出)
• SS: 白水作泉日日昌 (白+水=泉;日+日=昌)
Chinese Couplets (http://duilian.msra.cn)
http://video.sina.com.cn/v/b/10937201-1452530713.html
MS Couplets: http://duilian.msra.cn/
Couplet Web Service
• Step 1: input a first sentence
Couplet Web Service (con’t)
• Step 2: select a second sentence from automatically generated candidates
Incorporating User’s Intelligence
System Combination
[Figure: system combination diagram. Components: first sentence input; source-channel model (translation model, language model, mutual information) built from training data; N-best candidates; re-ranking (translation model, language model, mutual information) built from log data of user operations; second sentence output]
Agenda
• Introduction
• Computer couplets
• Computer poetry
• Computer riddle
• Computer lyrics(new)
• Conclusions
Models of Chinese Quatrain
Problem and Framework
[Figure: key words → first sentence generation (using the poetic phrase taxonomy) → SMT-based next sentence generation → the complete poem]
Regulation 1
• A quatrain consists of four sentences of equal length, each with 5 or 7 characters.
7-char lines:
【唐】杜甫
两个黄鹂鸣翠柳,一行白鹭上青天。
窗含西岭千秋雪,门泊东吴万里船。
5-char lines:
【唐】宋之问
岭外音书绝,经冬复历春。
近乡情更怯,不敢问来人。
Regulation 2
(仄)仄平平仄,(平)平仄仄平。
(平)平平仄仄,(仄)仄仄平平。
【唐】宋之问
岭外音书绝,经冬复历春。
近乡情更怯,不敢问来人。
• The end characters of the 2nd and 4th sentences follow the same rhyme (二、四句要押韵)
– 春 and 人 follow the same rhyme
• The four lines follow rhythmic constraints (平仄约束)
平水韵:http://www.yoyv.com/Blog/log/wuxi2008_114557_/
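The ping-ze template on this slide can be checked mechanically once each character's tone class is known. The sketch below assumes a hand-written tone table for the 20 characters of the example poem (a real system would need a full rhyme dictionary such as 平水韵), and it checks the ping-ze pattern only, not the rhyme of the 2nd and 4th lines.

```python
# Hypothetical tone table for the 20 characters of the example poem
# ("P" = 平/ping, "Z" = 仄/ze); a real system needs a full rhyme dictionary.
TONES = dict(zip("岭外音书绝经冬复历春近乡情更怯不敢问来人",
                 "ZZPPZPPZZPZPPZZZZZPP"))

# "?" marks the flexible first position shown in parentheses on the slide.
TEMPLATE = ["?ZPPZ", "?PZZP", "?PPZZ", "?ZZPP"]

def follows_template(lines, template=TEMPLATE, tones=TONES):
    """Check every line of a quatrain against its ping-ze pattern."""
    if len(lines) != len(template):
        return False
    for line, pattern in zip(lines, template):
        if len(line) != len(pattern):
            return False
        for ch, want in zip(line, pattern):
            if want != "?" and tones.get(ch) != want:
                return False
    return True

poem = ["岭外音书绝", "经冬复历春", "近乡情更怯", "不敢问来人"]
swapped = [poem[1], poem[0], poem[2], poem[3]]
poem_ok = follows_template(poem)
swapped_ok = follows_template(swapped)
```

Swapping the first two lines breaks the pattern, so the same check can serve as a hard constraint when reranking generated lines.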
Regulation 3
【唐】杜甫 绝句
<B>两个黄鹂鸣翠柳↔一行白鹭上青天↔窗含西岭千秋雪↔门泊东吴万里船<E>
【唐】刘禹锡 竹枝词
<B>山桃红花满上头蜀江春水拍山流
花红易衰似郎意水流无限似侬愁<E>
【唐】李白 早发白帝城
<B>朝辞白帝彩云间千里江陵一日还↗两岸猿声啼不住轻舟已过万重山 <E>
【唐】王昌龄 闺怨
<B>闺中少妇不知愁 ∆春日凝妆上翠楼
∆ 忽见陌头杨柳色 ∆悔教夫婿觅封侯<E>
• The four lines follow a logical progression: start, continue, transition, and summarize what the poet wants to say
– The 4 lines describe things in coordination (并列)
– The 4 lines contain a transition (含转折)
– The 4 lines describe things in sequence (顺接)
– The 4 lines describe cause and consequence (因果)
<B> Start; <E> Summarize; ↔ coordination;↗ transition; sequential; ∆ cause and consequence
First Sentence Generation Model
• Given the keywords F = {keyword1, keyword2, …, keywordn}, seek a first sentence S* = {s1, s2, …, sn} that satisfies
$$S^* = \arg\max_{S} \sum_{i=1}^{M} \lambda_i \log h_i(S, F)$$
• Where the keywords fi and the si are Chinese phrases
• Feature functions
– Language model, translation model from the key words, …
• As a simple implementation
– Only three keywords are allowed
– Only the language model is used
– Keywords are chosen from a fixed taxonomy
A Poetic Word Taxonomy(诗学含英)
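• Hierarchical structure: 40 classes → 1016 clusters → 41218 phrases
– Example classes: 时令类, 歌舞类, 游眺类
– Example clusters and phrases: 春色, 盛夏, 晚秋; 芳草, 山青, 燕喃; 踏青, 访友
• Compiled by Liu Wenwei (刘文蔚), Qing dynasty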
The First Sentence Generation
Key words and their phrases:
– 春日: 明媚, 江山丽, 晴光
– 郊行: 寻芳草, 山行
– 访友: 迎门, 笑问
Candidate lattice (char 1 to char 5):
明 媚 寻 芳 草
晴 光 鱼 山 行
花 变 新 红 绽
江 山 丽 迎 门
… … … … …
Candidate first sentences: 明媚寻芳草, 晴光寻芳草, 江山丽蝶飞, 明媚鱼迎门, 花变新红绽, ……
Language model: character-based trigram model
p(明媚寻芳草) = p(明|START) * p(媚|START 明) * p(寻|明 媚) * p(芳|媚 寻) * p(草|寻 芳)
Generating the Subsequent Sentence
Both steps use a log-linear model.
Step 1: Get the top-N results for the subsequent sentence using phrasal SMT (same as the couplet generation model)
$$S^* = \arg\max_{S} p(S \mid F) = \arg\max_{S} \sum_{i=1}^{M} \lambda_i \log h_i(S, F)$$
Step 2: Rerank the results with the features of P, the contextual features, as well as the poetry constraints of quatrain writing
$$S^* = \arg\max_{S} p(S \mid F, P) = \arg\max_{S} \sum_{i=1}^{M} \lambda_i \log h_i(S, F, P)$$
Features for Phrasal SMT
• S and F are segmented into phrases $s_1 \ldots s_I$ and $f_1 \ldots f_I$
– Phrase translation model: $h_1(S, F, P) = \prod_{i=1}^{I} p(f_i \mid s_i)$
– Inverted phrase translation model: $h_2(S, F, P) = \prod_{i=1}^{I} p(s_i \mid f_i)$
– Lexical weight: $h_3(S, F, P) = \prod_{i=1}^{I} p_w(f_i \mid s_i)$
– Inverted lexical weight: $h_4(S, F, P) = \prod_{i=1}^{I} p_w(s_i \mid f_i)$
– Language model: $h_5(S, F, P) = p(S)$
Note: These are the features used in Chinese couplet generation
Training Data
• For the training of translation models
– <Tang Poems>, <Song Poems>, <Ming Poems>, <Qing Poems>, <Taiwan Poems>, etc.
– More than 300,000 poems and 3,500,000 sentences
– Use all adjacent sentence pairs as parallel sentence pairs
• For the training of language models
– The poetry data above (3,500,000 sentences)
– Supplemented by classic Chinese essays (12,000,000 sentences)
Keep Rhyme(押韵)
• Force the translation table to contain entries that satisfy the rhyme requirement
– The first sentence: 月落乌啼霜满天
– The second sentence: 江枫渔火对愁眠
– The current sentence: 姑苏城外寒山寺
– The next sentence: 夜半钟声到客船 (rhyme “an”)
• Insert the particular translation pairs into the translation table dynamically
寺 ||| 寒 ||| 1 2.31509e-009 1 2.17433e-010
寺 ||| 船 ||| 1 2.31509e-009 1 2.17433e-010
寺 ||| 山 ||| 1 2.31509e-009 1 2.17433e-010
Generating Three Individual Sentences
• The functions of the four sentences of a quatrain are different
– “start, continue, transition, and summarize”
• 3 individual models
– first→second, second→third, and third→fourth
• Two kinds of translation models
– TMs: three specific translation models trained on sentence pairs at the corresponding positions
– TMb: a background translation model trained on all sentence pairs
• Interpolation
– TM = 0.8 TMs + 0.2 TMb
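The interpolation TM = 0.8 TMs + 0.2 TMb can be sketched as a merge of two phrase tables. The dictionary layout (phrase pair → probability) and the toy entries are assumptions made here; pairs missing from one table are treated as having probability 0 in that table.

```python
def interpolate(tm_specific, tm_background, weight=0.8):
    """Return weight * TMs + (1 - weight) * TMb over the union of phrase pairs."""
    pairs = set(tm_specific) | set(tm_background)
    return {
        pair: weight * tm_specific.get(pair, 0.0)
        + (1.0 - weight) * tm_background.get(pair, 0.0)
        for pair in pairs
    }

# Toy phrase tables keyed by (source phrase, target phrase); invented numbers.
tm_s = {("海阔", "天高"): 0.5, ("鱼跃", "鸟飞"): 0.25}
tm_b = {("海阔", "天高"): 0.25, ("鱼", "鸟"): 0.5}
tm = interpolate(tm_s, tm_b)
```

The background model keeps pairs that the position-specific model never saw, at a reduced weight, which is the usual motivation for this kind of smoothing.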
Enhance Content Cohesiveness
• The MI score is used as a feature for the re-ranking process as follows:
$$h_6(S, F, P) = \sum_{x=1}^{n} \left( \sum_{y=1}^{n} MI(s_x, f_y) + \sum_{z=1}^{n(i-1)} MI(s_x, p_z) \right)$$
where the first (i – 1) sentences are all used.
– First term: consistency with the last sentence
– Second term: consistency with all previous sentences
• Note: the structural similarity is not used, because a quatrain does not require correspondences as strong as those of a couplet
Experimental Results of Chinese Quatrain
Evaluations
• Generated first sentence given the three keywords
• Generated subsequent sentence given the current sentence and all previous sentences
• Generated poem as a whole
Evaluation Criterion of the Whole Quatrain
Items | Weight | C (poor) | B (acceptable) | A (good)
Rhyme and ping-ze templates | 5/17 | Rhyme and ping-ze are all wrong | One item is wrong | All are correct
Fluency and elegance | 5/17 | The whole poem is not fluent | Only one sentence is not fluent | Fluent and elegant as a poem
Structure (start-continue-transition-summarize) | 3/17 | The four sentences express unrelated ideas | Only one sentence is not related | Meets the start-continue-transition-summarize structure
Consistency with the keywords | 3/17 | Unrelated to the keywords | Misses one keyword | Reflects all three keywords together in the poem
Image generated | 1/17 | No clear image generated | Simple image generated from the poem | Elegant image or good argument generated
Evaluation Results
Setup
1. 20 groups of keywords
2. 20 groups of poems, half 5-char and half 7-char
3. Full automatic vs. interactive mode
4. Comparison with Daoxiangju

System | Daoxiangju Poetry generator | Full Automatic Mode | Interactive Mode
Avg. score | 54.83 | 68.34 | 77.83

The Daoxiangju Poetry generator (http://www.poeming.com/web/index.htm) is a rule-based poetry generator.
Generated Quatrains
• 感归
零落泣鬼神
秋来愁人心
肝肠断挥洒
不思归日吟
• 春兴
残花飘黄叶
细雨落青山
蝶飞红杏里
燕舞绿杨湾
• 从军北征
雁字风月一时清
天书云山千里远
锦字凭谁寄笔力
人生何日归来晚
• 望洞庭
移舟雨逐行云水
一路风随日月天
云破长江万里船
风来一水千山川
Agenda
• Introduction
• Computer couplets
• Computer poetry
• Computer riddle
• Computer lyrics(new)
• Conclusions
Workflow
Alignments and Rules
• An alignment maps a metaphor to a char or radical
• A rule is an operation on a char to get a radical
千金出闺门 (娃): 千金 → 女
上岗必戴安全帽 (密): 安全帽 → 宀
千金出闺门 (娃): 出闺门 → 圭 (A-B)
上岗必戴安全帽 (密): 上岗 → 山 (LowerRemove)
Alignment Extraction
• For riddle q and its corresponding solution s
– q -> w1, w2, …, wn and s -> r1, r2, …, rm
• Count ([wi, wj], rk) for all i, j ∈ [1, n], k ∈ [1, m]
• Select highly frequent ([wi, wj], rk) as alignments
• Example
– 采青归去值千金 (妥)
– 一字千金夜半得 (安)
– 千金出闺门 (娃)
– 壮士一去赠千金 (妆)
– Extracted alignment: 千金 → 女
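The counting step above can be sketched as follows. The `RADICALS` table is a hypothetical decomposition of the four solutions on this slide, and the span length limit and the frequency threshold are assumptions; the talk does not state the actual selection criteria.

```python
from collections import Counter

# Hypothetical radical decompositions for the four solutions on this slide.
RADICALS = {"妥": ["爫", "女"], "安": ["宀", "女"], "娃": ["女", "圭"], "妆": ["丬", "女"]}

def count_alignments(pairs, max_len=3):
    """Count every (riddle substring, solution radical) co-occurrence."""
    counts = Counter()
    for riddle, solution in pairs:
        # Each distinct substring of up to max_len characters counts once per riddle.
        spans = {riddle[i:j] for i in range(len(riddle))
                 for j in range(i + 1, min(i + max_len, len(riddle)) + 1)}
        for span in spans:
            for radical in RADICALS[solution]:
                counts[(span, radical)] += 1
    return counts

pairs = [("采青归去值千金", "妥"), ("一字千金夜半得", "安"),
         ("千金出闺门", "娃"), ("壮士一去赠千金", "妆")]
counts = count_alignments(pairs)
frequent = [pair for pair, c in counts.items() if c >= 4]
```

On this toy data the pair (千金, 女) reaches the threshold, but so do the single characters 千 and 金, so a practical extractor would need further filtering, for example preferring the longest span.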
Alignment Extraction (continued)
• Let [w1, w2] denote any two successive chars in a riddle q
• If w2 is a radical of w1, and the remaining part of w1 (denoted r) appears in the solution s, then ([w1, w2], r) is an alignment
• Examples
– 驱马而行到南安 (妪): 驱马 → 区
– 骑马飞奔走四方 (畸): 骑马 → 奇
– 少给孩子喝汽水 (氦): 汽水 → 气, 孩子 → 亥
Rule Extraction
Alignments → Rules
– 西湖 → 氵 ⇒ 西(.) RightRemove
– 上岗 → 山 ⇒ 上(.) LowerRemove
– 断球后 → 王 ⇒ 断(.)后 RightRemove
Partial List of Rules Extracted
Riddle Solving
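[Figure: riddle-solving workflow]
Training: riddle pairs → alignment table and rule table
Decoding: riddle → decoder → answer
1. Find all possible parts X in the alignment table and rule table from the input riddle;
2. Calculate similarity between X and candidates Y.
Matching types: Alignment, Rule, Single
Example: clue parts 氵, 艹, 两 (from a riddle using 西, 湖, 花, 前, 两, 相, 会) → 满
Example: 上岗必戴安全帽 → 山 (rule 上(*)), 必 (single), 宀 (alignment 安全帽) → 密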
Features for Riddle Solving
Feature Description
Correct_Radical Number of matched radicals
Missing_Radical Number of mismatched radicals
Disappearing_Radical Number of radicals that disappear in all characters of riddle descriptions
Single_Matching Number of clues derived from character itself
Alignment_Matching Number of clues derived from alignments
Rule_Matching Number of clues derived from rules
Length_Rate Rate of the length of clues in riddles
Frequency Prior probability of this character as a solution
Riddle Generation
Features for Riddle Generation
Feature Description
Riddle_Length Length in characters of the candidate riddle
Riddle_Relative_Length Abs(Riddle_Length-5) because the length of common riddles is between 3 and 7
Number_Radical Number of radicals that the character decomposes into
Avg_Freq_Character Average frequency of the characters in the riddle
Max_Freq_Radical Maximum frequency of the characters in the riddle
Number_Alignment Number of alignments used for generating the candidate
Length_Alignment Length of characters from alignments
Number_Rule Number of rules used for generating the candidate
Length_Rule Length of characters from rules
LM_Score_R Score of language model trained by Chinese riddles, poems and couplets
LM_Score_G Score of language model trained by web documents
Agenda
• Introduction
• Computer couplets
• Computer poetry
• Computer riddle
• Computer lyrics(new)
• Conclusions
Encoder-Decoder Framework
[Figure: encoder-decoder built from RNN units (input x_t, hidden state h_t, output y_t)]
input = (谁 见 幽 人 独 往 来)
Encoder → vector (-0.2, 0.9, -0.1, 0.5, 0.7, 0.0, 0.2) → Decoder
Constraints = (词牌 / lyrics template (character count, 平仄, rhyme), topic words, etc.)
output = (飘 渺 孤 鸿 影 </s>)
Songci (Song Lyrics) Generation
减字木兰花 主题:思乡
秋风初起,落叶夕阳无限意。两地风光,三十年来此地长。
相逢何处?无限相思何处去。把酒登楼,独倚危楼慰白头。
如梦令 主题:怀友
年少溪亭日暮,依旧楼台深处。
应记旧游时,二十四桥风露。人去,人去,回首故人何处。
• Model trained with 26,249 pieces of Songci
• 300 patterns of Songci
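Lyrics Generation
[Figure: lyrics generation pipeline in three stages (1, 2, 3). Components shown: input image, image understanding, keywords (sunset, cloud, cheer, purple, …), template song, rhythm pattern, lyrics, user's voice sample, music/tunes, singing synthesis, generated song]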
Generation of a Song
一次就好 (original)
Singer: 杨宗纬; lyrics: 陈曦; music: 董冬冬
想看你笑想和你闹想拥你入我怀抱上一秒红着脸在争吵下一秒转身就能和好不怕你哭不怕你叫因为你是我的骄傲一双眼睛追着你乱跑一颗心早已经准备好一次就好我带你去看天荒地老在阳光灿烂的日子里开怀大笑在自由自在的空气里吵吵闹闹你可知道我唯一的想要世界还小我陪你去到天涯海角在没有烦恼的角落里停止寻找在无忧无虑的时光里慢慢变老你可知道我全部的心跳随你跳
一次就好 (computer-generated)
Singer: SongBot; lyrics: SongBot; music: 董冬冬
想过的你爱过的你终是靠近又远离包容你娇嗔的小脾气心疼你难过时的泪滴怀抱着你亲吻着你却仍担心会失去你曾以为你是我的唯一心心念念记挂着情思原来结束不过是简单的两个字明明说好要一辈子却这样分离曾以为多余的忐忑终成了现实你的心里从此再无痕迹各自转身留下了谁的苦涩回忆风中消散的是谁说的不离不弃雨里淹没的是谁缠的痴痴迷思挥手别过心里你的影子忘记你
• Model trained with 144,276 pieces of lyrics (Chinese songs)
一次就好
Singer: 小琪; lyrics: SongBot; music: 董冬冬
想过的你爱过的你终是靠近又远离包容你娇嗔的小脾气心疼你难过时的泪滴怀抱着你亲吻着你却仍担心会失去你曾以为你是我的唯一心心念念记挂着情思原来结束不过是简单的两个字明明说好要一辈子却这样分离曾以为多余的忐忑终成了现实你的心里从此再无痕迹各自转身留下了谁的苦涩回忆风中消散的是谁说的不离不弃雨里淹没的是谁缠的痴痴迷思挥手别过心里你的影子忘记你
Conclusions
• A series of innovations for Chinese language gaming
– An SMT approach is applied to Chinese couplets, poetry and riddles with promising results
– New work: RNN for lyrics generation, including Songci (Song lyrics) and songs
– Website: http://duilian.msra.cn
• There is much room for future improvement, such as:
– Handle more types of couplets
– Allow natural language expressions as input for poetry generation (overcoming the limitation of the taxonomy)
– Improve the cohesiveness of poetry and lyrics
• It would be interesting to revisit couplets, poetry and riddles with RNNs
Acknowledgment
• Dr. Harry Shum (then GM of MSRA, later VP for Bing Search and now EVP for MSR) for his suggestion of Chinese couplets in 2005
• Long Jiang, Yanjun Ma, Fazhou Wu, Litian Tao, Hao Su for their contribution to Chinese couplets
• Jing He, Long Jiang for their contribution to Chinese quatrain
• Chuanqi Tan, Furu Wei for their contribution to riddle
• Nan Yang, Furu Wei for their contribution to lyrics
• IEG (Intelligent Engineering Group of MSRA) for the deployment at http://duilian.msra.cn
• The late Prof. Feng-zhu Luo (Yuanzhi University) for her advice on classic poetry
Publications
• Long Jiang, Ming Zhou: Generating Chinese Couplets using a Statistical MT Approach. COLING 2008.
• Jing He, Ming Zhou, Long Jiang: Generating Chinese Classical Poems with Statistical Machine Translation Models. AAAI 2012.
• Ming Zhou, Long Jiang, Jing He: Generating Chinese Couplets and Quatrain Using a Statistical Approach. PACLIC 2009.
• 何晶, 周明, 蒋龙: 基于统计的汉语格律诗生成研究. 中文信息学报, 2010年第02期.
• 蒋锐滢, 崔磊, 何晶, 周明, 潘志庚: 基于主题模型和统计机器翻译方法的格律诗辅助创作. 计算机学报, 2015年第12期.
• Chuanqi Tan, Furu Wei, Li Dong, Weifeng Lv and Ming Zhou, Solving and Generating Chinese Character Riddles, EMNLP 2016.