
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 2347–2357, Berlin, Germany, August 7-12, 2016. © 2016 Association for Computational Linguistics

Chinese Couplet Generation with Neural Network Structures

Rui Yan1,2, Cheng-Te Li3, Xiaohua Hu4, and Ming Zhang5

1 Institute of Computer Science and Technology, Peking University, Beijing 100871, China
2 Natural Language Processing Department, Baidu Inc., Beijing 100193, China
3 Academia Sinica, Taipei 11529, Taiwan
4 College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA
5 Department of Computer Science, Peking University, Beijing 100871, China
[email protected], [email protected], [email protected], mzhang [email protected]

Abstract

Part of the unique cultural heritage of China is the Chinese couplet. Given a sentence (namely an antecedent clause), people reply with another sentence (namely a subsequent clause) equal in length. Moreover, a special phenomenon is that corresponding characters from the same position in the two clauses match each other by following certain constraints on semantic and/or syntactic relatedness. Automatic couplet generation by computer is viewed as a difficult problem and has not been fully explored. In this paper, we formulate the task as a natural language generation problem using neural network structures. Given the issued antecedent clause, the system generates the subsequent clause via sequential language modeling. To satisfy special characteristics of couplets, we incorporate the attention mechanism and polishing schema into the encoding-decoding process. The couplet is generated incrementally and iteratively. A comprehensive evaluation, using perplexity and BLEU measurements as well as human judgments, has demonstrated the effectiveness of our proposed approach.

1 Introduction

Chinese antithetical couplets (namely "对联") form a special type of poetry composed of two clauses (i.e., sentences). The popularity of the game of Chinese couplet challenge manifests itself in many aspects of people's life, e.g., as a means of expressing personal emotions or political views, or of communicating messages at festive occasions. Hence, Chinese couplets are considered an important cultural heritage. A couplet is often written

in calligraphy on red banners during special occasions such as wedding ceremonies and the Chinese New Year. People also use couplets to celebrate birthdays, mark the opening of a business, and commemorate historical events. We illustrate a real couplet for the Chinese New Year celebration in Figure 1, and translate the couplet into English character by character.

Usually in the couplet generation game, one person challenges the other person with a sentence (namely an antecedent clause). The other person then replies with another sentence (namely a subsequent clause) equal in length and term segmentation, in a way that corresponding characters from the same position in the two clauses match each other by obeying certain constraints on semantic and/or syntactic relatedness. We illustrate this special phenomenon of the Chinese couplet in Figure 1: "one" is paired with "two", "term" is associated with "character", "hundred" is mapped to "thousand", and "happiness" is coupled with "treasures". Unlike free-form language, couplets have a unique poetic elegance, e.g., aestheticism and conciseness. Filling in the couplet is considered a challenging task with a set of structural and semantic requirements. Only a few of the best scholars master the skill of manipulating and organizing terms.

Chinese couplet generation given the antecedent clause can be viewed as a big challenge in the joint area of Artificial Intelligence and Natural Language Processing. With the fast development of computing techniques, we realize that computers might play an important role in helping people to create couplets: 1) it is rather convenient for computers to sort out appropriate term combinations from a large corpus, and 2) computer programs are well suited to recognizing, learning, and even memorizing patterns or rules given the corpus.


Figure 1: An example of a Chinese couplet for the Chinese New Year. We mark the character-wise translation under each Chinese character of the couplet to illustrate that each character from the same position of the two clauses has the constraint of certain relatedness. Overall, the couplet can be translated as: the term "peaceful and lucky" (i.e., 和顺) indicates countless happiness; the two characters "safe and sound" (a.k.a., 平 and 安) are worth innumerable treasures.

Although computers are no substitute for human creativity, they can process very large text repositories of couplets. Furthermore, it is relatively straightforward for the machine to check whether a generated couplet conforms to the constraint requirements. The above observations motivate automatic couplet generation using computational intelligence. Beyond the long-term goal of eventually building an autonomous intelligent system capable of creating meaningful couplets, there are potential short-term applications that augment human expertise/experience to create couplets for entertainment or educational purposes.

To design the automatic couplet generator, we first need to empirically study the generation criteria. We discuss some of the general generation standards here. For example, couplets generally have a rigid format with the same length for both clauses. Such a syntactic constraint is strict: both clauses have exactly the same length, measured in Chinese characters. Characters at the same position in the two clauses also face certain constraints; this constraint is less strict. Since the Chinese language is sometimes flexible, synonyms and antonyms both indicate semantic relatedness. Also, semantic coherence is a critical feature of couplets. A well-written couplet is supposed to be semantically coherent across both clauses.

In this paper we are concerned with automatic couplet generation. We propose a neural couplet machine (NCM) based on neural network structures. Given a large collection of texts, we learn representations of individual characters and their combinations within clauses, as well as how they mutually reinforce and constrain each other. Given any specified antecedent clause, the system generates a subsequent clause via sequential language modeling using encoding and decoding. To satisfy the special characteristics of couplets, we incorporate the attention mechanism and a polishing schema into the generation process. The couplet is generated incrementally and iteratively to refine wordings. Unlike a single-pass generation process, in our proposed system the hidden representations of the draft subsequent clause are fed back into the neural network structure to polish the next version of the clause. In contrast to previous approaches, our generator makes use of neighboring characters within the clause through an iterative polishing schema, which is novel.

To sum up, our contributions are as follows. For the first time, we propose a series of neural network-based couplet generation models. We formulate a new system framework that takes in the antecedent clauses and outputs the subsequent clauses of the couplet pairs. We tackle the special characteristics of couplets, such as the pairing of corresponding characters in the two clauses, by incorporating the attention mechanism into the generation process. Also for the first time, we propose a novel polishing schema to iteratively refine the generated couplet using local patterns of neighboring characters. The draft subsequent clause from the last iteration is used as additional information to generate a revised version of the subsequent clause.

The rest of the paper is organized as follows. In Section 2, we briefly summarize related work on couplet generation. Sections 3 and 4 then give an overview of our approach and detail the neural models. The experimental results and evaluation are reported in Section 5, and we draw conclusions in Section 6.

2 Related Work

There are very few studies focused on Chinese couplet generation, based on templates (Zhang and Sun, 2009) or statistical translation (Jiang and Zhou, 2008). The Chinese couplet generation task can be viewed as a reduced form of 2-sentence poem generation (Jiang and Zhou, 2008). Given the first line of the poem, the generator ought to generate the second line accordingly, a process similar to couplet generation. We consider automatic Chinese poetry generation to be a closely related research area.


(a) Sequential couplet generation. (b) Couplet generation with attention. (c) Couplet generation with polishing schema.

Figure 2: Three neural models for couplet generation. More details will be introduced in Section 4.

Note that there are still some differences between couplet generation and poetry generation. The task of generating the subsequent clause to match the given antecedent clause is more well-defined than generating all sentences of a poem. Moreover, not all of the sentences in poems need to follow couplet constraints.

There has been some formal research in the area of computer-assisted poetry generation. Researchers from different countries have studied automatic poem composition in their own languages in different ways: 1) Genetic Algorithms: Manurung et al. (2004; 2011) propose to create poetic texts in English based on state search; 2) Statistical Machine Translation (SMT): Greene et al. (2010) propose a translation model to generate cross-lingual poetry, from Italian to English; 3) Rule-based Templates: Oliveira (2009; 2012) has proposed a poem generation platform based on semantic and grammar templates in Spanish. An interactive system has been proposed to reproduce the traditional Japanese poem form named haiku based on rule-based phrase search related to user queries (Tosa et al., 2008; Wu et al., 2009). Netzer et al. (2009) propose another way of haiku generation using word association rules.

As to computer-assisted Chinese poetry generation, there are now several Chinese poetry generators available. The system named Daoxiang¹ basically relies on manual pattern selection. The system maintains a list of manually created terms related to pre-defined keywords, and randomly inserts terms into the selected template as a poem. The system is simple, but random term selection leads to unnatural sentences.

¹ http://www.poeming.com/web/index.htm

Zhou et al. (2010) use a genetic algorithm for Chinese poetry generation with tonal encodings and state search. He et al. (2012) extend the couplet machine translation paradigm (Jiang and Zhou, 2008) from a 2-line couplet to a 4-line poem by feeding in previous sentences sequentially, considering structural templates. Yan et al. (2013; 2016) propose a summarization framework to generate poems. Recently, along with the prosperity of neural networks, a recurrent neural network based language generation approach has been proposed (Zhang and Lapata, 2014): the generation is more or less a translation process. Given previous sentences, the system generates the next sentence of the poem.

We also briefly introduce deep neural networks, which have contributed great improvements in NLP. A series of neural models has been proposed, such as convolutional neural networks (CNN) (Kalchbrenner et al., 2014) and recurrent neural networks (RNN) (Mikolov et al., 2010), with or without gated recurrent units (GRU) (Cho et al., 2014) and long short-term memory (LSTM) units (Hochreiter and Schmidhuber, 1997). We conduct a pilot study to design neural network structures for the couplet generation problem. For the first time, we propose a polishing schema for the couplet generation process, and combine it with the attention mechanism to satisfy the couplet constraints, which is novel.

3 Overview

The basic idea of Chinese couplet generation is to build a hidden representation of the antecedent clause, and then generate the subsequent clause accordingly, as shown in Figure 2. In this way, our system works in an encoding-decoding manner. The units of couplet generation are characters.

Problem formulation. We define the following formulations:
• Input. Given the antecedent clause $A = \{x_1, x_2, \ldots, x_m\}$, $x_i \in V$, where $x_i$ is a character and $V$ is the vocabulary, we learn an abstractive representation of the antecedent clause $A$.
• Output. We generate a subsequent clause $S = \{y_1, y_2, \ldots, y_m\}$ according to $A$, which indicates semantic coherence. We have $y_i \in V$. To be more specific, each character $y_i$ in $S$ is coordinated with the corresponding character $x_i$ in $A$, as determined by the couplet constraint.

As mentioned, we encode the input clause as a hidden vector, and then decode the vector into an output clause so that the two clauses actually form a couplet pair. Since couplet generation has special characteristics, we propose different neural models for different concerns. The proposed models are extended incrementally so that the final model is able to tackle the complicated issues of couplet generation. We first introduce these neural models at a high level, and then elaborate on them in more detail.

Sequential Couplet Generation. The model accepts the input clause. We use a recurrent neural network (RNN) over characters to capture the meaning of the clause, obtaining a single vector which represents the antecedent clause. We then use another RNN to decode the input vector into the subsequent clause by character-wise generation. Basically, the process is sequence-to-sequence generation via encoding and decoding, which operates at the global level of the clause. We show the diagram of sequential couplet generation in Figure 2(a).

Couplet Generation with Attention. There is a special phenomenon within a couplet pair: the characters from the same position in the antecedent clause and the subsequent clause, i.e., $x_i$ and $y_i$, generally have some sort of relationship such as "coupling" or "pairing". Hence we ought to model such one-to-one correlation between $x_i$ and $y_i$ in the neural model for couplet generation. Recently, the attention mechanism has been proposed to allow the decoder to dynamically select and linearly combine different parts of the input sequence with different weights. Basically, the attention mechanism models the alignment between input and output positions, so it can be viewed as a local matching model. Moreover, the tonal coding issue can also be addressed by the pairwise attention mechanism. The extension of the attention mechanism to the sequential couplet generation model is shown in Figure 2(b).

Figure 3: Couplet generation via sequential language modeling: the plain neural couplet machine.

Polishing Schema for Generation. Couplet generation is a form of art, and art usually requires polishing. Unlike the traditional single-pass generation in previous neural models, our proposed couplet generator is able to polish the generated couplets for one or more iterations to refine the wordings. The model is essentially the same as the sequential generation with attention, except that the hidden representation of the previously generated clause draft is utilized again as an input, serving as additional information for semantic coherence. The principle is illustrated in Figure 2(c): the generated draft from the previous iteration is incorporated into the hidden state which generates the polished couplet pair in the next iteration.

To sum up, we introduce three neural models for Chinese couplet generation. Each revised model targets an issue in couplet generation so that the system can try to imitate a human couplet writer. We elaborate on these neural models incrementally in Section 4.

4 Neural Generation Models

4.1 Sequential Couplet Generation

The sequential couplet generation model basically follows the sequence-to-sequence generation fashion (Sutskever et al., 2014) using encoding and decoding, as shown in Figure 3. We use a recurrent neural network (RNN) to iteratively pick up information over the character sequence $x_1, x_2, \ldots, x_m$ of the input antecedent clause $A$. All characters are vectorized using their embeddings (Mikolov et al., 2013). For each character, the RNN allocates a hidden state $s_i$, which depends on the current character's embedding $x_i$ and the previous state $s_{i-1}$. Since each clause in a couplet pair is usually not very long, it is sufficient to use a vanilla RNN with basic interactions.


Figure 4: Couplet generation with the attention mechanism, namely the attention neural couplet machine. The attention signal is generated by both encoder and decoder, and then fed into the coupling vector. Calculation details are elaborated in Section 4.2.

The equation for encoding is as follows:

$s_i = f(W_h s_{i-1} + W_x x_i + b)$  (1)

$x$ is the vector representation (i.e., embedding) of the character. $W$ and $b$ are the weight and bias parameters. $f(\cdot)$ is the non-linear activation function; we use ReLU (Nair and Hinton, 2010) in this paper. As for the hidden state $h_i$ in the decoding RNN, we have:

$h_i = f(W_x x_{i-1} + W_h h_{i-1})$  (2)
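To make the recurrence concrete, the following is a minimal NumPy sketch of Equations (1) and (2); the weight initialization, hidden size, and function names are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)  # f(.) in Equations (1) and (2)

# Illustrative sizes (assumptions): 128-dim embeddings and hidden states.
d_emb, d_hid = 128, 128
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(d_hid, d_hid))  # recurrent weights
W_x = rng.normal(scale=0.1, size=(d_hid, d_emb))  # input weights
b = np.zeros(d_hid)

def encode(char_embeddings):
    """Run Equation (1) over the antecedent clause; return states s_1..s_m."""
    s = np.zeros(d_hid)
    states = []
    for x in char_embeddings:            # x_i: embedding of the i-th character
        s = relu(W_h @ s + W_x @ x + b)  # s_i = f(W_h s_{i-1} + W_x x_i + b)
        states.append(s)
    return states

def decode_step(h_prev, x_prev):
    """One step of Equation (2): h_i = f(W_x x_{i-1} + W_h h_{i-1})."""
    return relu(W_x @ x_prev + W_h @ h_prev)
```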

4.2 Couplet Generation with Attention

As mentioned, there is a special phenomenon in the couplet pair: the characters from the same position in the antecedent clause and the subsequent clause comply with certain relatedness, so that the two clauses may, to some extent, look "symmetric".

Hence we introduce the attention mechanism into the couplet generation model. The attention mechanism coordinates, either statically or dynamically, different positions of the input sequence (Shang et al., 2015). To this end, we introduce a hidden coupling vector $c_i = \sum_{j=1}^{m} \alpha_{ij} s_j$. The coupling vectors linearly combine all parts of the antecedent clause and determine which parts should be utilized to generate the characters in the subsequent clause. The attention signal $\alpha_{ij}$ is calculated as $\alpha_{ij} = \sigma_{att}(s_j, h_{i-1})$ followed by a softmax function. The score is based on how well the input at position $j$ and the output at position $i$ match. $\sigma_{att}(\cdot)$ is parametrized as a neural network which is jointly trained with all the other components (Bahdanau et al., 2015; Hermann et al., 2015). This mechanism enjoys the advantage of adaptively focusing on the corresponding characters of the input text according to the generated characters in the subsequent clause. The mechanism is pictorially shown in Figure 4.

Figure 5: Couplet generation with the polishing schema, i.e., the full neural couplet machine. Note that for conciseness, we only show the gist of this schema across polishing iterations. The shaded circles are the hidden vectors used to generate characters in the subsequent clause. We omit the duplicated sequential and attention dependencies within each iteration, as shown in Figures 3 & 4.

With the coupling vectors generated, we have the following equation for the decoding process with the attention mechanism:

$h_i = f(W_x x_{i-1} + W_h h_{i-1} + W_c c_i)$  (3)
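As a rough illustration of the coupling vector and Equation (3), the sketch below computes $c_i$ with a simple dot-product score standing in for $\sigma_{att}$; in the actual model the scoring function is a jointly trained neural network.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def coupling_vector(states, h_prev, score=lambda s, h: float(s @ h)):
    """Compute c_i = sum_j alpha_ij * s_j over encoder states s_1..s_m.

    `score` plays the role of sigma_att(s_j, h_{i-1}); a dot product is
    used here as a stand-in for the jointly trained scoring network."""
    alphas = softmax(np.array([score(s_j, h_prev) for s_j in states]))
    c = np.sum([a * s_j for a, s_j in zip(alphas, states)], axis=0)
    return c, alphas

# Equation (3) then adds the coupling vector to the decoder recurrence:
# h_i = relu(W_x @ x_prev + W_h @ h_prev + W_c @ c)
```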

4.3 Polishing Schema for Generation

Inspired by the observation that a human couplet writer might recompose the clause several times, we propose a polishing schema for couplet generation. Specifically, after a single-pass generation, the couplet generator itself becomes aware of the generated clause as a draft, so that polishing each and every character of the clause becomes possible.

We hereby propose a convolutional neural network (CNN) based polishing schema, shown in Figure 5. The intuition for the convolutional structure is that this polishing schema guarantees better coherence: with the batch of neighboring characters, the couplet generator knows which character to generate during the revision process.

A convolutional neural network applies a fixed-size window to extract local (neighboring) patterns of successive characters. Suppose the window is of size $t$; the detected feature at a window of positions $x_i, \ldots, x_{i+t-1}$ is given by

$o_i^{(n)} = f(W [h_i^{(n)}; \cdots; h_{i+t-1}^{(n)}] + b)$  (4)

Here $h^{(n)}$ with the superscript $n$ is the hidden vector representation from the $n$-th iteration. $W$ and $b$ are the convolution parameters. Semicolons refer to column vector concatenation. $f(\cdot)$ is again the non-linear ReLU activation function (Nair and Hinton, 2010). Note that we pad zeros at the end of the sequence if a character does not have enough following characters to fill the slots in the convolution window. In this way, we obtain a set of detected features. Then a max-pooling layer aggregates information over different characters into a fixed-size vector.

Now the couplet generation with both the attention mechanism and the polishing schema becomes:

$h_i^{(n+1)} = f(W_x x_{i-1} + W_h h_{i-1}^{(n+1)} + W_c c_i^{(n+1)} + W_o o_i^{(n)})$  (5)
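Read together with the previous sketches, one polishing decoder step under Equation (5) might be written as follows; the parameter names are illustrative assumptions.

```python
import numpy as np

def polish_decode_step(x_prev, h_prev, c, o_prev, W_x, W_h, W_c, W_o):
    """One decoder step of Equation (5): the (n+1)-th iteration's hidden
    state combines the usual recurrence, the coupling vector c, and the
    convolution feature o from the n-th iteration's draft."""
    return np.maximum(0.0, W_x @ x_prev + W_h @ h_prev + W_c @ c + W_o @ o_prev)
```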

Note that in this way, we feed the information from the $n$-th generation iteration into the $(n+1)$-th polishing iteration. For the iterations, we have the following stopping criteria (see the sketch after this list).
• After each iteration, we have a generated subsequent clause; we encode the clause as $h$ using the RNN encoder with the calculation shown in Equation (1). We stop the iteration when the cosine similarity between $h^{(n+1)}$ and $h^{(n)}$ from two successive iterations exceeds a threshold $\Delta$ ($\Delta = 0.5$ in this study).
• Ideally, we would let the algorithm converge by itself, but there will always be some long-tail cases. To be practical, it is necessary to apply a termination schedule when the generator polishes many times. We stop the couplet generator after a fixed number of recompositions. Here we empirically set the threshold at 5 rounds of polishing, which means 6 iterations in all.
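Under these assumptions, the two stopping criteria could be combined as in the following sketch; `generate_draft` and `encode` are hypothetical placeholders for the full NCM decoder and the Equation (1) encoder.

```python
import numpy as np

MAX_POLISH = 5   # at most 5 rounds of polishing, i.e., 6 iterations in all
DELTA = 0.5      # cosine-similarity threshold from the paper

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def polish_until_stable(generate_draft, encode):
    """Iterate generation; stop once successive drafts' encodings agree."""
    draft = generate_draft(None)       # single-pass first iteration
    h_prev = encode(draft)
    for _ in range(MAX_POLISH):
        draft = generate_draft(draft)  # polish using the previous draft
        h = encode(draft)
        if cosine(h, h_prev) > DELTA:  # drafts have stabilized
            break
        h_prev = h
    return draft
```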

5 Experiments and Evaluations

5.1 Experimental Setups

Datasets. A large Chinese couplet corpus is necessary to learn the model for couplet generation. There is, however, no large-sized pure couplet collection available (Jiang and Zhou, 2008). As mentioned, people generally regard Chinese couplets as a reduced form of Chinese poetry, and there are several large Chinese poem datasets publicly available.

Table 1: Detailed information of the datasets. Each couplet pair consists of two clauses.

                 #Pairs   #Character
TANG Poem        26,833    6,358
SONG Poem        11,324    3,629
Couplet Forum    46,959    8,826

Such datasets include Poems of the Tang Dynasty (i.e., Tang Poems) and Poems of the Song Dynasty (i.e., Song Poems). It has become a widely accepted approximation to mine couplets out of existing poems, even though poems are not specifically intended as couplets² (Jiang and Zhou, 2008; Yan et al., 2013; He et al., 2012). We mine such sentence pairs out of the poems and filter out those that do not conform to couplet constraints, following a process similar to that of (Jiang and Zhou, 2008). Moreover, we also crawl couplets from couplet forums where couplet fans discuss, practice, and show couplet works. We performed standard Chinese segmentation into characters.

In all, we collect 85,116 couplets. We randomly choose 2,000 couplets for validation and 1,000 couplets for testing, and use the remaining non-overlapping ones for training. The details are shown in Table 1.

Hyperparameters and Setups. Word embeddings (Mikolov et al., 2013) are a standard apparatus in neural network-based text processing. A word is mapped to a low-dimensional, real-valued vector. This process, known as vectorization, captures some underlying meanings. Given enough data, usage, and context, word embeddings can make highly accurate guesses about the meaning of a particular word. Equivalently, an embedding can be viewed as representing a word as a one-hot vector multiplied by a look-up table (Mikolov et al., 2013). In our model, we first vectorize all words using their embeddings. Here we used 128-dimensional word embeddings, initialized randomly and learned during training. We set the width of the convolution filters to 3. The above parameters were chosen empirically.
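As a small illustration of the one-hot view of embeddings, the sketch below shows that multiplying a one-hot vector by a look-up table reduces to selecting a row; the vocabulary size is an arbitrary assumption, while the 128 dimensions match the setting above.

```python
import numpy as np

vocab_size, d_emb = 10000, 128  # vocabulary size is an assumption
lookup = np.random.default_rng(0).normal(scale=0.1, size=(vocab_size, d_emb))

def embed(char_id):
    """One-hot vector times look-up table, which reduces to row selection."""
    one_hot = np.zeros(vocab_size)
    one_hot[char_id] = 1.0
    vec = one_hot @ lookup
    assert np.allclose(vec, lookup[char_id])  # same as direct indexing
    return vec
```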

Training. The training objective is the cross-entropy error between the predicted character distribution and the actual character distribution in our corpus.

² For instance, in 4-sentence poetry (namely the quatrain, i.e., 绝句 in Chinese), the 3rd and 4th sentences are usually paired; in 8-sentence poetry (namely the regulated verse, i.e., 律诗 in Chinese), the 3rd-4th and 5th-6th sentences generally form pairs which satisfy couplet constraints.


An ℓ2 regularization term is also added to the objective. The model is trained with back-propagation through time, with the sequence length as the time step. The objective is minimized by stochastic gradient descent with shuffled mini-batches (with a mini-batch size of 100). During training, the cross-entropy error of the output is back-propagated through all hidden layers. The initial learning rate was set to 0.8, and a multiplicative learning rate decay was applied. We used the validation set for early stopping. In practice, training converges after a few epochs.
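A minimal sketch of this objective and update rule follows; the ℓ2 coefficient and decay factor are assumptions the paper does not specify, while the initial rate of 0.8 and mini-batch size of 100 are from the text.

```python
import numpy as np

def objective(pred_dists, target_ids, weights, l2=1e-4):
    """Cross-entropy between predicted and actual character distributions,
    plus an l2 regularization term (the coefficient 1e-4 is an assumption)."""
    ce = -sum(np.log(p[t] + 1e-12) for p, t in zip(pred_dists, target_ids))
    reg = l2 * sum(np.sum(W * W) for W in weights)
    return ce + reg

def sgd_step(weights, grads, lr):
    """One SGD update over all parameter matrices."""
    for W, g in zip(weights, grads):
        W -= lr * g  # in-place update

lr, decay = 0.8, 0.95  # multiplicative decay factor 0.95 is an assumption
# After each epoch over shuffled mini-batches of 100 couplets: lr *= decay
```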

5.2 Evaluation Metrics

It is generally difficult to judge the quality of couplets generated by computers. We evaluate results with 3 different evaluation metrics.

Perplexity. For most language generation research, language perplexity is a sanity check. Our first set of experiments involved intrinsic evaluation via the perplexity of the generated couplets. Perplexity is an entropy-based evaluation: the lower the perplexity of the generated couplets, the purer the generations, and the more likely the couplets are to be good. With $m$ denoting the clause length, the perplexity is

$2^{-\frac{1}{m} \sum_{i=1}^{m} \log p(y_i)}$
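For concreteness, a direct implementation of this formula (assuming base-2 logarithms, consistent with the power of 2) could be:

```python
import math

def perplexity(char_probs):
    """Per-character perplexity: 2 ** (-(1/m) * sum_i log2 p(y_i)).

    `char_probs` holds the model's probability for each generated character."""
    m = len(char_probs)
    return 2 ** (-sum(math.log2(p) for p in char_probs) / m)

print(perplexity([0.5, 0.25, 0.125]))  # 2 ** (-(-1 - 2 - 3) / 3) = 4.0
```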

BLEU. The Bilingual Evaluation Understudy (BLEU) score is usually used for machine translation (Papineni et al., 2002): given the reference translation(s) as ground truth, the algorithm evaluates the quality of machine-translated text. We adapt the BLEU evaluation to the couplet generation scenario. For each couplet in the dataset, we generate the computer-authored subsequent clause given the antecedent clause, and compare it with the original subsequent clause written by humans. A concern with this evaluation metric is that the BLEU score can only reflect part of the models' capability: in most cases there is only one ground truth for the generated couplet, but there is actually more than one appropriate way to generate a well-written couplet. The merit of the BLEU evaluation is to examine how closely the computer-generated couplet approximates the human-authored one.
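A character-level adaptation of BLEU could be computed with NLTK as sketched below; the example clauses are invented, and smoothing is added because short clauses often lack matching higher-order n-grams (the paper does not specify its exact BLEU configuration).

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = list("两字值千金")  # invented human-written subsequent clause
candidate = list("两字抵千金")  # invented machine-generated subsequent clause
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"character-level BLEU: {score:.4f}")
```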

Human Evaluation. We also include human judgments from 13 evaluators, all graduate students majoring in Chinese literature. Evaluators are requested to express an opinion on the automatically generated couplets. A clear criterion is necessary for human evaluation. We use the evaluation standards discussed in (Wang, 2002; Jiang and Zhou, 2008; He et al., 2012; Yan et al., 2013; Zhang and Lapata, 2014): "syntactic" and "semantic" satisfaction. For the syntactic side, evaluators consider whether the subsequent clauses conform to the length restriction and word pairing between the two clauses. For the higher-level semantic side, evaluators then consider whether the two clauses are semantically meaningful and coherent. Evaluators assign 0-1 scores for both the syntactic and semantic criteria ('0': no, '1': yes). The evaluation is conducted as a blind review.³

5.3 Algorithms for Comparisons

We implemented several generation methods as baselines. For fairness, we apply the same pre-generation process to all algorithms.

Standard SMT. We adapt the standard phrase-based statistical machine translation method (Koehn et al., 2003) for the couplet task, which regards the antecedent clause as the source language and the subsequent clause as the target language.

Couplet SMT. Based on SMT techniques, a phrase-based SMT system for Chinese couplet generation was proposed in (Jiang and Zhou, 2008), which incorporates extensive couplet-specific character filtering and re-ranking.

LSTM-RNN. We also include a sequence-to-sequence LSTM-RNN (Sutskever et al., 2014). LSTM-RNN is basically an RNN using LSTM units, which consist of memory cells in order to store information for extended periods of time. For generation, we first use an LSTM-RNN to encode the given antecedent sequence into a vector space, and then use another LSTM-RNN to decode the vector into the output sequence.

Since Chinese couplet generation can be viewed as a reduced form of Chinese poetry generation, we also include some approaches designed for poetry generation as baselines.

iPoet. Given the antecedent clause, the iPoet method first retrieves relevant couplets from the corpus, and then summarizes the retrieved couplets into a single clause based on a generative summarization framework (Yan et al., 2013).

³ We understand that acceptability is a gradable concept, especially for less subjective tasks. In our experience, grading "yes"/"no" acceptability is more feasible for the human evaluators to judge (with good agreement). As for couplet evaluation, it might be more difficult for evaluators to say "very acceptable" or "less acceptable". We leave scale-based evaluation as future work.


Table 2: Overall performance comparison against baselines.

Algorithm                             Perplexity   BLEU    Syntactic  Semantic  Overall
Standard SMT (Koehn et al., 2003)        128       21.68     0.563     0.248     0.811
Couplet SMT (Jiang and Zhou, 2008)        97       28.71     0.916     0.503     1.419
LSTM-RNN (Sutskever et al., 2014)         85       24.23     0.648     0.233     0.881
iPoet (Yan et al., 2013)                 143       13.77     0.228     0.435     0.663
Poetry SMT (He et al., 2012)             121       23.11     0.802     0.516     1.318
RNNPG (Zhang and Lapata, 2014)            99       25.83     0.853     0.600     1.453
Neural Couplet Machine (NCM)              68       32.62     0.925     0.631     1.556
(Syntactic, Semantic, and Overall are the human evaluation scores.)


Poetry SMT. He et al. (2012) extend the Couplet SMT method into a poetry-oriented SMT approach, with a different focus and different filtering for applications different from those of Couplet SMT.

RNNPG. The RNN-based poem generator (RNNPG) was proposed to generate poems (Zhang and Lapata, 2014), and we adapt it to the couplet generation scenario. Given the antecedent clause, the subsequent clause is generated through the standard RNN process with contextual convolutions of the given antecedent clause.

Neural Couplet Machine (NCM). We propose this neural generation model particularly for couplets. Basically, we have the RNN-based encoding-decoding process with the attention mechanism and polishing schema. We report the best performance among all NCM variants proposed here.

5.4 Performance

In Table 2 we show the overall performance of our proposed NCM system compared with the strong competing methods described above. For perplexity, BLEU, and human judgments, our system outperforms all baseline models.

The standard SMT method manipulates characters according to the dataset via standard translation, but ignores all couplet characteristics. The Couplet SMT method, established especially for couplet generation, performs much better than the general SMT method since it incorporates several filters based on couplet constraints. For the strongly competitive neural baseline LSTM-RNN, the perplexity performance is boosted, which indicates that neural models have a strong ability for language generation. However, a major drawback is that LSTM-RNN does not explicitly model couplet constraints such as length restrictions for couplet pairs. LSTM-RNN is not really a couplet-driven generation method and might not capture the corresponding patterns between the antecedent clause and subsequent clause well enough to get a high BLEU score.

For the group of algorithms originally proposed for poetry generation, we have the summarization-based method iPoet, the translation-based method Poetry SMT, and the neural network-based method RNNPG. In general, the summarization-based iPoet does not perform well in either perplexity or BLEU evaluation: summarization is not an intuitive way to model and capture the pairwise relationship between the antecedent and subsequent clauses within the couplet pair. Poetry SMT performs better, indicating that the translation-based solution makes more sense for couplet generation than summarization methods. RNNPG is a strong baseline which applies neural network structures, but its insufficiency lies in the lack of couplet-oriented constraints during the generation process. Note that all poetry-oriented methods show worse performance than the Couplet SMT method, indicating that couplet constraints should be specially addressed.

We hence introduce the neural couplet machine based on neural network structures specially designed for couplet generation. We incorporate the attention mechanism and polishing schema into the generation process. The attention mechanism strengthens the coupling characteristics between the antecedent and subsequent clauses, and the polishing schema enables the system to revise and refine the generated couplets, which leads to better performance in the experimental evaluations.

For evaluations, the perplexity scores and BLEU scores show some consistency.


Figure 6: Performance comparison of all variants in the neural couplet machine family.

Besides, we observe that the BLEU scores are quite low for almost all methods. It is not surprising that these methods are unlikely to generate exactly the same couplets as the ground truth, since that is not how the objective function works. BLEU can only partially calibrate the capability of couplet generation, because there are many ways to create couplets which do not look like the ground truth but still make sense to people. Although quite subjective, the human evaluations in Table 2 can to some extent show the potential of all couplet generators.

5.5 Analysis and Discussions

There are two special strategies in the proposed neural model for couplet generation: 1) the attention mechanism and 2) the polishing schema. We hence analyze the separate contributions of the two components across all neural couplet machine variants. We have the NCM-Plain model with no attention or polishing strategy. We incrementally add the attention mechanism as NCM-Attention, and then add the polishing schema as NCM-Full. These three NCM variants correspond to the three models proposed in this paper. Besides, for a complete comparison, we also include the plain NCM integrated with the polishing schema but without the attention mechanism, namely NCM-Polishing.

The results are shown in Figure 6. We can see that NCM-Plain shows the weakest performance, with no strategy tailored for couplet generation. An interesting phenomenon is that NCM-Attention performs better in BLEU score, while NCM-Polishing performs better in terms of perplexity. We conclude that the attention mechanism captures the pairing patterns between the two clauses, and the polishing schema enables better wordings of semantic coherence in the couplet after several revisions.

Figure 7: The distribution of stopping iteration counts for all test data. Note that 6 iterations of generation means 5 rounds of polishing.

The two strategies address different concerns for couplet generation; hence NCM-Full performs best.

We also take a closer look at the polishing schema proposed in this paper, which enables multi-pass generation. The couplet generator can generate a subsequent clause utilizing additional information from the subsequent clause generated in the previous iteration. This is a novel insight compared with previous methods. The effect and benefits of the polishing schema are demonstrated in Figure 6. We also examine the stopping criteria, shown in Figure 7. In general, most of the polishing processes stop after 2-3 iterations.

6 Conclusions

Chinese couplet generation is a difficult task in the field of natural language generation. We propose a novel neural couplet machine based on neural network structures to tackle this problem. Given an antecedent clause, we generate a subsequent clause to create a couplet pair using a sequential generation process. The two innovative insights are that 1) we adapt the attention mechanism to the couplet coupling constraint, and 2) we propose a novel polishing schema to refine the generated couplets using additional information.

We compare our approach with several baselines. We apply perplexity and BLEU, as well as human judgments, to evaluate the performance of couplet generation. We demonstrate that the neural couplet machine can generate rather good couplets and outperforms the baselines. Besides, both the attention mechanism and the polishing schema contribute to the better performance of the proposed approach.


Acknowledgments

We thank all the anonymous reviewers for their valuable and constructive comments. This paper is partially supported by the National Natural Science Foundation of China (NSFC Grant Numbers 61272343, 61472006), the Doctoral Program of Higher Education of China (Grant No. 20130001110032), as well as the National Basic Research Program (973 Program No. 2014CB340405).

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations.

Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.

Erica Greene, Tugba Bodrumlu, and Kevin Knight. 2010. Automatic analysis of rhythmic poetry with applications to generation and translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, pages 524–533.

Jing He, Ming Zhou, and Long Jiang. 2012. Generating Chinese classical poems with statistical machine translation models. In Twenty-Sixth AAAI Conference on Artificial Intelligence, pages 1650–1656.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1684–1692.

Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Long Jiang and Ming Zhou. 2008. Generating Chinese couplets using a statistical MT approach. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING '08, pages 377–384.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 48–54. Association for Computational Linguistics.

R. Manurung, G. Ritchie, and H. Thompson. 2011. Using genetic algorithms to create meaningful poetic text. Journal of Experimental & Theoretical Artificial Intelligence, 24(1):43–64.

H. Manurung. 2004. An evolutionary algorithm approach to poetry generation. University of Edinburgh, College of Science and Engineering, School of Informatics.

Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH, volume 2, page 3.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814.

Yael Netzer, David Gabay, Yoav Goldberg, and Michael Elhadad. 2009. Gaiku: generating haiku with word associations norms. In Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, CALC '09, pages 32–39.

H. Oliveira. 2009. Automatic generation of poetry: an overview. Universidade de Coimbra.

H.G. Oliveira. 2012. PoeTryMe: a versatile platform for poetry generation. Computational Creativity, Concept Invention, and General Intelligence, 1:21.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP '15, pages 1577–1586.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.

N. Tosa, H. Obara, and M. Minoh. 2008. Hitch haiku: An interactive supporting system for composing haiku poem. Entertainment Computing - ICEC 2008, pages 209–216.

Li Wang. 2002. A summary of rhyming constraints of Chinese poems. Beijing Press.

X. Wu, N. Tosa, and R. Nakatsu. 2009. New hitch haiku: An interactive renku poem composition supporting tool applied for sightseeing navigation system. Entertainment Computing - ICEC 2009, pages 191–196.

Rui Yan, Han Jiang, Mirella Lapata, Shou-De Lin, Xueqiang Lv, and Xiaoming Li. 2013. i, Poet: Automatic Chinese poetry composition through a generative summarization framework under constrained optimization. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2197–2203.

Rui Yan. 2016. i, Poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI '16.

Xingxing Zhang and Mirella Lapata. 2014. Chinese poetry generation with recurrent neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 670–680.

Kai-Xu Zhang and Mao-Song Sun. 2009. A Chinese couplet generation model based on statistics and rules. Journal of Chinese Information Processing, 1:017.

Cheng-Le Zhou, Wei You, and Xiaojun Ding. 2010. Genetic algorithm and its implementation of automatic generation of Chinese Songci. Journal of Software, 21(3):427–437.