Page 1: Zhongwen Youxi He - University of Manchesterstudentnet.cs.manchester.ac.uk/resources/library/thesis_abstracts/... · Zhongwen Youxi He A dissertation submitted to The University of

Zhongwen Youxi He

A dissertation submitted to The University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

2010

By

Melville McDonald

School of Computer Science

Table of Contents

Table of Figures
List of Abbreviations
Abstract
Declaration
Copyright
Acknowledgements
Chapter 1: Introduction
1.2 Dissertation Overview
Chapter 2: Background
2.1 The Chinese System
2.1.1 Overview
2.1.2 Strokes
2.1.3 Radicals
2.1.4 Components
2.1.5 Conclusion
2.2 The Computerised System
2.2.1 Database Technology
2.2.2 The Editor
2.2.3 Supplementary tools
2.2.4 Reflection
Chapter Summary
Chapter 3: Research Methods and Design Considerations
3.1 Project Overview
3.2 Project Objectives
3.3 Project Plan
Chapter Summary
Chapter 4: Design
4.1 Database
4.1.1 Division of Work
4.1.2 Entity Relationship Model
4.2 The Editor
4.2.1 The Model
4.2.2 The Controller
4.2.3 The View
4.2.4 The AboutEditor and TextParser
4.2.4 The SVG Panel
Chapter Summary
Chapter 5: Implementation and Testing
5.1 The Tools
5.2 The Editor
5.2.1 Radicals Table Test
5.2.2 Repository Manager Test
5.2.3 View Controller Test
5.3 The Visual Elements
5.3.1 Adding a New Record
5.3.2 Updating a record
5.3.3 Removing a Record
5.3.4 The SVG Panel
5.3.5 The AboutEditor and TextParser
5.3.6 Discovered Issues
5.4 The Database
一 yī: One
丿 piě: Hook or left-falling stroke
八 bā: Eight
冫 bīng: Ice
卜 bǔ: To Divine
勹 bāo: Wrap
几 jī: Table
门 mén: Door
人 rén: Human being
工 gōng: Work
Chapter Summary
Chapter 6: Evaluation
6.1 The Editor
6.2 The Database
Chapter Summary
Chapter 7: Conclusions and Future Work
Chapter Summary
References
Appendix
TestBase
RadicalTableTest
RepositoryManagerTest
ViewControllerTest

Word count: 13,377

Table of Figures

Figure 1: English Alphabet and Chinese character nǐ (you)
Figure 2: Stroke order [3]
Figure 3: Top to bottom
Figure 4: Left to right
Figure 5: Horizontal strokes before vertical strokes
Figure 6: Centre strokes before left and right strokes
Figure 7: Outside then inside
Figure 8: Fill up before closing
Figure 9: The Eight Trigrams [5]
Figure 10: Example of component breakdown [3]
Figure 11: Example of an entity relationship for Chinese characters and components (aggregation)
Figure 12: Database entity relationship diagram
Figure 13: Example of the TICCC
Figure 14: Example of the TICCC and database fields
Figure 15: Initial Model View Controller design pattern
Figure 16: RadicalTable class
Figure 17: RepositoryManager class
Figure 18: ViewController class
Figure 19: Editor Framework
Figure 20: MainView and TabView class
Figure 21: RadicalView input form
Figure 22: RadicalViewer class diagram
Figure 23: The AboutEditor
Figure 24: AboutEditor class diagram
Figure 25: Basic flow of control for AboutEditor
Figure 26: The TextParser class diagram
Figure 27: Radical Table Test passed
Figure 28: Repository Manager Test passed
Figure 29: View Controller Test passed
Figure 30: Main View
Figure 31: RadicalViewer input screen
Figure 32: Adding a new record
Figure 33: Save confirmation
Figure 34: Save confirmation code
Figure 35: New record added
Figure 36: No radical to update
Figure 37: Remove from database confirmation
Figure 38: Setup of the SVG Canvas
Figure 39: The SVG Panel and SVG Field
Figure 40: AboutEditor rawEditor pane input
Figure 41: The parsed text

List of Abbreviations

TICCC – Table of Indexing Chinese Character Components [12]
DBMS – Database Management System
JDBC – Java Database Connectivity
JPA – Java Persistence API
SVG – Scalable Vector Graphics
XML – Extensible Markup Language
HTML – HyperText Markup Language
SQL – Structured Query Language
GB – Gigabyte
TB – Terabyte
API – Application Programming Interface
SAX – Simple API for XML
VB.NET – Visual Basic .NET
URL – Uniform Resource Locator
MVC – Model-View-Controller Architecture

Abstract

As an initial stage of the Zhongwen Youxi He project, this dissertation looks into the foundations of building a tool that allows a native English speaker to learn about Chinese characters. The tool will be built around a database that stores the characters together with their main components, phonetics and radicals, following the official Chinese indexing and classification system found in the literature, in the Chinese dictionary and in the “Table of Indexing Chinese Character Components” (TICCC). The project will also investigate an architecture for storing and indexing the radicals, and ways to infer semantic links between the radicals, the characters and the components.

Declaration

No portion of the work referred to in the dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

i. Copyright in text of this dissertation rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author. Details may be obtained from the appropriate Graduate Office. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

ii. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

iii. Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the School of Computer Science.

Acknowledgements

I would like to thank my supervisor, Dr Richard Banach, who masterminded this project. His guidance not only helped me to understand the scope of this project and its potential benefits but also helped me to overcome a number of difficulties throughout its course.

I would also like to thank my family, my mother, father and brothers, who have supported and encouraged me in my pursuit of further education. None of this would have been possible without their help.

Chapter 1: Introduction

When people attempt to learn another language, they must overcome the challenge of learning new words, grammar systems and completely different alphabets. The Chinese and English writing systems differ drastically, from the direction of reading to the composition of words. Learning the Chinese system is a substantial task for an English speaker, as it originated from a completely different cultural perspective. It is this perspective, together with a person's native language and educational system, that influences their learning processes.

The English and Chinese writing systems are fundamentally different: English employs an alphabet, whereas Chinese uses a logographic system. In an alphabetic system, letters represent phonemes, or sounds, and usually have no meaning individually. In the case of English, an alphabet of 26 meaningless letters can be combined in various legal permutations to create multisyllabic units of sound and meaning (words). A reader can decompose the words of this system to identify their letters, or even spell them according to how they sound. The Chinese system, by contrast, is composed of core components which combine to create monosyllabic words. These words can in turn be combined to form new words, though understanding them requires some knowledge or shared basic concepts. Some Chinese characters are pictograms that can be identified by their similarity to real-world objects, but others are not.

Figure 1 shows the entire English alphabet of 26 letters. Once memorised, these letters can be combined to form the word “you”, whereas the Chinese character nǐ (你) is a logogram: the reader must already know the character to recognise its meaning.

Chinese also uses tones. In English, pitch conveys emotional or grammatical context, such as the rising inflection of a question. Compare “You have one.” with “You have one?” The first is a statement, possibly a reply, usually given with a flat tone; the intonation of the second, with the pitch rising near the end, marks it as a question. (Incidentally, the two together could form a very short conversation.) In Chinese, by contrast, a word's meaning depends not only on the phonemes that give the word its sound or pronunciation but also on the tone associated with it. As a result, the written language has formally documented tones and tonal signs.
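Because tones change meaning, any tool built on the character data has to store and display the tonal signs. The following is a minimal sketch, assuming pinyin is typed with a tone number after the syllable (for example ni3) and that only lowercase input is handled; the class name PinyinTones is hypothetical and not part of the project. It applies the usual placement rule (mark a or e if present, otherwise the o of ou, otherwise the last vowel) and uses Unicode normalisation to compose the letter and the tone mark into one character.

```java
import java.text.Normalizer;

/** Hypothetical helper: turns numbered pinyin such as "ni3" into tone-marked "nǐ". */
public class PinyinTones {

    // Combining diacritics for tones 1 to 4 (index 0 is unused).
    private static final char[] TONE_MARKS = {0, '\u0304', '\u0301', '\u030C', '\u0300'};

    public static String toToneMarks(String numbered) {
        if (numbered.isEmpty()) {
            return numbered;
        }
        int last = numbered.length() - 1;
        char toneChar = numbered.charAt(last);
        if (!Character.isDigit(toneChar)) {
            return numbered.replace('v', '\u00FC'); // no tone number: just map v to u-umlaut
        }
        int tone = toneChar - '0';
        String syllable = numbered.substring(0, last).replace('v', '\u00FC');
        if (tone < 1 || tone > 4) {
            return syllable; // neutral tone (0 or 5) carries no mark
        }
        int pos = markPosition(syllable);
        if (pos < 0) {
            return syllable;
        }
        String marked = syllable.substring(0, pos + 1) + TONE_MARKS[tone]
                + syllable.substring(pos + 1);
        // NFC composes the base letter and the combining mark into one code point.
        return Normalizer.normalize(marked, Normalizer.Form.NFC);
    }

    // Placement rule: 'a' or 'e' first, then the 'o' of "ou", otherwise the last vowel.
    private static int markPosition(String s) {
        int a = s.indexOf('a');
        if (a >= 0) return a;
        int e = s.indexOf('e');
        if (e >= 0) return e;
        int ou = s.indexOf("ou");
        if (ou >= 0) return ou;
        for (int i = s.length() - 1; i >= 0; i--) {
            if ("iou\u00FC".indexOf(s.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"ni3", "yi1", "bing1", "pie3", "lv4"}) {
            System.out.println(p + " -> " + toToneMarks(p));
        }
    }
}
```

Storing the numbered form and converting only for display is one possible design choice; it keeps the database searchable with plain ASCII input.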

These differences also extend to the indexing systems of the two languages: many Chinese dictionaries are indexed by the radical system, whereas English dictionaries are indexed alphabetically by initial letter. The scarcity of reliable information makes it more difficult for a native English speaker to learn about the Chinese system.

Figure 1: English Alphabet and Chinese character nǐ (you)

Recently, however, the Chinese government published an official Table of Indexing Chinese Character Components (TICCC). The main aim of this stage of the project is to create the foundations of an indexing system for Chinese characters, which can later be extended into a tool that helps English speakers learn about Chinese characters. This system will be built using the officially published character data. In addition, the project aims to investigate ways of inferring semantic links between the components and the radicals.

1.2 Dissertation Overview

The structure of the dissertation roughly follows the chronological development of the project:

Chapter 2 introduces the background to the project. It discusses the Chinese writing system and the history and development of the Chinese character radicals, together with some of the research into technologies that may be suitable for the current project.

Chapter 3 discusses other project considerations with regard to the main project objectives, including my contribution and the basic project plan.

Chapter 4 describes my contribution to the design process for the database and the editor.

Chapter 5 outlines the implementation and testing of the project.

Chapter 6 evaluates the success of the project and suggests possible improvements.

Chapter 7 concludes the project and suggests possibilities for the future of the Zhongwen Youxi He project.

Chapter 2: Background

This chapter explores the fundamentals of the Chinese writing system, as well as the history and development of the Chinese character radical indexing system. It also outlines some of the research conducted into tools that could be used to develop this project.

2.1 The Chinese System

2.1.1 Overview

The modern Chinese writing system uses Han characters (Hànzì). This logographic system contains over 50,000 characters. The characters evolved from pictograms and hieroglyphs into the more abstract ideographs and the characters known today, and at the same time phonetic elements began to be included in the character structure [3]. The modern characters, however, come from a mixture of new and old sources: they are composed mostly of pictograms, ideograms, phonetic loans and some phonetic-semantic compounds, and there are also some characters whose meanings arise from regional differences. Characters can be divided into two very broad categories: simple and compound. Simple characters account for about 4% of all characters and, unlike compound characters, cannot be divided further [3].

In 1956 the government of the People's Republic of China introduced the first draft of simplified characters. This process was intended to simplify the traditional Chinese characters in an effort to increase literacy nationally. According to the government, a literate person should know 2,000 characters. Reproducing the characters is nevertheless a difficult task, as even the simplified characters contain diverse shapes. It has been claimed that at least five years of formal schooling are needed to achieve literacy [3], though in practice at least six years are needed. Most native English speakers would probably not want to undergo formal schooling, or to wait this long before feeling confident enough with the language to begin using it.

2.1.2 Strokes

The main building blocks of hànzì characters are strokes, radicals and components. The smallest structural unit is the stroke (playing a role similar to that of letters in the English alphabetic system), which represents a single movement of the brush or pen on the page. This system of strokes is, however, more formalised: a specific set of stroke techniques is used recurrently to create all Chinese characters [3]. These strokes can be divided into eight main categories: horizontal (一), vertical (丨), left-downward (丿), right-downward, dot (、), hook (亅), turning (乛, 乚, 乙) and rising (丶). There are 30 supplementary strokes [3], but these are only variants of the main categories.

Figure 2: Stroke order [3]

There is an order to the strokes for writing Chinese characters [2]:

Figure 3: Top to bottom

Figure 4: Left to right

Figure 5: Horizontal strokes before vertical strokes

Figure 6: Centre strokes before left and right strokes

Figure 7: Outside then inside

Figure 8: Fill up before closing

It is however the radicals and components that form the logical units of

the hànzì system.
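The eight stroke categories and the stroke-order rules illustrated in the figures above suggest one simple way to represent a character at the stroke level: as an ordered list of basic strokes, from which the stroke count follows directly. The sketch below is illustrative only; the class StrokeSequences, the enum names and the example sequences are assumptions, not part of the project's database design.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class StrokeSequences {

    /** The eight basic stroke categories; the 30 supplementary strokes would be variants of these. */
    enum Stroke {
        HORIZONTAL, VERTICAL, LEFT_DOWNWARD, RIGHT_DOWNWARD,
        DOT, HOOK, TURNING, RISING
    }

    /** Counts how often each basic category occurs in a character's stroke sequence. */
    static Map<Stroke, Integer> histogram(List<Stroke> sequence) {
        Map<Stroke, Integer> counts = new EnumMap<>(Stroke.class);
        for (Stroke s : sequence) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 十 follows "horizontal strokes before vertical strokes".
        List<Stroke> shi = List.of(Stroke.HORIZONTAL, Stroke.VERTICAL);
        // 三 follows "top to bottom": three horizontal strokes.
        List<Stroke> san = List.of(Stroke.HORIZONTAL, Stroke.HORIZONTAL, Stroke.HORIZONTAL);
        // 人 is a left-downward stroke followed by a right-downward stroke.
        List<Stroke> ren = List.of(Stroke.LEFT_DOWNWARD, Stroke.RIGHT_DOWNWARD);

        System.out.println("十 stroke count: " + shi.size());
        System.out.println("三 histogram: " + histogram(san));
        System.out.println("人 first stroke: " + ren.get(0));
    }
}
```

Keeping the sequence ordered, rather than storing only a count, preserves the stroke-order information that learners need.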

2.1.3 Radicals

Radicals are the smallest meaningful units in the hànzì writing system. They are used both as independent simple characters and as parts of more complex characters. In modern Chinese dictionaries, radicals are used as section headers (bùshǒu), with characters indexed according to the radical they most closely match or contain. One of the biggest issues regarding radicals is that there is no formal, exact way to describe what they are, since they can be used in so many ways. As a result, there is some disagreement as to their exact role and how they should be used.
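Using radicals as section headers amounts to grouping every character under the radical it is filed under. A minimal sketch of that grouping follows; the class RadicalSections and its Entry record are hypothetical, and the filing of 林, 村 under 木 and 叫, 吃 under 口 follows the conventional dictionary assignment. Sections are kept in first-seen order here, whereas a real index would order them by radical number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RadicalSections {

    /** A character and the radical (bùshǒu) it is filed under. */
    record Entry(String character, String radical) {}

    /** Groups characters under their radical, like the section headers of a dictionary. */
    static Map<String, List<String>> bySection(List<Entry> entries) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        for (Entry e : entries) {
            sections.computeIfAbsent(e.radical(), r -> new ArrayList<>()).add(e.character());
        }
        return sections;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("林", "木"), new Entry("叫", "口"),
                new Entry("村", "木"), new Entry("吃", "口"),
                new Entry("木", "木")); // a radical can also stand alone as a character
        System.out.println(bySection(entries)); // {木=[林, 村, 木], 口=[叫, 吃]}
    }
}
```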

Many Chinese dictionaries use radicals as section headers. This system is said to have been introduced with the Shuōwén Jiézì by Xǔ Shèn around the 2nd century AD, during the Han Dynasty. Xǔ distinguished between two types of characters: wén 文 (pictograms) and zì 字 (characters). The dictionary contained 9,353 characters “as distinct entries and 1163 in variant form” [4], organised by their visual components under 540 headings [4]. This method, though radical in thought at the time, has since become the most widely used system of organisation, as it made the process of locating characters more methodical and convenient [4].

Before this, dictionaries were organised differently. The earliest known major character dictionary is the Erya, which is argued to have been created between the 8th and 2nd centuries BC [4]. It was not created as a dictionary but rather as an encyclopaedia and literary reference; however, “It was the first work to collect arrange and define words in a systematic fashion” [4]. It spans 20 chapters with over 2,000 entries, organised into categories such as common terms and kinship terms [4]. The collection was nevertheless seen as too difficult to read, and scholars felt that it was less a reference for consultation than a study text. It was this that led to the creation of the Shuōwén Jiézì.

It may be argued, however, that the idea of organising dictionaries originated even earlier. Chinese lore claims that Chinese characters were created by a Great Emperor who came across the idea by observing nature and how each object seemed to fit into a category. He noted that marks left by animals, such as claw prints, could serve as a tool for lasting communication, and decided to create characters that reflected this. The characters were then placed into eight categories called the Eight Trigrams. Though these categories were broad, the scheme had some practical value, since any text indexed by it would have been searchable to some degree.

Figure 9: The Eight Trigrams [5]

The most prominent use of the radical system, however, was the Kāngxī Dictionary (Kāngxī zìdiǎn) of the 18th century AD. Although it contained nearly 50,000 characters, considerably more than the Shuōwén Jiézì, the dictionary reduced the number of radicals to 214. It indexes characters by radical and then by number of strokes [6], and it contains information about variant forms and pronunciation. Though there have been some obvious changes over the centuries, these 214 radicals are still the basis for all modern radical dictionaries.

Today numerous radical indexing systems are in use, with different numbers of radicals and sometimes with secondary radicals indexed by stroke count. With more than 80,000 characters in the Chinese language, there are many variations [6], such as the TICCC, which lists 201 main radicals.
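Indexing by radical and then by stroke count is a two-level sort key. The sketch below assumes each entry records its Kāngxī radical number (1 to 214) and the number of strokes remaining once the radical is removed; the class KangxiOrder and the IndexEntry record are hypothetical. The sample data uses radical 30 (口) and radical 75 (木) from the standard Kāngxī numbering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KangxiOrder {

    /** Hypothetical index entry: radical number plus strokes outside the radical. */
    record IndexEntry(String character, int radicalNumber, int residualStrokes) {}

    /** Dictionary order: by radical number first, then by residual stroke count. */
    static final Comparator<IndexEntry> DICTIONARY_ORDER =
            Comparator.comparingInt(IndexEntry::radicalNumber)
                      .thenComparingInt(IndexEntry::residualStrokes);

    static List<String> sort(List<IndexEntry> entries) {
        List<IndexEntry> copy = new ArrayList<>(entries);
        copy.sort(DICTIONARY_ORDER);
        List<String> result = new ArrayList<>();
        for (IndexEntry e : copy) {
            result.add(e.character());
        }
        return result;
    }

    public static void main(String[] args) {
        List<IndexEntry> entries = List.of(
                new IndexEntry("林", 75, 4),  // 木 (radical 75) + 4 strokes
                new IndexEntry("本", 75, 1),  // 木 + 1 stroke
                new IndexEntry("叫", 30, 2),  // 口 (radical 30) + 2 strokes
                new IndexEntry("村", 75, 3)); // 木 + 3 strokes
        System.out.println(sort(entries)); // [叫, 本, 村, 林]
    }
}
```

A system with a different radical set, such as the 201 main radicals of the TICCC, would only need a different radical numbering; the comparator itself would not change.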

There is some debate as to the extent of the connection a radical has to the characters under it, and whether it has a semantic relationship with all of the characters filed under it. One argument holds that the term radical implies semantic links, since it derives from the Latin word for “root” [7], and words in languages with Latin-derived vocabulary, such as English, can be broken up into “root and termination” [1]. Although this does not translate exactly into the Chinese system, the radical should still be considered “the meaning part” [1].

“采 cǎi 'to pick, pluck' is an associative compound comprising two elements or components, a hand 爫 (zhǎo or zhuǎ) picking items from a tree 木 (mù); that is, it is originally a two-part graph” [7].

However, the phonetic elements of characters also need to be taken into consideration, and there is disagreement over how to categorise radicals and phonetics in semantic-phonetic compounds, which have come into increasing use.
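One way to keep the semantic and phonetic roles apart is to store them as separate fields of a compound. The sketch below is an assumption about how such a record might look, not the project's schema; the class PhonoSemantic and its fields are hypothetical. The examples 妈 mā (女 woman + 马 mǎ) and 河 hé (氵 water + 可 kě) are standard semantic-phonetic compounds, and the second shows why the phonetic element is only an approximate guide to pronunciation.

```java
import java.text.Normalizer;

public class PhonoSemantic {

    /** Hypothetical record: a compound split into a meaning part and a sound part. */
    record Compound(String character, String pinyin,
                    String semantic, String semanticGloss,
                    String phonetic, String phoneticPinyin) {

        /** True when the character and its phonetic share a syllable, ignoring tone. */
        boolean phoneticMatchesSyllable() {
            return stripTone(pinyin).equals(stripTone(phoneticPinyin));
        }
    }

    /**
     * Removes tone marks by decomposing (NFD) and dropping combining marks.
     * This also turns u-umlaut into plain u, which is acceptable for a rough comparison.
     */
    static String stripTone(String syllable) {
        return Normalizer.normalize(syllable, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        Compound ma = new Compound("妈", "mā", "女", "woman", "马", "mǎ");
        Compound he = new Compound("河", "hé", "氵", "water", "可", "kě");
        System.out.println(ma.character() + ": " + ma.phoneticMatchesSyllable()); // 妈: true
        System.out.println(he.character() + ": " + he.phoneticMatchesSyllable()); // 河: false
    }
}
```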

2.1.4 Components

Components seem to have arisen out of a need to “reconstruct characters into more manoeuvrable units” [3] for the modern age of computing. Characters are divided into logical units based on their shape and makeup. These components are based purely on graphic qualities, since for a computer system the semantics or stroke composition of a character would be of little consequence. The Information Processing Standard Components for the GB 13000.1 Character Set has 560 basic components [3], and the Specification of Common Modern Chinese Character Components and Component Names lists 514. There are some concerns about how the characters are represented and whether all the necessary characters have been included; however, these standards will probably be reviewed or replaced as the character sets change and the technology for representing them improves.

Figure 10: Example of component breakdown [3]
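A component breakdown like the one in Figure 10 can be stored as a map from each character to its graphic components, which makes it easy to ask which characters contain a given component, or which components two characters share. Shared components are only a crude starting point for the semantic links the project aims to infer. The class ComponentIndex and its methods are hypothetical; the decomposition of 采 into 爫 and 木 comes from the quotation in Section 2.1.3, and the other entries are common textbook decompositions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ComponentIndex {

    // character -> its graphic components, in writing order
    private final Map<String, List<String>> decompositions = new LinkedHashMap<>();

    void add(String character, String... components) {
        decompositions.put(character, List.of(components));
    }

    /** Every character whose decomposition contains the given component. */
    List<String> charactersContaining(String component) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : decompositions.entrySet()) {
            if (e.getValue().contains(component)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Components two characters share, without duplicates. */
    List<String> sharedComponents(String a, String b) {
        Set<String> shared = new LinkedHashSet<>(decompositions.getOrDefault(a, List.of()));
        shared.retainAll(decompositions.getOrDefault(b, List.of()));
        return new ArrayList<>(shared);
    }

    static ComponentIndex sample() {
        ComponentIndex index = new ComponentIndex();
        index.add("采", "爫", "木"); // a hand picking from a tree
        index.add("林", "木", "木");
        index.add("休", "亻", "木");
        index.add("明", "日", "月");
        return index;
    }

    public static void main(String[] args) {
        ComponentIndex index = sample();
        System.out.println(index.charactersContaining("木"));    // [采, 林, 休]
        System.out.println(index.sharedComponents("采", "休"));  // [木]
        System.out.println(index.sharedComponents("采", "明"));  // []
    }
}
```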

2.1.5 Conclusion

The system should adopt the most logical indexing scheme supported by the available documentation and index the radicals accordingly. The most obvious solution is to follow the TICCC and the most recently published character and component data, as these are the most accurate and up to date, and the current organisation of the index will reflect this.

2.2 The Computerised System

The system will comprise two broad parts: a database and an editor. The database will store and index characters together with radical, phonetic and component data, as well as semantics, and the editor will provide the means of input to and retrieval from the database. The software tools that could support this are investigated below so that the most suitable can be chosen.


2.2.1 Database Technology

The database will be the main repository for storage and retrieval of information for the entire project. It is the most fundamental and one of the most critical areas of the system; as a result, the platform used needs to suit both current and future development, and the system architecture will need to be robust. The expected operating systems are Linux and Windows, and a number of technologies work on either. Databases use query languages to allow manipulation of their data; the current standard for relational databases is SQL, which is implemented in a number of database systems with differing features. The differences are especially marked in the free versions, which often limit their functionality or speed in some way. Most of the implementations reviewed here conform to the SQL-92 standard [8].

DB2: This is a relational database management system (RDBMS) created by IBM, which runs on Linux and Windows. The free version, IBM DB2 Express-C¹, can be installed for developing database systems for a small number of users; it is limited in the number of processor cores it can use and to 2GB of memory. It has both a command-line and a GUI interface.

Informix: Another IBM-owned product, this is similar to DB2 in many ways, except that it is only available for 32-bit operating systems and is limited to 4GB of memory.

H2: This is an open-source RDBMS written in Java. Using JaQu² (Java Query), it can be integrated directly into Java applications, and it boasts an impressive number of features compared with some other database platforms, including in-memory databases, which allow non-persistent data to be manipulated. This is useful for testing and prototyping, and may be an interesting feature for this new system. The platform can run in embedded mode (within the same JVM as the application), server mode (as a database server for client applications) and mixed mode (a combination of the two, in which the first application to connect runs the database embedded while also starting a server for other clients), which gives it some flexibility. There also appear to be a number of tutorials for getting started.

1 About DB2 Express-C; IBM; http://www-01.ibm.com/software/data/db2/express/about.html; Last Accessed 10/05/10.

2 H2 Database Engine; H2; http://www.h2database.com/html/jaqu.html; Last Accessed 10/05/10.

MySQL: Described as “the most popular open source database software”³, MySQL is available for both Windows and Linux. Although it does not come with an integrated GUI, third-party products and MySQL Workbench can be downloaded to provide this functionality. Most programming languages can interface with it via the Open Database Connectivity (ODBC) API, or through JDBC in the case of Java. Apart from its wide acceptance, one of the most interesting features of this system is its support for multiple storage engines [11], which allows different engine technologies to implement individual tables within one database, e.g. InnoDB for an Employees table and MyISAM for a Payroll table.

Microsoft SQL Server Express Edition: This free edition of Microsoft's SQL Server⁴ limits the size of each database to 4GB and the hardware to a single CPU with 1GB of RAM. It provides native support for XML data, which can be manipulated using XQuery, and an easily accessible back end for applications written with the Microsoft .NET Framework.

3 About MySQL; Oracle; http://www.mysql.com/about/; Last Accessed 10/05/10.

4 Microsoft SQL Server 2008 Express; Microsoft Corporation; http://www.microsoft.com/sqlserver/2008/en/us/express.aspx; Last Accessed 10/05/10.


PostgreSQL: This is released under the PostgreSQL licence, which in effect makes it free to use and distribute. It is widely used, is available on a range of platforms including Windows and Linux, and can be used for enterprise-sized systems⁵, with an operational limit in excess of 4TB of data.

Oracle: The free version of Oracle's database limits the size of the database to 4GB and the hardware to a single processor with 1GB of RAM. It is compatible with both Windows and Linux and supports a number of programming languages, but only in a 32-bit environment.

2.2.2 The Editor

The databases above can be manipulated through command-line and graphical interfaces. For the purposes of this system, the tools will need to store data on Chinese characters, components, phonetics and radicals, and in many cases the tools supplied by the DBMS developers may not be flexible enough to handle this character data fully. As a result, an editor will be created to allow the character data in the database to be created, manipulated and viewed. This tool could be written in a number of languages, such as:

Java: This would be compatible with many of the databases above through JDBC, and can be fully integrated with H2. As a widely accepted object-oriented language, it is mature and has many useful features in its API, such as the DOM Level 3 and SAX interfaces for processing XML. It is also widely available in the university and runs on a number of environments such as Windows and Linux, meaning that an application should need little, if any, change between platforms.

5 PostgreSQL; PostgreSQL Global Development Group; http://www.postgresql.org/about/; Last Accessed 10/05/10.


Microsoft .NET Framework: This Microsoft family of languages, including VB.NET and C#, can interface with databases and web applications through a framework of base libraries. The framework is in theory cross-platform, although this is not as straightforward as with a Java implementation. An editor application created with this framework would, however, be very easy to integrate with an MS SQL Server database.

Web Ontology Language⁶: Otherwise known as OWL, this is a family of languages, with supporting tools, which allows data to be serialised and semantic conclusions to be drawn from it. The technology uses axioms and assertions: axioms (rules) categorise the data into related groups based on the assertions made about it. Web-based technologies and standards such as RDF/XML are used to encode meaning in data, from which further knowledge can later be inferred.

2.2.3 Supplementary tools

In some cases the editor language may need an interface to access the database, and there are a number of methods to enable this connectivity. The better known of these include:

JDBC⁷: The JDBC API provides connectivity from Java applications to different types of database, including relational databases, allowing SQL-based data access. It is widely used and provides independence from any particular database implementation, and hence flexibility.

6 OWL Web Ontology Language Guide; World Wide Web Consortium; http://www.w3.org/TR/owl-guide/; Last Accessed 10/05/10.

7 Java SE Technologies – Database; Oracle Corporation; http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136101.html


JPA⁸: The Java Persistence API (JPA) is a specification for Object-Relational Mapping (ORM) which provides an interface through which an ORM implementation interacts with a database. The specification allows the data source properties of a database to be abstracted into a configuration file (persistence.xml) which, when packaged as part of an application, makes referencing and configuration a more centralised process.

The JPA specification maps database tables to Java classes called entities, which can then be manipulated as Java objects. This allows a Java application to update the database by interacting with these entities. The specification requires an implementation to manage the interface between the Java application, the entities and the database. Alternative implementations include Hibernate⁹, iBatis¹⁰ and EclipseLink¹¹. The former two are compatible with both Java and Microsoft .NET Framework languages, with Hibernate running on the Java Virtual Machine for platform independence. EclipseLink is based on Oracle's TopLink JPA implementation and is made for Java.

8 The Java Persistence API - A Simpler Programming Model for Entity Persistence; Oracle Corporation; http://www.oracle.com/technetwork/articles/javaee/jpa-137156.html

9 Hibernate; JBoss Community; http://www.hibernate.org/; Last Accessed 11/05/10.

10 iBatis Homepage; Apache Foundation; http://ibatis.apache.org/; Last Accessed 11/05/10.

11 Introducing EclipseLink; DZone, Inc; http://eclipse.dzone.com/articles/introducing-eclipselink

12 About SVG; SVG Working Group; http://www.w3.org/Graphics/SVG/About; Last Accessed 11/05/10.

Graphics: A graphical element in the editor would allow characters to be input and presented. Scalable Vector Graphics (SVG), an XML-based drawing platform¹², allows shapes to be described in XML and drawn through an API for scripting languages such as ECMAScript. The standard has three types of graphic object: vector graphics, raster graphics and text. It is a scalable standard which can be used in both mobile and desktop systems and is supported by most web browsers. Alternatives include PostScript, which can describe shapes in a similar way, though SVG has much wider support for fonts using Unicode character encoding. SVG also supports multidirectional text, allowing characters to flow from right to left and from top to bottom, which makes it suitable for the current project. The files created can also be compressed, which may be necessary for space efficiency in the database. For Java, SVG support is available through the Batik¹³ library, which can be used within a Java application to render SVG documents on a Swing-derived SVG canvas.
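The sketch below shows one way to store a radical as SVG. It is illustrative only and uses only the Java standard library. It builds a minimal SVG 1.1 document as a String and draws the text top to bottom using the SVG 1.1 `writing-mode="tb"` property. It then gzip-compresses the document, because the compressed `.svgz` form of SVG is simply gzip. The class and method names are hypothetical, and rendering through Batik is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for keeping a radical's SVG as a (compressible) string. */
public class RadicalSvg {

    /** Builds a minimal SVG 1.1 document that draws the text from top to bottom. */
    static String toSvg(String text, int fontSize) {
        int glyphs = text.codePointCount(0, text.length());
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\""
                + " width=\"" + fontSize + "\" height=\"" + fontSize * Math.max(glyphs, 1) + "\">"
                + "<text x=\"" + fontSize / 2 + "\" y=\"0\" font-size=\"" + fontSize + "\""
                + " writing-mode=\"tb\">" + escapeXml(text) + "</text></svg>";
    }

    /** Escapes characters that may not appear literally in XML text or attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Gzip-compresses the document (the .svgz form). */
    static byte[] compress(String svg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(svg.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read only after the stream is closed and flushed
    }

    /** Restores the SVG text from its compressed form. */
    static String decompress(byte[] svgz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(svgz))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a single glyph, compression may save little or even add a few bytes of gzip header. It mainly pays off for larger, path-based drawings.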

2.2.4 Reflection

Other programming languages could have been used to create the editor; however, those listed seemed the most useful for this particular project. The Java platform is familiar and cross-platform, and although the Microsoft .NET Framework seems less so, both are object oriented, support interaction with databases, and manage application memory themselves through garbage collection, which simplifies application development. OWL is a useful technology for the semantic requirements of the project, and it may be possible to use it in future iterations of the project to enhance the inference of semantics.

The JPA specification seemed to offer a simple and robust solution for accessing a database from a Java application. In conjunction with an implementation such as EclipseLink, and with MySQL, a well-established open-source DBMS, it seemed the most suitable tool to use.

13 Batik SVG Toolkit; The Apache Software Foundation (2010); http://xmlgraphics.apache.org/batik/; Last Accessed 11/05/10.

Chapter Summary

This chapter looked at some of the fundamental concepts of the Chinese writing system. The history of Chinese indexing was also explored through the development of the Chinese character radicals. Some of the tools that could aid the progress of the project were also outlined, and the most feasible were selected.


Chapter 3: Research Methods and Design Considerations

This chapter will examine some of the issues that required consideration

in order to ensure effective project development. The project objectives

will be explained and a basic project plan outlined with reference to the

relevant sections in the dissertation. The development process will also

be explained with a description of the process used to evaluate the

project.

3.1 Project Overview

The long-term aim of the Zhongwen Youxi He project is a fully implemented interactive learning tool for native English speakers. The scope of the current project, however, has been narrowed to creating the foundations of the system, allowing the main elements of the Chinese written language to be indexed. Conceptually this can be thought of as the system back end: a purely logical and functional foundation with little consideration for higher-level or front-end systems.

3.2 Project Objectives

The project was divided as fairly as possible to allow each member to make a significant contribution, and both members were assigned an area of priority research. The actual division of work, however, was not simply a case of each member taking half the responsibility, as some areas of the system were shared. Collaboration and group management were necessary to avoid the progress of one member being too dependent on that of the other. The areas of priority were assigned as follows:

Radicals (Melville)


Syllables (Melville)

Phonetics (Xing)

This division provided the basis for the allocation of work throughout the project. This report concentrates on the Chinese character radicals and the development of the project in relation to them. This individual section of the project aimed to record information about the radicals and to identify some of the semantic themes of the characters indexed by them, and thus required a database to:

Index radicals.

Describe some basic uses and common semantic character themes.

The project also required an editor to allow data to be input into and retrieved from the database. This editor application would therefore need to communicate with the underlying database, querying and updating it as necessary.

3.3 Project Plan

The project followed a basic development plan. To allow for variation in the research methods and execution of each group member's work, regular meetings helped to ensure coordination and a common direction. The main project stages are milestones, each of which set a foundation for the next stage.

Database technology decision: MySQL was judged the most suitable technology for this project, as described in the background chapter. This was agreed by both team members.


Editor development platform: Java was chosen as the programming language due to its availability on the University machines and the fact that both members of the group had developed applications with it previously. The editor application would need to use the database connectivity provided by the Java API, and the JPA was chosen for the reasons given in Chapter 2.

Database design: This project involves three main areas of development: the database, the editor and the translation link between them. To ensure a robust system design as well as flexible and efficient use of development time, the design and development process would have to be iterative. The database had to be designed and implemented before the editor, as it is arguably the most important feature for current and future Zhongwen Youxi He developments.

Typically, database design begins with system and requirements analysis, followed by three main modelling stages [9]: conceptual, logical and physical. As this was a new system there was no existing architecture to analyse, so the requirements were gathered and engineered to ensure an understanding of how the proposed system would operate. The process included use-case modelling and brainstorming.

Conceptual Modelling: This stage modelled the conceptual schema based on the system analysis. Entity Relationship modelling was used to identify the main system objects and how they should relate to each other, such as the relationship between characters and components or between characters and radicals. Each member paid particular attention to the design of their own area of responsibility.


Logical Modelling: This stage expressed the conceptual model in terms of the database technology, mapping it into a logical schema, such as the tables which stored the radicals and their relationships to each other. Normalisation could then be applied.

Physical Modelling: This stage “involves the selection of indexes (access methods), partitioning and clustering of data” [9], which was outside the scope of the current project.

Editor design: On completion of the database design, the initial editor design could begin.

Implementation: Once the database architecture had been designed, it would be created, and the editor could then be implemented. In the meantime, the radicals table of the database could be populated with data which would form the basis of the Zhongwen Youxi He project. This is discussed further in Chapter 5.

Testing: The editor would be tested modularly as functionality was added, with each method treated as a functional sub-unit, to ensure that existing functionality was unaffected. This approach borrowed from the test-driven development methodology [10]. The tests would cover simple functionality such as branching and looping, but also whether the functions operated correctly with the translation tools, i.e. whether the database data was correctly affected. Each tested unit would then be integrated into the overall system in accordance with the group management policy, which is discussed mainly in the group report.

Figure 11: Example of an entity relationship for Chinese characters and components (aggregation): a Character is “composed of” Components

The application could then be tested as a whole, with the system populated with test data for usability tests covering:

entry of radicals

retrieval of radicals

input of details

retrieval of details

deletion of data

editing of data

Evaluation of System: Usability tests would determine the extent to which the system satisfied the project objectives and identify areas of the system requiring improvement. The design processes, research methods, the database and some of the conclusions drawn from the research would also be evaluated.

Chapter Summary

This chapter outlined some considerations that affected the course of project development. The scope and objectives of the project were firmly established. The research and development processes were described, and a project plan with regard to the Chinese character radicals was explained.


Chapter 4: Design

The project was divided into database and editor layers. The individual projects were sub-objectives of these layers, with both members of the group attempting to share responsibility and contribute equally. Because areas of implementation overlapped, this was difficult, but attempts were continually made to ensure an effective distribution of the workload. This chapter describes my contribution to the design of various parts of the project.

4.1 Database

4.1.1 Division of Work

As mentioned in the group report, much of the database design was reviewed by the project supervisor to ensure that its foundations were solid, as it would form the basis of future Zhongwen Youxi He projects. Once the basic architecture had been established, alterations were made to the areas of the database which concerned the Chinese character radicals, as this was my area of responsibility. My changes were then combined with those of the group.


4.1.2 Entity Relationship Model

Figure 12: Database entity relationship diagram

The database architecture is shown in Figure 12. The RADICAL_TABLE was designed to store data about the Chinese character radicals as indexed by the TICCC [13].


Figure 13: Example of the TICCC

Though initially confusing, many of the fields in the RADICAL_TABLE map directly from the information in the TICCC. Figure 13 shows a small part of the TICCC, which serves as a working example.

The radical 匚 is number 8 in the list, and there are two radicals without numbers underneath it. These radicals have the same number of strokes and so would usually be indexed here, but they are variants of another form, a main (written-style) form, under which they are indexed. The number of this main form is given in square brackets [] and corresponds to the RadicalLeadSequence_Id. In this example, [9] refers to 卜, the radical at number 9. At that index, the variant is listed in round brackets (). Its position within the round brackets gives the RadicalSubsidSequence_Id, which is 1 as it is the first entry, and the RadicalLastSubSequence_Id is the total number of radicals in the round brackets, also 1 since there is only one. The RadicalTICCCSubsid_Id records the position of each sub-indexed radical (those without a number) in the sub-index: in this example, [9] occurs at position 1 and [22] at position 2.

Figure 14: Example of the TICCC and database fields

However, for the purposes of the database, the Radical_Id corresponds to the strict order of occurrence of the radical in the TICCC, whether sub-indexed or not. In the above example, radical [9] is therefore given Radical_Id 9 and [22] is given Radical_Id 10 (as they occur immediately after radical 8), so the radical numbered 9 underneath them has Radical_Id 11. The RadicalLeadSequence_Id is adjusted accordingly, so for radical [9] it becomes RadicalLeadSequence_Id = 11, as shown in Figure 14.

Radical_id = 8: RadicalLeadSequence_Id = 0, RadicalTICCCSubsid_Id = 0, RadicalSubsidSequence_Id = 0, RadicalLastSubSequence_Id = 0

Radical_id = 9: RadicalLeadSequence_Id = 11, RadicalTICCCSubsid_Id = 1, RadicalSubsidSequence_Id = 1, RadicalLastSubSequence_Id = 1
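The numbering rules above can be expressed as a two-pass procedure over the TICCC listing. The Java sketch below is a hypothetical illustration, not the project's code. It assumes that each TICCC line is either a numbered radical or an unnumbered sub-indexed form carrying the [n] reference to its main form. It also assumes that a main form's round brackets list its variants in the order in which they occur in the table. Under these assumptions it reproduces the values in Figure 14.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TicccNumbering {

    /** One TICCC line: a numbered radical (mainForm == 0) or a sub-indexed form pointing at [mainForm]. */
    record Entry(String label, int ticccNumber, int mainForm) {
        static Entry numbered(String label, int ticccNumber) { return new Entry(label, ticccNumber, 0); }
        static Entry subIndexed(String label, int mainForm) { return new Entry(label, 0, mainForm); }
        boolean isSubIndexed() { return mainForm > 0; }
    }

    /** The RADICAL_TABLE index columns derived for one entry. */
    record Row(String label, int radicalId, int leadSequenceId, int ticccSubsidId,
               int subsidSequenceId, int lastSubSequenceId) {}

    static List<Row> number(List<Entry> entries) {
        // Pass 1: Radical_Id is the 1-based position in the listing, so record the new id
        // of every numbered radical and count how many variants each main form has.
        Map<Integer, Integer> idOfTicccNumber = new HashMap<>();
        Map<Integer, Integer> variantCount = new HashMap<>();
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (e.isSubIndexed()) {
                variantCount.merge(e.mainForm(), 1, Integer::sum);
            } else {
                idOfTicccNumber.put(e.ticccNumber(), i + 1);
            }
        }
        // Pass 2: fill in the sub-index columns; numbered radicals keep zeros.
        List<Row> rows = new ArrayList<>();
        Map<Integer, Integer> variantsSeen = new HashMap<>();
        int subIndexPosition = 0;
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (!e.isSubIndexed()) {
                subIndexPosition = 0; // each numbered radical starts a new sub-index
                rows.add(new Row(e.label(), i + 1, 0, 0, 0, 0));
                continue;
            }
            subIndexPosition++;
            int sequence = variantsSeen.merge(e.mainForm(), 1, Integer::sum);
            rows.add(new Row(e.label(), i + 1,
                    idOfTicccNumber.getOrDefault(e.mainForm(), 0), // 0 if the main form is missing
                    subIndexPosition, sequence, variantCount.get(e.mainForm())));
        }
        return rows;
    }
}
```

Suppose placeholder labels stand in for the glyphs: radicals 1 to 7, then 匚 as 8, the variants [9] and [22], then 卜 as 9. In that case the variant [9] receives Radical_Id 9 and RadicalLeadSequence_Id 11, and it gets 1 for each of the three remaining sub-index columns.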


Radical_Name stores the pinyin name of the radical.

Stroke_Number is the number of strokes as indexed in the TICCC.

Unicode stores the Unicode representation where possible.

If_Char records whether the radical is also a character.

About holds notes about the radical and some examples of characters indexed by it. This field represents one of the main features of this stage of the project and is described further in the implementation section.

SVG is an SVG representation of the radical.
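The Unicode field could be filled in automatically from the radical glyph. The sketch below makes a hypothetical assumption that the field holds U+XXXX notation; the project may instead have stored the character itself. It works on code points rather than Java chars, so radicals outside the Basic Multilingual Plane, which need two UTF-16 chars, are still handled correctly.

```java
/** Hypothetical helpers converting between a glyph and U+XXXX notation. */
public class UnicodeField {

    /** Formats every code point of s as U+XXXX, separated by spaces; "" gives "". */
    static String toUnicodeField(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp)); // at least four upper-case hex digits
        });
        return sb.toString();
    }

    /** Parses space-separated U+XXXX tokens back into the original string. */
    static String fromUnicodeField(String field) {
        StringBuilder sb = new StringBuilder();
        for (String token : field.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank field
            if (!token.startsWith("U+")) {
                throw new IllegalArgumentException("Not in U+XXXX form: " + token);
            }
            sb.appendCodePoint(Integer.parseInt(token.substring(2), 16));
        }
        return sb.toString();
    }
}
```

For example, `toUnicodeField("木")` returns "U+6728", and `fromUnicodeField("U+6728")` returns 木.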

4.2 The Editor

With the database architecture finalised, work could begin on the editor. The Java platform and the JPA specification were to be used to interact with the database, and a model-view-controller (MVC) architecture was deemed the most appropriate development model. This model would allow a modular development process, with database operations separated from the view by the controller. It would also allow development and testing to occur separately: the model would be built and tested first, followed by the controller, with the view placed on top, receiving data from the model via the controller.

Figure 15: Initial Model View Controller design pattern (Model, View and Controller; labelled flows: user input, update model, query model, data presentation)


To enable each member to develop the editor independently, it was decided that the core shared elements should be implemented first. This framework would provide basic functionality for interaction between the database, the controller and the view. Both members of the team could then develop further functionality to satisfy their own project objectives. I took responsibility for the design of this framework.

4.2.1 The Model

The JPA enables Java applications to interact with relational database tables through entity objects. Each entity object corresponds to a row of its table, and an entity class is required for each table accessed by the application; this resulted in an entity class for every table in the database. Entities require a constructor with no arguments, but a constructor taking a Radical_id was also added to allow new “empty” radicals to be added to the database, with their details filled in later. Entities also require a field for each column in the table, with get and set methods for each of these fields. In the case of the RADICAL_TABLE, the entity was constructed as in Figure 16.


Figure 16: RadicalTable class. Fields: Radical_id (int), RadicalLeadSequence_Id (int), RadicalTICCCSubsid_Id (int), RadicalSubsidSequence_Id (int), RadicalLastSubSequence_Id (int), Radical_Name (String), Stroke_Number (int), Unicode (String), If_Char (boolean), About (String), SVG (String). Constructors: RadicalTable() and RadicalTable(Radical_id). Methods: a get and a set method for each field, e.g. getRadical_id() and setRadical_id().
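The entity in Figure 16 can be sketched in plain Java as follows. This is not the project's source, and the camelCase names are assumptions. The JPA annotations that would turn the class into an entity come from javax.persistence: @Entity and @Table on the class, @Id on the key and @Column for the column names. They appear only in a comment so that the class compiles without a JPA provider. The sketch keeps the two constructors described above.

```java
/**
 * Plain-Java sketch of the RadicalTable entity in Figure 16 (hypothetical field names).
 * As a JPA entity the class would also carry @Entity and @Table(name = "RADICAL_TABLE"),
 * radicalId would carry @Id, and @Column(name = "...") would map each field to its column.
 */
public class RadicalTable {
    private int radicalId;                // Radical_id
    private int radicalLeadSequenceId;    // RadicalLeadSequence_Id
    private int radicalTICCCSubsidId;     // RadicalTICCCSubsid_Id
    private int radicalSubsidSequenceId;  // RadicalSubsidSequence_Id
    private int radicalLastSubSequenceId; // RadicalLastSubSequence_Id
    private String radicalName;           // Radical_Name (pinyin)
    private int strokeNumber;             // Stroke_Number
    private String unicode;               // Unicode
    private boolean ifChar;               // If_Char
    private String about;                 // About
    private String svg;                   // SVG

    /** No-argument constructor, which JPA requires. */
    public RadicalTable() { }

    /** Creates an "empty" radical whose remaining details are filled in later. */
    public RadicalTable(int radicalId) { this.radicalId = radicalId; }

    public int getRadicalId() { return radicalId; }
    public void setRadicalId(int radicalId) { this.radicalId = radicalId; }
    public int getRadicalLeadSequenceId() { return radicalLeadSequenceId; }
    public void setRadicalLeadSequenceId(int id) { this.radicalLeadSequenceId = id; }
    public int getRadicalTICCCSubsidId() { return radicalTICCCSubsidId; }
    public void setRadicalTICCCSubsidId(int id) { this.radicalTICCCSubsidId = id; }
    public int getRadicalSubsidSequenceId() { return radicalSubsidSequenceId; }
    public void setRadicalSubsidSequenceId(int id) { this.radicalSubsidSequenceId = id; }
    public int getRadicalLastSubSequenceId() { return radicalLastSubSequenceId; }
    public void setRadicalLastSubSequenceId(int id) { this.radicalLastSubSequenceId = id; }
    public String getRadicalName() { return radicalName; }
    public void setRadicalName(String radicalName) { this.radicalName = radicalName; }
    public int getStrokeNumber() { return strokeNumber; }
    public void setStrokeNumber(int strokeNumber) { this.strokeNumber = strokeNumber; }
    public String getUnicode() { return unicode; }
    public void setUnicode(String unicode) { this.unicode = unicode; }
    public boolean isIfChar() { return ifChar; }
    public void setIfChar(boolean ifChar) { this.ifChar = ifChar; }
    public String getAbout() { return about; }
    public void setAbout(String about) { this.about = about; }
    public String getSvg() { return svg; }
    public void setSvg(String svg) { this.svg = svg; }
}
```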


The JPA interacts with each of these entity classes via an Entity Manager, which is created by an Entity Manager Factory provided by the JPA implementation. The Entity Manager queries the database and creates and updates the entities as appropriate, providing row-level access to the database. One of the basic functions of the Entity Manager is querying the database for entities:

Query q = entityManager.createQuery("SELECT r FROM RadicalTable r");

This query retrieves all records from the radical table. The RadicalTable objects can then be obtained from the query q via q.getResultList(), which returns a List. This list can be thought of as a list of rows from the RADICAL_TABLE, with each column mapped to a field of the entity; e.g. the first RadicalTable object in the list would likely have Radical_id = 1 (the order is only guaranteed if the query includes an ORDER BY clause). These objects could then be updated or deleted, and new objects created and saved to the database. The framework would therefore require the functionality of the Entity Manager.

4.2.2 The Controller

The Entity Manager functionality was encapsulated in a controller class, the Repository Manager.


The remove and save methods are both used to alter records in the database, and both use either the update() or the persist() method to save the state of the entities to the database. The life cycle of an Entity Manager is usually the length of a transaction, e.g. saving an object to, or retrieving an object from, the database. During this life cycle the entities are managed: new objects can be saved with persist(), and changes to managed objects are written to the database when the transaction commits, as the entities and the Entity Manager are connected to the database. Once the entities have been retrieved by the application, the Entity Manager is closed, severing the connection between these entities and the database. Such entities are known as detached; the application can change them, but for the changes to be saved a new Entity Manager must re-establish the connection with the database and merge the modified entity object with the database version (JPA's merge() operation). The update method provided this function.
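The difference between managed and detached entities can be illustrated without a database. The Java sketch below is a toy, in-memory stand-in. It is neither JPA nor the project's Repository Manager. The hypothetical ToyRepository keeps its own copy of each row and hands out detached copies. Edits to a copy therefore reach the stored row only when the copy is passed to update(), which mimics the merge step described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DetachedEntityDemo {

    /** Minimal stand-in for an entity: an id plus one editable field. */
    static final class Radical {
        final int id;
        String name;
        Radical(int id, String name) { this.id = id; this.name = name; }
        Radical copy() { return new Radical(id, name); }
    }

    /** Hypothetical in-memory repository; the map plays the part of the database table. */
    static final class ToyRepository {
        private final Map<Integer, Radical> table = new HashMap<>();

        /** Like persist(): stores a new row and rejects duplicate ids. */
        void persist(Radical r) {
            if (table.putIfAbsent(r.id, r.copy()) != null) {
                throw new IllegalStateException("Row already exists: " + r.id);
            }
        }

        /** Returns a detached copy: changing it does not touch the stored row. */
        Optional<Radical> find(int id) {
            return Optional.ofNullable(table.get(id)).map(Radical::copy);
        }

        /** Like merge(): copies the detached object's state onto the stored row, inserting it if absent. */
        Radical update(Radical detached) {
            Radical stored = table.computeIfAbsent(detached.id, id -> detached.copy());
            stored.name = detached.name;
            return stored.copy();
        }

        void remove(int id) { table.remove(id); }
    }
}
```

In real JPA the stored copy would be a managed entity inside a persistence context rather than a map entry. The effect on detached objects is broadly the same: local edits are invisible to the database until they are merged.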

To mediate between the Repository Manager and the view, another controller was designed. The View Controller would initiate database access through the Repository Manager and process the results into a suitable form for the view.

Figure 17: RepositoryManager class. Field: entityManager (EntityManager). Methods: getRadicals(): List, getPhonetics(): List, getPoneticsbyId(int), getRadicalsbyID(int), removeRadical(RadicalTable), removePhonetic(PhoneticTable), saveRadical(RadicalTable), savePhonetic(PhoneticTable), persist(Object), update(Object).


The methods were intended to be as general as possible, so that later additions to the Repository Manager could be used here. The getAll() method switches between types of entity (given by currentEntity) and queries repositoryManager for all records of that type (table) in the database; getById() does the same but calls the corresponding getById method. The save and remove methods correspond to those in the Repository Manager. The asTable() method takes the currentList variable and passes it to the view in a format suitable for a JTable.

These two classes formed the controller in the model-view-controller architecture, allowing the view to interact with the model and thus the database. This was the basic editor framework.
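
The getAll() switch and the asTable() conversion can be sketched as follows. This is a hypothetical illustration: the names getAll(), asTable(), currentList and currentEntity follow Figure 18, but RadicalRow, RadicalSource, ViewControllerSketch and the three-column layout are assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type; the real RadicalTable entity has more fields.
class RadicalRow {
    final int radicalId;
    final String unicode;
    final String name;

    RadicalRow(int radicalId, String unicode, String name) {
        this.radicalId = radicalId;
        this.unicode = unicode;
        this.name = name;
    }
}

// Minimal repository contract assumed for this sketch.
interface RadicalSource {
    List<RadicalRow> getRadicals();
}

class ViewControllerSketch {
    private final RadicalSource repositoryManager;
    private List<RadicalRow> currentList = new ArrayList<>();
    private String currentEntity = "radicals_table";

    ViewControllerSketch(RadicalSource repositoryManager) {
        this.repositoryManager = repositoryManager;
    }

    void setCurrentEntity(String entity) {
        currentEntity = entity;
    }

    // Switch on the selected table name and cache the result in currentList.
    List<RadicalRow> getAll() {
        switch (currentEntity) {
            case "radicals_table":
                currentList = repositoryManager.getRadicals();
                break;
            default:
                throw new IllegalArgumentException("unsupported table: " + currentEntity);
        }
        return currentList;
    }

    // Convert currentList into the Object[][] shape a JTable accepts.
    Object[][] asTable() {
        Object[][] data = new Object[currentList.size()][];
        for (int i = 0; i < currentList.size(); i++) {
            RadicalRow r = currentList.get(i);
            data[i] = new Object[] {r.radicalId, r.unicode, r.name};
        }
        return data;
    }
}
```

The view could then build its grid with new JTable(controller.asTable(), new Object[] {"radical_Id", "unicode", "name"}), using the standard JTable(Object[][], Object[]) constructor; the column names here are illustrative.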

Figure 18: ViewController class (fields: repositoryManager : RepositoryManager, currentList : List, currentEntity : String; methods: getAll() : List, getbyId(int) : List, removeObject(Object), saveObject(Object), updateObject(Object), updateView(), asTable() : Object[][])


4.2.3 The View

The view is the user interface for the system. The user uses the Main View panel to select the database table they wish to read or edit, and these requests are sent to the controller. The Main View, which is discussed further in the group report, consists of a main panel with a JTabbedPane, TabView, as shown in the figure below.

The TabView consists of a JTable, which gives a grid view of the records in the table as passed from the ViewController. It also contains a switchable set of JPanels (radicalPanel, phoneticPanel, syllablePanel and phoneticSyllable) managed by the CardLayout layout manager. I was responsible for the design of the RadicalPanel.

Figure 19: Editor Framework (View, View Controller, Repository Manager and Model; user input and view updates pass between the View and the View Controller, which queries and updates the model through the Repository Manager)

Figure 20: MainView and TabView class (MainView fields: tabView : TabView, comboTables : JComboBox, textIDfield : JTextField; methods: search(), update(). TabView fields: tableDBView : JTable, panelEditRecordView : JPanel, radicalPanel : RadicalViewer, phoneticPanel : PhoneticViewer, syllablePanel : SyllableViewer, phoneticSyllable : PhonSyllViewer, viewController : ViewController; methods: TabView(ViewController), updateViews(), setPanel())

Figure 21 shows the basic design of the Radical Viewer input form, which is opened from the Main View user input screen. All fields are JTextFields except for the if_char JComboBox, which offers a choice of true or false; the SVG Panel, which is an SVG Canvas; and the About field, a JTextArea which cannot be edited directly. The About field has a click listener which opens a JEditorPane that can be used to edit the text; this is described in the next section. The SVG Panel is an SVG Canvas which uses the Batik API to render SVG to a JPanel. The save, clear and cancel buttons are self-explanatory.

Figure 21: RadicalView input form (text fields Radical_Id, RadicalLeadSequence_Id, RadicalTICCCSubsid_Id, RadicalSubsidSequence_Id, RadicalLastSubSequence_Id, unicode, Stroke number and SVG Text; If_char combo box; SVG Panel (SVG Canvas); About Field (JTextArea, not editable); save, clear and cancel buttons)


The RadicalViewer class is passed a ViewController, which it uses to retrieve and update values in the database. The populateForm() method is called when a row in the TabView's tableDBView is selected: the row number is passed to the ViewController, which gets the RadicalTable object from its currentList field, and the fields on the form are then populated with the data from that object. The isDuplicate() method is a check performed when saving changes to the database: if a user attempts to save a new record, the ViewController's currentList is checked to ensure that no record with that radical_Id already exists. Before performing their action, the methods saveRadical() and removeFromDB() both check either that a record is active (selected in the tableDBView) or that no record with the same radical_Id already exists.
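
The decision made before saving can be sketched in a few lines. This is a hypothetical illustration rather than the project's code: SaveGuard and its Action values are invented names, the list of existing ids stands in for the radical_Id values in the ViewController's currentList, and a selectedRow of -1 follows the JTable convention that getSelectedRow() returns -1 when nothing is selected.

```java
import java.util.List;

class SaveGuard {
    enum Action { SAVE_NEW, UPDATE, REJECT_DUPLICATE }

    // True if a record in the controller's currentList already uses this id.
    static boolean isDuplicate(List<Integer> currentIds, int radicalId) {
        return currentIds.contains(radicalId);
    }

    // selectedRow is -1 when no row is selected in tableDBView.
    static Action decide(int selectedRow, List<Integer> currentIds, int radicalId) {
        if (selectedRow >= 0) {
            return Action.UPDATE;            // an existing record is active
        }
        if (isDuplicate(currentIds, radicalId)) {
            return Action.REJECT_DUPLICATE;  // a new record would clash
        }
        return Action.SAVE_NEW;
    }
}
```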

4.2.4 The AboutEditor and TextParser

The About field on the RadicalViewer panel is not populated directly from the database; instead, the value is passed to the TextParser class, which allows formatting to be applied to the text before it is output to the screen. For the purposes of testing, the About field displays only plain text, but when clicked it opens an editor, the AboutEditor.

Figure 22: RadicalViewer class diagram (field: myController : ViewController; methods: RadicalViewer(ViewController), populateForm(int), isDuplicate(int) : boolean, clearAll(), saveRadical(RadicalTable), removeFromDB(RadicalTable))


The AboutEditor appears when the About field is clicked in the RadicalViewer panel. The TextParser class is passed to the editor to allow parsing of the text. The showEditor() method initialises the view by switching the switchPanel to rawEditor, with the contents of myParser (the data from the About field in the database). A JEditorPane can display text with HTML mark-up as if in a web browser, which allows basic HTML formatting in the pane. The rawEditor is a plain JEditorPane which displays the text in raw form, whereas the formatEditor shows the result of the HTML mark-up. When the Test button is pressed, the view switches from the rawEditor to the formatEditor; both are contained in the switchPanel, which uses a CardLayout. The Reset button switches the view back to the rawEditor. Saving updates the TextParser with the contents of the rawEditor pane, which will then update the database if the save button is pressed in the RadicalViewer.

Figure 23: The AboutEditor (a JEditorPane inside a JScrollPane, with Test, Reset, Save and Cancel buttons)

Figure 24: AboutEditor class diagram (fields: rawEditor : JEditorPane, formatEditor : JEditorPane, switchPanel : JPanel, myParser : TextParser; methods: AboutEditor(TextParser), showAbout(), saveAndClose(), test(), closeDiscard(), reset())
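
The switch between the raw and formatted panes is a standard CardLayout pattern. The sketch below is hypothetical: the field names rawEditor, formatEditor and switchPanel follow Figure 24, while the class name AboutEditorSketch, the card names "raw" and "format", and the choice of content types are assumptions.

```java
import java.awt.CardLayout;
import javax.swing.JEditorPane;
import javax.swing.JPanel;

class AboutEditorSketch {
    static final String RAW = "raw";
    static final String FORMAT = "format";

    final JEditorPane rawEditor = new JEditorPane();
    final JEditorPane formatEditor = new JEditorPane();
    final CardLayout cards = new CardLayout();
    final JPanel switchPanel = new JPanel(cards);

    AboutEditorSketch() {
        rawEditor.setContentType("text/plain");    // shows the user tags as typed
        formatEditor.setContentType("text/html");  // renders the parsed HTML
        formatEditor.setEditable(false);
        switchPanel.add(rawEditor, RAW);            // first card added is shown first
        switchPanel.add(formatEditor, FORMAT);
    }

    // "Test": display the parsed HTML and flip to the formatted card.
    void test(String html) {
        formatEditor.setText(html);
        cards.show(switchPanel, FORMAT);
    }

    // "Reset": flip back to the raw card.
    void reset() {
        cards.show(switchPanel, RAW);
    }
}
```

CardLayout keeps both panes alive and only toggles their visibility, so the raw text survives a round trip through Test and Reset without being re-read from the TextParser.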

Figure 25 shows the basic flow of control between the RadicalViewer, the AboutEditor and the TextParser. When the RadicalViewer populates its fields from the tableDBView JTable, the ViewController passes the About information directly to the TextParser, which displays the data in the About field. When the About field is clicked, the AboutEditor is opened and initialised with the raw contents of the About field. When the Test button is pressed, the text from the rawEditor is sent to the TextParser to be parsed and is returned formatted with HTML for the HTML-aware formatEditor. If the editor is saved, the contents of the rawEditor are saved to the TextParser, to be written to the database once the RadicalViewer save button is pressed. Otherwise no changes are made. In both cases the contents of the TextParser are passed to the ViewController.

Figure 25: Basic flow of control for AboutEditor (RadicalView, ViewController, TextParser and AboutEditor; labelled flows: display in About field, About field text, get format or raw text, save raw text, save to DB)

The TextParser class allows user-defined tags to be parsed into HTML. In this editor a number of user tags were defined, corresponding to different types of database information:

<k keyword></k> keyword marker

<c unicode></c> character unicode

<r id></r> radical id

<s id></s> syllable id

<f id></f> phonetic id

<fs id id> </fs> phonetic-syllable id

<g></g> for grammar

<ie></ie> for English idioms

<ic></ic> for Chinese idioms

<c-col ></c-col> colour: col = r,o,g,b

<bf></bf> bold face

<it></it> italic

<sf></sf> sans serif

<tt></tt> teletype / courier

<e example id></e> example id

These codes would be marked up with various forms of HTML when parsed by the TextParser. For the purposes of this project only a few styles were implemented:


<c-red></c-red>          <font color="red"></font>

<c-yellow></c-yellow>    <font color="yellow"></font>

<c-blue></c-blue>        <font color="blue"></font>

<bf></bf>                <b></b>

<it></it>                <i></i>

The parser was designed to handle nested tags, and these tags were deemed sufficient for the purpose of testing. The framework was, however, set up so that the remaining tags could be added or altered at a later date. The special character sequences "\n", "\r" and "\r\n" are mapped to <br> to preserve line formatting.

The TextParser setRawText() method sets the rawText member variable, storing the value of the About field passed from the ViewController. getRawText() returns the value passed back when the RadicalViewer is closed. The parse() method takes a String from the rawEditor pane and returns a formatted string of HTML. The algorithm iterates over the start tags, splitting the string and applying formatting to the text between each start tag and its end tag by replacing the tags with the corresponding HTML.
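
A simplified version of this tag replacement can be sketched as follows. It is a hypothetical sketch, not the project's TextParser: it uses String.replace on whole tags rather than splitting the string, takes its tag pairs from the mapping table above, and writes the colour as the color attribute of <font>. Because every user tag maps onto a nestable HTML element, correctly nested input stays nested; improperly crossing tags are passed through unchanged in order.

```java
// Hypothetical replace-based parser; startTags, endTags and rawText follow
// Figure 26, while htmlStart and htmlEnd are names invented for this sketch.
class TextParserSketch {
    private final String[] startTags = {"<c-red>", "<c-yellow>", "<c-blue>", "<bf>", "<it>"};
    private final String[] endTags = {"</c-red>", "</c-yellow>", "</c-blue>", "</bf>", "</it>"};
    private final String[] htmlStart = {
        "<font color=\"red\">", "<font color=\"yellow\">", "<font color=\"blue\">", "<b>", "<i>"};
    private final String[] htmlEnd = {"</font>", "</font>", "</font>", "</b>", "</i>"};

    private String rawText = "";

    void setRawText(String text) {
        rawText = text;
    }

    String getRawText() {
        return rawText;
    }

    // Replace each user tag with its HTML counterpart.
    String parse(String text) {
        String html = text;
        for (int i = 0; i < startTags.length; i++) {
            html = html.replace(startTags[i], htmlStart[i]).replace(endTags[i], htmlEnd[i]);
        }
        // Handle "\r\n" first so that a Windows line break yields one <br>, not two.
        return html.replace("\r\n", "<br>").replace("\r", "<br>").replace("\n", "<br>");
    }
}
```

The raw text is kept unparsed so that the database always stores the user tags; only the display path goes through parse().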

Figure 26: The TextParser class diagram (fields: rawText : String, startTags : String[], endTags : String[]; methods: setRawText(), getRawText() : String, parse(String) : String)


4.2.5 The SVG Panel

Using the Batik library, the contents of the SVG field would be rendered to this panel as well as passed to the SVG JTextArea. This would allow changes to be made to the SVG and viewed in the SVG Panel. The Batik SVG Canvas is a Swing component which provides this functionality; the SVG text is passed to the canvas to be displayed via either a URL or a reader.

Chapter Summary

This chapter documented the design of the system. The database design was outlined and a justification for the design of the RADICALS_TABLE was provided. The design of the editor framework and the model-view-controller pattern was described, together with the additions specific to the RADICALS_TABLE.


Chapter 5: Implementation and Testing

Once the database architecture and editor framework had been designed, implementation and testing of the core functionality began, allowing both members of the group to then undertake their own individual tasks. These included the design and implementation of their own input forms, such as the Radical Viewer, and adding data to the database. As I assumed responsibility for the implementation of the framework, it is described here together with the testing results. Some of the tools used in the development of the system are also mentioned.

5.1 The Tools

The project was implemented using a MySQL database, which was engineered with the help of MySQL Workbench¹⁴, enabling the modelling of the entity-relationship model. The Eclipse IDE¹⁵ was used to develop the editor, together with the JPA specification, the EclipseLink JPA implementation and the Batik library, which was used to render SVG documents to the screen. JUnit, which is included as part of the Eclipse IDE, was used as a testing framework for some of the core functionality; as this was the most important part of the project, testing it ensured that the implementation was robust.

¹⁴ Welcome to MySQL Workbench 5.2; MySQL Workbench Team; MySQL Inc; http://wb.mysql.com/; Last Accessed 10/05/10.

¹⁵ Explore the Eclipse Universe 2010; The Eclipse Foundation 2010; http://www.eclipse.org/; Last Accessed 12/06/10.

5.2 The Editor

The implementation of the core framework was carried out using methods borrowed from Agile development. The JUnit framework was used to create test classes which tested the main functionality, or any functionality which could feasibly be tested without user input. The test setup for the core functionality followed the same pattern for each class, with the main test cases being an array of three RadicalTable objects:

RadicalsTable r1 = new RadicalsTable(1);

RadicalsTable r2 = new RadicalsTable(2, "4E57", "yi1", true);

RadicalsTable r3 = new RadicalsTable(3);

The objects have radical_id 1-3 respectively, with r2 testing the four-argument constructor. Each test class had a number of methods testing different properties. Full test code can be found in the Appendix.
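
For readers without the Appendix to hand, the shape of the entity these fixtures exercise might look like the plain-Java sketch below. It is hypothetical: the JPA annotations (@Entity, @Id, @Column) are left out so that it compiles without a persistence provider, the four constructor arguments are assumed to be (radical_Id, unicode, radical_Name, if_char), and RadicalTestCases is an invented holder for a generateRadicalTables() equivalent whose r3 values are placeholders.

```java
// Plain-Java sketch of the RadicalsTable entity used by the tests.
class RadicalsTable {
    private int radicalId;
    private String unicode;
    private String radicalName;
    private boolean ifChar;

    // JPA requires a public or protected no-argument constructor.
    protected RadicalsTable() { }

    RadicalsTable(int radicalId) {
        this.radicalId = radicalId;
    }

    RadicalsTable(int radicalId, String unicode, String radicalName, boolean ifChar) {
        this.radicalId = radicalId;
        this.unicode = unicode;
        this.radicalName = radicalName;
        this.ifChar = ifChar;
    }

    int getRadicalId() { return radicalId; }
    String getUnicode() { return unicode; }
    void setUnicode(String unicode) { this.unicode = unicode; }
    String getRadicalName() { return radicalName; }
    void setRadicalName(String radicalName) { this.radicalName = radicalName; }
    boolean isIfChar() { return ifChar; }
    void setIfChar(boolean ifChar) { this.ifChar = ifChar; }
}

class RadicalTestCases {
    // r1 and r3 use the one-argument constructor and r2 the four-argument one;
    // r3 then has its fields set through the setters (placeholder values).
    static RadicalsTable[] generateRadicalTables() {
        RadicalsTable r1 = new RadicalsTable(1);
        RadicalsTable r2 = new RadicalsTable(2, "4E57", "yi1", true);
        RadicalsTable r3 = new RadicalsTable(3);
        r3.setUnicode("4E00");
        r3.setRadicalName("placeholder");
        r3.setIfChar(false);
        return new RadicalsTable[] {r1, r2, r3};
    }
}
```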

5.2.1 Radicals Table Test

This class tested the RadicalTable Entity using an Entity Manager:

generateRadicalTables(): This method creates the radicalTestCases above. To test the set methods for each of the fields, the third radical (r3) has its fields set at runtime.

insertAndRetrieve(): This method creates an Entity Manager and attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

findAndDelete(): Retrieves the entity objects and deletes them, testing that they have been deleted.


Figure 27: Radical Table Test passed

5.2.2 Repository Manager Test

This class tested database access using the Repository Manager to create an Entity Manager and manipulate the entities.

insertAndRetrieve(): This method creates a Repository Manager and attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

findAndUpdate(): Tests whether changes to a detached entity object can be merged with the database.

updateById(): This method retrieves an entity with a given id from the database, sets its radical_Name field to a new value and tests that the change has been merged with the database.

findAndDelete(): Retrieves the entity objects and deletes them, testing that they have been deleted.


Figure 28: Repository Manager Test passed

5.2.3 View Controller Test

This class tests some of the functionality of the ViewController, mainly its interaction with the Repository Manager, but none of the input from the view.

insertAndRetrieve(): This method creates a Repository Manager and attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

insertAndRetrieveById(): This method creates a Repository Manager and attempts to persist the entity objects to the database, then retrieves an object by its radical_Id.

asArrayTest(): This method tests the asArray() and getTableData() methods used to pass the entity properties to the JTable in the TabView. The Repository Manager retrieves all the records, and the getTableData() sub-method places them into an array which is passed to the asArray() method. The lengths of these arrays are checked.

testUpdate(): Tests whether changes to a detached entity object can be merged with the database.


testRemove(): This method attempts to delete entities, testing that they have been deleted.

Figure 29: View Controller Test passed

5.3 The Visual Elements

Testing the visual elements of the editor could not feasibly be carried out with JUnit tests and datasets, so tests were carried out by running the application, testing the usability and functionality of the system simultaneously. The implementation of the Main View was the starting point for the visual tests and would also be used as part of the integration tests conducted by the group.


The editor framework presents the user with a Main View screen.

Figure 30: Main View

With radicals_table selected, the view switches to allow editing of the database table.


Figure 31: RadicalViewer input screen

5.3.1 Adding a New Record

Adding a new record with radical id 20:

Figure 32: Adding a new record


When the save button is clicked, the Radical Viewer checks whether a record is selected in the table view. In the case of a new record (no selection), the ViewController initiates a check to ensure that no radical with that radical_id is already stored; if none is, the user is presented with a confirmation dialogue:

Figure 33: Save confirmation

If No is clicked, the view returns to the input screen as before; if Yes is clicked, the save process is initiated. The code below illustrates this functionality.

Figure 34: Save confirmation code

The save process involves ensuring that the data from the fields is in the correct format for the database. A number of try-catch clauses encapsulate this, with incorrectly formatted text being reported to the user in an error box. Once the record is saved, the database can be queried to refresh the view; this refreshed view can be seen in the following figure.
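
The try-catch validation described above can be sketched as a pair of helpers that collect messages instead of stopping at the first bad field. This is a hypothetical illustration: FieldValidator and its methods are invented names, and the rule that a unicode value is four to six hexadecimal digits is an assumed format check, not a rule taken from the project.

```java
import java.util.List;
import java.util.Locale;

// Each parse records a readable message in `errors` rather than throwing,
// so every problem can be shown to the user in a single error box.
class FieldValidator {
    // Parse a whole-number field such as Radical_Id or Stroke number.
    static Integer parseIntField(String label, String text, List<String> errors) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            errors.add(label + " must be a whole number, got \"" + text + "\"");
            return null;
        }
    }

    // Format check only (assumed rule): 4 to 6 hexadecimal digits, e.g. "4E00".
    static String parseUnicodeField(String text, List<String> errors) {
        String t = text.trim().toUpperCase(Locale.ROOT);
        if (!t.matches("[0-9A-F]{4,6}")) {
            errors.add("unicode must be a hexadecimal code point, got \"" + text + "\"");
            return null;
        }
        return t;
    }
}
```

If errors is non-empty after all fields have been parsed, the viewer could report them with JOptionPane.showMessageDialog(this, String.join("\n", errors), "Invalid input", JOptionPane.ERROR_MESSAGE) and abandon the save.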


Figure 35: New record added

5.3.2 Updating a Record

Updating a record follows similar logic to adding a new record; however, the RadicalViewer class checks that a record is selected in the JTable, as it is assumed that the user will select the record there before attempting to edit it. The populateForm() method is called to fill the RadicalViewer with data from the RadicalTable object in the ViewController. If no record is selected, the user is presented with the following message box:

Figure 36: No radical to update


Otherwise the logic follows that of adding a new record, with the ViewController updating the database instead of saving to it.

5.3.3 Removing a Record

To remove a record, the same checks are made to ensure that the record is selected in the table view; this ensures that there is no attempt to delete a record that doesn't exist. If no record is selected, the user is presented with a dialogue informing them of this, as above; otherwise they are greeted with a confirmation dialogue:

Figure 37: Remove from database confirmation

5.3.4 The SVG Panel

The SVG field from the RADICALS_TABLE is passed simultaneously to both the SVG Panel and the SVG field when the populateForm() method is called.

Figure 38: Setup of the SVG Canvas


For test purposes the SVG for the character 採 (cǎi) [13] was used. Its complexity and colour would test the effectiveness of the SVG Canvas, as shown in the figure below.

Figure 39: The SVG Panel and SVG Field

5.3.5 The AboutEditor and TextParser

The AboutEditor and TextParser were tested by entering data into the About field of a new record, to test both parsing and saving. The following input was used:

<c unicode>myTest</c>

<c-b>the colour blue</c-b>

<c-r>the colour red</c-r>

<it>test the italics</it>

normal

<c unicode>This is<c-b> a nested</c-b> example of colour<it>with

<c-r>some italics </c-r> </it> </c>


Clicking the About field on the RadicalViewer form causes the AboutEditor to appear. The text added to the rawEditor pane is shown in the figure below.

Figure 40: AboutEditor rawEditor pane input

The TextParser parses the input when the Test button is pressed, as Figure 41 shows. Note that for the purposes of this test, Unicode was given a yellow font.

Figure 41: The parsed text


5.3.6 Discovered Issues

Issues were found throughout the course of testing; some, though missed by the initial test setup, were found through usability testing. Some of these are outlined here.

SVG Canvas glitch: On some occasions, when saving an image to the database, the canvas would glitch, covering the entire RadicalViewer input form. An attempt to remedy this by placing the SVG canvas inside a JScrollPane only served to make the enlarged image scrollable.

MainView refresh: When changes are committed to the database, the user needs to query the database again, using the search button, to refresh the view. For usability, the view should refresh automatically when changes are made, to keep the user informed of the state of the table.
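
One possible way to provide this automatic refresh, not implemented in the project, is a simple listener (observer) arrangement in which the controller notifies the view after each commit. In the sketch below, DataChangeListener, RefreshingController and commitFinished() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback the view registers to hear about committed changes.
interface DataChangeListener {
    void dataChanged(String entity);
}

// Hypothetical controller fragment: after a save, update or remove has been
// committed, every registered listener is told which table changed.
class RefreshingController {
    private final List<DataChangeListener> listeners = new ArrayList<>();

    void addDataChangeListener(DataChangeListener listener) {
        listeners.add(listener);
    }

    void commitFinished(String entity) {
        for (DataChangeListener listener : listeners) {
            listener.dataChanged(entity);
        }
    }
}
```

The Main View could then register itself with something like controller.addDataChangeListener(entity -> SwingUtilities.invokeLater(this::search)), re-running its existing search() on the Swing event thread instead of waiting for the user to press the search button.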

5.4 The Database

The database was populated with data about the Chinese radicals as described in the design section. Further information about the radicals, and about any semantic relationships between each radical and the characters indexed by it, was researched and entered. This section illustrates a few examples of this information.
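
Each entry below gives the radical's Unicode code point in hexadecimal (e.g. 4E00), and the test data ("4E57") suggests that the unicode column stores it in this form. Assuming so, a small helper like the following hypothetical one could convert between the stored value and the character itself, for example to show the glyph alongside the code.

```java
// Hypothetical helper converting between a stored hex code point
// (e.g. "4E00") and the character it names.
class UnicodeField {
    static String toCharacter(String hex) {
        int codePoint = Integer.parseInt(hex.trim(), 16);
        // toChars also handles code points outside the Basic Multilingual Plane.
        return new String(Character.toChars(codePoint));
    }

    static String toHex(String character) {
        // %04X pads to at least four upper-case hex digits, matching "4E00".
        return String.format("%04X", character.codePointAt(0));
    }
}
```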


一 yī: One

Unicode: 4E00

This radical, also known as “one horizontal”, lies at the root of many of the numbers, including:

二 èr (two)

三 sān (three)

五 wǔ (five)

卅 sà (thirty)

It has many meanings associated with measurement and the identification of uniqueness, such as 每 'each', 各 'every time' and 统一 'unitary' or 'unified'.

The characters indexed by this radical include:

屯 tún, which is used in terms such as 屯兵 tún bīng (to garrison or station troops) or 屯聚 tún jù (to amass or assemble).

再 zài, which is used in terms such as 再版 zài bǎn (second edition) or 再次 zài cì, meaning again, e.g. 再一次 (one more time).

丿 piě: Hook or left-falling stroke

Unicode: 4E3F

Other forms: 乀 (fú), 乁 (yí)

This radical indexes many different characters. It is used by onomatopoeic characters such as 乓 pāng and 乒 pīng, which are both used to describe the sound of a discharging firearm.

The character 卵 luǎn (egg) is used to describe parts of an egg, as in 卵黃 luǎn huáng (egg yolk), or objects that share some physical similarity with an egg, such as 卵形 luǎn xíng (oval-shaped) and 卵石 luǎn shí (a pebble, from its rounded shape).

Another character, 乘 chéng, is used in words which express the idea of increasing one's lot and taking advantage of opportunity. The word 乘法 chéng fǎ (multiplication) [14] can be seen as a core theme of gaining more, or increasing something. This can be seen in phrases such as 乘机 chéng jī (to “jump at a chance”) and 乘势 chéng shì (“to strike while the iron is hot”).

八 bā: Eight

Unicode: 516B

Other forms: 丷

The number eight seems to hold some symbolic importance within the Chinese language, with 八德 bā dé (the eight virtues), 八方 bā fāng (the eight points of the compass), 八仙 bā xiān (the eight immortals) and 八卦 bā guà (the eight trigrams). The Chinese horoscope also consists of eight characters. This radical indexes numerous characters with an equally wide range of meanings.

The character 公 gōng is used in a number of words and phrases regarding the public and being in the open, including 公安 gōng ān, which is used in the term public safety, and 公私 gōng sī, used to describe public and private interests. In a similar vein, the character 分 fēn can be seen in phrases such as 分散 fēn sàn (disperse) or 分给 fēn gěi (distribute). It is also used in words which describe an acquaintance or person of whom one is aware: 生分 shēng fen.


兴 xīng is a character found in phrases meaning to begin a task or to set it up, such as 兴办 xīng bàn, with the intention of achieving some goal, such as 兴兵 xīng bīng (starting a war).

冫 bīng: Ice

Unicode: 51AB

This radical appears on the left side of numerous characters which share a relation to the cold, most notably 冰 bīng (ice), which is the combination of the radical 冫 bīng (ice) with the character 水 shuǐ (water). Phrases which include 冰 follow this theme further, such as 冰山 bīng shān (ice mountain) and 冰窖 bīng jiào, which can be used to describe structures made of ice. Other uses include 冰霜 bīng shuāng, with connotations of high moral integrity.

The character 准 zhǔn is indexed by this radical and has a number of meanings relating to precision and strictness of order, which can be seen in phrases such as 准将 zhǔn jiàng, used to describe a number of military ranks, and 准确 zhǔn què (precise or exact). A degree of certainty and expectation of outcome is expressed by this character; the regimented nature of the military is an example of this, and the rigour of service is also implied. 准头 zhǔn tou is another example, used to describe accuracy.

净 jìng has some similar uses, with the implication of attention to detail and neatness, such as 干净 gān jìng, which can be found in terms of cleanliness, and meticulousness, as in 溜干二净 liū gān 'èr jìng.


卜 bǔ: To Divine

Unicode: 535C

This radical has its roots in the oracle bone inscriptions on the shells of turtles from the Shang Dynasty. The art of divination, 卜课 bǔ kè, involved attempting to predict (卜问 bǔ wèn) the outcome of major events such as the harvest or battle. Many phrases including this radical make some reference to this, such as 卜骨 bǔ gǔ (the bone used for inscription) and 卜甲 bǔ jiǎ (the divination shells).

The character 占 zhàn is indexed by this radical and is used in many phrases which suggest the act of owning or asserting presence, including 占压 zhàn yā, which is used in phrases alluding to occupation. 占有 zhàn yǒu has associations with ownership, such as to possess, occupy or have, even by force (攻占 gōng zhàn).

In contrast, 卧 wò alludes to relaxation or rest: 卧车 wò chē is used in the description of a sleeper carriage, as are 卧床 wò chuáng (bed) and 卧房 wò fáng (bedroom).

勹 bāo: Wrap

Unicode: 52F9

The radical bāo is a character used to describe the act of wrapping or placing an object in a bag. It also indexes a number of characters with different connotations.

够 gòu is a character used in words and phrases with a vast variety of meanings, from the good, 够意思 gòu yì si (great), to the not so good, 够戗 gòu qiàng (badly or horribly). The character has associations with words which give an impression of enough or sufficiency, such as 够格 gòu gé, which describes competence, or 够本 gòu běn, which can be used to express the act of covering expenses.

匀 yún is another character indexed by this radical. It is used in words which denote balance, such as 匀称 yún chen. Other terms which conform to this include 匀净 yún jing, which describes uniformity and can, perhaps spuriously, be linked to the act of being “wrapped” or collected into a group. The term 匀溜 yún liu is used to describe a property of an object such as its texture and consistency.

几 jī: Table

Unicode: 51E0

This radical, meaning table, is used to describe flat surfaces, as the meaning suggests. There is also some implication of counting or measurement, with terms such as 几何 jǐ hé, used in phrases related to geometry and quantification, and 几种 jǐ zhǒng (several). The term is also used to ask for certainty of amount or time: 几时 jǐ shí (what time) and 几多 jǐ duō (how many).

The character 凡 fán is indexed by this radical and has a general connotation of commonality, or lacking any special quality, as in 凡夫 fán fū, which describes an ordinary person, and 凡是 fán shì, which expresses everything collectively. Terms such as 凡庸 fán yōng (commonplace) also affirm this theme, with an implication of mediocrity.


咒 zhòu has connotations of the slightly mystic or occult, with terms which describe the recitation of incantations and prayers: 咒文 zhòu wén. As in English, cursing or swearing is also linked with the supernatural or more mysterious practices, and words to this effect also contain this character, such as 咒骂 zhòu mà.

门 mén: Door

Unicode: 95E8

Other forms: 門

This radical character means door. It has meanings which relate to doors and openings, such as a door knob: 门把 mén bà. The idea of a direction of ideas or a way to progress is also described by this radical: 门道 mén dào (capability). Another main theme associated with this radical is that of state or importance, as the radical seems to have historic links to positions of power such as the aristocracy (门阀 mén fá), government positions and buildings.

The character 闭 bì, which is indexed by this radical, has associations with barriers and the ability to cordon off an area. Terms such as 闭谷 bì gǔ and 闭关 bì guān are used to describe isolation and a life of seclusion.

In contrast, 闻 wén has connotations of being well known, with a hint of celebrity status (闻达 wén dá). Other terms relate to the spread of news and to hearing or getting wind of gossip. The character 阐 chǎn has a related meaning, because it is used in words which describe explanation or clarification, such as 阐明 chǎn míng.
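The Unicode field in each entry is the hexadecimal code point of the radical character. It could therefore, in principle, be derived from the character itself rather than typed in by hand. The following is a minimal Java sketch of such a check. The class and method names are hypothetical and are not taken from the editor's code.

```java
/** Hypothetical helper for deriving the Unicode field of a radical entry. */
public class RadicalCodePoints {

    /**
     * Formats the first code point of the given text as uppercase hex,
     * padded to at least four digits. codePointAt is used instead of
     * charAt so characters outside the Basic Multilingual Plane
     * (stored as surrogate pairs) are handled correctly.
     */
    public static String unicodeHex(String radical) {
        if (radical == null || radical.isEmpty()) {
            throw new IllegalArgumentException("radical must not be empty");
        }
        return String.format("%04X", radical.codePointAt(0));
    }

    public static void main(String[] args) {
        // Main and alternative forms of the radicals discussed in this section.
        for (String r : new String[] {"门", "門", "人", "亻", "工"}) {
            System.out.println(r + " -> " + unicodeHex(r));
        }
    }
}
```

Run as shown, this prints 95E8, 9580, 4EBA, 4EBB and 5DE5. These agree with the Unicode values given for 门, 人 and 工 in their entries.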

人 rén: Human being

Unicode: 4EBA

Other forms: 亻

The main association of this radical is that of a person, with numerous words of a personal nature. These include 人工 rén gōng (man-made) and 人格 rén gé, which is used to describe a person's character. Other terms relate to crowds and populations.

The character 舒 shū has connotations of relaxation and bliss. Examples include 舒服 shū fu, which describes comfort and a sense of well-being, and 舒畅 shū chàng, which expresses having no cares or worries.

从 cóng is a character which describes the act of people grouping together. It can be seen as a combination of two 人 (person) characters, forming a group. 从命 cóng mìng denotes the idea of conforming, and 从军 cóng jūn describes the act of enlisting in the army.

工 gōng: Work

Unicode: 5DE5

The main semantic theme of this radical is that of employment and work. This includes places of work such as factories (工厂 gōng chǎng), and craftsmen and artisans such as 工匠 gōng jiàng. Other examples include 工交 gōng jiāo (industry and transport) and 工区 gōng qū (work area).

Characters indexed by this radical include 巫 wū, which can be found in terms related to the mysterious and magical, such as 巫婆 wū pó (sorceress) and 巫师 wū shī (wizard). It should be noted, however, that medical doctors, witch doctors and wizards are described using the same terms. This suggests a widespread belief in the healing power of the supernatural, even if only in a bygone era.

攻 gōng has uses of a more confrontational nature, with terms which denote the act of attacking or defending, such as 攻打 gōng dǎ. This could relate to the theme of work and business through competition, with businesses battling one another for supremacy.

Chapter Summary

This chapter described the implementation and testing process, including the JUnit testing of the core system functionality and the usability testing of the graphical user interface components. Some examples of database entries were also shown to give a flavour of the type of information contained in the RADICALS_TABLE About field.

Chapter 6: Evaluation

This stage of the Zhongwen Youxi He project required a group to collaborate to conduct research, record their findings and develop a system to enable them to do this. This dissertation has documented the stages of the project under my responsibility, and this chapter assesses how successful that process was. The assessment involves measuring the extent to which the final products meet their objectives, suggesting some improvements, and critically appraising the approaches taken at each stage of the project.

6.1 The Editor

The database editor application's success can be measured by its ability to meet the objectives described in the early stages of the project. The requirement to allow data in the RADICALS_TABLE to be added, removed and updated was met. The system provided this functionality, along with a user interface to make the operations more user friendly. There are, however, a number of areas which could have been improved, and these are addressed here.

The user interface could have been improved by allowing the main screen to be resized, so that it adapts better to users with different screens and size preferences. More flexible layout managers, such as FlowLayout, could have been chosen. This would have made positioning elements more involved but would have provided a much more fluid user experience.

The system handled user errors, such as values of the wrong data type, mainly by ignoring them, so that incorrect data was not saved to the database. The try/catch clauses ensured that no incorrect data was sent to the database, and a message was printed to the console to report the error “correction”. A more informative approach would have been a message suited to the graphical user interface, such as a message box. This would inform users of their mistake and allow them to rectify it before continuing. It would also improve the user's involvement with the system, as users might otherwise assume that the mistake was the system's.
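The message-box approach suggested above could be sketched in Swing with JOptionPane.showMessageDialog. In this minimal Java sketch, the validation is kept separate from the dialog so that it can be unit tested. The integer field and all names are hypothetical examples rather than the editor's actual columns.

```java
import java.awt.Component;
import javax.swing.JOptionPane;

public class RadicalInputValidator {

    /**
     * Checks text entered for a hypothetical integer field (e.g. a stroke
     * count). Returns a user-facing error message, or null if the value is valid.
     */
    public static String validateInteger(String fieldName, String text) {
        if (text == null || text.isBlank()) {
            return fieldName + " must not be empty.";
        }
        try {
            int value = Integer.parseInt(text.trim());
            return value < 0 ? fieldName + " must not be negative." : null;
        } catch (NumberFormatException e) {
            return fieldName + " must be a whole number, but \"" + text + "\" was entered.";
        }
    }

    /**
     * Shows any validation error in a message box rather than on the console,
     * so the user can correct it. Returns true only if the value may be saved.
     */
    public static boolean checkBeforeSave(Component parent, String fieldName, String text) {
        String error = validateInteger(fieldName, text);
        if (error == null) {
            return true;
        }
        JOptionPane.showMessageDialog(parent, error, "Invalid input", JOptionPane.ERROR_MESSAGE);
        return false;
    }
}
```

In an editor of this kind, checkBeforeSave would typically be called from the save button's action listener, with the save skipped whenever it returns false.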

The TextParser class used custom tags to render text as HTML in a JEditorPane. Each tag consisted of a start tag and an end tag enclosing some text, e.g. <c unicode> my text </c>. This was initially designed to let users nest tags with different properties inside each other, as demonstrated in the testing. It also allowed users to overlap tags, so that a style could be applied from the start tag onwards. However, this requires concentration on the user's part to ensure that each start tag is matched with the correct end tag to end the formatting.

A simpler approach may have been to let a close tag “>” denote the end of the most recently applied formatting. Though this would prevent overlapping, it could be argued that overlapping adds needless complication to the styling of text, which will rarely require overlapping tags. A more automated system may also have helped. For example, the About Editor could automatically add the end tag once a start tag is added, allowing the user to type their text between the two.
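The simpler scheme suggested above, in which a lone ">" closes the most recently applied formatting, amounts to keeping a stack of open styles. The following Java sketch illustrates the idea under stated assumptions:

- the tag names b, i and u and their HTML equivalents are hypothetical;
- tags take no attributes;
- the text itself may not contain a literal "<" or ">".

The sketch also closes any styles left open at the end of the text, in the spirit of the automatic end tags proposed for the About Editor. It is not the TextParser class itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/**
 * Sketch of a simplified tag scheme: "<name>" opens a style and a lone ">"
 * closes the most recently opened one. Tag names are illustrative only.
 */
public class StackTagParser {

    // Hypothetical mapping from custom tag names to HTML element names.
    private static final Map<String, String> ELEMENTS = Map.of("b", "b", "i", "i", "u", "u");

    public static String toHtml(String input) {
        StringBuilder html = new StringBuilder("<html><body>");
        Deque<String> open = new ArrayDeque<>(); // styles currently applied, innermost on top
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (ch == '<') {
                // Start tag: read up to the next '>' and look up the style.
                int end = input.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated start tag at index " + i);
                }
                String name = input.substring(i + 1, end).trim();
                String element = ELEMENTS.get(name);
                if (element == null) {
                    throw new IllegalArgumentException("Unknown tag: " + name);
                }
                html.append('<').append(element).append('>');
                open.push(element);
                i = end + 1;
            } else if (ch == '>') {
                // Lone close marker: end whichever style was applied most recently.
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Close marker with nothing open at index " + i);
                }
                html.append("</").append(open.pop()).append('>');
                i++;
            } else {
                // Plain text; '&' is escaped so it is not read as an HTML entity.
                html.append(ch == '&' ? "&amp;" : String.valueOf(ch));
                i++;
            }
        }
        // Close any styles the user left open, innermost first.
        while (!open.isEmpty()) {
            html.append("</").append(open.pop()).append('>');
        }
        return html.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<b>bold <i>both> bold again> plain"));
    }
}
```

The resulting string could be displayed by calling setContentType("text/html") and then setText(...) on a JEditorPane.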

It could be argued that the JPA architecture, though it provided a simple foundation for the database system, limited the system's functionality by allowing only the table definitions specified in the configuration files. Using JDBC directly might have allowed a more flexible application, with the database tables specified dynamically at runtime.
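As a rough illustration of the runtime flexibility mentioned above, plain JDBC lets an application ask the database which tables exist through DatabaseMetaData.getTables. The application can then build queries for a table chosen at runtime. The sketch below assumes an already-open Connection, and the class and method names are hypothetical. Table names cannot be bound as PreparedStatement parameters, so the chosen name is checked against the discovered tables before it is placed in the SQL text.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Set;
import java.util.TreeSet;

public class DynamicTables {

    /** Discovers the names of the ordinary tables that exist at runtime. */
    public static Set<String> listTables(Connection connection) throws SQLException {
        Set<String> tables = new TreeSet<>();
        DatabaseMetaData meta = connection.getMetaData();
        // null catalog/schema = no filtering; "%" matches every table name.
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.add(rs.getString("TABLE_NAME"));
            }
        }
        return tables;
    }

    /**
     * Builds a query for a table chosen at runtime. The name is accepted only
     * if it is one of the discovered tables, which guards against SQL injection.
     */
    public static String buildSelectAll(Set<String> knownTables, String table) {
        if (!knownTables.contains(table)) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        return "SELECT * FROM " + table;
    }
}
```

Databases differ in the case they report for unquoted table names, so a real implementation would need to normalise names consistently with the database in use.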

6.2 The Database

The database was designed to allow information to be entered and retrieved by higher-level applications. The design therefore placed minimal constraints on the database with regard to foreign key relationships. Instead, the applications developed to interact with the database infer the relationships between the tables and collate data from it to present to the user. It could be argued that this may result in some data redundancy in areas of the database with arbitrary primary keys. However, this should not occur in the RADICALS_TABLE, because its format follows that of the TICCC.
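The approach of leaving relationships undeclared in the schema and inferring them in the application can be sketched as an in-memory join. In the Java sketch below, RadicalRow and CharacterRow are hypothetical stand-ins for rows read from two tables. The tables share a radical number but have no foreign key between them, and the numbers used in practice would come from the data rather than from this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RadicalCollator {

    // Hypothetical rows from two tables linked only by a shared radical number.
    public record RadicalRow(int radicalNumber, String radical) {}
    public record CharacterRow(String character, int radicalNumber) {}

    /**
     * Infers the radical-to-character relationship in application code,
     * matching rows on the radical number as a foreign key join would.
     * Characters whose radical number matches no radical are left out.
     */
    public static Map<String, List<String>> collate(List<RadicalRow> radicals,
                                                    List<CharacterRow> characters) {
        // Index characters by radical number, preserving input order.
        Map<Integer, List<String>> byNumber = new LinkedHashMap<>();
        for (CharacterRow c : characters) {
            byNumber.computeIfAbsent(c.radicalNumber(), k -> new ArrayList<>()).add(c.character());
        }
        // Attach each radical's characters (an empty list if none were found).
        Map<String, List<String>> collated = new LinkedHashMap<>();
        for (RadicalRow r : radicals) {
            collated.put(r.radical(), byNumber.getOrDefault(r.radicalNumber(), List.of()));
        }
        return collated;
    }
}
```

Orphan characters are silently dropped here. A real application might prefer to report them, because without a foreign key the database itself will not prevent such rows.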

The main objective of the RADICALS_TABLE was to index the Chinese character radicals. This objective was met, as explained in the design. There was also a desire to include some additional information about the radicals, including an explanation of the semantic connotations of each radical and of the characters indexed by it. The quality of this information is subjective. However, because it was gathered from a range of sources with my own input, it could be argued that some value has been added to the original information.

Due to the distributed and sparse nature of official data, dictionaries and textbooks were used as a starting point, supplemented by sources from the internet. The selection of these sources is open to criticism, as it was limited to those I could understand, i.e. those in English. More official or authoritative sources could have been used had I a greater understanding of the Chinese language, or had I enlisted a native Chinese speaker to help me understand this information.

Chapter Summary

This chapter appraised my contribution to the project by evaluating the extent to which the deliverables met their objectives and by suggesting some possible improvements to various areas of the project.

Chapter 7: Conclusions and Future Work

This stage of the Zhongwen Youxi He project aimed to create the foundation for an interactive learning tool to allow non-native English speakers to learn about Chinese characters. This stage concentrated on creating the database to store the data, together with a means of entering this data. This dissertation has documented this process and my individual contribution to meeting these objectives.

This stage of the project can generally be seen as a prototype for future iterations of the Zhongwen Youxi He project. The database design was functionally adequate for the current project, but requirements may change in the future and the design may be refined. The data stored in the database, more specifically in the RADICALS_TABLE, can be seen as one part of a vast repository from which information can be drawn. When the cumulative efforts of the group are combined, the project contains data on the Chinese character radicals, phonetics and syllables. Though no character or component data was stored at this stage, enough information has been contributed and sourced for these areas to be addressed at a later time.

The database editor can be seen as an experimental prototype for future iterations of the higher-level functionality that the Zhongwen Youxi He project will require. The SVG panel from the RadicalViewer will provide a platform-independent viewing window for the characters. The use of SVG also lends itself to programmatic functions, such as searching by path, stroke or shape. These images can also be manipulated programmatically, eventually allowing users with the correct access rights to add and update the images stored in the system. Colours could be used to differentiate the radical or components that make up various characters. Alternatively, radical components could be colour coded by their general connotation, e.g. positive or negative.
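Because SVG is XML, the colour coding suggested above could be done with the standard Java DOM APIs. The sketch below assumes, hypothetically, that each path in a character's SVG carries a data-part attribute naming the component it belongs to (for example "radical"). The attribute name and the class are illustrative and are not part of the RadicalViewer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SvgPartColourer {

    /** Parses SVG text into a namespace-aware DOM document. */
    public static Document parse(String svgText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(svgText)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SVG", e);
        }
    }

    /**
     * Sets the fill colour of every path whose (hypothetical) data-part
     * attribute equals the given part name, and returns how many were changed.
     */
    public static int colourPart(Document svg, String part, String colour) {
        NodeList paths = svg.getElementsByTagNameNS("*", "path"); // paths in any namespace
        int changed = 0;
        for (int i = 0; i < paths.getLength(); i++) {
            Element path = (Element) paths.item(i);
            if (part.equals(path.getAttribute("data-part"))) {
                path.setAttributeNS(null, "fill", colour);
                changed++;
            }
        }
        return changed;
    }

    /** Serialises the document back to SVG text for display or storage. */
    public static String serialise(Document svg) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(svg), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialise SVG", e);
        }
    }
}
```

The same traversal of path elements could serve as a starting point for searching by path or stroke. How strokes are encoded would depend on the SVG source used.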

Future versions of the system may also apply some form of analysis to external data. For example, users might copy and paste characters from an external website, in either Unicode or SVG format. The system could then match each character and retrieve information giving examples of its other uses.
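A first step in analysing pasted text would be to pick out the Chinese characters it contains. The minimal Java sketch below does this with the standard Character.UnicodeScript API. The class name is hypothetical, and it is assumed that the returned code points would then be matched against the Unicode values stored in the database. SVG input would need a separate approach and is not covered.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PastedTextAnalyser {

    /**
     * Returns the code points of all Han (Chinese) characters in the pasted
     * text, in order. Latin letters, digits and punctuation are skipped.
     */
    public static List<Integer> hanCodePoints(String pasted) {
        return pasted.codePoints() // iterates code points, so surrogate pairs stay intact
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

Formatting each code point with String.format("%04X", cp) would give values in the same form as the Unicode field shown in the radical entries.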

Future applications may be able to recognise audio input, matching words spoken by the user to the pinyin entries in the system and retrieving the information accordingly. Audio feedback on pronunciation, or recordings of the correct pronunciation, could also be included. These features would help make future applications more rounded and complete, encompassing a vast number of sources and supporting several forms of interaction.
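Speech recognition itself is outside the scope of this sketch. Assuming some recogniser has already produced pinyin text, matching it against stored entries mainly requires ignoring tone marks and case. The Java sketch below does this with java.text.Normalizer, and the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class PinyinMatcher {

    /** Removes tone marks and case so that, e.g., "Mén" becomes "men". */
    public static String toneless(String pinyin) {
        // NFD splits accented letters into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(pinyin, Normalizer.Form.NFD);
        // \p{M} matches combining marks; note this also strips the diaeresis from ü.
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT).trim();
    }

    /** True if recognised pinyin text matches a stored pinyin entry, ignoring tones. */
    public static boolean matches(String recognised, String storedPinyin) {
        return toneless(recognised).equals(toneless(storedPinyin));
    }
}
```

As written, lü and lu would compare equal. A fuller implementation would map ü to a distinct letter (commonly v, as in many input methods) before stripping the remaining marks.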

Chapter Summary

This chapter discussed the deliverables with regard to their relative success and their place in the wider Zhongwen Youxi He project. It commented on the outcome of the project as the foundation of a language tool and suggested some possible future enhancements. In doing so, it concludes both the dissertation and the Zhongwen Youxi He project.

References

[1]. Chinese Characters: Their Origin, Etymology, History, Classification and Signification; A thorough study from Chinese documents; by Léon Wieger; Publisher: Dover Publications; New issue of 1927 edition (1 Jun 1965); ISBN-10: 0486213218; ISBN-13: 978-0486213217; p. 14.

[2]. Success With Chinese, Level 1, Reading & Writing: A Communicative Approach for Beginners; by De-An Wu Swihart; Paperback: 224 pages; Publisher: Cheng & Tsui; 2nd edition (Jan 2007); Language: English; ISBN-10: 0887276016; ISBN-13: 978-0887276019; pp. 6-8.

[3]. Planning Chinese Characters: Reaction, Evolution or Revolution? (Language Policy); by Shouhui Zhao and Richard B. Baldauf Jr.; Hardcover: 420 pages; Publisher: Springer (27 Dec 2007); Language: English; ISBN-10: 0387485740; ISBN-13: 978-0387485744; pp. 10-15.

[4]. Lexicography: Critical Concepts, Vol. 2; by R. Hartmann; Hardcover: 3 pages; Publisher: Routledge; illustrated edition (29 Sep 2000); Language: English; ISBN-10: 0415253675; ISBN-13: 978-0415253673; pp. 159, 161.

[5]. A History of Chinese Calligraphy; by Yuho Tseng; Hardcover: 446 pages; Publisher: The Chinese University Press; 2nd edition (31 Dec 1998); Language: English; ISBN-10: 9622014267; ISBN-13: 978-9622014268; p. 12.

[6]. Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard; by Richard Gillam; Paperback: 896 pages; Publisher: Addison-Wesley (27 Sep 2002); Language: English; ISBN-10: 0201700522; ISBN-13: 978-0201700527; p. 357.

[7]. Chinese Radicals; Wikipedia; Wikimedia Foundation, Inc.; http://en.wikipedia.org/wiki/Radical_%28Chinese_character%29#Semantic_Elements; Last accessed 10/05/10.

[8]. Comparison of different DB Technologies; by Troels Arvin; http://troels.arvin.dk/db/rdbms/; Last updated 06/04/10; Last accessed 10/05/10.

[9]. Database Design: Know It All (Morgan Kaufmann Know It All); by Toby J. Teorey, Stephen Buxton, Lowell Fryman, Ralf Hartmut Güting, Terry Halpin, Jan L. Harrington, William H. Inmon, Sam S. Lightstone, Jim Melton, Tony Morgan, Thomas P. Nadeau, Bonnie O'Neil, Elizabeth O'Neil, Patrick O'Neil, Markus Schneider, Graeme Simsion, Graham Witt; Hardcover: 368 pages; Publisher: Morgan Kaufmann (12 Nov 2008); Language: English; ISBN-10: 0123746302; ISBN-13: 978-0123746306; pp. 1-10.

[10]. Test Driven Development (The Addison-Wesley Signature Series); Paperback: 240 pages; Publisher: Addison-Wesley (20 Nov 2002); Language: English; ISBN-10: 0321146530; ISBN-13: 978-0321146533.

[11]. MySQL; Wikipedia; Wikimedia Foundation, Inc.; http://en.wikipedia.org/wiki/MySQL; Last accessed 10/05/10.

[12]. MEPRC (2009). Ministry of Education of the People's Republic of China, Table of Indexing Chinese Character Components, 中华人民共和国教育部, 2009.

[13]. File:Chinese character 採 cai3 pick with ROOT colored.svg; Wikipedia.org; http://en.wikipedia.org/wiki/Radical_(Chinese_character); Last accessed 15/07/10.

[14]. Chinese Tools.com; 2008; http://www.chinese-tools.com/tools/sinograms.html?q=%E4%B9%98; Last accessed 01/08/10.

Appendix

TestBase

RadicalTableTest

RepositoryManagerTest

ViewControllerTest
