
  • Zhongwen Youxi He

    A dissertation submitted to The University of Manchester for the degree

    of Master of Science in the Faculty of Engineering and Physical Sciences

    2010

    By

    Melville McDonald

    School of Computer Science

    Table of Contents

    Table of Figures
    List of Abbreviations
    Abstract
    Declaration
    Copyright
    Acknowledgements
    Chapter 1: Introduction
    1.2 Dissertation Overview
    Chapter 2: Background
    2.1 The Chinese System
    2.1.1 Overview
    2.1.2 Strokes
    2.1.3 Radicals
    2.1.4 Components
    2.1.5 Conclusion
    2.2 The Computerised System
    2.2.1 Database Technology
    2.2.2 The Editor
    2.2.3 Supplementary tools
    2.2.4 Reflection
    Chapter Summary
    Chapter 3: Research Methods and Design Considerations
    3.1 Project Overview
    3.2 Project Objectives
    3.3 Project Plan
    Chapter Summary
    Chapter 4: Design
    4.1 Database
    4.1.1 Division of Work
    4.1.2 Entity Relationship Model
    4.2 The Editor
    4.2.1 The Model
    4.2.2 The Controller
    4.2.3 The View
    4.2.4 The AboutEditor and TextParser
    4.2.5 The SVG Panel
    Chapter Summary
    Chapter 5: Implementation and Testing
    5.1 The Tools
    5.2 The Editor
    5.2.1 Radicals Table Test
    5.2.2 Repository Manager Test
    5.2.3 View Controller Test
    5.3 The Visual Elements
    5.3.1 Adding a New Record
    5.3.2 Updating a Record
    5.3.3 Removing a Record
    5.3.4 The SVG Panel
    5.3.5 The AboutEditor and TextParser
    5.3.6 Discovered Issues
    5.4 The Database
    一 yī: One
    丿 piě: Hook or left-falling stroke
    八 bā: Eight
    冫 bīng: Ice
    卜 bǔ: To Divine
    勹 bāo: Wrap
    几 jī: Table
    门 mén: Door
    人 rén: Human being
    工 gōng: Work
    Chapter Summary
    Chapter 6: Evaluation
    6.1 The Editor
    6.2 The Database
    Chapter Summary
    Chapter 7: Conclusions and Future Work
    Chapter Summary
    References
    Appendix
    TestBase
    RadicalTableTest
    RepositoryManagerTest
    ViewControllerTest

    Word count: 13,377

    Table of Figures

    Figure 1: English Alphabet and Chinese character nǐ (you)
    Figure 2: Stroke order [3]
    Figure 3: Top to bottom
    Figure 4: Left to right
    Figure 5: Horizontal strokes before vertical strokes
    Figure 6: Centre strokes before left and right strokes
    Figure 7: Outside then inside
    Figure 8: Fill up before closing
    Figure 9: The Eight Trigrams [5]
    Figure 10: Example of component breakdown [3]
    Figure 11: Example of an entity relationship for Chinese characters and components (aggregation)
    Figure 12: Database entity relationship diagram
    Figure 13: Example of the TICCC
    Figure 14: Example of the TICCC and database fields
    Figure 15: Initial Model View Controller design pattern
    Figure 16: RadicalTable class
    Figure 17: RepositoryManager class
    Figure 18: ViewController class
    Figure 19: Editor Framework
    Figure 20: MainView and TabView class
    Figure 21: RadicalView input form
    Figure 22: RadicalViewer class diagram
    Figure 23: The AboutEditor
    Figure 24: AboutEditor class diagram
    Figure 25: Basic flow of control for AboutEditor
    Figure 26: The TextParser class diagram
    Figure 27: Radical Table Test passed


    Figure 28: Repository Manager Test passed
    Figure 29: View Controller Test passed
    Figure 30: Main View
    Figure 31: RadicalViewer input screen
    Figure 32: Adding a new record
    Figure 33: Save confirmation
    Figure 34: Save confirmation code
    Figure 35: New record added
    Figure 36: No radical to update
    Figure 37: Remove from database confirmation
    Figure 38: Setup of the SVG Canvas
    Figure 39: The SVG Panel and SVG Field
    Figure 40: AboutEditor rawEditor pane input
    Figure 41: The parsed text

    List of Abbreviations

    TICCC – Table of Indexing Chinese Character Components [12]
    DBMS – Database Management System
    JDBC – Java Database Connectivity
    JPA – Java Persistence API
    SVG – Scalable Vector Graphics
    XML – Extensible Mark-up Language
    HTML – Hypertext Mark-up Language
    SQL – Structured Query Language
    GB – Gigabyte
    TB – Terabyte
    API – Application Programming Interface
    SAX – Simple API for XML
    VB.NET – Visual Basic .NET
    URL – Uniform Resource Locator
    MVC – Model-View-Controller architecture

    Abstract

    As an initial stage of the Zhongwen Youxi He project, this part of the project aims to investigate the foundations for building a tool that will allow a native English speaker to learn about Chinese characters. The tool will be built on a database of the characters and their main components, phonetics and radicals, modelled on the official Chinese indexing and classification system found in the literature, in Chinese dictionaries and in the “Table of Indexing Chinese Character Components” (TICCC). The project will also investigate an architecture for storing and indexing the radicals, and ways of inferring semantic links between the radicals, the characters and the components.

    Declaration

    No portion of the work referred to in the dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


    Copyright

    i. Copyright in text of this dissertation rests with the Author. Copies (by

    any process) either in full, or of extracts, may be made only in

    accordance with instructions given by the Author. Details may be

    obtained from the appropriate Graduate Office. This page must form

    part of any such copies made. Further copies (by any process) of copies

    made in accordance with such instructions may not be made without the

    permission (in writing) of the Author.

    ii. The ownership of any intellectual property rights which may be

    described in this thesis is vested in the University of Manchester, subject

    to any prior agreement to the contrary, and may not be made available

    for use by third parties without the written permission of the University,

    which will prescribe the terms and conditions of any such agreement.

    iii. Further information on the conditions under which disclosures and

    exploitation may take place is available from the Head of the School of

    Computer Science.

    Acknowledgements

    I would like to thank my supervisor, Dr Richard Banach, who masterminded this project. His guidance not only helped me to understand the scope of the project and its potential benefits but also helped me to overcome a number of difficulties throughout the course of the project.

    I would also like to thank my family, my mother, father and brothers, who have supported and encouraged me in my pursuit of further education. None of this would have been possible without their help.

    Chapter 1: Introduction

    When a person attempts to learn another language, they must overcome the challenge of learning new words, new grammar systems and, in some cases, a completely different writing system. The Chinese and English writing systems differ drastically, from the direction of reading to the composition of words. Learning the Chinese system is a substantial task for an English speaker, as it originated from a completely different cultural perspective. It is this perspective, together with a person's native language and educational system, that influences how they learn.

    The English and Chinese writing systems are fundamentally different: English employs an alphabet, whereas Chinese uses a logographic system. In an alphabetic system the letters represent phonemes, or sounds, and usually have no meaning individually. In English, an alphabet of 26 letters, meaningless in themselves, can be combined in permitted sequences to form words, many of them multisyllabic, that carry both sound and meaning. A reader can decompose such words to identify their letters, or even spell them from how they sound. The Chinese system, by contrast, is composed of core components which combine to create monosyllabic words. These words can in turn be combined to form new words, though understanding them requires some knowledge or shared basic concepts. Some Chinese characters are pictograms which can be identified by their similarity to real-world objects, but many others are not.

    Figure 1 shows the entire English alphabet of 26 letters. Once memorised, these letters can be combined to form the word “you”, whereas the Chinese character nǐ (你) is a logogram: the reader must already know the character in order to recognise its meaning.

    Chinese also has tones. In English, changes in pitch are used to convey emotional context or, for example, the rising inflection of a question. Compare “You have one.” with “You have one?”. The first is a statement, possibly a reply, usually given with a flat tone; the intonation of the second, with the pitch rising near the end, marks it as a question. (Incidentally, the two together could form a very short conversation.) In Chinese, however, a word's meaning depends not only on the phonemes that give it its sound or pronunciation but also on the tone associated with it. As a result, the tones are formally documented and have their own written tone marks.
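    The radical entries listed in the contents (yī, piě, bā and so on) write pronunciation with tone marks, while keyboard input commonly uses tone numbers instead (ni3 for nǐ). A minimal Java sketch of converting between the two follows. It assumes the usual pinyin placement rule (a or e takes the mark; otherwise the o of ou; otherwise the last vowel), lower-case input, and "v" typed for ü. The class and method names are hypothetical and are not part of the project's editor.

```java
import java.text.Normalizer;

/** Hypothetical helper: converts tone-numbered pinyin (e.g. "ni3") to tone-marked pinyin. */
final class PinyinTones {
    // Unicode combining marks for tones 1-4: macron, acute, caron, grave.
    private static final char[] TONE_MARKS = {'\u0304', '\u0301', '\u030C', '\u0300'};

    static String toToneMarked(String numbered) {
        if (numbered.isEmpty()) return numbered;
        char last = numbered.charAt(numbered.length() - 1);
        if (!Character.isDigit(last)) return numbered;        // no tone number given
        int tone = last - '0';
        // Assumption: "v" is used as a keyboard substitute for ü (e.g. "lv4").
        String syllable = numbered.substring(0, numbered.length() - 1).replace('v', '\u00FC');
        if (tone < 1 || tone > 4) return syllable;             // 0 or 5: neutral tone, unmarked
        int i = markPosition(syllable);
        if (i < 0) return syllable;
        String decomposed = syllable.substring(0, i + 1) + TONE_MARKS[tone - 1] + syllable.substring(i + 1);
        // NFC composes vowel + combining mark into one code point, e.g. i + U+030C -> U+01D0.
        return Normalizer.normalize(decomposed, Normalizer.Form.NFC);
    }

    /** Placement rule: a or e first, then the o of "ou", otherwise the last vowel. */
    private static int markPosition(String s) {
        int a = s.indexOf('a');
        if (a >= 0) return a;
        int e = s.indexOf('e');
        if (e >= 0) return e;
        int ou = s.indexOf("ou");
        if (ou >= 0) return ou;
        for (int i = s.length() - 1; i >= 0; i--) {
            if ("iou\u00FC".indexOf(s.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"yi1", "pie3", "ba1", "ni3", "gong1", "lv4"}) {
            System.out.println(s + " -> " + toToneMarked(s));
        }
    }
}
```

    Normalising to NFC means the stored string uses precomposed characters (for example U+01D0 for ǐ), so equality checks and database lookups do not depend on how the mark was entered.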

    These differences also extend to the indexing systems of the two languages: many Chinese dictionaries are indexed by the radical system, whereas English dictionaries are indexed alphabetically by initial letter. The scarcity of reliable information makes it more difficult for a native English speaker to learn about the Chinese system.

    Figure 1: English Alphabet and Chinese character nǐ (you)

    Recently, however, the Chinese Government published an official Table of Indexing Chinese Character Components (TICCC). The main aim of this stage of the project is to create the foundations of an indexing system for Chinese characters, which can later be extended into a tool that helps English speakers learn about Chinese characters. This system will be built using the officially published character data. In addition, the project aims to investigate ways of inferring semantic links between the components and the radicals.

    1.2 Dissertation Overview

    The structure of the dissertation roughly follows the chronological development of the project:

    Chapter 2 will introduce the background to the project. It will discuss the Chinese writing system and the history and development of the Chinese character radicals, together with research into technologies which may be suitable for the current project.

    Chapter 3 will discuss other project considerations with regard to the main project objectives, including my contribution and the basic project plan.

    Chapter 4 will describe my contribution to the design of the database and the editor.

    Chapter 5 will outline the implementation and testing of the project.

    Chapter 6 will evaluate the success of the project and suggest possible improvements.

    Chapter 7 will conclude the project and suggest possibilities for the future of the Zhongwen Youxi He project.


    Chapter 2: Background

    This chapter will explore the fundamentals of the Chinese writing

    system as well as the history and development of the Chinese character

    radical indexing system. This chapter will also outline some of the

    research conducted with regard to tools which could be used for the

    development of this project.

    2.1 The Chinese System

    2.1.1 Overview

    The modern Chinese writing system uses Han characters (hànzì). This logographic system contains over 50,000 characters. The characters evolved from pictograms and hieroglyphs into the more abstract ideographs and, eventually, the characters known today; over the same period, phonetic elements began to be incorporated into the character structure [3]. The sources of the modern characters are a mixture of new and old characters, composed mostly of pictograms, ideograms, phonetic loans and some phonetic-semantic compounds; there are also some characters whose meaning varies because of regional differences. The characters can be divided into two very broad categories: simple and compound. Simple characters account for only about 4% of the total and, unlike compound characters, cannot be divided further [3].

    In 1956 the government of the People's Republic of China introduced the first draft of the simplified characters, simplifying the traditional characters in an effort to increase literacy nationally. According to the government, a literate person should know 2,000 characters. Reproducing the characters is nevertheless a difficult task, as even the simplified characters contain diverse shapes. It has been claimed that at least five years of formal schooling are needed to achieve literacy [3], though in practice at least six years are needed. Most native English speakers would quite possibly not want to undergo formal schooling, or to wait this long before feeling confident enough to begin using the language.

    2.1.2 Strokes

    The main components of the hànzì characters are the strokes, radicals and components. The smallest structural unit is the stroke (loosely comparable to a letter in the English alphabetic system), which represents a single movement of the brush or pen on the page. The strokes, however, form a more formalised system: a specific set of techniques used recurrently in writing all Chinese characters [3]. These strokes can be divided into eight main categories: horizontal (一), vertical (丨), left-downward (丿), right-downward, dot (、), hook (亅), turning (乛, 乚, 乙) and rising (丶). There are also 30 supplementary strokes [3], but these are only variants of the main categories.
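    If stroke data were stored in the database, the eight main categories map naturally onto a Java enum, with each character's strokes kept as an ordered list so that both the stroke count and the stroke order (Figures 3 to 8) are preserved. The sketch below is an illustration under stated assumptions only: the type names are hypothetical, the supplementary variants are folded into their main category, and the only sample data are 十 (horizontal before vertical, as in Figure 5) and 人.

```java
import java.util.List;

/** Hypothetical model of stroke data; not part of the project's design. */
final class Strokes {
    /** The eight main stroke categories; the supplementary variants are folded into these. */
    enum StrokeType { HORIZONTAL, VERTICAL, LEFT_DOWNWARD, RIGHT_DOWNWARD, DOT, HOOK, TURNING, RISING }

    /** A character with its strokes stored in writing order. */
    record StrokeSequence(String character, List<StrokeType> strokes) {
        StrokeSequence {
            strokes = List.copyOf(strokes); // immutable copy preserves the recorded order
        }

        int strokeCount() {
            return strokes.size();
        }

        /** How many strokes of the given category the character contains. */
        long count(StrokeType type) {
            return strokes.stream().filter(s -> s == type).count();
        }
    }

    public static void main(String[] args) {
        // 十: horizontal stroke before vertical stroke.
        StrokeSequence shi = new StrokeSequence("十", List.of(StrokeType.HORIZONTAL, StrokeType.VERTICAL));
        // 人: left-downward stroke, then right-downward stroke.
        StrokeSequence ren = new StrokeSequence("人", List.of(StrokeType.LEFT_DOWNWARD, StrokeType.RIGHT_DOWNWARD));
        for (StrokeSequence s : List.of(shi, ren)) {
            System.out.println(s.character() + ": " + s.strokeCount() + " strokes " + s.strokes());
        }
    }
}
```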

    Figure 2: Stroke order [3]

    Chinese characters are written in a fixed stroke order, governed by the following rules [2]:

    Figure 3: Top to bottom

    Figure 4: Left to right

    Figure 5: Horizontal strokes before vertical strokes

    Figure 6: Centre strokes before left and right strokes

    Figure 7: Outside then inside

    Figure 8: Fill up before closing

    It is, however, the radicals and components that form the logical units of the hànzì system.

    2.1.3 Radicals

    Radicals are the smallest meaningful units in the hànzì writing system. They are used both as independent simple characters and as parts of more complex characters. In modern Chinese dictionaries, radicals are used as section headers (bùshǒu), with characters indexed according to the radical they most closely match or contain. One of the biggest issues regarding radicals is that there is no formal, exact definition of what they are, since they can be used in so many ways. As a result, there is some disagreement as to their exact role and how they should be used.

    Many Chinese dictionaries use radicals as section headers. This system is said to have been introduced with the Shuōwén Jiězì by Xǔ Shèn around the 2nd century A.D., during the Han Dynasty. Xǔ distinguished between two types of character: wén 文 (pictograms) and zì 字 (characters). The dictionary contained 9352 characters “as distinct entries and 1163 in variant form” [4], organised by their visual components under 540 headings [4]. This method, radical in thought at the time, has since become the most widely used system of organisation, as it made locating characters more methodical and convenient [4].

    Before this, dictionaries were organised differently. The earliest known major character dictionary is the Erya, which is argued to have been compiled between the 8th and 2nd centuries BC [4]. It was not conceived as a dictionary but rather as an encyclopaedia and literary reference; however, “It was the first work to collect arrange and define words in a systematic fashion” [4]. It spans 20 chapters with over 2000 entries, organised into categories such as common terms and kinship terms [4]. This collection, however, was considered too difficult to read, and scholars felt that it was less a reference for consultation than a study text. It was this that led to the creation of the Shuōwén Jiězì.

    The idea of organising characters into categories may, however, be argued to have originated even earlier. Chinese lore claims that the characters were created by a Great Emperor who came upon the idea by observing nature and how each object seemed to fit into a category. He noted that marks left by animals, such as claw prints, could serve as a lasting means of communication, and decided to create the characters to reflect this. The characters were then placed into eight categories called the Eight Trigrams. Though these categories were broad, they had some practical value, as any text indexed by this system would have been searchable to some degree.

    Figure 9: The Eight Trigrams [5]

    The most prominent use of the radical system, however, was the Kāngxī Dictionary (Kāngxī zìdiǎn) of the 18th century A.D. Although it contained nearly 50,000 characters, considerably more than the Shuōwén Jiězì, the dictionary reduced the number of radicals to 214. It indexes characters by radical as well as by number of strokes [6], and contains information about variant forms and pronunciation. Though there have been some obvious changes over the centuries, these 214 radicals are still the basis for all modern radical dictionaries.

    Today numerous radical indexing systems are in use, with different numbers of radicals, sometimes with secondary radicals indexed by stroke count. With more than 80,000 characters in the Chinese language there are many variations [6]; the TICCC, for example, lists 201 main radicals.
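    The bùshǒu arrangement described above (sections headed by radicals and, within each section, characters ordered by the number of strokes remaining once the radical is removed) can be sketched as a sorted map of sections. This is an illustration rather than the project's design: the class and record names are hypothetical, and sections are keyed here by Kangxi radical number (人 is radical 9, 木 is radical 75); a TICCC-based index would substitute its own 201-radical numbering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bùshǒu-style index: radical sections, each ordered by residual strokes. */
final class RadicalIndex {
    /** A character, its indexing radical (number and form) and its strokes outside the radical. */
    record Entry(String character, int radicalNumber, String radical, int residualStrokes) {}

    // TreeMap keeps the sections in radical-number order.
    private final Map<Integer, List<Entry>> sections = new TreeMap<>();

    void add(Entry entry) {
        List<Entry> section = sections.computeIfAbsent(entry.radicalNumber(), k -> new ArrayList<>());
        section.add(entry);
        // Within a section, order characters by the number of strokes left after the radical.
        section.sort(Comparator.comparingInt(Entry::residualStrokes));
    }

    /** The characters filed under one radical, in stroke-count order (empty if none). */
    List<String> lookup(int radicalNumber) {
        List<String> result = new ArrayList<>();
        for (Entry e : sections.getOrDefault(radicalNumber, List.of())) {
            result.add(e.character());
        }
        return result;
    }

    /** Every character in index order: by radical section, then by residual strokes. */
    List<String> allInIndexOrder() {
        List<String> result = new ArrayList<>();
        for (List<Entry> section : sections.values()) {
            for (Entry e : section) {
                result.add(e.character());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RadicalIndex index = new RadicalIndex();
        index.add(new Entry("森", 75, "木", 8));
        index.add(new Entry("林", 75, "木", 4));
        index.add(new Entry("本", 75, "木", 1));
        index.add(new Entry("你", 9, "人", 5));
        index.add(new Entry("他", 9, "人", 3));
        System.out.println(index.lookup(75));        // [本, 林, 森]
        System.out.println(index.allInIndexOrder()); // [他, 你, 本, 林, 森]
    }
}
```

    Sorting a section on every insertion keeps the sketch simple; for a full dictionary it would be cheaper to sort once after loading, or to let the database order the rows.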

    There is some debate as to the extent of the connection between a radical and the characters under it, and whether there is a semantic relationship shared by all of the characters under a given radical. One argument is that the term radical implies semantic links, since it derives from the Latin for “root” [7], and words in languages that draw on Latin, such as English, can be broken up into “root and termination” [1]. Although this does not translate exactly into the Chinese system, the radical should still be considered “the meaning part” [1].

    “采 cǎi ‘to pick, pluck’ is an associative compound comprising two elements or components, a hand 爫 (zhǎo or zhuǎ) picking items from a tree 木 (mù); that is, it is originally a two-part graph” [7].

    However, the phonetic elements of characters also need to be taken into consideration, and there is disagreement over how to categorise radicals and phonetic elements in semantic-phonetic compounds, which have become increasingly common.

    2.1.4 Components

    Components seem to have arisen out of a need to “reconstruct characters into more manoeuvrable units” [3] for the modern age of computing. Characters are divided into logical units based on their shape and make-up. These components are defined purely by their graphic qualities, since in a computer system the semantics and stroke composition are of little consequence. The Information Processing Standard Components for GB 13000.1 Character Set defines 560 basic components [3], while the Specification of Common Modern Chinese Character Components and Component Names lists 514. There are some concerns about how the characters are represented and whether all the necessary characters have been included; however, this standard will probably be reviewed or replaced as the character sets change and the technology for representing them improves.


    Figure 10: Example of component breakdown [3]
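    Because components are defined purely by graphic form, a character can be stored as a small tree: each compound character points to its immediate components, and a character with no recorded breakdown is treated as simple, matching the simple/compound division of Section 2.1.1. The Java sketch below assumes that representation. The class name and the sample breakdowns are illustrative only (采 into 爫 and 木 follows the quotation in Section 2.1.3; 林 and 森 are ordinary examples built from 木) and are not taken from the GB 13000.1 component standard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical component store: character -> immediate graphic components. */
final class ComponentTree {
    // Illustrative data only. Any character missing from the map is treated as simple.
    private static final Map<String, List<String>> COMPONENTS = Map.of(
            "采", List.of("爫", "木"),
            "林", List.of("木", "木"),
            "森", List.of("木", "林"));

    static boolean isSimple(String character) {
        return !COMPONENTS.containsKey(character);
    }

    /** Recursively expands a character into its simple components, in recorded order. */
    static List<String> simpleComponents(String character) {
        List<String> out = new ArrayList<>();
        expand(character, out);
        return out;
    }

    private static void expand(String character, List<String> out) {
        List<String> parts = COMPONENTS.get(character);
        if (parts == null) {
            out.add(character);      // simple: a leaf of the tree
            return;
        }
        for (String part : parts) {
            expand(part, out);       // compound: descend into each component
        }
    }

    public static void main(String[] args) {
        System.out.println(simpleComponents("森")); // [木, 木, 木]
        System.out.println(isSimple("木"));         // true
    }
}
```

    The recursion assumes the breakdown data contains no cycles, which holds for any genuine decomposition tree.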

    2.1.5 Conclusion

    The system should adopt the most logical indexing scheme supported by the available documentation and index the radicals accordingly. The most obvious solution is to follow the TICCC and the most recently published character and component data, as these are the most accurate and up to date; the system's index will be organised to reflect this.

    2.2 The Computerised System

    The system will comprise two broad parts: a database and an editor. The database will store and index characters together with their radical, phonetic and component data, as well as their semantics, and the editor will provide the means of entering data into, and retrieving it from, the database. Software tools that could support this are investigated below so that the most suitable can be chosen.


    2.2.1 Database Technology

    The database will be the main repository for storing and retrieving information for the entire project. It is the most fundamental and one of the most critical areas of the system; as a result, the platform used needs to suit both current requirements and future developments, and the system architecture will need to be robust. The expected platforms are Linux and Windows, and a number of technologies work on either. Databases use query languages to allow their data to be manipulated; the current standard is SQL, and it is implemented by a number of database systems with differing features. This is especially true of the free editions, which often limit functionality or speed in some way. Most of the implementations considered here conform to the SQL-92 standard [8].

    DB2: This is an IBM relational database management system (RDBMS) which runs on Linux and Windows. The free version, IBM DB2 Express-C¹, can be installed to develop database systems for a small number of users; it is limited in the number of processor cores it can use and to 2GB of memory. It has both a command-line and a GUI interface.

    Informix: Another IBM-owned product, this is similar to DB2 in many ways, except that it is only available for 32-bit operating systems and is limited to 4GB of memory.

    H2: This is an open-source RDBMS written in Java. Using JaQu², a Java query language, it can be integrated directly into Java applications, and it boasts an impressive number of features compared with some other database platforms, including in-memory databases, which allow non-persistent data to be manipulated. This is useful for testing and prototyping and may be an interesting feature for this new system. The platform can run in embedded mode (within the same JVM as the application), server mode (as a server database for client applications) and mixed mode (embedded in one application, which also starts a server so that other processes can connect), which gives it some flexibility. There also seem to be a number of tutorials for getting started.

    ¹ About DB2 Express-C; IBM; http://www-01.ibm.com/software/data/db2/express/about.html; Last Accessed 10/05/10.

    ² H2 Database Engine; H2; http://www.h2database.com/html/jaqu.html; Last Accessed 10/05/10.

    MySQL: “The most popular open source database software”³ is available for both Windows and Linux. Although it does not come with an integrated GUI, third-party products and the MySQL Workbench can be downloaded to provide this functionality. Most programming languages can interface with it via the Open Database Connectivity (ODBC) API, or via JDBC in the case of Java. One of the most interesting features of this system, apart from its wide acceptance, is its support for multiple storage engines [11], which allows different engine technologies to be used for individual tables within a database, e.g. InnoDB for an Employees table and MyISAM for a Payroll table.

    Microsoft SQL Server Express Edition: Microsoft's free implementation of SQL⁴ limits the size of each database to 4GB and the hardware to a single CPU with 1GB of RAM. It provides native support for XML data and can manipulate it using XQuery. It provides an easily accessible back end for applications written with the Microsoft .NET Framework.

    ³ About MySQL; Oracle; http://www.mysql.com/about/; Last Accessed 10/05/10.

    ⁴ Microsoft SQL Server 2008 Express; Microsoft Corporation; http://www.microsoft.com/sqlserver/2008/en/us/express.aspx; Last Accessed 10/05/10.


    PostgreSQL: This is released under the PostgreSQL Licence, which in effect makes it free to use and distribute. It is widely used, is available on a range of platforms including Windows and Linux, and can be used for enterprise-sized systems⁵, with an operational limit in excess of 4TB of data.

    Oracle: The free version of Oracle's database limits the size of the database to 4GB and the hardware to a single processor with 1GB of RAM. It is compatible with both Windows and Linux and supports a number of programming languages, but only in a 32-bit environment.

    2.2.2 The Editor

    Databases such as those above can be manipulated through command-line and graphical interfaces. For the purposes of this system, the tools will need to store data on Chinese characters, components, phonetics and radicals. In many cases the integrated tools offered by the DBMS developers may not be flexible enough to handle this character data fully. As a result, an editor will be created to allow the character data in the database to be created, manipulated and viewed. This tool could be written in a number of languages, such as:

    Java: This would be compatible with many of the databases above through JDBC and can be fully integrated with H2. As an object-oriented language with wide acceptance, it is mature and has many useful features in its API, such as the DOM Level 3 and SAX XML processing interfaces. It is also widely available at the university and runs on a number of environments, such as Windows and Linux, meaning that the implementation of any application should need little, if any, change between platforms.

    ⁵ PostgreSQL; PostgreSQL Global Development Group; http://www.postgresql.org/about/; Last Accessed 10/05/10.


    Microsoft .NET Framework: This Microsoft framework supports a family of languages, including VB.NET and C#, which can interface with databases and web applications through a common set of base libraries. The framework is in theory cross-platform, although in practice this is not as straightforward as with a Java implementation. An editor application created with this framework would, however, be very easy to integrate with MS SQL Server databases.

    Web Ontology Language⁶: Otherwise known as OWL, this is a family of languages and tools which allow data to be serialised and semantic conclusions to be drawn from it. The technology is built on axioms and assertions: assertions state facts about the data, and axioms (rules) categorise the data into related groups based on those assertions. Web-based standards such as XML and RDF are used to encode meaning in the data, from which further facts can later be inferred.
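    The relationship between assertions and axioms can be illustrated without any ontology tooling. The plain-Java toy below is not OWL and uses no OWL API: it applies a single hypothetical rule ("anything asserted to have radical r belongs to the theme group for r") to a list of asserted facts. The property name hasRadical and the sample facts are assumptions chosen only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration of a rule classifying asserted facts (not OWL). */
class TinyInference {
    /** An assertion: subject -property-> object, e.g. 好 hasRadical 女. */
    static final class Assertion {
        final String subject, property, object;
        Assertion(String subject, String property, String object) {
            this.subject = subject;
            this.property = property;
            this.object = object;
        }
    }

    /** Rule: every subject asserted to have the given radical joins that radical's group. */
    static Set<String> classify(List<Assertion> facts, String radical) {
        Set<String> members = new LinkedHashSet<>();
        for (Assertion a : facts) {
            if (a.property.equals("hasRadical") && a.object.equals(radical)) {
                members.add(a.subject);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        List<Assertion> facts = List.of(
            new Assertion("好", "hasRadical", "女"),
            new Assertion("妈", "hasRadical", "女"),
            new Assertion("明", "hasRadical", "日"));
        // Members inferred for the (hypothetical) theme group of radical 女
        System.out.println(classify(facts, "女")); // [好, 妈]
    }
}
```

    A real OWL reasoner supports far richer axioms (class hierarchies, property restrictions), but the principle of deriving group membership from stated facts is the same.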

    2.2.3 Supplementary tools

    In some cases the editor language may need an interface to access the database. There are a number of methods to enable this connectivity; the better known of these include:

    JDBC⁷: JDBC provides connectivity for Java applications to different types of databases, including relational databases, allowing SQL-based data access. It is widely used and provides independence from, and flexibility in, the choice of database implementation.

    ⁶ OWL Web Ontology Language Guide; World Wide Web Consortium; http://www.w3.org/TR/owl-guide/; Last Accessed 10/05/10.

    ⁷ Java SE Technologies – Database; Oracle Corporation; http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136101.html


    JPA⁸: The Java Persistence API is a specification for Object-Relational Mapping (ORM): it provides an interface through which an ORM implementation interacts with a database. The specification allows the data source properties of a database to be abstracted into a configuration file (persistence.xml), which, when packaged as part of an application, makes referencing and configuration a more centralised process.

    The JPA specification maps database tables to Java classes called entities, which can then be manipulated as Java objects; a Java application updates the database by interacting with these entities. The specification requires an implementation to manage the interface between the Java application, the entities and the database. Alternative implementations and related persistence frameworks include Hibernate⁹, iBatis¹⁰ and EclipseLink¹¹. The former two are compatible with both Java and .NET Framework languages, Hibernate being written for the Java Virtual Machine and therefore platform independent. EclipseLink is a modified version of Oracle's TopLink JPA implementation, made for Java.

    Graphics: A graphical element in the editor would allow characters to be input and presented. Scalable Vector Graphics (SVG), an XML-based drawing platform¹², allows shapes to be described in XML and drawn, with an API for scripting languages such as ECMAScript. The standard has three types of graphic objects: vector graphic shapes, raster images and text. It is a scalable standard which can be used on both mobile and desktop systems and is supported by most web browsers. An alternative is PostScript, which can describe shapes in a similar way, though SVG has much wider support for fonts using Unicode character encoding. SVG also supports multi-directional text, allowing characters to flow from right to left and from top to bottom, which makes it suitable for the current project. The files created can also be compressed if necessary, which may be needed for database space efficiency. One example of SVG tooling is the Batik¹³ library, which can be used within a Java application to render SVG documents to a Swing-based SVG canvas.

    ⁸ The Java Persistence API - A Simpler Programming Model for Entity Persistence; Oracle Corporation; http://www.oracle.com/technetwork/articles/javaee/jpa-137156.html

    ⁹ Hibernate; JBoss Community; http://www.hibernate.org/; Last Accessed 11/05/10.

    ¹⁰ iBatis Homepage; Apache Foundation; http://ibatis.apache.org/; Last Accessed 11/05/10.

    ¹¹ Introducing EclipseLink; DZone, Inc; http://eclipse.dzone.com/articles/introducing-eclipselink

    ¹² About SVG; SVG Working Group; http://www.w3.org/Graphics/SVG/About; Last Accessed 11/05/10.
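    As a rough illustration of the two SVG properties mentioned above, the sketch below builds a minimal SVG document that draws text with the SVG 1.1 attribute writing-mode="tb" (top-to-bottom text) and then gzip-compresses it with java.util.zip, which is how .svgz files are normally produced. The class and method names and the 100×200 canvas are arbitrary assumptions, and rendering with Batik is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for storing radicals as (optionally compressed) SVG text. */
class SvgRadical {
    /** Builds a minimal SVG document drawing the text top-to-bottom.
     *  Assumes the text contains no XML special characters such as < or &. */
    static String verticalText(String text) {
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"200\">"
             + "<text x=\"50\" y=\"40\" font-size=\"40\" writing-mode=\"tb\">"
             + text + "</text></svg>";
    }

    /** Gzip-compresses an SVG string (the .svgz format) for compact storage. */
    static byte[] compress(String svg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(svg.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete once the gzip stream is closed
    }

    /** Reverses compress(). */
    static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String svg = verticalText("木林森");
        byte[] packed = compress(svg);
        System.out.println(decompress(packed).equals(svg)); // true
    }
}
```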

    2.2.4 Reflection

    There were alternative programming languages which could have been used to create the editor; however, those listed seemed the most useful for this particular project. The Java platform is familiar and cross-platform, and although the Microsoft .NET Framework is less so, both are object-oriented, have been developed to allow interaction with databases, and manage application memory themselves, which simplifies application development.

    OWL is a useful technology for the semantic requirements of the project, and it may be possible to use it in future iterations of the project to enhance the inference of semantics.

    The JPA specification seemed to offer a simple and robust solution for accessing a database from a Java application. Used with an implementation such as EclipseLink, and with MySQL, a well-established open-source DBMS, it seemed the most suitable combination of tools.

    ¹³ Batik SVG Toolkit; The Apache Software Foundation, 2010; http://xmlgraphics.apache.org/batik/; Last Accessed 11/05/10.

    Chapter Summary

    This chapter looked at some of the fundamental concepts of the Chinese writing system. The history of Chinese indexing was also explored, along with the development of the Chinese character radicals. Some of the tools that could aid the progress of the project were also outlined, and the most feasible were selected.


    Chapter 3: Research Methods and Design Considerations

    This chapter examines some of the issues that required consideration in order to ensure effective project development. The project objectives are explained and a basic project plan is outlined, with reference to the relevant sections of the dissertation. The development process is also explained, together with a description of the process used to evaluate the project.

    3.1 Project Overview

    The long-term aim of the Zhongwen Youxi He project is a fully implemented interactive learning tool for native English speakers. The scope of the current project, however, has been narrowed to creating the foundations of the system that allow the main elements of the Chinese written language to be indexed. Conceptually this can be thought of as the system back end: a purely logical and functional foundation with little consideration for the higher-level or front-end systems.

    3.2 Project Objectives

    The project was divided as fairly as possible to allow each member to make a significant contribution. Both members were assigned an area of priority research. The actual division of work, however, was not simply a case of each member taking half of the responsibility, as some areas of the system were shared. Collaboration and group management were necessary to avoid the progress of one member being too dependent on that of the other. The areas of priority were assigned as follows:

    Radicals (Melville)
    Syllables (Melville)
    Phonetics (Xing)

    This division provided the basis for the division of work throughout the project. This report concentrates on the Chinese character radicals and the development of the project in relation to them. This individual section of the project aimed to record information about the radicals and to obtain some of the semantic themes of the characters indexed by these radicals, and thus required a database to:

    Index radicals.
    Describe some basic uses and common semantic character themes.

    The project also required an editor to allow data to be entered into and retrieved from the database. This editor application would therefore need to communicate with the underlying database, querying and updating it as necessary.

    3.3 Project Plan

    The project followed a basic development plan. To allow for variation in the methods of research and the execution of each group member's work, regular meetings helped to ensure coordination and a common direction. The main project stages below are milestones, each of which sets a foundation for the next stage.

    Database technology decision: MySQL was judged the most suitable technology for this project, as described in the background chapter. This was agreed by both team members.

    Editor development platform: Java was chosen as the programming language due to its availability on the University machines and the fact that both members of the group had previously developed applications with it. The editor application would need to use the database connectivity provided by the Java API. The JPA was chosen for the reasons given in Chapter 2.

    Database design: This project involves three main areas of development: the database, the editor and the translation link between them. To ensure a robust system design as well as flexible and efficient development, the design and development process would have to be iterative. The database had to be designed and implemented before the editor, as it is arguably the most important feature for current and future Zhongwen Youxi He developments.

    Typically, database design begins with system and requirements analysis, followed by three main modelling stages: conceptual, logical and physical [9]. As this was a new system, there was no existing architecture to analyse, so the requirements were gathered and engineered to ensure an understanding of how the proposed system would operate. The process included modelling use cases and brainstorming.

    Conceptual Modelling: This stage modelled the conceptual schema based on the system analysis. Entity Relationship modelling was used to identify the main system objects and how they should relate to each other, such as the relationship between characters and components or between characters and radicals. Each member paid particular attention to the design of their own area of responsibility.


    Logical Modelling: This stage mapped the conceptual model onto the database technology, producing a logical schema, such as the tables that stored the radicals and their relationships to each other. Normalisation could then be applied.

    Physical Modelling: This stage “involves the selection of indexes (access methods), partitioning and clustering of data” [9]; it was, however, outside the scope of the current project.

    Editor design: On completion of the database design, the initial editor design could begin.

    Implementation: Once the database architecture had been designed, it would be created, and the editor could then be implemented. In the meantime, the radicals table of the database could be populated with the data that would form the basis of the Zhongwen Youxi He project. This is discussed further in Chapter 5.

    Testing: The editor would be tested modularly as functionality was added, treating each method as a functional sub-unit, to ensure that previously added functionality was unaffected. This approach borrowed from the test-driven development methodology [10]. The tests would cover simple functionality such as branching and looping, but also whether the functions operate correctly with the translation tools, i.e. whether the data in the database is correctly affected. Each tested unit would then be integrated into the overall system in accordance with the group management policy, which is discussed mainly in the group report.

    Figure 11: Example of an entity relationship for Chinese characters and components (aggregation) [diagram: Character and Components entities linked by a “Composed of” relationship, with cardinalities N and 1]

    The application could then be tested as a whole. The system could be populated with test data in usability tests covering:

    entry of radicals
    retrieval of radicals
    input of details
    retrieval of details
    deletion of data
    editing of data

    Evaluation of System: Usability tests would determine the extent to which the system satisfied the project objectives and identify areas of the system requiring improvement. The design processes, the research methods, the database and some of the conclusions drawn from the research would also be evaluated.

    Chapter Summary

    This chapter outlined some considerations that affected the course of project development. The scope and objectives of the project were established, the research and development processes were described, and a project plan relating to the Chinese character radicals was explained.


    Chapter 4: Design

    The project was divided into database and editor layers. The individual projects were sub-objectives of these layers, with both members of the group attempting to share responsibility and contribute equally. Because areas of implementation overlapped, however, this was difficult, but continuous attempts were made to ensure an effective distribution of the workload. This section describes my contribution to the design of various parts of the project.

    4.1 Database

    4.1.1 Division of Work

    As mentioned in the group report, much of the database design was reviewed by the project supervisor to ensure that its foundations were solid, since it will be the basis of future Zhongwen Youxi He projects. Once the basic architecture had been established, alterations were made to the areas of the database concerning the Chinese character radicals, as this was my area of responsibility. My changes were then combined with those of the group.


    4.1.2 Entity Relationship Model

    Figure 12: Database entity relationship diagram

    The database architecture was as shown in Figure 12. The RADICAL_TABLE was designed to store data about the Chinese character radicals as indexed by the TICCC [13].


    Figure 13: Example of the TICCC

    Though initially confusing, many of the fields in the RADICAL_TABLE map directly from the information in the TICCC. Figure 13 shows a small part of the TICCC; the following worked example illustrates the mapping.

    The radical 匚 has number 8 in the list, and there are two unnumbered radicals underneath it. These radicals have the stroke count that would normally place them here, but they are indexed under another, main form (the written style). The number of that main form is denoted by square brackets [] and corresponds to the RadicalLeadSequence_Id. In this example, [9] refers to 卜, the radical at number 9, where the variant is shown in round brackets (). Its position within the round brackets gives the RadicalSubsidSequence_Id, which would be 1 as it is the first entry in the brackets, and the RadicalLastSubSequence_Id is the total number of radicals in the round brackets, here 1, since there is only one. The RadicalTICCCSubsid_Id applies to the sub-indexed radicals (those without a number) and records their position in the sub-index: in this example [9] occurs at position 1 and [22] at position 2.

    Figure 14: Example of the TICCC and database fields

    For the purposes of the database, however, the Radical_Id corresponds to the strict order of occurrence of the radical in the TICCC, whether sub-indexed or not. In the above example, radical [9] would therefore be given Radical_Id 9 and [22] Radical_Id 10 (as they occur strictly after radical 8), so the radical listed beneath them with the number 9 (卜) would have Radical_Id 11. The RadicalLeadSequence_Id is adjusted accordingly, so for radical [9] it becomes RadicalLeadSequence_Id = 11.

    Radical_id = 8 (匚): RadicalLeadSequence_Id = 0, RadicalTICCCSubsid_Id = 0, RadicalSubsidSequence_Id = 0, RadicalLastSubSequence_Id = 0
    Radical_id = 9 (sub-indexed [9]): RadicalLeadSequence_Id = 11, RadicalTICCCSubsid_Id = 1, RadicalSubsidSequence_Id = 1, RadicalLastSubSequence_Id = 1
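    The numbering rule can be expressed as a two-pass procedure: first give every entry, numbered or sub-indexed, the next Radical_Id in strict order of appearance, then translate each sub-indexed entry's bracketed TICCC number into the Radical_Id of its main form (the RadicalLeadSequence_Id). The sketch below is a hypothetical reconstruction that only reproduces the worked example (匚, [9], [22], 卜, starting at Radical_Id 8); the Entry class, its field names and the placeholder form strings "[9]" and "[22]" are assumptions, the round-bracket fields are not computed, and a reference to a radical outside the excerpt, such as [22], is left at 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the strict Radical_Id numbering described above. */
class RadicalNumbering {
    static final class Entry {
        final String form;
        final int ticccNumber;   // number printed in the TICCC, or 0 if sub-indexed
        final int leadNumber;    // the [n] reference of a sub-indexed entry, else 0
        int radicalId;           // assigned in strict order of appearance
        int leadSequenceId;      // Radical_Id of the main form (0 if none or unknown)
        int ticccSubsidId;       // position within the sub-index (0 if not sub-indexed)

        Entry(String form, int ticccNumber, int leadNumber) {
            this.form = form;
            this.ticccNumber = ticccNumber;
            this.leadNumber = leadNumber;
        }
    }

    static void assignIds(List<Entry> entries, int firstId) {
        Map<Integer, Integer> idByTicccNumber = new HashMap<>();
        int nextId = firstId;
        int subPosition = 0;
        // Pass 1: strict sequential ids and positions within each sub-index
        for (Entry e : entries) {
            e.radicalId = nextId++;
            if (e.ticccNumber > 0) {
                idByTicccNumber.put(e.ticccNumber, e.radicalId);
                subPosition = 0;            // a numbered radical starts a new sub-index
            } else {
                e.ticccSubsidId = ++subPosition;
            }
        }
        // Pass 2: resolve [n] references now that every main form has an id
        for (Entry e : entries) {
            if (e.leadNumber > 0) {
                e.leadSequenceId = idByTicccNumber.getOrDefault(e.leadNumber, 0);
            }
        }
    }

    public static void main(String[] args) {
        List<Entry> excerpt = List.of(
            new Entry("匚", 8, 0),
            new Entry("[9]", 0, 9),
            new Entry("[22]", 0, 22),
            new Entry("卜", 9, 0));
        assignIds(excerpt, 8);
        for (Entry e : excerpt) {
            System.out.println(e.form + " id=" + e.radicalId
                + " lead=" + e.leadSequenceId + " subsid=" + e.ticccSubsidId);
        }
        // 匚 id=8 lead=0 subsid=0
        // [9] id=9 lead=11 subsid=1
        // [22] id=10 lead=0 subsid=2
        // 卜 id=11 lead=0 subsid=0
    }
}
```

    The second pass is needed because a bracketed reference such as [22] can point forward to a radical that has not yet been numbered.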


    Radical_Name stores the pinyin name of the radical.
    Stroke_Number is the number of strokes, as indexed in the TICCC.
    Unicode stores the Unicode representation where possible.
    If_Char records whether the radical is also a character.
    About holds notes about the radical and some examples of characters indexed by it. This field represents one of the main features of this stage of the project and is described further in the implementation section.
    SVG is an SVG representation of the radical.

    4.2 The Editor

    With the database architecture finalised, work could begin on the editor. The Java platform and the JPA specification were to be used to interact with the database, and the model-view-controller (MVC) architecture was deemed the most appropriate design.

    This pattern would allow a modular development process, with database operations separated from the view by the controller. It would also allow development and testing to proceed separately: the model would be built and tested first, followed by the controller, with the view placed on top and receiving data from the model via the controller.

    Figure 15: Initial Model View Controller design pattern [diagram: Model, View and Controller, with the labels “user input”, “update model”, “query model” and “data presentation”]


    To enable members to develop the editor independently, it was decided that the core shared elements would be implemented first. This framework would provide the basic functionality for interaction between the database, the controller and the view. Both members of the team could then develop further functionality to satisfy their own project objectives. I took responsibility for the design of this framework.

    4.2.1 The Model

    The JPA enables Java applications to interact with relational database tables through entity objects. Each entity object corresponds to a row of the corresponding table, and an entity class is required for every table accessed by the application; this resulted in an entity class for every table in the database. Entities require a no-argument constructor, but a constructor taking a Radical_Id was also added to allow new “empty” radicals to be added to the database, with the details filled in later. Entities also require a field for each column in the table, together with get and set methods for each of these fields. In the case of the RADICAL_TABLE, the entity was constructed as in Figure 16.


    RadicalTable
    Fields: Radical_id (int), RadicalLeadSequence_Id (int), RadicalTICCCSubsid_Id (int), RadicalSubsidSequence_Id (int), RadicalLastSubSequence_Id (int), Radical_Name (String), Stroke_Number (int), Unicode (String), If_Char (boolean), About (String), SVG (String)
    Constructors: RadicalTable(), RadicalTable(Radical_id)
    Methods: setRadical_id / getRadical_id(): int; setRadicalLeadSequence_Id / getRadicalLeadSequence_Id(): int; setRadicalTICCCSubsid_Id / getRadicalTICCCSubsid_Id(): int; setRadicalSubsidSequence_Id / getRadicalSubsidSequence_Id(): int; setRadicalLastSubSequence_Id / getRadicalLastSubSequence_Id(): int; setRadical_Name / getRadical_Name(): String; setStroke_Number / getStroke_Number(): int; setUnicode / getUnicode(): String; setIf_Char / getIf_Char(): boolean; setAbout / getAbout(): String; setSVG / getSVG(): String

    Figure 16: RadicalTable class
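    A plain-Java sketch of the entity in Figure 16 follows. It deliberately omits the javax.persistence annotations (in a JPA project the class would also carry @Entity, with @Id on the identifier) so that it compiles without a JPA library. The camel-case field names are an assumption, with comments giving the corresponding RADICAL_TABLE columns, and only a representative subset of the get and set methods is written out.

```java
/**
 * Sketch of the RadicalTable entity without JPA annotations.
 * Field comments give the RADICAL_TABLE column each field mirrors.
 */
class RadicalTable {
    private int radicalId;                 // Radical_id
    private int radicalLeadSequenceId;     // RadicalLeadSequence_Id
    private int radicalTicccSubsidId;      // RadicalTICCCSubsid_Id
    private int radicalSubsidSequenceId;   // RadicalSubsidSequence_Id
    private int radicalLastSubSequenceId;  // RadicalLastSubSequence_Id
    private String radicalName;            // Radical_Name (pinyin)
    private int strokeNumber;              // Stroke_Number
    private String unicode;                // Unicode
    private boolean ifChar;                // If_Char
    private String about;                  // About
    private String svg;                    // SVG

    /** No-argument constructor, as required for JPA entities. */
    public RadicalTable() { }

    /** Creates an "empty" radical whose details are filled in later. */
    public RadicalTable(int radicalId) { this.radicalId = radicalId; }

    public int getRadicalId() { return radicalId; }
    public void setRadicalId(int radicalId) { this.radicalId = radicalId; }

    public String getRadicalName() { return radicalName; }
    public void setRadicalName(String radicalName) { this.radicalName = radicalName; }

    public int getStrokeNumber() { return strokeNumber; }
    public void setStrokeNumber(int strokeNumber) { this.strokeNumber = strokeNumber; }

    public boolean getIfChar() { return ifChar; }
    public void setIfChar(boolean ifChar) { this.ifChar = ifChar; }

    public String getAbout() { return about; }
    public void setAbout(String about) { this.about = about; }

    // The remaining fields (sequence ids, unicode, svg) follow the same get/set pattern.
}
```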


    The JPA interacts with each of these entity classes via an Entity Manager, which is created by an Entity Manager Factory provided by the JPA implementation environment. The Entity Manager queries the database and creates and updates the entities as appropriate, providing row-level access to the database. One of its basic functions is querying the database for entities (Query here is javax.persistence.Query):

    Query q = entityManager.createQuery("SELECT r FROM RadicalTable r");

    This query retrieves all records from the radical table. The matching RadicalTable objects can then be obtained from the query q via q.getResultList(), which returns a list. This list can be thought of as the rows of the RADICAL_TABLE, with each column mapped to a field of the entity; e.g. the first RadicalTable object in the list would likely have Radical_id = 1. These objects could then be updated or deleted, and new objects created and saved to the database. The framework would therefore require the functionality of the Entity Manager.

    The framework would require the functionality of the Entity Manager.

    4.2.2 The Controller

    The Entity Manager functionality was encapsulated in a controller class, the Repository Manager (Figure 17).


    The remove and save methods are both used to alter records in the database; they use either the update() or the persist() method to save the state of the entities to the database. The life cycle of an Entity Manager is usually the length of a transaction, e.g. saving an object to, or retrieving an object from, the database. During this life cycle the entities are managed, i.e. connected to the database through the Entity Manager, so changes made to them are saved when the transaction commits, and new objects can be stored with persist(). Once the entities have been retrieved by the application, the Entity Manager is closed, severing the connection between these entities and the database. Such entities are known as detached; the application can change them, but for the changes to be saved a new Entity Manager must re-establish the connection with the database and merge the modified object with the database version (the JPA merge() operation). The update method provided this function.

    To mediate between the Repository Manager and the view, another controller was designed. The View Controller initiates database access through the Repository Manager and processes the results appropriately for the view.

    RepositoryManager
    Fields: entityManager (EntityManager)
    Methods: getRadicals(): List, getPhonetics(): List, getPhoneticsbyId(int), getRadicalsbyID(int), removeRadical(RadicalTable), removePhonetic(PhoneticTable), saveRadical(RadicalTable), savePhonetic(PhoneticTable), persist(Object), update(Object)

    Figure 17: RepositoryManager class
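    The distinction between managed and detached entities can be imitated without a database. The in-memory stand-in below is an illustrative assumption, not the JPA API or the project's RepositoryManager: getById() hands out copies, so edits to a returned object (a "detached" entity) have no effect until update() copies its state back, which mirrors what EntityManager.merge() does. The class, its nested Radical type and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a repository, imitating detached entities. */
class InMemoryRadicalRepository {
    /** Minimal row type: only an id and a pinyin name. */
    static final class Radical {
        final int id;
        String name;
        Radical(int id, String name) { this.id = id; this.name = name; }
        Radical copy() { return new Radical(id, name); }
    }

    private final Map<Integer, Radical> store = new LinkedHashMap<>();

    /** Like persist(): stores a new row, rejecting an id that already exists. */
    void persist(Radical r) {
        if (store.containsKey(r.id)) {
            throw new IllegalArgumentException("duplicate id " + r.id);
        }
        store.put(r.id, r.copy());
    }

    /** Returns a detached copy; changing it does not touch the store. */
    Radical getById(int id) {
        Radical r = store.get(id);
        return r == null ? null : r.copy();
    }

    List<Radical> getAll() {
        List<Radical> result = new ArrayList<>();
        for (Radical r : store.values()) {
            result.add(r.copy());
        }
        return result;
    }

    /** Like merge(): copies a detached object's state back (inserting it if absent). */
    void update(Radical detached) {
        store.put(detached.id, detached.copy());
    }

    void remove(int id) { store.remove(id); }

    public static void main(String[] args) {
        InMemoryRadicalRepository repo = new InMemoryRadicalRepository();
        repo.persist(new Radical(1, "yi"));
        Radical detached = repo.getById(1);
        detached.name = "yī";                        // edit the detached copy only
        System.out.println(repo.getById(1).name);    // yi (store unchanged)
        repo.update(detached);                       // merge the change back
        System.out.println(repo.getById(1).name);    // yī
    }
}
```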


    The methods were intended to be as general as possible, so that later additions to the Repository Manager could be used here. getAll() switches between types of entity (given by currentEntity) and queries the repositoryManager for all records of that type (table) in the database; getById() does the same but calls the corresponding getById method. The save and remove methods correspond to those in the Repository Manager. The asTable() method takes the currentList variable and returns it in a suitable format for a JTable in the view.

    These two classes formed the controller in the model-view-controller architecture, allowing the view to interact with the model and thus the database. This was the basic editor framework.

    ViewController
    Fields: repositoryManager (RepositoryManager), currentList (List), currentEntity (String)
    Methods: getAll(): List, getbyId(int): List, removeObject(Object), saveObject(Object), updateObject(Object), updateView(), asTable(): Object[][]

    Figure 18: ViewController class
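    The asTable() step amounts to turning a list of entity objects into the Object[][] rows that a JTable accepts, one array per record and one element per column. The sketch below assumes a cut-down row type with three columns and a hypothetical column order; the real ViewController worked on whichever entity list currentEntity selected.

```java
import java.util.List;

/** Hypothetical sketch of ViewController.asTable() for a reduced radical row. */
class TableAdapter {
    static final class RadicalRow {
        final int radicalId;
        final String radicalName;
        final int strokeNumber;
        RadicalRow(int radicalId, String radicalName, int strokeNumber) {
            this.radicalId = radicalId;
            this.radicalName = radicalName;
            this.strokeNumber = strokeNumber;
        }
    }

    /** Column headers, in the same order as the values produced by asTable(). */
    static final String[] COLUMNS = {"Radical_id", "Radical_Name", "Stroke_Number"};

    /** Converts the current list into the Object[][] form accepted by JTable. */
    static Object[][] asTable(List<RadicalRow> currentList) {
        Object[][] rows = new Object[currentList.size()][];
        for (int i = 0; i < rows.length; i++) {
            RadicalRow r = currentList.get(i);
            rows[i] = new Object[] {r.radicalId, r.radicalName, r.strokeNumber};
        }
        return rows;
    }

    public static void main(String[] args) {
        Object[][] data = asTable(List.of(new RadicalRow(1, "yī", 1), new RadicalRow(8, "fāng", 2)));
        // new javax.swing.JTable(data, COLUMNS) would display these two rows
        System.out.println(data.length + " rows, first name = " + data[0][1]); // 2 rows, first name = yī
    }
}
```

    Keeping this conversion in the controller means the view only ever sees plain arrays and never needs to know about entity classes.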


    4.2.3 The View

    The view is the user interface for the system. The user uses the Main

    View Panel to select the table in the database they wish to read or edit,

    and these user requests are sent to the controller. The Main View,

    which is discussed further in the group report, consists of a main panel

    with a JTabbedPane TabView as shown in the figure below.

    The TabView consists of a JTable, tableDBView, which gives a grid view of the records from the selected table as passed from the ViewController. It also contains a switchable set of JPanels (radicalPanel, phoneticPanel, syllablePanel and phoneticSyllable) managed by the CardLayout layout manager. I was responsible for the design of the RadicalPanel.

    Figure 19: Editor Framework (diagram: the View, Controller, Repository Manager and Model, linked by "user input and view update", "query", "update" and "query and update model" flows)

    Figure 20: MainView and TabView class. MainView fields: tabView (TabView), comboTables (JComboBox), textIDfield (JTextField); methods: search(), update(). TabView fields: tableDBView (JTable), panelEditRecordView (JPanel), radicalPanel (RadicalViewer), phoneticPanel (PhoneticViewer), syllablePanel (SyllableViewer), phoneticSyllable (PhonSyllViewer), viewController (ViewController); methods: TabView(ViewController), updateViews(), setPanel().
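    A minimal sketch of how a TabView-style container might switch between input forms with CardLayout, assuming each form is registered under the name of its database table. The card names radicals_table and syllable_table and the plain JPanels are placeholders for illustration; in the project the cards are the RadicalViewer and the other viewer panels.

```java
import java.awt.CardLayout;
import javax.swing.JPanel;

// Sketch of a TabView-style edit area whose input forms are swapped with CardLayout.
class TabViewSketch {
    private final CardLayout cards = new CardLayout();
    final JPanel panelEditRecordView = new JPanel(cards);
    final JPanel radicalPanel = new JPanel();   // placeholder for the RadicalViewer form
    final JPanel syllablePanel = new JPanel();  // placeholder for another table's form

    TabViewSketch() {
        // Each form is registered under a card name; the first card added is shown initially.
        panelEditRecordView.add(radicalPanel, "radicals_table");
        panelEditRecordView.add(syllablePanel, "syllable_table");
    }

    // Shows the input form for the table chosen in the Main View (e.g. from comboTables).
    void setPanel(String tableName) {
        cards.show(panelEditRecordView, tableName);
    }
}
```

    CardLayout keeps every form alive and only toggles visibility, so switching tables does not rebuild the forms.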

    Figure 21 shows the basic design of the RadicalViewer input form, which is opened from the Main View user input screen. All fields are JTextFields except for the if_char JComboBox, which offers a choice of true or false; the SVG Panel, which is an SVG Canvas; and the About field, which is a JTextArea that cannot be edited directly. The About field has a click listener that opens a JEditorPane in which the text can be edited; this is described in the next section. The SVG Panel is an SVG Canvas that uses the Batik API to render SVG within the form. The save, clear and cancel buttons are self-explanatory.

    Figure 21: RadicalViewer input form (fields: Radical_Id, RadicalLeadSequence_Id, RadicalTICCCSubsid_Id, RadicalSubsidSequence_Id, RadicalLastSubSequence_Id, If_char, unicode, Stroke number, SVG Text, SVG Panel (an SVG Canvas) and About Field (a non-editable JTextArea); buttons: save, clear, cancel)


    The RadicalViewer class is passed a ViewController, which it uses to retrieve and update values in the database. The populateForm() method is called when a row in the TabView's tableDBView is selected: the row number is passed to the ViewController, which gets the corresponding RadicalTable object from its currentList field, and the fields on the form are then populated with the data from that object. The isDuplicate() method is a check performed when saving changes to the database: if a user attempts to save a new record, the ViewController's currentList is checked to ensure that no radical with the same radical_Id already exists. Before performing their action, the methods saveRadical() and removeFromDB() both check either that a record is active (selected in the tableDBView) or that no record with the same radical_Id already exists.
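    The duplicate check and save guard described above might look like the following minimal sketch. It assumes a simplified RadicalRow type holding only the radical_Id (a hypothetical stand-in for the RadicalTable entity) and relies on JTable's convention that getSelectedRow() returns -1 when no row is selected.

```java
import java.util.List;

// Hypothetical stand-in for the RadicalTable entity: only the key matters here.
class RadicalRow {
    private final int radicalId;

    RadicalRow(int radicalId) {
        this.radicalId = radicalId;
    }

    int getRadicalId() {
        return radicalId;
    }
}

class DuplicateCheck {
    // True if any record already in currentList uses the given radical_Id.
    static boolean isDuplicate(List<RadicalRow> currentList, int radicalId) {
        return currentList.stream().anyMatch(r -> r.getRadicalId() == radicalId);
    }

    // A save may proceed when a row is selected (an update of that record)
    // or, for a new record, when its radical_Id is not already in use.
    static boolean canSave(int selectedRow, List<RadicalRow> currentList, int radicalId) {
        return selectedRow >= 0 || !isDuplicate(currentList, radicalId);
    }
}
```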

    4.2.4 The AboutEditor and TextParser

    The About field on the RadicalViewer panel is not populated directly from the database; instead, the value is passed to the TextParser class, which allows formatting to be applied to the text before it is output to the screen. For the purposes of testing, the About field displays only plain text, but when clicked it opens an editor, the AboutEditor.

    Figure 22: RadicalViewer class diagram. Field: myController (ViewController). Methods: RadicalViewer(ViewController), populateForm(int), isDuplicate(int): boolean, clearAll(), saveRadical(RadicalTable), removeFromDB(RadicalTable).


    The AboutEditor appears when the About field in the RadicalViewer panel is clicked. The TextParser is passed to the editor to allow the text to be parsed. The showEditor() method initialises the view by switching the switchPanel to the rawEditor, filled with the contents of myParser (the data from the About field in the database). A JEditorPane can display text with HTML mark-up as a web browser would, which allows basic HTML formatting in the pane. The rawEditor is a plain JEditorPane that displays the text in raw form, whereas the formatEditor shows the result of the HTML mark-up. Both are contained in the switchPanel, which uses a CardLayout: pressing the Test button switches from the rawEditor to the formatEditor, and the Reset button switches the view back to the rawEditor. Saving updates the TextParser with the contents of the rawEditor pane, and the database is then updated when the save button is pressed in the RadicalViewer.

    Figure 23: The AboutEditor (a JEditorPane inside a JScrollPane, with Test, Reset, Save and Cancel buttons)

    Figure 24: AboutEditor class diagram. Fields: rawEditor (JEditorPane), formatEditor (JEditorPane), switchPanel (JPanel), myParser (TextParser). Methods: AboutEditor(TextParser), showAbout(), saveAndClose(), test(), closeDiscard(), reset().
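    A minimal sketch of the rawEditor/formatEditor arrangement, assuming the parser is supplied as a simple String-to-String function rather than the project's TextParser class; the card names "raw" and "format" are illustrative. A JEditorPane created with the "text/html" content type renders mark-up, while one left at its default plain-text content type shows the text exactly as typed.

```java
import java.awt.CardLayout;
import java.util.function.UnaryOperator;
import javax.swing.JEditorPane;
import javax.swing.JPanel;

// Sketch of the AboutEditor's two panes sharing one CardLayout panel.
class AboutEditorSketch {
    private final CardLayout cards = new CardLayout();
    final JPanel switchPanel = new JPanel(cards);
    final JEditorPane rawEditor = new JEditorPane();                   // plain text: tags shown as typed
    final JEditorPane formatEditor = new JEditorPane("text/html", ""); // renders the HTML mark-up
    private final UnaryOperator<String> parser;                        // stands in for TextParser.parse

    AboutEditorSketch(UnaryOperator<String> parser) {
        this.parser = parser;
        formatEditor.setEditable(false);
        switchPanel.add(rawEditor, "raw");       // first card added is shown initially
        switchPanel.add(formatEditor, "format");
    }

    // Loads the stored About text into the raw pane and shows it.
    void showAbout(String rawText) {
        rawEditor.setText(rawText);
        cards.show(switchPanel, "raw");
    }

    // "Test": parse the raw text and show the formatted result.
    void test() {
        formatEditor.setText(parser.apply(rawEditor.getText()));
        cards.show(switchPanel, "format");
    }

    // "Reset": return to the raw pane without changing its contents.
    void reset() {
        cards.show(switchPanel, "raw");
    }
}
```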

    Figure 25 shows the basic flow of control between the RadicalViewer, the AboutEditor and the TextParser. When the RadicalViewer populates its fields from the tableDBView JTable, the ViewController passes the About information directly to the TextParser, which displays the data in the About field. When the About field is clicked, the AboutEditor opens, initialised with the raw contents of the About field. To parse the text, the contents of the rawEditor are sent to the TextParser, which returns them formatted with HTML for the HTML-aware formatEditor. If the editor is then saved, the contents of the rawEditor are stored in the TextParser, to be written to the database once the RadicalViewer save button is pressed; otherwise no changes are made. In both cases the contents of the TextParser are passed to the ViewController.

    Figure 25: Basic flow of control for AboutEditor (RadicalViewer, ViewController, TextParser and AboutEditor, linked by "About field text", "display in About field", "get format or raw text", "save raw text" and "save to DB")

    The TextParser class parses user-defined tags into HTML. In this editor a number of user tags were defined, corresponding to different types of database information:

    keyword marker

    character unicode

    radical id

    syllable id

    phonetic id

    phonetic-syllable id

    for grammar

    for English idioms

    for Chinese idioms

    colour: col = r,o,g,b

    bold face

    italic

    sans serif

    teletype / courier

    example id

    These codes would be converted into various forms of HTML when parsed by the TextParser. For the purposes of this project only a few styles were selected:


    The parser was designed to handle overlapping tags, and these tags were deemed sufficient for the purposes of testing. The framework was, however, set up so that the remaining tags could be added or altered at a later date. The special character sequences "\n", "\r" and "\r\n" were mapped to the HTML line-break tag <br> to maintain page formatting.
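    Assuming the line-break target is the HTML <br> tag, the mapping of the three line-ending sequences could be sketched as follows. The order of replacement matters: "\r\n" has to be handled first, otherwise it would be turned into two separate breaks. The class and method names are hypothetical.

```java
// Hypothetical helper: maps every line ending to an HTML line break for the JEditorPane.
class LineBreaks {
    static String toHtmlBreaks(String text) {
        return text.replace("\r\n", "<br>")  // Windows endings first, so they become one break
                   .replace("\r", "<br>")    // then old Mac-style endings
                   .replace("\n", "<br>");   // then Unix endings
    }
}
```

    For example, "a\r\nb" becomes "a<br>b" rather than "a<br><br>b".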

    The TextParser setRawText() method sets the rawText member variable, storing the value of the About field passed from the ViewController. getRawText() returns the value passed back when the RadicalViewer is closed. The parse() method takes a String from the rawEditor pane and returns a formatted HTML string. The algorithm iterates over the start tags, splitting the string and applying formatting to the text between each start tag and its end tag by replacing the tags with the corresponding HTML.
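    The parse() algorithm described above can be sketched as follows. This is a minimal illustration, not the project's TextParser: the tag names [b], [i] and [blue] and their HTML equivalents are hypothetical, since the project's actual tags are not reproduced in this section. Parallel arrays pair each start tag with its end tag and HTML, and each pair is applied in turn to the whole string.

```java
// Sketch of a tag-to-HTML parser in the style described above.
class TextParserSketch {
    // Hypothetical user tags and their HTML equivalents (index i of each array belongs together).
    private final String[] startTags = {"[b]", "[i]", "[blue]"};
    private final String[] endTags = {"[/b]", "[/i]", "[/blue]"};
    private final String[] htmlStart = {"<b>", "<i>", "<font color=\"blue\">"};
    private final String[] htmlEnd = {"</b>", "</i>", "</font>"};

    private String rawText = "";

    void setRawText(String text) {
        rawText = text;
    }

    String getRawText() {
        return rawText;
    }

    // Applies each tag pair in turn to the whole string, so different tags can nest.
    String parse(String text) {
        String result = text;
        for (int i = 0; i < startTags.length; i++) {
            result = applyTag(result, startTags[i], endTags[i], htmlStart[i], htmlEnd[i]);
        }
        return result;
    }

    // Splits the text at each start tag and wraps the span up to the next end tag in HTML.
    // An unmatched start tag, and everything after it, is left unchanged.
    private static String applyTag(String text, String start, String end, String open, String close) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int s = text.indexOf(start, pos);
            if (s < 0) {
                break;
            }
            int e = text.indexOf(end, s + start.length());
            if (e < 0) {
                break;
            }
            out.append(text, pos, s)
               .append(open)
               .append(text, s + start.length(), e)
               .append(close);
            pos = e + end.length();
        }
        out.append(text.substring(pos));
        return out.toString();
    }
}
```

    Different tags nest correctly because each pass sees the output of the previous one; a tag nested inside another copy of itself is not handled by this simple matching.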

    Figure 26: The TextParser class diagram. Fields: rawText (String), startTags (String[]), endTags (String[]). Methods: setRawText(), getRawText(): String, parse(String): String.


    4.2.5 The SVG Panel

    Using the Batik library, the contents of the SVG field would be rendered to this panel as well as passed to the SVG JTextArea, so that changes made to the SVG text could be viewed in the SVG Panel. Batik's SVG canvas (JSVGCanvas) is a Swing component that provides this functionality; the SVG text is passed to the canvas for display via a URL or a Reader.

    Chapter Summary

    This chapter documented the design of the system. The database design was outlined, and a justification for the design of the RADICALS_TABLE was provided. The design of the editor framework and its model-view-controller pattern was described, together with the additions specific to the RADICALS_TABLE.


    Chapter 5: Implementation and Testing

    Once the database architecture and editor framework had been designed, implementation and testing of the core functionality began, allowing both members of the group to undertake their own individual tasks. These included the design and implementation of their own input forms, such as the RadicalViewer, and adding data to the database. As I assumed responsibility for implementing the framework, its implementation is described here together with the testing results. Some of the tools used in developing the system are also mentioned.

    5.1 The Tools

    The project was implemented using a MySQL database, which was engineered with the help of MySQL Workbench¹⁴, a tool that enabled modelling of the entity-relationship model. The Eclipse IDE¹⁵ was used to develop the editor, together with the JPA specification, the EclipseLink JPA implementation and the Batik library, which was used to render SVG documents to the screen. JUnit, which is included with the Eclipse IDE, was used as the testing framework for some of the core functionality; as this was the most important part of the project, testing it helped ensure that the implementation was robust.

    5.2 The Editor

    The implementation of the core framework was carried out using methods borrowed from Agile development. The JUnit framework was used to create test classes that tested the main functionality, or any functionality that could feasibly be tested without user input. The test setup for the core functionality followed the same pattern for each class, with the main test cases being an array of three RadicalsTable objects:

    RadicalsTable r1 = new RadicalsTable(1);
    RadicalsTable r2 = new RadicalsTable(2, "4E57", "yi1", true);
    RadicalsTable r3 = new RadicalsTable(3);

    The objects have radical_id 1 to 3 respectively, with radical 2 testing the four-argument constructor. Each test class had a number of methods, each testing a different property. The full test code can be found in the Appendix.

    14 Welcome to MySQL Workbench 5.2; MySQL Workbench Team; MySQL Inc; http://wb.mysql.com/; Last Accessed 10/05/10.

    15 Explore the Eclipse Universe 2010; The Eclipse Foundation 2010; http://www.eclipse.org/; Last Accessed 12/06/10.

    5.2.1 Radicals Table Test

    This class tested the RadicalTable entity using an EntityManager:

    generateRadicalTables(): Creates the radical test cases above. To test the set methods for each of the fields, radical 3 has its fields set at runtime.

    insertAndRetrieve(): Creates an EntityManager, attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

    findAndDelete(): Retrieves the entity objects, deletes them and tests that they have been deleted.


    Figure 27: Radical Table Test passed

    5.2.2 Repository Manager Test

    This class tested database access, using the Repository Manager to create an EntityManager and manipulate the entities.

    insertAndRetrieve(): Creates a Repository Manager, attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

    findAndUpdate(): Tests whether changes made to a detached entity object can be merged with the database.

    updateById(): Retrieves an entity with a given id from the database, sets its radical_Name field to a new value and tests that the change has been merged with the database.

    findAndDelete(): Retrieves the entity objects, deletes them and tests that they have been deleted.


    Figure 28: Repository Manager Test passed

    5.2.3 View Controller Test

    This class tests some of the functionality of the ViewController, mainly its interaction with the Repository Manager, but none of the input from the view.

    insertAndRetrieve(): Creates a Repository Manager, attempts to persist the entity objects to the database, then tests whether they can be correctly retrieved.

    insertAndRetrieveById(): Creates a Repository Manager, attempts to persist the entity objects to the database, then retrieves an object by its radical_Id.

    asArrayTest(): Tests the asArray() and getTableData() methods used to pass the entity properties to the JTable in the TabView. The Repository Manager retrieves all the records, the getTableData() submethod places them into an array, and this array is passed to the asArray() method. The lengths of the resulting arrays are checked.

    testUpdate(): Tests whether changes made to a detached entity object can be merged with the database.

    testRemove(): Attempts to delete entities and tests that they have been deleted.

    Figure 29: View Controller Test passed

    5.3 The Visual Elements

    Testing the visual elements of the editor could not feasibly be carried out with JUnit tests and datasets, so manual tests were run on the working application, testing the usability and functionality of the system simultaneously. The implementation of the Main View was the starting point for the visual tests and would be used as part of the integration tests conducted by the group.


    The editor framework presents the user with a Main View screen.

    Figure 30: Main View

    With radicals_table selected, the view switches to allow editing of the

    database table.


    Figure 31: RadicalViewer input screen

    5.3.1 Adding a New Record

    Adding a new record with radical id 20:

    Figure 32: Adding a new record


    When the save button is clicked, the RadicalViewer checks whether a record is selected in the table view. In the case of a new record (no selection), the ViewController initiates a check to ensure that no radical with that radical_id is already stored; if none is, the user is presented with a confirmation dialogue:

    Figure 33: Save confirmation

    If No is clicked, the view returns to the input screen as before; if Yes is clicked, the save process is initiated. The code below illustrates this functionality.

    Figure 34: Save confirmation code
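    The confirm-then-save logic can be sketched as follows. This is a hypothetical restructuring rather than the code in Figure 34: the dialog call is kept in its own method so that the decision logic can be exercised without a display, and the message and title strings are illustrative.

```java
import java.awt.Component;
import java.util.function.BooleanSupplier;
import javax.swing.JOptionPane;

class SaveFlow {
    // Shows a Yes/No dialog; returns true only when the user chooses "Yes".
    static boolean confirmWithDialog(Component parent) {
        int choice = JOptionPane.showConfirmDialog(parent,
                "Save this radical to the database?", "Confirm save",
                JOptionPane.YES_NO_OPTION);
        return choice == JOptionPane.YES_OPTION;
    }

    // Runs the save action only if confirmed; returns whether anything was saved.
    static boolean saveIfConfirmed(BooleanSupplier confirm, Runnable save) {
        if (!confirm.getAsBoolean()) {
            return false; // "No": return to the input screen unchanged
        }
        save.run();
        return true;
    }
}
```

    In the form itself the call would read saveIfConfirmed(() -> confirmWithDialog(this), this::doSave), where doSave is a hypothetical method holding the actual save code.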

    The save process involves ensuring that the data from the fields is in the correct format for the database. This is handled by a number of try-catch clauses, with incorrectly formatted text reported to the user in an error box. Once the record is saved, the database can be queried to refresh the view; the refreshed view can be seen in the following figure.
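    The per-field try-catch handling can be sketched as below. This is a minimal illustration under stated assumptions: the helper name parseField and the error-message wording are hypothetical, and the unicode field is assumed to hold a hexadecimal code point such as 4E00. Rather than showing a dialog directly, the sketch collects messages so that they can be displayed together.

```java
import java.util.List;

class FieldParsing {
    // Parses one form field as an integer in the given radix (10 for ids, 16 for unicode).
    // On failure, records a message naming the field and returns null so the save can be aborted.
    static Integer parseField(String fieldName, String text, int radix, List<String> errors) {
        try {
            return Integer.valueOf(text.trim(), radix);
        } catch (NumberFormatException ex) {
            errors.add(fieldName + ": \"" + text + "\" is not a valid number");
            return null;
        }
    }
}
```

    If the error list is not empty, the form could show it with JOptionPane.showMessageDialog(parent, String.join("\n", errors), "Invalid input", JOptionPane.ERROR_MESSAGE) and skip the save.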


    Figure 35: New record added

    5.3.2 Updating a record

    Updating a record follows similar logic to adding a new record; however, the RadicalViewer class checks that a record is selected in the JTable, as it is assumed that the user will select the record there before attempting to edit it. The populateForm() method is called to fill the RadicalViewer with data from the RadicalTable object in the ViewController. If no record is selected, the user is presented with the following message box:

    Figure 36: No radical to update


    Otherwise the logic follows that for adding a new record, with the ViewController updating the database rather than saving a new entry.

    5.3.3 Removing a Record

    To remove a record, the same check is made to ensure that a record is selected in the table view; this ensures that there is no attempt to delete a record that does not exist. If no record is selected, the user is presented with a dialogue informing them of this, as above; otherwise they are greeted with a confirmation dialogue:

    Figure 37: Remove from database confirmation

    5.3.4 The SVG Panel

    The SVG field from the RADICALS_TABLE is passed simultaneously to

    both the SVG Panel and SVG field when the populateForm() method is

    called.

    Figure 38: Setup of the SVG Canvas


    For test purposes, the SVG for the character 採 (cǎi) [13] was used; its complexity and colour tested the effectiveness of the SVG Canvas, as shown in the figure below.

    Figure 39: The SVG Panel and SVG Field

    5.3.5 The AboutEditor and TextParser

    The AboutEditor and TextParser were tested by entering data into the About field of a new record, testing both parsing and saving. The following input was used:

    myTest

    the colour blue

    the colour red

    test the italics

    normal

    This is a nested example of colour with

    some italics


    Clicking the About field on the RadicalViewer form causes the AboutEditor to appear. The text added to the rawEditor pane is shown in the figure below.

    Figure 40: AboutEditor rawEditor pane input

    The TextParser parses the input when the Test button is pressed, as Figure 41 shows. Note that, for the purposes of this test, the Unicode tag was given a yellow font.

    Figure 41: The parsed text


    5.3.6 Discovered Issues

    Issues were found throughout testing; some, although missed by the initial test setup, were discovered through usability testing. Some of these are outlined here.

    SVG Canvas glitch: On some occasions when saving an image to the database, the canvas would glitch, covering the entire RadicalViewer input form. An attempted remedy was to place the SVG canvas inside a JScrollPane; however, this only made the enlarged image scrollable.

    MainView refresh: When changes are committed to the database, the user needs to query the database again, using the search button, to refresh the view. For usability, the view should refresh automatically when changes are made, keeping the user aware of the current state of the table.
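    One possible remedy, not implemented in the project, is for the controller to notify the view whenever it commits a change, in the style of the observer pattern. The listener interface and method names below are hypothetical; in Swing the listener would re-run the query and rebuild the JTable's model on the event dispatch thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback that the view registers to hear about committed changes.
interface TableChangeListener {
    void tableChanged(String tableName);
}

// Sketch of a controller that announces every committed change to its listeners.
class NotifyingController {
    private final List<TableChangeListener> listeners = new ArrayList<>();

    void addTableChangeListener(TableChangeListener listener) {
        listeners.add(listener);
    }

    // Stand-in for saveObject/updateObject/removeObject: after the repository
    // call (omitted in this sketch), every listener is told which table changed.
    void commitChange(String tableName) {
        fireTableChanged(tableName);
    }

    private void fireTableChanged(String tableName) {
        for (TableChangeListener listener : listeners) {
            listener.tableChanged(tableName);
        }
    }
}
```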

    5.4 The Database

    The database was populated with data about the Chinese radicals as

    described in the design section. Further information about the radicals

    and any semantic relationships between the radical and the characters

    indexed by it was researched and entered. This section will illustrate a

    few examples of this information.


    一 yī : One

    Unicode: 4E00

    This radical, also known as "one horizontal", lies at the root of many numbers, including:

    二 èr (two)

    三 sān (three)

    五 wǔ (five)

    卅 sà (thirty)

    It has many meanings associated with measurement and the identification of uniqueness, such as 每 'each' or 各 'every time', and 统一 'unitary' or 'unified'.

    The characters indexed by this radical include:

    屯 tún, which is used in terms such as 屯兵 tún bīng (to garrison or station troops) or 屯聚 tún jù (to amass or assemble).

    再 zài, which is used in terms such as 再版 zài bǎn (second edition) or 再次 zài cì (again), e.g. 再一次 (one more time).

    丿 piě: Hook or left-falling stroke

    Unicode: 4E3F

    Other forms: 乀 fú, 乁 yí

    This radical indexes many different characters. It is used by onomatopoeic characters such as 乓 pāng and 乒 pīng, which are both used to describe the sound of a firearm being discharged.

    The character 卵 luǎn (egg) is used to describe parts of an egg, such as 卵黃 luǎn huáng (egg yolk), or objects that share some physical similarity with an egg, such as 卵形 luǎn xíng (oval-shaped) and 卵石 luǎn shí (a pebble, named for its rounded shape).

    Another character, 乘 chéng, is used in words that express the idea of increasing one's lot and taking advantage of opportunity. The word 乘法 chéng fǎ, multiplication [14], can be seen as expressing a core theme of gaining more or increasing something. The same theme appears in phrases that use the character, such as 乘机 chéng jī, to "jump at a chance", or 乘势 chéng shì, "to strike while the iron is hot".

    八 bā: Eight

    Unicode: 516B

    Other forms: 丷

    The number eight seems to hold some symbolic importance within the Chinese language, as in 八德 bā dé (the eight virtues), 八方 bā fāng (the eight points of the compass), 八仙 bā xiān (the eight immortals) and 八卦 bā guà (the eight trigrams). The Chinese horoscope also consists of eight characters. This radical indexes numerous characters with an equally wide range of meanings.

    The character 公 gōng is used in a number of words and phrases concerning the public and being in the open, including 公安 gōng ān, which is used in the term for public safety, and 公私 gōng sī, used to describe public and private interests. In a similar vein, the character 分 fēn can be seen in words such as 分散 fēn sàn (disperse) or 分给 fēn gěi (distribute). It is also used in words that describe an acquaintance or a person of whom one is aware: 生分 shēng fen.


    兴 xīng appears in phrases meaning to begin or set up a task, such as 兴办 xīng bàn, with the intention of achieving some goal, as in 兴兵 xīng bīng: starting a war.

    冫 bīng: Ice

    Unicode: 51AB

    This radical ap