6.UAP Moji Project Reportmoji.mit.edu/report.pdf · 2016-05-05 · 6.UAP Moji Project Report Aaron...

6.UAP Moji Project Report Aaron T. Nojima

ABSTRACT

Moji is a language-learning application that helps students practice the correct form of writing characters. It differentiates itself from handwritten character recognition systems by providing feedback to its users. Students practice writing characters based on samples submitted by teachers. All evaluation methods extract positional, directional, and size information from stroke points and grade based on assigned weights and component scores. Moji provides appropriate grading and feedback and allows teachers and students to work with any character in the Unicode character set.

1. INTRODUCTION

Many spoken languages of the world are accompanied by a written language. Take Japanese, for example. In addition to the logographic characters adopted from Chinese, known as kanji, Japanese has two sets of phonetic characters called kana (specifically hiragana and katakana). In order to advance their studies, language learners need a solid understanding of the written language in which vocabulary and grammar are usually defined or explained. Moji is a web-based application that grades user-drawn characters against an ideally drawn character and provides helpful feedback that aids the learner. This report focuses on the Japanese hiragana character set, but the approach can be applied to other written character sets. Our target audience is people who want to learn a new language but have limited or no experience with it.

2. RELATED WORK

The most common applications for handwritten character evaluation are recognition systems. These systems take user-drawn characters and return the best-matched character label along with a score indicating the confidence or probability of that label. Many of these systems already do an excellent job of recognizing user-drawn characters. However, the purpose of Moji is different. For this application, we assume that we already know which character the user is attempting to draw. Our concern is the matching score and the feedback returned to the user. Despite the difference in scope between Moji and handwriting recognition systems, we reference such systems for the representation, extraction, and grading of character input data.

2.1 Data Representation

Character recognition systems can be categorized as either online recognition or offline recognition, which deal with different types of input data. Offline recognition uses a 2-dimensional photo from a camera or a digital image file as input. Online recognition uses patterns recorded from mouse, finger, or pen stroke movements [1]. Since the goal of this application is to grade and provide feedback on the proper handwriting of a given character, we are more interested in the kind of input used by online recognition systems. Offline recognition input would only be useful for evaluating the visual appearance of a character, not its construction.

2.2 Information Extraction

We can classify data-extraction techniques as structural or unstructural. Structural techniques extract features from the input using computer vision algorithms and image processing. Unstructural techniques analyze the stroke geometry of the input data [1]. Again, since we are more interested in stroke processing than in image processing, we take an unstructural approach.

Some of the key features to extract from character strokes are direction and orientation. Zhu and Nakagawa mention the use of histograms of normalized stroke directions. After applying a Gaussian blur kernel to the entire image, they categorize stroke directions into eight general directions across small grids of the image [1]. Our application similarly extracts directional features, but for simplicity we calculate the derivative at each stroke point rather than generating cell histograms.

A recent study by Das and Banerjee proposes an algorithm that uses image processing along with stroke analysis and template matching to classify and score hand-drawn characters against character templates. Their algorithm uses several interesting features: the center of gravity of the character, the distance of conjunction or intersection points of strokes relative to the center of gravity, and the minimum radius from the center of gravity that encloses the entire character [2]. This algorithm brings up several points worth noting. First, we need to ensure that any extracted information used in our evaluation method is relative to the overall character. Second, the use of conjunction points proves successful in the classification of characters and therefore should be a useful metric in the evaluation of a character. However, based on their results, some of the stroke shapes were irregular. We use both directional information and unique relative point locations in our evaluation algorithm.

2.3 Feedback

There are several existing tools that people use to study the characters of a written language. Users of Quizlet and other online flashcard web services place the character on one side of the flashcard and the equivalent in a language they understand on the other (e.g., け and ‘ke’).

Unfortunately, flashcards do not provide any interaction or memorable action. Some language learning applications exist in the form of video games. While the experience may seem more enjoyable and interactive, the user does nothing more than translate the character. For this application, we have the user physically write the character and return useful feedback and necessary guidance.

3. WORKFLOW DESIGN

In this section we describe the design of the workflow. There are two groups of users: students and teachers. The following diagrams show how each group interacts with the application. Each step is described further in the technical approach section of this report.

Figure 3.1: Student user workflow diagram

Figure 3.2: Teacher user workflow diagram

4. TECHNICAL APPROACH

This section describes the technologies, algorithms, and methods used to implement the Moji web application, as well as the grading and feedback concepts.

4.1 Character Data Model

The character is the only data model stored in the database. Each character contains four fields: a Unicode value, a Unicode block, a Unicode description, and a list of points for each stroke. Unicode is an industry standard for the encoding and representation of characters used in the majority of the world's written languages. This standard contains over 120,000 characters from 129 scripts (groups of characters) [3].

The Unicode value is a unique identifier (primary key) for every character. Most assigned characters have hexadecimal values between 0x0000 and 0x2FFFF (0 to 196607 in decimal), although not every identifier in that range is currently defined and the full Unicode code space extends to 0x10FFFF [3]. For example, the letter ‘A’ has a Unicode value of 0x0041 and the kana ‘あ’ has a Unicode value of 0x3042 [3]. In our data model, the Unicode value is represented as a positive integer field with a range from 0 to 2,147,483,647.

The Unicode block is an alternative name for the script to which the character is assigned, and most often it is the name of a written language. Blocks impose some modularity on the Unicode standard. For example, the letter ‘A’ is in the Unicode block ‘BASIC LATIN’ and the kana ‘あ’ is in the Unicode block ‘HIRAGANA’ [3]. In our data model, the Unicode block is represented as a variable character field with a maximum length of 50 characters.

The Unicode description is a human-readable name for each character in the Unicode standard. Since hexadecimal values are harder to remember, Unicode descriptions make it easier for students to select which character to practice writing. For example, the letter ‘A’ has a Unicode description of ‘LATIN CAPITAL LETTER A’ and the kana ‘あ’ has a Unicode description of ‘HIRAGANA LETTER A’ [3]. In our data model, the Unicode description is represented as a variable character field with a maximum length of 50 characters.

Conceptually, the stroke points field is a list of strokes, each of which is simply a list of coordinates in the format [x,y]. These points represent teacher-submitted drawings of the character, similar to the input of the online recognition methods mentioned earlier. Since some databases cannot store a list object directly, the stroke points field is first converted into a string using JavaScript Object Notation (JSON). The points are stored in the database as a long text field with the default value set to ‘[ ]’ (an empty list). Using the same JSON format, we can convert the stored string back into a list for stroke analysis.
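The round trip between the nested stroke list and the stored text field can be sketched as follows. This is a minimal illustration in Python, not Moji's actual code: the dictionary keys and the sample coordinates are hypothetical and only mirror the four fields described above.

```python
import json

# Hypothetical character record mirroring the four fields described above.
character = {
    "unicode_value": 0x3042,                     # primary key, a positive integer
    "unicode_block": "HIRAGANA",
    "unicode_description": "HIRAGANA LETTER A",
    "stroke_points": "[]",                       # default: an empty list as JSON text
}

# Two short made-up strokes, each a list of [x, y] points.
strokes = [[[10, 20], [60, 20]], [[35, 5], [35, 80], [20, 95]]]

# Serialize the nested list into a string for the long text field.
character["stroke_points"] = json.dumps(strokes)

# Parse the stored string back into nested lists for stroke analysis.
restored = json.loads(character["stroke_points"])
```

Because JSON preserves list order, the stroke order and the point order within each stroke survive the round trip, which matters for the order-sensitive features extracted later.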

Figure 4.1.1: Character data design model fields and attributes

4.2 Drawing Pad

The drawing pad is the main tool of user interaction for both students and teachers. It allows users to write characters for submission. Since Moji is a web application, we use the HTML5 Canvas element. Along with mouse and touch event listeners, the canvas element lets us record mouse, finger, or pen movement throughout the construction of the character.

Figure 4.2.1: A stroke represented as a list of line segments (connecting adjacent points)

When the mouse, finger, or pen is pressed onto the user’s screen, we create an empty list and start appending points recorded from the user’s movement. The frequency at which we capture points is high enough to render lines, curves, and loops. When the drawing tool is no longer touching the screen, we append the list of captured points (the ‘stroke list’) to another list (the ‘character list’).

To account for user mistakes, the drawing pad has two helpful features: undo and clear. The clear function removes all stroke lists from the character list and clears the HTML canvas. The undo function removes the latest stroke list from the character list, clears the canvas, and redraws the points of the updated character list.

To avoid excessive data points, the drawing pad detects redundant points and removes them from the corresponding stroke list. A redundant point is either a point that has the same coordinates as the previous point (this occurs when the user draws very slowly) or a point that falls on the line through the previous point and the following point. In either case, the drawing pad removes the data point without losing any necessary information.
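The point reduction described above can be sketched in Python as below. The function name is hypothetical, and the sketch adds one assumption that the text does not state: a collinear point is only dropped when the pen keeps moving in the same direction, so that a stroke that doubles back on itself keeps its turning point.

```python
def remove_redundant_points(stroke):
    """Drop repeated points and points lying on the line through their neighbours."""
    # Pass 1: remove points with the same coordinates as the previous point.
    deduped = []
    for point in stroke:
        if not deduped or point != deduped[-1]:
            deduped.append(point)
    if len(deduped) < 3:
        return deduped

    # Pass 2: remove interior points that are collinear with the last kept
    # point and the following point.
    reduced = [deduped[0]]
    for i in range(1, len(deduped) - 1):
        (ax, ay), (bx, by), (cx, cy) = reduced[-1], deduped[i], deduped[i + 1]
        cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)  # 0 when collinear
        dot = (bx - ax) * (cx - bx) + (by - ay) * (cy - by)    # > 0 when same direction
        if cross != 0 or dot <= 0:  # assumption: keep reversal points
            reduced.append(deduped[i])
    reduced.append(deduped[-1])
    return reduced


sample = [[0, 0], [0, 0], [1, 1], [2, 2], [3, 3], [3, 5], [3, 5], [3, 9], [0, 9]]
reduced_sample = remove_redundant_points(sample)
```

With integer canvas coordinates the cross product is exact, so no tolerance is needed; with fractional coordinates a small tolerance on `cross` would likely be required.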

Figure 4.2.2: Stroke point reduction rendering the same stroke

4.3 Information Extraction

Given a list of strokes, each containing a list of points, the next challenge is to extract as much information as possible. The key features we focus on are stroke count, character and stroke ranges, stroke endpoints, stroke derivatives, and intersection points.

Given the character data (a list of strokes, each a list of points), the stroke count is the number of stroke lists. For each stroke, the range is calculated by finding the minimum and maximum x and y coordinates among all the stroke points. For the entire character, the range is calculated by finding the minimum and maximum x and y coordinates among all the stroke ranges (each range is represented as [xmin, ymin, xmax, ymax]).
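As a minimal sketch of the stroke count and range calculations, assuming the nested-list representation described above (the function names and sample coordinates are illustrative, not taken from Moji's source):

```python
def stroke_range(stroke):
    """Bounding box of one stroke as [xmin, ymin, xmax, ymax]."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return [min(xs), min(ys), max(xs), max(ys)]


def character_range(strokes):
    """Bounding box of the whole character, built from the stroke ranges."""
    ranges = [stroke_range(stroke) for stroke in strokes]
    return [
        min(r[0] for r in ranges),
        min(r[1] for r in ranges),
        max(r[2] for r in ranges),
        max(r[3] for r in ranges),
    ]


strokes = [[[10, 20], [60, 25]], [[35, 5], [30, 80], [20, 95]]]
stroke_count = len(strokes)  # number of stroke lists
```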

Figure 4.3.1: Character and stroke ranges boxed in red and blue respectively

For the endpoints, we iterate through each stroke and observe its points. The first and last points of each stroke represent where the user pressed and released the drawing tool, respectively, to start and end the stroke.

Figure 4.3.2: Stroke endpoints circled in red

For the derivatives, we calculate the average slope at each stroke point. To do this we need two slopes. Since each point, with the exception of the start point and endpoint, is connected to two line segments, we calculate the slopes of the preceding and following line segments and take the mean of both.
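One way to sketch the per-point derivative is shown below. Note the assumptions: instead of averaging scalar slopes, which are undefined for vertical segments, this sketch averages the unit direction vectors (dx, dy) of the two adjacent segments, and it gives the start point and endpoint the direction of their single adjacent segment. The function names are hypothetical.

```python
import math


def unit(dx, dy):
    """Scale (dx, dy) to length 1; a zero-length segment stays (0, 0)."""
    length = math.hypot(dx, dy)
    return (dx / length, dy / length) if length else (0.0, 0.0)


def stroke_derivatives(stroke):
    """Return one (dx, dy) direction per point of a stroke."""
    n = len(stroke)
    if n < 2:
        return [(0.0, 0.0)] * n
    # Unit direction of each line segment between adjacent points.
    segments = [
        unit(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:])
    ]
    derivatives = [segments[0]]  # start point: only a following segment
    for before, after in zip(segments, segments[1:]):
        # Interior point: mean of the preceding and following directions.
        derivatives.append(((before[0] + after[0]) / 2, (before[1] + after[1]) / 2))
    derivatives.append(segments[-1])  # endpoint: only a preceding segment
    return derivatives


corner = stroke_derivatives([[0, 0], [10, 0], [10, 10]])
```

Keeping dx and dy separate also fits the later shape comparison, which feeds both values to an arctangent function.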

Figure 4.3.3: Stroke derivatives gradient (normalized and perpendicular) marked in blue

For all types of conjunction points, we must first examine the algorithm for calculating the intersection point of two line segments. Suppose we have two line segments, one defined by points P0 and P1 (note the order, which follows the stroke direction) and the other defined by points P2 and P3, as shown below:

Figure 4.3.4: Intersection of two line segments

The following variables represent the change in x and y for line segments 1 and 2:

$$S_{1X} = P_{1X} - P_{0X}; \quad S_{1Y} = P_{1Y} - P_{0Y}; \quad S_{2X} = P_{3X} - P_{2X}; \quad S_{2Y} = P_{3Y} - P_{2Y}$$

The following equations describe a point of intersection between the lines extended from line segments 1 and 2. The first describes the x-coordinate while the second describes the y-coordinate for some unknown variables t and s:

$$P_{0X} + t \, S_{1X} = P_{2X} + s \, S_{2X}; \quad P_{0Y} + t \, S_{1Y} = P_{2Y} + s \, S_{2Y}$$

We need to solve for t and s. If s and t exist, then the two lines intersect. Solving for both s and t we get:

$$s = \frac{S_{1X} (P_{0Y} - P_{2Y}) - S_{1Y} (P_{0X} - P_{2X})}{S_{1X} S_{2Y} - S_{2X} S_{1Y}}$$

$$t = \frac{S_{2X} (P_{0Y} - P_{2Y}) - S_{2Y} (P_{0X} - P_{2X})}{S_{1X} S_{2Y} - S_{2X} S_{1Y}}$$

However, we are only concerned with whether the line segments intersect. If s and t are both between 0 and 1 inclusive, then the line segments intersect each other at the following point:

$$I_X = P_{0X} + t \, S_{1X} = P_{2X} + s \, S_{2X}; \quad I_Y = P_{0Y} + t \, S_{1Y} = P_{2Y} + s \, S_{2Y}$$

Once we have the intersection point, we can determine whether the intersection is a cross-stroke intersection or a self-intersection by looking at which stroke the segments belong to. We must keep track of the type of intersection for later.
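The segment intersection test and the cross-stroke versus self-intersection labelling can be sketched as follows. The function names are hypothetical. Two details are assumptions beyond the text: a zero denominator (parallel or collinear segments) is treated as no intersection, and adjacent segments of the same stroke are skipped because they always share an endpoint.

```python
def segment_intersection(p0, p1, p2, p3):
    """Return the intersection point of segments p0-p1 and p2-p3, or None."""
    s1x, s1y = p1[0] - p0[0], p1[1] - p0[1]
    s2x, s2y = p3[0] - p2[0], p3[1] - p2[1]
    denom = s1x * s2y - s2x * s1y
    if denom == 0:
        return None  # assumption: parallel or collinear segments do not count
    s = (s1x * (p0[1] - p2[1]) - s1y * (p0[0] - p2[0])) / denom
    t = (s2x * (p0[1] - p2[1]) - s2y * (p0[0] - p2[0])) / denom
    if 0 <= s <= 1 and 0 <= t <= 1:
        return (p0[0] + t * s1x, p0[1] + t * s1y)
    return None


def find_intersections(strokes):
    """Return a list of ('cross' | 'self', point) for every segment pair that meets."""
    segments = [
        (si, i, stroke[i], stroke[i + 1])
        for si, stroke in enumerate(strokes)
        for i in range(len(stroke) - 1)
    ]
    found = []
    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            sa, ia, p0, p1 = segments[a]
            sb, ib, p2, p3 = segments[b]
            if sa == sb and ib - ia == 1:
                continue  # adjacent segments of one stroke share an endpoint
            point = segment_intersection(p0, p1, p2, p3)
            if point is not None:
                found.append(("self" if sa == sb else "cross", point))
    return found


plus_sign = [[[0, 5], [10, 5]], [[5, 0], [5, 10]]]
loop = [[[0, 0], [10, 10], [10, 0], [0, 10]]]
```

This pairwise check is quadratic in the number of segments, which is one reason the point reduction in the drawing pad is helpful.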

Figure 4.3.5: Cross-stroke and self-stroke intersections circled in yellow and green respectively

4.4 Evaluation Algorithm

This section covers the algorithm used to score and grade the submitted character against the expected character. The overall grade depends on the character score and the strokes score, weighted according to the following diagram:

Figure 4.4.1: Overall grade weights

For the character score, weights are assigned according to the following diagram:

Figure 4.4.2: Character grade weights

The stroke count is the first crucial factor taken into account for the character's grade. If the submitted character and expected character differ by any number of strokes, the overall grade is 0. In this case, the user sees this score and feedback and redraws the character with the appropriate number of strokes.

We also want to look at the shape of the submitted character. Some user submissions might be larger or smaller overall than the expected character. We do not want to penalize size scaling, since some users may not feel inclined to use all the space of the drawing pad. Instead we examine the ratio of the character’s overall width to its overall height, both of which can be calculated from the character’s range values. The grader module in Moji tolerates a 15% difference, but any dimension ratio that differs by more is penalized in the dimension ratio score. Submissions with smaller dimension ratios (too narrow or too tall) are told to make the character wider or shorter, while submissions with larger dimension ratios (too wide or too short) are told to make the character narrower or longer.

For multiple-stroke intersections, or cross-intersections, we compare the number of intersections. If the counts do not match, we assign a score of 0 for the cross-intersection positions. If the number of expected intersections is 0, we also assign a score of 0 for the cross-intersection positions. Otherwise, if the counts do match, we calculate the relative positions of the submitted and expected coordinates. For relative positioning, we use the percentage within the range dimension. For example, if the x-coordinate is 150 and the horizontal range is from 75 to 175, then the relative x-coordinate is 0.75, since it is positioned 75% of the way through the horizontal range. Next we compare the relative positions of the submitted and expected intersections. If the intersections are more than 10% (0.10) apart in relative position, the grader begins to penalize the cross-intersections score. Submissions with lower relative positions (horizontal and/or vertical) are told to cross the intersections either lower or more to the right, while submissions with greater relative positions are told to cross the intersections either higher or more to the left.
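The relative positioning and the direction of the resulting feedback can be sketched as below. The function names are hypothetical, the tolerance is treated as an absolute difference between relative positions, and the hints assume canvas coordinates in which y grows downward (so a lower relative y means the point sits higher on the screen). How the grader converts the excess into a penalty is not specified in the text, so the sketch stops at the feedback.

```python
def relative_position(value, range_min, range_max):
    """Fraction of the way through a range, e.g. 150 in [75, 175] -> 0.75."""
    span = range_max - range_min
    return (value - range_min) / span if span else 0.5  # assumption for a zero-width range


def position_feedback(submitted_rel, expected_rel, tolerance=0.10):
    """Feedback hints for one intersection, given relative (x, y) pairs."""
    hints = []
    dx = submitted_rel[0] - expected_rel[0]
    dy = submitted_rel[1] - expected_rel[1]
    if dx < -tolerance:
        hints.append("more to the right")
    elif dx > tolerance:
        hints.append("more to the left")
    if dy < -tolerance:
        hints.append("lower")   # assumption: canvas y increases downward
    elif dy > tolerance:
        hints.append("higher")
    return hints


example_x = relative_position(150, 75, 175)
```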

Figure 4.4.3: Relative position of 70% with respect to character width

For the strokes score, weights are assigned according to the following diagram:

Figure 4.4.4: Stroke grade weights

For stroke width and height we must first calculate the relative length compared to that of the overall character width and height. For example, if the stroke width is 100 and the character width is 200, the relative length is 50% (0.50). If the dimension is off by more than 15% (0.15) the grader will begin to penalize the width or height score. Submissions with smaller widths or heights will be told to make the stroke wider or longer, respectively. Submissions with larger widths or heights will be told to make the stroke narrower or shorter, respectively.
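A small sketch of the relative stroke size check, under the same assumptions as before: ranges are [xmin, ymin, xmax, ymax] lists, the 15% tolerance is an absolute difference between relative lengths, and the function names are hypothetical.

```python
def relative_size(stroke_range, char_range):
    """Stroke width and height as fractions of the character width and height."""
    stroke_w = stroke_range[2] - stroke_range[0]
    stroke_h = stroke_range[3] - stroke_range[1]
    char_w = char_range[2] - char_range[0]
    char_h = char_range[3] - char_range[1]
    return (
        stroke_w / char_w if char_w else 0.0,
        stroke_h / char_h if char_h else 0.0,
    )


def size_feedback(submitted_rel, expected_rel, tolerance=0.15):
    """Feedback hints for one stroke, given relative (width, height) pairs."""
    hints = []
    dw = submitted_rel[0] - expected_rel[0]
    dh = submitted_rel[1] - expected_rel[1]
    if dw < -tolerance:
        hints.append("wider")
    elif dw > tolerance:
        hints.append("narrower")
    if dh < -tolerance:
        hints.append("longer")
    elif dh > tolerance:
        hints.append("shorter")
    return hints


# The example from the text: stroke width 100 inside character width 200.
example_size = relative_size([50, 0, 150, 40], [0, 0, 200, 80])
```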

Figure 4.4.5: Relative width of 85% with respect to character width

We evaluate the self-intersections of each stroke in much the same way as cross-intersections for the overall character. This time we use the stroke’s range data instead of the entire character's, and each stroke has its own value. For all position grades, we evaluate the score in much the same way as the cross-intersection positions. Note that for general position, we use the midpoint of the stroke range (imagine a box for the range and take the center point of that box). The points for self-intersections (note that we need to evaluate the count first, just like the cross-intersections grade) and endpoints are self-explanatory. The feedback provided is also similar to the cross-intersection position feedback (high, low, left, or right).

The shape is arguably the most important trait of a stroke, which is why it has such a high weight in the strokes grading. However, there is a common challenge that we need to resolve before moving on to the grading approach. A stroke is represented as a sequence of points used to write the character. We have one sequence for the expected character stroke and one for the submitted character stroke, and we are trying to compare the shape of these two strokes. The basic idea is to sample some number of points in a stroke and see what the orientation or direction is (we have derivative values for each point). One approach would be to iterate through the points and compare each point derivative, but the number of points per stroke is not guaranteed to be close. For example, one person might write a stroke and generate 400 points while another might only generate 100 points. This is not uncommon, as it can depend on how shaky a user’s drawing tool is or how fast the user writes. We need to accommodate a varying number of data points for each user’s stroke.

To resolve this, we calculate a step size for both the submitted and the expected stroke. If one stroke has more data points, we iterate through each of those points. For the other stroke we use a smaller step size so that we traverse all of its points at the same rate. Using the previous example of 400 versus 100, we would iterate through all 400 points of the longer stroke. In order to traverse all 100 points of the other stroke in 400 iterations, we set its step size to 0.25. Thus when we are examining the 4th point in the 400-point stroke, we are examining the 1st point in the 100-point stroke. Essentially we are comparing what we expect to be relatively similar sections within the strokes. When we are looking at a section about halfway through the 100-point stroke (~50), we should be looking at a similar section near the halfway mark of the 400-point stroke (~200).
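The proportional step sizes can be sketched as follows, using the 400-point and 100-point example from the text. The function name is hypothetical, and indices here are 0-based, so they are offset by one from the ordinal positions used in the prose.

```python
def paired_indices(n_a, n_b):
    """Pair up (possibly fractional) indices so both strokes are walked at the same rate."""
    iterations = max(n_a, n_b)   # iterate once per point of the longer stroke
    step_a = n_a / iterations    # 1.0 for the longer stroke
    step_b = n_b / iterations    # smaller than 1.0 for the shorter stroke
    return [(k * step_a, k * step_b) for k in range(iterations)]


pairs = paired_indices(400, 100)
```

Here `pairs[4]` is `(4.0, 1.0)` and `pairs[200]` is `(200.0, 50.0)`, matching the "similar sections" idea: the halfway mark of one stroke is compared with the halfway mark of the other.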

Figure 4.4.6: Observing relatively similar locations within a stroke (at 0%, 50%, and 95%)

If we are calculating new proportional step sizes for a stroke, there is a high chance that we will be looking at fractions instead of whole numbers. Since we are using a sequence of points, how can we obtain the 2.25th point? We interpolate the data point from the surrounding points. Suppose we need the 2.25th point, which does not exist. We do know the 2nd and 3rd points in the sequence. Since the stroke is rendered by connecting adjacent points to form line segments, the derivative at the 2.25th point is the slope from the 2nd to the 3rd point.

To calculate an orientation for a data point, we use an arctangent function (atan2) that takes the dy and dx values from the derivative. This function returns an angle, in radians, from −π to π based on the slope and direction of the derivative. If the angles differ by more than about 0.2 radians (π/16, one-sixteenth of a half rotation), we consider that a mismatching point. For each stroke, the raw score is calculated as the number of matching points divided by the total number of iterations (not points, since we have to adjust as mentioned earlier). The grader tolerates any raw score above 85% (0.85), but any lower raw score is penalized and labelled as ‘Irregular’ for feedback purposes.
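A minimal sketch of the shape comparison is given below. Several choices are assumptions rather than Moji's confirmed behaviour: the orientation at every index (whole or fractional) is taken from the segment that starts at the floored index, the last index is clamped to the final segment, angle differences wrap around at ±π, the default threshold of π/16 is chosen to match the "approximately 0.2 radians" quoted in the text, and the raw score is the fraction of matching samples so that higher is better, consistent with the 0.85 tolerance. All function names are hypothetical, and each stroke needs at least two points.

```python
import math


def direction_at(stroke, index):
    """Orientation in radians at a possibly fractional 0-based index."""
    i = min(int(math.floor(index)), len(stroke) - 2)  # clamp to the last segment
    (x0, y0), (x1, y1) = stroke[i], stroke[i + 1]
    return math.atan2(y1 - y0, x1 - x0)


def angle_difference(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def shape_raw_score(submitted, expected, threshold=math.pi / 16):
    """Fraction of sampled positions whose orientations agree within the threshold."""
    iterations = max(len(submitted), len(expected))
    step_s = len(submitted) / iterations
    step_e = len(expected) / iterations
    matches = 0
    for k in range(iterations):
        diff = angle_difference(
            direction_at(submitted, k * step_s),
            direction_at(expected, k * step_e),
        )
        if diff <= threshold:
            matches += 1
    return matches / iterations


expected_stroke = [[0, 0], [10, 0], [20, 0], [30, 0]]           # horizontal line
dense_stroke = [[x, 0] for x in range(0, 40, 5)]                 # same shape, 8 points
bent_stroke = [[0, 0], [10, 0], [10, 10], [10, 20]]              # turns downward early
```

With these samples, the dense horizontal stroke scores 1.0 against the expected stroke despite having twice as many points, while the bent stroke scores 0.25 and would be labelled ‘Irregular’ under the 0.85 tolerance.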

5. RESULTS

This section showcases the finished Moji product (views) as well as the achievements and failures of the provided grades and feedback.

5.1 Page Layouts

There are two main pages to observe. The first is the teacher page. This page has a character text input near the top where a user can type in any character (only one character is permitted). Below that is the drawing pad. Teachers draw the character with the correct strokes using a mouse, finger, or stylus. The teacher can also click the template button to load an image from a URL source underneath the canvas element to ensure correct tracing. The debug button displays extracted information. The undo button removes the last stroke while the clear button removes all strokes. The submit button sends the drawing to the server and updates the stroke points field of the expected character model (or creates a new one).

Figure 5.1.1: Teacher page (http://moji.scripts.mit.edu/teacher)


The second is the student page. This page has two dropdowns: one to select the character group (e.g., HIRAGANA, BASIC LATIN LETTERS, etc.) and another to select the character (e.g., あ, A, etc.). Below that is the drawing pad. Students draw the character with the correct strokes using a mouse, finger, or stylus. The debug button displays extracted information. The undo button removes the last stroke while the clear button removes all strokes. The submit button sends the drawing to the server and displays the resulting grades and feedback for the character submission in a popup modal.

Figure 5.1.2: Student page (http://moji.scripts.mit.edu/student)

5.2 Character Grades and Feedback

Below we show some submitted characters and the score and feedback provided for each.


Overall Grade: 98

Overall Grade: 91

Character Grade: 77 Feedback: Character Dimension Ratio Too Narrow

Character Grade: 63 Feedback: Missing Intersection

Strokes Grade: 88 Feedback: First Stroke Too Wide

Strokes Grade: 89 Feedback: First Stroke Too High

Strokes Grade: 67 Feedback: Third Stroke Irreg

Table 5.2.1: Submitted characters and notable grade and feedback provided

6. CONCLUSION

Moji differentiates itself from other handwritten character applications by shifting away from pure recognition toward grading and feedback. Moji has a clean user interface and provides enough tools for users to draw characters and understand the feedback. The character data model is designed so that the system can support any character that can be displayed on the Internet. All of the algorithms described run smoothly and both extract information and evaluate characters effectively.

Aside from tweaking the grading algorithm based on user testing and feedback, there are other aspects to consider for future work on this application. Currently, there is barely any distinction between students and teachers. It would be useful to develop the roles of both groups by creating exercises. Teachers could create exercises for certain students, and students would have their grade information saved to help them improve over time.

Another issue is the lack of protection against bad teacher-drawn submissions. At the moment, Moji allows anyone to submit an ‘ideal’ character drawing for any character. If the system incorporated an optical character recognition step that applies offline recognition techniques to the image of the teacher submission and only allowed high-matching drawings to be saved to the database, we would be able to prevent bad data from being stored in place of accurate data.

Since Moji is highly dependent on teacher submissions, it currently lacks a variety of characters to practice. Given some time and the power of crowdsourcing, it can evolve into an innovative language-learning application for all written scripts.

7. REFERENCES

[1] Zhu, B., Nakagawa, M. “Online Handwritten Chinese/Japanese Character Recognition”. Advances in Character Recognition. InTech. http://cdn.intechopen.com/pdfs-wm/40720.pdf

[2] Das, S., Banerjee, S. “An Algorithm for Japanese Character Recognition”. International Journal of Image, Graphics and Signal Processing. http://www.mecs-press.org/ijigsp/ijigsp-v7-n1/IJIGSP-V7-N1-2.pdf

[3] The Unicode Consortium. http://www.unicode.org/
