Artificial Intelligence

Online CS Modules: Artificial Intelligence


Introduction:

The following lessons introduce the topic of artificial intelligence (AI) in computers by surveying several of the major application domains of AI. These application domains include language processing, visual processing, game playing, expert systems, and neural networks. Each lesson includes a set of review questions which test the important concepts from the lesson and provide practice problems. After reading each lesson, you should work the review questions before proceeding to the next lesson. Use the navigation bar at the top of this page to view the lessons and access the review questions. Each lesson page has a link on the navigation bar which will take you to the review questions for that lesson. To begin your study, click at the top of this page.

Lessons:

I. Introduction to Artificial Intelligence
II. Humans Versus Computers
III. Natural Language Processing
IV. Game Playing
V. Visual Processing
VI. Neural Networks
VII. Expert Systems
VIII. Summary

Learning objectives:

● Understand current applications of AI
● Recognize the limitations of AI
● Compare the human mind to computer intelligence

http://courses.cs.vt.edu/~csonline/AI/Lessons/index.html [2/26/2007 23:10:38]


Online CS Modules: Introduction to Artificial Intelligence


Introduction to Artificial Intelligence

The field of artificial intelligence (AI) encompasses the study of how to apply computers to problem domains that have traditionally been handled by humans only. A good example of such a problem domain is natural language. Current research in AI is attempting to find ways to use computers to automatically perform natural language translation and to recognize spoken words and convert them into written language. These applications of AI technology demonstrate the general goal of AI research:

To develop more powerful, versatile programs that can handle problems currently handled efficiently only by the human mind [Balci 1996].

In thinking about this goal, it is helpful to consider what types of problems the human mind can handle efficiently and what types of problems computers can handle efficiently. In his text Computer Science: An Overview, Glenn Brookshear makes the following comparison between humans and computers:

"Algorithmic machines are designed to perform precisely defined tasks with speed and accuracy, and they do this extremely well. However, machines are not gifted with common sense. When faced with a situation not foreseen by the programmer, a machine's performance is likely to deteriorate rapidly. The human mind, although often floundering on complex computations, is capable of understanding and reasoning. Consequently, whereas a machine might outperform a human in computing solutions to problems in nuclear physics, the human is much more likely to understand the results and determine what the next computation should be" [Brookshear 1997].

We can summarize Brookshear's comments in the following graph [Biermann 1990]. Notice that computer performance and task complexity are inversely related. This reflects the fact that computers are good at performing well-defined, repetitive computations but poor at complex tasks like reasoning. Humans, on the other hand, struggle to add a long column of numbers without error, yet they have little trouble conversing and reasoning with friends in natural language.




An important question arises when we consider the idea of computer intelligence: how do we determine whether a particular computer has demonstrated intelligence? The answer to this question really depends on the context in which you discuss artificial intelligence. From a philosophical perspective, "one considers questions regarding intelligence itself and whether machines can possess actual intelligence or merely simulate its presence." From an applied perspective, the question is "how technology can be applied to produce machines that behave in intelligent ways" [Brookshear 1997].

In these lessons, we will mainly focus on the applied perspective of AI although we will briefly consider the philosophical side in the next lesson. Some of the most important applications in the field of AI include natural language processing, visual processing, game playing, expert systems, and neural networks. In our study, we will look at some examples of these technologies. By the end of this section, you should be able to do the following:


References

● Balci, O. (1996), Introduction to Computer Science Lecture Notes, Department of Computer Science, Virginia Tech, Blacksburg, VA, pp. 129.

● Brookshear, J. G. (1997), Computer Science: An Overview, Fifth Edition, Addison-Wesley, Reading, MA.

● Biermann, A. W. (1990), Great Ideas in Computer Science, The MIT Press, Cambridge, MA, pp. 376.


Online CS Modules: Humans Versus Computers


Humans Versus Computers

In the last lesson we saw that we can view AI from a philosophical perspective by considering whether machines can really possess intelligence or not. While we certainly will not be able to answer that question in this lesson, we can stimulate some more thought by comparing the human mind to the current computer technology. The following animation compares the storage capacity and processing speed of the brain to that of the fastest computers currently available. The estimates for this comparison were taken from The Analytical Engine by Rick Decker and Stuart Hirshfield [1998].

When comparing the processing capabilities of the human mind and the computer, it is important to remember that each one is good at certain tasks. While computers can crunch long lists of numbers in milliseconds, humans can decide what the results mean and what action should be done based on those results. Computers are good at computation with numbers while humans are good at reasoning and interpretation.

To demonstrate this difference, let's run an experiment that will pit you against the computer in two different tasks that involve artificial intelligence. The first task is playing a game of checkers. Since this task can actually be reduced to number crunching, computers are quite good at it. In fact, the world champion checkers player is currently a computer. Try playing a game of checkers [Fabio 1997] against your computer in the applet below and see how well you do.



http://www.cs.ualberta.ca/%7Echinook/



The second task in our experiment is carrying on a conversation. This is the type of task at which humans are good and computers are poor. Conversations cannot be reduced to simple number crunching, so computers cannot perform nearly as well at this task. Typical conversations involve a huge amount of "world knowledge" or common facts about life which humans accumulate as they grow. Remember that in our animation we saw that humans have a 50 to 1 advantage in terms of information storage. What seems to be effortless for us is quite challenging for a computer.

The applet below is a version of a program called ELIZA which was first developed by Joseph Weizenbaum in the mid-1960s. The program was designed to imitate the role of a psychologist asking questions of a patient. Try starting a conversation with ELIZA [Goerlich 1996] and see what you think of your computer's ability to talk.

At the least, our experiment in this lesson has confirmed the basic observations we made about AI, namely, computer intelligence is currently limited to tasks that are easily reduced to algorithms and number crunching. Remember, however, that the goal of AI is "to develop more powerful, versatile programs that can handle problems currently handled efficiently only by the human mind." In the next few lessons we will look at some applications that are trying to use computer intelligence to support tasks traditionally handled by humans.

References

● Decker, R. and S. Hirshfield (1998), The Analytical Engine, First Edition, PWS Publishing Company, Boston, MA, pp. 303-304.

● Fabio, F. (1997), "Checkers Applet," http://www.geocities.com/SiliconValley/8381/checkers_en.html.

● Goerlich, R. (1996), "ELIZA Applet," http://www.cyberloft.com/bgoerlic/index.htm.



Online CS Modules: Natural Language Processing


Natural Language Processing

The first application of AI that we will explore is natural language processing. You have already seen one example of natural language processing when you chatted with the ELIZA system. Of course this system was not very sophisticated since it was only able to echo questions back as responses.

We can see some of the difficulties of natural language processing when we compare natural languages with computer programming languages. The latter are designed to be unambiguous so that the meaning of a statement can be derived primarily from the syntax of the statement. For example, the statement below has only one meaning to a computer:

If (count < 100) Then Call OutputLine(count)

However, natural languages are not nearly so straightforward. Many statements in natural languages have meanings that are greater than the sum of their words. Even the same statement can have different meanings in different contexts. For example, the simple question "Where have you been?" can be meant as a query regarding your previous location or as a scolding for being late. The difference depends on the context of the question. These ambiguities in natural language make it difficult for computers to understand our speech.

In order for a computer to understand a natural language statement, the computer must understand the syntax, semantics, and context of the statement. Syntactic analysis corresponds to recognizing the words of the statement and their grammatical roles (e.g., subject, verb, object). Semantic analysis corresponds roughly to understanding the relationships between the words. Questions such as "Does the subject or the object receive the action?" and "Who or what is responsible for the action?" reveal the semantics of a statement. Contextual analysis involves comparing a statement with its context to determine its meaning. For example, the sentence "The bat slipped from his hand" has two different meanings for a cave explorer and a baseball player [Brookshear 1997].
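As a rough illustration of contextual analysis, the sketch below guesses which meaning of "bat" is intended by counting cue words in the surrounding text. The sense names, the cue-word lists, and the function `guess_sense` are assumptions made up for this example; they are not taken from any real language-processing system.

```python
# Hypothetical cue words for two senses of "bat" (illustrative only).
SENSE_CUES = {
    "animal": {"cave", "explorer", "wings", "flew", "dark"},
    "baseball bat": {"baseball", "player", "pitch", "swing", "inning"},
}

def guess_sense(context: str) -> str:
    """Return the sense whose cue words overlap most with the context."""
    words = set(context.lower().replace(".", " ").replace(",", " ").split())
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words at all, the sentence alone is ambiguous.
    return best if scores[best] > 0 else "unknown"

sentence = "The bat slipped from his hand."
print(guess_sense("The cave explorer looked up. " + sentence))       # animal
print(guess_sense("The baseball player took a swing. " + sentence))  # baseball bat
print(guess_sense(sentence))                                         # unknown
```

Notice that the sentence by itself gives the program nothing to work with. Only the surrounding statements settle the meaning.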

Considering these levels of analysis, we can see that the ELIZA program only performs syntactic and semantic analysis. That's why the program cannot actually answer your questions. It only rearranges the words of your statement into a new question. ALICE [Wallace 2000] is a more sophisticated chat program which performs some limited contextual analysis. Try chatting with ALICE by clicking the button below. The program will open in a new browser window.
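To make the idea of rearranging words into a new question concrete, here is a minimal sketch in the spirit of ELIZA. It is not Weizenbaum's program: the single pattern, the pronoun table `REFLECTIONS`, and the functions `reflect` and `respond` are simplified assumptions for illustration.

```python
import re

# Hypothetical pronoun swaps; a real ELIZA script has many more rules.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are",
               "you": "I", "your": "my"}

def reflect(text: str) -> str:
    """Swap first- and second-person words so a statement can be echoed back."""
    return " ".join(REFLECTIONS.get(w, w) for w in text.lower().split())

def respond(statement: str) -> str:
    """Turn 'I feel X' or 'I am X' into a question; otherwise give a stock reply."""
    cleaned = statement.strip().rstrip(".!?")
    match = re.match(r"(?i)i (feel|am) (.+)", cleaned)
    if match:
        verb = "feel" if match.group(1).lower() == "feel" else "are"
        return f"Why do you say you {verb} {reflect(match.group(2))}?"
    return "Please tell me more."

print(respond("I am worried about my exams."))
# Why do you say you are worried about your exams?
print(respond("What is the capital of France?"))
# Please tell me more.
```

The second reply shows the limitation discussed above: the program has no knowledge with which to answer a question, so it falls back on a stock phrase.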

Some more practical applications of natural language processing are language translation, natural language database queries, and speech recognition. For an example of language translation, visit the Systran Software web site. Their site provides a translation utility that translates English, German, French, Spanish, Italian, and Portuguese. Try translating a few English sentences to another language, and then translate them back to English to see how intelligent the computer translation is.

For an example of natural language database queries, visit the Ask.com web site. This site allows you to ask questions in typical English form such as "What is the weather like today?" or "How is the stock market doing?" The computer parses the question to determine your subject of interest and then suggests relevant web sites to visit. Try asking a few questions and see if the computer can give you a relevant answer.

Voice recognition software is relatively new to the AI domain because personal computer hardware has only recently become powerful enough to support such processing. The goal of voice recognition is to interact with a computer using spoken commands. While it would be great if we could all dictate letters and papers to our computer rather than typing, this goal is still a way off. While the current technology may be able to recognize the sounds that are spoken, these sounds may map to several different words. For example, the words "to", "too", and "two" all map to the same sound. Sophisticated grammatical analysis is needed to determine from the context which is the correct word to display.
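The "to", "too", and "two" problem can be sketched with a toy rule set. The placeholder "TU" stands for the recognized sound, and the word lists and the functions `pick_homophone` and `transcribe` are assumptions invented for this example. Real dictation software relies on far more sophisticated grammatical analysis.

```python
# Hypothetical word lists standing in for real grammatical analysis.
COUNTABLE = {"apples", "letters", "papers", "words", "dogs"}
DEGREE_WORDS = {"much", "many", "late", "fast", "loud"}

def pick_homophone(next_word):
    """Choose 'to', 'too', or 'two' from the word that follows the sound.

    next_word is None when the sound ends the sentence.
    """
    if next_word is None or next_word in DEGREE_WORDS:
        return "too"      # "me too", "too late"
    if next_word in COUNTABLE:
        return "two"      # "two letters"
    return "to"           # default: preposition or infinitive marker

def transcribe(sounds):
    """Replace each 'TU' placeholder with the spelling chosen from context."""
    words = []
    for i, s in enumerate(sounds):
        if s == "TU":
            nxt = sounds[i + 1] if i + 1 < len(sounds) else None
            words.append(pick_homophone(nxt))
        else:
            words.append(s)
    return " ".join(words)

print(transcribe(["i", "want", "TU", "dictate", "TU", "letters", "TU"]))
# i want to dictate two letters too
```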

If you are interested in trying some voice recognition software, Philips provides a free dictation utility for the Microsoft Windows operating system. You will need a computer equipped with a sound card and microphone to use the software.

References

● Brookshear, J. G. (1997), Computer Science: An Overview, Fifth Edition, Addison-Wesley, Reading, MA, pp. 384.

● Wallace, R. (2000), "ALICE chat robot," http://www.alicebot.org.


http://www.systransoft.com/

http://www.ask.com/

http://www.philips.com/

http://www.speech.philips.com/ud/get/Pages/psp_frameset_h.htm?072fr.htm




Online CS Modules: Game Playing


Game Playing

To begin our discussion of artificial intelligence in game playing, we will use the following puzzle described by Brookshear:

"Imagine that you have a puzzle consisting of eight square tiles labeled 1 through 8 mounted in a frame capable of holding a total of nine such tiles in three rows and three columns. Among the tiles in the frame then is a vacancy into which any of the adjacent tiles can be pushed" [Brookshear 1997].

The applet below [Opdorp 1997] shows an example of the puzzle previously described. Note that the numbered tiles have been replaced by images in this version. You can move the tiles by clicking on an image adjacent to the empty tile. Try solving the puzzle and see how many steps you need. Then click the button "Same Shuffle" followed by "Solve" to restore the original shuffle and watch the computer solve the puzzle.

You are probably wondering how the computer is able to solve this puzzle so efficiently. Of course one possibility is that the computer is already programmed with all the possible configurations for this puzzle. However, considering that there are over 180,000 possible configurations for a simple puzzle like this, such a solution is clearly not feasible.
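The figure of over 180,000 configurations is consistent with a standard counting argument, which the lesson itself does not spell out. Eight tiles and one empty square can be arranged in 9! ways, and it is a known property of sliding-tile puzzles that only half of those arrangements can be reached from any given starting position. A quick check:

```python
import math

arrangements = math.factorial(9)   # 8 tiles + 1 empty square in 9 cells
reachable = arrangements // 2      # only half can be reached by sliding tiles
print(arrangements, reachable)     # 362880 181440
```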

Instead of explicitly storing all the possible solutions, the computer constructs a search tree which holds possible configurations for the 8-puzzle. If we imagine our tiles being numbered, a very simple search tree would look like this:

Notice that moving from one node to the next is equivalent to moving a single tile of the puzzle. In this case, we only need two moves to reach the goal state (the solution configuration). However, if we had started with a more difficult configuration, our search tree would be much larger. By constructing a search tree, the computer can examine the possible configurations of the puzzle systematically until it reaches the goal state. Then by following the path from the goal state back to the start state, the computer can determine the correct steps to solve the puzzle.
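The process described above can be sketched as a short program. This is a minimal breadth-first version written for illustration, not the applet's actual code. The tuple layout, the use of 0 for the empty square, and the names `GOAL`, `neighbors`, and `solve` are all assumptions.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the empty square

def neighbors(state):
    """Yield every configuration reachable by sliding one tile."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)

def solve(start):
    """Breadth-first search; returns the list of states from start to GOAL."""
    parent = {start: None}            # each node remembers where it came from
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            path = []
            while state is not None:  # walk back from the goal to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None                       # the goal cannot be reached from start

start = (1, 2, 3, 4, 0, 6, 7, 5, 8)   # two moves away from the goal
path = solve(start)
print(len(path) - 1)                  # 2
```

The `parent` dictionary plays the role of the tree: every configuration records the configuration it was reached from, which is what lets the program trace the path back once the goal is found.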

The search tree for a particular problem can grow in size quite rapidly if the goal state is not found quickly. To reduce the amount of searching the computer must do, the tree can be constructed in a depth-first manner rather than a breadth-first manner. In this way a single branch of the tree is followed as far as it goes before other branches are examined. The advantage of this approach is that, when the computer has some way of judging which branches look promising, those branches can be considered first.

Another technique for reducing the size of the search tree is to use heuristics. Heuristics are like "rules of thumb" that tell the computer whether a given branch of the tree is worth exploring or not. These rules guide the computer by telling it how close a given configuration is to the goal state. Configurations which are closer can be given more attention than less promising configurations. To see the effect of using a heuristic, try running the eight-puzzle applet but change the value in the "Heuristic" list to "None". You may need to shuffle the board a couple of times to find a difficult configuration. You should notice that the computer takes quite a while to find the solution. Then try asking the computer to solve the same configuration (use the "Same Shuffle" button) using the "Manhattan" heuristic. The time required to find the solution is noticeably shorter.
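The lesson does not say how the applet searches, so the following is only a sketch under assumptions: it uses an A*-style best-first search, and it takes the "Manhattan" heuristic to mean the usual Manhattan distance (for each tile, the number of rows plus the number of columns it sits away from its goal square). The names `manhattan`, `neighbors`, and `search` are hypothetical.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the empty square

def manhattan(state):
    """Sum over tiles of row distance plus column distance to the goal square."""
    total = 0
    for pos, tile in enumerate(state):
        if tile != 0:
            goal_pos = tile - 1
            total += abs(pos // 3 - goal_pos // 3) + abs(pos % 3 - goal_pos % 3)
    return total

def neighbors(state):
    """Yield every configuration reachable by sliding one tile."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)

def search(start, heuristic):
    """Best-first search; returns (moves, configurations expanded)."""
    frontier = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if cost > best_cost[state]:
            continue                  # stale queue entry, already improved
        if state == GOAL:
            return cost, expanded
        expanded += 1
        for nxt in neighbors(state):
            if cost + 1 < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost + 1
                heapq.heappush(frontier, (cost + 1 + heuristic(nxt), cost + 1, nxt))
    return None                       # the goal cannot be reached from start

start = (8, 1, 3, 4, 0, 2, 7, 6, 5)
plain = search(start, lambda s: 0)    # no heuristic
guided = search(start, manhattan)     # Manhattan heuristic
print(plain, guided)
```

Both runs should report the same number of moves, but the run guided by the heuristic should expand far fewer configurations, which mirrors the difference in solving time you see in the applet.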

In recent years, researchers have combined sophisticated heuristics and increased computing capability to create formidable computer opponents in the games of chess and checkers. As mentioned before, a computer program now officially holds the world title for checkers. In 1997, a chess program designed by IBM researchers played against world champion Garry Kasparov in a six-game match and defeated him. Details of the match can be found at a special IBM web site.

Both checkers and chess programs construct search trees similar to the example we saw with the 8-puzzle. Then the programs search through the tree for the most advantageous move. Since computers can perform calculations so quickly, these programs can analyze a tremendous number of moves in a single second. For example, the computer which Kasparov played against could examine and evaluate up to 200,000,000 chess positions per second while Kasparov himself could only examine and evaluate up to three chess positions per second [IBM 1997]. While it might seem that Kasparov's odds against such computing power are hopeless, the reality is that most of the moves the computer examines are useless. Since the computer lacks experience and intuition, it must examine each possible move at every step of the game. On the other hand, chess masters like Kasparov have the ability to immediately recognize that some moves are useless. This allows them to think more efficiently and focus their attention on certain strategies that have the most potential.
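One common way to search a game tree for the most advantageous move is the minimax technique. The lesson does not name it, and real chess programs are far more elaborate, so treat the following as a sketch under assumptions. The tree is a made-up example in which each number is the score of a finished line of play from the computer's point of view, and the names `minimax` and `best_move` are hypothetical.

```python
def minimax(node, maximizing):
    """Return the best score reachable from node when both sides play well.

    A node is either a number (the score of a finished line of play, from
    the computer's point of view) or a list of child nodes.
    """
    if isinstance(node, (int, float)):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

def best_move(children):
    """Index of the move whose subtree has the highest minimax score."""
    scores = [minimax(child, maximizing=False) for child in children]
    return scores.index(max(scores))

# Hypothetical two-ply tree: three computer moves, each with two replies.
tree = [[3, 5], [2, 9], [4, 6]]
print(best_move(tree))   # 2
```

Move 1 contains the tempting score of 9, but the opponent would answer it with the reply worth 2. Move 2 guarantees at least 4, so it is the one the search selects.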


http://www.research.ibm.com/deepblue/


References


● Opdorp, G. (1997), "Eight-puzzle applet," http://www.aie.nl/~geert/java/public/EightPuzzle.html.

● IBM (1997), "Kasparov Versus Deep Blue," http://www.research.ibm.com/deepblue/meet/html/d.2.html.




Online CS Modules: Visual Processing


Visual Processing

Another important area of research in artificial intelligence is visual processing. Visual processing involves the collection and analysis of digitized image data by computers. In this lesson, we will look at two related examples of visual processing: optical character recognition (OCR) and handwriting recognition.

Optical Character Recognition

OCR is defined by Webopedia [2000] as "the branch of computer science that involves reading text from paper and translating the images into a form that the computer can manipulate...All OCR systems include an optical scanner for reading text, and sophisticated software for analyzing images. Most OCR systems use a combination of hardware (specialized circuit boards) and software to recognize characters, although some inexpensive systems do it entirely through software. Advanced OCR systems can read text in large variety of fonts, but they still have difficulty with handwritten text". Currently OCR software can accurately recognize up to 99% of the characters in a well-printed document. While this is a very high percentage, it still leaves something to be desired when scanning long texts. Consider a typical novel of 300 pages, 40 lines per page, and 75 characters per line. Even with 99% accuracy, an OCR software package would need help 9,000 times in order to output the entire document correctly!
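The arithmetic behind the figure of 9,000 corrections can be checked directly from the numbers given above:

```python
pages, lines_per_page, chars_per_line = 300, 40, 75
accuracy_percent = 99

total_chars = pages * lines_per_page * chars_per_line
expected_errors = total_chars * (100 - accuracy_percent) // 100
print(total_chars, expected_errors)   # 900000 9000
```

In other words, the novel contains 900,000 characters, and a 1% error rate leaves about 30 mistakes on every page.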

OCR software uses two main approaches to identifying characters: matrix matching and pattern extraction. Matrix matching is the simpler of the two since it only involves comparing the scanned data to stored templates. This approach is effective when the characters being identified are uniform in style and size. Pattern extraction involves the identification of certain features which are unique to an individual character. For example, the lower case character 't' is typified by a vertical line crossed by a shorter horizontal line. These patterns generally remain the same regardless of the size and style of the font. However, pattern extraction can be very difficult for other characters. Consider the images below showing the character 'y' in a variety of fonts. What patterns would you use to accurately describe this character?
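As a rough illustration of matrix matching, the sketch below compares a scanned bitmap against stored templates pixel by pixel and picks the template with the most agreeing pixels. The 3x3 bitmaps, the template set, and the function names are all hypothetical; real OCR templates are far larger, and real systems use more careful scoring.

```python
# Hypothetical 3x3 bitmaps (1 = ink, 0 = blank); real templates would be larger.
TEMPLATES = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}

def match_score(scan, template):
    """Count the pixels where the scan and the template agree."""
    return sum(
        s == t
        for scan_row, template_row in zip(scan, template)
        for s, t in zip(scan_row, template_row)
    )

def recognize(scan):
    """Return the label of the stored template that best matches the scan."""
    return max(TEMPLATES, key=lambda label: match_score(scan, TEMPLATES[label]))

# A scanned "T" with one noisy pixel in the lower-left corner.
noisy_t = ((1, 1, 1),
           (0, 1, 0),
           (1, 1, 0))
print(recognize(noisy_t))  # T
```

Because the comparison is pixel by pixel, a character in a different size or style would score poorly against every template, which is why this approach suits uniform typefaces.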

According to Decker and Hirshfield, "most OCR programs use a combination of matrix matching and pattern extraction, using matrix matching for monospaced typefaces such as Courier, in which all of the characters have the same width, and pattern extraction for proportional typefaces such as Palatino and Helvetica" [Decker and Hirshfield 1998]. In order to get a feel for the effectiveness of OCR software, browse through the following web page provided by the Electronic Text Center at the University of Virginia [1998]. This page gives five examples of various texts that were scanned and converted to HTML format.

Handwriting Recognition

Some text documents cannot be converted to electronic format using OCR because they are handwritten rather than printed. In this case, handwriting recognition software can be used to convert the document. Since handwriting differs greatly among individuals, more sophisticated analysis is required to identify characters. Handwriting recognition also offers an alternative method for inputting text into a computer. Many of the current handheld computers incorporate this technology in order to avoid integrating a keyboard into the unit. While this technology is promising, it is currently much slower than input via a keyboard. For this reason, handwriting recognition is mainly limited to computing devices that are too small to accommodate a keyboard.

Just as the various fonts above present a challenge to OCR software, different writing styles challenge handwriting recognition software. In order to improve the accuracy of recognition, most handwriting recognition software incorporates some technique for learning an individual's style. With this method, the software compares new input against stored samples of the user's own writing rather than against a fixed pattern. As long as the user is fairly consistent in their writing style, this method can be very effective.

To try an example of handwriting recognition software, click the button below to launch JRec, a handwriting recognition applet designed by Bob Mitchell [1998]. Then follow the instructions below to train the applet and perform some simple tests.

Instructions:

1. Use the mouse to draw a character in the applet window.
2. Store the character by pressing one of the label buttons (0-9). This also associates the character with the digit code on the button.
3. Press "Clear" to erase your character.
4. Repeat steps 1-3 for up to 10 different characters.
5. Press the "Train" button to train the applet with the character set.
6. Redraw one of the characters and press "Test" to recognize the character.



Since JRec is a fairly simple example of handwriting recognition, you will probably notice that it easily confuses characters with similar shapes. For example, try training the applet to recognize '5' and '6'. How often does it correctly distinguish these two digits?

References

● Decker, R. and S. Hirshfield (1998), The Analytical Engine, First Edition, PWS Publishing Company, Boston, MA, pp. 317.

● Mitchell, B. (1998), "Java Handwriting Recognition Applet," http://members.aol.com/Trane64/java/JRec.html.

● University of Virginia (1998), "Scanning Helpsheets," http://etext.lib.virginia.edu/helpsheets/scan-train.html.

● Webopedia (2000), "Online Computer Dictionary," http://webopedia.internet.com/TERM/o/optical_character_recognition.html.



Game Playing

To begin our discussion of artificial intelligence in game playing, we will use the following puzzle described by Brookshear:

"Imagine that you have a puzzle consisting of eight square tiles labeled 1 through 8 mounted in a frame capable of holding a total of nine such tiles in three rows and three columns. Among the tiles in the frame then is a vacancy into which any of the adjacent tiles can be pushed" [Brookshear 1997].

The applet below [Opdorp 1997] shows an example of the puzzle previously described. Note that the numbered tiles have been replaced by images in this version. You can move the tiles by clicking on an image adjacent to the empty tile. Try solving the puzzle and see how many steps you need. Then click the button "Same Shuffle" followed by "Solve" to restore the original shuffle and watch the computer solve the puzzle.

You are probably wondering how the computer is able to solve this puzzle so efficiently. Of course one possibility is that the computer is already programmed with all the possible configurations for this puzzle. However, considering that there are over 180,000 possible configurations for a simple puzzle like this, such a solution is clearly not feasible.

Instead of explicitly storing all the possible solutions, the computer constructs a search tree which holds possible configurations for the 8-puzzle. If we imagine our tiles being numbered, a very simple search tree would look like this:

Notice that moving from one node to the next is equivalent to moving a single tile of the puzzle. In this case, we only need two moves to reach the goal state (the solution configuration). However, if we had started with a more difficult configuration, our search tree would be much larger. By constructing a search tree, the computer can examine the possible configurations of the puzzle systematically until it reaches the goal state. Then by following the path from the goal state back to the start state, the computer can determine the correct steps to solve the puzzle.
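The search just described can be sketched in Python. This is a minimal breadth-first version, not the applet's actual code: the tuple representation of the board, the choice of goal layout, and the names `neighbors` and `solve` are assumptions made for illustration. Each configuration records the configuration it was reached from, so the solution is recovered by walking back from the goal to the start.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the empty square (assumed layout)

def neighbors(state):
    """Yield every configuration reachable by sliding one tile into the gap."""
    gap = state.index(0)
    row, col = divmod(gap, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            tile = r * 3 + c
            nxt = list(state)
            nxt[gap], nxt[tile] = nxt[tile], nxt[gap]
            yield tuple(nxt)

def solve(start):
    """Breadth-first search; returns the list of states from start to GOAL."""
    parent = {start: None}            # each node remembers the node it came from
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            path = []
            while state is not None:  # walk back from the goal to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None                       # the goal cannot be reached from start

start = (1, 2, 3, 4, 0, 6, 7, 5, 8)   # two moves from the goal
path = solve(start)
print(len(path) - 1)  # 2
```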

The search tree for a particular problem can grow in size quite rapidly if the goal state is not found quickly. To reduce the amount of searching the computer must do, the tree can be constructed in a depth-first manner rather than a breadth-first manner. In this way a single branch of the tree is followed to its end before other branches are examined. The advantage of this approach is that, if the computer has some way of choosing which branch to follow, more promising branches can be considered first.

Another technique for reducing the size of the search tree is by using heuristics. Heuristics are like "rules-of-thumb" that tell the computer whether a given branch of the tree is worth exploring or not. These rules guide the computer by telling it how close a given configuration is to the goal state. Configurations which are closer can be given more attention than less promising configurations. To see the effect of using a heuristic, try running the eight-puzzle applet but change the value in the "Heuristic" list to "None". You may need to shuffle the board a couple of times to find a difficult configuration. You should notice that the computer takes quite a while to find the solution. Then try asking the computer to solve the same configuration (use the "Same Shuffle" button) using the "Manhattan" heuristic. The time required to find the solution is noticeably shorter.
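To make the effect of a heuristic concrete, here is a small sketch that solves the same shuffle twice: once with no heuristic and once with the Manhattan heuristic, which sums each tile's row and column distance from its goal square. The search used here is A*, which is an assumption; the lesson does not say which algorithm the applet uses. The board representation, the starting shuffle, and the function names are likewise illustrative.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the empty square (assumed layout)

def neighbors(state):
    """Yield every configuration reachable by sliding one tile into the gap."""
    gap = state.index(0)
    row, col = divmod(gap, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            nxt = list(state)
            nxt[gap], nxt[r * 3 + c] = nxt[r * 3 + c], nxt[gap]
            yield tuple(nxt)

def manhattan(state):
    """Sum of each tile's row and column distance from its goal square."""
    total = 0
    for index, tile in enumerate(state):
        if tile != 0:
            goal_index = tile - 1
            total += abs(index // 3 - goal_index // 3)
            total += abs(index % 3 - goal_index % 3)
    return total

def no_heuristic(state):
    """Stands in for the applet's "None" setting: every state looks equally good."""
    return 0

def search(start, heuristic):
    """A* search; returns (moves in the solution, configurations expanded)."""
    frontier = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if cost > best_cost[state]:
            continue                  # a cheaper route to this state was found
        if state == GOAL:
            return cost, expanded
        expanded += 1
        for nxt in neighbors(state):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None

start = (8, 1, 3, 4, 0, 2, 7, 6, 5)   # an arbitrary solvable shuffle
plain = search(start, no_heuristic)
guided = search(start, manhattan)
print("no heuristic:", plain)
print("manhattan:   ", guided)
```

Both runs find a solution of the same length, but the guided search expands fewer configurations because it sets aside states that are clearly far from the goal.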

In recent years, researchers have combined sophisticated heuristics and increased computing capability to create formidable computer opponents in the games of chess and checkers. As mentioned before, a computer program now officially holds the world title for checkers. In 1997, a chess program designed by IBM researchers played against world champion Garry Kasparov in a six game match and defeated him. Details of the match can be found at a special IBM web site. Both checkers and chess programs construct search trees similar to the example we saw with the 8-puzzle. Then the programs search through the tree for the most advantageous move. Since computers can perform calculations so quickly, these programs can analyze a tremendous number of moves in a single second. For example, the computer which Kasparov played against could examine and evaluate up to 200,000,000 chess positions per second while Kasparov himself could only examine and evaluate up to three chess positions per second [IBM 1997]. While it might seem that Kasparov's odds against such computing power are hopeless, the reality is that most of the moves the computer examines are useless. Since the computer lacks experience and intuition, it must examine each possible move at every step of the game. On the other hand, chess masters like Kasparov have the ability to immediately recognize that some moves are useless. This allows them to think more efficiently and focus their attention on certain strategies that have the most potential.
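One standard way to choose the most advantageous move from a game tree is minimax: assume the opponent always picks the reply that is worst for us, and choose the move whose worst case is best. The lesson does not name the algorithm the chess and checkers programs use, so treat the sketch below as an illustration of the idea only. The tree and its scores are hypothetical.

```python
def minimax(node, maximizing):
    """Return the best score the player to move can guarantee from this node."""
    if isinstance(node, int):          # leaf: a position that has been scored
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Hypothetical two-move lookahead: three candidate moves, each answered by
# three possible replies whose resulting positions have been scored.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

best_move = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
print(best_move, minimax(tree, True))  # 0 3
```

Note that every leaf is visited, including those under moves that turn out to be poor. This is the kind of exhaustive examination the paragraph above contrasts with a chess master's ability to dismiss useless moves immediately.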

References

● Brookshear, J. G. (1997), Computer Science: An Overview, Fifth Edition, Addison-Wesley, Reading, MA, pp. 361.

● Opdorp, G. (1997), "Eight-puzzle applet," http://www.aie.nl/~geert/java/public/EightPuzzle.html.

● IBM (1997), "Kasparov Versus Deep Blue," http://www.research.ibm.com/deepblue/meet/html/d.2.html.

Online CS Modules: Neural Networks


Neural Networks

In the previous lesson, we saw an example of pattern recognition implemented with an artificial neural network. An artificial neural network (ANN) is a collection of processing units that are connected together in a manner similar to neurons in the brain. Recall that our brains are constructed of billions of neurons that are each interconnected with thousands of neighboring neurons. The electrochemical signals that travel through these connections are altered based on the configuration of the neurons and the type of neuron encountered. To mimic this design, ANNs simulate neurons with processing units that have connections with other processing units. The diagram below shows an example of a processing unit [Brookshear 1997].

In our example, the processing unit has three inputs labeled X1 - X3 which can be either 1 or 0. Each of these inputs is associated with a certain weight (W1 - W3) which represents the relative strength of the input. The effective input to the processing unit is computed by taking the weighted sum of all the inputs: X1W1 + X2W2 + X3W3. Finally, the effective input is compared with a threshold value stored in the processing unit. If the effective input is greater than the threshold value, the processing unit produces an output of 1; otherwise the unit produces an output of 0. Consider the following example. Suppose our inputs are 1, 0, 1 respectively, our weights are 1, .5, -2 respectively, and our threshold value is 1. The effective input for this unit would be (1*1) + (0*.5) + (1*-2) or -1. Since this input is less than the threshold value of 1, the unit would output a zero.
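The processing unit described above is simple enough to write out directly. The function name `processing_unit` is only illustrative; the inputs, weights, and threshold are the ones from the worked example.

```python
def processing_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    effective_input = sum(x * w for x, w in zip(inputs, weights))
    return 1 if effective_input > threshold else 0

inputs = [1, 0, 1]
weights = [1, 0.5, -2]

effective = sum(x * w for x, w in zip(inputs, weights))
output = processing_unit(inputs, weights, threshold=1)
print(effective, output)  # -1.0 0
```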

The power of neural networks comes from linking these processing units together so that the output of one unit becomes the input of the next unit. Such networks can be trained for specific applications through a process of fine-tuning the weights in each processing unit. The training process works as follows:

1. The network is given a set of inputs for which the correct output is known.
2. The output of the network is compared with the known correct output, and the error is measured.
3. The weights of the network are adjusted in order to reduce the output error, and the training process is repeated.
4. The training continues until the network reaches an acceptable error level on the test input.
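The four training steps can be sketched for a single processing unit. The lesson does not say how the weights are adjusted, so the update used here (the classic perceptron rule, nudging each weight in the direction that reduces the error) is an assumption, as are the function names and the choice of the logical AND function as training data.

```python
def output(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def train(examples, rate=1, max_rounds=100):
    """Adjust weights and threshold until every example is classified correctly."""
    weights = [0, 0]
    threshold = 0
    for _ in range(max_rounds):
        errors = 0
        for inputs, target in examples:               # step 1: known answers
            error = target - output(inputs, weights, threshold)  # step 2
            if error != 0:
                errors += 1
                # step 3: move each weight so as to reduce the error
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                threshold -= rate * error
        if errors == 0:                               # step 4: acceptable error
            break
    return weights, threshold

# Training data: the logical AND of two inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(AND)
results = [output(x, weights, threshold) for x, _ in AND]
print(results)  # [0, 0, 0, 1]
```

A single unit can only learn patterns that one weighted sum can separate; networks of linked units need more elaborate adjustment schemes than this one.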

While artificial neural networks are only a very crude model of the complexity of biological networks (like the brain), they are still powerful tools for solving certain problems in AI such as pattern recognition that are difficult to describe with algorithms. Consider the following images:

We can easily recognize that each of these images is a picture of the number three. Even the last image with all the noise is still recognizable without difficulty. However, for a computer, the task of recognizing the similarity of these images is notoriously difficult. Such a process is not easily described by an algorithm. If you were to ask a friend how he or she can recognize the three, your friend might respond with the answer "Because I've seen a three before!" At first, this seems like a simplistic answer, but as we will see it is really the heart of how neural networks perform pattern recognition.

Neural networks are effective for solving problems with the following characteristics [Smith 1996]:

1. Problems where we can't formulate an algorithmic solution.
2. Problems where we can get lots of examples of the behavior we require.
3. Problems where we need to pick out the structure from existing data.

Pattern recognition problems such as handwriting and speech recognition fit these characteristics since they lack clear algorithmic descriptions, have an abundance of example data (e.g. samples of speech and writing), and have clear structures which must be recognized (e.g. the number three or the sound "hello"). Using the example data, a neural network can be "trained" to recognize certain patterns. Of course, writing and speech are not the only data that can be used for pattern recognition. The table below [AI Intelligence 2000] shows other possibilities for neural networks.

Input to the network                       Output from the network

Input: Digitized Images                    Output:
of a face                                  the person's name
of an aircraft                             the category of aircraft: friendly or hostile
of a typed or handwritten character        the ASCII value of the character
a solder joint                             the quality level of the joint

Input: Sensor Readings                     Output:
from an industrial process                 the adjustments needed to keep the process within quality and safety limits
from a gas turbine                         whether or not maintenance is due on the turbine
from an infrared detector                  how many people are in a room

Input: Financial or Marketing Data         Output:
recent share price values                  a buy/sell indicator
personal financial details                 the creditworthiness of a customer
exchange rates and inflation trend         the predicted movement of exchange rates in four hours' time
a customer's historical buying patterns    the likely response of the customer to a direct mail campaign

In the previous lesson, you saw a simple handwriting recognition applet that used a simulated neural network for training and recognition of characters. The following description explains the basic logic behind the applet:

"Assume that we want a network to recognize handwritten digits. We might use an array of, say, 256 sensors, each recording the presence or absence of ink in a small area of a single digit. The network would therefore need 256 input units (one for each sensor), 10 output units (one for each kind of digit) and a number of hidden units. For each kind of digit recorded by the sensors, the network should produce high activity in the appropriate output unit and low activity in the other output units. To train the network, we present an image of a digit and compare the actual activity of the 10 output units with the desired activity. We then calculate the error, which is defined as the square of the difference between the actual and the desired activities. Next we change the weight of each connection so as to reduce the error. We repeat this training process for many different images of each kind of digit until the network classifies every image correctly" [Stergiou 1996].

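The error calculation in the quoted description can be sketched as follows. Two things here are assumptions rather than part of the quotation: "high" and "low" activity are taken to be 1.0 and 0.0, and the squared differences are summed over the 10 output units to give a single number. The activity values are hypothetical.

```python
def desired_activity(digit):
    """High activity (1.0) in the unit for `digit`, low (0.0) in the other nine."""
    return [1.0 if unit == digit else 0.0 for unit in range(10)]

def squared_error(actual, desired):
    """Sum over the output units of (actual - desired) squared."""
    return sum((a - d) ** 2 for a, d in zip(actual, desired))

# Hypothetical activities of the 10 output units after showing an image of a 3:
# the network cannot yet decide between a 2 and a 3.
actual = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
error = squared_error(actual, desired_activity(3))
print(error)  # 0.5
```

Training then consists of changing the connection weights so that this number shrinks for every training image.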
Now launch the JRec applet, and let's watch this process happen.

Instructions:

1. Use the mouse to draw a character in the applet window. Note that our input is already digitized since we are using the mouse.
2. Store the character by pressing one of the label buttons (0-9). This saves our digitized input.
3. Press "Clear" to erase your character.
4. Press the "Train" button to train the applet with the character set. Notice that the graph which appears is measuring the error. The network is trained repeatedly until the error is less than 0.1%.
5. Redraw one of the characters and press "Test" to recognize the character.

References

● Brookshear, J. G. (1997), Computer Science: An Overview, Fifth Edition, Addison-Wesley, Reading, MA, pp. 378.

● Smith, L. (1996), "An Introduction to Neural Networks," http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html.

● AI Intelligence (2000), "Neural Networks," http://aiintelligence.com/aii-info/techs/nn.htm.

● Stergiou, C. (1996), "Neural Networks, the Human Brain and Learning," http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol2/cs11/article2.html.





Sample OCR Scans -- Electronic Text Center

Optical Character Recognition: Some Sample Scans

Electronic Text Center

Alderman Library, University of Virginia

Charlottesville, VA 22903

(804 924-3230)

This document is a supplement to the scanning helpsheets, and therefore it does not offer guidance on how to set up a scanner or how to use OmniPage Professional.

These few examples show some typical results from scanning different types of printed texts.

Note: These test scans were made in May 1998 using OmniPage Pro, version 8. The results of each scan were exported directly as HTML except in the case of the Middle English text, which was exported as Rich Text Format and then converted to HTML.

A modern text, well printed and of good typesize.

G. Thomas Tanselle. "The Life and Work of Fredson Bowers." Studies in Bibliography 46, p. 1.

Very good results. Almost no errors; rule lines at top are ignored but cause no problem; the software misses the large initial letter, and an unnecessary hard return is inserted within the first paragraph. Small capitals in the first line are retained, as are italics throughout. Minimal post-OCR cleanup required. Text of this print quality should not pose any significant problems.



An image of the source text

The results of the OCR scan

A modern mass-produced paperback. Good typesize.

E. Arnot Robertson. Ordinary Families. London: Virago, 1982. p. 15.

Good results. Very few errors: there is no space between "for, it" in the fourth line, and a hyphen has been erroneously inserted before "to" in the sixth line. Scans well, despite the lower quality of the printing in comparison to example 1 above (the good typesize makes up for the poorer print quality).



A printed, glossy brochure. Adequate typesize.

Special Collections of the University of Virginia Library. 1995.

Good results. Few errors: the software stumbles over the apostrophe in "University's" in two places, and it inserts erroneous umlauts over the letter u in "full" in the first paragraph and in "fund" in the third paragraph; in the fourth paragraph "F." is mistaken for "E".

Note: One can expect similar results from a clear printout from a laser printer or a good, clean photocopy with adequate typesize.






A Middle English text.

Frances E. Richardson, ed. Sir Eglamour of Artois. London, New York, Toronto: Oxford University Press, 1965.

Good results. The software represents words it does not know (i.e., that are not in its dictionary) in green; in this case most of the text is in green, which may appear alarming. However, an inspection of the text reveals no major errors in the poem, and only a few errors in the footnotes. Naturally the software sees the thorn (þ) as a lower-case p and the yogh as a numeral 3. However, it is possible to train OmniPage to recognize the thorn and yogh (choose Edit Training File... from the Tools menu) and to represent them with the abbreviated SGML entity references &t; and &y;. Note that after training the software it does a good job of recognizing the Middle English characters.


The results of the OCR scan prior to training the software to recognize Middle English characters

The results of the OCR scan after training for Middle English characters

A 19th-century commercial printing

Geoffrey Chaucer. The Complete Works of Geoffrey Chaucer. Rev. Walter W. Skeat, ed. Oxford: Clarendon Press, 1894.

Generally good results, as one typically finds with most 19th century printings (and some 18th century ones) of reasonable clarity, average typesize, and a font style that is not ornate or archaic. There are very few errors in the poem itself: the software fails to recognize the ë (e with umlaut) in line 10. There are several errors in the footnotes, as one may expect with a very small print size. Minimal cleanup required in the poem, but the footnotes would have to be checked carefully.



A 17th-century printing




Plaine Description of the Barmudas. London: W. Stansby, 1613.

A complete disaster. The resulting text is riddled with errors. The preponderance of unfamiliar letter forms (the long S) and ligatures, and the broken type, causes an unacceptable error rate. This will be true even if you train the OCR software to recognize some of the ligatures, although training will cut the error rate somewhat.

The effort of scanning and correcting this type of text is greater than the effort of typing it in manually.




Online CS Modules: Expert Systems


Expert Systems

The final AI application that we will examine is expert systems. These systems are a combination of at least three entities: a database, rules for interpreting the data, and a sophisticated algorithm for searching the database by applying the rules. Usually these systems are very domain specific because of the large number of rules necessary to represent expert knowledge in a given domain. According to Ashwin Ram [1993] of Georgia Institute of Technology, expert systems exhibit the following characteristics:

1. Solve expert problems by expert knowledge, or handle tasks that require detailed knowledge in a particular area.
2. Operate in a micro-world where a particular kind of problem solving is required.
3. Encapsulate a significant portion of the specialized knowledge that an expert human problem solver would bring to bear.
4. Exhibit performance approaching that of an expert.
5. Usually built on a production system with certainty values attached to hypotheses.

Now consider an example expert system like MYCIN, one of the first expert systems developed for diagnosing medical problems. According to Alison Cawsey [1994] of Heriot-Watt University, "MYCIN was an expert system developed at Stanford in the 1970s. Its job was to diagnose and recommend treatment for certain blood infections. To do the diagnosis 'properly' involves growing cultures of the infecting organism. Unfortunately this takes around 48 hours, and if doctors waited until this was complete their patient might be dead! So, doctors have to come up with quick guesses about likely problems from the available data, and use these guesses to provide a 'covering' treatment where drugs are given which should deal with any possible problem. MYCIN was developed partly in order to explore how human experts make these rough (but important) guesses based on partial information."

Notice that MYCIN matches the characteristics listed earlier. First, it was designed to solve expert problems in a particular area, namely medicine. Second, it solves a particular kind of problem, namely diagnosing blood infections. Third, it encapsulates the knowledge that is usually held by an experienced doctor. Fourth, it exhibited performance on a level with human experts. In fact, in some tests, MYCIN even outperformed members of the Stanford medical school! Fifth, MYCIN was built using a system of rules that allowed it to logically compute a diagnosis. These rules represented the "expert knowledge" in a sense. The table below shows one of the many rules in the MYCIN system. Notice that the conclusion (the THEN clause) includes a certainty value. Given the preceding facts, the system would make the diagnosis of staphylococcus with 70% confidence.




IF the stain of the organism is gram-positive
AND the morphology of the organism is coccus
AND the growth conformation of the organism is clumps
THEN (0.7) the identity of the organism is staphylococcus

An example rule from the MYCIN system

We can represent the general architecture of expert systems by the diagram below [Ram 1993]. Notice this diagram includes all three parts of our definition: a database, rules for interpreting the data (knowledge base), and a sophisticated algorithm for searching the database by applying the rules (inference engine). For the MYCIN system, the knowledge base consisted of rules similar to the one above. The database of facts would include observations about the blood infection that can be applied to the knowledge base. The inference engine uses the relevant facts and rules to reach an expert diagnosis of the situation.
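The three parts named above can be sketched in a few lines of Python. This is only an illustrative sketch, not MYCIN's actual implementation: the names `facts`, `rules`, and `infer`, the dictionary layout, and the simple "apply every matching rule once" strategy are all assumptions made for this example. Only the rule itself is taken from the lesson.

```python
# Database: observed facts about the organism (attribute -> value).
facts = {
    "stain": "gram-positive",
    "morphology": "coccus",
    "growth conformation": "clumps",
}

# Knowledge base: each rule has IF-conditions, a conclusion, and a certainty value.
rules = [
    {
        "if": {
            "stain": "gram-positive",
            "morphology": "coccus",
            "growth conformation": "clumps",
        },
        "then": ("identity", "staphylococcus"),
        "certainty": 0.7,
    },
]


def infer(facts, rules):
    """Hypothetical inference engine: fire every rule whose conditions all match the facts."""
    conclusions = []
    for rule in rules:
        if all(facts.get(attr) == value for attr, value in rule["if"].items()):
            attribute, value = rule["then"]
            conclusions.append((attribute, value, rule["certainty"]))
    return conclusions


for attribute, value, certainty in infer(facts, rules):
    print(f"The {attribute} of the organism is {value} (certainty {certainty})")
```

A real expert system would hold many rules and would chain them, using the conclusion of one rule as a fact for another. The sketch stops after a single pass to keep the three parts easy to see.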

Some simple expert systems for various topics of interest are available on the web. While these systems do not approach the complexity of commercial expert systems, they still follow the same basic model. The table below lists each of the systems and their purpose. Click on the links below to open the system in a new window, and see if you can identify the parts that correspond to the definition of an expert system.

Expert System | Purpose | Link
Exsel | Expert system for brush and weed control technology selection in Texas. | http://cnrit.tamu.edu/rsg/exsel/work/exsel.cgi
Whale Watcher | Aids in the identification of whales. | http://www2.aiinc.ca/demos/javawhale.html
Pest Identification | Identifies insects which attack Douglas Fir cones. | http://www.for.gov.bc.ca/TIP/IID/pestmain.htm
Grad Admissions Advisor | Expert system application for screening applications for admission to graduate school. | http://www.aiinc.ca/demos/grad.html
Assorted Examples | A collection of expert systems ranging from animal identification to computer troubleshooting. | http://demo.multilogic.com/exsysweb/

References

● Cawsey, A. (1994), "Databases and Artificial Intelligence Lecture Notes," Department of Computing and Electrical Engineering, Heriot-Watt University, Edinburgh, UK, http://www.cee.hw.ac.uk/~alison/ai3notes/section2_5_5.html.

● Ram, A. (1993), "Artificial Intelligence Lecture Notes," College of Computing, Georgia Institute of Technology, Atlanta, Georgia, http://www.cc.gatech.edu/classes/cs3361_98_winter/expert.txt.



Neural Networks

In the previous lesson, we saw an example of pattern recognition implemented with an artificial neural network. An artificial neural network (ANN) is a collection of processing units that are connected together in a manner similar to neurons in the brain. Recall that our brains are constructed of billions of neurons that are each interconnected with thousands of neighboring neurons. The electrochemical signals that travel through these connections are altered based on the configuration of the neurons and the type of neuron encountered. To mimic this design, ANNs simulate neurons with processing units that have connections with other processing units. The diagram below shows an example of a processing unit [Brookshear 1997].

In our example, the processing unit has three inputs labeled X1 - X3 which can be either 1 or 0. Each of these inputs is associated with a certain weight (W1 - W3) which represents the relative strength of the input. The effective input to the processing unit is computed by taking the weighted sum of all the inputs: X1W1 + X2W2 + X3W3. Finally, the effective input is compared with a threshold value stored in the processing unit. If the effective input is greater than the threshold value, the processing unit produces an output of 1; otherwise the unit produces an output of 0. Consider the following example. Suppose our inputs are 1, 0, 1 respectively, our weights are 1, .5, -2 respectively, and our threshold value is 1. The effective input for this unit would be (1*1) + (0*.5) + (1*-2) or -1. Since this input is less than the threshold value of 1, the unit would output a zero.
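The processing unit just described is small enough to write out directly. The following Python sketch uses the inputs, weights, and threshold from the example; the function name `unit_output` is simply a label chosen for this illustration.

```python
def unit_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    effective_input = sum(x * w for x, w in zip(inputs, weights))
    return 1 if effective_input > threshold else 0


# Values from the example in the text.
inputs = [1, 0, 1]
weights = [1, 0.5, -2]
threshold = 1

# The effective input is (1*1) + (0*0.5) + (1*-2) = -1, which is not
# greater than the threshold of 1, so the unit outputs 0.
print(unit_output(inputs, weights, threshold))
```

Changing the inputs to 1, 1, 0 gives an effective input of 1.5, which is greater than the threshold, so the same unit would then output 1.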

The power of neural networks comes from linking these processing units together so that the output of one unit becomes the input of the next unit. Such networks can be trained for specific applications through a process of fine-tuning the weights in each processing unit. The training process works as follows:

1. The network is given a set of inputs for which the correct output is known.
2. The output of the network is compared with the known correct output, and the error is measured.
3. The weights of the network are adjusted in order to reduce the output error, and the training process is repeated.
4. The training continues until the network reaches an acceptable error level on the test input.
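The four training steps can be tried out on a single processing unit. The lesson does not say how the weights should be adjusted, so the sketch below assumes one simple, well-known choice, the perceptron learning rule. The function names, the learning rate of 0.1, and the task (learning the logical AND of two inputs) are likewise assumptions made for illustration.

```python
def unit_output(inputs, weights, threshold):
    """Threshold unit: 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    effective_input = sum(x * w for x, w in zip(inputs, weights))
    return 1 if effective_input > threshold else 0


def train(examples, weights, threshold, rate=0.1, max_epochs=100):
    """Repeat steps 1-3 until every example is classified correctly (step 4)."""
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:          # step 1: inputs with known outputs
            error = target - unit_output(inputs, weights, threshold)  # step 2
            if error != 0:
                mistakes += 1
                # Step 3 (perceptron rule, an assumption): nudge each weight
                # and the threshold in the direction that reduces the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                threshold -= rate * error
        if mistakes == 0:                         # step 4: acceptable error reached
            break
    return weights, threshold


# Training set for logical AND: output 1 only when both inputs are 1.
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, threshold = train(examples, weights=[0.0, 0.0], threshold=0.0)

for inputs, target in examples:
    print(inputs, "->", unit_output(inputs, weights, threshold), "(expected", target, ")")
```

A single unit can only learn patterns that one weighted sum and one threshold can separate. Networks of linked units need a more elaborate adjustment rule, but the loop of measuring the error and adjusting the weights is the same.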

While artificial neural networks are only a very crude model of the complexity of biological networks (like the brain), they are still powerful tools for solving certain problems in AI such as pattern recognition that are difficult to describe with algorithms. Consider the following images:

We can easily recognize that each of these images is a picture of the number three. Even the last image with all the noise is still recognizable without difficulty. However, for a computer, the task of recognizing the similarity of these images is notoriously difficult. Such a process is not easily described by an algorithm. If you were to ask a friend how he or she can recognize the three, your friend might respond with the answer "Because I've seen a three before!" At first, this seems like a simplistic answer, but as we will see it is really the heart of how neural networks perform pattern recognition.


Neural networks are effective for solving problems with the following characteristics [Smith 1996]:

1. Problems where we can't formulate an algorithmic solution.
2. Problems where we can get lots of examples of the behavior we require.
3. Problems where we need to pick out the structure from existing data.

Pattern recognition problems such as handwriting recognition and speech recognition both fit these characteristics since they lack clear algorithmic descriptions, have an abundance of example data (i.e. speech and writing), and have clear structures which must be recognized (e.g. the number three or the sound "hello"). Using the example data, a neural network can be "trained" to recognize certain patterns. Of course, writing and speech are not the only data that can be used for pattern recognition. The table below [AI Intelligence 2000] shows other possibilities for neural networks.

Input to the network | Output from the network

Input: Digitized Images | Output:
of a face | the person's name
of an aircraft | the category of aircraft: friendly or hostile
of a typed or handwritten character | the ASCII value of the character
of a solder joint | the quality level of the joint

Input: Sensor Readings | Output:
from an industrial process | the adjustments needed to keep the process within quality and safety limits
from a gas turbine | whether or not maintenance is due on the turbine
from an infrared detector | how many people are in a room

Input: Financial or Marketing Data | Output:
recent share price values | a buy/sell indicator
personal financial details | the creditworthiness of a customer
exchange rates and inflation trend | the predicted movement of exchange rates in four hours' time
a customer's historical buying patterns | the likely response of the customer to a direct mail campaign

In the previous lesson, you saw a simple handwriting recognition applet that used a simulated neural network for training and recognition of characters. The following description explains the basic logic behind the applet:

"Assume that we want a network to recognize handwritten digits. We might use an array of, say, 256 sensors, each recording the presence or absence of ink in a small area of a single digit. The network would therefore need 256 input units (one for each sensor), 10 output units (one for each kind of digit) and a number of hidden units. For each kind of digit recorded by the sensors, the network should produce high activity in the appropriate output unit and low activity in the other output units. To train the network, we present an image of a digit and compare the actual activity of the 10 output units with the desired activity. We then calculate the error, which is defined as the square of the difference between the actual and the desired activities. Next we change the weight of each connection so as to reduce the error. We repeat this training process for many different images of each kind of digit until the network classifies every image correctly" [Stergiou 1996].
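The error calculation and the weight change in this description can be sketched in Python. To stay short, the sketch makes several simplifying assumptions that are not part of the quoted description or of the applet: it leaves out the hidden units and connects the 256 inputs directly to the 10 outputs, it uses a sigmoid function for each unit's activity, and it adjusts weights by gradient descent with a made-up learning rate and a made-up test image. All function names are hypothetical.

```python
import math

NUM_INPUTS = 256   # one input unit per sensor
NUM_OUTPUTS = 10   # one output unit per kind of digit


def desired_activity(digit):
    """High activity (1.0) in the correct digit's unit, low (0.0) in the others."""
    return [1.0 if d == digit else 0.0 for d in range(NUM_OUTPUTS)]


def actual_activity(weights, image):
    """Activity of each output unit: a sigmoid of its weighted input sum."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(unit_weights, image))))
        for unit_weights in weights
    ]


def squared_error(actual, desired):
    """Sum of the squared differences between actual and desired activities."""
    return sum((a - d) ** 2 for a, d in zip(actual, desired))


def train_step(weights, image, digit, rate=0.01):
    """Change each connection weight slightly in the direction that reduces the error."""
    actual = actual_activity(weights, image)
    desired = desired_activity(digit)
    new_weights = []
    for unit_weights, a, d in zip(weights, actual, desired):
        # Gradient of (a - d)^2 with respect to this unit's weighted input sum.
        delta = 2 * (a - d) * a * (1 - a)
        new_weights.append([w - rate * delta * x for w, x in zip(unit_weights, image)])
    return new_weights


# Made-up "image" for illustration: ink detected by every fourth sensor.
image = [1 if i % 4 == 0 else 0 for i in range(NUM_INPUTS)]
weights = [[0.0] * NUM_INPUTS for _ in range(NUM_OUTPUTS)]

before = squared_error(actual_activity(weights, image), desired_activity(3))
weights = train_step(weights, image, digit=3)
after = squared_error(actual_activity(weights, image), desired_activity(3))
print("error before:", before, "error after:", after)
```

Running the step once lowers the error for this image. Training a real network repeats such steps over many images of every digit, as the quotation describes.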

Now launch the JRec applet, and let's watch this process happen.

Instructions:

1. Use the mouse to draw a character in the applet window. Note that our input is already digitized since we are using the mouse.
2. Store the character by pressing one of the label buttons (0-9). This saves our digitized input.
3. Press "Clear" to erase your character.
4. Press the "Train" button to train the applet with the character set. Notice that the graph which appears is measuring the error. As the training takes place, the network is repeatedly trained until the error is less than 0.1%.
5. Redraw one of the characters and press "Test" to recognize the character.

References

● Brookshear, J. G. (1997), Computer Science: An Overview, Fifth Edition, Addison-Wesley, Reading, MA, p. 378.

● Smith, L. (1996), "An Introduction to Neural Networks," http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html.

● AI Intelligence (2000), "Neural Networks," http://aiintelligence.com/aii-info/techs/nn.htm.

● Stergiou, C. (1996), "Neural Networks, the Human Brain and Learning," http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol2/cs11/article2.html.





Online CS Modules: Summary of Artificial Intelligence


Summary of Artificial Intelligence

Let's take a quick review of all that we have learned about artificial intelligence in these lessons.

1. We introduced the topic of artificial intelligence by discussing the types of problems that computers can solve well and the types of problems that humans can solve well. We concluded that computers are good at well-defined, repetitive computations but poor at complex tasks like reasoning.

2. We compared the human mind to the fastest computers. Although computer technology has grown rapidly in the past 20 years, the human mind still excels over the computer in storage capacity and connection complexity. Computers, however, excel at data transfer since the electrical signals that carry their data travel at close to the speed of light!

3. We investigated the field of natural language processing, and we saw that since the English language is ambiguous, it is difficult for computers to determine the precise meaning of a sentence. Language processing software must consider three aspects of a sentence when determining its meaning: syntax, semantics, and context.

4. We studied how computers approach the problem of game playing. In order for computers to compete at games like chess and checkers, they must search an immense number of potential moves to find a good move. This is accomplished by constructing a game tree to represent the possible moves from a given state of the game. By searching this tree "intelligently", the computer can reduce the number of moves it must consider.

5. We examined two technologies that have emerged from the field of visual processing. The first technology is optical character recognition (OCR), and its primary purpose is to read text from paper and translate the images into a form that the computer can manipulate. The second technology is handwriting recognition, and its primary purpose is to convert handwritten language into machine-editable text.

6. We saw how artificial neural networks can be used to solve problems that do not have an algorithmic representation. One such class of problems is pattern recognition. Artificial neural networks can be trained to recognize certain patterns using known example patterns. Since handwriting recognition is really a special case of pattern recognition, neural networks can be used to solve this problem.

7. We briefly explored the parts of an expert system and saw several examples of these systems online.

