Should a computer compete for language?

An exploration of a computer acquiring natural language using brain modelling, evolution and affective computing.

Why should a computer acquire natural language?

I think it is useful to pursue the quest of teaching a computer natural language, because it would help us cope with the information overload and filter failure we face in our world. Filter failure is a concept that Clay Shirky, a professor at NYU's Interactive Telecommunications Program, coined for the information explosion of the Internet era.1 He points out that there have been more books than anybody could read since the sixteenth century. So information overload is not a problem unique to the Internet. The problem we face today is that the natural filters that used to exist have disappeared. For example, an encyclopedia could only have a limited number of pages, and a television station can only air one program at a time. Look at the modern variants of these media: Wikipedia wants to collect all human knowledge and has over 3.5 million articles, and YouTube gets 24 hours of footage uploaded every minute. The filters of television and paper encyclopedias have ceased to exist, and the floodgates are open.

Putting human filters back in doesn't seem useful, because the amount of data is only getting bigger and the human brain isn't keeping up. A computer might keep up: it is able to run for days or months just analysing texts for relevant content. A computer could then become a personal filter between the enormous amount of data that is available and the material that is relevant to a certain user, understanding which information is relevant in an argument. A computer is able to filter for a specific person, rather than imposing the taste and opinion of human filters. I think the quest of teaching a computer natural language is relevant, and even necessary, to manage the enormous flood of data.

Our Internet is more and more tailor-made for our interests. This seems to solve the problems mentioned above, but it can make them even worse: we will come to live more and more in a filter bubble.2 Our world is tailor-made for us without us knowing what is filtered out. Google will give you different results based on some 60 parameters, without you even being logged in to Google. This is a major issue if you take into account that even news via your social network is filtered by Facebook. You will only see progressive news as a progressive voter and only conservative news as a conservative voter. Your views will never be challenged by an algorithmic gatekeeper, unless such a gatekeeper understood the content and which views oppose each other. Then it could give you a sound, solid argument that opposes your view, or even completely different texts on the Internet that do not share your favourite subjects but do share the same style of writing.

If we don't tackle this problem, we will all float in our own filter bubbles without ever encountering anything that opposes our views. The idea of a free Internet where every voice is equal will be gone, and a great foundation for more extremist views to grow will be laid.


How are we going to teach a computer natural language?

The problems with natural language are big. There is an infinite number of grammatically correct sentences in a language, but an even greater number of sentences that are incorrect. And the words used within these languages aren't even clearly defined. Take the concept of running: a person can run, an engine can run, a river runs and a nose can run. The same word can mean many different things in different sentences. If you want to describe the grammatical rules and the meaning of every word in a language, you will get into trouble, because a word can mean so many things in a different context.

We need breakthrough innovations to tackle the problem of natural language in computers. So how are we going to find these breakthrough innovations?3 First we are going to put the problem of natural language in a new and different context, away from the idea of Turing machines, data and building specific programs. We need to cope with input that is noisy but predictable, and look to heuristics instead of hard-coded rules.

After explaining this context I will describe Hierarchical Temporal Memory (HTM), which can cope with the requirements mentioned above and is modelled on the neocortex.4 This is the brain area where, for example, higher vision, hearing and language are processed. Next to this algorithm I suggest some concepts from affective computing to change the state of the system according to changes in the input.

Evolution and Memes in the context of language

Our human brain is the only device that has been able to acquire language. The human brain is a product of evolution, so perhaps we can learn something from evolution that makes it easier for a computer to acquire language.5 There are three forces that govern evolution:

1. Variation: In the context of genetic evolution these are the mutations that occur when DNA is copied. These variations ensure that novel qualities can arise that might be beneficial to the animal the DNA is present in. Think of a mutation that makes a bacterium resistant to penicillin.

2. Selection: Selection is the force that prevents all variations from surviving. This can be anything in the environment that prevents the transfer of genes, either by killing an animal or by preventing its reproduction. If we take the bacteria example, the selection criterion may be a penicillin-rich environment. Selection would give the advantage to the bacteria that have the penicillin-resistant piece of DNA over the non-resistant bacteria.

3. Heredity: The traits that made a certain DNA mutation survive should pass over to the next generation. This seems obvious, but it is an essential part of evolution: if a trait could not be passed from one generation to the next, the whole process of variation and selection could not be exploited.

Why are these three processes so important? Because they also play a key role in our brains. Next to genetic evolution, the human brain undergoes a memetic evolution: culture, customs and language are not transferred via genes but via so-called memes. So what is a meme? A meme is anything that can be copied between brains. This can literally be any concept, behaviour or word. The same process applies as with genetic evolution. There is variation in memes; a good example is the party game where a sentence is passed along by whispering it in each other's ears. The more people you add to the line, the more the sentence is transformed. You can see this as variation. There is also selection: some ideas stick in people's minds and others don't, and the limited capacity of a human brain, together with the time it would take to pass on all the ideas you have, ensures that some memes are selected over others. And heredity, in the world of memetics, is ensured by the sharing of ideas between brains.

So does this memetic evolution affect language? It does: words get new meanings and new words are added. Some words become old-fashioned, and even complete languages die out because they are not taught to a new generation. Just like a species of animal, a language must adapt to its environment or go the way of the dodo. So a computer that learns language must be as adaptable and ever-changing as language itself. Language is not a data set, it is a process, so the acquisition of language should be process-focused and driven by competition, just as our brains arose from genetic and memetic evolutionary competition.

Imperfect data and time in the context of language

Our brains, the world and our senses are full of noise. Still, our brains are very capable of coping with these imperfections. Compare this to the world of computers and you see that cracking language will be hard with traditional Turing-based computing. Knock out one bit in computer memory and it won't function; change a bit in a file and it will be corrupted. After a heavy night of drinking that knocks out some brain cells, our brain still functions. We can understand a person in a crowded room with a lot of other people talking. Our brains are built to cope with imperfect data. A computer is much better at tasks where accuracy is required, like calculation. Why is this?

I think this has to do with two things. The first is heuristics versus rules. A computer is good at rules that need to be followed in exactly the same way every time. This is great for doing calculations, but not so great for tasks in the natural world. Take, for example, making an algorithm that can distinguish between a cat and a dog: you can feed a computer thousands of images of cats and dogs, but it will not be able to find a general rule. A computer that learns language should be able to cope with heuristics. Heuristics are rules of thumb, not hard-coded if-else statements, and are thus better able to cope with imperfect data. You could argue that this is not really necessary, that the human brain understands a text only if it is grammatically correct. But you can add a lot of noise to a text and a brain will still be able to decipher it. The text below shows how, even with a lot of noise added, a human is still able to get the meaning from it.6

“Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.”

We do not read every individual letter but the word as a whole.
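As a toy illustration of a heuristic rather than a hard-coded rule, the sketch below "reads" a scrambled word the way the quote describes: it keeps the first and last letter fixed and treats the middle letters as an unordered bag, then looks the result up in a vocabulary. The tiny vocabulary and the matching strategy are assumptions made for the example, not a model of human reading.

    def signature(word):
        # Heuristic: first letter, last letter, and the sorted middle letters.
        return (word[0], word[-1], "".join(sorted(word[1:-1])))

    VOCABULARY = ["problem", "letters", "reading", "research"]
    LOOKUP = {signature(w): w for w in VOCABULARY}

    def unscramble(scrambled):
        # Return the vocabulary word with the same signature, or give up.
        return LOOKUP.get(signature(scrambled), scrambled)

    print(unscramble("porbelm"))  # -> "problem"
    print(unscramble("ltteers"))  # -> "letters"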

The shape and height of the different letters also let us read much faster:

“ALL-UPPERCASE TEXT IS MUCH MORE UNPLEASANT TO READ than text with lowercase letters. We recognize different letters by their height.”

For a computer there is no real difference between uppercase and lowercase letters as a data structure. As long as the data is valid, there is no problem for a computer to interpret it, but the scrambled text example above would cause a problem. This points to the second difference between humans and computers: for a computer, validity is more important than structure; for a human, structure is much more important than validity. We can see past a spelling error, but knock out all line breaks, tabs and spaces in a piece of computer code and it becomes unreadable to us. For a computer it is the other way around: the spelling mistake will make a piece of computer code unreadable, while the line breaks, spaces and tabs serve no function. Hard-coded rules demand validity above overall structure; heuristics ask for overall structure, but can handle much messier and more imperfect input.
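To see the validity-versus-structure point in running code, consider strict parsing, with JSON standing in here as an arbitrary example of machine input: knocking out a single character makes the whole input unreadable to the computer, even though a human still sees the structure perfectly well.

    import json

    strict = '{"subject": "language", "relevant": true}'
    noisy = '{"subject": "language", "relevant": true'  # one character knocked out

    print(json.loads(strict))  # parses fine: the input is valid
    try:
        json.loads(noisy)  # a single missing brace breaks everything
    except json.JSONDecodeError as error:
        print("unreadable to the computer:", error)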

A brain model for language

Hierarchical Temporal Memory is a way of modelling the human neocortex. The neocortex is the place where higher vision, hearing and language are processed.4 This is interesting because coping with noisy input is exactly what this part of the brain does, and it is responsible for language. So if there is a way of using brain structures to understand language, it is to the neocortex we should look.

Hierarchical Temporal Memory (HTM) is based on concepts that are interesting for computers understanding language. The main one is that it is temporal: things are recognized in sequence. Language is sequential too; words exist in sentences, and sentences exist in paragraphs. So a temporal system makes sense for understanding language.
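As a toy illustration of what "temporal" buys us, the sketch below learns which word tends to follow which in a training stream and predicts the next element of a sequence. This simple first-order predictor is an assumption for illustration only; the sequence memory in an actual HTM is far more elaborate.

    from collections import Counter, defaultdict

    # For each word, count which words have followed it in the training stream.
    transitions = defaultdict(Counter)

    def train(stream):
        for current, following in zip(stream, stream[1:]):
            transitions[current][following] += 1

    def predict(word):
        # Predict the most frequently observed follower, if any.
        followers = transitions[word]
        return followers.most_common(1)[0][0] if followers else None

    train("the cat runs and the nose runs and the engine runs".split())
    print(predict("runs"))  # -> "and": learned from the sequence, not from rules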


The model is hierarchical, meaning it is organized in a pyramid-like structure with a broad base and a narrow top. This is perfect for language, because such a system could first take up words (the input layer), then a paragraph (level 0), and move up through the hierarchy, eventually answering the questions: what is this text about (level 1), and is it interesting for user X (level 2)?

So how does this HTM work? It is now used mostly for computer vision systems, but because the neocortex structure for language processing is the same as for vision, this is no problem. The system takes a group of inputs, for example pixels, and watches them; if a pattern change occurs, a node fires up to the next layer, but it also fires within its own layer to knock out its neighbours. In this way information travels up the hierarchy and only the most efficient representation "survives": the system competes with itself on each layer of the hierarchy, and the most effective nodes win. This is how different groups of brain cells compete in the brain, with only the most effective paths surviving. This way of representing the world in entities and connections is how our brain works, so it should be effective for interpreting the products of those brains, for example texts. A text can also be represented hierarchically, with words at the bottom, sentences in the next layer, paragraphs in the layer on top of that, and the full text at the top. The text gets summarized into a few nodes or words at the top level. This is how the brain stores information: by finding a common denominator. If you give people a list of fruits, but not the word fruit itself, and quiz them later on the contents of the list, they are sure the word fruit was in the list, because the common label of the individual items is fruit. With the HTM system a computer is able to make this same "mistake". It does not try to retain the complete data of a text, but tries to find the common denominator, the subject of the text.
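A minimal sketch of this bottom-up summarizing idea follows, under the assumption that simple word counting can stand in for HTM's learned representations: each level keeps only the most frequent items from the level below it, so a text is reduced to a few "common denominator" words at the top. The stop-word list and the keep-the-top-k rule are illustrative choices, not part of HTM itself.

    from collections import Counter

    STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in", "it", "must"}

    def summarize_level(units, keep=3):
        # Pool the units one level down and keep only the strongest signals.
        counts = Counter(w for unit in units for w in unit if w not in STOP_WORDS)
        return [word for word, _ in counts.most_common(keep)]

    # Words grouped into sentences (level 0), sentences grouped into paragraphs.
    paragraphs = [
        [["language", "is", "a", "process"], ["language", "must", "adapt"]],
        [["evolution", "shapes", "language"], ["memes", "spread", "language"]],
    ]

    paragraph_summaries = [summarize_level(p) for p in paragraphs]  # level 1
    topic = summarize_level(paragraph_summaries, keep=2)            # level 2
    print(topic)  # -> ['language', ...]: the common denominator of the text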

The HTM method does not search for clear mathematical outlines, but is useful for noisy, heuristic problems. It is well suited to the problems raised in the chapters above: it works well with noisy data, and it competes with itself at every step of the hierarchy, so an "evolutionary" process is facilitated.

Conclusion


To overcome the flood of information that comes in via the Internet and all other forms of media, we need to install new filtering systems. A computer that could understand a text and see whether it is relevant for a user could be a solution. To achieve this we need a breakthrough innovation, because Turing-machine-based algorithms will not suffice to solve the problem of language. So we need to look at the only device that has solved the problem: the human brain. The model we use should, like the human brain, be able to cope with ambiguity and noise. Next to that, it should compete in a "Darwinian" struggle. Hierarchical Temporal Memory has these attributes and is based on the neocortex, the part of the human brain where language resides. To solve the problem of teaching a computer language, this is the model to use.

References

1 Shirky, C. (2008). "It's Not Information Overload. It's Filter Failure." Web 2.0 Expo NY, 19 September 2008. http://blip.tv/web2expo/web-2-0-expo-ny-clay-shirky-shirky-com-it-s-not-information-overload-it-s-filter-failure-1283699

2 Pariser, E. (2011). The Filter Bubble. Penguin Press, 12 May 2011.

3 Baldwin, C. Y., & von Hippel, E. A. (2009). "Modeling a Paradigm Shift: From Producer Innovation to User and Open Collaborative Innovation." Working Paper, Cambridge, 23 December 2009. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1502864

4 Hawkins, J., & George, D. (2007). "Hierarchical Temporal Memory: Concepts, Theory, and Terminology." Numenta Inc., 27 March 2007. http://www.numenta.com/htm-overview/education/Numenta_HTM_Concepts.pdf
See also: Hawkins, J. (2011). "Hierarchical Temporal Memory (Cortical Learning Algorithms)." Numenta Inc., September 2011. http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf

5 Dawkins, R. (1978). The Selfish Gene. Oxford University Press.

6 Rawlinson, G. E. (1976). "The significance of letter position in word recognition." Unpublished PhD thesis, Psychology Department, University of Nottingham, Nottingham, UK.
