Timo Honkela: Digital Preservation and Computational Modeling of Language and Culture: Some...

Post on 24-May-2015


A presentation at the symposium “Interfaces between Language, Literature and Culture: Research at Department of Modern Languages” at the University of Helsinki, 19 May 2014


Timo Honkela

27 May 2014

Digital Preservation and Computational Modeling of Language and Culture:

Some Philosophical and Empirical Aspects

timo.honkela@helsinki.fi

Symposium “Interfaces between Language, Literature and Culture:

Research at Department of Modern Languages”

Background

Natural language database interface with dependency-based compositional semantics

● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87.

What is the turnover of the ten largest stock exchange companies in forestry?

Morphological analysis

Dependency parsing

Logical analysis

Database query formation

Result from the SQL database
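The stages listed above can be illustrated with a toy sketch. It is not the system described in the 1988 paper: the lexicon, the table layout and the company data are all hypothetical, and the dependency parsing stage is skipped, so word roles alone fill the query frame.

```python
import sqlite3

# Hypothetical toy lexicon: surface word -> (lemma, role).
LEXICON = {
    "turnover": ("turnover", "attribute"),
    "ten": ("10", "limit"),
    "largest": ("large", "superlative"),
    "companies": ("company", "entity"),
    "forestry": ("forestry", "sector"),
}

# Columns that a question is allowed to ask about (assumed schema).
KNOWN_ATTRIBUTES = {"turnover"}


def morphological_analysis(question):
    """Tokenize and look up lemma and role for each known word."""
    tokens = question.lower().rstrip("?").split()
    return [LEXICON[t] for t in tokens if t in LEXICON]


def logical_analysis(analysed):
    """Collect the analysed words into a flat query frame.

    A real system would derive this from a dependency parse;
    here the role of each word decides its slot.
    """
    return {role: lemma for lemma, role in analysed}


def form_query(frame):
    """Turn the frame into a parameterized SQL query."""
    column = frame["attribute"]
    if column not in KNOWN_ATTRIBUTES:
        raise ValueError("unknown attribute: " + column)
    sql = ("SELECT name, {0} FROM company WHERE sector = ? "
           "ORDER BY {0} DESC LIMIT ?").format(column)
    return sql, (frame["sector"], int(frame["limit"]))


def answer(question, conn):
    """Run all stages and return the rows from the database."""
    frame = logical_analysis(morphological_analysis(question))
    sql, params = form_query(frame)
    return conn.execute(sql, params).fetchall()


def demo_database():
    """Build an in-memory database with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (name TEXT, sector TEXT, turnover REAL)")
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)", [
        ("Alpha", "forestry", 120.0),
        ("Beta", "forestry", 300.0),
        ("Gamma", "retail", 500.0),
    ])
    return conn
```

Calling `answer(...)` with the example question and `demo_database()` returns the forestry companies ordered by turnover. The column name is checked against a whitelist because SQL parameters cannot stand in for identifiers.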

Classical example: Learning meaning from context:

Maps of words in Grimm fairy tales

Honkela, Pulkki & Kohonen 1995
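The idea of learning meaning from context can be sketched with plain co-occurrence counts. This is only an illustration of the input side: the cited work organized such context information with a self-organizing map, which is not reproduced here, and the sentences below are invented rather than taken from the Grimm tales.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Count, for every word, the words occurring within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical miniature corpus.
SENTENCES = [
    "the king rode home",
    "the queen rode home",
    "the king slept late",
    "the queen slept late",
    "a wolf ate quickly",
]
```

With this corpus, "king" and "queen" receive identical context vectors because they appear in the same surroundings, while "wolf" shares no context words with either.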

Map of Finnish Science

Chemistry

Physics and engineering

Biosciences

Medicine

Culture and society

WordICA

Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010.

Jaakko J. Väyrynen, Lasse Lindqvist, and Timo Honkela. Sparse distributed representations for words with thresholded independent component analysis. In Proceedings of IJCNN'07, pages 1031–1036, 2007.

Learning taxonomies

Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue, and Timo Honkela. Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3):1138–1148, 2012.

Central Interests:

Contextuality and Subjectivity

Meaning is contextual

red wine, red skin, red shirt

Gärdenfors: Conceptual Spaces

Hardin: Color for Philosophers

Meaning is contextual

SNOW-WHITE?

WHITE

Meaning is contextual

● “Small”, “big”
● “White house”
● “Get”
● “Every”: “Every Swede is tall/blond”
● etc.

Another comment:

Strict compositionality cannot be assumed

Fuzziness

Meaning is subjective

Meaning is subjective

● Good
● Fair
● Useful
● Scientific
● Democratic
● Sustainable
● etc.

A proper theory of meaning has to take this into account

Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.

Intermediate conclusion

● Languages, including formal languages, should be considered as tools for coordination, storing and sharing knowledge in a compressed form – approximate and relative to the point of view taken

● Constructing a language or symbol system is an investment, and spreading the language into use in a community is an even larger one

Digital Humanities

Digital humanities

● Research within humanities with the help of computers

– Digital resources

– Computational models

● Basic motivation

– One can already fly to the moon and build sophisticated factory products

– The most important open questions in the world are related to humanities and social sciences

Digital Computational Humanities

Content storage and transfer

Content analysis

● Heinz von Foerster in “Responsibilities of Competence” (1972): “The hard sciences are successful because they deal with the soft problems; the soft sciences are struggling because they deal with the hard problems”

Fields of science ordered by the relative share of English-language sections in applications (*)

Mathematics 95.3
Pharmacy 94.1
Chemistry 93.7
Physics 93.4
Biochemistry, molecular biology, microbiology, genetics and biotechnology 93.4
Cell and developmental biology, physiology and ecophysiology 93.4
Computer science 93.0
Electrical engineering and electronics 92.8
Environmental engineering 92.7
Geosciences 92.1
Ecology, evolutionary research and systematics 92.1
Mechanical and manufacturing engineering 91.9
Forest sciences 91.4
Space sciences and astronomy 91.0
Process and materials engineering 90.8
Statistics 90.7
Other research on the environment and natural resources 90.1
Clinical medicine 89.6
Ecotoxicology, state of the environment and environmental impacts 89.5
Nutrition science 89.3
Psychology 89.0
Sport sciences 88.9
Nursing science 88.9
Veterinary medicine 88.5
Public health 88.1
Linguistics 87.6
Philosophy 87.3
Business economics, economic geography and industrial management 87.2
Dentistry 86.7
Economics 86.3
Civil and community engineering 85.9
Agricultural and food sciences 85.4
Environmental policy, economics and law 85.3
Geography 84.8
Architecture and industrial design 83.7
Communication and information sciences 83.1
Education 82.6
Political science and administrative science 82.2
Art research 81.6
Social sciences 80.4
Cultural research 79.3
History and archaeology 78.1
Theology 77.0
Law 70.8

(*) In the corpus of applications submitted to the Academy of Finland

Accessing and analyzing digital resources

Archives

Libraries

Universities

Citizens

Researchers

Media

DIGITAL RESOURCES

Museums

Teachers

Artists

Companies

Societies

Municipalities

State

Decision makers

Journalists

Information specialists

Texts

Images

Videos

Computational models

Numerical data

DIGITAL RESOURCES

Speeches/conversations

Multimedia documents

Interactive systems

Computer software

Resource Meta data

DIGITAL RESOURCES

Resources

Content and information professionals

Users of the contents (professionals and lay people)

Machine learning and pattern recognition systems

Formal metadata

Language technology resources and systems

Other forms of description

Resources

Users of the contents (professionals and lay people)

Other forms of description

Crowdsourcing

Importance of openness

Resources

Machine learning and pattern recognition systems

Formal metadata

Other forms of description

Classification

Clustering

Importance of the availability of data

Challenge:

A tension between the usability and standardization of content descriptions and the richness and evolution of language and its interpretation, genre and style variation, and contextuality, subjectivity and cultural dependence

Computational Methods and Tools

Mainframe computers
Personal computers
Internet
Multimedia
Virtual reality
World Wide Web
Social media
MOOCs
Mobile devices
Cloud services
Games and gamification
3D printing
Big Data
Pattern recognition
Statistical machine learning
Robotics
...

Statistics
Information theory
Probability theory
Dynamical systems theory
...

Implications of machine learning

● Machines are no longer simply doing what they are programmed to do

● Machine learning algorithms are programs in the traditional sense, but they enable evolving “behaviors” of the system based on the “experience” that the system gathers after having been programmed

● This makes it possible for the systems to have a certain level of “conceptual autonomy”: they build their view of some phenomena based on the data/texts/etc. that are given to them
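A minimal sketch can make this point concrete. The perceptron below is one simple learning algorithm chosen for illustration, not a method named in the presentation, and the two training sets are hypothetical. The program text stays fixed, yet the resulting behavior depends on the examples the system has seen.

```python
class OnlinePerceptron:
    """A fixed program whose behavior depends on the examples it has seen."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Return 1 if the weighted sum is positive, otherwise 0."""
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn(self, x, label):
        """Update weights only when the current prediction is wrong."""
        error = label - self.predict(x)
        if error:
            self.weights = [w + self.learning_rate * error * xi
                            for w, xi in zip(self.weights, x)]
            self.bias += self.learning_rate * error


def train(model, examples, epochs=10):
    """Present the examples repeatedly, one at a time."""
    for _ in range(epochs):
        for x, label in examples:
            model.learn(x, label)
    return model


# Two hypothetical "experiences": the same inputs labeled by opposite conventions.
EXPERIENCE_A = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]
EXPERIENCE_B = [([0.0, 1.0], 0), ([1.0, 0.0], 1)]
```

Training one instance on `EXPERIENCE_A` and another on `EXPERIENCE_B` yields two systems that classify the same input in opposite ways, although both run exactly the same code.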

Theories

Data

Models Hypotheses

Conceptual systems

Melissa Bowerman

Max Planck Institute for Psycholinguistics

Space under Construction

Language-Specific Spatial Categorization In First Language Acquisition

Lund University Cognitive Science, 2003

DUTCH: IN, OP, AAN

OPEN

open box, open door, open bag, open envelope, open mouth, open clamshell, open pair of shutters, open latched drawer, open hand, open book, eyes open, open fan

Categorization of 'opening' in English and Korean.

YELTA 'remove barrier to interior space'
PPAYTA 'unfit'
TTUTA 'rise'
TTUTA 'tear away from base'
PELLITA 'separate two parts symmetrically'
PHYELCHITA 'spread out flat thing'

Examples: take off wallpaper, unwrap package, spread legs apart, take off ring, take cassette out of case, sun rises, spread blanket out, peacock spreads tail

(Pye 1995, 1996)

Object types: PLATE, STICK, ROPE, CLOTHES

ENGLISH: break; tear, rip

MANDARIN: può; duàn (long rigid thing)

K’ICHE’ MAYAN: -paxi:j (rock, glass, clay thing); -q’upi:j (other hard thing); -tóqopi’j (long, flexible thing); rach’aqij (“tear”)

http://www.mpi.nl/people/bowerman-melissa

http://www.mpi.nl/people/bowerman-melissa/publications

Processing multimodal information

Acknowledgements:

Finnish Broadcasting Company (YLE)

An example of automatic multimedia content analysis

Jorma Laaksonen

users.ics.aalto.fi/jorma/

scholar.google.com/citations?user=suHzeyIAAAAJ&hl=en

Mikko Kurimo

users.ics.aalto.fi/mikkok/

elec.aalto.fi/en/about/careers/professors/mikko_kurimo/

Speaker recognition

Video analysis / scene classification

Speech recognition (speech to text)

OCR

Movement verbs

David Bailey's thesis (1997):

Verbs related to hand movement

Point of view from cognitive linguistics

● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other.

● For example: the meaning of the word 'walk' involves

– what walking looks like

– what it feels like to walk and after having walked

– how the world looks when walking (e.g. objects approach at a certain speed, etc.)

– ...

Abstract vs concrete grounding

Ronald Langacker

Multimodally Grounded Language Technology

A project funded by the Academy of Finland, 2011-2014

Timo Honkela as the Principal Investigator

A collaboration between the departments of

* Information and Computer Science, and

* Media Technology

Consider how different languages divide the conceptual space in different ways (cf. e.g. Melissa Bowerman et al.)

Förger & Honkela 2013

Analysis of subjectivity

GICA: Grounded Intersubjective Concept Analysis

Analysis of “health” in the State of the Union addresses

Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. In Proceedings of IJCNN 2012.

Thank you for your attention!