Visualising Typological Relationships: Plotting WALS with Heat Maps

Richard Littauer¹, Rory Turnbull², Alexis Palmer¹
¹ Universität des Saarlandes; ² Ohio State University

Presented at EACL 2012.

Why?

• Data deluge in science.
• Typology has been shown to be useful for linguistic studies (Greenberg 1963, Chomsky 2000, Dunn et al. 2001).

• Visualising typological diversity can cut down on research time and point to new areas of research.

Basic Overview

• Our visualisation technique combines:
  – geographic,
  – phylogenetic, and
  – linguistic data.

• Data source: the World Atlas of Language Structures (WALS; Dryer and Haspelmath, 2011).
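The slides do not show how the heat maps are assembled (the code lives in the repository linked at the end). As a hypothetical sketch only, assuming each language has a (family, sub-family, genus) triple and a dictionary of WALS-style integer value codes, the grid behind a phylogenetically ordered heat map could be built like this. The function name `heatmap_matrix`, the language names and the feature IDs are placeholders, not taken from the authors' code.

```python
from typing import Dict, List, Optional, Tuple

# (family, sub-family, genus), mirroring WALS' three-level hierarchy
Lineage = Tuple[str, str, str]


def heatmap_matrix(
    lineage: Dict[str, Lineage],
    values: Dict[str, Dict[str, int]],
    feature_ids: List[str],
) -> Tuple[List[str], List[List[Optional[int]]]]:
    """Build row labels and a language x feature grid for a heat map.

    Rows are sorted by (family, sub-family, genus, name) so related languages
    end up adjacent; a feature a language lacks in the data is left as None.
    """
    rows = sorted(lineage, key=lambda lang: (*lineage[lang], lang))
    grid = [[values.get(lang, {}).get(f) for f in feature_ids] for lang in rows]
    return rows, grid


# Hypothetical toy data; names, features and value codes are placeholders.
lineage = {
    "lang_c": ("Family2", "Sub2", "Genus3"),
    "lang_a": ("Family1", "Sub1", "Genus1"),
    "lang_b": ("Family1", "Sub1", "Genus2"),
}
values = {
    "lang_a": {"F1": 1, "F2": 2},
    "lang_b": {"F1": 1},
    "lang_c": {"F2": 3},
}
rows, grid = heatmap_matrix(lineage, values, ["F1", "F2"])
print(rows)  # ['lang_a', 'lang_b', 'lang_c']
print(grid)  # [[1, 2], [1, None], [None, 3]]
```

Each cell would then be mapped to a colour, with `None` drawn in a neutral colour so that sparsity stays visible. Sorting rows by distance from a target language instead of by lineage would give a geographically ordered map.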

Previous Work

Similar visualisation work:
  – Language typology: Mayer et al., 2010; Rohrdantz et al., 2010
  – Phylogeny: Multitree, 2009
  – Geographical variation: Wieling et al., 2011

Work with WALS:
  – Daumé and Campbell, 2007; Daumé, 2009

Pruning

WALS:
  – 2,678 languages
  – 192 features (organised into 144 chapters)
  – only 16% of the language × feature grid filled

After pruning:
  – 372 languages
  – an average of 96 features per language
  – only languages with at least 30% of features filled were kept
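The pruning step amounts to a coverage threshold. A minimal sketch, assuming the data is held as a mapping from language to its filled features; the names `prune` and `fill_ratio` are hypothetical, and the toy numbers below are not the 372/96 figures above. It shows how dropping sparse languages raises the overall fill rate of the grid.

```python
from typing import Dict

FeatureValues = Dict[str, Dict[str, int]]  # language -> {feature_id: value code}


def fill_ratio(data: FeatureValues, n_features: int) -> float:
    """Share of the language x feature grid that is filled."""
    if not data:
        return 0.0
    filled = sum(len(vals) for vals in data.values())
    return filled / (len(data) * n_features)


def prune(data: FeatureValues, n_features: int, min_fill: float = 0.30) -> FeatureValues:
    """Keep only languages with at least `min_fill` of the features filled."""
    return {lang: vals for lang, vals in data.items()
            if len(vals) / n_features >= min_fill}


# Hypothetical toy data: 10 features, three languages of varying coverage.
N_FEATURES = 10
data = {
    "lang_a": {f"F{i}": 1 for i in range(5)},  # 50% filled
    "lang_b": {f"F{i}": 1 for i in range(2)},  # 20% filled
    "lang_c": {f"F{i}": 1 for i in range(4)},  # 40% filled
}
kept = prune(data, N_FEATURES)
print(sorted(kept))                            # ['lang_a', 'lang_c']
print(round(fill_ratio(data, N_FEATURES), 3))  # 0.367
print(fill_ratio(kept, N_FEATURES))            # 0.45
```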

Phylogenetic Distance

WALS’ tree hierarchy:
  – Three levels, e.g.
    • Family: ‘Sino-Tibetan’
    • Sub-family: ‘Tibeto-Burman’
    • Genus: ‘Northern Naga’
  – Does not take language contact into account.
  – We used geographical proximity as a proxy for language contact.
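The slides do not define a numeric phylogenetic distance. One simple possibility, offered here purely as an assumption, is to count how high up the three-level hierarchy two languages first diverge. The function `phylo_distance` and the placeholder genus, sub-family and family names are hypothetical; only the Sino-Tibetan example comes from the slide.

```python
from typing import Tuple

Lineage = Tuple[str, str, str]  # (family, sub-family, genus)


def phylo_distance(a: Lineage, b: Lineage) -> int:
    """Hypothetical distance over WALS' three-level hierarchy.

    0 = same genus, 1 = same sub-family but different genus,
    2 = same family but different sub-family, 3 = different family.
    """
    for depth, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 3 - depth  # diverging at the family level (depth 0) scores 3
    return 0


naga = ("Sino-Tibetan", "Tibeto-Burman", "Northern Naga")
print(phylo_distance(naga, naga))                                             # 0
print(phylo_distance(naga, ("Sino-Tibetan", "Tibeto-Burman", "genus_x")))     # 1
print(phylo_distance(naga, ("Sino-Tibetan", "subfamily_x", "genus_x")))       # 2
print(phylo_distance(naga, ("family_x", "subfamily_x", "genus_x")))           # 3
```

Such a measure is blind to contact: two unrelated languages spoken side by side still score 3, which is why geography is brought in as a proxy.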

Geographical Proximity Filtering

• Each language in WALS is associated with a geographical coordinate.

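• Distances between languages are computed with the haversine formula.
• Neighbour selection is limited by geography and by how complete each language's WALS entry is.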
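The haversine formula gives the great-circle distance between two latitude/longitude points. A minimal Python sketch, assuming a spherical Earth with a mean radius of 6,371 km (a common convention, not a value stated in the slides); `haversine_km` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp guards against rounding pushing `a` slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude along the equator is roughly 111.19 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```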

Geographical Proximity Filtering

• First approach:
  – An arbitrary radius around a centroid serves as a decision boundary for clustering neighbouring languages.
  – A radius of 500 kilometres provided a sufficient number of examples after cleaning WALS.
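The radius method can be sketched as a simple distance filter. This assumes the "centroid" is the coordinate of a chosen target language and that neighbours are all other languages within the radius; `neighbours_within` and the toy coordinates are hypothetical. The haversine helper is repeated so the block runs on its own.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def neighbours_within(centre: str, coords: Dict[str, Tuple[float, float]],
                      radius_km: float = 500.0) -> List[str]:
    """Other languages within radius_km of `centre`, nearest first."""
    lat0, lon0 = coords[centre]
    hits = []
    for lang, (lat, lon) in coords.items():
        if lang == centre:
            continue
        d = haversine_km(lat0, lon0, lat, lon)
        if d <= radius_km:
            hits.append((d, lang))
    return [lang for _, lang in sorted(hits)]


# Hypothetical toy coordinates (lat, lon) in degrees.
coords = {
    "lang_a": (0.0, 0.0),
    "lang_b": (0.0, 2.0),  # ~222 km from lang_a
    "lang_c": (0.0, 4.0),  # ~445 km
    "lang_d": (0.0, 5.0),  # ~556 km, outside 500 km
    "lang_e": (3.0, 0.0),  # ~334 km
}
print(neighbours_within("lang_a", coords))  # ['lang_b', 'lang_e', 'lang_c']
```

Changing `radius_km` trades the number of neighbours against how local they are, which is the balance the slide describes when settling on 500 km.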

Geographical Proximity Filtering

• Second approach:
  – Arbitrary lower bound for near languages.
  – Leaves a sufficient remainder.
  – Under-represents contact languages.
  – Worked less well than the radius method.

WALS Languages and Sparsity

Geographically Focused Map

Phylogenetically Focused Map

More Maps

Conclusion

• A newly applied method for looking at sparse data

• Combines phylogenetic, geographic, and typological data

Final Remarks

Future work:
• Integrating Ethnologue or Multitree for language families.
• Further exploration showing a more natural organisation of the linguistic features.

All code and visualisations are available at: https://github.com/RichardLitt/visualizing-language