Visualising Typological Relationships: Plotting WALS with Heat Maps
Richard Littauer¹, Rory Turnbull², Alexis Palmer¹
¹ Universität des Saarlandes, ² Ohio State University
Why?
• Data deluge in science
• Typology has been shown to be useful for linguistic studies (Greenberg 1963, Chomsky 2000, Dunn et al. 2001).
• Showing typological diversity visually can help cut down on research time and illuminate new areas of possible research.
Basic Overview
• Our visualisation technique combines:
– geographic
– phylogenetic
– linguistic data.
World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2011).
Previous Work
Similar visualisation work:
- Language Typology: Mayer et al., 2010; Rohrdantz et al., 2010
- Phylogeny: Multitree, 2009
- Geographical variation: Wieling et al., 2011
Work with WALS:
- Daumé & Campbell 2007, Daumé 2009
Pruning
WALS:
– 2,678 languages
– 192 feature options (out of 144 features)
– 16% of the data filled
Pruning:
– 372 languages
– Average of 96 features
– Only languages with 30% or more filled
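The pruning rule can be illustrated with a minimal sketch. The table layout (a dictionary from language name to feature values, with None marking a feature WALS leaves empty), the function names, and the toy data are all assumptions made for illustration; the slides state only the 30% threshold.

```python
from typing import Dict, Optional

# Hypothetical layout: language -> {feature id: value, or None when unfilled}.
FeatureTable = Dict[str, Dict[str, Optional[str]]]


def fill_ratio(features: Dict[str, Optional[str]]) -> float:
    """Fraction of a language's features that carry a value."""
    if not features:
        return 0.0
    filled = sum(1 for value in features.values() if value is not None)
    return filled / len(features)


def prune(table: FeatureTable, min_fill: float = 0.30) -> FeatureTable:
    """Keep only languages whose fill ratio is at least min_fill."""
    return {
        lang: feats
        for lang, feats in table.items()
        if fill_ratio(feats) >= min_fill
    }


# Toy data with placeholder names (not taken from WALS).
toy = {
    "lang_a": {"f1": "1", "f2": None, "f3": "2", "f4": None},   # 2 of 4 filled
    "lang_b": {"f1": None, "f2": None, "f3": None, "f4": "1"},  # 1 of 4 filled
}
kept = prune(toy)
```

Here lang_b falls below the threshold and is dropped, while lang_a is kept.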
Phylogenetic Distance
WALS’ Tree Hierarchy:
– Three different levels:
  • Family: ‘Sino-Tibetan’;
  • Sub-family: ‘Tibeto-Burman’;
  • Genus: ‘Northern Naga’.
– Doesn’t take into account language contact.
– We used geographical proximity as a proxy for language contact.
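The slides do not say how phylogenetic distance is computed from the three levels. One simple possibility, given here purely as an assumption, is to count how many levels two languages fail to share, reading from family down to genus. The Classification type, the tree_distance function, and the placeholder names are hypothetical.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    """The three WALS levels, ordered from broadest to narrowest."""
    family: str
    subfamily: str
    genus: str


def tree_distance(a: Classification, b: Classification) -> int:
    """0 = same genus, 1 = same sub-family, 2 = same family, 3 = unrelated.

    Assumed measure: number of levels below the deepest shared level.
    """
    shared = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        shared += 1
    return len(a) - shared


# The first entry uses the example from the slide; the others are placeholders.
naga = Classification("Sino-Tibetan", "Tibeto-Burman", "Northern Naga")
sister = Classification("Sino-Tibetan", "Tibeto-Burman", "genus-X (placeholder)")
outsider = Classification("family-Y (placeholder)", "subfamily-Y", "genus-Y")
```

Under this assumed measure, naga and sister are at distance 1, and naga and outsider are at distance 3.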
Geographical Proximity Filtering
• Each language in WALS is associated with a geographical coordinate.
• Haversine formula
• Within limits: geography, fullness in WALS.
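For reference, the haversine formula gives the great-circle distance between two latitude/longitude points. The sketch below is a standard formulation, not the authors' code; the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

With this radius, one degree of latitude comes out at roughly 111 km, so a 500 km radius spans about four and a half degrees.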
Geographical Proximity Filtering
• First approach:
– Arbitrary radius from centroid in order to create a decision boundary for clustering neighbouring languages.
– 500 kilometres provided a sufficient number of examples after cleaning WALS.
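The radius approach can be sketched as a filter over language coordinates. This is an illustrative reading, not the authors' implementation: the function names, the (latitude, longitude) tuple layout, and the toy coordinates are assumptions; only the 500 km radius comes from the slide.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # assumed mean Earth radius


def haversine_km(p: Coord, q: Coord) -> float:
    """Great-circle distance in kilometres between two coordinates."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = math.radians(q[0] - p[0])
    dlambda = math.radians(q[1] - p[1])
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(
    centroid: Coord, coords: Dict[str, Coord], radius_km: float = 500.0
) -> List[str]:
    """Names of languages inside the radius, nearest first."""
    hits = [(haversine_km(centroid, point), name) for name, point in coords.items()]
    return [name for distance, name in sorted(hits) if distance <= radius_km]


# Toy coordinates with placeholder names (not taken from WALS).
toy_coords = {
    "far": (10.0, 0.0),   # roughly 1,112 km from the centroid
    "edge": (4.0, 0.0),   # roughly 445 km
    "near": (1.0, 1.0),   # roughly 157 km
}
neighbours = within_radius((0.0, 0.0), toy_coords)
```

Sorting by distance keeps the nearest languages first, which suits a layout where proximity to the centroid determines position.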
Geographical Proximity Filtering
• Second approach:
– Arbitrary lower bound for near languages.
– Sufficient remainder.
– Under-representative of contact languages.
– Not as good as the radius method.
WALS Languages and Sparsity
Geographically Focused Map
Phylogenetic Focused Map
More Maps
Conclusion
• A newly applied method for looking at sparse data
• Combines phylogenetic, geographic, and typological data
Final Remarks
Future work:
• Integrating Ethnologue or Multitree for language families.
• Further exploration showing more natural organisation of the linguistic features.
All code and visualisations available here: https://github.com/RichardLitt/visualizing-language