Introducing an automated subject classifier
Pru Mitchell, Tine Grimston, Robert Parkes
With thanks to: Phil Anderson, Leidos
#vala16 #s27
Cunningham Library
• Services
• ACER staff
• ACER students
• Education community
• Indexing services
Australian Education Index
• First print edition 1957
• Available on Informit as A+ Education, ProQuest, Taiwan
• Indexed by ACER staff and external contract indexers
Indexing varies with staffing levels and budget: “an increasingly onerous task”
[Chart: yearly values from 2006 to 2015, y-axis 0 to 10,000]
Production steps
1. Identification of potential sources
2. Acquisition of identified sources
3. Selection of relevant material from these sources
4. Cataloguing or indexing of selected material
5. Quality assurance of indexed records
6. Dissemination of records to users
The product
One vocabulary to bind them
The Australian Thesaurus of Education Descriptors is shared by the indexing database and the Cunningham catalogue.
• AEI
• EdResearch Online
• Australian Education Research Theses
• IDP Database
• Learning Ground
• BOLDE
Material types: web docs, books, journal articles, conference papers
Machine learning
Automated classification
Why?
• More to index
• Less staff time available
• Increasing metadata feeds instead of print journals
• Increase efficiency
Our story
• 2009: First journal metadata
• 2011: Information Online presentation
• 2012: Increased metadata replacing print journals
• 2013: Feasibility study
• 2014: Initial installation in June, followed by continuous refinement of the system
What is the classifier?
Two processes
1. Training: uses past data to create models of how each subject term should be used
2. Classifier: uses the models to assign subjects to new records based on article title, abstract and journal title
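The two processes can be illustrated with a minimal sketch. The slides do not describe the algorithm inside the Leidos classifier, so this swaps in a simple centroid (Rocchio-style) approach as an assumption: training sums word counts from the title, abstract and journal title of past records into one profile per descriptor, and classification ranks descriptors by cosine similarity to a new record. The record layout, the stopword list and the `threshold` value are hypothetical; `max_terms=13` borrows the maximum reported on the performance slide.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not taken from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
WORD_RE = re.compile(r"[a-z]+")


def bag_of_words(record):
    """Token counts from the three fields the slides name: title, abstract, journal."""
    text = " ".join(record.get(f, "") for f in ("title", "abstract", "journal"))
    return Counter(w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(past_records):
    """Training: one summed word profile (unnormalised centroid) per descriptor."""
    models = defaultdict(Counter)
    for record in past_records:
        words = bag_of_words(record)
        for descriptor in record["descriptors"]:
            models[descriptor].update(words)
    return dict(models)


def classify(models, record, max_terms=13, threshold=0.2):
    """Classifier: rank descriptors by similarity, keep at most max_terms above threshold."""
    words = bag_of_words(record)
    scored = [(cosine(words, profile), d) for d, profile in models.items()]
    kept = [(s, d) for s, d in scored if s >= threshold]
    kept.sort(key=lambda sd: (-sd[0], sd[1]))  # best score first, ties alphabetical
    return [d for _, d in kept[:max_terms]]
```

Term weighting such as TF-IDF and per-descriptor thresholds are common refinements; plain counts keep the sketch short.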
Training the classifier
• Selection of past records: not all are suitable
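Record selection can be expressed as a filter over past records. The criteria below are hypothetical, since the slides say only that not all records are suitable: a record is kept if it has at least one descriptor and an abstract of at least `MIN_ABSTRACT_WORDS` words, a rule suggested by the later finding that classifier performance depends on abstract length. The 50-word cut-off is an arbitrary placeholder.

```python
# Hypothetical cut-off; the slides give no figure.
MIN_ABSTRACT_WORDS = 50


def is_suitable(record, min_words=MIN_ABSTRACT_WORDS):
    """Keep records with a reasonably long abstract and at least one descriptor."""
    abstract = record.get("abstract") or ""  # tolerate missing or None abstracts
    return len(abstract.split()) >= min_words and bool(record.get("descriptors"))


def select_training_records(records, min_words=MIN_ABSTRACT_WORDS):
    """Return only the past records judged suitable for training."""
    return [r for r in records if is_suitable(r, min_words)]
```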
Running the classifier
What the human indexer sees
How the classifier has performed
• Provides a useful set of descriptors on the majority of records
• Average of 11.7 major descriptors assigned per record (Max=13)
• Average of 6.5 “correct” major descriptors per record
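The two averages imply a pooled precision for the suggested major descriptors. Both are averaged over the same N records, so total correct divided by total assigned is 6.5N / 11.7N = 6.5 / 11.7 ≈ 0.56, i.e. a little over half of the suggestions counted as “correct”. The sketch below computes that pooled (micro-averaged) figure from per-record counts, alongside the per-record (macro) average, which can differ and cannot be derived from the slide's averages alone. The function names and the `(assigned, correct)` input shape are illustrative assumptions.

```python
def micro_precision(per_record):
    """Pooled precision: total correct / total assigned over all records."""
    assigned = sum(a for a, _ in per_record)
    correct = sum(c for _, c in per_record)
    return correct / assigned if assigned else 0.0


def macro_precision(per_record):
    """Mean of per-record precision; records with nothing assigned are skipped."""
    ratios = [c / a for a, c in per_record if a]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Using the reported averages as a single pooled pair:
print(round(micro_precision([(11.7, 6.5)]), 2))  # 0.56
```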
Findings
A particular challenge: “Horse-Girl Assemblages: Towards a Post-Human Cartography of Girls' Desire in an Ex-Mining Valleys Community” [Discourse, 35(3)]
• Classifier performance greatly dependent on abstract length, style and level of detail
• ACER indexes a wide variety of material, some of which is not necessarily easy to index using ATED
• For an article's specific topic, ATED may offer only a more general term
• Quality vs efficiency
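Thesauri such as ATED typically link terms through broader-term relations, so a specific topic with no descriptor of its own can be rolled up to its nearest broader descriptor. The sketch walks a hypothetical local map from specific concepts to broader ones until it reaches a concept that exists as a descriptor. All terms here are invented for illustration and are not taken from ATED.

```python
# Hypothetical concept -> broader concept map (not ATED data).
BROADER = {
    "Horse riding": "Recreational activities",
    "Recreational activities": "Leisure",
}
# Hypothetical set of concepts that exist as thesaurus descriptors.
DESCRIPTORS = {"Recreational activities", "Leisure"}


def nearest_descriptor(concept, broader=BROADER, descriptors=DESCRIPTORS):
    """Follow broader links until a descriptor is found; None if the chain ends."""
    seen = set()  # guards against accidental cycles in the map
    while concept is not None and concept not in seen:
        if concept in descriptors:
            return concept
        seen.add(concept)
        concept = broader.get(concept)
    return None
```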
Workflow improvements
Classifier use is increasing as a result of workflow improvements
Publisher feeds
• Taylor & Francis (2009 onwards)
• SAGE (2013 onwards)
• Wiley (2013 onwards)
• Springer (2013 onwards)
• Inderscience (2013 onwards)
• Emerald (in negotiation)
• Many publishers can provide a metadata feed of education journals
• All in XML, but each structured differently
• 24,138 articles received in feeds in 2015, up from 5,006 in 2010
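Because each publisher's XML is structured differently, one approach is a small parser per feed that maps it onto a common record shape. Both layouts below (`<issue>`/`<article>` and `<records>`/`<record>`) are hypothetical stand-ins, not the actual Taylor & Francis, SAGE, Wiley, Springer or Inderscience formats; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def parse_format_a(xml_text):
    """Hypothetical layout A: <issue><journal/><article><title/><abstract/></article></issue>."""
    root = ET.fromstring(xml_text)
    journal = root.findtext("journal", default="").strip()
    for art in root.iter("article"):
        yield {
            "title": art.findtext("title", default="").strip(),
            "abstract": art.findtext("abstract", default="").strip(),
            "journal": journal,
        }


def parse_format_b(xml_text):
    """Hypothetical layout B: journal in an attribute, fields nested under <meta>."""
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):
        yield {
            "title": rec.findtext("meta/articleTitle", default="").strip(),
            "abstract": rec.findtext("meta/summary", default="").strip(),
            "journal": rec.get("source", "").strip(),
        }


# One parser per feed; adding a publisher means adding one entry here.
PARSERS = {"a": parse_format_a, "b": parse_format_b}


def normalise(feed_id, xml_text):
    """Convert one feed's XML into a list of common title/abstract/journal records."""
    return list(PARSERS[feed_id](xml_text))
```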
Lessons
• Indexing from the abstract
• Thesaurus structure
• Metadata
• Process simplification
• Prioritisation
• Indexer experience
• Curation
• Skill set required in team
What next?
• Ongoing development of workflows
• Possible changes to our database structure
• More publisher feeds
• Other ways to get bibliographic metadata into the workflow, e.g. RSS feeds and search alerts from databases
• Develop selection processes further
• Documentation and dissemination
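RSS is one of the routes mentioned for getting metadata into the workflow. Assuming a standard RSS 2.0 feed (`<rss>`, `<channel>`, and `<item>` elements with `<title>`, `<link>` and `<description>`), items could be mapped to the same title/abstract/journal fields the classifier uses. Treating `<description>` as the abstract and the channel title as the journal title is an assumption; alert feeds from real databases vary.

```python
import xml.etree.ElementTree as ET


def records_from_rss(xml_text):
    """Map RSS 2.0 <item> elements to title/abstract/journal/link records.

    Assumption: <description> holds the abstract and the channel <title>
    stands in for the journal title.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:  # not an RSS 2.0 document
        return []
    journal = channel.findtext("title", default="").strip()
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "abstract": item.findtext("description", default="").strip(),
            "journal": journal,
            "link": item.findtext("link", default="").strip(),
        }
        for item in channel.findall("item")
    ]
```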