New Life for Old Media (NEM presentation)
New Life for Old Media
Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive
Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen
70% audio-visual heritage material
More than 1,000,000 hrs of TV (public broadcasters), radio, music, documentaries, film, commercials, etc.
Photographs, objects, …
CC BY-SA as the preferred license
3,000 items in “Internet Quality”
Polygoon newsreels
Supporting a National and European Audiovisual Commons
Public outreach by embracing new technologies and ‘participatory culture’
Openbeelden.nl / openimages.eu
Explore AI techniques to enrich this archival material to allow for new types of engagement
1. Text-to-speech engine based on the limited recordings of a single narrator
2. Colorization of old black-and-white video footage
Limited Domain Speech Synthesis
Can the current corpus of audio recordings of Bloemendal be used to construct a TTS engine?
• What percentage of the Dutch language can be generated with the current corpus?
• What can we do to improve this?
• How recognizable is the text-to-speech engine as Philip Bloemendal?
• How understandable are the constructed audio files?
Slot-and-filler Text-to-speech
Text: The Dutch football team played Germany
Audio: the.wav dutch.wav football.wav
Spoken Language Elements Repository (35,000 words)
3,300 newsreels, speech recognition
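A slot-and-filler engine of this kind can be sketched as a dictionary from words to recorded clips, followed by concatenation of the clips in sentence order. The sketch below assumes that every entry of the Spoken Language Elements Repository is available as a separate WAV file and that all clips share one sample format; `tokenize`, `synthesize` and the index layout are hypothetical illustrations, not the authors' implementation.

```python
import wave
from pathlib import Path
from typing import Dict, List

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def synthesize(text: str, index: Dict[str, Path], out_path: Path) -> List[str]:
    """Concatenate one recorded clip per word (slot-and-filler).

    Returns the words that have no clip in the (hypothetical) index. In that
    case nothing is written, so a partially spoken sentence is never produced.
    """
    words = tokenize(text)
    missing = [w for w in words if w not in index]
    if missing:
        return missing
    if not words:
        raise ValueError("nothing to synthesize")
    params = None
    with wave.open(str(out_path), "wb") as out:
        for word in words:
            with wave.open(str(index[word]), "rb") as clip:
                clip_params = clip.getparams()
                if params is None:
                    # The first clip fixes the output format.
                    params = clip_params
                    out.setnchannels(params.nchannels)
                    out.setsampwidth(params.sampwidth)
                    out.setframerate(params.framerate)
                elif clip_params[:3] != params[:3]:
                    raise ValueError(f"{word}: clip format differs from the first clip")
                out.writeframes(clip.readframes(clip.getnframes()))
    return []
```

For the slide's example, `synthesize("The Dutch football team played Germany", index, Path("out.wav"))` would return the words the narrator never pronounced, which is exactly the coverage problem the following slides address.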
How to expand the coverage of the index?
• Many (contemporary) words were never pronounced by Philip Bloemendal
• Multiple strategies:
  – Change format (lowercase, diaeresis)
  – Numbers
  – Finding synonyms
  – Decompounding
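The first two strategies (format changes and numbers) amount to generating alternative spellings of a word and looking each one up in the index. The sketch below assumes the index keys are lowercase and written without diaeresis, and only spells out numbers from 0 to 99; the function names and index contents are hypothetical.

```python
import unicodedata
from typing import Dict, List, Optional

UNITS = ["nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen",
         "tien", "elf", "twaalf", "dertien", "veertien", "vijftien", "zestien",
         "zeventien", "achttien", "negentien"]
TENS = ["", "", "twintig", "dertig", "veertig", "vijftig",
        "zestig", "zeventig", "tachtig", "negentig"]


def dutch_number(n: int) -> str:
    """Spell out 0-99 in Dutch, e.g. 21 -> 'eenentwintig'."""
    if not 0 <= n < 100:
        raise ValueError("only 0-99 is handled in this sketch")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # 'en' is written 'ën' after a unit ending in 'e': tweeëntwintig, drieëndertig.
    joiner = "\u00ebn" if UNITS[unit].endswith("e") else "en"
    return UNITS[unit] + joiner + TENS[tens]


def strip_diaeresis(word: str) -> str:
    """'ruïne' -> 'ruine': decompose, drop U+0308 COMBINING DIAERESIS, recompose."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0308", ""))


def candidate_forms(token: str) -> List[str]:
    """Alternative spellings to try, in order of preference."""
    forms = [token, token.lower(), strip_diaeresis(token.lower())]
    if token.isascii() and token.isdigit() and int(token) < 100:
        forms.append(dutch_number(int(token)))
    return list(dict.fromkeys(forms))  # de-duplicate while keeping order


def lookup(token: str, index: Dict[str, str]) -> Optional[str]:
    """Return the clip for the first candidate form found in the index."""
    for form in candidate_forms(token):
        if form in index:
            return index[form]
    return None
```

Ordering the candidates from most to least faithful means an exact match is always preferred over a normalized one.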
Finding Synonyms
• Open Dutch WordNet: a Dutch lexical semantic database (Postma et al. 2016)
• Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd)
• Computationally expensive lookup
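Because the lookup is expensive, one common way to keep the per-word cost low is to invert the synsets into a lemma-to-synset dictionary once, up front. The sketch below assumes the synsets have already been loaded into a plain Python dict; the synset IDs and loader are hypothetical, and no Open Dutch WordNet API is used.

```python
from typing import Dict, Iterable, Optional, Set


def build_lemma_index(synsets: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert synset_id -> lemmas into lemma -> synset_ids (done once)."""
    lemma_to_synsets: Dict[str, Set[str]] = {}
    for synset_id, lemmas in synsets.items():
        for lemma in lemmas:
            lemma_to_synsets.setdefault(lemma.lower(), set()).add(synset_id)
    return lemma_to_synsets


def speakable_synonym(word: str,
                      synsets: Dict[str, Iterable[str]],
                      lemma_index: Dict[str, Set[str]],
                      spoken_index: Set[str]) -> Optional[str]:
    """Return a synonym of `word` that the narrator did pronounce, if any."""
    target = word.lower()
    for synset_id in sorted(lemma_index.get(target, ())):
        for lemma in synsets[synset_id]:
            candidate = lemma.lower()
            if candidate != target and candidate in spoken_index:
                return candidate
    return None
```

With the slide's example, "Hoofdmeester" would be replaced by "rector" if only the latter occurs in the index.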
Decompounding
• Dutch allows words to be compounded, and each compound is a distinct word in the corpus
• Decompounding is computationally expensive (especially for large corpora and long words)
• Constructed bigrams and trigrams, e.g.:
  School, hoofd -> schoolhoofd
  Regen, water -> regenwater
  Staat, hoofd -> staatshoofd (with linking -s-)
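A split-based version of this idea tries every split point and accepts a split into two or three parts (bigrams or trigrams) when every part is in the index. The sketch below is an illustration, not necessarily the authors' method: it tries the longest head first and allows an optional linking "s", as in staat + s + hoofd. It simply drops the linking "s", so in this sketch that sound would have no clip of its own.

```python
from typing import List, Optional, Sequence, Set


def decompound(word: str, index: Set[str], max_parts: int = 3,
               linkers: Sequence[str] = ("", "s")) -> Optional[List[str]]:
    """Split a compound into at most `max_parts` indexed words, or return None."""
    word = word.lower()
    if word in index:
        return [word]
    if max_parts <= 1:
        return None
    # Longest head first: 'school' + 'hoofd' before any shorter prefix.
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head not in index:
            continue
        for link in linkers:
            if not word.startswith(link, i):
                continue
            rest = word[i + len(link):]
            tail = decompound(rest, index, max_parts - 1, linkers) if rest else None
            if tail:
                return [head] + tail
    return None
```

The recursion depth is bounded by `max_parts`, but each level still scans every split point, which is one reason decompounding becomes expensive for long words.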
Four corpora to test against
• News articles (same domain, different time) | 50 articles | 2,743 unique words
• 1970s news articles (same domain, same time) | 50 articles | 16,191 words
• E-books (different domain, various times) | 6 books | 2,657 words
• Tweets (different domain, different time) | 1,000 tweets | 27,180 words
• Evaluation metrics:
  – Number of distinct words
  – Number of sentences
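Both metrics can be computed per corpus with a simple pass over its text. The sketch below assumes that a word counts as covered when some lookup strategy returns a clip for it, and that a sentence counts only if all of its words are covered; these definitions, the naive sentence splitter, and the function name are assumptions for illustration.

```python
import re
from typing import Callable, Dict


def coverage(text: str, can_speak: Callable[[str], bool]) -> Dict[str, float]:
    """Distinct-word and whole-sentence coverage of `text`."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    distinct = set()
    fully_covered = 0
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        distinct.update(words)
        if words and all(can_speak(w) for w in words):
            fully_covered += 1
    covered = sum(1 for w in distinct if can_speak(w))
    return {
        "distinct_words": len(distinct),
        "covered_words": covered,
        "word_coverage": covered / len(distinct) if distinct else 0.0,
        "sentences": len(sentences),
        "fully_covered_sentences": fully_covered,
    }
```

`can_speak` can be as simple as `lambda w: w in index`, or it can chain the format, number, synonym and decompounding fallbacks to measure how much each strategy adds.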
Evaluation
• 8 people tested the software
• Philip Bloemendal was recognized (or identified as ‘that news guy’)
• Words with more consonants were easier to recognize
• Recognition was higher when users entered their own sentences
• Recognition was lower when sentences were played without subtitles
• The speed of the software and its GUI limited testing
How recognizable are the sentences?
Neural Networks
Recent increases in computational power have made deep neural networks practical
Neural networks trained on large training sets can make accurate predictions on real-world examples
Zhang et al. (2016) trained a neural network on over a million images for colorization
http://richzhang.github.io/colorization/
Existing Literature
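In the model by Zhang et al., each pixel's lightness channel L (CIE Lab) is the input, and the network predicts a probability distribution over quantized (a, b) color bins. Their paper collapses this distribution to one color with an "annealed mean" (temperature T = 0.38), which sits between the plain mean (washed-out colors) and the mode (patchy colors). The sketch below shows only that final step on toy inputs; the probabilities and bin centers are made up, and the network itself is not shown.

```python
import math
from typing import Sequence, Tuple


def annealed_mean(probs: Sequence[float],
                  bin_centers: Sequence[Tuple[float, float]],
                  T: float = 0.38) -> Tuple[float, float]:
    """Collapse a per-pixel distribution over (a, b) bins to one color.

    Re-weights z as exp(log z / T), normalized; T = 1 gives the plain mean,
    and T -> 0 approaches the most likely bin.
    """
    logs = [math.log(p) / T if p > 0 else -math.inf for p in probs]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    a = sum(w * c[0] for w, c in zip(weights, bin_centers)) / total
    b = sum(w * c[1] for w, c in zip(weights, bin_centers)) / total
    return a, b
```

The predicted (a, b) values are then recombined with the original L channel and converted back to RGB.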
• Extract individual frames from the video using FFmpeg
• Colorize each frame
• Recompile the video and attach the original audio file
Extract 200x200 frames at 24 fps (ffmpeg) -> colorize with Zhang et al. (implemented in TensorFlow) -> combine into videos (ffmpeg)
Implementation on Video
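The three-step pipeline can be driven from Python by building FFmpeg command lines and running a per-frame colorizer in between. The sketch below uses standard FFmpeg options; the file layout, output codecs, and the `colorize_frame` callback (standing in for a wrapper around Zhang et al.'s model) are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def extract_frames_cmd(video: str, frames_dir: str, fps: int = 24, size: int = 200) -> List[str]:
    """Dump frames as numbered PNGs, resampled to `fps` and scaled to size x size."""
    return ["ffmpeg", "-y", "-i", str(video),
            "-vf", f"fps={fps},scale={size}:{size}",
            str(Path(frames_dir) / "%06d.png")]


def combine_cmd(frames_dir: str, original_video: str, output: str, fps: int = 24) -> List[str]:
    """Encode the colorized frames and copy in the audio track of the original."""
    return ["ffmpeg", "-y",
            "-framerate", str(fps), "-i", str(Path(frames_dir) / "%06d.png"),
            "-i", str(original_video),
            "-map", "0:v", "-map", "1:a?",  # video from frames, audio (if any) from original
            "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
            "-shortest", str(output)]


def colorize_video(video: str, work_dir: str, output: str,
                   colorize_frame: Callable[[Path, Path], None]) -> None:
    """Extract frames, colorize each one independently, then recombine."""
    gray, color = Path(work_dir) / "gray", Path(work_dir) / "color"
    gray.mkdir(parents=True, exist_ok=True)
    color.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_frames_cmd(video, str(gray)), check=True)
    for frame in sorted(gray.glob("*.png")):
        colorize_frame(frame, color / frame.name)  # hypothetical model wrapper
    subprocess.run(combine_cmd(str(color), video, output), check=True)
```

Note that `scale=200:200` ignores the source aspect ratio, matching the 200x200 frames on the slide; a production pipeline would more likely preserve it.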
• Colorized videos are more ‘tangible’ and ‘alive’ than black-and-white footage
• Showing colorized Polygoonjournaals can complement the TTS engine
• Generally positive responses to the technology may increase attention to the NISV collection
Outcome
• Each frame is treated as independent and colorized separately
--> Artifacts appear between frames
• Slow performance without an NVIDIA GPU
• Low resolution (200x200 frames)
• Predicted colors are still far from perfect
Challenges
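The frame-to-frame artifacts can be quantified as the average change in predicted color between consecutive frames. The sketch below assumes each frame's predicted chroma is available as a flat list of numbers. It also shows exponential smoothing across frames as one possible mitigation; this smoothing is an illustration and was not part of the presented work.

```python
from typing import List, Sequence

Frame = Sequence[float]  # flattened predicted chroma values of one frame (assumption)


def flicker(frames: List[Frame]) -> float:
    """Mean absolute change per value between consecutive frames."""
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, cur))
        count += len(cur)
    return total / count


def ema_smooth(frames: List[Frame], alpha: float = 0.5) -> List[List[float]]:
    """Blend each frame with the smoothed previous one (exponential moving average)."""
    if not frames:
        return []
    smoothed = [list(frames[0])]
    for cur in frames[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * c + (1 - alpha) * p for c, p in zip(cur, prev)])
    return smoothed
```

Smoothing lowers flicker but makes colors lag behind moving objects, so a lower `alpha` trades flicker for ghosting.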
www.openbeelden.nl/tags/ingekleurd
Hosted on the Openbeelden platform
One of the colorized videos received 61,000+ views, 1,700 likes and was shared 521 times, illustrating the potential to engage new audiences.
tiny.cc/colorNL
• Collection-specific TTS systems for audio enrichment of archive material or for multimedia applications
• Colorization of old media allows for a new view on existing images
• NISV will continue investigating these emerging technologies to enable new types of interaction and to further engage new audiences with archival material in unexpected ways:
  – In the media museum
  – On its public-facing online channels
Take home
Thank you