Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep...
Transcript of Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep...
![Page 2: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/2.jpg)
Table of contents
● Introduction● Challenges
○ Representation○ Translation○ Alignment○ Fusion○ Co-Learning
● Interpretability in Multimodal Deep Learning
![Page 3: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/3.jpg)
Aim of the presentation
● Identify challenges particular to Multimodal Learning● Popular research topics in the field● Brief of the problem I have been working on - Interpretability in
Multimodal Deep Learning
![Page 4: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/4.jpg)
Multimodal Learning - Heterogeneous Information Sources
![Page 5: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/5.jpg)
Challenge - 1) Representation
● How to combine the information from multiple sources?● How to deal with different levels of noise?● How to deal with missing data?
![Page 6: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/6.jpg)
1) Representation - Ways of learning
![Page 7: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/7.jpg)
Coordinated Representation DeViSE — a deep visual-semantic embedding - Similarity model
![Page 8: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/8.jpg)
Challenge - 2) Translation
![Page 9: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/9.jpg)
Translation
(mapping) from one modality to another
Retrieval based models
Generative
![Page 10: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/10.jpg)
How to evaluate translations
![Page 11: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/11.jpg)
Challenge - 3) Alignment
1) Identify the direct relations between (sub)elements from two or more different modalities.
![Page 12: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/12.jpg)
Alignment
Given an image and a caption we want to find the areas of the image corresponding to the caption’s words or phrases
![Page 13: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/13.jpg)
Alignement problems
● Few datasets with explicitly annotated alignments● It is difficult to design similarity metrics between modalities● Existence of multiple possible alignments and not all elements
in one modality have correspondences in another
![Page 14: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/14.jpg)
Challenge - 4) Fusion
![Page 15: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/15.jpg)
Fusion
● Signals not temporarily aligned● Lack of interpretability of where is the prediction coming from
- (This is what I am working on)
![Page 16: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/16.jpg)
Challenge - 5) Co-Learning
● Aiding the modeling of a (resource poor) modality by exploiting knowledge from another (resource rich) modality.
● When one modality has lack of annotated data, noisy inputs and unreliable labels.
![Page 17: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/17.jpg)
Colearning - Zero Shot learningUsing text embeddings to classify unseen classes of images
![Page 18: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/18.jpg)
Interpretability in Multimodal Deep Learning
Problem statement -
Not every modality has equal contribution to the prediction
![Page 19: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/19.jpg)
Interpretability in Multimodal Deep Learning
Solution -
We give different weights to different modalities
![Page 20: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/20.jpg)
Real data experiments - Multimodal Sentiment Analysis(MOSI Dataset)
![Page 21: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/21.jpg)
MUltimodal Sentiment Analysis
![Page 22: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/22.jpg)
P1 and P2 contribution of each modality
![Page 23: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/23.jpg)
Θ
Θ
![Page 24: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/24.jpg)
Modified loss function with 𝞫 weight given to each modality
![Page 25: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/25.jpg)
Modified loss function with 𝞫 weight given to each modality
Let’s make 𝞫 trainable
![Page 26: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/26.jpg)
Modified loss function with 𝞫 weight given to each modality
Using the Lemma
![Page 27: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/27.jpg)
Final Optimization problem
![Page 28: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/28.jpg)
Tensor fusion
To allow interaction between modalities - take tensor product between modalities.
![Page 29: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/29.jpg)
Problem with interpretability and tensorfusion - Degree Inflation
Beta weights came to be higher for higher dimension weights - Always high for tensor fusion.
![Page 30: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/30.jpg)
Degree Inflation - reason
When one modality learns constant features.
The information from the first modality can be
expressed in the tensorfusion.
Relevant source-with information
Irrelevant source-constant
![Page 31: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/31.jpg)
Iterative batch normalisation
We derived a batch norm which does not allow the lower dimension information to be represented in higher dimension(TensorFusion).
This method helps to remove noise from the data and give better accuracies.
![Page 32: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/32.jpg)
Synthetic data experiment
● Relevant data - Sequences of length 100 (composed of 4 letters - ATGC) with a signature which defines the label
● Irrelevant data - Random 100 length sequences
![Page 33: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/33.jpg)
Example - 1 as relevant source
![Page 34: Multimodal Deep Learning - GitHub Pages · 2021. 4. 11. · Interpretability in Multimodal Deep Learning Problem statement - Not every modality has equal contribution to the prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062318/6147cc45a830d0442101ab7d/html5/thumbnails/34.jpg)
Thank You