
IEEE 2017 Conference on Computer Vision and Pattern Recognition

Additional observations
➢ Using Inception features over VGG features does not improve novel VQA performance
➢ Using pre-trained word vectors to expand the vocabulary in the general setting helps
➢ The improvement obtained on the better architecture (arch 1) is unfortunately smaller

Qualitative Results
P - Proposed, B - Baseline, GT - Ground Truth

Conclusions
➢ Challenging and real-world setting that needs to be addressed
➢ Significant drop in performance of two existing architectures in the new setting
➢ Proposed 2 methods for inducing novel objects
➢ Text-based induction - effective; weak pairing - not effective (noisy)
➢ External text data without novel object induction need not be effective

Contact: ee12b101@ee.iitm.ac.in (Santhosh)
Project site: https://goo.gl/ELLb9z

Quantitative Results
➢ Feature - pre-trained VGGNet (VGG) or Google Inception (INC)
➢ Aux - auxiliary data: none / text / weakly paired (text + image)
➢ Vocab - train (only VQA train), oracle, or general
➢ OEQ - open-ended questions, MCQ - multiple-choice questions

Text-based novel object induction

➢ Text data provides overall improvements of ~2% in arch 1 and ~3% in arch 2

➢ The majority of the improvement is in the Yes/No and Novel question types

Weakly paired data based novel object induction

➢ Provides marginal improvements in arch 2 over text-only induction
➢ A noisy method of pairing the data

Need to incorporate novel words into vocabulary

➢ Necessary to incorporate novel objects into the vocabulary
➢ Train words + external data can even lead to poorer performance

Motivation
Existing work on VQA focuses on datasets where the train and test objects overlap significantly. In real-world scenarios, there are a large number of objects for which (image, question, answer) data is not available. This work is the first to generalize current VQA systems to objects not queried about in the training data.

Problem Description
Given
➢ a VQA dataset with test objects not present in the train set (novel),
➢ unpaired external text and image data,
answer test questions containing these novel objects.

Datasets
➢ VQA dataset
➢ BookCorpus, Wikipedia dump (external text)
➢ ImageNet (external images)

Novel split of VQA dataset
➢ List out nouns from questions and answers in the train set
➢ Cluster nouns based on their question-type statistics
➢ Separate out 20% of nouns from each cluster as test nouns
➢ Assign all questions containing test nouns to the test set and the rest to the train set
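The split procedure above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the clustering step is simplified to grouping nouns by their dominant question type (rather than clustering on full question-type statistics), and the noun extraction and question matching are assumed to be done by simple tokenization.

```python
import random
from collections import defaultdict

def make_novel_split(noun_qtype_stats, test_frac=0.2, seed=0):
    """Hold out a fraction of nouns per cluster as novel test nouns.

    noun_qtype_stats: dict mapping noun -> {question_type: count}.
    Clusters are approximated here by each noun's dominant question
    type (a simplification of clustering on the full statistics).
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for noun, stats in noun_qtype_stats.items():
        clusters[max(stats, key=stats.get)].append(noun)
    test_nouns = set()
    for nouns in clusters.values():
        nouns = sorted(nouns)
        rng.shuffle(nouns)
        k = max(1, round(test_frac * len(nouns)))  # ~20% per cluster
        test_nouns.update(nouns[:k])
    return test_nouns

def assign_questions(questions, test_nouns):
    """Questions mentioning any test noun go to the test set."""
    train, test = [], []
    for q in questions:
        bucket = test if set(q.lower().split()) & test_nouns else train
        bucket.append(q)
    return train, test
```

Holding out nouns per cluster (rather than globally) keeps the question-type distribution of the test nouns similar to that of the train nouns.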

➢ ~27% of test objects are novel in the proposed split

Baseline performance on Original vs Novel split

➢ Drop in overall accuracy of ~14% in both architectures!

Approach to induce novel objects
Seq2seq AutoEncoder architectures

➢ Pre-train image and text encoders on auxiliary data
➢ Finetune them on VQA

Text-only induction
➢ Train a text-to-text AutoEncoder (AE)
➢ Incorporate novel objects into the vocabulary under 2 settings:
○ Oracle: novel words known textually
○ General: novel words semantically similar to known words
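The general setting above — relating novel words to semantically similar known words via pre-trained word vectors — can be sketched with a nearest-neighbor lookup in embedding space. The toy embeddings and cosine-similarity mapping below are illustrative assumptions, not the paper's exact procedure:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def map_novel_to_known(novel_words, known_vocab, embeddings):
    """Map each novel word to its most similar in-vocabulary word.

    embeddings: dict word -> vector, e.g. from any pre-trained
    word-vector model; words without a vector are left unmapped.
    """
    mapping = {}
    for w in novel_words:
        if w not in embeddings:
            continue  # no vector available, cannot map this word
        mapping[w] = max(
            (k for k in known_vocab if k in embeddings),
            key=lambda k: cosine(embeddings[w], embeddings[k]),
        )
    return mapping
```

In this sketch a novel word is simply tied to its nearest known neighbor, which is enough to answer questions about it using representations learned for the known word.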

Weakly paired text + image induction
➢ Pair images of a novel object with random text about it
➢ Train an image + text to text AutoEncoder (AE)
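The weak pairing step above can be sketched as below; the data-structure shapes (dicts of image ids and sentences per object) are illustrative assumptions. The sketch also makes the noisiness explicit: a sampled sentence mentions the object but need not describe that particular image.

```python
import random

def weakly_pair(novel_object_images, object_sentences, seed=0):
    """Pair each image of a novel object with a random sentence about it.

    novel_object_images: dict object -> list of image ids.
    object_sentences: dict object -> sentences mentioning that object.
    The pairing is noisy by construction: the sentence mentions the
    object but need not describe the specific image it is paired with.
    """
    rng = random.Random(seed)
    pairs = []
    for obj, images in novel_object_images.items():
        sents = object_sentences.get(obj, [])
        if not sents:
            continue  # no text about this object, nothing to pair
        for img in images:
            pairs.append((img, rng.choice(sents)))
    return pairs
```

These (image, sentence) pairs would then feed the image + text to text AE; the noise in the pairing is consistent with the observation that this induction method is less effective than text-only induction.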

An Empirical Evaluation of Visual Question Answering for Novel Objects
Santhosh K. Ramakrishnan (1,2), Ambar Pal (1), Gaurav Sharma (1), Anurag Mittal (2)
(1) IIT Kanpur, (2) IIT Madras

Architecture 1 Architecture 2

➢ Used for weakly paired training in VQA Arch 1

➢ Used for weakly paired training in VQA Arch 2

➢ Used for text-only training in VQA Archs 1 and 2

Positive Examples

Negative Examples