Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

Nandy, Abhilash; Sharma, Soumya; Maddhashiya, Shubham; Sachdeva, Kapil; Goyal, Pawan; Ganguly, Niloy

arXiv:2109.05897 (cs)

[Submitted on 13 Sep 2021 (v1), last revised 14 Sep 2021 (this version, v2)]

Abstract: Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect the E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets, which include question-answer pairs curated by experts based upon two E-manuals, real user questions from a Community Question Answering forum pertaining to E-manuals, etc. We introduce EMQAP (E-Manual Question Answering Pipeline), which answers questions pertaining to electronic devices. Built upon the pretrained RoBERTa, it incorporates a supervised multi-task learning framework that efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over the most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at this https URL, and the corresponding project website is this https URL.
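
The abstract reports results in ROUGE-L F1, which scores a predicted answer by its longest common subsequence (LCS) with the reference answer. The following is a minimal sketch of that metric, not the authors' evaluation code: the function names `lcs_length` and `rouge_l_f1` are hypothetical, and the lowercasing and whitespace tokenization are assumptions, since the abstract does not state the exact scoring setup.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev[j] is the LCS length of the
    # tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a predicted answer and a reference answer."""
    # Assumption: lowercase + whitespace tokenization (not from the paper).
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of predicted tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = "press the power button"
    gold = "press and hold the power button"
    print(round(rouge_l_f1(predicted, gold), 3))
```

In this illustrative pair, all four predicted tokens appear in order in the six-token reference, so precision is 1 and recall is 4/6. Because the score rewards in-order overlap rather than exact match, it suits span-style answers that may be slightly longer or shorter than the reference.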

Comments:	EMNLP Findings 2021, Long
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2109.05897 [cs.CL]
	(or arXiv:2109.05897v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.05897

Submission history

From: Soumya Sharma
[v1] Mon, 13 Sep 2021 12:11:39 UTC (1,471 KB)
[v2] Tue, 14 Sep 2021 05:37:20 UTC (1,473 KB)
