What is QA?
What information does the system build on?
- Text, web documents, knowledge bases, …
Question type
- Factoid vs. non-factoid, open-domain vs. closed-domain, simple vs. compositional, …
Answer type
- Short text segment, paragraph, list, yes/no, …
In the deep learning era, almost all QA systems are built on pre-trained language models
Focus here is answers drawn from unstructured text, but other settings also exist
- Visual QA, …
Reading Comprehension
Core task: comprehend text + answer questions about its content
Why care?
- Practically useful
- Testbed for understanding language
- Many NLP tasks can be reduced to reading comprehension
- Information extraction
- Semantic role labeling
- …
Dataset: SQuAD
- 100k (passage, question, answer) triplets
- Almost solved dataset
- Evaluation: exact match (0 or 1) and F1 (partial credit)
- Compare the predicted answer to each of the 3 gold answers and take the max score
- Average over all examples for both exact match and F1
- Example:
- Gold answers: {left Graz, left Graz, left Graz and severed all relations with his family}
- Prediction: {left Graz and severed}
- Exact match = max{0, 0, 0} = 0
- F1 = max{0.67, 0.67, 0.61} = 0.67
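A minimal sketch of this metric computation, using plain whitespace tokenization (the official SQuAD script additionally lowercases and strips punctuation and articles before tokenizing):

```python
from collections import Counter

def f1(prediction, gold):
    # Token-level F1 between a predicted and a gold answer string.
    pred_toks, gold_toks = prediction.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def score(prediction, gold_answers):
    # Compare against each gold answer and take the max of each metric;
    # these per-example scores are then averaged over the dataset.
    em = max(float(prediction == g) for g in gold_answers)
    f = max(f1(prediction, g) for g in gold_answers)
    return em, f

golds = ["left Graz", "left Graz",
         "left Graz and severed all relations with his family"]
print(score("left Graz and severed", golds))  # (0.0, ~0.67)
```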
Other QA datasets
- TriviaQA: questions written by trivia enthusiasts
- Natural Questions: real queries issued to Google search
- HotpotQA: Wikipedia questions that require combining information from two pages (multi-hop)
Neural models for reading comprehension: how to solve SQuAD?
Problem
- Input: passage C = (c_1, …, c_N), question Q = (q_1, …, q_M)
- Output: 1 ≤ start ≤ end ≤ N
- Answer: the span c_start, …, c_end in the passage
Models
LSTM-based with attention
- Stanford Attentive Reader
- BiDAF (Bidirectional Attention Flow)
BERT for reading comprehension
- BERT has 2 pre-training objectives
- MLM (masked language modeling)
- NSP (next sentence prediction)
- For SQuAD, fine-tune with the span-prediction loss L = −log p_start(s*) − log p_end(e*)
- where p_start(i) = softmax_i(w_start · h_i) and p_end(i) = softmax_i(w_end · h_i), h_i is the BERT hidden vector of the i-th passage token, and w_start, w_end are learned vectors
- Works very well
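A minimal PyTorch sketch of this span-prediction head, mirroring the loss above (HuggingFace's BertForQuestionAnswering implements essentially this, averaging the two terms rather than summing):

```python
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    # Start/end classifiers on top of the encoder's hidden states.
    def __init__(self, hidden_size):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_size, 2)  # start and end logits

    def forward(self, hidden_states, start_positions=None, end_positions=None):
        logits = self.qa_outputs(hidden_states)        # (B, N, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)        # (B, N)
        end_logits = end_logits.squeeze(-1)            # (B, N)
        loss = None
        if start_positions is not None:
            ce = nn.CrossEntropyLoss()
            # L = -log p_start(s*) - log p_end(e*)
            loss = ce(start_logits, start_positions) + ce(end_logits, end_positions)
        return loss, start_logits, end_logits
```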
Better pre-training objectives? 2 ideas (SpanBERT)
- Mask contiguous spans of words instead of individual random tokens
- Use the two endpoints of the span to predict all masked words in between (span boundary objective)
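A rough sketch of that span boundary objective: predict each masked token from the boundary hidden states plus a relative-position embedding (layer sizes here are illustrative, not SpanBERT's exact configuration):

```python
import torch
import torch.nn as nn

class SpanBoundaryObjective(nn.Module):
    # Predict a masked token inside a span from the hidden states of the
    # tokens just outside the span boundaries (SpanBERT-style sketch).
    def __init__(self, hidden, vocab, max_span=10):
        super().__init__()
        self.pos = nn.Embedding(max_span, hidden)  # relative position in span
        self.mlp = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.GELU(),
            nn.LayerNorm(hidden), nn.Linear(hidden, vocab))

    def forward(self, h_left, h_right, rel_pos):
        # h_left/h_right: (B, H) boundary states; rel_pos: (B,) long tensor
        x = torch.cat([h_left, h_right, self.pos(rel_pos)], dim=-1)
        return self.mlp(x)  # (B, vocab) logits for the masked token
```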
Open domain QA
Core difference: not given a passage; instead given a large collection of documents
More challenging + more practical
Retriever-reader framework
- Input: a collection of documents D = D_1, …, D_N and a question Q
- Output: answer string A
- Retriever: f(D, Q) → P_1, …, P_K (K is pre-defined)
- Reader: g(Q, {P_1, …, P_K}) → A (a reading comprehension problem)
In DrQA
- Retriever = standard TF-IDF information-retrieval sparse model
- Reader = the same neural reading comprehension model we just learned
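A toy sketch of such a sparse retriever using scikit-learn's TF-IDF (DrQA itself uses hashed bigram TF-IDF; the corpus here is hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["Hamlet is a tragedy written by William Shakespeare.",
        "Graz is the capital of the Austrian state of Styria."]
vec = TfidfVectorizer(ngram_range=(1, 2))  # unigrams + bigrams
D = vec.fit_transform(docs)                # sparse doc-term matrix

def retrieve(question, k=1):
    # Score every document against the question, return top-k indices.
    q = vec.transform([question])
    scores = cosine_similarity(q, D).ravel()
    return scores.argsort()[::-1][:k]

print(retrieve("Who wrote Hamlet?"))  # -> [0]
```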
Can also train the retriever
- Joint training of retriever and reader
- Each text passage encoded as a vector using BERT; retrieval score measured as the dot product between the question representation and the passage representations
- Dense passage retrieval (DPR): just train the retriever using question–answer pairs
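A minimal sketch of dot-product dense retrieval (the model choice is illustrative; DPR proper trains two separate BERT encoders, one for questions and one for passages):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Encode texts and take the [CLS] hidden vector, as DPR does.
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return enc(**batch).last_hidden_state[:, 0]

q = embed(["Who wrote Hamlet?"])                       # (1, H) question vector
p = embed(["Hamlet is a play by William Shakespeare.",
           "Graz is a city in Austria."])              # (2, H) passage vectors
scores = q @ p.T                                       # dot-product retrieval scores
print(scores.argmax().item())                          # index of best passage
```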
LLMs can also do open-domain QA without an explicit retriever stage (closed-book QA: the model generates the answer from knowledge stored in its parameters)
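For instance, a closed-book sketch with a generative model (the model name is an illustrative choice; any instruction-tuned LM works similarly):

```python
from transformers import pipeline

# Closed-book QA: no retrieval, the model answers from its parameters.
qa = pipeline("text2text-generation", model="google/flan-t5-base")
print(qa("Answer the question: who wrote Hamlet?")[0]["generated_text"])
```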