Coreference Resolution
Coreference resolution
- Identifying all mentions that refer to the same entity in the world
Applications
- Full text understanding
- Machine translation
- Dialogue systems
In 2 steps
- Detect mentions (easy)
- Cluster mentions (hard)
Mention detection
- Mention: text referring to an entity
- Either
- Pronoun
- Named Entities (people, places)
- Noun phrases
- To detect: usually a pipeline of other NLP systems
- Pronouns: use a POS tagger
- Named Entities: use an NER system
- Noun phrases: use a (constituency) parser
Marking all three of these types over-generates mentions
- Example: "It is sunny" - the "it" does not refer to any entity, so it is not a mention
How to deal with these bad mentions?
- Naive: train a classifier to filter out spurious mentions
- More common: keep all of them as candidate mentions and let the coreference model ignore the bad ones (see the detection sketch below)
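A minimal sketch of pipeline-style candidate mention detection. It assumes spaCy and its small English model (en_core_web_sm) are installed; the tool choice is an assumption, not something the notes prescribe. It over-generates on purpose: every pronoun, named entity, and noun phrase becomes a candidate mention.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_mentions(text):
    doc = nlp(text)
    mentions = []
    # Pronouns: found via the POS tagger
    mentions += [doc[t.i : t.i + 1] for t in doc if t.pos_ == "PRON"]
    # Named entities: found via the NER component
    mentions += list(doc.ents)
    # Noun phrases: spaCy's noun_chunks stand in for constituency-parser NPs
    mentions += list(doc.noun_chunks)
    # Deduplicate by character offsets, keep document order
    unique = {(m.start_char, m.end_char): m for m in mentions}
    return [unique[k] for k in sorted(unique)]

# Note that the pleonastic "It" is still extracted as a candidate mention.
print([m.text for m in candidate_mentions("It is sunny. Sam saw Anna at the concert.")])
```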
Some linguistics
- Coreference: 2 mentions refer to same entity in the world
- Anaphora: when a term (the anaphor) refers to another term (the antecedent) and its interpretation is determined by the antecedent
- Pronominal anaphora: anaphora in which the anaphor is also coreferential with the antecedent
- Bridging anaphora: not all anaphoric relations are coreferential!
- Example: "We saw a concert last night; the tickets were really expensive" ("the tickets" depends on "a concert" but does not corefer with it)
- Cataphora: the reverse order of anaphora (the anaphor comes before the antecedent)
Coreference Resolution Models
Rule-based (Hobbs' algorithm): pronominal anaphora resolution
A messy, naive algorithm, but it remained a strong baseline for a very long time (traversal sketch after the steps)
1. Begin at the NP immediately dominating the pronoun.
2. Go up the tree to the first NP or S. Call this X, and the path to it p.
3. Traverse all branches below X to the left of p, left-to-right, breadth-first. Propose as antecedent any NP that has an NP or S between it and X.
4. If X is the highest S in the sentence, traverse the parse trees of the previous sentences in order of recency. Traverse each tree left-to-right, breadth-first. When an NP is encountered, propose it as antecedent. If X is not the highest node, go to step 5.
5. From node X, go up the tree to the first NP or S. Call it X, and the path to it p.
6. If X is an NP and the path p to X came from a non-head phrase of X (a specifier or adjunct, such as a possessive, PP, apposition, or relative clause), propose X as antecedent. (The original said "did not pass through the N' that X immediately dominates", but the Penn Treebank grammar lacks N' nodes.)
7. Traverse all branches below X to the left of the path p, in a left-to-right, breadth-first manner. Propose any NP encountered as the antecedent.
8. If X is an S node, traverse all branches of X to the right of the path p, but do not go below any NP or S encountered. Propose any NP encountered as the antecedent.
9. Go to step 4.
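Not the full nine-step algorithm, but a sketch of the traversal Hobbs' algorithm repeatedly performs (as in step 7): a left-to-right, breadth-first walk of the branches to the left of the path p, proposing every NP encountered. It assumes parses are nltk.Tree objects with Penn Treebank labels; agreement filters and the step-3 "intervening NP or S" check are omitted.

```python
from collections import deque
from nltk import Tree

def nps_left_of_path(root, x_pos, pronoun_pos):
    """Yield NP nodes below X (at tree position x_pos) that lie to the left of the
    path from X down to the pronoun (whose tree position is pronoun_pos)."""
    x = root[x_pos]
    cutoff = pronoun_pos[len(x_pos)]            # child of X that the path p passes through
    queue = deque((i,) for i in range(cutoff))  # children of X strictly left of p
    while queue:
        pos = queue.popleft()
        node = x[pos]
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node
            queue.extend(pos + (i,) for i in range(len(node)))

tree = Tree.fromstring(
    "(S (NP (NNP John)) (VP (VBD said) (SBAR (S (NP (PRP he)) (VP (VBD left))))))")
# After climbing from the pronoun "he" up to the top S (position ()), the path enters
# the root through child 1 (the VP), so only child 0 is searched and "John" is proposed.
print([" ".join(np.leaves()) for np in nps_left_of_path(tree, (), (1, 1, 0, 0))])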
Winograd schema: knowledge-based pronominal coreference
- Example: The city council refused the women a permit because they advocated violence. (what is “they” coreferential to?)
- Alternative to Turing test
Mention pair coreference models
Train a binary classifier that assigns every pair of mentions a probability of being coreferent
- e.g., given "she", look at all candidate antecedents (previously occurring mentions) and decide which are coreferent with it
Training:
- N mentions in document
- y_ij = 1 if mentions m_i and m_j are coreferent, -1 otherwise
- Loss: cross-entropy, J = -∑_{i=2}^{N} ∑_{j=1}^{i-1} y_ij log p(m_j, m_i)
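A sketch of mention-pair training in PyTorch, written with standard 0/1 binary cross-entropy targets rather than the ±1 labels above. The tiny PairScorer, `mention_reprs`, and `gold_pairs` are illustrative assumptions, not the course's reference implementation.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, m_i, m_j):
        # Probability that mentions i and j corefer.
        return torch.sigmoid(self.ff(torch.cat([m_i, m_j], dim=-1))).squeeze(-1)

def mention_pair_loss(scorer, mention_reprs, gold_pairs):
    """Cross-entropy summed over all pairs (m_j, m_i) with j < i."""
    losses = []
    for i in range(1, len(mention_reprs)):
        for j in range(i):
            p = scorer(mention_reprs[i], mention_reprs[j])
            y = 1.0 if (j, i) in gold_pairs else 0.0
            losses.append(nn.functional.binary_cross_entropy(p, torch.tensor(y)))
    return torch.stack(losses).sum()

scorer = PairScorer(dim=8)
reprs = [torch.randn(8) for _ in range(4)]                  # 4 mentions (made-up vectors)
loss = mention_pair_loss(scorer, reprs, gold_pairs={(0, 2), (2, 3)})
loss.backward()
```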
Test time
- Pick a threshold and add a coreference link between m_i and m_j when p(m_i, m_j) is above the threshold
- Can fill in gaps by taking the transitive closure of those links (see the clustering sketch below)
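A sketch of this test-time step, assuming pairwise probabilities have already been computed: link the pairs that clear the threshold, then take the transitive closure with union-find so linked mentions end up in the same cluster.

```python
def coref_clusters(pair_probs, n_mentions, threshold=0.5):
    """pair_probs: dict mapping (j, i) mention-index pairs to coreference probabilities."""
    parent = list(range(n_mentions))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for (j, i), p in pair_probs.items():
        if p > threshold:
            parent[find(i)] = find(j)       # union: add a coreference link

    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

# Hypothetical model outputs for 5 mentions: 0, 2, 3 end up clustered together.
probs = {(0, 2): 0.9, (1, 2): 0.1, (0, 3): 0.8}
print(coref_clusters(probs, n_mentions=5))
```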
Cons
- Given a long document, many mentions have only one clear antecedent, but we are asking the model to predict all of them
- Solution: mention ranking- predict only one antecedent / mention
Mention-ranking coreference models
Idea: assign each mention its highest-scoring candidate antecedent according to the model
- Use a dummy "NA" mention to allow the model to decline linking to anything
Apply a softmax over the scores for the candidate antecedents so the probabilities sum to 1
Training
- Maximize the following (the probability mass the model puts on the correct antecedents):
- ∑_{j=1}^{i-1} 1(y_ij = 1) p(m_j, m_i)
- Iterate through the candidate antecedents (previously occurring mentions)
- For the ones that are coreferent with m_i, we want the model to assign a high probability
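A sketch of this objective in PyTorch: score every earlier mention plus the dummy "NA" slot, softmax, and take the negative log of the probability mass on the true antecedents (or on NA if the mention starts a new entity). The input format is an assumption.

```python
import torch

def mention_ranking_loss(antecedent_scores, gold_antecedents):
    """antecedent_scores: 1-D tensor of scores for [NA, m_1, ..., m_{i-1}].
    gold_antecedents: indices into that tensor that corefer with m_i
    (use [0], the NA slot, if m_i has no antecedent)."""
    probs = torch.softmax(antecedent_scores, dim=0)
    marginal = probs[gold_antecedents].sum()     # total mass on correct antecedents
    return -torch.log(marginal)

# Mention m_4 with candidate antecedents m_1..m_3; m_1 and m_3 are coreferent with it.
scores = torch.tensor([0.2, 1.5, -0.3, 0.9], requires_grad=True)
loss = mention_ranking_loss(scores, gold_antecedents=torch.tensor([1, 3]))
loss.backward()
print(float(loss))
```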
But, how to compute probabilities?
- Non-neural statistical classifier
- simple neural network
- More advanced model using LSTMs, attention, transformers
(1) Non-neural coref model
Features
- Person/number/gender agreement
- Semantic compatibility
- Some syntactic constraints
- More recently mentioned entities are preferred
- Entities in subject position are preferred
- Parallelism
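A toy sketch of the kinds of hand-built features such a statistical classifier might use for a (mention, candidate antecedent) pair. The tiny pronoun tables and the mention fields (text, sentence index, grammatical role) are illustrative assumptions, not a real feature set.

```python
PRONOUN_NUMBER = {"he": "sg", "she": "sg", "it": "sg", "they": "pl", "we": "pl"}
PRONOUN_GENDER = {"he": "m", "she": "f", "it": "n"}

def pair_features(mention, antecedent):
    m, a = mention["text"].lower(), antecedent["text"].lower()
    return {
        # Agreement features (number / gender), when both values are known
        "number_agree": PRONOUN_NUMBER.get(m) == PRONOUN_NUMBER.get(a)
                        if m in PRONOUN_NUMBER and a in PRONOUN_NUMBER else None,
        "gender_agree": PRONOUN_GENDER.get(m) == PRONOUN_GENDER.get(a)
                        if m in PRONOUN_GENDER and a in PRONOUN_GENDER else None,
        # Recency: closer antecedents are preferred
        "sentence_distance": mention["sent"] - antecedent["sent"],
        # Salience: antecedents in subject position are preferred
        "antecedent_is_subject": antecedent["role"] == "subj",
        # Parallelism: same grammatical role in both clauses
        "same_role": mention["role"] == antecedent["role"],
        # Exact string match is a strong coreference cue
        "string_match": m == a,
    }

print(pair_features({"text": "she", "sent": 3, "role": "subj"},
                    {"text": "Sasha", "sent": 2, "role": "subj"}))
```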
(2) Neural coref model
Standard FFNN
- Input: word embeddings + categorical features
- Embeddings: previous two words, first word, last word, and head word of each mention
- Other features: distance, document genre, speaker info (see the FFNN sketch below)
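A sketch of that feed-forward scorer in PyTorch: concatenate the candidate antecedent's embedding features, the mention's embedding features, and the additional categorical/distance features, then pass them through a small FFNN to get a single coreference score. Dimensions and the exact feature layout are assumptions.

```python
import torch
import torch.nn as nn

class NeuralCorefScorer(nn.Module):
    def __init__(self, mention_dim, extra_dim, hidden=128):
        super().__init__()
        self.ffnn = nn.Sequential(
            nn.Linear(2 * mention_dim + extra_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, antecedent_emb, mention_emb, extra_feats):
        # antecedent_emb / mention_emb: concatenated word embeddings for each mention
        # (previous two words, first word, last word, head word, ...)
        # extra_feats: distance buckets, document genre, speaker info, etc.
        x = torch.cat([antecedent_emb, mention_emb, extra_feats], dim=-1)
        return self.ffnn(x).squeeze(-1)    # higher score = more likely coreferent

scorer = NeuralCorefScorer(mention_dim=5 * 50, extra_dim=10)
score = scorer(torch.randn(250), torch.randn(250), torch.randn(10))
```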
(3) End-to-end model
Improvements
- LSTM
- Attention
- Mention detection + coreference end-to-end
- No separate mention detection step!
- Instead, consider every span of text (up to some length limit) as a candidate mention
Steps
- Embed the words
- Run a bidirectional LSTM over the document
- Represent each span of text as a vector
- Span representation: g_i = [x*_START(i), x*_END(i), x̂_i, φ(i)]
- Explanation of terms:
  - x*_START(i), x*_END(i): hidden states for the span's start and end
    - Represents context to the left and right of the span
  - x̂_i: attention-based representation of the words in the span
    - Represents the span itself
  - φ(i): additional features
    - Info not in the text
- x̂_i is an attention-weighted average of the word embeddings in the span
  - Attention scores are computed from the BiLSTM hidden states: α_t = w_α · FFNN_α(x*_t)
  - The attention distribution is a softmax over the attention scores within the span: a_{i,t} = exp(α_t) / ∑_{k=START(i)}^{END(i)} exp(α_k)
  - Then weight the word embeddings using the attention distribution: x̂_i = ∑_{t=START(i)}^{END(i)} a_{i,t} · x_t (see the span-representation sketch below)
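A sketch of the span representation g_i in PyTorch: BiLSTM hidden states at the span boundaries, an attention-weighted average of the in-span word embeddings, and an extra feature vector, all concatenated. The sizes and the feature vector are assumptions.

```python
import torch
import torch.nn as nn

class SpanRepresenter(nn.Module):
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # FFNN producing one attention score per token: alpha_t = w . FFNN(x*_t)
        self.attn_scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))

    def forward(self, word_embs, start, end, span_feats):
        """word_embs: (doc_len, emb_dim); start/end are inclusive token indices."""
        states, _ = self.bilstm(word_embs.unsqueeze(0))        # (1, doc_len, 2*hidden_dim)
        states = states.squeeze(0)
        # Attention-weighted average of the word embeddings inside the span
        scores = self.attn_scorer(states[start : end + 1]).squeeze(-1)   # alpha_t
        attn = torch.softmax(scores, dim=0)                              # a_{i,t}
        span_head = attn @ word_embs[start : end + 1]                    # x_hat_i
        # g_i = [x*_START(i), x*_END(i), x_hat_i, phi(i)]
        return torch.cat([states[start], states[end], span_head, span_feats])

rep = SpanRepresenter(emb_dim=100, hidden_dim=64)
doc_embs = torch.randn(30, 100)                      # 30 tokens of (made-up) word embeddings
g_i = rep(doc_embs, start=4, end=7, span_feats=torch.randn(20))
print(g_i.shape)                                     # 128 + 128 + 100 + 20 = 376
```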
- Finally, score each pair of spans to decide if they are coreferent mentions: s(i, j) = s_m(i) + s_m(j) + s_a(i, j)
- Explanation of terms:
  - s(i, j): are spans i and j coreferent mentions?
  - s_m(i): is i a mention?
  - s_m(j): is j a mention?
  - s_a(i, j): do they look coreferent?
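A sketch of the pairwise scoring s(i, j) = s_m(i) + s_m(j) + s_a(i, j) with two small FFNNs over the span representations from the previous sketch; including the element-wise product in the pair input is a common choice, not something the notes specify.

```python
import torch
import torch.nn as nn

class PairwiseSpanScorer(nn.Module):
    def __init__(self, span_dim, hidden=150):
        super().__init__()
        self.mention_scorer = nn.Sequential(            # s_m: "is this span a mention?"
            nn.Linear(span_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.antecedent_scorer = nn.Sequential(         # s_a: "do these spans look coreferent?"
            nn.Linear(3 * span_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, g_i, g_j):
        s_m_i = self.mention_scorer(g_i)
        s_m_j = self.mention_scorer(g_j)
        # Pair features: both spans plus their element-wise product
        s_a = self.antecedent_scorer(torch.cat([g_i, g_j, g_i * g_j]))
        return (s_m_i + s_m_j + s_a).squeeze(-1)        # s(i, j)

scorer = PairwiseSpanScorer(span_dim=376)
s_ij = scorer(torch.randn(376), torch.randn(376))
```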
Transformer-based coref (now SOTA)
Can learn long-distance dependencies
- (Idea 1) SpanBERT: pretrain BERT to be better at span-based prediction tasks
- (Idea 2) BERT-QA: treat coreference like a question-answering task (e.g., "what does this mention refer to?")
- (Idea 3) Maybe spans aren't needed at all: represent each mention with a single (head) word, which makes everything much more efficient (encoder sketch below)
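A sketch of swapping the encoder for a pretrained transformer via the Hugging Face transformers library; `bert-base-cased` is used here as a stand-in checkpoint (Idea 1 would load a SpanBERT checkpoint instead). The contextual token vectors it returns would feed the same span-representation and scoring machinery as above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-cased"          # stand-in; swap in a SpanBERT checkpoint for Idea 1
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

text = "We saw a concert last night; the tickets were really expensive."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768) contextual vectors
print(hidden.shape)
```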
Evaluation and current results
Coreference evaluation: many different metrics (e.g., MUC, CEAF, B-cubed)
- Usually report the average of a few of them
Intuition: the metrics treat coreference as a clustering task and evaluate the quality of the clusters (see the B-cubed sketch below)
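A sketch of one clustering-style metric, B-cubed: per-mention precision is the fraction of that mention's predicted cluster that is truly coreferent with it, per-mention recall is the fraction of its gold cluster that the prediction recovered, and both are averaged over mentions. Mention identifiers here are just strings for illustration.

```python
def b_cubed(gold_clusters, pred_clusters):
    gold = {m: set(c) for c in gold_clusters for m in c}
    pred = {m: set(c) for c in pred_clusters for m in c}
    mentions = gold.keys() & pred.keys()
    precision = sum(len(gold[m] & pred[m]) / len(pred[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold[m] & pred[m]) / len(gold[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [["I", "my"], ["Nader"], ["he"]]
pred = [["I", "my", "he"], ["Nader"]]
print(b_cubed(gold, pred))   # precision drops because "he" was wrongly merged in
```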