Why analyze models?
Key questions: What are our models learning / doing?
- Today’s models are largely black boxes
Questions we want to answer:
- What can we (not) learn via pretraining?
- How do our models affect people?
- What will replace the transformer?
- What do neural models tell us about language?
Varying levels of abstraction from which to reason about a model
- Probability distribution + decision function
- Sequence of vector representations
- Weights, mechanisms like attention, dropout, +++
Out-of-Domain Evaluation Sets
Evaluation as analysis
- Concerned with behavior, not mechanisms: how does the model behave in a specific situation?
The model is trained on samples from some distribution; how does it behave on samples from the same distribution?
- i.e., i.i.d. evaluation
- This is test set accuracy / F1 / BLEU
In the Natural Language Inference task (recall NLI)
- Use a specific, carefully built diagnostic test set
- E.g. HANS: tests syntactic heuristics in NLI
Humans as linguistic test subjects
- How do we understand language behavior in humans?
- Minimal pairs: a small change that makes the sentence unacceptable
- Just like in HANS: a test set with careful properties
- Question: can LMs handle “attractors” in subject-verb agreement?
Careful test sets are kind of like unit test suites
- Minimum functionality tests for a specific behavior (see the sketch below)
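For instance, a minimal-functionality “unit test” for attractors in subject-verb agreement can be written as a minimal-pair check: the LM should assign higher probability to the grammatical verb form. This is a rough sketch of my own, assuming the HuggingFace transformers library and the public gpt2 checkpoint; the example sentences are hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)

# Minimal pair: an attractor noun ("cabinets") sits between subject and verb.
good = "The keys to the cabinets are on the table."
bad = "The keys to the cabinets is on the table."
assert sentence_logprob(good) > sentence_logprob(bad), "model fails this minimal pair"
```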
Influence Studies + Adversarial Examples
Input influence
Example: does my model really use long-distance context?
- Test: shuffle/remove context farther than k words away, then check the loss (sketch below)
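A rough sketch of one way to run this test (my assumptions, not the lecture's code): score only the last k tokens under a causal LM, with the distant context intact vs. shuffled. Assumes HuggingFace transformers with the gpt2 checkpoint; the input text is a placeholder.

```python
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def loss_on_last_k(token_ids, k):
    """Average LM loss on the last k tokens only (everything earlier is context)."""
    ids = torch.tensor([token_ids])
    labels = ids.clone()
    labels[0, :-k] = -100            # -100 = ignored by the loss
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

def shuffle_distant_context(token_ids, k):
    """Shuffle every token farther than k positions from the end."""
    far, near = list(token_ids[:-k]), list(token_ids[-k:])
    random.shuffle(far)
    return far + near

ids = tokenizer("<your long document here>")["input_ids"]   # placeholder text
k = 20
print("loss with intact context:  ", loss_on_last_k(ids, k))
print("loss with shuffled context:", loss_on_last_k(shuffle_distant_context(ids, k), k))
# If the two losses are close, the model is not really using the distant context.
```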
Saliency map: a prediction explanation for a single input. How do we make one?
- Lots of ways to encode “importance”
Simple gradient methods
- For words x_1, …, x_n and the model’s score s, take the norm of the gradient of s with respect to each word embedding: salience(x_i) = ‖∇_{x_i} s‖ (sketch below)
- Idea: a high gradient norm means that locally changing the word would affect the score a lot
- Not perfect! There are lots more methods
- Example issue: the linear approximation may not hold well
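A minimal sketch of the simple gradient method, under my own assumptions (HuggingFace transformers, the public distilbert-base-uncased-finetuned-sst-2-english sentiment checkpoint, and a hypothetical input sentence):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

enc = tokenizer("a thoroughly enjoyable film", return_tensors="pt")
# Look up embeddings ourselves so we can take gradients with respect to them.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()
logits[0, pred].backward()               # score of the predicted class

# salience(x_i) = || d score / d x_i ||  for each word embedding x_i
saliency = embeds.grad.norm(dim=-1).squeeze(0)
for tok, s in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), saliency):
    print(f"{tok:>12s}  {s.item():.3f}")
```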
Explanation by input reduction
- Idea: use input saliency, then iteratively remove the most unimportant words (sketch after this list)
- Adversarial examples: can we break the model with seemingly innocuous changes to the input?
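For the input-reduction idea, here is a sketch of my own that uses a removal-based importance score instead of the gradient-saliency ordering: greedily delete whichever word matters least, as long as the prediction does not change. The `predict` callable is a hypothetical stand-in for any classifier.

```python
from typing import Callable, List, Tuple

def input_reduction(words: List[str],
                    predict: Callable[[List[str]], Tuple[int, float]]) -> List[str]:
    """Greedily remove unimportant words while the predicted label stays fixed."""
    label, _ = predict(words)
    while len(words) > 1:
        # Try removing each word; keep the removal that preserves the label
        # with the highest confidence (i.e. the "least important" word).
        best = None
        for i in range(len(words)):
            reduced = words[:i] + words[i + 1:]
            new_label, prob = predict(reduced)
            if new_label == label and (best is None or prob > best[1]):
                best = (reduced, prob)
        if best is None:          # any removal flips the prediction: stop
            break
        words = best[0]
    return words                  # often surprisingly short or nonsensical
```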
Analyzing Interpretations
Idea: some model components lend themselves to inspection
- I.e., some attention heads seem to perform simple operations, etc. (sketch below)
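A small sketch, under my own assumptions (HuggingFace transformers, the bert-base-uncased checkpoint, a hypothetical input sentence, and an arbitrarily chosen head), for eyeballing what one attention head attends to:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

enc = tokenizer("The chef who ran to the store was out of food", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_attentions=True)

layer, head = 4, 7                      # hypothetical head to inspect
attn = out.attentions[layer][0, head]   # (seq_len, seq_len) attention weights
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for i, tok in enumerate(tokens):
    j = attn[i].argmax().item()         # where this token attends most
    print(f"{tok:>10s} -> {tokens[j]}")
```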
Probing: supervised analysis of neural networks
- Given a property y (like POS)
- Given the model’s representations h at a fixed layer
- Given a function family F (e.g., linear models)
- Freeze the parameters of the model, then train a probe f ∈ F, where ŷ = f(h)
- The extent to which the probe can predict y from h shows the accessibility of the feature in the representation (sketch below)
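A minimal probing sketch under my own assumptions (scikit-learn, a linear probe family, and hypothetical .npy files holding frozen layer representations and POS labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# hidden_states: (num_tokens, d) frozen representations at one layer
# pos_tags:      (num_tokens,)  integer POS labels for the same tokens
hidden_states = np.load("layer8_reps.npy")   # hypothetical file
pos_tags = np.load("pos_tags.npy")           # hypothetical file

h_train, h_test, y_train, y_test = train_test_split(
    hidden_states, pos_tags, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)    # f drawn from the linear family
probe.fit(h_train, y_train)                  # the model's weights stay frozen
print("probe accuracy:", probe.score(h_test, y_test))
# High accuracy means the property is *accessible* to a linear probe here,
# not that the model *uses* it (see the caveats below).
```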
Layerwise trends of probing accuracy
- More abstract linguistic properties are more accessible later in the network
Summary of probing / correlation studies
- Probing shows that properties are accessible to the probe family, not that they’re used by the neural model you’re studying
- Likewise for correlation studies
- For example:
- Hewitt and Liang, 2019 show that under certain conditions, probes can achieve high accuracy on random labels.
- Ravichander et al., 2021 show that probes can achieve high accuracy on a property even when the model is trained to know the property isn’t useful.
- Some efforts toward causal studies: harder, but interesting
Model Ablation
Model tweaks + ablation as analysis
The typical NN improvement process is itself a kind of model analysis
- Tweak the architecture, see if complex parts can be removed, +++ (sketch below)
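As one hedged illustration of ablation-as-analysis (assumptions mine: a HuggingFace DistilBERT classifier, a hypothetical module path, and an unimplemented eval_accuracy helper), we can zero out one attention block with a forward hook and compare task accuracy with and without it:

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english").eval()

def zero_output(module, inputs, output):
    """Replace the module's output with zeros of the same shape (ablation)."""
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

# Hypothetical choice: ablate the attention block of layer 3.
layer3_attention = model.distilbert.transformer.layer[3].attention
handle = layer3_attention.register_forward_hook(zero_output)

# eval_accuracy(model) would run your dev set through the model (not shown).
# acc_ablated = eval_accuracy(model)
handle.remove()                    # restore the original model
# acc_full = eval_accuracy(model)
# A small gap (acc_full - acc_ablated) suggests the component is not critical.
```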