Structure in human language
Underlying structure in language (remember dependency parsing)
- Dictates the rules of language
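A minimal sketch (not from the lecture) of what that underlying structure looks like as a dependency parse, assuming spaCy with its en_core_web_sm model:

```python
# Minimal dependency-parsing sketch with spaCy (assumes the en_core_web_sm
# model is downloaded: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog")

# Each token points to its syntactic head via a labeled dependency relation.
for token in doc:
    print(f"{token.text:<6} --{token.dep_}--> {token.head.text}")
```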
Implicitly, we know complex rules
Grammar: an attempt to describe all these rules
Grammaticality: whether we judge an utterance to be in accordance with the grammar
- Some grammaticality rules accept useless utterances
- And block communicative utterances
So why have rules in the first place?
- Without them, we’d have limitless expressive possibilities
Linguistic Structure in NLP
Before self-supervised learning
- Goal was to reverse engineer + imitate the human language system (syntax + semantics + discourse)
- E.g. Parsing
Now, we don’t constrain our systems to know any syntax
- They just pick up structure from the data!
Question:
- In humans: syntactic structures exist independently of the words they have appeared with (e.g. Jabberwocky)
- True for language models?
Tested with the COGS benchmark: new word-structure combinations
- Task: semantic interpretation
- Training/test sets have distinct words + structures in different roles
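A toy illustration of the idea behind such a split (made-up sentences and logical forms, not the actual COGS data): a word seen only in one structural role at training time must be interpreted in a new role at test time.

```python
# Toy COGS-style compositional split (illustrative, not real COGS data):
# "hedgehog" only appears as a subject in training, but the test set asks
# the model to interpret it as an object.
train = [
    ("A hedgehog ate the cake", "hedgehog(x1) AND eat.agent(e, x1) AND eat.theme(e, cake)"),
    ("Emma saw the dog",        "see.agent(e, Emma) AND see.theme(e, dog)"),
]
test = [
    ("Emma saw the hedgehog",   "see.agent(e, Emma) AND see.theme(e, hedgehog)"),
]

# A model that has learned structure (not just word-role co-occurrences)
# should generalize to the unseen word/role combination.
for sentence, logical_form in test:
    print(sentence, "->", logical_form)
```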
Can test a whole bunch of other stuff in language models
- How do they map syntactic structure to meaning?
- Does the latent space encode structural information?
- How do new words impact this?
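One common way to ask the latent-space question is a probing classifier. A minimal sketch, with random vectors standing in for real hidden states (in practice you would extract them from a language model):

```python
# Minimal probing sketch: train a linear classifier to predict a structural
# label (e.g. noun vs. verb) from hidden states. Random vectors are
# placeholders for real LM hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 768))   # placeholder "hidden states"
labels = rng.integers(0, 2, size=200)         # placeholder structural labels

X_train, X_test, y_train, y_test = train_test_split(hidden_states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# On real data, high probe accuracy suggests the latent space encodes the structural property.
print("probe accuracy:", probe.score(X_test, y_test))
```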
Going Beyond Pure Structure
Semantics matters a ton! It impacts the rules of language
- This is how we train language models: with embeddings
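A tiny sketch of the embedding idea (made-up vocabulary, PyTorch assumed): each token id maps to a learned vector, and training shapes the geometry of those vectors.

```python
# Tiny embedding sketch: each token id maps to a learned vector.
# The vocabulary and ids here are made up for illustration.
import torch
import torch.nn as nn

vocab = {"cat": 0, "dog": 1, "runs": 2}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab["cat"], vocab["dog"]])
vectors = embedding(token_ids)  # shape: (2, 8)

# During training these vectors are updated so that similarity in the
# embedding space reflects distributional (and thus semantic) similarity.
sim = torch.cosine_similarity(vectors[0], vectors[1], dim=0)
print(vectors.shape, sim.item())
```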
Meaning isn’t always just individual words, though
- e.g. idioms, constructions
- Can test in language models (via acceptability)
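One way to run such an acceptability test is to compare language-model scores on a minimal pair. A sketch assuming GPT-2 via Hugging Face transformers (the sentence pair is made up for illustration):

```python
# Acceptability sketch: compare average per-token loss (negative
# log-likelihood) of a minimal pair under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_nll(sentence: str) -> float:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()  # mean cross-entropy per token

# Lower loss = the model finds the sentence more probable / more "acceptable".
print(avg_nll("She kicked the bucket yesterday."))  # idiomatic construction
print(avg_nll("She kicked the pail yesterday."))    # literal swap loses the idiom
```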
Multilinguality
Multilingual language models let us share parameters (across high- and low-resource languages)
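A small sketch of what that sharing looks like in practice, assuming xlm-roberta-base from Hugging Face (the example sentences are illustrative):

```python
# Parameter-sharing sketch: one shared subword vocabulary and one set of
# weights cover many languages. Example sentences are illustrative only.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

sentences = {
    "English": "The dog sleeps.",
    "German":  "Der Hund schläft.",
}

# Both languages go through the same vocabulary and the same parameters,
# which is how high-resource data can help low-resource languages.
for lang, text in sentences.items():
    ids = tokenizer(text)["input_ids"]
    print(lang, tokenizer.convert_ids_to_tokens(ids))
```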
Key ideas
- Language typology: lots of diversity
  - Evidentiality
  - Morphemes per word
  - Describing motion
- Language universals: lots of similarities
  - Universal grammar in the Chomskyan sense?
  - All deal with subject, object, modifiers, etc.