Default research project: MinBERT Base implementation
- Use the provided starter code to implement minBERT
- Utilize pre-trained model weights + embeddings to perform sentiment analysis on two datasets
- Train for sentiment classification
- Extend: how to build robust embeddings that can perform well across a wide range of tasks, not just one
- Adjust BERT embeddings to perform the following 3 tasks (a rough model sketch follows this list)
- Sentiment analysis
- Paraphrase detection
- Semantic textual similarity
- Find relevant research paper for each improvement (some suggestions given)
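The project implements minBERT itself from the starter code, but as a rough illustration of the "shared encoder, one head per task" structure these notes describe, here is a minimal sketch using Hugging Face's BertModel as a stand-in encoder; the class name, head names, and dimensions are assumptions, not the starter code's actual API.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultitaskMinBERT(nn.Module):
    """Shared BERT encoder with one small head per downstream task (illustrative sketch)."""
    def __init__(self, num_sentiment_classes=5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # Task-specific heads (names and sizes are placeholders)
        self.sentiment_head = nn.Linear(hidden, num_sentiment_classes)   # SST classification
        self.paraphrase_head = nn.Linear(2 * hidden, 1)                  # binary paraphrase logit
        self.similarity_head = nn.Linear(2 * hidden, 1)                  # STS similarity score

    def encode(self, input_ids, attention_mask):
        # Use the pooled [CLS] representation as the sentence embedding
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return out.pooler_output

    def predict_sentiment(self, input_ids, attention_mask):
        return self.sentiment_head(self.encode(input_ids, attention_mask))

    def predict_paraphrase(self, ids1, mask1, ids2, mask2):
        pair = torch.cat([self.encode(ids1, mask1), self.encode(ids2, mask2)], dim=-1)
        return self.paraphrase_head(pair).squeeze(-1)

    def predict_similarity(self, ids1, mask1, ids2, mask2):
        pair = torch.cat([self.encode(ids1, mask1), self.encode(ids2, mask2)], dim=-1)
        return self.similarity_head(pair).squeeze(-1)
```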
Notes on finding research projects (how to build an economic model in your spare time)
- Getting ideas
- Journals not great
- Look in media, news, etc. that aren’t about your topic area
- Conversations with people in business
- Think through your own idea independently, somewhat thoroughly
- Go find somebody else that did your idea but 10x better- ask yourself why you missed what they did
- Give seminar
- Planning
Winning default papers
- Walk Less + Only Down Smooth Valleys
- Pretrained embeddings from BERT for 3 fine-grained tasks
- First: test the ability to be tuned towards sentence sentiment classification only
- Then implement SMART, which aims to tackle overfitting
- Apply multitask learning approaches that learn on all 3 aforementioned tasks
- Basically
- Investigate performance of a pre-trained + fine-tuned BERT model on 3 downstream prediction tasks when including:
- Regularization (SMART)
- Multitask Learning with task-specific datasets
- Rich relational layers that exploit similarity between tasks
- Approach
- Starting point: BERT
- Focusing on the 3 specific fine-tuning tasks (set up basic baselines)
- Extending
- Regularization of loss + optimizer step (SMART)
- Coded up themselves
- Round-robin multitask fine-tuning
- Baseline BERT assumes fine-tuning only on sentiment classification generalizes well to paraphrase and similarity prediction tasks; not true
- Instead, implement a batch-level round-robin MTL routine (SST, paraphrase, and STS data); a rough sketch follows this section
- test 2 versions
- Rich relational layer combining similar tasks
- Adapt model to handle relations across tasks
- Experiments
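As a rough illustration of the batch-level round-robin multitask fine-tuning described above, here is a minimal sketch building on the multi-head model sketched earlier; the loader and batch field names and the per-task loss choices are assumptions, and SMART's actual smoothness-inducing adversarial regularizer is considerably more involved than the placeholder comment suggests.

```python
import torch
import torch.nn.functional as F

def train_round_robin(model, sst_loader, para_loader, sts_loader, optimizer, epochs=3):
    """Batch-level round-robin: alternate one batch per task each step,
    so no single task's objective dominates the shared encoder."""
    device = next(model.parameters()).device
    for _ in range(epochs):
        # zip stops at the shortest loader; cycling the shorter loaders is one common variant
        for sst_batch, para_batch, sts_batch in zip(sst_loader, para_loader, sts_loader):
            for task, batch in (("sst", sst_batch), ("para", para_batch), ("sts", sts_batch)):
                optimizer.zero_grad()
                if task == "sst":
                    logits = model.predict_sentiment(batch["ids"].to(device),
                                                     batch["mask"].to(device))
                    loss = F.cross_entropy(logits, batch["labels"].to(device))
                elif task == "para":
                    logit = model.predict_paraphrase(batch["ids1"].to(device), batch["mask1"].to(device),
                                                     batch["ids2"].to(device), batch["mask2"].to(device))
                    loss = F.binary_cross_entropy_with_logits(logit, batch["labels"].float().to(device))
                else:  # sts
                    score = model.predict_similarity(batch["ids1"].to(device), batch["mask1"].to(device),
                                                     batch["ids2"].to(device), batch["mask2"].to(device))
                    loss = F.mse_loss(score, batch["labels"].float().to(device))
                # A SMART-style smoothness term (adversarial perturbation of the input
                # embeddings plus a consistency penalty) would be added to `loss` here.
                loss.backward()
                optimizer.step()
```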
Finding research topics
Generally 2 ways in science:
- Nails: improve ways to address specific domain problem of interest
- Hammers: start with technical method + work out ways to extend / improve / etc.
Most projects
- Find an application / task + explore how to approach / solve it effectively, often with an existing model
- Implement complex neural arch. + demonstrate performance
- Ideally find some way to tweak it- make it better (kind of #3)
- Come up with a new / variant neural network model + explore its empirical success
- Analysis project: analyze the behavior of a model, e.g. how it represents linguistic knowledge or what kinds of phenomena it can handle / errors it makes
- Rare theoretical project
Examples
- Using an LSTM to generate lyrics (adding components for metric structure + rhyme)
- Complex neural model: implement differentiable neural computers + get them to work (I believe building an implementation of an existing closed-source paper)
- Got published- showed improvements to RNNLMs (Title: Improving Learning through Augmenting the Loss)
- Quantization of word vectors
- Counted for class because evaluated on natural language tasks
Finding a place to start
- Recent papers:
- ACL Anthology
- Online proceedings of major ML conferences
- NeurIPS papers.nips.cc
- ICML
- ICLR
- Arxiv.org
- Even better: look for an interesting problem in the world!
If you want to beat the state of the art, look at leaderboards
- Paperswithcode
- nlpprogress
Modern Deep Learning NLP
- Most work: download a big pre-trained model + work from there
- Recommended for practical projects (minimal sketch below):
- Transformers library from Hugging Face
- Load a big pre-trained language model
- Fine-tune it for your task
- Test it
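A minimal sketch of that load / fine-tune / test loop with the Hugging Face libraries; the dataset (GLUE SST-2), the model checkpoint, and the hyperparameters are illustrative placeholders, not project requirements.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative: SST-2 from the GLUE benchmark; swap in your own task's dataset
dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Load a big pre-trained language model with a classification head
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16)  # hyperparameters are placeholders

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()
print(trainer.evaluate())  # test it on the held-out validation split
```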
Exciting areas now
- Evaluating / improving models for something other than accuracy
- Empirical work on what PLMs have learned
- Transfer learning with little data
- Low resource stuff
- Addressing bias
- Scaling models down (pruning, quantization, etc.)
- More advanced functionality (compositionality, generalization, fast learning (e.g. meta-learning) on smaller problems)
Datasets
- catalog.ldc.upenn.edu
- Huggingface
- paperswithcode
- for machine translation: statmt.org
- for dependency parsing: universaldependencies.org
Example of doing research (e.g. of applying a NN to summarization)
- Define task
- Summarization
- Define dataset
- Search for academic datasets (already have baselines, helpful)
- e.g. the Newsroom summarization dataset
- Or- define your own dataset
- Fresh problem
- be creative
- Dataset hygiene
- Keep the test and dev test data splits separate
- Define metric
- Search for well etablished metrics
- Summarization: ROUGE or human eval
- Establish a baseline
- Implement simple model first
- Summarization: LEAD-3 baseline (see the sketch after this list)
- Compute metrics on train AND dev, NOT test
- Often will have errors; analyze them
- Implement an existing neural net model
- Compute metrics on train + dev
- Analyze output + errors
- Always be close to the data (except the final test set)
- Visualize dataset
- Collect statistics
- Look at errors
- Analyze hyperparameters
- Try out different models + variants (set up quick iteration)
- Fixed-window NN
- RNN
- Recursive NN
- CNN
- Attention based Model
- Etc.
- Ideally only use test set once.
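A minimal sketch of the LEAD-3 baseline plus ROUGE evaluation mentioned above, assuming the `rouge-score` package is available; the naive period-based sentence splitting is a placeholder for a real sentence tokenizer.

```python
from rouge_score import rouge_scorer  # pip install rouge-score (assumed available)

def lead3(article: str) -> str:
    """LEAD-3 baseline: the summary is just the article's first three sentences."""
    # Naive sentence split; a real pipeline would use nltk/spacy sentence tokenization
    sentences = [s.strip() for s in article.split(". ") if s.strip()]
    return ". ".join(sentences[:3])

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def score_split(articles, reference_summaries):
    """Average ROUGE F1 of the LEAD-3 baseline over a split (train or dev, not test!)."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for article, ref in zip(articles, reference_summaries):
        scores = scorer.score(ref, lead3(article))  # (target, prediction)
        for k in totals:
            totals[k] += scores[k].fmeasure
    n = len(articles)
    return {k: v / n for k, v in totals.items()}
```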
Getting NNs to train
- Be positive- they want to learn
- Takes time to get them fixed up