## What to do:

# tokenisation
    - auto tokeniser (e.g. Hugging Face `AutoTokenizer`)
# data augmentation / preprocessing
    - synonym replacement (WordNet) (azhara)
    - remove all caps
    - add text to samples
    - back translation (ella)
    - feature-space synonym replacement (emily)
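The synonym-replacement step above can be sketched with the standard library; `TOY_SYNONYMS` is a stand-in for the WordNet lookup (e.g. via `nltk.corpus.wordnet`) the project would actually use, and the percentage parameter matches the tuning knob listed later.

```python
import random

# TOY_SYNONYMS stands in for a WordNet synset lookup -- purely illustrative.
TOY_SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
}

def synonym_replace(tokens, pct, rng):
    """Replace roughly `pct` of the replaceable tokens with a random synonym."""
    out = list(tokens)
    # Only tokens with a known synonym are candidates for replacement.
    candidates = [i for i, t in enumerate(tokens) if t in TOY_SYNONYMS]
    k = round(len(candidates) * pct)
    for i in rng.sample(candidates, k):
        out[i] = rng.choice(TOY_SYNONYMS[out[i]])
    return out

rng = random.Random(0)
aug = synonym_replace(["the", "quick", "happy", "fox"], pct=1.0, rng=rng)
```

Tokens without an entry (here "the" and "fox") pass through unchanged, so the augmented sentence keeps its length and structure.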

# hyperparameters
azhara
- learning rate [0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01] 
- optimizer [AdamW, Adafactor]
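The two lists above define a small search grid; a stdlib sketch of enumerating it (the optimizer strings are placeholders for the transformers classes of the same names):

```python
from itertools import product

# Values copied from the list above.
learning_rates = [0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01]
optimizers = ["AdamW", "Adafactor"]

# Full cross-product: one config dict per (lr, optimizer) pair.
grid = [
    {"learning_rate": lr, "optimizer": opt}
    for lr, opt in product(learning_rates, optimizers)
]
print(len(grid))  # 14 configurations
```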

ella
- early stopping (only required for longer training runs)
- num_train_epochs [1, 5, 10, 15, 20]
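The early-stopping item above can be sketched as a small helper (a hypothetical minimal version; `patience` and `min_delta` are the usual knobs, not something fixed by the project):

```python
class EarlyStopping:
    """Stop training once validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to tolerate without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Example: loss plateaus after the second epoch.
stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.7, 0.71, 0.72]
stops = [stopper.step(l) for l in losses]  # -> [False, False, False, True, True]
```

This matters mainly for the larger `num_train_epochs` values in the sweep, where training long past convergence wastes compute and risks overfitting.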

emily
- train_batch_size [8, 16, 32, 64, 128]
- scheduler ["linear_schedule_with_warmup", "polynomial_decay_schedule_with_warmup", "constant_schedule_with_warmup"]

# creative stuff
- model ["facebook/bart-large-cnn", "distilroberta-base", "bert-base-cased"]


# for augmentation
- percentage of word embeddings replaced in BERT (em)
    - what percentage of all sentences to augment
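The two percentages above are separate knobs: which sentences get augmented at all, and which token positions within a sentence get replaced. A stdlib sketch of just the selection step (the actual embedding-space replacement is model-dependent and omitted):

```python
import random

def select_sentences(num_sentences, sentence_pct, rng):
    """Pick which sentences in the corpus get augmented at all."""
    k = round(num_sentences * sentence_pct)
    return sorted(rng.sample(range(num_sentences), k))

def select_positions(num_tokens, token_pct, rng):
    """Pick token indices to replace, e.g. by a neighbour in BERT
    embedding space (replacement itself not shown)."""
    k = max(1, round(num_tokens * token_pct))
    return sorted(rng.sample(range(num_tokens), k))

rng = random.Random(0)
sentences = select_sentences(100, 0.2, rng)  # 20 of 100 sentences
positions = select_positions(10, 0.3, rng)   # 3 of 10 tokens
```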

- synonym (azhara)
    - percentage of words to replace

- back translation (ella)
    - which languages, and how many
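The back-translation knobs above can be sketched as follows; `translate` is a stub standing in for a real MT system (e.g. MarianMT checkpoints via transformers) that just tags the text so the round trip is visible:

```python
def translate(text, src, tgt):
    """Stub translator -- a real system would return an actual translation."""
    return f"{text} [{src}->{tgt}]"

def back_translate(text, pivot_langs):
    """Round-trip `text` through each pivot language: one paraphrase per
    language, so more pivot languages means more augmented samples."""
    for lang in pivot_langs:
        pivoted = translate(text, "en", lang)
        yield translate(pivoted, lang, "en")

samples = list(back_translate("the cat sat", ["de", "fr"]))
```

With a real translator the en->de->en and en->fr->en round trips would yield slightly different paraphrases, which is exactly the diversity this augmentation relies on.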


- evaluate 


# one other model newer than RoBERTa


# sam