Commit b93e93a7 authored by Se Park

Use the English-Chinese (enzh) data instead of English-German (ende)

parent 66460561
# CO490 - NLP Course Labs (Spring 2020)
## Lab Notebooks
- **(16/01/2020) Lab 1:** Pre-processing and word representations [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/lab01/preprocessing_and_embeddings.ipynb)
- **(23/01/2020) Lab 2:** Text Classification: Sentiment Analysis [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/lab02/sentiment_classification.ipynb)
- **(30/01/2020) Lab 3:** Language Modelling
  - Part I: N-gram modelling [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/lab03/ngram_lm.ipynb)
  - Part II: Neural language models [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/lab03/neural_lm.ipynb)
- **(06/02/2020) Lab 4:** Part-of-Speech Tagging [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/lab04/POStagging.ipynb)
## Coursework
05/02/2020: A baseline model for the coursework has been [added](/coursework/baseline.ipynb). [(Open in Colab)](https://colab.research.google.com/github/ImperialNLP/NLPLabs/blob/master/coursework/baseline.ipynb)
```diff
@@ -74,5 +74,6 @@ class LoadData(Dataset):
         segment_ids = torch.tensor(segment_ids, dtype=torch.long)
         attn_mask = torch.tensor(attn_mask, dtype=torch.long)
         score = torch.tensor(score, dtype=torch.float)
+        # score = torch.tanh(score)
         return tokens_ids, attn_mask, segment_ids, score
```
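This hunk shows only the tail of `LoadData.__getitem__`. For context, here is a minimal sketch of what the full dataset class might look like; the tokenizer choice, file-reading logic, and `encode_plus` arguments are assumptions for illustration, not the repository's actual implementation:

```python
# Hypothetical sketch of LoadData, consistent with the return statement above.
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizer

class LoadData(Dataset):
    def __init__(self, src_file, mt_file, score_file, maxlen):
        self.src = open(src_file, encoding='utf-8').read().splitlines()
        self.mt = open(mt_file, encoding='utf-8').read().splitlines()
        self.scores = [float(s) for s in open(score_file).read().splitlines()]
        self.maxlen = maxlen
        # Assumption: a multilingual checkpoint, covering both the English
        # source and the Chinese MT output.
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')

    def __len__(self):
        return len(self.scores)

    def __getitem__(self, idx):
        # Encode source and MT hypothesis as a single BERT sentence pair:
        # [CLS] src [SEP] mt [SEP], padded/truncated to maxlen.
        enc = self.tokenizer.encode_plus(
            self.src[idx], self.mt[idx],
            max_length=self.maxlen, padding='max_length', truncation=True)
        tokens_ids = torch.tensor(enc['input_ids'], dtype=torch.long)
        segment_ids = torch.tensor(enc['token_type_ids'], dtype=torch.long)
        attn_mask = torch.tensor(enc['attention_mask'], dtype=torch.long)
        score = torch.tensor(self.scores[idx], dtype=torch.float)
        return tokens_ids, attn_mask, segment_ids, score
```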
```diff
@@ -29,8 +29,8 @@ def evaluate(model, loss_fn, dataloader, device):
         pred = np.concatenate((pred, qe_scores))
         ref = np.concatenate((ref, labels))
-        print (f'pred: {pred}')
-        print (f'ref: {ref}')
+        # print (f'pred: {pred}')
+        # print (f'ref: {ref}')
         eval_loss += loss.item()
         count += 1
```
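The loop above accumulates predictions and gold scores into flat NumPy arrays. A hedged sketch of how they might be summarised once the loop finishes; Pearson correlation is the usual metric for sentence-level quality estimation, but the helper name and the RMSE addition are mine, not the repository's:

```python
import numpy as np
from scipy.stats import pearsonr

def summarise(pred: np.ndarray, ref: np.ndarray, eval_loss: float, count: int):
    r, _ = pearsonr(pred, ref)                   # linear correlation with gold scores
    rmse = np.sqrt(np.mean((pred - ref) ** 2))   # scale-sensitive error
    print(f'loss: {eval_loss / count:.4f}  pearson: {r:.4f}  rmse: {rmse:.4f}')
    return r
```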
```diff
@@ -87,8 +87,8 @@ if __name__ == "__main__":
     optimizer = optim.AdamW(model.parameters(), lr=2e-5)
     MAX_LEN = 64
-    train_set = LoadData(src_file=PATH/'data/train.ende.src', mt_file=PATH/'data/train.ende.mt', score_file=PATH/'data/train.ende.scores', maxlen=MAX_LEN)
-    val_set = LoadData(src_file=PATH/'data/dev.ende.src', mt_file=PATH/'data/dev.ende.mt', score_file=PATH/'data/dev.ende.scores', maxlen=MAX_LEN)
+    train_set = LoadData(src_file=PATH/'data/train.enzh.src', mt_file=PATH/'data/train.enzh.mt', score_file=PATH/'data/train.enzh.scores', maxlen=MAX_LEN)
+    val_set = LoadData(src_file=PATH/'data/dev.enzh.src', mt_file=PATH/'data/dev.enzh.mt', score_file=PATH/'data/dev.enzh.scores', maxlen=MAX_LEN)
     train_loader = DataLoader(train_set, batch_size=32)
     val_loader = DataLoader(val_set, batch_size=32)
```
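The `__main__` block above wires the en-zh data into train/val loaders. A minimal sketch of the training loop that might follow, continuing with the `model`, `optimizer`, and `train_loader` defined there; the MSE loss, epoch count, and gradient clipping are assumptions for illustration:

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()  # regression on sentence-level quality scores
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(3):
    model.train()
    for tokens_ids, attn_mask, segment_ids, score in train_loader:
        tokens_ids, attn_mask = tokens_ids.to(device), attn_mask.to(device)
        segment_ids, score = segment_ids.to(device), score.to(device)
        optimizer.zero_grad()
        # Positional order matches the forward signature shown below:
        # forward(token_ids, segment_ids, attention_mask)
        qe_scores = model(tokens_ids, segment_ids, attn_mask)
        loss = loss_fn(qe_scores.squeeze(-1), score)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # keep updates stable
        optimizer.step()
```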
```diff
@@ -25,16 +25,11 @@ class QualityEstimation(nn.Module):
     def forward(self, token_ids, segment_ids=None, attention_mask=None):
         # Feeding the input to BERT model to obtain contextualized representations
-        # flat_token_ids = token_ids.view(-1, token_ids.size(-1))
-        # flat_segment_ids = segment_ids.view(-1, segment_ids.size(-1))
-        # flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1))
-        # encoded_layers, _ = self.bert(flat_token_ids, flat_segment_ids, flat_attention_mask)
         encoded_layers, _ = self.bert(input_ids=token_ids, token_type_ids=segment_ids, attention_mask=attention_mask)
         encoded_layers = self.dropout(encoded_layers)
         output, _ = self.lstm(encoded_layers)
-        # output = torch.tanh(self.fc1(output[:,-1,:]))
         qe_scores = self.fc1(output[:,-1,:])
         # qe_scores = torch.tanh(qe_scores)
         return qe_scores
```
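For context, here is a self-contained sketch of the module this hunk belongs to, filled in from the forward pass it shows: BERT encodings, then dropout, then an LSTM, then a linear head on the final time step. The hidden size, dropout rate, and pretrained checkpoint are assumptions, not values taken from the repository:

```python
# Hypothetical sketch of QualityEstimation, consistent with the forward above.
import torch
import torch.nn as nn
from transformers import BertModel

class QualityEstimation(nn.Module):
    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-multilingual-cased')
        self.dropout = nn.Dropout(0.1)
        self.lstm = nn.LSTM(input_size=self.bert.config.hidden_size,
                            hidden_size=hidden_size, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, 1)  # one regression score per sentence pair

    def forward(self, token_ids, segment_ids=None, attention_mask=None):
        # Contextualised token representations from BERT; indexing with [0]
        # selects the last hidden state and works across transformers versions,
        # unlike the tuple unpacking in the original (pre-4.x) code.
        encoded_layers = self.bert(input_ids=token_ids,
                                   token_type_ids=segment_ids,
                                   attention_mask=attention_mask)[0]
        encoded_layers = self.dropout(encoded_layers)
        output, _ = self.lstm(encoded_layers)
        # Regress the quality score from the final LSTM state
        qe_scores = self.fc1(output[:, -1, :])
        return qe_scores
```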