Commit f0285f85 authored by Joel Oksanen

Finished ontology chapter

parent 82eae0c4
@@ -405,18 +405,13 @@ When it comes to Amazon review texts, the issue with many of these methods is th
Some cross-domain feature extraction methods have been proposed \cite{RefWorks:doc:5e374c55e4b0d3ee568e80d6}, but these still perform significantly worse than the single-domain methods discussed above. Furthermore, none of these methods take advantage of the semantic relationships between the product and its features, instead looking for opinion targets in the absence of any existing product information.
\subsubsection{Common-sense knowledge bases}
Semantic information can be obtained from \textit{ConceptNet} \cite{RefWorks:doc:5e382bf3e4b0034ec2324aed}, a \textit{common-sense knowledge graph} that connects words and phrases with labelled and weighted edges expressing semantic relations between them. As our goal is to obtain features of a product, we are most interested in the \textit{HasA} relation. For example, the term \textit{watch}\footnote{http://conceptnet.io/c/en/watch} is related by the \textit{HasA} relation to \textit{a hand that counts minutes}. However, many reviews for watches comment on the watch's \textit{face} or its \textit{band}, neither of which is related to \textit{watch} on ConceptNet. Furthermore, the knowledge on ConceptNet is loosely structured as natural language descriptions, whereas ADA requires more structured information: for a \textit{watch}, we require the feature \textit{hand} rather than \textit{a hand that counts minutes}.
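To illustrate, such relations can be queried programmatically from ConceptNet's public API. Below is a minimal sketch in Python; the \texttt{requests} library, the \texttt{api.conceptnet.io} endpoint and the helper name \texttt{has\_a\_edges} are our assumptions for illustration:
\begin{lstlisting}[language=Python]
# Minimal sketch: retrieve HasA edges for a term from the ConceptNet API.
# Assumes the public endpoint at api.conceptnet.io and the requests library.
import requests

def has_a_edges(term, limit=20):
    # Return labels of nodes related to `term` by the HasA relation.
    url = "http://api.conceptnet.io/query"
    params = {"start": "/c/en/" + term, "rel": "/r/HasA", "limit": limit}
    edges = requests.get(url, params=params).json()["edges"]
    return [edge["end"]["label"] for edge in edges]

print(has_a_edges("watch"))  # e.g. ['a hand that counts minutes', ...]
\end{lstlisting}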
Some methods have been proposed for \textit{automatic common-sense completion}, which aims to structure the information in common-sense knowledge graphs such as ConceptNet. \textit{Commonsense Transformers} (COMET), proposed by Bosselut et al.\ \cite{RefWorks:doc:5edf7951e4b0846eecf1d89f}, is one such state-of-the-art method, which uses the Transformer architecture described in Section \ref{sec:BERT} to generate structured knowledge based on ConceptNet. However, even COMET appears incomplete in terms of the \textit{HasA} relation, as it is missing many of the same features of \textit{watch}\footnote{https://mosaickg.apps.allenai.org/comet\_conceptnet/?l=watch\&r=HasA} as ConceptNet.
\textit{WordNet} \cite{RefWorks:doc:5ed503e0e4b081759f6d02e2}, a large manually constructed lexical database, is another popular source for common-sense lexical information. Although WordNet includes fewer relation types than ConceptNet, it includes the relation of \textit{meronymy}, which denotes a word being a constituent part of another word, similar to the \textit{HasA} and \textit{MadeOf} relations in ConceptNet. For example, the term \textit{electric toothbrush}\footnote{http://wordnetweb.princeton.edu} has a single meronym, \textit{electric motor}. As with ConceptNet, WordNet's coverage is incomplete, as it fails to include relevant meronyms such as the \textit{timer}.
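WordNet meronyms can similarly be retrieved programmatically, for instance through NLTK's WordNet interface. A minimal sketch, assuming NLTK is installed, the WordNet corpus has been downloaded, and that the synset name \texttt{electric\_toothbrush.n.01} exists:
\begin{lstlisting}[language=Python]
# Minimal sketch: look up part-meronyms in WordNet via NLTK.
# Requires a one-off nltk.download('wordnet') beforehand.
from nltk.corpus import wordnet as wn

synset = wn.synset('electric_toothbrush.n.01')  # assumed synset name
for meronym in synset.part_meronyms():
    print(meronym.lemma_names())  # e.g. ['electric_motor']
\end{lstlisting}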
Due to the low recall of ConceptNet and WordNet in obtaining product features, our approach to target extraction must take advantage of the review texts as a source of information about the product. However, we can evaluate our implementation against ConceptNet and WordNet, or possibly take a hybrid approach to feature extraction where features are mined from review texts to complement the existing semantic information on ConceptNet and WordNet.
@@ -431,6 +426,7 @@ More advanced methods using deep learning have been proposed in literature, alth
However, both methods were trained and tested in a single domain consisting of tweets about celebrities, companies and consumer electronics. Their performance would likely drop substantially in a different domain, as the sentiment polarity of a word can be highly dependent on context: for example, the adjective \textit{hard} has a positive connotation when describing a protective case, but a negative connotation when describing an armchair.
\section{BERT}
\label{sec:BERT}
Both NLP tasks relevant to ADA, feature extraction and feature-dependent sentiment analysis, become more difficult to perform in a general domain, which is crucial to applying ADA to all Amazon products. BERT \cite{RefWorks:doc:5e8dcdc3e4b0dba02bfdfa80}, which stands for \textit{Bidirectional Encoder Representations from Transformers}, is a state-of-the-art language representation model that uses pre-training to learn language representations from large amounts of unlabelled text; the pre-trained model can then be fine-tuned for more specific NLP tasks. Because the unlabelled text used to pre-train BERT comes from general-domain corpora such as Wikipedia, the knowledge in a pre-trained BERT model is especially well-suited for domain-independent NLP tasks such as ours. This section provides a brief overview of the BERT architecture and the underlying \textit{Transformer network}, shown in Figure \ref{fig:BERTarchitecture}. The original paper presents two model sizes, BERT-base and BERT-large, which differ in the size of various hyperparameters; we will use the former due to memory constraints.
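As a concrete illustration of the pre-train/fine-tune workflow, a pre-trained BERT-base model can be loaded and applied to text with, for example, the HuggingFace \texttt{transformers} library; the library choice and the model name \texttt{bert-base-uncased} are our assumptions here, not part of the original BERT release:
\begin{lstlisting}[language=Python]
# Minimal sketch: encode a sentence with pre-trained BERT-base.
# Fine-tuning would add a task-specific head on top of these representations.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("The timer on this toothbrush is great.",
                   return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768 for BERT-base)
\end{lstlisting}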
......
@article{RefWorks:doc:5edfe5c1e4b064c22cd56d15,
author={Mary L. McHugh},
year={2012},
title={Interrater reliability: the kappa statistic},
journal={Biochemia Medica},
volume={22},
number={3},
pages={276--282}
}
@inproceedings{RefWorks:doc:5edca760e4b0ef3565a5f38d,
author={Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg S. Corrado and Jeff Dean},
year={2013},
@@ -70,11 +79,11 @@
issn = {1532-0464},
doi={10.1016/j.jbi.2020.103384}
}
@article{RefWorks:doc:5edf7951e4b0846eecf1d89f,
author={Antoine Bosselut and Hannah Rashkin and Maarten Sap and Chaitanya Malaviya and Asli Celikyilmaz and Yejin Choi},
year={2019},
title={{COMET}: Commonsense transformers for automatic knowledge graph construction},
journal={arXiv preprint arXiv:1906.05317}
}
@misc{RefWorks:doc:5eb97a3ae4b04ec536ff3ba1,
author = {S. Huang and X. Liu and X. Peng and Z. Niu},
......
@@ -19,6 +19,8 @@
\usepackage{multirow}
\usepackage{pgfplots}
\usepackage{listings}
\lstset{basicstyle=\ttfamily\footnotesize,breaklines=true} % global style for code listings
\renewcommand{\figurename}{Listing} % note: relabels all figure captions as ``Listing''
......