% Commit cbca8c9d authored by Joel Oksanen: Finished feature-level sentiment analysis section
\chapter{Background}
We begin this chapter by detailing the methodology of the ADA proposed by Cocarascu et al. \cite{RefWorks:doc:5e08939de4b0912a82c3d46c}. We then evaluate the limitations of ADA in relation to Amazon reviews, and suggest extensions to address them. Finally, we consider current research in the fields of aspect-level sentiment analysis and conversational systems in order to establish a basis for our enhancements to the agent.
\section{Argumentative Dialogical Agent}
\paragraph{Features}
\hangindent=\parindent
\hangafter=0
Because all Rotten Tomatoes reviews are about movies, there are features such as \textit{characters}, found in both of the movie reviews, that are common to all movies. On the other hand, Amazon has reviews for a large variety of different products, and therefore the main features might be very different from one item to another. For example, the feature \textit{cleanliness} is very important to an electric toothbrush but not to a digital camera, and vice versa for \textit{image quality}. The vast number of products limits the possibility of having predetermined features, and emphasises the importance of unsupervised feature extraction. However, some features might be common to wider product categories, such as \textit{battery} in electronics. Furthermore, there are features such as \textit{price} and \textit{shipping} that apply to all Amazon products. The possibility for different \textit{tiers} of feature-based representation has not been explored in ADAs.
\paragraph{Writing style}
\hangindent=\parindent
\hangafter=0
Based on these two differences, we propose two extensions to ADA in order to accommodate the Amazon review domain:
\begin{enumerate}
\item A method for unsupervised extraction of product features from Amazon review texts.
\item A method for sentiment analysis in Amazon's more heterogeneous review domain.
\end{enumerate}
\section{Feature-level sentiment analysis}
In this section, we examine state-of-the-art research in \textit{aspect-level sentiment analysis} \cite{RefWorks:doc:5e2b0d8de4b0711bafe4fba8}, which attempts to determine people's opinions on \textit{entities} and their \textit{aspects}. We have already encountered aspect-level sentiment analysis in ADA's review aggregation, where the entities are \textit{products} and the aspects are their \textit{features}. For consistency, we will refer to entities as products and aspects as features. Further research into this area will guide our implementation of the extensions proposed in section 2.1.
Consider the following review for a particular model of the \textit{Philips Sonicare} electric toothbrush range:
\begin{center}
\textit{Trust me, as someone who has owned several terrible \\ Sonicare toothbrushes before: this model is great.}
\end{center}
\noindent
Since ADA would associate \textit{Sonicare toothbrush} with \textit{terrible}, it would incorrectly extract a negative vote for the product. This is because ADA:
\begin{enumerate}
\item cannot distinguish that the toothbrush under review is referred to as \textit{this model};
\item will calculate only a single sentiment for the entire review, whereas the review actually expresses a negative sentiment towards other Sonicare toothbrushes and a positive sentiment towards this particular one.
\end{enumerate}
The first error concerns \textit{feature extraction}, the first stage of feature-level sentiment analysis, in which the agent extracts features from the text. The second error concerns \textit{feature sentiment analysis}, in which the extracted features are assigned sentiment polarities. The following sections evaluate research in these two areas with the aim of resolving these errors.
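The division of labour between the two stages can be illustrated with a deliberately naive sketch. The feature lexicon, sentiment lexicon, and fixed proximity window below are our own illustrative assumptions, not part of ADA:

```python
# Toy two-stage pipeline for feature-level sentiment analysis.
# The lexicons and the proximity window are illustrative assumptions,
# not ADA's actual implementation.
FEATURE_TERMS = {"screen", "battery", "model"}   # assumed feature lexicon
POS_WORDS = {"clear", "great"}                   # toy sentiment lexicon
NEG_WORDS = {"short", "terrible", "dreadful"}

def extract_features(tokens):
    """Stage 1: feature extraction, i.e. finding opinion targets in the text."""
    return [t for t in tokens if t in FEATURE_TERMS]

def feature_sentiment(tokens, feature):
    """Stage 2: feature sentiment analysis, i.e. assigning a polarity to the
    feature from opinion words in a fixed window around it."""
    i = tokens.index(feature)
    window = tokens[max(0, i - 3):i + 4]         # naive proximity heuristic
    score = (sum(t in POS_WORDS for t in window)
             - sum(t in NEG_WORDS for t in window))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tokens = "although the screen is very clear the battery life is too short".split()
print([(f, feature_sentiment(tokens, f)) for f in extract_features(tokens)])
```

On the counter example from \cite{RefWorks:doc:5e08939de4b0912a82c3d46c}, \textit{although the screen is very clear, the battery life is too short}, this sketch extracts both features and correctly labels \textit{screen} as positive, but mislabels \textit{battery} as positive: \textit{clear} falls inside its window while \textit{short} falls outside it. This illustrates the core difficulty of associating each opinion with the right feature.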
\subsection{Feature extraction}
In order to obtain an unsupervised feature-based representation for a product, we must have a way to extract features from some source of information about the product, such as its metadata or review texts. As the availability of metadata varies greatly from one product category to another, we focus our analysis on feature extraction from review texts, so as not to limit the set of Amazon products supported by our implementation.
\subsubsection{Opinion target extraction}
Much research has already gone into feature extraction from text using NLP methods, in the form of \textit{opinion target extraction}. Most studies on this topic can be categorised into \textit{rule-based methods} and \textit{supervised machine learning methods} \cite{RefWorks:doc:5e374c55e4b0d3ee568e80d6}. The former rely on lexical data or syntactic rules to detect features, as in \cite{RefWorks:doc:5e38230ae4b07b376b61b3fe}, while the latter model the problem as a sequence labelling task, for instance with Conditional Random Fields (CRFs) \cite{RefWorks:doc:5e381a1ce4b084bfe828c41a}. More recently, deep learning methods have also been proposed for the problem, an overview of which can be found in \cite{RefWorks:doc:5e2b0d8de4b0711bafe4fba8}.
When it comes to Amazon review texts, the issue with many of these methods is that they are domain-dependent. The machine learning methods in particular work well on the domain on which they are trained, but may suffer a performance drop of up to $40\%$ when tested on different domains \cite{RefWorks:doc:5e374c55e4b0d3ee568e80d6}. Therefore, a model trained on digital camera reviews might not perform well on reviews for garden hoses. On the other hand, a model trained on the whole Amazon review dataset might be too general for individual domains.
Some cross-domain feature extraction methods have been proposed \cite{RefWorks:doc:5e374c55e4b0d3ee568e80d6}, but these still perform significantly worse than the single-domain methods discussed above. Furthermore, none of these methods take advantage of the semantic relationships between the product and its features, but look for opinion targets in the absence of any existing product information.
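The sequence labelling formulation assigns each token a tag from $\{B, I, O\}$: B for the beginning of an opinion target, I for inside one, and O for outside. A toy illustration of the encoding follows, in which a hypothetical dictionary of known targets stands in for what a trained CRF or neural tagger would learn to recognise:

```python
# BIO encoding for opinion target extraction (toy example).
# TARGETS is a hypothetical list of known multi-word targets standing in
# for what a trained CRF or neural tagger would learn to recognise.
TARGETS = [["battery", "life"], ["image", "quality"]]

def bio_tag(tokens):
    """Tag each token: B begins a target, I is inside one, O is outside."""
    tags = ["O"] * len(tokens)
    for target in TARGETS:
        n = len(target)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
    return tags

tokens = "the battery life is too short".split()
print(list(zip(tokens, bio_tag(tokens))))
```

A trained model predicts such tag sequences for unseen tokens, which is precisely where the cross-domain performance drop discussed above arises: the features that signal a target in one product domain may not transfer to another.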
\subsubsection{ConceptNet}
Semantic information can be obtained from \textit{ConceptNet} \cite{RefWorks:doc:5e382bf3e4b0034ec2324aed}, which is a graph connecting words and phrases with labelled and weighted edges expressing semantic relations between the words. As our goal is to obtain features of a product, the relations we are most interested in are:
\begin{itemize}
\item \textit{CapableOf} for capabilities of products;
\item \textit{HasA} for parts of products;
\item \textit{MadeOf} for materials of products;
\item \textit{UsedFor} for uses of products.
\end{itemize}
For example, the term \textit{electric toothbrush}\footnote{http://conceptnet.io/c/en/electric\_toothbrush} is related by the \textit{UsedFor} relation to \textit{cleanliness} and by the \textit{CapableOf} relation to \textit{run on batteries}. However, many reviews for electric toothbrushes comment on the product \textit{model} or its timer functionality, neither of which are terms related to electric toothbrushes on ConceptNet. Due to this incompleteness of ConceptNet, a hybrid approach to feature extraction, where opinion targets mined from review texts for wider product categories complement the product-specific semantic information from ConceptNet, might be the most effective.
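As a sketch of how such semantic information might be gathered: ConceptNet exposes its graph through a JSON API, and a response's edges can be filtered down to the four feature-suggesting relations listed above. The sample response below is a hand-abbreviated stand-in for the payload returned by http://api.conceptnet.io/c/en/electric\_toothbrush; a live system would fetch it over HTTP instead:

```python
# Filtering ConceptNet edges for feature-suggesting relations.
# SAMPLE_RESPONSE mimics the shape of the JSON returned by the ConceptNet
# API (edges heavily abbreviated); a real query would fetch it over HTTP.
FEATURE_RELATIONS = {"CapableOf", "HasA", "MadeOf", "UsedFor"}

SAMPLE_RESPONSE = {
    "edges": [
        {"rel": {"label": "UsedFor"}, "end": {"label": "cleanliness"}},
        {"rel": {"label": "CapableOf"}, "end": {"label": "run on batteries"}},
        {"rel": {"label": "IsA"}, "end": {"label": "toothbrush"}},  # not kept
    ]
}

def candidate_features(response):
    """Keep the end concept of every edge whose relation suggests a feature."""
    return [e["end"]["label"] for e in response["edges"]
            if e["rel"]["label"] in FEATURE_RELATIONS]

print(candidate_features(SAMPLE_RESPONSE))
```

The filtered concepts would then form candidate features, to be merged with opinion targets mined from review texts as suggested above.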
\subsection{Feature sentiment analysis}
After we have extracted opinion targets (arguments) from a review, we wish to discern through sentiment analysis whether the opinions towards the arguments are positive or negative. Perhaps the main difficulty in feature-level sentiment analysis is distinguishing which opinions act on which arguments.
ADA attempts to tackle this issue by dividing the review into phrases at specific keywords, such as the word \textit{but} in \textit{I liked the acting, but the cinematography was dreadful}, after which it assumes that each phrase contains at most one sentiment. However, there are many cases where such a simple method will not work, like the example at the start of this section. This is particularly true for Amazon reviews, whose text tends to be less formal than Rotten Tomatoes reviews.
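A minimal sketch of this keyword-splitting heuristic follows; the keyword list and sentiment lexicon are illustrative simplifications, not ADA's actual ones:

```python
import re

# Sketch of a keyword-splitting heuristic in the spirit of ADA's approach.
# The split-word list and sentiment lexicon are illustrative simplifications.
SPLIT_WORDS = {"but", "although", "however"}
POS_WORDS = {"liked", "great", "clear"}
NEG_WORDS = {"dreadful", "terrible", "short"}

def phrase_sentiments(review):
    """Split the review at contrast keywords, then assign each phrase
    at most one sentiment from a simple word lexicon."""
    tokens = re.findall(r"[a-z]+", review.lower())
    phrases, current = [], []
    for t in tokens:
        if t in SPLIT_WORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(t)
    if current:
        phrases.append(current)
    results = []
    for p in phrases:
        score = (sum(t in POS_WORDS for t in p)
                 - sum(t in NEG_WORDS for t in p))
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.append((" ".join(p), label))
    return results

print(phrase_sentiments("I liked the acting, but the cinematography was dreadful"))
```

This handles the \textit{acting}/\textit{cinematography} example well, producing one positive and one negative phrase. On the Sonicare review from the start of this section, however, no split keyword occurs, so the whole review forms a single phrase in which \textit{terrible} and \textit{great} cancel out and both sentiments are lost.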
More advanced methods using deep learning have been proposed in the literature, although the task is considered difficult and there is currently no dominant technique for this purpose \cite{RefWorks:doc:5e2b0d8de4b0711bafe4fba8}. Dong et al. \cite{RefWorks:doc:5e2e107ce4b0bc4691206e2e} proposed an \textit{adaptive recursive neural network} (AdaRNN) for target-dependent Twitter sentiment classification, which propagates the sentiments of words to the target by exploiting the context and the syntactic relationships between them. The results were promising, and the domain of Twitter is similar to Amazon reviews in terms of formality. The results were compared with a re-implementation of \textit{SVM-dep} proposed by Jiang et al. \cite{RefWorks:doc:5e2e1e23e4b0e67b35d1c360}, which uses target-dependent syntactic features in an SVM classifier instead of a neural network. As SVM-dep performed nearly as well as AdaRNN, either could be used to improve ADA's sentiment analysis accuracy.
\section{Conversational systems}
@inproceedings{RefWorks:doc:5e382bf3e4b0034ec2324aed,
author={Robyn Speer and Joshua Chin and Catherine Havasi},
year={2017},
title={ConceptNet 5.5: An Open Multilingual Graph of General Knowledge},
booktitle={Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence},
series={AAAI’17},
publisher={AAAI Press},
location={San Francisco, California, USA},
pages={4444–4451}
}
@article{RefWorks:doc:5e38230ae4b07b376b61b3fe,
author={Guang Qiu and Bing Liu and Jiajun Bu and Chun Chen},
year={2011},
month={mar},
title={Opinion Word Expansion and Target Extraction through Double Propagation},
journal={Computational Linguistics},
volume={37},
number={1},
pages={9–27},
issn={0891-2017},
url={https://doi.org/10.1162/coli_a_00034},
doi={10.1162/coli_a_00034}
}
@inproceedings{RefWorks:doc:5e381a1ce4b084bfe828c41a,
author={John D. Lafferty and Andrew McCallum and Fernando C. N. Pereira},
year={2001},
title={Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
booktitle={Proceedings of the Eighteenth International Conference on Machine Learning},
series={ICML ’01},
publisher={Morgan Kaufmann Publishers Inc},
address={San Francisco, CA, USA},
pages={282–289},
isbn={1-55860-778-1}
}
@inproceedings{RefWorks:doc:5e374c55e4b0d3ee568e80d6,
author={Ying Ding and Jianfei Yu and Jing Jiang},
year={2017},
title={Recurrent Neural Networks with Auxiliary Labels for Cross-Domain Opinion Target Extraction},
booktitle={Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence},
series={AAAI’17},
publisher={AAAI Press},
location={San Francisco, California, USA},
pages={3436–3442}
}