Commit e985c0da authored by Joel Oksanen

1) Finished background section on ADA in report. 2) Added support for sub-features.

parent 7c4096e6
......@@ -74,7 +74,7 @@ def extract_votes(phrases):
if abs(sentiment) > sentiment_threshold:
for reviewable in reviewables:
if (reviewable not in votes) or (abs(votes[reviewable]) < abs(sentiment)):
votes[reviewable] = sentiment
votes[reviewable] = sentiment # what if there's two phrases with same reviewable?
# normalize votes to 1 (+) or -1 (-)
for reviewable in votes:
votes[reviewable] = 1 if votes[reviewable] > 0 else -1
......@@ -90,7 +90,7 @@ def augment_votes(votes):
if subfeat in votes:
polar_sum += votes[subfeat]
if polar_sum != 0:
votes[reviewable] = 1 if polar_sum > 0 else 0
votes[reviewable] = 1 if polar_sum > 0 else -1
def get_qbaf(ra, review_count):
# sums of all positive and negative votes for reviewables
......@@ -101,16 +101,15 @@ def get_qbaf(ra, review_count):
if r['reviewable'] == reviewable:
reviewable_sums[reviewable] += r['vote']
# if there are sub-features, calculate attack/support relations here
supporters = []
attackers = []
# calculate attack/support relations for camera
for feature in camera.children:
if reviewable_sums[feature] > 0:
supporters.append(feature)
elif reviewable_sums[feature] < 0:
attackers.append(feature)
supporters = {r: [] for r in reviewables}
attackers = {r: [] for r in reviewables}
for r in reviewables:
for subf in r.children:
if reviewable_sums[subf] > 0:
supporters[r].append(subf)
elif reviewable_sums[subf] < 0:
attackers[r].append(subf)
# calculate base scores for reviewables
base_scores = {}
......@@ -143,9 +142,9 @@ def get_strengths(qbaf):
attacker_strengths = []
supporter_strengths = []
for child in reviewable.children:
if child in qbaf["attackers"]:
if child in qbaf["attackers"][reviewable]:
attacker_strengths.append(strengths[child])
elif child in qbaf["supporters"]:
elif child in qbaf["supporters"][reviewable]:
supporter_strengths.append(strengths[child])
strengths[reviewable] = argument_strength(qbaf["base_scores"][reviewable], attacker_strengths, supporter_strengths)
return strengths
......@@ -201,3 +200,24 @@ print("mae: ", mae)
# plot result correlation
pyplot.scatter(camera_strengths, scaled_star_rating_avgs)
pyplot.show()
# vs = [{camera: 1, image: 1, zoom: -1},
# {camera: 1, image: 1, battery: -1},
# {image: 1, battery: 1, af: 1},
# {af: 1},
# {camera: -1, zoom: -1},
# {camera: -1, image: -1, af: 1},
# {battery: -1}]
#
# ra = []
# for v in vs:
# print(v)
# augment_votes(v)
# print(v)
# for reviewable in v:
# ra.append({'reviewable': reviewable, 'vote': v[reviewable]})
#
# qbaf = get_qbaf(ra, len(vs))
# strengths = get_strengths(qbaf)
# print(qbaf)
# print(strengths)
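The sub-feature handling touched by the `augment_votes` hunk above can be sketched in isolation. This is a minimal sketch, not the code from this commit. It makes three assumptions: the feature tree is a plain child-to-parent dictionary (the hypothetical `PARENT` mapping) rather than `Node` objects, nodes are processed deepest first so that inferred votes can propagate upwards, and explicit votes are never overwritten.

```python
# Hypothetical feature tree as a child -> parent mapping
# (a stand-in for the Node objects used in the project).
PARENT = {"image": "camera", "battery": "camera", "lens": "camera",
          "zoom": "lens", "af": "lens"}


def children(node):
    """Return the direct sub-features of a node."""
    return [c for c, p in PARENT.items() if p == node]


def depth(node):
    """Return the distance of a node from the root of the tree."""
    d = 0
    while node in PARENT:
        node = PARENT[node]
        d += 1
    return d


def augment_votes(votes):
    """Infer a missing vote on a reviewable from the votes on its direct sub-features."""
    nodes = set(PARENT) | set(PARENT.values())
    # Deepest nodes first, so inferred votes can propagate upwards (an assumption).
    for node in sorted(nodes, key=depth, reverse=True):
        if node in votes:
            continue  # explicit votes are kept as they are (an assumption)
        polar_sum = sum(votes[c] for c in children(node) if c in votes)
        if polar_sum != 0:
            # A negative sum yields a vote of -1, matching the fix in this commit.
            votes[node] = 1 if polar_sum > 0 else -1
    return votes
```

With these assumptions, a lone negative vote on `zoom` yields inferred negative votes on `lens` and `camera`, while opposing votes on `image` and `battery` cancel out and leave `camera` without a vote.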
\chapter{Background}
We begin this chapter by detailing the methodology of the ADA proposed by Cocarascu et al. \cite{RefWorks:doc:5e08939de4b0912a82c3d46c}. We then move onto considering current research in the fields of NLP and conversational systems in order to establish a basis for our enhancements to the agent.
We begin this chapter by detailing the methodology of the ADA proposed by Cocarascu et al. \cite{RefWorks:doc:5e08939de4b0912a82c3d46c}. We then evaluate the limitations of ADA in relation to Amazon reviews and suggest extensions to address them. Finally, we consider current research in the fields of NLP and conversational systems in order to establish a basis for our enhancements to the agent.
\section{Argumentative Dialogical Agent}
......@@ -77,26 +77,35 @@ ADA mines votes for and against the arguments in $\mathcal{A}$ from the review s
A review aggregation for product $p$ is a tuple $\mathcal{R}(p) = \langle \mathcal{U}, \mathcal{V} \rangle$ where $\mathcal{U}$ is a finite, non-empty set of users (reviewers) and $\mathcal{V} : \mathcal{U} \times \mathcal{A} \to \{-,+\}$ is a partial function, with $\mathcal{V}(u, \alpha)$ representing the vote of user $u$ on argument $\alpha$.
\end{definition}
The mining is performed on each review snippet individually, and follows a three-step process of \textit{tokenisation}, \textit{argument detection}, and \textit{sentiment analysis} detailed below.
\subsubsection{Tokenisation}
\paragraph{Tokenisation}
\hangindent=\parindent
\hangafter=0
The review snippet is tokenised into sentences, which are further split into phrases at specific keywords (\textit{but}, \textit{although}, \textit{though}, \textit{otherwise}, \textit{however}, \textit{unless}, \textit{whereas}, \textit{despite}). Each phrase can then possibly constitute a negative or a positive vote for one or multiple arguments.
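The tokenisation step can be illustrated with a short Python sketch. It is not the tokeniser used by ADA: the sentence splitter is a naive regular expression chosen for brevity, and the function name `split_into_phrases` is hypothetical. Only the keyword list is taken from the paragraph above.

```python
import re

# Keywords at which sentences are split into phrases (from the text above).
SPLIT_KEYWORDS = ["but", "although", "though", "otherwise",
                  "however", "unless", "whereas", "despite"]

# Naive sentence boundary: '.', '!' or '?' followed by whitespace (an assumption).
_SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")
# Whole-word, case-insensitive keyword match.
_KEYWORD_RE = re.compile(r"\b(?:" + "|".join(SPLIT_KEYWORDS) + r")\b", re.IGNORECASE)


def split_into_phrases(snippet):
    """Split a review snippet into sentences, then into phrases at the keywords."""
    phrases = []
    for sentence in _SENTENCE_RE.split(snippet.strip()):
        for part in _KEYWORD_RE.split(sentence):
            part = part.strip(" ,;.!?")  # drop surrounding punctuation
            if part:
                phrases.append(part)
    return phrases
```

The word boundaries in the keyword pattern ensure that \textit{though} does not match inside \textit{although}.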
\subsubsection{Argument detection}
\paragraph{Argument detection}
\hangindent=\parindent
\hangafter=0
ADA determines the arguments on which a vote acts using a \textit{glossary} $\mathcal{G}$ of words related to each argument. For our earlier digital camera example, the glossary could be as follows:
\begin{align*}
\mathcal{G}(p) &= \{camera, device, product\}; \\
\mathcal{G}(f_I) &= \{image, picture, photo\}; \\
\mathcal{G}(f_L) &= \{lens\}; \\
\mathcal{G}(f_B) &= \{battery\}.
\mathcal{G}(f_B) &= \{battery\}; \\
\mathcal{G}(f_{L1}') &= \{zoom\}; \\
\mathcal{G}(f_{L2}') &= \{autofocus\}.
\end{align*}
A phrase constitutes a vote towards an argument $\alpha$ if the phrase contains a word from $\mathcal{G}(\alpha)$. Sub-features take precedence over their parents, so for example a phrase with words from both $\mathcal{G}(f)$ and $\mathcal{G}(p)$ will constitute a single vote for $f$. A phrase that contains words corresponding to multiple unrelated features results in a vote for each of the corresponding features.
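The precedence rule can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than ADA's implementation: the argument names (`p`, `f_I`, `f_L1`, ...) and the `PARENT` mapping are hypothetical encodings of the example glossary, and a glossary word is only matched as an exact, lower-cased token, so \textit{pictures} would not match \textit{picture}.

```python
import re

# Hypothetical encoding of the example glossary; keys are argument names.
GLOSSARY = {
    "p": {"camera", "device", "product"},
    "f_I": {"image", "picture", "photo"},
    "f_L": {"lens"},
    "f_B": {"battery"},
    "f_L1": {"zoom"},
    "f_L2": {"autofocus"},
}
# Parent of each argument (None for the product itself).
PARENT = {"p": None, "f_I": "p", "f_L": "p", "f_B": "p",
          "f_L1": "f_L", "f_L2": "f_L"}


def ancestors(arg):
    """Return every argument above arg in the feature tree."""
    result = set()
    while PARENT[arg] is not None:
        arg = PARENT[arg]
        result.add(arg)
    return result


def detect_arguments(phrase):
    """Return the arguments a phrase votes on, with sub-features taking precedence."""
    words = set(re.findall(r"[a-z]+", phrase.lower()))
    matched = {arg for arg, terms in GLOSSARY.items() if words & terms}
    # Drop any matched argument that is an ancestor of another matched argument.
    shadowed = set()
    for arg in matched:
        shadowed |= ancestors(arg)
    return matched - shadowed
```

A phrase mentioning both \textit{zoom} and \textit{lens} therefore yields a single vote on `f_L1`, while a phrase mentioning \textit{picture} and \textit{battery} yields votes on both `f_I` and `f_B`.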
\subsubsection{Sentiment analysis}
\paragraph{Sentiment analysis}
\hangindent=\parindent
\hangafter=0
ADA uses NLP methods to determine whether the vote is for or against the argument(s). In their paper, Cocarascu et al. compared two different NLP methods for this purpose, Sentiment Analysis (SA) and Argument Mining (AM), finding that SA performed slightly better than AM. Furthermore, SA is a more established field of research than AM, which supports our decision to use SA for this project. The ADA for Rotten Tomatoes used an off-the-shelf classifier on movie reviews for its SA, which calculates a sentiment polarity for the phrase in $[-1, 1]$, with $-1$ and $1$ denoting universally negative and positive sentiments respectively. A phrase with a highly positive (negative) polarity forms a vote for (against) the argument(s), while a phrase with an absolute polarity less than a pre-set polarity threshold of 0.6 is filtered out as neutral.
\newline
\noindent
To illustrate review aggregation, consider this digital camera review from user $u$:
\begin{center}
\textit{The image quality was great, although the battery life could be a bit better.}
\textit{The picture quality was great, although the zoom wasn't as good as advertised.}
\end{center}
ADA extracts two phrases, $p_1$: \textit{The image quality was great} and $p_2$: \textit{the battery life could be a bit better}. ADA then connects $p_1$ with $f_I$ and $p_2$ with $f_B$ as \textit{image} $\in G(f_I)$ and \textit{battery} $\in G(f_B)$. ADA then obtains the sentiment polarities $(p_1, 0.863)$ and $(p_2, -0.429)$. As the polarity for $p_2$ does not surpass the threshold of 0.6, ADA finally assigns only one vote of $\mathcal{V}(u,f_I) = +$ for the phrase.
ADA extracts two phrases, $p_1$: \textit{The picture quality was great} and $p_2$: \textit{the zoom wasn't as good as advertised}. ADA then connects $p_1$ with $f_I$ and $p_2$ with $f_{L1}'$, as \textit{picture} $\in \mathcal{G}(f_I)$ and \textit{zoom} $\in \mathcal{G}(f_{L1}')$. ADA then obtains the sentiment polarities $(p_1, 0.863)$ and $(p_2, -0.629)$. As the absolute polarities of both phrases surpass the threshold of 0.6, ADA assigns two votes for the review: $\mathcal{V}(u,f_I) = +$ and $\mathcal{V}(u,f_{L1}') = -$.
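The thresholding step of this worked example can be sketched in Python. The sketch assumes that the arguments and the polarity of each phrase have already been computed, so no sentiment classifier is involved. The function name, the input format, and the choice to keep only the strongest polarity per argument are assumptions modelled loosely on the project's `extract_votes`.

```python
POLARITY_THRESHOLD = 0.6  # threshold stated in the text


def extract_votes(scored_phrases, threshold=POLARITY_THRESHOLD):
    """Turn (arguments, polarity) pairs into a mapping from argument to +1 or -1."""
    strongest = {}
    for arguments, polarity in scored_phrases:
        if abs(polarity) <= threshold:
            continue  # phrase is filtered out as neutral
        for arg in arguments:
            # Keep the polarity with the largest magnitude for each argument.
            if arg not in strongest or abs(strongest[arg]) < abs(polarity):
                strongest[arg] = polarity
    # Normalise the surviving polarities to votes of +1 or -1.
    return {arg: (1 if pol > 0 else -1) for arg, pol in strongest.items()}
```

Called with the two phrases above, `extract_votes([({"f_I"}, 0.863), ({"f_L1"}, -0.629)])` gives a positive vote on `f_I` and a negative vote on `f_L1`. A polarity of $-0.429$ for the second phrase would leave only the vote on `f_I`.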
\subsubsection{Augmented review aggregation}
Online reviews are often brief: Amazon reviews have a median length of just 82 words according to a 2013 study \cite{RefWorks:doc:5e349b0ce4b033832f2cb721}. This can result in sparsely populated review aggregations. To fix this issue, the review aggregation can be augmented as follows:
......@@ -156,38 +165,38 @@ Figure \ref{fig:QBAF} shows an example of a QBAF for a digital camera, with the
\node[state][label=right:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.71$ \\
$\tau:0.57$
\end{tabular}]
(p) {$p$};
\node[state][label=below:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.27$ \\
$\tau:0.14$
\end{tabular}]
(fl) [below of=p] {$f_L$};
\node[state][label=left:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.29$ \\
$\tau:0.29$
\end{tabular}]
(fi) [below left of=p] {$f_I$};
\node[state][label=right:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.14$ \\
$\tau:0.14$
\end{tabular}]
(fb) [below right of=p] {$f_B$};
\node[state][label=left:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.29$ \\
$\tau:0.29$
\end{tabular}]
(fl1) [below left of=fl] {$f_{L1}'$};
\node[state][label=right:
\begin{tabular}{@{}c@{}}
$\sigma : 0.81$ \\
$\tau:0.64$
$\sigma:0.43$ \\
$\tau:0.43$
\end{tabular}]
(fl2) [below right of=fl] {$f_{L2}'$};
......@@ -226,7 +235,10 @@ Let $\sigma(\alpha) \in [0,1]$ represent the \textit{dialogical strength measure
where $E = \sum_{\beta\in\mathcal{L}^+(\alpha)}\sigma(\beta)-\sum_{\gamma\in\mathcal{L}^-(\alpha)}\sigma(\gamma)$.
\end{definition}
For a product $p$, a low (high) $\sigma(p)$ signifies a negative (positive) overall sentiment towards the product and its features, while $\sigma(p) = 0.5$ signifies a neutral sentiment. Therefore, $\sigma(p)$ can be compared with the average user rating for $p$ in order to evaluate the accuracy of ADA.
For a product $p$, a low (high) $\sigma(p)$ signifies a negative (positive) overall sentiment towards the product and its features, while $\sigma(p) = 0.5$ signifies a neutral sentiment. Therefore, $\sigma(p)$ can be compared with the average user rating for $p$ in order to evaluate the accuracy of ADA.
The QBAF in Figure \ref{fig:QBAF} is annotated with DF-QuAD strengths for its arguments. Since $\sigma(p)=0.71$, users have an overall positive sentiment towards the camera.
\subsection{Dialogical explanations}
......@@ -283,31 +295,94 @@ ADA can generate dialogical explanations for arguments based on the extracted QB
\end{itemize}
\end{definition}
When a user requests an explanation for the sentiment towards an argument $\alpha$, ADA responds by providing another argument $\beta$ that supports $\alpha$, and possibly also contrasts it with a third argument $\gamma$ that attacks $\alpha$. A user can also request a direct quote from one of the reviewers on an argument. A conversation between a user and ADA on a Canon IXUS 185 digital camera\footnote{https://www.amazon.co.uk/Canon-IXUS-185-Digital-Camera/dp/B01N6JP07Y} could take the following form:
\newline\newline
\begin{tabular}{p{1cm}p{\textwidth-2cm}}
When a user requests an explanation for the sentiment towards an argument $\alpha$, ADA responds by providing another argument $\beta$ that supports $\alpha$, and possibly also contrasts it with a third argument $\gamma$ that attacks $\alpha$. A user can also request a direct quote from one of the reviewers on an argument. For example, a conversation between a user and ADA on the \textit{Canon IXUS 185} digital camera\footnote{https://www.amazon.co.uk/Canon-IXUS-185-Digital-Camera/dp/B01N6JP07Y} could take the following form:
\\
\begin{tabular}{@{}p{1cm}p{\textwidth-2cm}}
\textbf{User}:&\textit{Why was the Canon IXUS 185 highly rated?}\\
\textbf{ADA}:&\textit{The product was highly rated because the lens was good, although the battery was poor.}\\
\textbf{User}:&\textit{Why was the lens considered to be good?}\\
\textbf{ADA}:&\textit{The lens was considered to be good because the autofocus was good, although the zoom was poor.}\\
\textbf{User}:&\textit{What did users say about the zoom being poor?}\\
\textbf{ADA}:&\textit{"...example sentence..."}\\
\textbf{ADA}:&\textit{"...the zoom wasn't as good as advertised..."}\\
\end{tabular}
%% TODO: fix QBAF figure numbers to fit this example, make example sentence about zoom we can use it above!
\subsection{ADA for Amazon reviews}
RT and Amazon review examples and comparison \newline
Sentiment Analysis choice: off-the-shelf classifier: too specific for this project's domain: does not distinguish between different polarities
We will conclude this section by discussing the limitations of ADA in the Amazon review domain. To illustrate the differences between Amazon and Rotten Tomatoes reviews, let us take examples of a positive and a negative review from each site:\\
\begin{minipage}[t]{0.5\textwidth-\parindent-0.25cm}
\begin{center}
\textbf{Amazon reviews}
\bigbreak
\textit{Canon IXUS 185 Digital Camera - Black}
\\
rated 4/5 stars
\end{center}
\textit{About the size of a box of 10 cigarettes. Really light fits into any trousers pocket good quality picture better than any of the cheaper models with the same \textbf{specs}. I love \textbf{ot} because its a high enough quality to make it worth while taking it everywhere and \textbf{architecture being my interest} \mytilde I Take pictures of buildings all the time.}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[t]{0.5\textwidth-\parindent-0.25cm}
\begin{center}
\textbf{Rotten Tomatoes reviews}
\bigbreak
\textit{Star Wars: The Rise of Skywalker} (2019)
\\
rated Fresh\footnotemark
\end{center}
\textit{The pacing in the first act leaves something to be desired but at the end of the day, the joy I felt watching this beloved story come to its conclusion could not be diminished. The action, the \textbf{characters}, and the overall world won me over one last time.}
\end{minipage} \\
\vspace{0.2cm}
\footnotetext{Individual reviews on Rotten Tomatoes are rated Fresh (positive) or Rotten (negative).}
\begin{minipage}[t]{0.5\textwidth-\parindent-0.25cm}
\begin{center}
\textit{Philips Sonicare Electric Toothbrush}
\\
rated 2/5 stars
\end{center}
\textit{So, I really enjoy how clean my teeth feel after using this toothbrush. It has a built in 2 minute timer which is also nice and the \textbf{battery} stays charged for several days. My complaint is that when I changed the head of the toothbrush after the 3 month suggested timeframes, there was was mold in the handle of the base. I am considering purchasing something else because using something that contains mold close to my mouth isnt very appealing.}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[t]{0.5\textwidth-\parindent-0.25cm}
\begin{center}
\textit{A Rainy Day in New York} (2019)
\\
rated Rotten
\end{center}
\textit{Woody, too old to flirt with a pretty young thing, has split up the chore amongst multiple older \textbf{characters}, as if that would somehow dilute the ick factor.}
\end{minipage}
\vspace{0.5cm}
The Amazon reviews are for the \textit{Canon IXUS 185} digital camera and the \textit{Philips Sonicare} electric toothbrush, and the Rotten Tomatoes reviews are for the movies \textit{Star Wars: The Rise of Skywalker} (2019) and \textit{A Rainy Day in New York} (2019). We notice some key differences between the Amazon reviews and the Rotten Tomatoes reviews:
\paragraph{Features}
\hangindent=\parindent
\hangafter=0
Because all Rotten Tomatoes reviews are on movies, there are features such as \textit{characters}, found in both of the movie reviews, that are common to all movies. On the other hand, Amazon has reviews for a large variety of different products, and therefore the main features might be very different from one item to another. For example, the feature \textit{cleanliness} is very important to an electric toothbrush but not to a digital camera, and vice versa for \textit{image quality}. This limits the possibility of having predetermined features, and emphasises the importance of mined features. However, some features might be common to wider product categories such as \textit{battery} in electronics. Furthermore, there are features such as \textit{price} and \textit{shipping} that apply to all Amazon products. The possibility for different \textit{tiers} of feature-based representation has not been explored in ADAs.
\paragraph{Writing style}
\hangindent=\parindent
\hangafter=0
As the reviews on Rotten Tomatoes are written by critics, the style of writing tends to be similar across all reviews. On the other hand, Amazon reviews can be written by anyone who has purchased the product, so the style is more individualistic. For example, in the review for the \textit{Canon IXUS 185} digital camera, the user comments on their personal interests. Colloquial language (\textit{'specs'}) and spelling mistakes (\textit{'ot'}) are also common. Due to these differences, using the same off-the-shelf movie review classifier for sentiment analysis on Amazon reviews would not be sensible. \newline
%Sentiment Analysis choice: off-the-shelf classifier: too specific for this project's domain: does not distinguish between different polarities
Based on these two differences, we propose two extensions to ADA in order to accommodate Amazon reviews:
\begin{enumerate}
\item A generic method of feature extraction that will work for any product, possibly taking advantage of product categories with \textit{tiered feature-based representations}.
\item A method for sentiment analysis in Amazon's more heterogeneous review domain.
\end{enumerate}
\section{Natural language processing}
In this section, we will examine state-of-the-art research in natural language processing (NLP), particularly in the fields of feature extraction and sentiment analysis. This will guide our implementation of the extensions proposed in section 2.1.
\subsection{Feature extraction}
Predetermined (metadata) \newline
Mined (semantic network: ConceptNet: Feature categorization) \newline
Mined (deep learning)
Mined (deep learning) \newline
Representation tiers
\subsection{Sentiment analysis}
......
......@@ -22,6 +22,7 @@
\usepackage[bottom]{footmisc} %% footnotes below figures
\usepackage{amssymb} %% pretty empty set
\let\emptyset\varnothing %% pretty empty set
\newcommand{\mytilde}{\raise.17ex\hbox{$\scriptstyle\mathtt{\sim}$}} %% to get ~
\newtheoremstyle{def}%
{}% (space above)
......
......@@ -8,9 +8,12 @@ flash = Node('flash', parent=camera)
audio = Node('audio', parent=camera)
price = Node('price', parent=camera)
shipping = Node('shipping', parent=camera)
lens = Node('lens', parent=camera)
zoom = Node('zoom', parent=lens)
af = Node('af', parent=lens)
reviewables = [camera, image, video, battery, flash, audio, price, shipping]
features = [image, video, battery, flash, audio, price, shipping]
reviewables = [camera, image, video, battery, flash, audio, price, shipping, lens, zoom, af]
features = [image, video, battery, flash, audio, price, shipping, lens, zoom, af]
glossary = {
camera: ['camera', 'device', 'product'],
......@@ -20,5 +23,6 @@ battery: ['battery'],
flash: ['flash'],
audio: ['audio', 'sound'],
price: ['price', 'value', 'cost'],
shipping: ['ship']
shipping: ['ship'],
}