Commit 05a8299a authored by Joel Oksanen

Interim report last minute changes.

parent d8ea8871
@@ -4,11 +4,11 @@ Our extensions to ADA involve both quantitative and qualitative aspects, such as
\section{Quantitative assessment}
We will evaluate our sentiment analysis implementation both individually and as part of the entire pipeline of ADA.
The two sentiment analysis implementations can be evaluated on their own using hand-labelled datasets, such as the freely available one from \cite{RefWorks:doc:5e2e107ce4b0bc4691206e2e} for target-dependent Twitter sentiment classification. These results can be compared to each other, and with baseline results in existing target-dependent sentiment analysis papers, discussed in section \ref{sec:sa}. Alternatively, we could label our own dataset of Amazon reviews, with which we could also test the feature extraction. In that case, we face the challenge of defining what constitutes a feature of a product.
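As a minimal sketch of what this individual evaluation could look like, the snippet below computes standard classification metrics with scikit-learn; the dataset and classifier shown are hypothetical placeholders rather than our actual implementations.

\begin{verbatim}
# Minimal evaluation sketch: labelled_data and classify() are
# hypothetical placeholders for a hand-labelled (text, target, label)
# dataset and one of our two sentiment analysis implementations.
from sklearn.metrics import accuracy_score, f1_score

labelled_data = [
    ("I love my new phone but hate its battery", "battery", "negative"),
    ("I love my new phone but hate its battery", "phone", "positive"),
]

def classify(text, target):
    return "positive"  # stand-in for SVM-dep or AdaRNN predictions

gold = [label for _, _, label in labelled_data]
pred = [classify(text, target) for text, target, _ in labelled_data]
print("accuracy:", accuracy_score(gold, pred))
print("macro-F1:", f1_score(gold, pred, average="macro"))
\end{verbatim}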
We will also evaluate our sentiment analysis and feature extraction implementations as part of ADA. This is possible by comparing the dialectical strength measure for a product to the product's aggregated user rating by calculating their Pearson correlation coefficient (PCC). The intuition is that a closer correlation between these two figures implies greater accuracy in the agent's semantic understanding. PCC scores for particular Amazon product domains can be compared with the PCC score for a wide domain of products, in order to determine the generality of our method. We can evaluate the performance of the ADA extensions by comparing our PCC score to the scores achieved in \cite{RefWorks:doc:5e08939de4b0912a82c3d46c} for Rotten Tomatoes: a somewhat similar score would constitute success, as the Amazon domain is more challenging. Furthermore, we could evaluate our ADA on the same Rotten Tomatoes review dataset, in order to further test how well it generalises to different settings.
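A minimal sketch of this comparison is given below; the two helper functions are hypothetical placeholders for the pipeline components that produce a dialectical strength and an aggregated star rating for each product.

\begin{verbatim}
# Sketch: correlate ADA's dialectical strength with aggregated star
# ratings over a set of products. Both helpers are placeholders for
# the actual pipeline components.
from scipy.stats import pearsonr

def dialectical_strength(product):
    return product["strength"]        # placeholder

def average_star_rating(product):
    return product["average_rating"]  # placeholder

def evaluate_pcc(products):
    strengths = [dialectical_strength(p) for p in products]
    ratings = [average_star_rating(p) for p in products]
    pcc, _p_value = pearsonr(strengths, ratings)
    return pcc
\end{verbatim}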
The results from the individual evaluation of the feature-dependent sentiment analysis provide information on how well the ADA can distinguish between the different aspects of a product. This contrasts with the overall evaluation of the ADA, which only tells us whether it understands the general sentiment towards a product. Quantitative evaluation of the ADA's feature-level understanding was not performed in \cite{RefWorks:doc:5e08939de4b0912a82c3d46c}, so it will be interesting to evaluate.
@@ -4,17 +4,24 @@ In this chapter, we will first discuss the motivations behind the project and th
\section{Motivations}
People spend an ever-growing share of their earnings online, from purchasing daily necessities on e-commerce sites such as Amazon\footnote{https://www.amazon.com/} to streaming movies on services such as Netflix\footnote{https://www.netflix.com/}. As the market shifts online, people's purchase decisions are increasingly based on product reviews, either accompanying the products on their e-commerce sites or on specialised review websites, such as Rotten Tomatoes\footnote{https://www.rottentomatoes.com/} for movies. These reviews can be written by fellow consumers who have purchased the product or by professional critics, as in the latter example, but what unites most online review platforms is the massive number of individual reviews: a particular type of electric toothbrush can have more than 10,000 reviews on Amazon\footnote{https://www.amazon.com/Philips-Sonicare-Electric-Rechargeable-Toothbrush/dp/B00QZ67ODE/}. As people cannot possibly go through all of the individual reviews, purchase decisions are often based on various kinds of review aggregations. The presentation of a review aggregation must be concise and intuitive in order to be effective, but a good review aggregation will also retain some nuances of the original reviews, so that consumers can understand \textit{why} a product is considered good or bad, and whether the reviewers' arguments align with their individual preferences.
Perhaps the most well-known review aggregation method is a product's average star rating out of five stars. Although this metric is simple to both implement and understand, it completely ignores the information in the accompanying review texts. To illustrate, consider this three-star Amazon review for the aforementioned toothbrush:
\begin{center}
\textit{The product is great but the packaging literally ruins it to the point that I can never buy it again. The packaging was so ridiculous and convoluted that it took me 35 minutes to get the toothbrush out and use it.}
\end{center}
\noindent
Only the three-star rating of the above review would count towards the review aggregation, even though the user clearly liked the product itself but disliked its packaging. If a potential buyer were able to discern this, they could decide for themselves whether good packaging of the product is important to them. Amazon provides users with a way to give additional star ratings on a limited number of the product's features (in this case, packaging is not one of them), but users might not be willing to take the time to repeat what they have already written down in textual form.
Clear explanations of review aggregations can also be used to improve e-commerce site recommender systems, as it has been shown that explanations can help to improve the overall acceptance of a recommender system \cite{RefWorks:doc:5e2f3970e4b0241a7d69e2a4}, and recommendations are often largely based on review aggregations such as average user ratings.
\section{Objectives}
There have already been some attempts to improve explanations for review aggregations, some of which are discussed in Chapter 2. One such attempt is the Argumentative Dialogical Agent (ADA), proposed by Cocarascu et al.\ \cite{RefWorks:doc:5e08939de4b0912a82c3d46c} and implemented for the Rotten Tomatoes and Trip Advisor\footnote{https://www.tripadvisor.com/} platforms \cite{RefWorks:doc:5e0de20ee4b055d63d355913}. The goal of this project is to extend the work of Cocarascu et al.\ in order to design and implement a more generalised ADA that provides explanations for Amazon product reviews. The main objectives for the extended agent are as follows:
\begin{itemize}
\item \textbf{Generalise} the agent to work with a larger variety of different products. Currently, ADA has only been implemented for movie and hotel reviews, two highly homogeneous domains in which there is little variance in key features and review language from one product to another. Implementing ADA for Amazon reviews will require more general NLP methods for extracting review aggregations.
\item \textbf{Enhance dialogue} between the user and the agent to support conversational search. Currently, the agent is able to respond to a limited number of questions centred solely around explanations for a single product review aggregation. Given the large amount of current research into explainable recommender systems and the potential of ADA in this domain, we will extend its dialogue to support conversational search.
\item \textbf{Learn} from user feedback. The enhanced agent should be able to query for and incorporate information and opinions provided by the user to improve its review aggregations and product recommendations.
\item \textbf{Explore user interfaces} for the agent. While a limited argumentative dialogue has been proposed for ADA's review aggregation explanations, interfaces through which a user can partake in this dialogue have not been considered. We will implement two such interfaces, one based on text and one based on speech.
\end{itemize}
In addition to the above, we will implement a conversational user interface for the ADA on the Alexa\footnote{https://www.amazon.com/b?\&node=13727921011/} virtual assistant. The user interface will provide the user with a novel way to obtain explainable product recommendations using voice commands on Alexa-compatible smart speakers.
At the end of this project, we will have a working implementation of an ADA for Amazon reviews, which can be used to obtain dialogical explanations for review aggregations.
@@ -6,25 +6,25 @@ In this chapter, we will detail our progress so far as well as our plan for the
\item a method for general feature extraction using metadata, NLP methods, and ConceptNet (a ConceptNet query sketch is given after this list);
\item two methods for feature-dependent sentiment analysis, based on \textit{SVM-dep} by Jiang et al. and \textit{AdaRNN} by Dong et al.;
\item a Botplication textual interface;
\item a voice interface;
\item a speech interface;
\end{itemize}
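As an early illustration of the ConceptNet component of the planned feature extraction, the sketch below queries the public ConceptNet web API for \textit{HasA} relations of a product term; the choice of relation and the lack of any filtering are assumptions at this stage rather than the final design.

\begin{verbatim}
# Sketch: query ConceptNet's public web API for candidate product
# features. The HasA relation and the absence of any filtering are
# assumptions, not the final design.
import requests

def candidate_features(term, rel="HasA", limit=20):
    response = requests.get(
        "http://api.conceptnet.io/query",
        params={"start": "/c/en/" + term, "rel": "/r/" + rel,
                "limit": limit},
    )
    return [edge["end"]["label"] for edge in response.json()["edges"]]

# e.g. candidate_features("camera") should return lens-like terms
\end{verbatim}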
\section{Progress so far}
So far, I have located Amazon review data\footnote{https://s3.amazonaws.com/amazon-reviews-pds/readme.html} and used it to build a basic version of ADA for camera reviews. This was done to gain a further understanding of the ADA pipeline, and to establish a baseline against which to evaluate the extensions. The basic version uses either a general out-of-the-box sentiment classifier\footnote{https://www.nltk.org/\_modules/nltk/sentiment/vader.html} or a Naive Bayes classifier trained on the aforementioned review data. It does not make use of any advanced feature extraction methods, such as ConceptNet. The best Pearson correlation coefficient of 0.621 was achieved using the Naive Bayes classifier, suggesting that machine learning methods trained on Amazon review data might work best for SA.
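For reference, the snippet below is a minimal sketch of how the two baseline classifiers can be set up, not the exact implementation; the training data shown is a placeholder for the labelled Amazon review data.

\begin{verbatim}
# Sketch of the two baseline sentiment classifiers. train_texts and
# train_labels are placeholders for the labelled Amazon review data.
# VADER additionally requires nltk.download('vader_lexicon').
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["Great camera, sharp pictures", "Battery died in a week"]
train_labels = ["+", "-"]

# 1) Out-of-the-box VADER: positive if the compound score is above zero.
vader = SentimentIntensityAnalyzer()
def vader_sentiment(text):
    return "+" if vader.polarity_scores(text)["compound"] > 0 else "-"

# 2) Naive Bayes over bag-of-words counts of review texts.
nb = make_pipeline(CountVectorizer(), MultinomialNB())
nb.fit(train_texts, train_labels)
def nb_sentiment(text):
    return nb.predict([text])[0]
\end{verbatim}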
\section{Project timetable}
For the rest of the project, the timetable will be as follows:
\begin{center}
\begin{tabular}{ |l|p{7cm}|p{5.3cm}| }
\hline
Month & Plan & Challenges \\
\hline \hline
February & Design and implement a Botplication UI & Lectures and coursework \\
\hline
March & Design feature extraction & Exams and art school applications\\
\hline
April & Implement feature extraction and SVM-dep SA & Working from home\\
\hline
@@ -35,7 +35,7 @@ For the rest of the project, the timetable will be as follows:
\end{tabular}
\end{center}
I will be quite busy with lectures, coursework, applications, and exams until the end of the spring term. However, from then on I will be able to give my full attention to the project, although I will be working from back home in April. At the end of April, I should have a working implementation of an ADA for Amazon reviews, with a text-based UI and general sentiment analysis using SVM-dep. In May, I will focus on alternative implementations for the UI and sentiment analysis, leaving time in June for evaluation and finishing up the report.
\section{Possible extensions}
@@ -43,7 +43,7 @@ If time permits, the following extensions could be added:
\begin{itemize}
\item Currently, ADA responds to the user with predetermined template responses. Particularly with the speech interface, more diverse and personalised responses as in \cite{RefWorks:doc:5e2b0ea0e4b01fdb376c81ac} could make for more realistic conversations and improve user satisfaction.
\item ADA could learn about the products directly from user feedback, as proposed in \cite{RefWorks:doc:5e31af55e4b017f1b5fb8684}. For example, if a user says \textit{this sweater looks awful}, ADA could assign a negative vote to the feature \textit{look} of the sweater (a rough sketch of this is given after this list).
\item ADA could also be integrated within a recommender system, to evaluate its performance in a highly useful domain.
\end{itemize}
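As a rough illustration of the feedback extension, the sketch below maps a free-text comment to a feature vote using a simple keyword lookup; the word lists are hypothetical placeholders rather than the vocabulary ADA would actually use.

\begin{verbatim}
# Rough sketch: map a user comment to a (feature, vote) pair with a
# keyword lookup. The word lists are hypothetical placeholders.
FEATURE_WORDS = {"looks": "look", "look": "look", "fits": "fit"}
NEGATIVE_WORDS = {"awful", "terrible", "bad"}
POSITIVE_WORDS = {"great", "good", "lovely"}

def feedback_vote(comment):
    tokens = comment.lower().split()
    feature = next((FEATURE_WORDS[t] for t in tokens if t in FEATURE_WORDS),
                   None)
    if feature is None:
        return None
    if any(t in NEGATIVE_WORDS for t in tokens):
        return (feature, -1)
    if any(t in POSITIVE_WORDS for t in tokens):
        return (feature, +1)
    return None

# feedback_vote("this sweater looks awful") -> ("look", -1)
\end{verbatim}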