Commit f0285f85 authored by Joel Oksanen

Finished ontology chapter

parent 82eae0c4
@@ -405,18 +405,13 @@ When it comes to Amazon review texts, the issue with many of these methods is th
Some cross-domain feature extraction methods have been proposed \cite{RefWorks:doc:5e374c55e4b0d3ee568e80d6}, but these still perform significantly worse than the single-domain methods discussed above. Furthermore, none of these methods take advantage of the semantic relationships between the product and its features; instead, they look for opinion targets without any existing product information.
\subsubsection{Common-sense knowledge bases}
Semantic information can be obtained from \textit{ConceptNet} \cite{RefWorks:doc:5e382bf3e4b0034ec2324aed}, which is a \textit{common-sense knowledge graph} connecting words and phrases with labelled and weighted edges expressing semantic relations between the words. As our goal is to obtain features of a product, we are most interested in the \textit{HasA} relation. For example, the term \textit{watch}\footnote{http://conceptnet.io/c/en/watch} is related by the \textit{HasA} relation to \textit{a hand that counts minutes}. However, many reviews for watches comment on the watch's \textit{face} or its \textit{band}, neither of which are terms related to watches on ConceptNet. Furthermore, the knowledge on ConceptNet is loosely structured as natural language descriptions, whereas ADA requires more structured information: for a \textit{watch}, we require the feature \textit{hand} instead of \textit{a hand that counts minutes}.
Some methods have been proposed for \textit{automatic common-sense completion}, which aims to structure the information in common-sense knowledge graphs such as ConceptNet. \textit{Common-sense Transformers} (COMeT), proposed by Bosselut et al.\ \cite{RefWorks:doc:5edf7951e4b0846eecf1d89f}, is one such state-of-the-art method, which uses the Transformer architecture described in Section \ref{sec:BERT} to generate structured knowledge based on ConceptNet. However, even COMeT appears incomplete in terms of the \textit{HasA} relation, as it misses many of the same features of the term \textit{watch}\footnote{https://mosaickg.apps.allenai.org/comet\_conceptnet/?l=watch\&r=HasA} as ConceptNet does.
\textit{WordNet} \cite{RefWorks:doc:5ed503e0e4b081759f6d02e2}, a large manually constructed lexical database, is another popular source for common-sense lexical information. Although WordNet includes fewer relations than ConceptNet, it includes the relation of \textit{meronymy}, which denotes a word being a constituent part of another word, similar to the \textit{HasA} and \textit{MadeOf} relations in ConceptNet. For example, the term \textit{electric toothbrush}\footnote{http://wordnetweb.princeton.edu} has a single meronym of \textit{electric motor}. As with ConceptNet, the list of relations on WordNet is incomplete, as it fails to include relevant meronyms such as the timer.
Due to the low recall of ConceptNet and WordNet in obtaining product features, our approach to target extraction must take advantage of the review texts as a source of information about the product. However, we can evaluate our implementation against ConceptNet and WordNet, or possibly take a hybrid approach to feature extraction where features are mined from review texts to complement the existing semantic information on ConceptNet and WordNet.
@@ -431,6 +426,7 @@ More advanced methods using deep learning have been proposed in literature, alth
However, both methods were trained and tested in the same domain, consisting of tweets about celebrities, companies and consumer electronics. The performance would likely drop substantially in a separate domain, as the sentiment polarity of a word can be highly dependent on context: for example, the adjective \textit{hard} has a positive connotation when describing a protective case, but a negative connotation when describing an armchair.
\section{BERT}
\label{sec:BERT}
Both NLP tasks relevant to ADA, feature extraction and feature-dependent sentiment analysis, become more difficult to perform in a general domain, yet domain-independence is crucial for applying ADA to all Amazon products. BERT \cite{RefWorks:doc:5e8dcdc3e4b0dba02bfdfa80}, which stands for \textit{Bidirectional Encoder Representations from Transformers}, is a state-of-the-art language representation model that uses pre-training to learn language representations from large amounts of unlabelled text; the pre-trained model can then be fine-tuned for more specific NLP tasks. Because the unlabelled text used to pre-train BERT comes from a general-domain corpus such as Wikipedia, the knowledge in a pre-trained BERT model is especially well-suited for domain-independent NLP tasks such as ours. This section provides a brief overview of the BERT architecture and the underlying \textit{Transformer network}, shown in Figure \ref{fig:BERTarchitecture}. The original BERT paper presents two model sizes, BERT-base and BERT-large, which differ in the size of various hyperparameters; in this paper, we use the former due to memory constraints.
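To make this concrete, the following is a minimal sketch of loading a pre-trained BERT-base model with a classification head for fine-tuning, using the HuggingFace \textit{transformers} library. The library choice, the \texttt{bert-base-uncased} checkpoint and the binary task head are illustrative assumptions, not details taken from this chapter.
\begin{lstlisting}[language=Python]
# Illustrative only: load pre-trained BERT-base and attach a classification
# head for fine-tuning; the binary head and example sentence are assumptions.
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

inputs = tokenizer("The camera has great battery life.", return_tensors='pt')
outputs = model(**inputs)  # logits over the two classes, trained further downstream
\end{lstlisting}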
@article{RefWorks:doc:5edfe5c1e4b064c22cd56d15,
author={Mary L. McHugh},
year={2012},
title={Interrater reliability: the kappa statistic},
journal={Biochemia Medica},
volume={22},
number={3},
pages={276--282}
}
@inproceedings{RefWorks:doc:5edca760e4b0ef3565a5f38d,
author={Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg S. Corrado and Jeff Dean},
year={2013},
@@ -70,11 +79,11 @@
isbn = {1532-0464},
doi={https://doi.org/10.1016/j.jbi.2020.103384}
}
@article{RefWorks:doc:5edf7951e4b0846eecf1d89f,
author={Antoine Bosselut and Hannah Rashkin and Maarten Sap and Chaitanya Malaviya and Asli Celikyilmaz and Yejin Choi},
year={2019},
title={{COMET}: Commonsense transformers for automatic knowledge graph construction},
journal={arXiv preprint arXiv:1906.05317}
}
@misc{RefWorks:doc:5eb97a3ae4b04ec536ff3ba1,
author = {S. Huang and X. Liu and X. Peng and Z. Niu},
@@ -142,6 +142,7 @@ The first step of our ontology extraction method is to extract the most commonly
The review data is divided into review texts, many of which are multiple sentences long, so we first split the texts into sentences. In this paper, we treat each sentence as an individual unit of information, independent of other sentences in the same review text. We then tokenise the sentences, and use an out-of-the-box implementation of a method by Mikolov et al.\ \cite{RefWorks:doc:5edca760e4b0ef3565a5f38d} to join common co-occurrences of tokens into bigrams and trigrams. This step is crucial in order to detect multi-word nouns such as \textit{operating system}, which is an important feature of \textit{computer}. After this, we use a part-of-speech tagger to select the nouns among the tokens, and count the number of occurrences of each noun. Finally, as in the annotation method detailed in Section \ref{sec:annotation}, we select the 200 most common nouns and pass them on to the feature extraction step.
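As a rough illustration of this pipeline, the sketch below uses gensim's \textit{Phrases} model as the out-of-the-box implementation of the Mikolov et al.\ method, and NLTK for sentence splitting, tokenisation and part-of-speech tagging; these library choices and parameter values are assumptions made for illustration rather than a record of the exact implementation.
\begin{lstlisting}[language=Python]
# Hypothetical sketch of the noun-extraction step.
from collections import Counter
from gensim.models.phrases import Phrases, Phraser
from nltk import sent_tokenize, word_tokenize, pos_tag

def most_common_nouns(review_texts, top_k=200):
    # Split review texts into sentences and tokenise each sentence.
    sentences = [word_tokenize(s.lower())
                 for text in review_texts
                 for s in sent_tokenize(text)]
    # Join frequent co-occurrences into bigrams, then trigrams
    # (e.g. "operating system" -> "operating_system").
    bigram = Phraser(Phrases(sentences))
    trigram = Phraser(Phrases(bigram[sentences]))
    sentences = [trigram[bigram[s]] for s in sentences]
    # Keep only nouns (tags NN, NNS, NNP, NNPS) and count their occurrences.
    noun_counts = Counter(tok for sent in sentences
                          for tok, tag in pos_tag(sent)
                          if tag.startswith('NN'))
    return [noun for noun, _ in noun_counts.most_common(top_k)]
\end{lstlisting}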
\subsection{Feature extraction}
\label{sec:feature_extraction}
For the feature extraction step, we obtain review sentences that mention exactly one of the nouns obtained in the previous step, and pass the sentences through a BERT-based classifier to obtain votes for whether the noun is an argument or not. In the end, we aggregate these votes for each of the nouns to obtain a list of extracted arguments.
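A hedged sketch of the vote aggregation is shown below; the entity classifier itself is the BERT-based model described in this section, so \texttt{classify\_entity} is only a stand-in, and the majority-vote threshold and the minimum number of sentences are assumptions rather than values taken from this chapter.
\begin{lstlisting}[language=Python]
def extract_arguments(nouns, sentences_by_noun, classify_entity, min_votes=10):
    arguments = []
    for noun in nouns:
        # Sentences that mention exactly this one candidate noun.
        sentences = sentences_by_noun[noun]
        if len(sentences) < min_votes:
            continue  # too little evidence to decide either way
        # One True/False vote per sentence from the BERT-based classifier.
        votes = [classify_entity(sentence, noun) for sentence in sentences]
        if sum(votes) / len(votes) > 0.5:  # majority votes "argument"
            arguments.append(noun)
    return arguments
\end{lstlisting}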
@@ -168,21 +169,21 @@ Reviewers can refer to the same argument using many different terms; for example
However, since the terms are interchangeable within the review texts, we can once again utilise the context of the words to group words with similar contexts into synsets. In order to compare the contexts of words, we must obtain context-based representations for them. One such representation is called a \textit{word embedding}, which is a high-dimensional vector in a vector space where similar words are close to each other. We can obtain review-domain word embeddings by training a \textit{Word2Vec} model on the review texts. The Word2Vec model learns the word embeddings by attempting to predict each word in the text corpus from a window of surrounding words.
We use a relatively small window of 4 words, exemplified by the following two review sentences where the window is underlined for the terms \textit{laptop} and \textit{product}:
\begin{center}
\textit{\underline{I would recommend this \textbf{laptop} to my friends, although} the keyboard isn't perfect}
and
\textit{\underline{I would recommend this \textbf{product} to my friends, as} it is the best purchase I've ever made.}
\end{center}
The windows for \textit{laptop} and \textit{product} are identical, which means that their word embeddings will be similar. The small window ensures that the focus is on the interchangeability of the words, rather than on their relatedness on a larger scale. As the above two sentences illustrate, the terms \textit{laptop} and \textit{product} might be used in slightly different contexts on a larger scale, but their meaning, which is expressed in the nearby text, stays the same. Furthermore, the small window size prevents sibling arguments from being grouped together based on their association with their parent argument, as exemplified in these two review texts:
\begin{center}
\textit{I like this lens \underline{because of the convenient \textbf{zoom} functionality which works like} a dream}
and
\textit{I like this lens because \underline{the quality of its \textbf{glass} takes such clear pictures}.}
\end{center}
Although both \textit{zoom} and \textit{glass} are mentioned in association with their parent argument \textit{lens}, their nearby contexts are very different.
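A minimal sketch of training such a model with gensim's Word2Vec implementation follows, assuming \texttt{sentences} holds the tokenised (and phrased) review sentences from the noun-extraction step; the window of 4 matches the text, while the remaining hyperparameters are illustrative assumptions.
\begin{lstlisting}[language=Python]
from gensim.models import Word2Vec

w2v = Word2Vec(
    sentences=sentences,
    vector_size=300,  # dimensionality of the embeddings (assumed value)
    window=4,         # small window: interchangeability over broad relatedness
    min_count=5,      # ignore very rare tokens
    workers=4,
)
vector = w2v.wv['laptop']  # embedding for a candidate argument term
\end{lstlisting}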
@@ -190,11 +191,12 @@ Once we have obtained the word embeddings, we can use the \textit{relative cosin
$$rcs_n(w_i,w_j) = \frac{cosine\_similarity(w_i,w_j)}{\sum_{w_c \in TOP_n}cosine\_similarity(w_i,w_c)},$$
where $TOP_n$ is a set of the $n$ most similar words to $w_i$. In this paper, we use $n=10$. If $rcs_{10}(w_i,w_j) > 0.10$, $w_i$ is more similar to $w_j$ than an arbitrary similar word from $TOP_{10}$, which was shown in \cite{RefWorks:doc:5eaebe76e4b098fe9e0217c2} to be a good indicator of synonymy.
Let arguments $a_1$ and $a_2$ be synonyms if $rcs_{10}(a_1,a_2) + rcs_{10}(a_2,a_1) \geq 0.21$. Then we group the arguments $\mathcal{A}$ into synsets $\mathcal{S}$ where
$$\forall a_1,a_2 \in \mathcal{A}. \ \forall s \in \mathcal{S}. \ rcs_{10}(a_1,a_2) + rcs_{10}(a_2,a_1) \geq 0.21 \wedge a_1 \in s \implies a_2 \in s,$$
and $$\forall a \in \mathcal{A}. \ \exists s \in \mathcal{S}. \ a \in s.$$
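The sketch below shows one way to realise this grouping: compute the relative cosine similarity over the trained Word2Vec model, connect pairs of arguments that pass the synonymy test, and take the connected components of the resulting graph as the (finest) synsets satisfying the definition above. The use of networkx and the helper names are assumptions made for illustration.
\begin{lstlisting}[language=Python]
import networkx as nx

def rcs(w2v, w_i, w_j, n=10):
    # Cosine similarity of w_i and w_j relative to w_i's n most similar words.
    top_n = w2v.wv.most_similar(w_i, topn=n)
    return w2v.wv.similarity(w_i, w_j) / sum(sim for _, sim in top_n)

def group_synsets(w2v, arguments, threshold=0.21):
    graph = nx.Graph()
    graph.add_nodes_from(arguments)
    for i, a1 in enumerate(arguments):
        for a2 in arguments[i + 1:]:
            if rcs(w2v, a1, a2) + rcs(w2v, a2, a1) >= threshold:
                graph.add_edge(a1, a2)  # a1 and a2 are synonyms
    # Each connected component of the synonym graph is one synset.
    return [set(component) for component in nx.connected_components(graph)]
\end{lstlisting}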
\subsection{Ontology extraction}
\label{sec:ontology_extraction}
The synsets obtained in the previous step will form the nodes of the ontology tree. In this step, we extract the sub-feature relations that allow us to construct the shape of the tree. In order to do this, we obtain review sentences that mention a word from exactly two synsets, and pass the sentences through a BERT-based classifier to obtain votes for whether the arguments are related, and if they are, which of the arguments is a feature of the other. In the end, we aggregate these votes within each of the synsets to obtain a relatedness measure between each pair of synsets, which we use to construct the ontology.
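As a rough sketch of the vote collection (only one plausible reading, since the exact vote and count definitions appear outside the lines shown here), each co-mention sentence yields one three-way vote from the relation classifier; \texttt{classify\_relation} and its label set are stand-ins for the BERT-based relation classifier.
\begin{lstlisting}[language=Python]
from collections import Counter
from itertools import combinations

def collect_relation_votes(synsets, sentences_by_pair, classify_relation):
    votes = {}
    for i, j in combinations(range(len(synsets)), 2):
        # Sentences mentioning a word from synset i and a word from synset j.
        sentences = sentences_by_pair.get((i, j), [])
        # Assumed label set: 'first_feature_of_second',
        # 'second_feature_of_first', or 'unrelated'.
        votes[(i, j)] = Counter(classify_relation(s, synsets[i], synsets[j])
                                for s in sentences)
    return votes  # per-pair vote counts, later turned into relatedness scores
\end{lstlisting}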
@@ -229,19 +231,19 @@ Using this formula, we define the \textit{relation matrix}
$$R = V \mathbin{/} \textbf{c},$$
where $\textbf{c}$ is a vector containing the counts $c_i$ for each $s_i \in S$.
We know that the product itself forms the root of the ontology tree, so we do not have to consider the product synset being a sub-feature of another synset. For each of the remaining synsets $s_i$, we calculate its super-feature $\hat{s}_i$ using row $r_i$ of the relation matrix, which contains the relatedness scores from $s_i$ to the other synsets. For example, the row corresponding to the synset of \textit{numbers} for the product \textit{watch} could be as follows:
\begin{center}
{\renewcommand{\arraystretch}{1.2}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
watch & band & dial & battery & numbers & quality \\
\hline
0.120 & 0.021 & 0.144 & 0.041 & - & 0.037 \\
\hline
\end{tabular}
}
\end{center}
Clearly, \textit{numbers} appears to be a feature of \textit{dial}, as the relatedness score for \textit{dial} is higher than for any other feature. The relatedness score for the product \textit{watch} is also high, as is expected for any feature, since any descendant of a product in the ontology is considered its sub-feature, as defined in Section \ref{sec:annotation}. Based on experimentation, we define $\hat{s}_i=s_j$ where $j = \operatorname{argmax}(r_i)$, although other heuristics could work here as well.
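A small numpy illustration of this assignment is given below: for each non-root synset, we pick the column with the highest relatedness score in its row of $R$ (the product synset is excluded only as a child, never as a candidate parent). The function and variable names are illustrative.
\begin{lstlisting}[language=Python]
import numpy as np

def super_features(R, root_index):
    parents = {}
    for i, row in enumerate(R):
        if i == root_index:
            continue  # the product itself has no super-feature
        scores = row.copy()
        scores[i] = -np.inf  # a synset cannot be its own super-feature
        parents[i] = int(np.argmax(scores))
    return parents  # maps each synset index to its super-feature's index
\end{lstlisting}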
Using the super-feature relations, we build the ontology tree from the root down with the function shown in pseudocode in Figure \ref{fig:gettree}.
@@ -279,15 +281,152 @@ def get_tree(R, synsets):
\section{Evaluation}
In this section, we evaluate our ontology extraction method using human annotators, both independently and against ontologies extracted using WordNet and COMeT. Furthermore, we independently evaluate the generalisation of the masked BERT method by experimenting with the number of product categories used for its training.
\subsection{Ontology evaluation}
We evaluate five ontologies extracted for a variety of randomly selected products which were not included in the training data for the classifier: \textit{watches}, \textit{televisions}, \textit{necklaces}, \textit{stand mixers}, and \textit{video games}. For each product, we use 200,000 review texts as input to the ontology extractor, except for \textit{stand mixer}, for which we could only obtain 28,768 review texts due to it being a more niche category.
We also extract ontologies for the five products from WordNet\footnote{http://wordnetweb.princeton.edu/perl/webwn} and COMeT\footnote{https://mosaickg.apps.allenai.org/comet\_conceptnet} for comparison. For WordNet, we build the ontology top-down starting from the product term. A term $t_f$ is considered a feature of $t_p$ if $t_f$ is a meronym of either $t_p$ or one of its direct \textit{hyponyms} (specialisations of $t_p$, for example \textit{camera} and \textit{digital camera}). Using the web interface for COMeT, we are only able to obtain the five most related terms for the \textit{HasA} relation, which means that the ontologies extracted with COMeT are not complete. However, we still include them in the precision comparison. The number of relations in each extracted ontology is shown in Table \ref{tab:ontology_counts}, while the full ontologies are included in Appendix \ref{sec:ontology_appendix}.
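For reference, the WordNet baseline can be read as the following sketch using NLTK's WordNet interface: a term counts as a feature if it is a part-meronym of the product synset or of one of its direct hyponyms. This is an illustrative reading of the procedure, not the exact script used for the comparison.
\begin{lstlisting}[language=Python]
from nltk.corpus import wordnet as wn

def wordnet_features(product_lemma):
    features = set()
    for synset in wn.synsets(product_lemma, pos=wn.NOUN):
        # The product synset together with its direct specialisations.
        for s in [synset] + synset.hyponyms():
            for meronym in s.part_meronyms():
                features.update(lemma.name() for lemma in meronym.lemmas())
    return features

# e.g. wordnet_features('watch') might include terms such as 'hand' and 'face'.
\end{lstlisting}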
\begin{table}[H]
\centering
\begin{tabular}{|c||c|c|c|c|c|c|}
\hline
& watch & television & necklace & stand mixer & video game & total \\
\hline \hline
Our method & 26 & 22 & 20 & 17 & 7 & 92 \\
\hline
WordNet & 6 & 7 & 1 & 6 & 0 & 20 \\
\hline
COMeT & 5 & 5 & 5 & 5 & 5 & 25 \\
\hline
\end{tabular}
\caption{Number of extracted relations for the three ontology extraction methods}
\label{tab:ontology_counts}
\end{table}
Since it is difficult to define a `complete' ontology for a product, we concentrate our quantitative evaluation on the precision of the extracted ontologies. We measure the precision of an ontology by the aggregated precision of its individual relations, which we obtain by human annotation.
We present each of the 137 \textit{has feature} relations in the ontologies to 3 human annotators, and ask them to annotate each relation as either true or false in the context of Amazon products. The context is important, as features such as \textit{price} might not otherwise be considered a feature of a product. Using the majority vote among the annotators for each of the relations, we calculate the precision for each of the three methods and five products, and present the results in Table \ref{tab:ontology_precision} along with the total precision calculated over all 137 relations.
\begin{table}[H]
\centering
\begin{tabular}{|c||c|c|c|c|c|c|}
\hline
& watch & television & necklace & stand mixer & video game & total \\
\hline \hline
Our method & 0.885 & 0.864 & 0.700 & 0.882 & 1.000 & 0.848 \\
\hline
WordNet & 1.000 & 1.000 & 1.000 & 0.833 & - & 0.950 \\
\hline
COMeT & 0.600 & 0.400 & 0.600 & 0.400 & 0.200 & 0.440 \\
\hline
\end{tabular}
\caption{Precision scores for the three ontology extraction methods}
\label{tab:ontology_precision}
\end{table}
Our method achieves a total precision of 0.848, which is comparable to the in-domain validation accuracies of the entity and relation extractors (0.897 and 0.834, respectively). We note that the precision for the stand mixer ontology is on par with that of the other ontologies despite being extracted from considerably less data, which suggests that our method is effective even for products with relatively little review data.
WordNet obtains the highest total precision score of 0.95, which is expected since its knowledge has been manually annotated by human annotators. However, WordNet extracted on average only 4 relations per ontology, while our method extracted on average 18.4 relations. Part of this could be due to its outdatedness, as its last release was nine years ago in June 2011\footnote{https://wordnet.princeton.edu/news-0}, although many of the products included in the comparison are quite timeless (\textit{necklace}, \textit{watch}). Furthermore, we observe that many of the terms extracted from WordNet, although correct, are scientific rather than common-sense (\textit{electron gun}, \textit{field magnet}), and therefore unsuitable for use in the Amazon review context.
The precision of our method is almost twice as good as the precision of the top five terms extracted by COMeT. Most of the erroneous relations for COMeT are either remnants of the unstructured information on ConceptNet (\textit{game–effect of make you laugh}), or incorrectly categorised relations (\textit{watch–hand and wrist}).
In order to assess the reliability of agreement between the annotators, we calculate the \textit{Fleiss' kappa} measure $\kappa$, which quantifies the degree of agreement over the degree expected by chance. The value of $\kappa$ ranges from $-1$ to $1$, with $1$ signalling total agreement and $-1$ total disagreement. The kappa measure is generally considered a more reliable measure of inter-rater reliability than simple percent agreement, as it takes into account the probability of agreement by chance. We obtain $\kappa = 0.417$, which in a well-known study of the coefficient \cite{RefWorks:doc:5edfe5c1e4b064c22cd56d15} was interpreted to signify a weak level of agreement. This suggests that accurately determining \textit{feature of}-relations is difficult even for humans, which makes the high precision score obtained by our method all the more notable.
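For completeness, the agreement statistic can be computed with statsmodels' implementation of Fleiss' kappa, as in the hedged sketch below; the annotation matrix shown is illustrative, not the actual annotation data.
\begin{lstlisting}[language=Python]
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per relation, one column per annotator, 0/1 labels (illustrative).
annotations = np.array([
    [1, 1, 0],  # two annotators say "true", one says "false"
    [1, 1, 1],  # unanimous "true"
    [0, 1, 0],
])
table, _ = aggregate_raters(annotations)  # category counts per relation
kappa = fleiss_kappa(table, method='fleiss')
\end{lstlisting}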
\subsection{Generalisation evaluation}
In this section, we evaluate the ability of our masked BERT method to generalise across the whole domain of Amazon products. In order to do this, we train the entity and relation classifiers with five different datasets $t_1 \dots t_5$ including review instances for one to five products, as shown in Table \ref{tab:dataset_products}. We evaluate the models using an unseen dataset $w_e$, which we have labelled for a sixth domain (watches). In addition, we train entity and relation classifiers on a separate in-domain dataset $w_t$, which can be evaluated with $w_e$ to obtain an in-domain score. Each of the datasets contains 50,000 instances, and all models were trained with the hyperparameter values used in Sections \ref{sec:feature_extraction} and \ref{sec:ontology_extraction}.
\begin{table}[H]
\centering
\begin{tabular}{|c||c|}
\hline
Dataset & Products included \\
\hline \hline
$t_1$ & cameras \\
\hline
$t_2$ & cameras, backpacks \\
\hline
$t_3$ & cameras, backpacks, laptops \\
\hline
$t_4$ & cameras, backpacks, laptops, acoustic guitars \\
\hline
$t_5$ & cameras, backpacks, laptops, acoustic guitars, cardigans \\
\hline
$w_e$ & watches \\
\hline
$w_t$ & watches \\
\hline
\end{tabular}
\caption{Products included in each of the datasets}
\label{tab:dataset_products}
\end{table}
The accuracies for each of the classifiers trained on the five datasets $t_1 \dots t_5$ are plotted in Figure \ref{fig:n_accuracies}. The in-domain accuracies obtained by the classifiers trained on the dataset $w_t$ are plotted as dashed lines. The accuracies for both entity and relation extraction increase significantly when the classifiers are trained with reviews for two products instead of just one, after which the accuracies remain roughly constant at around 0.05 units below the in-domain accuracies. The initial increase of accuracy with the number of training products is expected, since a product-specific dataset will encourage the classifier to learn product-specific features. However, it is surprising that training the classifier with just two products (\textit{camera} and \textit{backpack}) is enough to raise its accuracy to its domain-independent optimum.
It appears that the domain-specific classifier has an advantage of around 0.05 units over the domain-independent classifier. This can be attributed to various domain-specific features the classifier can learn to take advantage of, such as domain-specific adjectives like \textit{swiss} or \textit{waterproof} for \textit{watch}.\footnote{It is interesting to note that the domain-independent optimum lies approximately halfway between the initial accuracy and the domain-specific accuracy. When the classifier is trained on several products, it 'forgets' its domain-specific knowledge, which results in worse accuracy in its own domain but better accuracy in the unseen domain, as its knowledge becomes more general. It makes intuitive sense that the point of domain-independence lies in between the two domain-specific opposites.}
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{axis}[
xlabel={Training dataset $t_n$},
ylabel={Evaluation accuracy on $w_e$},
xmin=1, xmax=5,
ymin=0.7, ymax=1.0,
xtick={1,2,3,4,5},
ytick={0.7,0.75,0.8,0.85,0.9,0.95,1.0},
legend pos=north west,
ymajorgrids=true,
grid style=dashed,
]
\addplot[
color=blue!40!gray,
mark=triangle*,
]
coordinates {
(1,0.8051)(2,0.8426)(3,0.8333)(4,0.8389)(5,0.8435)
};
\addplot[
color=orange,
mark=diamond*,
]
coordinates {
(1,0.7159)(2,0.7381)(3,0.7375)(4,0.7428)(5,0.7352)
};
\addplot [
line width=0.2mm,
densely dashed,
domain=1:5,
samples=100,
color=blue!40!gray,
]
{0.9067};
\addplot [
line width=0.2mm,
densely dashed,
domain=1:5,
samples=100,
color=orange,
]
{0.8046};
\legend{Entity extraction, Relation extraction}
\end{axis}
\end{tikzpicture}
\caption{Accuracies for masked BERT models trained with different numbers of products}
\label{fig:n_accuracies}
\end{figure}
@@ -19,6 +19,8 @@
\usepackage{multirow}
\usepackage{pgfplots}
\usepackage{listings}
\lstset{basicstyle=\ttfamily\footnotesize,breaklines=true}
\renewcommand{\figurename}{Listing}