Automatic Business Attribute Labeling from Yelp …...Automatic Business Attribute Labeling from...
Transcript of Automatic Business Attribute Labeling from Yelp …...Automatic Business Attribute Labeling from...
Automatic Business Attribute Labeling from Yelp Reviews
Ephraim D. Rosenfeld
Courant Institute of Mathematical Sciences - New York University
251 Mercer St. New York, NY 10012
Michael Iannelli City University of New York – The Graduate Center
365 5th Avenue New York, NY 10016
ABSTRACT
In this paper, we present a predictive model capable of assigning
attributes to businesses based on their consumer reviews and
evaluations. We utilized the Yelp Dataset Challenge [1] data
sources containing approximately 140K businesses, 4 million
reviews, 1 million tips, and 1 million business-related attribute tags.
We transformed the review data into TF-IDF term-vectors as well
as Word2Vec representations, and input these into binary
classifiers to predict a set of business attributes. Our models
achieved a range of 53-85% accuracy. In this work, we present the
results of our experiments, and we provide some analysis of the
most informative features used by our models.
Keywords
Data Mining, Machine Learning, Information Retrieval
1. INTRODUCTION While researching a business or establishment to patronize,
consumers would benefit from having a set of keywords or profile
attributes that best describe or categorize the business or venue.
This information would help customers make informed decisions
and minimize their search efforts.
Assigning attribute filters to a business is typically a carefully-
curated and laborious process in which reviewers must manually
identify and ascribe these attributes to the target record. In some
cases, such as Yelp, a collective intelligence approach is desirable
as users can review a business and comment upon its qualities and
attributes. This approach has the advantage of mitigating incorrect
or malicious information, while exploiting the experiences and
judgements of a large body of informed consumers.
We present a predictive model capable of assigning attribute labels
to business records. To do so, we describe the problem as a set of
binary classification sub-problems and identify whether a business
bears or lacks a given attribute. We utilize the Yelp Dataset
Challenge resource files which consist of approximately 140K
businesses associated with ~4 million reviews, ~1 million tips, and
~1 million attribute tags. The business attribute lists were parsed
and the data was filtered to create balanced datasets of positive and
negative examples for each attribute. Feature extraction and
classification pipelines were set up using both TFI-DF and
Word2Vec methods.
We assess the goodness of our predictive models in the form of
accuracy scores. Additionally, we provide an analysis of top-
performing features utilized by these models.
2. EXPERIMENTAL SETUP
2.1 Data Preparation
2.1.1 Tools and Libraries We implemented our experiments in Python3. NLTK [1] was used
for text processing such as tokenization and lemmatization. We
utilized the Doc2Vec library in Gensim [2] for our Word2Vec
experiments. Numpy and Scipy were used for numerical
calculations as well as Scikit-Learn [3] for its built-in classifiers
and model evaluation capabilities. We used the gradient boosted
machine model in XGBoost [4]. Additionally, we used Pandas and
the Yelp Dataset Challenge dataset.
2.1.2 TF-IDF Experiments The Yelp dataset consists of categorical profiles describing each
business, as well as user-generated submissions for those
businesses.
Yelp allows its reviewers (referred to as “Yelpers”) to post two
kinds of textual submissions: a) “tips”, which are usually short,
informational posts that Yelpers provide to each other and to
potential customers, such as:
“Cash-only! Their Lyonnaise potatoes are very well seasoned,”
And b) “reviews”, which are lengthier and usually contain a
narrative of a customer’s experience. A citation from a common
Yelper reviewer is as follows:
"Like any other Zoe's, this location has great sandwiches, salads,
kabobs, and more … I like the location -- Birkdale is super
convenient -- but parking can sometimes be a challenge because of
the popularity of this shopping center. Things are tight inside the
restaurant, too, but they're very kid-friendly and don't mind if you
have a stroller with you. Friendly staff, great food, great location.”
The two examples above demonstrate that many objective
attributes used to categorize a business or venue can be inferred
from reviewer information, such as: whether its atmosphere is
child-friendly, if parking is available, and if credit cards are
accepted. Our goal is to build a predictive model that utilizes this
reviewer information to fill or predict many of the business profile
slots provided by the Yelp dataset. The profile attributes provided
by Yelp therefore act as class labels, while the reviewer information
is aggregated and tokenized into word vectors that can be used to
predict a given attribute. For simplicity, we chose attributes bearing
only two possible values (“true” or “false”), and we evaluated each
attribute separately, thereby reducing this effort to a series of binary
classifications.
The Yelp reviews and tips were aggregated and joined to the
business profile dataset to create one long review which was
labelled by the associated business attribute. This aggregated
review approach has been suggested by Cho and Yoon [5]. We
chose to use only those businesses which had a substantial number
of reviews and tips, while excluding those businesses whose
review-count fell below a given threshold. This threshold was
initially chosen at five reviews, but, as described later, we ran more
iterations using different thresholds to see if a smaller number of
well-reviewed businesses would translate into higher accuracy
scores.
These aggregated review entries were fed into Scikit-Learn’s
CountVectorizer and TfidfTransformer APIs to
generate word vectors with TF-IDF scores for each term found in a
business’s reviews [6]. In addition to using TF-IDF scoring, the
reviews were cleaned and normalized by: a) removing English
stop-words, and b) setting a minimum document frequency so that
uncommon terms found in only a small fraction of documents were
ignored. We believe that eliminating highly-uncommon terms was
necessary because user-generated data, such a Yelp reviews, are
replete with anomalous and generally-meaningless tokens, such as
misspellings, numbers, and other textual “noise”. That said, a small
term-frequency threshold (terms found in 0.05% documents or
more) was chosen, which would still allow for a large variety of
terms.
2.1.3 Word2Vec Experiments Word2Vec is a technique that utilizes a shallow neural network to
generate low-dimensional word embeddings based on the context
of the words [7]. The resulting word vectors represent the semantic
content of each word. Paragraph Vector, an extension of
Word2Vec, generalizes the model to documents of arbitrary length
[8].
We implemented a feature extraction method using the paragraph
vector implementation in Gensim’s Doc2Vec library. The method
was used to generate dense feature vectors representing the
semantic context of each business’ reviews. A Doc2Vec model
with a windows size of 8 was trained on a corpus consisting of all
reviews and tips in the dataset producing 200-dimentsional word-
vectors.
Like our TF-IDF feature extraction methodology, the reviews and
tips corresponding to each business were aggregated into one single
long review per business. Each long review was then input into the
Doc2Vec model to infer a 200-dimensional feature vector for later
use in our classifiers.
2.1.4 Data Selection Because the distribution of business attributes is highly skewed (see
Figure 1), we generated balanced datasets consisting of an equal
number of positive and negative examples. A preliminary labelled
dataset of 300 businesses was used to build out a predictive
pipeline. Once the pipeline was complete, a larger balanced dataset
was used. The number of underrepresented values for each attribute
dictated the sizes of the larger balanced data-sets. Therefore, the
size of larger balanced datasets varied by attribute.
2.2 Classification Two classifiers were used in TF-IDF modeling:
Naïve Bayes: a supervised learning algorithm that assumes
independence between each word. We specifically chose the
Multinomial implementation of Naïve Bayes, because it works on
sparse vectors; and, unlike Bernoulli Naïve Bayes, term frequency
is discreet rather than binary. This is especially true for our
aggregated reviews where a single record consists of a
concatenation of multiple reviews, where we want to
Figure 1. A distribution of business attributes. The number of businesses
that accept Bitcoin (151) cannot be seen due to the skewness of the data.
recognize the occurrence of terms as they are found at
the individual-review level.
• Linear SVC (Support Vector Classification): this
implementation of SVM (support vector machine)
learning is generally fast to execute and works well with
sparse, high-dimensional vectors, such as term vectors
For the Word2Vec experiments, we used SVC as well as the
following additional classifiers:
• Logistic Regression (LR): A linear model that outputs the
probability of a categorically-dependent variable. In our
case of binary-classification, if the probability value is
greater than 50%, the example is labelled as true.
• Gradient Boosted Machine (GBM): A boosting algorithm
that uses decision trees as its weak learners
2.3 Analysis Methods We ran a few iterations of the datasets, first using the entire TF-IDF
word vector as the feature list, and then selecting only the top
performing terms/features using a chi-squared test. We exercised
feature selection for the following reasons: a) to reduce the
dimensionality of the vector-space with the hope of improving the
execution-time and accuracy of the model b) to eliminate “noisy”
terms c) as a bi-product, to extract the top-performing features for
further analysis.
For six of the eight target attributes, we ran evaluations for both the
small and large balanced sets. The attributes with the two largest
datasets (“Bike Parking” and “Good for Kids”) introduced
performance constraints on commodity hardware, and were
therefore omitted.
Because the sample datasets consisted of balanced attribute values,
we relied on accuracy scores to measure the effectiveness of our
predictive models. Ten-fold cross validation was used to normalize
the scores of the models.
One advantage to using non-neural learning mechanisms, is that
they are “white-boxes” that allow us to peer into the judgements
-
10,000
20,000
30,000
40,000
50,000
60,000
70,000
80,000
Business Attributes
TRUE
FALSE
and decisions that dictate their results. This insight and analysis is
useful, if not at least reassuring, for humans who are overseeing this
automatic curation process, as it allows them to identify trends and
correlations in the data. We therefore provide top scoring features
in the appendix of this paper, as dictated by TF-IDF, chi-squared
correlation, and Linear SVC. We then suggest some analyses of
these top-scoring features.
3. SYSTEMATIC EVALUATION
3.1 TF-IDF Evaluations The performance evaluations for all iterations on the sample data-
set is available in the appendix. A few observations regarding the
performance of our experiments:
• The performance differed based upon attribute, even
when using the sample datasets in which the sample size
was uniform across all attributes. This is intuitive, as
some features are more straightforward and explicitly-
mentioned in the review information (e.g. ‘Good for
Kids’ and ‘Restaurant Delivers’) whereas others
(‘Wheelchair Accessible’ and ‘Bike Parking’) are more
obscure and therefore more difficult to infer.
• Linear SVC has a higher accuracy than Naïve Bayes. See
Figure 2. for the scores of each model
• While some of the iterations using chi-squared-
correlation feature selection improved the model’s
accuracy, the performance-gain in most instances was
negligible. Furthermore, there is little consistency in the
number of top-K features that should be mandated for
optimal performance across all classifications.
• Each iteration was generally comparable one another in
execution time. As such, time efficiency is not a relevant
factor in choosing between any of these models.
• For all models, the results were better when using the
larger datasets. See Figure 3. for the evaluations of Linear
SVC between the sample and full (i.e. using all data
available) – datasets.
Figure 2. Accuracy scores against the full balanced datasets
Figure 3. Comparing the performance of Linear SVC between
sample and full datasets. For the ‘Accepts Bitcoin’ attribute, the
dataset size is so small that the sample and full datasets are one
and the same.
Figure 4. Accuracy scores for TF-IDF Naïve Bayes using different
review-to-business minimum thresholds
As mentioned, we initially included only businesses with a
minimum of five reviews. We performed two more iterations of
Naïve-Bayes classification, once using a smaller review-per-
business threshold of three reviews; and another time with a higher
threshold of ten reviews. The purpose of this effort was to ascertain
if there was an advantage in modeling off a larger dataset (i.e. with
more businesses that meet the lower review-count threshold) versus
a smaller one with a “richer” or more dense set of reviews per-
business. An argument for choosing a higher review-count
threshold is that, a more accurate model, albeit more limited in its
scope – has greater utility than a system with compromised
performance. Figure 4. depicts these iterations and shows that there
0.5
9
0.7
2
0.6
9
0.7
6
0.8
1
0.6
60.6
0
0.8
3
0.7
5
0.8
0
0.8
5
0.7
2
0.500.550.600.650.700.750.800.850.90
Performance Against Full Datasets
nb
svc
0.50.55
0.60.65
0.70.75
0.80.85
0.9
Linear SVC Performance
Sample Dataset
Full Dataset
0.5
9
0.7
1
0.6
9
0.7
6
0.8
1
0.6
50.5
9
0.7
2 0.6
9
0.7
6
0.8
1
0.6
6
0.7
4 0.6
9
0.7
6
0.8
3
0.6
7
0.50
0.55
0.60
0.65
0.70
0.75
0.80
0.85
Performance with different business-review thresholds
min=3 min=5 min=10
is a slight, although perhaps negligible gain in performance by
increasing the review-per-business threshold.
Another notable observation is the length of the word vectors as the
size of the datasets grow. As mentioned, we specified a minimum
document frequency of 0.05% as we noticed that many of the low
TF-IDF-scoring features were indeed useless textual “noise”. As
the size of the datasets grew, more of these noisy features were
excluded from the feature vector, and the feature list began to level-
off in length. We believe that this desirable, as it eliminates useless
features while reducing the dimensionality of the feature vectors,
thereby reducing the memory footprint of the vector space while
increasing its accuracy and its resilience against outliers. Figure 5.
portrays the accuracy scores of the full data-sets as a bubble chart,
where the x-axis is the accuracy, the y-axis is the length of the
feature vector, and the areas of the bubbles correspond to the
sample sizes of each data-set. We see that once a dataset exceeds a
certain size, the feature-vector lengths begin to level-off and
normalize.
Figure 5. Accuracy scores (x-axis), feature vector lengths (y-axis),
and dataset size (bubble-area) of the full balanced datasets. As the
datasets grow in size, their feature-vector lengths level-off as
uncommon terms are discarded.
3.2 Top-scoring features In the appendix, we provide the following lists of top-scoring
features:
a. The top positive coefficients of Linear SVC
b. The top negative coefficients of Linear SVC
c. The top chi-squared correlated features, both positively
and negatively correlated features.
d. Terms with the highest TF-IDF scores
In general, many of the top-performing coefficients provided by
Linear SVC are straight-forward and intuitive. Several chi-squared
correlated features are also understandable, albeit the polarity of
these features is not readily available. In contrast, the top TF-IDF
features are almost identical for all attribute predictions, and they
seem to consist of common or ubiquitous terms found throughout
Yelp reviews, rather than discriminatory, domain-specific terms.
Examples include words such as “great”, “good”, “food”, and
“service”. As such, we primarily focus on top SVC coefficients and
top chi-squared correlated features in our analysis.
3.2.1 Intuitive Top-scoring Features We mention a few, intuitive examples of top-scoring Linear SVC
coefficients:
• Accepts Credit Card: positive features include “card”,
“online”, and “ordered”, while negative features include
“cash”, “atm”, “debit”, and “plastic”
• Dogs Allowed: positive features include “outside”,
“outdoor”, “pet”, and “dog”, while negative features
include formal indoor settings, such as “mariott” (a
hotel), “lounge”, and “salon”
• Good for Kids: positive features are “kids”, “family”, and
“daughter”, while negative features are terms associated
with nightlife such as “bar”, “hip”, “downtown”, and
“drunk”
• Wheelchair accessible: positive features include “mall”
and “elevator”, while negative features include “stairs”
and “upstairs”
We find comparable results with the chi-squared correlated
features. Terms referring to nightlife and casinos appear high in the
list for the “Good for Kids” business attribute as they are
negatively-correlated to the class label. The word “delivery” and
commonly-delivered food items (i.e. “pizza”) are significant for the
“Restaurant Delivers” attribute. Terms found in the SVC
coefficient for “Accepts Credit Card”, such as “cash” and “order”
also appear in the list of chi-squared correlated features.
3.2.2 Implicit Features Some top-scoring features are not at all intuitive, but they
nevertheless indicate some form of correlation or association with
the target attribute. As an example, the top chi-squared correlated
features for “Dogs Allowed” were almost exclusively terms
associated with Asian cuisine (with the exception of term “hotel”),
such as “sushi”, “rice”, “thai”, and “korean”. This would indicate
that venues that specialize in Asian food and ambience generally
prohibit the presence of pets.
Among the top negative SVC features for the “Accepts Bitcoin”
attribute are first-names, which are manifested in reviews as the
names of business owners or employees. While the data-set for this
attribute was smallest in size and is therefore more prone to “noisy”
features, we can suggest that individual or family-owned “mom-
and-pop” venues whose reviews contain first names, would be less
innovative and ambitious to promote Bitcoin transactions.
Although many terms were shared across top TF-IDF features, the
top terms in the “Restaurant Delivers” attribute consist of items
commonly associated with food deliveries in North America, such
as “pizza”, “chinese”, “rice”, and “sushi”. Almost all low-scoring
TF-IDF features for this attribute (not found in the appendix)
consisted of German terms, specifically those associated with
dining, such as “hefeweizen” (a kind of beer), “abendessen”
(dinner), and “essens” (food). This would indicate that restaurants
located in North America are more likely to deliver than those in
Germany (The Yelp Challenge dataset includes information for:
eight cities in the United States; two cities in Canada; one in
Scotland, and one in Germany).
Several cities and neighborhoods are mentioned in the lists of top-
performing features, and we can surmise that certain regions, either
due to culture, geography, topology, or urban development are
inclined or predisposed towards a given business attribute value.
Below are a few examples:
300
12,864
5,398
18,532
9,010 9,092
-
5,000
10,000
15,000
20,000
25,000
0.55 0.65 0.75 0.85
Accepts Bitcoin Accepts Credit Card
Dogs Allowed Restaurant Delivers
Restaurant Does Take-out Wheelchair Accessible
• Accepts Credit Card: Pittsburgh, PA is scored
negatively
• Dogs Allowed: Scottsdale, AZ and Stuttgart, Germany
are scored positively, whereas (Las) Vegas, NV is
scored negatively
• Wheelchair Accessible: Scottsdale, AZ is scored
positively, whereas Toronto, ON, Yorkville (a
neighborhood of Toronto), and Montreal, QC, are
ranked positively
Although this is an area of research unto itself, one could suggest
that cities with flat terrain and a lower population density (i.e. fewer
stairs or tall buildings) such as Scottsdale - are more conducive to
wheelchair accessibility than hillier, more densely-populated
regions such as Toronto or Montreal. (Note: cities that appear in the
datasets but are often associated with certain types of cuisine, such
as New York and Boston, were omitted from this analysis).
3.3 Word2Vec Evaluations The results from the three classifiers using the full, balanced
datasets ranged from 53-85% accuracy. For all but one attribute,
SVC classification performed best, followed by logistic regression,
while gradient-boosted machine (GBM) came in last place. For the
“Accepts Bitcoin” attribute, GBM had the best performance while
SVC had the worst performance. This discrepancy, however, may
simply be due to the poor availability of data for this attribute.
The results were consistent with the TF-IDF experiments.
Classification of the “Takeout” and “Delivery” attributes
performed well, achieving 85% and 78% percent accuracies,
respectively. This was comparable to TF-IDF’s 85% and 80%
corresponding performance.
When the SVC classifier was used, the Word2Vec-based technique
performed either as well as, or worse than, the TF-IDF
vectorization models. In some instances, the Word2Vec-based
approach had significantly worse performance, such as the
“Accepts Credit Card” attribute, with a drop of 6% in its accuracy
score.
Figure 6: Accuracy Results for three different classifiers using
Word2Vec Document Vectors
4. DISCUSSION The Linear SVC classifications with TF-IDF feature vectorization
achieved accuracy scores between 70-85%, which, if not useful as
standalone predictive models, could be incorporated into an
existing voting algorithm or ensemble-based learner.
We used several models to discover top-performing features, and
we suggested reasons behind the appearance of certain terms, be
they latent or explicit from the data.
The Word2Vec-SVC combination did not perform as well as we
had hoped. It underperformed when compared with the TF-IDF-
based models in all experiments that we ran.
4.1 Future Work
4.1.1 Incorporation of Image Level Data The Yelp Dataset Challenge also contains ~6GB of image level
data. We would like to see if this data can be used to improve our
predictive model. For example, identifying a picture of a margarita
in a Mexican restaurant indicates that the restaurant most likely
serves alcohol and can be labelled as such. Using Yelp image data
to predict business profile attributes has been attempted by
Varshneya et al. [9].
4.1.2 Additional Word2Vec Experiments To truly leverage the power and insight of distributed word
representation, we thought of applying vector arithmetic to assist
users in their business searches. As an example, suppose a
consumer likes all but one attribute of a given venue, e.g. that
smoking is prohibited. (S)he could perform vector arithmetic on
that establishment to discover a new venue with the desired
attribute. The formula would appear as follows (using “smoking-
permitted” as an analogy):
[non-smoking-venue] + [Smoking-permitted] =
[newly-discovered, smoking-permitted venue]
4.1.3 Improvements to TF-IDF Scoring In our experiments, the top-scoring TF-IDF features consisted of
ubiquitous terms that consistently appeared in the feature vectors,
regardless of the attribute we were attempting to predict. As an
improvement, we could adjust the inverse-document frequency
(IDF) scoring mechanism to give less weight to these common
terms with the hope that more informative features would
determine the classification of a sample record.
Only single-word tokens were used to generate the feature vectors.
However, the use of n-grams to improve prediction accuracy is
highly-intuitive, as it could recognize subtleties such as negation
(e.g. “dogs allowed” v. “no dogs allowed”) and compound words
(e.g. “monte carlo” as a single token). Our initial experimentation
with n-gram generation indicated that incorporating bigrams and
trigrams in our feature vectors was computationally expensive,
memory intensive, and ineffective at boosting the accuracy of our
models. Although some valid bigrams were detected (e.g. “las
vegas”), many nonsensical n-grams were generated as well. We still
believe that the incorporation of valid n-grams into our models
could substantially improve their respective accuracies; and our
intuition is that, with an increase in sample size, “noisy” n-grams
will be ignored while useful ones will be maintained.
We saw a slight improvement in accuracy when we selected only
businesses with a substantial number of reviews. We envision
further experiments using a higher, more extreme review-count-to-
business threshold, e.g. 20 or 30, to see if this results in a dramatic
improvement in accuracy. If such an experiment is to be performed,
we would advise that the minimum-document frequency threshold
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
Word2Vec Classification Results
SVC Logistic Regression Gradient Boosted Trees
used to eliminate uncommon terms – should be increased
accordingly as the sample-set decreases in size.
4.1.4 Top Performing Features We provided some analyses of top performing, highly-correlated
features, and we suggested some hypotheses as to the meaning and
presence of certain features. Furthermore, we identified several
latent features that would go undetected in a textual-pattern rule-
based system. As a next step, we could definitively determine the
polarity (i.e. positive or negative) of each chi-squared correlated
feature by analyzing their presence in the review data. This would
give us greater clarity into the significance of these features.
4.1.5 Investigating the Coverage/Accuracy Tradeoff Through the course of our experiments, we observed that our
classifiers were generally much more confident about correct
predictions than incorrect ones. By choosing to label only those
example that meet a certain confidence threshold, we can achieve
greater accuracy, albeit with fewer results. For example, setting
this confidence threshold to 90% in our Word2Vec models for the
“Restaurant Take-Out” classifier reduces coverage by 50% but
increases accuracy to 96%.
5. ACKNOWLEDGMENTS We thank Yelp for providing the sample datasets and for
sponsoring initiatives that inspired us to perform this research. We
are indebted to several open-source NLP and machine-learning
APIs that enabled us to perform our analyses. These include NLTK,
genism, Numpy, SciPy, and Scikit-Learn. We acknowledge the
efforts of Professor Mathew Doherty of the Courant Institute of
Mathematical Sciences of New York University for overseeing the
progress of our research.
6. References
[1] "Yelp Dataset Challenge," Yelp, [Online]. Available:
https://www.yelp.com/dataset_challenge. [Accessed 1 May 2017].
[2] S. Bird, E. Loper and E. Klein, Natural Language Processing with
Python, O’Reilly Media Inc., 2009.
[3] P. S. Radim Rehhurek, "Software Framework for Topic Modelling
with Large Corpora," in Proceedings of the LREC 2010 Workshop on
New Challenges for NLP Frameworks, 2010.
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O.
Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J.
Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E.
Duchesnay, "Scikit-learn: Machine Learning in Python," Journal of
Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[5] T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting
System," in The 22Nd ACM SIGKDD International Conference On
Knowledge Discovery And Data Mining, San Francisco, 2016.
[6] H. Cho and S. M. Yoon, "Feature word selection by iterative top-K
aggregation for classifying recommended shops," in 2016
International Conference on Information and Communication
Technology Convergence (ICTC), Jeju, 2016.
[7] "Working with Text Data," scikit-learn, 2016. [Online]. Available:
http://scikit-
learn.org/stable/tutorial/text_analytics/working_with_text_data.html.
[Accessed 1 5 2017].
[8] T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean,
"Distributed representations of words and phrases and their
compositionality," in NIPS'13 Proceedings of the 26th International
Conference on Neural Information Processing Systems, Lake Tahoe,
2013.
[9] Q. Le and T. Mikolov, "Distributed Representations of Sentences and
Documents," in 31st International Conference on International
Conference on Machine Learning, Beijing, 2014.
[10] D. Varshneya, P. Dhananjay and D. Babu Jayagopi, "Restaurant
Attribute classification using Deep Learning," in India Conference
(INDICON), 2016 IEEE Annual, Bangalore, 2016.
6.1.1 APPENDIX
Top Positive SVC Features
Bike
Parking
Accepts
Bitcoin
Accepts
Credit Card
Dogs
Allowed
Good for
Kids
Restaurant
Delivers
Restaurant Offers
Takeout
Wheelchair
Accessible
1 neighborhood vegas location patio kids delivery hibachi mall
2 gives cheese store outside family deliver seating notch
3 southside pho service nypd ice delivered barista basil
4 Madison company card zimmer families phone love pretty
5 combinations photos pricey truck golds work honestly picnic
6 clay dog online scottsdale friendly deliveries workers valley
7 annex said ordered residence arena ordered craving scottsdale
8 lap barber hotel downtown auch jason seasoned children
9 spoiled crepe christmas favorite admission lavender sandwich champaign
10 safeway patrick noise pet zip online crepe mum
11 salami pedicure mini stuttgart daughter yard merchant ny
12 Toronto delivery flavor flavor easy christian ich gilbert
13 coriander zombie missed dog basketball israeli karaage elevator
14 holly casino app outdoor staple negative great purchased
15 interesting las register thank slushies recipes marche tempe
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
BikeParking
AcceptsBitcoin
AcceptsCredit Card
DogsAllowed
Good forKids
RestaurantDelivers
RestaurantDoes Take-
out
WheelchairAccessible
Acc
ura
cy
Business Attributes
Performance Evaluations for Sample Balanced Datasets
tf-idf-NB
NB-chi2-k=100
NB-chi2-k=1000
NB-chi2-k=5000
tf-idf-SVM
SVM-chi2-k=100
SVM-chi2-k=1000
SVM-chi2-k=5000
Top Negative SVC Features
Bike
Parking
Accepts
Bitcoin
Accepts
Credit
Card
Dogs
Allowed
Good for
Kids
Restaurant
Delivers
Restaurant
Offers Takeout
Wheelchair
Accessible
1 mandalay breakfast cash marriot bar murphy bitter stairs
2 vegas aaron atm lobster reservations smoking sausages toronto
3 monroeville glasses debit told crowd casino dipping macaroni
4 robinson phillip plastic lounge tiny register junction hut
5 venetian connie process zipps hip cost en cozy
6 gate battery meds decent stools seated reservation firm
7 path joseph cards york downtown casa museum delivered
8 smelled salad flavour financing soju perch signs Yorkville
9 mentor troy cart teas ambiance fajita waiters upstairs
10 carlo charlotte maintained clean drunk southside vegas cousins
11 cushions gabe english beginner trendy list vibe bob
12 cosmopolitan brad ladies gardens casino italy main colour
13 2014:00:00 kevin dollar meals booked prefer dejeuner winter
14 sorely amber flower salon louis Einstein casino montreal
15 effective kirsty bio birthday reservation carrot kbbq boxes
Top Chi-Squared Correlated Features
Bike
Parking
Accepts
Bitcoin
Accepts
Credit
Card
Dogs
Allowed
Good for
Kids
(sample
dataset)
Restaurant
Delivers
Restaurant Offers
Takeout
Wheelchair
Accessible
1 vegas food food sushi room pizza buffet vegas
2 ramen steak service hotel hotel food wine food
3 steak good good rice steak good vegas good
4 chicken menu great thai vegas delivery dessert great
5 club restaurant vegas patio pool great steak place
6 bar salad und breakfast night service dinner buffet
7 lo meal steak room carlo bar und service
8 waffles ordered room beef monte place pizza just
9 lobster dinner time dishes strip steak experience time
10 wine delicious buffet Korean stay burger buffets und
11 wagyu sides just rolls wine breakfast die like
12 night dessert like bike bar like restaurant steak
13 craftsteak bread ordered noodles casino table course love
14 pool table hotel Japanese lobster drinks rib best
15 drinks server nice shop rooms just view der
Top TF-IDF Features
Bike
Parking
Accepts
Bitcoin
Accepts
Credit
Card
Dogs
Allowed
Good for
Kids
Restaurant
Delivers
Restaurant Offers
Takeout
Wheelchair
Accessible
1 food great food place food pizza food place
2 place hair place food good food good good
3 good service good good place good place great
4 great place great service great place great like
5 like time service like service great pizza just
6 service food like just like chicken like service
7 just good just time chicken order service time
8 time company time store just like just really
9 pizza just store really time service chicken chicken
10 really recommend really und order just time pizza
11 chicken did chicken nice pizza time order ve
12 store work staff staff restaurant ordered really store
13 ‘ve like pizza shop really delivery ordered restaurant
14 order really ‘ve friendly got really restaurant nice
15 nice repair order bar lunch sushi ve ordered