Sentiment Analysis

### What is sentiment analysis?

Sentiment analysis uses statistical techniques to gauge the opinions and sentiment expressed in a piece of text.

Example: Movie Reviews

• Unbelievably disappointing, directors should kill themselves
• Amazing movie, great plot twist and action scenes!
• Wow Nicholas Cage is really bad

Example: Twitter Sentiment

O'Connor et al. (2010): Twitter sentiment vs. the Gallup Poll of Consumer Confidence

Example: Product Reviews

Tasks

Easy Tasks in Sentiment Analysis: Is the text positive or negative?
Medium Tasks in Sentiment Analysis: Score the text sentiment from -1 to 1
Hard Tasks in Sentiment Analysis: WHY does the user feel the way s/he does?
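
To make the easy and medium tasks concrete, here is a minimal sketch of a lexicon-based scorer. It is not the method used in this repository: the word lists `POSITIVE` and `NEGATIVE` and the function names are hypothetical, invented for illustration, and a real project would use a curated sentiment lexicon or a trained classifier.

```python
import re

# Hypothetical toy lexicon, invented for illustration only.
POSITIVE = {"amazing", "great", "good"}
NEGATIVE = {"disappointing", "bad", "terrible"}

def polarity_score(text):
    """Medium task: return a score in [-1, 1] from lexicon word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no lexicon words found: treat the text as neutral
    return (pos - neg) / (pos + neg)

def polarity_label(text):
    """Easy task: collapse the score into a label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A scorer like this cannot handle negation or sarcasm, which is part of why the hard task remains hard.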

Process:

Scrape
Clean
Feature Extraction
Classification

Scrape

You usually have to collect the data yourself, most often from the web. You can scrape pages with BeautifulSoup or request the data from a web API.

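import re
import requests
from bs4 import BeautifulSoup as bs

def main(query):
	# Build the search URL, replacing spaces in the query with '+'
	url = "http://www.rottentomatoes.com/search/?search="
	fullQuery = re.sub(r' ', '+', query)
	r = requests.post(url + fullQuery)
	soup = bs(r.content, "html.parser")
	# Only parse results if the results list is present on the page
	if soup.find('ul', {'class': "results_ul"}):
		tags = soup.find_all('li', {'class': "media_block bottom_divider clearfix"})
		results = {}
		# MovieSearchResult is defined elsewhere in the original script
		for i, tag in enumerate(tags, start=1):
			info = MovieSearchResult(tag)
			resultNum = str(i)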
...

Clean

HTML markup
Tokenize
n-gram?
Capitalization
Punctuation
Lemmatize vs Stem?

<p>
This film isn't for all people. That's to say about a lot of movies in
general of course, but this one in particular brings up a big clashing
point between critics; What do we want to see in our movies? What is
more important, to portray a fictional setting for the sake of giving
people a mind blowing visual experience or to amuse and amaze them with
clever plot twists and intelligent dialogs?<br><br>First lets analyze what exactly this film is made of. Basically, the
whole thing is just one epic fighting scene after another. Most
</p>
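
The cleaning steps listed above (markup, capitalization, punctuation, tokenization) can be sketched with the standard library alone. This is a rough illustration, not the repository's code: `clean_review` is a hypothetical name, and stripping tags with a regular expression is an assumption that only holds for simple markup like the sample review. For real pages, BeautifulSoup's `get_text()` is the more robust choice.

```python
import html
import re

def clean_review(raw):
    """Strip HTML tags, fold case, drop punctuation, and split into tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)      # replace tags such as <p> and <br> with a space
    text = html.unescape(text)               # decode entities such as &amp;
    text = text.lower()                      # fold capitalization
    return re.findall(r"[a-z0-9']+", text)   # keep letters, digits, and apostrophes

sample = "<p>This film isn't for all people.<br><br>First lets analyze it.</p>"
tokens = clean_review(sample)
```

Stemming or lemmatization would be a further pass over `tokens`; NLTK provides both.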

Feature Extraction

Emojis?
Unigrams
Linguistic tokens: nouns, noun phrases, adjectives, and adverbs are usually used to express sentiment
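
As a sketch of what unigram features look like in practice, the snippet below counts unigrams and bigrams in a token list. The function names `ngrams` and `extract_features` are hypothetical, chosen for this example; part-of-speech features would need a tagger such as the one in NLTK and are not shown.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams in a token sequence as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(tokens):
    """Bag of unigram and bigram counts for one document."""
    features = Counter(ngrams(tokens, 1))
    features.update(ngrams(tokens, 2))
    return features

features = extract_features(["great", "plot", "twist"])
```

Each document becomes a sparse count vector keyed by n-gram, which is the input most classifiers expect.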

Classification

Now we have our features. How do we categorize these?

"In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known." - Wikipedia

Popular classifiers: Naive Bayes, logistic regression, SVM, random forest, etc.

Learning methods can be supervised, unsupervised, or semi-supervised.
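
To show what a supervised classifier does with labeled examples, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing, written from scratch. The class name `NaiveBayes` and the toy training data are assumptions made up for this example; for real work, the NLTK tutorial linked at the end of this README covers a library implementation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)   # log prior
            for w in tokens:
                if w in self.vocab:                    # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for illustration.
train_docs = [["amazing", "movie"], ["great", "plot"],
              ["really", "bad"], ["disappointing", "movie"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(["great", "movie"])` returns the label with the highest log probability. Working in log space avoids underflow when documents are long.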

Splitting

Split your labeled data into a training set and a held-out test set, so that the classifier is evaluated on examples it has not seen.
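
A minimal sketch of such a split, assuming a simple random hold-out. The function name `split_data`, the 20% test fraction, and the fixed seed are illustrative choices, not something prescribed by this README.

```python
import random

def split_data(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_data(range(10), test_fraction=0.2)
```

Shuffling before splitting matters when the data is ordered, for example when all positive reviews come first.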

Cross-Validation

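Use k-fold cross-validation: split the data into k folds, train on k-1 of them, test on the remaining fold, and average the scores across all folds (for example, k = 10).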
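
The fold bookkeeping can be sketched as below. The names `k_fold_indices`, `cross_validate`, and the `score_fold` callback are hypothetical; the callback stands in for whatever trains a model on the training indices and returns its score on the test indices. The sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_items))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(n_items, score_fold, k=10):
    """Average score_fold(train_idx, test_idx) over the k folds."""
    scores = [score_fold(tr, te) for tr, te in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)
```

Every item lands in exactly one test fold, so each example is used for evaluation once and for training k-1 times.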

Evaluation of Success

Precision:

How many of the retrieved documents are relevant? Precision @ 10: How many of the top 10 documents you retrieve are good?

Recall:

How many of the total relevant documents were retrieved? Recall @ 10: How many of the actual top 10 documents did you retrieve?

F-score:

Harmonic mean of recall and precision: F = 2 * precision * recall / (precision + recall)
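
The three measures can be computed directly from the sets of retrieved and relevant documents. This is a small sketch with hypothetical function names. Dividing by `k` in `precision_at_k` even when fewer than `k` items are returned is one common convention, assumed here.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for retrieved vs. relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean of precision and recall; defined as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked[:k]
    return sum(item in relevant for item in top) / k
```

The harmonic mean stays low unless both precision and recall are high, which is why it is preferred over a plain average.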

Latent Dirichlet Allocation

A generative statistical model that treats each document as a mixture of topics, so documents with similar topics end up grouped together. It is NOT supervised: there is no labeled training data!

Process

Go through each document and randomly assign each word in the document to one of the K topics.
For each document d, go through each word w in d:
Calculate p(topic t | document d) = the proportion of words in document d that are currently assigned to topic t.
Calculate p(word w | topic t) = the proportion of assignments to topic t, over all documents, that come from word w.
Reassign w to a new topic, choosing topic t with probability p(topic t | document d) * p(word w | topic t). This is essentially the probability that topic t generated word w.
Repeat this pass many times until the assignments settle.
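
The procedure above can be sketched as a toy collapsed Gibbs sampler. Two things here are assumptions that go beyond the steps as written: the smoothing priors `alpha` and `beta`, which keep every probability above zero, and removing a word's current assignment from the counts before resampling it, which is the standard formulation. The function name `lda_gibbs` and all parameter defaults are hypothetical. For real projects, use a library such as gensim.

```python
import random

def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; docs is a list of token lists.

    Returns the topic assigned to every word position in every document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Randomly assign each word in each document to one of the K topics.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]       # words in doc d assigned to topic t
    topic_word = [dict() for _ in range(num_topics)]   # times word w is assigned to topic t
    topic_total = [0] * num_topics                     # all words assigned to topic t
    for d, doc in enumerate(docs):
        for w, t in zip(doc, assignments[d]):
            doc_topic[d][t] += 1
            topic_word[t][w] = topic_word[t].get(w, 0) + 1
            topic_total[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the word's current assignment out of the counts.
                t_old = assignments[d][i]
                doc_topic[d][t_old] -= 1
                topic_word[t_old][w] -= 1
                topic_total[t_old] -= 1
                # Weight of topic t is proportional to
                # p(topic t | document d) * p(word w | topic t).
                # The document-side denominator is the same for every t, so it is omitted.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Reassign w to a topic drawn with those weights.
                t_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = t_new
                doc_topic[d][t_new] += 1
                topic_word[t_new][w] = topic_word[t_new].get(w, 0) + 1
                topic_total[t_new] += 1
    return assignments
```

After enough iterations, the most frequent words assigned to each topic give a readable summary of that topic.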

Using LDA or Sentiment Analysis for projects

A few libraries are available:

CoreNLP
gensim
NLTK

Sentiment Analysis NLTK tutorial with NB and tokenization: http://www.nltk.org/howto/sentiment.html
Gensim LDA: https://radimrehurek.com/gensim/models/ldamodel.html
