Social media is now a dominant platform, and almost everybody is
involved in it in one way or another. Another field that is flourishing is
machine learning, and combining it with social media data opens up powerful
applications. For example, when somebody wants to try a new restaurant, watch a
movie, take an online course or join a college, the first thing they do is
search for it on Google and check whether it has good reviews. One way to
process such reviews automatically is sentiment analysis, in which the emotion
expressed in a text is captured and classified.

Sentiment analysis refers to the class of computational and
natural language processing based techniques used to identify, extract or
characterize subjective information, such as opinions, expressed in a given
piece of text. The primary purpose of sentiment analysis is to classify a
writer's attitude towards various topics into positive, negative or neutral
categories. Sentiment analysis has many applications in different domains
including, but not limited to, business intelligence, politics and sociology.
Data such as movie reviews, web postings, tweets and videos offer significant
opportunities to study and analyze human opinions and sentiments.

There are numerous scenarios in which sentiment analysis is
used. One such scenario is rating a movie (on a scale of 1-10) based on its
online reviews. Movie review analysis is one of the most popular fields for
analyzing public sentiment. Sentiment analysis faces many challenges, such as
cleaning and pre-processing the data. Only if this is done properly can the
classifier be trained well and reach its best accuracy. There are many packages
and tools specifically for data cleaning, and several of these resources can be
imported from the nltk.corpus module of the NLTK library.
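
The cleaning step can be illustrated with a minimal Python sketch. It assumes a simple regular-expression tokenizer and a tiny hand-written stopword set; both are illustrative stand-ins chosen for this example, not the project's actual code, and in practice a fuller stopword list such as the one shipped with NLTK would be used.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch); a real
# pipeline would use a fuller list, for example the one shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "that",
             "and", "or", "of", "to", "in", "on", "i", "but"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def clean_review(text):
    """Tokenize a review and drop the stopwords."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_review("The plot was thin, but the acting is superb!"))
```

Only the words that carry content remain after cleaning, which keeps the later classification step from being dominated by very frequent function words.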

Introduction

In the current era, data can be used for many purposes, and it
has become common to analyse data in order to draw conclusions about a
particular aspect of a product or service. Large companies such as Google,
Twitter and Facebook give data to engineers to analyse and extract opinions
from. Based on these opinions, companies improve their efficiency or change
their strategy. For instance, during the early 2000s companies handed out
feedback forms that users filled in and employees analysed manually. Now the
feedback form is given online, the feedback data serves as the dataset, and an
algorithm is written to analyse the data and draw a conclusion. This process is
commonly called "Sentiment Analysis". Sentiment analysis [1] analyses data by
examining the sentiment it expresses. Based on its polarity, the data can be
classified as positive, negative or neutral. Sentiment analysis is a subdomain
of Natural Language Processing [2], in which a machine tries to understand
human language.

In this project, we use movie reviews from websites such as
IMDb, Fandango, Metacritic and Amazon as our dataset to find out whether a
movie is good or bad based on the reviews given by different users. Each movie
has a minimum of 30 reviews and a maximum of 200 reviews. Based on the polarity
of its reviews, each movie will be rated on a scale of 1 to 10. Our initial
step is to clean the data by tokenizing, removing the stopwords and applying
part-of-speech tagging. Once the data is cleaned, the text classification can
be done using SentiWordNet. SentiWordNet [3] is an extension of the WordNet
database that helps to classify words as positive, negative or neutral.

About 450 movies are used for training and about 250 movies for
testing the classifier, which is a Support Vector Machine (SVM). SVM [4] is a
supervised machine learning algorithm that is extensively used for
classification of images and text. The classifier is effective because it fits
a hyperplane that divides the data points into two classes. In our case, the
hyperplane separates the negative reviews from the positive ones. SVM is known
for its high accuracy.
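
How review polarity could be turned into a 1 to 10 rating can be sketched in Python as follows. The text does not specify the mapping, so this sketch assumes a linear one based on the share of positive reviews among the opinionated ones. The toy lexicon and its scores are hypothetical stand-ins for SentiWordNet, and the sketch deliberately uses plain lexicon scoring rather than the SVM classifier.

```python
# Toy polarity lexicon with hypothetical scores; in a real pipeline
# SentiWordNet would supply the positive and negative scores.
LEXICON = {"superb": 1.0, "great": 0.75, "good": 0.5,
           "thin": -0.25, "boring": -0.5, "awful": -1.0}


def review_polarity(tokens):
    """Sum the word scores; words missing from the lexicon count as 0."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def label(score):
    """Turn a numeric polarity into one of the three sentiment classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def movie_rating(review_scores):
    """Map the share of positive reviews onto the 1..10 scale.

    Assumed mapping: neutral reviews are ignored, all-negative gives 1,
    all-positive gives 10, and a movie with no opinionated reviews gets
    the midpoint 5.5.
    """
    pos = sum(1 for s in review_scores if s > 0)
    neg = sum(1 for s in review_scores if s < 0)
    if pos + neg == 0:
        return 5.5
    return round(1 + 9 * pos / (pos + neg), 1)


if __name__ == "__main__":
    reviews = [["plot", "thin", "acting", "superb"],
               ["good", "movie"],
               ["boring", "awful"],
               ["watched", "yesterday"]]
    scores = [review_polarity(r) for r in reviews]
    print([label(s) for s in scores], movie_rating(scores))
```

Ignoring neutral reviews is one possible design choice; counting them toward the denominator would pull ratings toward the lower end of the scale for movies with many uninformative reviews.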