International Journal of Computer Science & Information Technology (IJCSIT) Vol 8, No 3, June 2016
DOI:10.5121/ijcsit.2016.8306
STOCK TREND PREDICTION USING NEWS SENTIMENT ANALYSIS
Kalyani Joshi¹, Prof. Bharathi H. N.², Prof. Jyothi Rao³
¹Department of Computer Engineering, KJSCE, Mumbai
²Department of Computer Engineering, KJSCE, Mumbai
³Department of Computer Engineering, KJSCE, Mumbai
ABSTRACT
The Efficient Market Hypothesis (EMH) is a popular theory about stock price behaviour. Given its perceived
failures, much research has been carried out on stock prediction. This project takes non-quantifiable data,
namely financial news articles about a company, and predicts the company's future stock trend by
classifying the sentiment of the news. Assuming that news articles have an impact on the stock market, this
is an attempt to study the relationship between news and stock trends. To this end, we created three
different classification models that label the polarity of news articles as positive or negative.
Observations show that Random Forest (RF) and Support Vector Machine (SVM) perform well in all types
of testing. Naïve Bayes also gives good results, though not as good as the other two. Experiments were
conducted to evaluate various aspects of the proposed model, and encouraging results were obtained in all
of them. The accuracy of the prediction model is more than 80%; compared with random labelling of news,
which has 50% accuracy, the model increases accuracy by about 30 percentage points.
KEYWORDS
Text Mining, Sentiment analysis, Naive Bayes, Random Forest, SVM, Stock trends
1. INTRODUCTION
In finance, the stock market and its trends are extremely volatile. This volatility attracts researchers
who try to capture it and predict the market's next moves. Investors and market analysts study market
behaviour and plan their buy or sell strategies accordingly. Because the stock market produces a large
amount of data every day, it is very difficult for an individual to consider all current and past
information when predicting the future trend of a stock. There are two main methods for forecasting
market trends: technical analysis and fundamental analysis. Technical analysis uses past prices and
volume to predict the future trend, whereas fundamental analysis examines a business's financial data
to gain insights. The efficacy of both technical and fundamental analysis is disputed by the
efficient-market hypothesis, which states that stock market prices are essentially unpredictable.
This research follows the fundamental analysis approach to discover the future trend of a stock: it
treats news articles about a company as the prime source of information and classifies each article as
good (positive) or bad (negative). If the news sentiment is positive, the stock price is more likely to
go up; if the news sentiment is negative, the stock price may go down.
This research is an attempt to build a model that predicts news polarity, which may in turn affect stock
trends; in other words, we check the impact of news articles on stock prices. We use supervised machine
learning classification together with other text mining techniques to determine news polarity, and the
resulting classifiers can also label unseen news articles that were not used to build them. Three
different classification algorithms are implemented to compare and improve classification accuracy.
We use three years of data for Apple Inc.: daily stock prices and news articles.
2. LITERATURE SURVEY
Stock price trend prediction is an active research area, since more accurate predictions translate
directly into higher returns. In recent years, significant effort has therefore gone into developing
models that predict the future trend of a specific stock or of the overall market. Most existing
techniques use technical indicators. Some researchers have shown that there is a strong relationship
between news articles about a company and fluctuations in its stock price. The following is a
discussion of previous research on sentiment analysis of text data and on different classification
techniques.
Nagar and Hahsler in their research [1] presented an automated text mining based approach to
aggregate news stories from various sources and create a News Corpus. The Corpus is filtered
down to relevant sentences and analyzed using Natural Language Processing (NLP) techniques. A
sentiment metric, called NewsSentiment, utilizing the count of positive and negative polarity
words is proposed as a measure of the sentiment of the overall news corpus. They have used
various open source packages and tools to develop the news collection and aggregation engine as
well as the sentiment evaluation engine. They also state that the time variation of NewsSentiment
shows a very strong correlation with the actual stock price movement.
Yu et al [2] present a text mining based framework to determine the sentiment of news articles
and illustrate its impact on energy demand. News sentiment is quantified and then presented as a
time series and compared with fluctuations in energy demand and prices.
J. Bean [3] uses keyword tagging on Twitter feeds about airline customer satisfaction to score tweets
for polarity and sentiment. This gives a quick picture of the prevailing sentiment about airlines and
their customer satisfaction. Our sentiment detection algorithm is based on this work.
Shynkevich et al. [4] study how financial forecasting results can be improved when news articles with
different levels of relevance to the target stock are used simultaneously. They used a multiple kernel
learning (MKL) technique to partition the information extracted from five different categories of news
articles, based on sectors, sub-sectors, industries and so on.
News articles are divided into five categories by their relevance to a target stock: the stock itself,
its sub-industry, industry, group industry and sector, and a separate kernel is employed to analyze each
category. The experimental results show that using all five news categories simultaneously improves
prediction performance compared with methods based on fewer news categories. The highest prediction
accuracy and return per trade were achieved by MKL when all five categories of news were used, with two
separate kernels, one polynomial and one Gaussian, for each news category.
3. METHODOLOGY
3.1. SYSTEM DESIGN
The following system design is proposed to classify news articles and generate a stock trend signal.
Phase 1: News Collection → Text Preprocessing → Polarity Detection Algorithm → (News, Polarity score)
Phase 2: Document Representation → Classifier Learning (Build the model) → System Evaluation → Test the model with new data
Phase 3: Plot time series of past Adj_close price; Plot scoring of news sentiment; Observe the relationship between news sentiment score and stock price
Figure 1: System Design
This design can logically be viewed as three phases: the first column of blocks forms phase 1, the
second column phase 2, and the third column phase 3. The output of phase 1 is the set of news articles
with their polarity scores, which is the input to phase 2. In phase 2, the text is converted into a
tf-idf vector space so that it can be given to a classifier, and three different classifiers are trained
on the same data to compare results. At the end of phase 2, we evaluate the results of all classifiers
and also test how each classifier performs on new news articles. In phase 3, we check the relationship
between the news articles and the stock price data: we plot both series using R and record the results.
Each block of the design is explained in the following sections.
3.1.1. NEWS COLLECTION
We collected Apple Inc. data for the past three years, from 1 Feb 2013 to 2 April 2016. The data
includes news articles about the company's major key events and the daily stock prices of AAPL for the
same period. Daily stock prices contain six values: Open, High, Low, Close, Adjusted Close and Volume.
For consistency throughout the project, we use the Adjusted Close price as the daily stock price. The
data was collected from major news aggregators such as news.google.com, reuters.com and
finance.yahoo.com.
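The paper does not say how the price data was read. As a sketch, assuming the prices arrive as a CSV with the column layout of a Yahoo Finance daily-history download (Date, Open, High, Low, Close, Adj Close, Volume), the Adjusted Close series can be extracted in Python as follows. `load_adjusted_close` and the sample rows are hypothetical, and the numbers are placeholders, not real AAPL prices.

```python
import csv
import io
from datetime import date

# Hypothetical sample in an assumed Yahoo-style daily-history layout.
# The numbers are dummy placeholders, NOT real AAPL prices.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2016-03-31,100.00,101.00,99.00,100.50,100.00,1000
2016-04-01,100.50,102.00,100.00,101.75,101.50,1200
"""

def load_adjusted_close(csv_text):
    """Map each trading day to its Adjusted Close, ignoring the other five values."""
    prices = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices[date.fromisoformat(row["Date"])] = float(row["Adj Close"])
    return prices

prices = load_adjusted_close(SAMPLE_CSV)
print(prices[date(2016, 4, 1)])  # 101.5
```

Keying the series by date makes it straightforward to join with dated news scores later.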
3.1.2. PRE-PROCESSING
Text data is unstructured, so raw text cannot be given to a classifier as input. First, we tokenize each
document into words so that we can operate at the word level. Text data contains many noisy words that
do not contribute to classification, so these words must be dropped. Text may also contain numbers,
extra white space, tabs, punctuation characters, stop words and so on, and we clean the data by removing
all of these. For this purpose, we created our own stop-word list, which contains finance-specific stop
words as well as general English stop words. We built it with reference to [16]. The list covers the
categories Generic, Names, Dates and Numbers, Geographic, and Currencies.
To ignore words that appear in only one or two documents, we apply a minimum document frequency,
keeping only words that appear in at least three documents. Stemming is also important to reduce
redundancy among words: the stemming process replaces each word with its stem. For example, the words
‘developed’, ‘development’ and ‘developing’ are all reduced to the stem ‘develop’. Some of this
pre-processing is applied before the polarity detection algorithm and some after it.
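A minimal Python sketch of the pre-processing described above follows. The stop-word list is a tiny hypothetical stand-in for the authors' finance-specific list, and `crude_stem` is a simple suffix stripper used in place of a real stemmer such as Porter's; only the minimum document frequency of three is taken from the text.

```python
import re
from collections import Counter

# Hypothetical stop words (general English plus a company name); the authors'
# finance-specific list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "its", "inc", "apple"}

# Crude suffix stripping as a stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ment", "ing", "ed", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    # Lowercase and keep alphabetic runs only: this drops numbers,
    # punctuation, tabs and extra white space in one pass.
    return re.findall(r"[a-z]+", text.lower())

def preprocess(docs, min_df=3):
    """Tokenize, drop stop words, stem, then keep terms found in >= min_df documents."""
    stemmed = [[crude_stem(t) for t in tokenize(d) if t not in STOP_WORDS] for d in docs]
    # Document frequency: in how many documents each stem occurs at least once.
    df = Counter(term for tokens in stemmed for term in set(tokens))
    return [[t for t in tokens if df[t] >= min_df] for tokens in stemmed]

docs = [
    "Apple developed a new chip.",
    "Development of the chip continues.",
    "Apple is developing 3 chips in 2016.",
    "Shares fell.",
]
print(preprocess(docs))  # [['develop', 'chip'], ['develop', 'chip'], ['develop', 'chip'], []]
```

Terms such as "new" and "share" occur in fewer than three of the sample documents, so the minimum document frequency removes them.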
3.1.3. SENTIMENT DETECTION ALGORITHM
For automatic sentiment detection of news articles, we follow a dictionary-based approach that uses a
bag-of-words representation. The method is based on J. Bean's implementation of Twitter sentiment
analysis for airline companies [3]. To build the polarity dictionary, we need two word collections:
positive words and negative words. We then match each article's words against both lists, count how
many of its words appear in each dictionary, and calculate a score for the document.
We created the polarity dictionary from general words with positive and negative polarity. In addition,
we used finance-specific words with their polarity from McDonald's research [16]. In total, the
dictionary contains 2360 positive words and 7383 negative words.
For each news article, we use a single string containing both the headline and the news body.
The algorithm for calculating the sentiment score of a document is given below.
Algorithm:
1. Tokenize the document into a word vector.
2. Prepare the dictionary of words with their polarity (positive or negative).
3. Check each word to see whether it matches a word in the positive-word dictionary or the
negative-word dictionary.
4. Count the number of words with positive polarity and with negative polarity.
5. Calculate Score(document) = count(positive matches) - count(negative matches).
6. If the Score is 0 or more, label the document positive; otherwise, label it negative.
Note the assumption in step 6: a document with a score of exactly 0 is labelled positive, because we
treat this as a two-class problem. The result is a news collection in which each article has a
sentiment score and a polarity of positive or negative.
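Steps 1 to 6 can be sketched in Python as below. The two word sets are tiny hypothetical samples, not the authors' 2360-word and 7383-word dictionaries, and `sentiment` is an illustrative name.

```python
import re

# Hypothetical sample dictionaries (step 2); the real lists combine general and
# finance-specific polarity words.
POSITIVE = {"gain", "growth", "strong", "beat", "record"}
NEGATIVE = {"loss", "decline", "weak", "lawsuit", "miss"}

def sentiment(headline, body):
    """Return (score, label) for one article using dictionary matching."""
    # Step 1: tokenize headline and body together into a word vector.
    words = re.findall(r"[a-z]+", (headline + " " + body).lower())
    # Steps 3-4: count matches against each dictionary.
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    # Step 5: score = positive matches - negative matches.
    score = pos - neg
    # Step 6: a score of 0 counts as positive (two-class problem).
    label = "positive" if score >= 0 else "negative"
    return score, label

print(sentiment("Apple posts record quarter", "Strong iPhone growth offsets weak Mac sales"))
# (2, 'positive')
```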
3.1.4. DOCUMENT REPRESENTATION
To reduce the complexity of text documents and make them easier to work with, each document must be
transformed from its full text into a document vector that describes its contents. We represent text
documents with the TF-IDF scheme. The higher the tf-idf value of a term, the more important the term
is. A high value is reached when the term's frequency in the given document is high and few other
documents in the collection contain the term. This weighting method therefore tends to filter out
common terms by giving them very low values.
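To make the weighting concrete, here is one common tf-idf variant (raw term count multiplied by log(N/df)) in plain Python. This is an assumption: the paper does not state which tf-idf formula or normalization its Weka setup used, and `tf_idf` is a hypothetical helper.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

docs = [["apple", "profit", "profit"], ["apple", "loss"], ["apple", "iphone"]]
weights = tf_idf(docs)
print(weights[0]["apple"])  # 0.0, because "apple" occurs in every document
```

The term shared by all documents gets weight zero, while "profit", frequent in one document and absent elsewhere, gets the highest weight in that document.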
3.1.5. CLASSIFIER LEARNING
Much previous research reports that the SVM, Random Forest and Naïve Bayes classification algorithms
perform well in text classification, so we use all three algorithms to classify the text and compare
their accuracy, precision, recall and other evaluation measures. All three classification algorithms
are implemented and tested using the Weka tool.
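The classifiers themselves were run in Weka. Purely to illustrate one of the three, here is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing over token lists. It is a textbook sketch, not Weka's implementation, and the class name and training data are hypothetical; SVM and Random Forest are omitted because faithful sketches would be much longer.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs for t in tokens}
        self.log_prior = {}
        self.log_likelihood = defaultdict(dict)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            total = sum(counts.values()) + len(self.vocab)
            for t in self.vocab:
                self.log_likelihood[c][t] = math.log((counts[t] + 1) / total)
        return self

    def predict(self, tokens):
        scores = {}
        for c in self.classes:
            # Terms never seen in training are ignored.
            scores[c] = self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in tokens if t in self.vocab)
        return max(scores, key=scores.get)

train = [["profit", "growth"], ["record", "profit"], ["loss", "lawsuit"], ["decline", "loss"]]
labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train, labels)
print(model.predict(["profit"]))  # positive
```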
3.1.6. SYSTEM EVALUATION
We divided the data into training and test sets, and also created an unknown data set to check each
classifier's accuracy on new data. We evaluated the performance of all three classifiers by their
accuracy, precision, recall and area under the ROC curve. The results are given in the next section.
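For reference, accuracy, precision and recall for this two-class problem can be computed as follows, treating "positive" as the class of interest. This hypothetical `evaluate` helper does not cover ROC area, which needs classifier scores rather than hard labels.

```python
def evaluate(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
print(evaluate(y_true, y_pred))  # accuracy 0.6, precision 2/3, recall 2/3
```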
3.1.7. TESTING WITH NEW DATA
News articles from Jan 2016 to April 2016 are used as the unknown test set. Comparing the results of all
classifiers, SVM performs well on the unknown data, and Random Forest also performs better than
Naïve Bayes.
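The unknown test set is defined by date rather than by random sampling. A sketch of that split, assuming each article is a dict with a `date` field (a hypothetical structure) and that articles from 1 Jan 2016 onward are held out:

```python
from datetime import date

def split_by_date(articles, cutoff=date(2016, 1, 1)):
    """Articles dated before the cutoff build the model; later ones are unseen test data."""
    known = [a for a in articles if a["date"] < cutoff]
    unknown = [a for a in articles if a["date"] >= cutoff]
    return known, unknown

articles = [
    {"date": date(2015, 12, 31), "title": "a"},
    {"date": date(2016, 1, 4), "title": "b"},
    {"date": date(2016, 3, 1), "title": "c"},
]
known, unknown = split_by_date(articles)
```

Splitting chronologically keeps any information from the test period out of the training data.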
3.1.8. PLOTTING THE VALUES
After classifying the unknown data, we plotted the news sentiment score chart and compared it with the
historical price chart.
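The paper compares the two series visually in R. As an illustrative Python sketch of the preparation step, assuming per-article scores are summed per day (the paper does not say how same-day articles were combined), the daily scores can be aligned with adjusted closing prices like this; `daily_series` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

def daily_series(articles, prices):
    """Sum article scores per day (an assumption) and pair them with that day's adjusted close.

    articles: list of (date, score) pairs; prices: {date: adjusted close}.
    Days without a price (weekends, holidays) are dropped.
    """
    totals = defaultdict(int)
    for day, score in articles:
        totals[day] += score
    return [(day, totals[day], prices[day]) for day in sorted(totals) if day in prices]

d1, d2, d3 = date(2016, 1, 4), date(2016, 1, 5), date(2016, 1, 9)
rows = daily_series([(d1, 2), (d1, -1), (d2, -3), (d3, 1)], {d1: 100.0, d2: 98.5})
```

The resulting (date, score, price) rows can be written out and plotted with any charting tool.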
4. EVALUATION
We tested the models with several test options so that each method could be compared across different
scenarios. The test options were:
• 5-fold cross validation
• 10-fold cross validation
• 15-fold cross validation
• 70% Data split
• 80% Data split
• New testing data
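The cross-validation and percentage-split options can be sketched as index generators. This is an illustration under stated assumptions (shuffling with a fixed seed, folds not stratified by class), not how Weka implements these options; the function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def percentage_split(n, train_fraction, seed=0):
    """Shuffle indices and use the first train_fraction of them for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

folds = list(k_fold_indices(100, 10))     # 10-fold cross-validation
train70, test30 = percentage_split(100, 0.7)  # 70% data split
```

Every example appears in exactly one test fold, so each is used for testing once and for training k - 1 times.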
Figure 2: Comparison of three classifiers against different test options
Figure 3: Result of testing models with new data
Figure 4: Time series plot of news sentiment score vs. actual stock price for test dataset
5. CONCLUSION
Finding the future trend of a stock is a challenging task because stock trends depend on many factors.
We assumed that news articles and stock prices are related and that news may be capable of moving the
stock trend. We studied this relationship thoroughly and concluded that the stock trend can be predicted
using news articles and previous price history.
Because news articles capture sentiment about the current market, we automate this sentiment detection
and, based on the words in the news articles, obtain an overall news polarity. If the news is positive,
its impact on the market can be expected to be good, so the stock price is more likely to rise; if the
news is negative, it may push the stock price down.
We used the polarity detection algorithm to label news initially and build the training set. This
algorithm uses a dictionary-based approach, with positive and negative word dictionaries built from
general and finance-specific sentiment-carrying words. Pre-processing the text data was also a
challenging task; we created our own stop-word dictionary, which includes finance-specific stop words.
On this data we implemented three classification models and tested them under different test scenarios.
Comparing their results, Random Forest worked very well in all test cases, with accuracy ranging from
88% to 92%. SVM followed with a considerable accuracy of around 86%, and Naïve Bayes reached around 83%.
Given any news article, the model can assign it a polarity, which can in turn be used to predict the
stock trend.
FUTURE WORK
We would like to extend this research by adding data for more companies and checking the prediction
accuracy. For companies where financial news is scarce, we would use Twitter data for a similar
analysis. Similar strategies could also be incorporated into algorithmic trading.
ACKNOWLEDGEMENTS
The authors would like to thank their guides, teachers, family and friends who supported the completion
of this research project, and everyone who helped, knowingly or unknowingly, with this project.
REFERENCES
[1] Anurag Nagar, Michael Hahsler, Using Text and Data Mining Techniques to Extract Stock Market
Sentiment from Live News Streams, IPCSIT vol. XX, IACSIT Press, Singapore, 2012.
[2] W.B. Yu, B.R. Lea, B. Guruswamy, A Theoretic Framework Integrating Text Mining and Energy Demand
Forecasting, International Journal of Electronic Business Management, 5(3): 211-224, 2011.
[3] J. Bean, R by Example: Mining Twitter for Consumer Attitudes towards Airlines, Boston Predictive
Analytics Meetup presentation, 2011.
[4] Yauheniya Shynkevich, T.M. McGinnity, Sonya Coleman, Ammar Belatreche, Predicting Stock Price
Movements Based on Different Categories of News Articles, 2015 IEEE Symposium Series on
Computational Intelligence.
[5] P. Hofmarcher, S. Theussl, K. Hornik, Do Media Sentiments Reflect Economic Indices?, Chinese
Business Review, 10(7): 487-492, 2011.
[6] R. Goonatilake, S. Herath, The Volatility of the Stock Market and News, International Research
Journal of Finance and Economics, 11: 53-65, 2007.
[7] Spandan Ghose Chowdhury, Soham Routh, Satyajit Chakrabarti, News Analytics and Sentiment Analysis
to Predict Stock Price Trends, International Journal of Computer Science and Information
Technologies (IJCSIT), 5(3): 3595-3604, 2014.
[8] Robert P. Schumaker, Yulei Zhang, Chun-Neng Huang, Sentiment Analysis of Financial News Articles.
[9] Győző Gidófalvi, Using News Articles to Predict Stock Price Movements, University of California,
San Diego, La Jolla, CA 92037, 2001.
[10] L. Breiman, Random Forests, Machine Learning, 45(1): 5-32, 2001.
[11] Data Mining Lab 7: Introduction to Support Vector Machines (SVMs).
[12] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant
Features, European Conference on Machine Learning (ECML), Application of Machine Learning and Data
Mining in Finance, Chemnitz, Germany, 1998.
[13] Kyoung-jae Kim, Financial Time Series Forecasting Using Support Vector Machines, Neurocomputing,
55: 307-319, 2003.
[14] Pegah Falinouss, Stock Trend Prediction Using News Articles, Luleå University of Technology, 2007.
[15] https://en.wikipedia.org/wiki/Support_vector_machine
[16] http://www3.nd.edu/~mcdonald/Word_Lists.html
[17] https://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/
Authors
Kalyani Joshi
Master of Engineering student at K. J. Somaiya College of Engineering, Mumbai. Completed a Bachelor of
Engineering at Pune University in 2013.
Prof. Bharathi H. N.
Currently Head of the Department of Computer Engineering at K. J. Somaiya College of Engineering, Mumbai.
Prof. Jyothi M. Rao
Currently Associate Professor and Associate Head of the Computer Engineering Department at
K. J. Somaiya College of Engineering, Mumbai.
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdfAI You Can Trust: The Critical Role of Governance and Quality.pdf
AI You Can Trust: The Critical Role of Governance and Quality.pdf
Precisely
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Vaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without HallucinationsVaibhav Gupta BAML: AI work flows without Hallucinations
Vaibhav Gupta BAML: AI work flows without Hallucinations
john409870
 
Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...
BookNet Canada
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
The Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdfThe Microsoft Excel Parts Presentation.pdf
The Microsoft Excel Parts Presentation.pdf
YvonneRoseEranista
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Financial Services Technology Summit 2025
Financial Services Technology Summit 2025Financial Services Technology Summit 2025
Financial Services Technology Summit 2025
Ray Bugg
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Ad

STOCK TREND PREDICTION USING NEWS SENTIMENT ANALYSIS

International Journal of Computer Science & Information Technology (IJCSIT) Vol 8, No 3, June 2016
DOI:10.5121/ijcsit.2016.8306
STOCK TREND PREDICTION USING NEWS SENTIMENT ANALYSIS
Kalyani Joshi1, Prof. Bharathi H. N.2, Prof. Jyothi Rao3
1, 2, 3 Department of Computer Engineering, KJSCE, Mumbai
ABSTRACT
The Efficient Market Hypothesis is a popular theory about stock prediction. With its failure, much research has been carried out in the area of stock prediction. This project takes non-quantifiable data, such as financial news articles about a company, and predicts the company's future stock trend through news sentiment classification. Assuming that news articles have an impact on the stock market, this is an attempt to study the relationship between news and stock trends. To show this, we created three different classification models that label the polarity of news articles as positive or negative. Observations show that Random Forest (RF) and SVM perform well in all types of testing. Naïve Bayes gives good results, but not as good as the other two. Experiments are conducted to evaluate various aspects of the proposed model, and encouraging results are obtained in all of the experiments. The accuracy of the prediction model is more than 80%; compared with random labelling of news, which has 50% accuracy, the model increases accuracy by 30 percentage points.
KEYWORDS
Text Mining, Sentiment analysis, Naive Bayes, Random Forest, SVM, Stock trends
1. INTRODUCTION
In the finance field, the stock market and its trends are extremely volatile. This attracts researchers who try to capture the volatility and predict the market's next moves. Investors and market analysts study market behaviour and plan their buy or sell strategies accordingly.
As the stock market produces a large amount of data every day, it is very difficult for an individual to consider all current and past information when predicting the future trend of a stock. There are two main methods for forecasting market trends: Technical analysis and Fundamental analysis. Technical analysis uses past price and volume to predict the future trend, whereas Fundamental analysis of a business involves analyzing its financial data to gain insights. The efficacy of both technical and fundamental analysis is disputed by the efficient-market hypothesis, which states that stock market prices are essentially unpredictable. This research follows the Fundamental analysis approach: it aims to discover the future trend of a stock by treating news articles about a company as the prime source of information and classifying the news as good (positive) or bad (negative). If the news sentiment is positive, the stock price is more likely to go up; if the news sentiment is negative, the stock price may go down.
This research is an attempt to build a model that predicts news polarity, which may affect changes in stock trends; in other words, to check the impact of news articles on stock prices. We use supervised machine learning classification and other text mining techniques to determine news polarity, so that the model can also classify unseen news that was not used to build the classifier. Three different classification algorithms are implemented to compare and improve classification accuracy. We used the past three years of stock prices and news articles for Apple Inc.
2. LITERATURE SURVEY
Stock price trend prediction is an active research area, as more accurate predictions translate directly into higher returns. Therefore, in recent years, significant efforts have been put into developing models that can predict the future trend of a specific stock or of the overall market. Most existing techniques make use of technical indicators. Some researchers have shown that there is a strong relationship between news articles about a company and fluctuations in its stock price. The following is a discussion of previous research on sentiment analysis of text data and on different classification techniques.
Nagar and Hahsler [1] presented an automated, text mining based approach that aggregates news stories from various sources into a News Corpus. The corpus is filtered down to relevant sentences and analyzed using Natural Language Processing (NLP) techniques. A sentiment metric called NewsSentiment, which uses the counts of positive and negative polarity words, is proposed as a measure of the sentiment of the overall news corpus. The authors used various open source packages and tools to develop the news collection and aggregation engine as well as the sentiment evaluation engine.
They also state that the time variation of NewsSentiment shows a very strong correlation with actual stock price movement. Yu et al. [2] present a text mining based framework to determine the sentiment of news articles and illustrate its impact on energy demand. News sentiment is quantified, presented as a time series, and compared with fluctuations in energy demand and prices. J. Bean [3] uses keyword tagging on Twitter feeds about airline customer satisfaction to score them for polarity and sentiment, which gives a quick idea of the prevailing sentiment about airlines and their customer satisfaction ratings. Our sentiment detection algorithm is based on this work. Shynkevich et al. [4] study how the results of financial forecasting can be improved when news articles with different levels of relevance to the target stock are used simultaneously. They use a multiple kernel learning (MKL) technique to partition the information extracted from five categories of news articles, defined by relevance to the target stock, its sub-industry, industry, industry group and sector, with a separate kernel employed to analyze each category. The experimental results show that using all five news categories simultaneously improves prediction performance compared with methods based on fewer news categories. The highest prediction accuracy and return per trade were achieved by MKL when all five news categories were used, with two separate kernels, one polynomial and one Gaussian, for each news category.
3. METHODOLOGY
3.1. SYSTEM DESIGN
The following system design is proposed in this project to classify news articles and generate a stock trend signal.
Figure 1: System Design (phase 1 blocks: News Collection, Text Preprocessing, Polarity Detection Algorithm, (News, Polarity score); phase 2 blocks: Document Representation, Classifier Learning (Build the model), System Evaluation, Test the model with new data; phase 3 blocks: Plot time series of past Adj_close price, Plot scoring of news sentiment, Observe the relationship between news sentiment score and stock price)
The design can be seen logically as three phases: the first column of blocks forms phase 1, the second column forms phase 2 and the third column forms phase 3. The output of phase 1 is the set of news articles with their polarity scores, which is the input to phase 2. In phase 2, the text is converted into the TF-IDF vector space so that it can be given to a classifier, and three different classifiers are built on the same data so that their results can be compared. At the end of phase 2, we evaluate the results of all classifiers and also test classifier performance on new news articles. In phase 3, we check for a relationship between the news articles and the stock price data: we plot both using the R language and record the results. Each block of the design is explained in the following sections.
3.1.1. NEWS COLLECTION
We collected Apple Inc. data for the past three years, from 1 Feb 2013 to 2 April 2016. This data includes news articles on the company's major key events and the daily stock prices of AAPL for the same period. Daily stock prices contain six values: Open, High, Low, Close, Adjusted Close and Volume. For consistency throughout the project, we used the Adjusted Close price as the daily stock price. We collected this data from major news aggregators such as news.google.com, reuters.com and finance.yahoo.com.
3.1.2. PRE-PROCESSING
Text data is unstructured, so raw text cannot be given to a classifier as input. First, we tokenize each document into words so that we can operate at the word level. Text data contains many noisy words that do not contribute to classification, so we need to drop them. In addition, text data may contain numbers, extra white space, tabs, punctuation characters, stop words, etc., and we clean the data by removing all of these. For this purpose, we created our own stop-word list, which contains both finance-specific stop words and general English stop words, built with reference to [16]. This list covers the categories Generic, Names, Dates and Numbers, Geographic, and Currencies. To ignore words that appear in only one or two documents, we also apply a minimum document frequency, keeping only words that appear in at least three documents. Stemming is also important to reduce redundancy among words: the stemming process replaces each word by its stem. For example, the words 'developed', 'development' and 'developing' are all reduced to the stem 'develop'. Some of the pre-processing is done before the polarity detection algorithm is applied, and some of it after.
3.1.3. SENTIMENT DETECTION ALGORITHM
For automatic sentiment detection of news articles, we follow a dictionary-based approach that uses the Bag of Words technique for text mining. This method is based on J. Bean's implementation of Twitter sentiment analysis for airline companies [3]. To build the polarity dictionary, we need two collections of words: positive words and negative words. We then match the article's words against both word lists, count the number of words that appear in each dictionary and calculate the score of the document. We created the polarity dictionary from general words with positive and negative polarity, and added finance-specific words with their polarity from McDonald's word lists [16].
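The pre-processing of Section 3.1.2 and the dictionary scoring formalised in the six-step algorithm that follows can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the tiny STOP_WORDS, POSITIVE and NEGATIVE sets are hypothetical placeholders for the authors' stop-word list and their 2360-word positive and 7383-word negative dictionaries, crude_stem is a toy suffix stripper rather than a real stemmer such as Porter's, and we assume (the paper does not say) that stop words are removed before matching while stemming is applied only afterwards.

```python
import re

# Hypothetical placeholder lists; the paper's own lists are far larger
# (general + finance-specific stop words, 2360 positive / 7383 negative words).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "its", "inc", "company"}
POSITIVE = {"gain", "growth", "profit", "strong", "beat"}
NEGATIVE = {"loss", "decline", "weak", "lawsuit", "miss"}


def tokenize(text):
    """Lower-case the text and keep alphabetic runs only, which drops
    numbers, punctuation, tabs and extra white space."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripper (NOT the Porter stemmer): strips the first
    matching suffix, but only if at least four characters remain."""
    for suffix in ("ment", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def sentiment_score(headline_and_body):
    """Score = count(pos.matches) - count(neg.matches); a score of 0 or
    more is labelled positive, as in the paper's two-class setting."""
    tokens = remove_stop_words(tokenize(headline_and_body))
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    score = pos - neg
    return score, ("positive" if score >= 0 else "negative")


if __name__ == "__main__":
    print([crude_stem(w) for w in ("developed", "development", "developing")])
    # -> ['develop', 'develop', 'develop']
    print(sentiment_score("Apple posts strong profit growth despite lawsuit"))
    # -> (2, 'positive'): three positive matches, one negative match
```

The article text passed in is the headline and body concatenated, as the paper describes. Note how an article with no dictionary matches scores 0 and is therefore labelled positive.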
In total, the dictionary contains 2360 positive words and 7383 negative words. For each news article, we use a string that contains both the headline and the news body. The algorithm to calculate the sentiment score of a document is given below.
Algorithm:
1. Tokenize the document into a word vector.
2. Prepare the dictionary, which contains words with their polarity (positive or negative).
3. Check each word to see whether it matches a word in the positive dictionary or the negative dictionary.
4. Count the number of words with positive polarity and the number with negative polarity.
5. Calculate the score of the document: Score = count(pos.matches) - count(neg.matches)
6. If the Score is 0 or more, the document is considered positive; otherwise, it is negative.
Because we treat this as a two-class problem, a document with a score of exactly 0 is labelled positive. As a result, we get a news collection in which each article has a sentiment score and a polarity of positive or negative.
3.1.4. DOCUMENT REPRESENTATION
To reduce the complexity of text documents and make them easier to work with, the documents have to be transformed from full text into document vectors that describe their contents. To represent text documents, we use the TF-IDF scheme. The higher the TF-IDF value of a term, the more important the term is. A high value is reached when the term frequency in the given document is high and few other documents in the collection contain the given term/feature. This term weighting method therefore tends to filter out common terms by giving them a very low value.
3.1.5. CLASSIFIER LEARNING
Previous research shows that the SVM, Random Forest and Naïve Bayes classification algorithms perform well in text classification, so we use all three algorithms to classify the text and compare their accuracy, precision, recall and other model evaluation measures. All three classification algorithms are implemented and tested using the Weka tool.
3.1.6. SYSTEM EVALUATION
We divided the data into training and test sets. We also created an unseen data set to check the accuracy of each classifier on new data. We evaluated the performance of all three classifiers by checking each one's accuracy, precision, recall and area under the ROC curve. The results are given in the next section.
3.1.7. TESTING WITH NEW DATA
News articles from Jan 2016 to April 2016 are used as the unseen test set. Comparing the results of all classifiers, the SVM classifier performs well on unseen data, and the Random Forest algorithm also performs better than Naïve Bayes.
3.1.8. PLOTTING THE VALUES
After classifying the unseen data, we plotted the news score chart and compared it with the historical price chart.
4. EVALUATION
We tested the models using different testing options so that each method can be compared across different scenarios. The test options were:
• 5-fold cross validation
• 10-fold cross validation
• 15-fold cross validation
• 70% Data split
• 80% Data split
• New testing data
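The TF-IDF document representation of Section 3.1.4, together with the minimum document frequency of three from Section 3.1.2, can be sketched as follows. The paper does not state which TF-IDF variant was used; this sketch assumes the common form raw term count × log(N / df), under which a term that occurs in every document gets weight 0, which is the filtering-out of common terms that the text describes. Inputs are token lists such as those left after pre-processing.

```python
import math
from collections import Counter


def tfidf_vectors(docs, min_df=3):
    """docs: list of token lists (one per news article).

    Returns (vocabulary, vectors), where each vector maps term -> weight.
    Terms appearing in fewer than `min_df` documents are discarded.
    Weight = raw term count * log(N / df), one common TF-IDF variant
    (assumed here; the paper does not specify its variant).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vocabulary = sorted(t for t, df in doc_freq.items() if df >= min_df)
    kept = set(vocabulary)
    vectors = []
    for tokens in docs:
        counts = Counter(t for t in tokens if t in kept)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [
        ["profit", "growth", "apple"],
        ["profit", "loss", "apple"],
        ["profit", "growth", "apple"],
        ["growth", "apple", "lawsuit"],
    ]
    vocab, vecs = tfidf_vectors(docs)
    print(vocab)    # ['apple', 'growth', 'profit']: 'loss' and 'lawsuit' fall below min_df
    print(vecs[1])  # 'apple' occurs in every document, so its weight is 0.0
```

Sparse dictionaries are used instead of dense arrays because news vocabularies are large and each article touches only a small fraction of the terms.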
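The k-fold cross validation used in the evaluation can be illustrated with one of the paper's three classifiers. The authors used Weka; the sketch below is instead a from-scratch multinomial Naïve Bayes with add-one (Laplace) smoothing on raw token counts, plus a plain k-fold loop that reports accuracy, precision and recall for the positive class. Unlike Weka's cross validation, the folds here are contiguous and neither shuffled nor stratified, and SVM and Random Forest are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes: class priors plus per-class token counts."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    vocabulary = {t for tokens in docs for t in tokens}
    return class_counts, token_counts, vocabulary


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, token_counts, vocabulary = model
    n_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocabulary)
        logp = math.log(n / n_docs)
        for t in tokens:
            if t in vocabulary:  # words unseen in training are ignored
                logp += math.log((token_counts[label][t] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(docs, labels, k=10, positive="positive"):
    """k-fold cross validation; returns (accuracy, precision, recall)."""
    tp = fp = fn = correct = 0
    for test_idx in k_fold_indices(len(docs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        for i in test_idx:
            pred, gold = predict_nb(model, docs[i]), labels[i]
            correct += pred == gold
            tp += pred == positive and gold == positive
            fp += pred == positive and gold != positive
            fn += pred != positive and gold == positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(docs), precision, recall
```

Calling cross_validate with k=5, 10 or 15 mirrors the three cross-validation options listed above. A 70% or 80% data split is the single-fold special case: train on the first 70% (or 80%) of the indices and test on the remainder.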
Figure 2: Comparison of three classifiers against different test options
Figure 3: Result of testing models with new data
Figure 4: Time series plot of news sentiment score vs. actual stock price for test dataset
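Phase 3 and the time series plot of news sentiment score against stock price compare the two series visually (the authors plot both in R). As a hedged illustration only, and not part of the paper's method, the sketch below sums article scores per day and measures how often a day's polarity (score of 0 or more read as "up", mirroring the labelling rule) agrees with the direction of the Adjusted Close from that day to the next trading day. The per-day summation and the next-day horizon are assumptions; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_scores(articles):
    """articles: iterable of (date, sentiment score). Sums the scores per day."""
    totals = defaultdict(int)
    for day, score in articles:
        totals[day] += score
    return dict(totals)


def direction_agreement(scores_by_day, adj_close):
    """Fraction of news days whose polarity (score >= 0 -> up) matches the move
    from that day's Adjusted Close to the next trading day's Adjusted Close."""
    trading_days = sorted(adj_close)
    hits = total = 0
    for today, next_day in zip(trading_days, trading_days[1:]):
        if today not in scores_by_day:
            continue  # no news on this trading day
        predicted_up = scores_by_day[today] >= 0
        actual_up = adj_close[next_day] > adj_close[today]
        hits += predicted_up == actual_up
        total += 1
    return hits / total if total else float("nan")


if __name__ == "__main__":
    # Made-up numbers purely to show the call pattern; not AAPL data.
    articles = [(date(2016, 1, 4), 2), (date(2016, 1, 4), -1),
                (date(2016, 1, 5), -3), (date(2016, 1, 6), 1)]
    adj_close = {date(2016, 1, 4): 100.0, date(2016, 1, 5): 98.0,
                 date(2016, 1, 6): 97.0, date(2016, 1, 7): 99.0}
    print(direction_agreement(daily_scores(articles), adj_close))  # 2 of 3 days agree
```

News published on non-trading days (weekends, holidays) is skipped by this sketch because only trading days are iterated; carrying it forward to the next trading day would be an equally reasonable choice.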
5. CONCLUSION
Finding the future trend of a stock is a crucial task because stock trends depend on a number of factors. We assumed that news articles and stock prices are related to each other and that news may have the capacity to move a stock's trend. We thoroughly studied this relationship and concluded that a stock's trend can be predicted using news articles and previous price history. As news articles capture sentiment about the current market, we automated this sentiment detection so that, based on the words in the news articles, we obtain an overall news polarity. If the news is positive, we can state that its impact on the market is good, so the stock price is more likely to go up; if the news is negative, it may push the stock price trend down. We used the polarity detection algorithm to label news initially and build the training set. For this algorithm, a dictionary-based approach was used, with dictionaries of positive and negative words created from general and finance-specific sentiment-carrying words. Pre-processing the text data was also a challenging task; we created our own stop-word list, which also includes finance-specific stop words. Based on this data, we implemented three classification models and tested them under different test scenarios. Comparing their results, Random Forest worked very well in all test cases, with accuracy ranging from 88% to 92%. SVM followed with an accuracy of around 86%, and Naïve Bayes achieved around 83%. Given any news article, the model can assign a polarity, which could then be used to predict the stock trend.
FUTURE WORK
We would like to extend this research by adding data from more companies and checking the prediction accuracy.
For companies where financial news is scarce, we would use Twitter data for a similar analysis. Similar strategies could also be incorporated into algorithmic trading.
ACKNOWLEDGEMENTS
The authors would like to thank their guides, teachers, family and friends who supported them in completing this research project, and everyone who helped, knowingly or unknowingly, with this project.
REFERENCES
[1] Anurag Nagar, Michael Hahsler, Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams, IPCSIT vol. XX (2012), IACSIT Press, Singapore.
[2] W.B. Yu, B.R. Lea, and B. Guruswamy, A Theoretic Framework Integrating Text Mining and Energy Demand Forecasting, International Journal of Electronic Business Management, 2011, 5(3): 211-224.
[3] J. Bean, R by example: Mining Twitter for consumer attitudes towards airlines, Boston Predictive Analytics Meetup Presentation, 2011.
[4] Yauheniya Shynkevich, T.M. McGinnity, Sonya Coleman, Ammar Belatreche, Predicting Stock Price Movements Based on Different Categories of News Articles, 2015 IEEE Symposium Series on Computational Intelligence.
[5] P. Hofmarcher, S. Theussl, and K. Hornik, Do Media Sentiments Reflect Economic Indices?, Chinese Business Review, 2011, 10(7): 487-492.
[6] R. Goonatilake and S. Herath, The volatility of the stock market and news, International Research Journal of Finance and Economics, 2007, 11: 53-65.
[7] Spandan Ghose Chowdhury, Soham Routh, Satyajit Chakrabarti, News Analytics and Sentiment Analysis to Predict Stock Price Trends, International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 5 (3), 2014, 3595-3604.
[8] Robert P. Schumaker, Yulei Zhang, Chun-Neng Huang, Sentiment Analysis of Financial News Articles.
[9] Győző Gidófalvi, Using News Articles to Predict Stock Price Movements, University of California, San Diego, La Jolla, CA 92037, 2001.
[10] L. Breiman, Random forests, Machine Learning, 45(1): 5-32, 2001.
[11] Data Mining Lab 7: Introduction to Support Vector Machines (SVMs).
[12] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, European Conference on Machine Learning (ECML), Application of Machine Learning and Data Mining in Finance, Chemnitz, Germany, 1998.
[13] Kyoung-jae Kim, Financial time series forecasting using support vector machines, Neurocomputing 55 (2003) 307-319.
[14] Pegah Falinouss, Stock Trend Prediction using News Articles, Luleå University of Technology, 2007.
[15] https://en.wikipedia.org/wiki/Support_vector_machine
[16] http://www3.nd.edu/~mcdonald/Word_Lists.html
[17] https://jeffreybreen.wordpress.com/2011/07/04/twitter-text-mining-r-slides/
Authors
Kalyani Joshi is a Master of Engineering student at K. J. Somaiya College of Engineering, Mumbai, and completed a Bachelor of Engineering at Pune University in 2013.
Prof. Bharathi H. N. is currently Head of the Department of Computer Engineering at K. J. Somaiya College of Engineering, Mumbai.
Prof. Jyothi M. Rao is currently an Associate Professor and Associate Head of the Computer Engineering Department at K. J. Somaiya College of Engineering, Mumbai.