
Regression-Based Microblogging Influence Detection Framework for Stock Market

Nanli Zhu 1, 2, 5, Yibo Wang 3, Cheng Cheng 3, Wei Xu 3*, Yongping Zhang 2, Ping Zou 1, 4, 5, and M. S. K. Awan 6

1. Faculty of Management and Economics, Kunming University of Science and Technology, Kunming, P. R. China

2. School of Electronic & Information, Ningbo University of Technology, Ningbo, P. R. China

3. School of Information, Renmin University of China, P. R. China

4. Yunnan Education Office, Kunming, P. R. China

5. Yunnan Normal University, Kunming, P. R. China

6. School of Computer Science & Informatics, Cardiff University, Cardiff, UK


Abstract: Microblogs and social networks have become a valuable resource for mining sentiments in various fields. Sentiments posted on the web have reportedly influenced trading and investment decisions and activities in stock exchanges. In this study, we explore the effects of microblogs on the Chinese stock market. We focus in particular on whether measurements of collective mood states (sentiments and persuasions) derived from large-scale microblogging posts are correlated with the values of the Chinese stock market over a period of time. We propose a Regression-Based Microblogging Influence Detection Scheme (RMIDS) as a framework to detect the influence of microblogging posts on the stock market. Our results show that sentiments in microblogging have a significant influence on the market, while persuasion posts that try to interfere with the stock market by taking advantage of the lack of regulation in microblogging do not succeed.

Index Terms: Data Mining; Sentiment Analysis; Stock Market; Microblogging; Social Networks

I. INTRODUCTION

Sina Microblogging is a very popular Chinese microblogging platform, as influential in China as Twitter is in the US and the rest of the world, where users can post microblogs on topics of interest. They can comment on and repost others' posts, and even communicate with other users directly. Its user base has been growing exponentially since its launch in 2009. As of December 2013, Sina Microblogging had more than 129 million monthly active users [1].

The popularity of microblogging has attracted more and more attention from academia [2]. Facilitated by the growth of social media and the associated platforms for user-generated content, consumers increasingly turn to online reviews from fellow consumers instead of expert suggestions when making purchases [3]. Peer opinions have also begun to play a greater role in the stock market. A 2008 report shows that about 25% of adults indirectly rely on investment advice transmitted via social media outlets [4]. By 2013, social media was reported to influence 70% of investors in their personal finances and investments [5]. From an investment perspective, a few important questions should be answered to better understand the effect of microblogging on investment decisions. Do peer opinions in microblogging actually impart value-relevant news, or do they merely constitute “random chatter”? Moreover, are some users taking advantage of the lack of regulation inherent in social media outlets and attempting to interfere with the trend of the market by persuading fellow market participants?

Behavioral finance suggests that stock market prices do not exactly follow a random walk as the Efficient Market Hypothesis (EMH) would imply, which means that prices could be predicted to some extent. Behavioral finance has further shown that emotions and moods significantly drive financial decisions [6, 7].

To predict stock market trends, time series and technical indicators have been suggested. For example, Liao and Wang suggested a stochastic time effective neural network for financial market prediction [8]. Similarly, Wang offered a fuzzy grey model to predict stock prices [9]. Refenes and Holt proposed technical indicators to forecast volatility with neural regression [10], while Rodriguez-Gonzalez et al. used the RSI indicator to improve trading systems based on neural networks [14]. Meanwhile, Hsu used sixteen technical indicators to forecast the stock market with the SOM-GP procedure [11]. Furthermore, fundamental indicators have been taken into consideration for financial market prediction. Kanas and Yannopoulos proposed several financial indicators to forecast the stock market [12]. Olson and Mossman used various accounting ratios to forecast Canadian stock returns [13].

JOURNAL OF NETWORKS, VOL. 9, NO. 8, AUGUST 2014 2129

© 2014 ACADEMY PUBLISHER doi:10.4304/jnw.9.8.2129-2136

With the growth of web news and social media, web mining-based prediction methods have been developed to enhance the performance of financial market prediction [15]. For instance, Schumaker and Chen developed a discrete stock price prediction engine based on financial news, and experimental results showed that the proposed system outperforms the market average and performs well against existing quant funds [16]. Zhang et al. found that the percentage of emotional tweets was significantly negatively correlated with the Dow Jones, NASDAQ and S&P 500 [17], which makes it plausible that public mood and sentiment in user-generated microblogging content can drive stock market values. Das and Chen extracted sentiments from small talk on the web and examined the relationship between sentiment and stock values [18]. Bollen et al. found that the accuracy of DJIA predictions can be significantly improved by including specific public mood dimensions, using Granger causality analysis and a Self-Organizing Fuzzy Neural Network [19], and they suggested Twitter mood as a stock market predictor to model the relationship between online emotions and stock prices [20].

These findings raise questions about their generality. Do they hold in other countries with entirely different cultural backgrounds? Could the mood in microblogging help predict prices in the Chinese stock market? Furthermore, there could be geographical and cultural sampling errors in relating the DJIA to information collected from Twitter.com: US stock markets are affected by individuals worldwide, but Twitter users, for the particular period under observation, were predominantly English speaking and located in the US. Moreover, because microblogging is an open platform with user-generated content and little information regulation, some posts strongly recommend that readers make certain kinds of investment decisions. Do microblogging posts with a clear persuasive purpose successfully interfere with readers' investment decisions? To address these concerns, it is worthwhile to examine the influence of persuasive microblogging posts on the stock market.

In this paper, we study the effect of sentiment and persuasion in microblogging on the Chinese financial market. We collected posts from the Sina Microblogging platform, whose users are mostly Chinese speakers residing in China, and stock prices from the HS300 index. A Regression-Based Microblogging Influence Detection Scheme (RMIDS) is proposed as the framework to detect the influence of sentiments and persuasion in microblogging posts on the stock market.

II. THEORETICAL FOUNDATION

We used several regression algorithms to test our research hypotheses, including Linear Regression [21], Sequential Minimal Optimization for Regression (SMOReg) [22], Gaussian Processes for Regression (GaussianProcesses) [23] and Support Vector Regression (SVR) [24, 25]. The following subsections give a brief description of these techniques.

A. Linear Regression

Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}$, $i = 1, \ldots, n$, a linear regression model assumes that the relationship between the dependent variable $y_i$ and the regressors $x_{i1}, \ldots, x_{ip}$ is linear. The model is defined as follows:

$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = X_i^T \beta + \varepsilon_i, \quad i = 1, \ldots, n \qquad (1)$$

where $X_i^T \beta$ is the inner product between the vectors $X_i$ and $\beta$, and $\varepsilon_i$ is an unobserved random variable that adds noise to the linear relationship between $y_i$ and $x_i$. In the algorithm, we used the Akaike criterion for model selection.
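As an illustration of equation (1), the following is a minimal sketch of an ordinary least squares fit with NumPy. It is not the implementation used in the paper: the function names are hypothetical, the explicit intercept column is an assumption (equation (1) does not show one), and the Akaike-based model selection mentioned above is not reproduced.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least squares estimate for y_i = b0 + X_i^T beta + eps_i.

    The intercept b0 is an assumption added for illustration; it is
    returned as the first element of the coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients to new rows of regressors."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers an intercept close to 1 and a slope close to 2.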

B. Sequential Minimal Optimization for Regression

Sequential minimal optimization (SMO) is an algorithm that solves the quadratic programming (QP) problem arising during the training of support vector machines. SMOReg implements the support vector machine for regression. SMO selects a Lagrange multiplier α1 that does not conform to the Karush–Kuhn–Tucker (KKT) conditions. Then another multiplier α2 is picked and the pair (α1, α2) is optimized; this is repeated until all the Lagrange multipliers satisfy the KKT conditions. Let Ei denote the output error on the ith pattern and β the threshold parameter. The SMO algorithm employs a two-loop approach: the outer loop chooses α2, and for a chosen α2 the inner loop chooses α1. The outer loop iterates over all patterns violating the optimality conditions, first only over those with Lagrange multipliers neither on the upper nor on the lower boundary and, once all of them are satisfied, over all patterns violating the optimality conditions, to ensure that the problem has indeed been solved. For efficient implementation, a cache of Ei is maintained and updated for those indices i corresponding to non-boundary Lagrange multipliers. The remaining Ei are computed as and when needed. The training process then finishes.

C. Gaussian Processes for Regression

A Gaussian process is a stochastic process. A stochastic process is a collection of random variables $\{Y(x) \mid x \in X\}$, where $X$ is the input space with dimension $d$ (the number of inputs). The stochastic process is specified by giving the probability distribution for every finite subset of variables $Y(x^{(1)}), \ldots, Y(x^{(k)})$ in a consistent manner. A Gaussian process can be specified by its mean function $\mu(x) = E[Y(x)]$ and its covariance function

$$C(x, x') = E[(Y(x) - \mu(x))(Y(x') - \mu(x'))] \qquad (2)$$

In a Gaussian process, every finite collection of the variables has a joint Gaussian distribution, so any finite linear combination of them is normally distributed. In this paper, we apply Gaussian Processes for regression.
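To make the role of the covariance function concrete, here is a small sketch of Gaussian process regression with NumPy. It rests on assumptions the text does not state: a zero prior mean, a radial basis function covariance with a hypothetical `length_scale`, and a Gaussian noise variance `noise`. It computes only the standard posterior mean and is not the GaussianProcesses implementation used in the experiments.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Covariance C(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2))."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2, length_scale=1.0):
    """Posterior mean K_* (K + noise * I)^{-1} y under a zero prior mean."""
    y_train = np.asarray(y_train, dtype=float)
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(y_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)
```

Inputs are expected as two-dimensional arrays with one row per instance. With a very small noise term the posterior mean passes almost exactly through the training targets, which is a quick sanity check for the sketch.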

D. Support Vector Regression

The basic idea of SVR is to find a function that approximates the training points by minimizing the prediction error. In SVR, all deviations up to a user-specified parameter ε are ignored. For the linear case, the support vector regression function is defined as:

$$x = b + \sum_{i} \alpha_i \, a(i) \cdot a \qquad (3)$$

where $a(i)$ refers to the $i$th support vector, $a$ is the input instance, and the coefficients $\alpha_i$ and the offset $b$ are learned from the training data. Furthermore, a kernel function, such as the radial basis function (RBF) kernel, is used for nonlinear problems.
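The two ideas in this subsection can be sketched in a few lines of Python. The function names, and the reading of equation (3) as a weighted sum of inner products with the support vectors, are assumptions made for illustration; training (finding the coefficients and the offset) is not shown.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-instance loss that ignores deviations of at most epsilon."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

def svr_predict(a, support_vectors, alphas, b):
    """Linear SVR output b + sum_i alpha_i * (a(i) . a), as in equation (3)."""
    support_vectors = np.asarray(support_vectors, dtype=float)
    a = np.asarray(a, dtype=float)
    inner_products = support_vectors @ a  # one inner product per support vector
    return b + float(np.dot(alphas, inner_products))
```

A nonlinear variant would replace the inner product `support_vectors @ a` with kernel evaluations, for example an RBF kernel.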


Figure 1. Diagram of Regression-Based Microblogging Influence Detection Scheme (RMIDS). [Flowchart; box labels: Dictionary (Sentiment Words Dictionary, Persuasion Words Dictionary); Dataset (Sina Microblog Text Dataset, DOM-Based Spider, Wind HS300 Dataset); Data Preprocessing (Text Splitting, Sentiment Analysis, Persuasion Analysis, Sentiment Indices, Persuasion Indices, HS 300 Time Series); Model Training (Prediction Model).]

III. REGRESSION-BASED MICROBLOGGING INFLUENCE DETECTION SCHEME

In this section, the Regression-Based Microblogging Influence Detection Scheme (RMIDS) is proposed as a framework to detect the influence of microblogging posts on the stock market in China. Six models are constructed to evaluate how much sentiment analysis and persuasion analysis contribute to stock price prediction. In the first phase, a persuasion words dictionary is constructed through expert consultation, and a sentiment words dictionary is retrieved from related research [26]. The whole evaluation is based on these two dictionaries. In the second phase, microblogging data related to the HS300 index are collected from Sina Microblogging by a spider with a DOM (Document Object Model) based algorithm [27]. In addition, HS300 time series data are downloaded from the Financial Terminal of Wind Information. In the third phase, potentially influential attributes, which include sentiment-related words and persuasion-related words, are extracted from the collected data and used to construct six different models for comparison. In the last phase, the models are compared and analyzed to evaluate the predictive power of the sentiment and persuasion attributes. The framework of RMIDS is illustrated in Fig. 1.

Specifically, the work can be divided into the following four steps:

Step 1: Through expert consultation, a persuasion words dictionary is constructed from strongly recommendatory investment words. All the words in this dictionary are ones that urge the public to buy or sell stocks, such as “will go up”.

Step 2: After the HS300 index data are collected, related microblogging data are also collected from the Internet. Specifically, we retrieved more than 170,000 microblogging posts related to HS300 from 2011 to 2013 from the Sina Microblogging platform with a DOM-based spider written in Java.

Step 3: The ICTCLAS text segmentation system is then used to split all the collected microblogging posts into words. By matching the split text against the sentiment word dictionary and the persuasion word dictionary, we calculate the number of positive sentiment words, the number of negative sentiment words and the number of strong persuasion words in each post. By grouping the daily posts according to whether they contain obvious persuasion, and summing the positive sentiment and the negative sentiment for each group, we obtain six attributes: positive sentiment of the persuasion group (PoP), negative sentiment of the persuasion group (NoP), positive sentiment of the non-persuasion group (PoNP), negative sentiment of the non-persuasion group (NoNP), positive sentiment sum (PS) and negative sentiment sum (NS).
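The attribute calculation in Step 3 can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name is hypothetical, posts are assumed to be already segmented into word lists, a post is assumed to belong to the persuasion group when it contains at least one persuasion word, and PS and NS are assumed to be the totals over both groups.

```python
def daily_attributes(tokenized_posts, positive_words, negative_words, persuasion_words):
    """Aggregate one day's posts into PoP, NoP, PoNP, NoNP, PS and NS.

    tokenized_posts: list of posts, each a list of word tokens.
    The three word collections stand in for the two dictionaries.
    """
    totals = {"PoP": 0, "NoP": 0, "PoNP": 0, "NoNP": 0}
    for tokens in tokenized_posts:
        positive = sum(1 for t in tokens if t in positive_words)
        negative = sum(1 for t in tokens if t in negative_words)
        # Assumption: one persuasion word is enough to mark the post.
        persuasive = any(t in persuasion_words for t in tokens)
        if persuasive:
            totals["PoP"] += positive
            totals["NoP"] += negative
        else:
            totals["PoNP"] += positive
            totals["NoNP"] += negative
    totals["PS"] = totals["PoP"] + totals["PoNP"]
    totals["NS"] = totals["NoP"] + totals["NoNP"]
    return totals
```

The toy English tokens used to exercise this sketch are placeholders; the study itself works with Chinese text segmented by ICTCLAS.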

Step 4: Using the aforementioned six attributes and the HS300 time series, we constructed six different models to explore the stock price prediction power of including sentiment and persuasion words as attributes. The six models (where yt refers to the closing price of day t and yt-1 to the closing price of the day before t) are as follows.

Model A - The model predicts stock price yt with PoP, NoP, PoNP, NoNP and stock price yt-1 as attributes.

Model B - The model predicts stock price yt with PoP, NoP and stock price yt-1 as attributes.

Model C - The model predicts stock price yt with PoNP, NoNP and stock price yt-1 as attributes.

Model D - The model predicts stock price yt with PS, NS and stock price yt-1 as attributes.

Model E - The model predicts stock price yt with stock price yt-1 as the only attribute.

Model F - The model predicts stock price yt with stock price yt-1 and stock price yt-2 as attributes.

Finally, by comparing and analyzing the prediction performance of each model, we can evaluate how strongly the sentiment and persuasion expressed in microblogging influence investors in the stock market.
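The six models differ only in their attribute sets, which the following sketch makes explicit. The names `MODEL_FEATURES` and `build_dataset` are hypothetical, and two choices are assumptions not stated in the text: the microblogging attributes of day t are paired with the target yt, and every model starts at the third day so that all six are evaluated on the same instances.

```python
import numpy as np

# Attribute sets of Models A to F; y_lag1 and y_lag2 stand for yt-1 and yt-2.
MODEL_FEATURES = {
    "A": ["PoP", "NoP", "PoNP", "NoNP", "y_lag1"],
    "B": ["PoP", "NoP", "y_lag1"],
    "C": ["PoNP", "NoNP", "y_lag1"],
    "D": ["PS", "NS", "y_lag1"],
    "E": ["y_lag1"],
    "F": ["y_lag1", "y_lag2"],
}

def build_dataset(model, attributes, close):
    """Return (X, y) for one model.

    attributes: dict mapping attribute name to a per-day sequence.
    close: per-day closing prices, aligned with the attribute sequences.
    """
    close = np.asarray(close, dtype=float)
    start = 2  # skip two days so that yt-2 exists for every row
    columns = []
    for name in MODEL_FEATURES[model]:
        if name == "y_lag1":
            columns.append(close[start - 1:-1])
        elif name == "y_lag2":
            columns.append(close[start - 2:-2])
        else:
            columns.append(np.asarray(attributes[name], dtype=float)[start:])
    return np.column_stack(columns), close[start:]
```

Each (X, y) pair can then be passed to any of the regression algorithms of Section II.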

IV. RESULTS

A. Data Description and Evaluation Criteria

To evaluate the stock price prediction power of sentiment analysis and persuasion analysis, we conducted a series of experiments. To ensure the robustness and effectiveness of the experiments, we used a large microblogging dataset of more than 170,000 posts retrieved from the Sina Microblogging platform between 1 Jan. 2011 and 31 Dec. 2013, as well as a stock dataset covering the same period.

From the microblogging data, we calculated the six attributes: PoP, NoP, PoNP, NoNP, PS and NS. The six models (Models A-F) were then constructed from these attributes and the HS300 time series. The six models fall into three groups according to their attributes: Models A, B and C test the stock price prediction power of persuasion analysis, Model D provides a contrast using sentiment analysis alone, and Models E and F involve only stock price attributes.

Furthermore, four criteria are used to assess the prediction performance of the models. The mean absolute error (MAE) measures how close predictions are to the actual values. The root mean squared error (RMSE) measures the difference between the predictions and the actual values. They are defined as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |f_i - y_i| \qquad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (f_i - y_i)^2} \qquad (5)$$

The relative absolute error (RAE) is the sum of the absolute errors between the predictions and the actual values, divided by the sum of the absolute differences between the actual values and their mean. It is defined as

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n} |f_i - y_i|}{\sum_{j=1}^{n} |y_j - \bar{y}|} \qquad (6)$$

The root relative squared error (RRSE) is the square root of the relative squared error (RSE), which takes the total squared error and normalizes it by dividing by the total squared deviation of the actual values from their mean. This criterion is defined as follows:

$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} (f_i - y_i)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}} \qquad (7)$$

In (4), (5), (6) and (7), $n$ refers to the number of instances, $f_i$ to the prediction, $y_i$ to the true value, and $\bar{y}$ to the mean of the true values.
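The four criteria can be computed together, as in the following minimal sketch. The function name `regression_errors` is hypothetical; RAE and RRSE are returned as fractions, so they would be multiplied by 100 to obtain percentages.

```python
import numpy as np

def regression_errors(f, y):
    """Compute MAE, RMSE, RAE and RRSE following equations (4) to (7)."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    abs_err = np.abs(f - y)
    sq_err = (f - y) ** 2
    y_mean = y.mean()
    return {
        "MAE": abs_err.mean(),
        "RMSE": np.sqrt(sq_err.mean()),
        "RAE": abs_err.sum() / np.abs(y - y_mean).sum(),
        "RRSE": np.sqrt(sq_err.sum() / ((y - y_mean) ** 2).sum()),
    }
```

As a worked example, predictions (2, 4, 6) against true values (1, 4, 7) give MAE = 2/3, RMSE = sqrt(2/3), and RAE = RRSE = 1/3, since the mean of the true values is 4.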

B. Experimental Results

Persuasion attributes and sentiment attributes are generated from the microblogging data through persuasion analysis and sentiment analysis. To compare the influence of persuasion and sentiment on stock price, we conducted six experiments with different attributes derived from the original microblogging data. The six aforementioned models were then tested with several regression algorithms, including Linear Regression, SMOReg, GaussianProcesses and SVR. The empirical results are listed in TABLE I, TABLE II, TABLE III and TABLE IV.

In Fig. 2, we compare the MAE of Linear Regression, SMOReg, GaussianProcesses and SVR with a linear kernel. The performance of GaussianProcesses across the six models is less stable than that of the other three methods. Moreover, the MAE values of Linear Regression, SMOReg and SVR with a linear kernel are at the same level (between 27.93 and 28.21 in TABLE I), which is better than the MAE of GaussianProcesses (between 30.58 and 32.89).

Figure 2. The MAE of four regression algorithms on six models

TABLE I. THE MAE FROM EXPERIMENTAL RESULTS

Methods              Model A   Model B   Model C   Model D   Model E   Model F
Linear Regression      28.14     28.06     28.13     27.99     28.14     28.13
SMOReg                 28.06     28.10     28.11     27.93     28.11     28.13
GaussianProcesses      32.04     31.44     31.14     31.31     32.89     30.58
SVR(Linear)            28.21     28.11     28.20     28.16     28.09     28.19

TABLE II. THE RMSE FROM EXPERIMENTAL RESULTS

Methods              Model A   Model B   Model C   Model D   Model E   Model F
Linear Regression      38.34     38.32     38.55     38.15     38.66     38.62
SMOReg                 38.33     38.41     38.57     38.23     38.70     38.67
GaussianProcesses      44.08     43.51     41.97     42.74     43.45     41.10
SVR(Linear)            38.52     38.48     38.63     38.57     38.62     38.74

TABLE III. THE RAE FROM EXPERIMENTAL RESULTS

Methods              Model A   Model B   Model C   Model D   Model E   Model F
Linear Regression      9.30%     9.27%     9.30%     9.25%     9.32%     9.30%
SMOReg                 9.27%     9.29%     9.29%     9.23%     9.31%     9.30%
GaussianProcesses     10.59%    10.39%    10.29%    10.35%    10.89%    10.11%
SVR(Linear)            9.33%     9.29%     9.32%     9.31%     9.30%     9.32%

The RMSE of Linear Regression, SMOReg, GaussianProcesses and SVR with a linear kernel is shown in Fig. 3. The RMSE values of Linear Regression, SMOReg and SVR with a linear kernel are at the same level, which is better than the RMSE of GaussianProcesses.

In Fig. 4, we use RAE to compare the performance of Linear Regression, SMOReg, GaussianProcesses and SVR with a linear kernel. The RAE values of Linear Regression, SMOReg and SVR with a linear kernel on the six models are similar to one another, while GaussianProcesses performs worse.

Fig. 5 compares the RRSE of Linear Regression, SMOReg, GaussianProcesses and SVR with a linear kernel. The RRSE values of Linear Regression, SMOReg and SVR with a linear kernel are at the same level, which is better than the RRSE of GaussianProcesses.

Figure 3. The RMSE of four regression algorithms on six models

To conclude, Linear Regression, SMOReg and SVR with a linear kernel perform at the same level and outperform GaussianProcesses on all four criteria. Based on this conclusion, we examined the influence of sentiment analysis and persuasion analysis of Sina Microblogging using these three regression algorithms.

Figure 4. The RAE of four regression algorithms on six models

Figure 5. The RRSE of four regression algorithms on six models

Fig. 6, Fig. 7, Fig. 8 and Fig. 9 show that sentiment analysis (Model D) reduces the errors of stock price prediction fairly consistently, while strong persuasion (Models A, B and C) may introduce more noise.


TABLE IV. THE RRSE FROM EXPERIMENTAL RESULTS

Methods              Model A   Model B   Model C   Model D   Model E   Model F
Linear Regression     10.76%    10.75%    10.81%    10.70%    10.87%    10.83%
SMOReg                10.75%    10.77%    10.82%    10.73%    10.88%    10.85%
GaussianProcesses     12.37%    12.20%    11.77%    11.99%    12.22%    11.53%
SVR(Linear)           10.81%    10.80%    10.84%    10.82%    10.86%    10.87%

Figure 6. The influence of persuasion and sentiment measured by MAE

Figure 7. The influence of persuasion and sentiment measured by RMSE

Figure 8. The influence of persuasion and sentiment measured by RAE

Figure 9. The influence of persuasion and sentiment measured by RRSE


This could be good news: those who deliberately published strongly recommending posts were unable to interfere with the movement of the stock market as they intended.

V. DISCUSSION

In this paper, we investigated whether public mood, as measured from a large-scale collection of microblogging posts on weibo.com, influences stock values in the Chinese market. Our results show that changes in the public mood state can indeed be tracked from the content of large-scale microblogging posts by means of text processing techniques. The sentiments in microblogging do influence stock prices in the stock market, as the prediction errors could be stably reduced by including sentiment analysis of the microblogging posts collected in the corresponding period. In contrast, the persuasion analysis introduced noise into the prediction, which suggests that posts trying to interfere with the stock market by taking advantage of the lack of regulation in microblogging do not succeed.

Future work could examine how the mood in microblogging influences the stock market and the causative mechanisms that may connect online public mood states with stock values.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (Grant No. 71362024, 71362016), Information Technology Education Project in the Twelfth Five-Year Guideline of China (No. 126240685), Science Foundation of Yunnan Education Office (No. 2012J086), and Science Foundation of Ministry of Education of China (No. 13YJC630210).

REFERENCES

[1] http://it.21cn.com/itnews/a/2014/0315/10/26702861.shtml, 2014.

[2] J. Sang, C. Xu. "Faceted Subtopic Retrieval: Exploiting the Topic Hierarchy via a Multi-modal Framework." Journal of Multimedia, vol. 7, no. 1, 2012.

[3] A. Garcia-Crespo, R. Colomo-Palacios, J. Miguel Gomez-Berbis, B. Ruiz-Mezcua. "SEMO: a framework for cus-tomer social networks analysis based on semantics." Jour-nal of Information Technology, vol. 25, no. 2, pp. 178-188, 2010.

[4] C. Research. Social media's impact on personal finance & investing, Available from: http://www.cogentresearch.com, 2008.

[5] C. Langlois. Social Media Influences 70% Of Investors for Personal Finance and Investing [REPORT], Available from:http://www.visiblebanking.com, 2013.

[6] J. S. Kim, D. Ryu, S. W. Seo. "Investor sentiment and return predictability of disagreement." Journal of Banking & Finance, vol. 42, pp. 166-178, 2014.

[7] G. Rubbaniy, R. Asmerom, S. K. A. Rizvi, B. Naqvi. "Do fear indices help predict stock returns?." Quantitative Fi-nance, vol. 14, no. 5, pp. 831-847, 2014.

[8] Z. Liao, J. Wang. "Forecasting model of global stock index by stochastic time effective neural network." Expert Sys-tems with Applications, vol. 37, no. 1, pp. 834-841, 2010.

[9] Y. F. Wang. "Predicting stock price using fuzzy grey pre-diction system." Expert Systems with Applications, vol. 22, no. 1, pp. 33-38, 2002.

[10] A. P. Refenes, W. T. Holt. "Forecasting volatility with neural regression: A contribution to model adequacy." Neural Networks, IEEE Transactions on, vol. 12, no. 4, pp. 850-864, 2001.

[11] C. M. Hsu. "A hybrid procedure for stock price prediction by integrating self-organizing map and genetic program-ming." Expert Systems with Applications, vol. 38, no. 11, pp. 14026-14036, 2011.

[12] A. Kanas, A. Yannopoulos. "Comparing linear and nonlin-ear forecasts for stock returns." International Review of Economics & Finance, vol. 10, no. 4, pp. 383-398, 2001.

[13] D. Olson, C. Mossman. "Neural network forecasts of Canadian stock returns using accounting ratios." International Journal of Forecasting, vol. 19, no. 3, pp. 453-465, 2003.

[14] A. Rodríguez-González, Á. García-Crespo, R. Colomo-Palacios, F. Guldrís Iglesias, J. M. Gómez-Berbís. "CAST: Using neural networks to improve trading systems based on technical analysis by means of the RSI financial indicator." Expert Systems with Applications, vol. 38, no. 9, pp. 11489-11500, 2011.

[15] W. Xu, T. Li, B. Jiang, C. Cheng. "Web Mining For Financial Market Prediction Based On Online Sentiments," in PACIS 2012, pp. 43, 2012.

[16] R. P. Schumaker, H. Chen. "A quantitative stock prediction system based on financial news." Information Processing & Management, vol. 45, no. 5, pp. 571-583, 2009.

[17] X. Zhang, H. Fuehres, P. A. Gloor. "Predicting Stock Market Indicators through Twitter “I hope it is not as bad as I fear”." Procedia - Social and Behavioral Sciences, vol. 26, no. 0, pp. 55-62, 2011.

[18] S. R. Das, M. Y. Chen. "Yahoo! for Amazon: Sentiment extraction from small talk on the web." Management Science, vol. 53, no. 9, pp. 1375-1388, 2007.

[19] J. Bollen, H. Mao, X. Zeng. "Twitter mood predicts the stock market." Journal of Computational Science, vol. 2, no. 1, pp. 1-8, 2011.

[20] J. Bollen, H. Mao. "Twitter Mood as a Stock Market Predictor." Computer, vol. 44, no. 10, pp. 90-93, 2011.

[21] G. A. Seber, A. J. Lee. Linear regression analysis. John Wiley & Sons, 2012.

[22] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya, K. R. K. Murthy. "Improvements to the SMO algorithm for SVM regression." Neural Networks, IEEE Transactions on, vol. 11, no. 5, pp. 1188-1193, 2000.

[23] D. J. MacKay. Introduction to Gaussian processes. Cambridge University, 1998.

[24] A. J. Smola, B. Schölkopf. "A tutorial on support vector regression." Statistics and computing, vol. 14, no. 3, pp. 199-222, 2004.

[25] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, V. Vapnik. "Support vector regression machines." Advances in neural information processing systems, vol. 9, pp. 155-161, 1997.

[26] J. Li, M. Sun. "Experimental study on sentiment classification of Chinese review using machine learning techniques," in Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on. IEEE, 2007, pp. 393-400.

[27] N. Zhu, J. Sun, J. Lou, H. Wang. "Study on the Framework Incorporating Feedback into Experiment Ancillary Information Extraction and Integration Programs." Manufacturing Automation, vol. 33, no. 006, pp. 85-87, 2011 (in Chinese).

Nanli Zhu is pursuing the Ph.D. degree at Kunming University of Science and Technology. She received the Master's degree in Computer Application Technology from Kunming University of Science and Technology in 2006. From 2006 to 2011, she was a lecturer at Ningbo University of Technology. She was a Senior Experimentalist at Ningbo University of Technology in 2011. Her research interests include information systems, web mining, behavioral finance and decision support systems.

Cheng Cheng received the Master's degree from Renmin University of China. His interests include data mining, business intelligence and decision support systems.

Yibo Wang is pursuing the Master's degree at Renmin University of China. His interests include business intelligence and decision support systems.

Wei Xu is an associate professor at the School of Information, Renmin University of China. He is a research fellow at the Department of Information Systems, City University of Hong Kong. He received his Ph.D. degree in Management Science from the Chinese Academy of Sciences. He has published over 50 papers in international journals and conferences, such as Decision Support Systems, European Journal of Operational Research, IEEE Trans. Systems, Man and Cybernetics, and Fuzzy Sets and Systems. His interests include web mining, business intelligence and decision support systems.

Yongping Zhang received the Ph.D. degree from Xi'an Jiao Tong University. From 2001 to 2003 he was a postdoctoral fellow at Massey University. From 2003 to 2005 he was employed as a research fellow at The University of Auckland. From 2006 to 2007 he was a researcher at National University of Singapore. He has been a full professor in computer science at Ningbo University of Technology since 2007. His interests include intelligent information processing, computer vision, image processing, and data mining.

Ping Zou has been a professor at Kunming University of Science and Technology since 1996. From 1993 to 1994 he was a senior visiting scholar at University of Karlsruhe. He is the vice dean of Yunnan Education Office. His interests include information systems, operations research and management decision-making.

Malik Shahzad Kaleem Awan is a researcher at the School of Computer Science & Informatics, Cardiff University, UK. He received a master's degree in software engineering from the University of Limerick and a master's degree in computer science from Lahore University of Management Sciences (LUMS). He received his Ph.D. in computer science from the University of Warwick, UK in 2013. In addition, he has more than 5 years of industrial experience in the area of intelligent systems. His research interests include cyber security, data mining, performance evaluation and modeling, distributed computing and applied artificial intelligence.

2136 JOURNAL OF NETWORKS, VOL. 9, NO. 8, AUGUST 2014

© 2014 ACADEMY PUBLISHER