Popularity Prediction of Online News Based on Radial Basis Function
Neural Networks with Factor Methodology
WU Wei1,2, DU Wencai*2,3, XU Hongzhou1, ZHOU Hui2, HUANG Mengxing2
1 Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya 572000, China
2 College of Information Science and Technology, Hainan University, Haikou 570228, China
3 Faculty of International Tourism and Management, City University of Macau, Macau 999078, China
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
Online news reflects the dramatically increasing trend
of social network use. Understanding what type of online
news is popular and easy to spread to the public is a
valuable focus for media influence analysis and social
marketing. By abstracting detailed characteristics of online
news, important influential factors are selected from
diverse variables according to the principle component
method and function approximation. In consideration of
the high-dimensionality of the popularity ranking model,
back-propagation neural networks (BPNN) was employed
to predict popularity using artificial neural networks. The
simulation results compare various forecasting methods
based on factors achieved in previous work. This provides
an effective prediction model according to real situations,
with an accuracy level of 95%.
Keywords: online news popularity; back-propagation
neural networks; factor analysis; model identification;
neural network prediction.
1 Introduction
Online news dominates web resources and continues to
consolidate its superior status. It is easy to produce and
distribute online news in user-friendly ways at low cost; as
a result, the internet is full of articles by numerous authors,
whose works are subject to comprehensive favorability
ranking by viewers, i.e., popularity rating. These ratings
are collected through comment columns and feedback
channels accessed via personal computers, tablets, and
mobile phones, including text messages, emoticons, and
sharing mechanisms [1]. The number of article hits is
not well-accepted as a measure of real situations because it
can be a misleading number affected by web crawlers or
search engines. A widely-accepted method is to focus on
the sharing times of the article, because this can represent
its general influence. This statistical idea is widely
accepted by mass media and advertising companies. For
website managers, blogs, Twitter, or news applications, it
is important to understand what type of online news is
popular and easy to disseminate to the public, particularly
in order to execute proper advertising activities and
distribute specific content in more effective ways [2].
Therefore, the primary aim of this research is to determine
what type of online news enjoys the greatest popularity
and shares in public (or specific groups) in order to capture
the characteristics of news and to build a connection model
between these factors and popularity ratings.
The remainder of this paper is organized as follows: in
Section 2, research progress relevant to this issue is
introduced, including several inspiring achievements and
models. Section 3 presents the proposed dataset
description and model structure. The simulation results
and discussion are presented in Section 4. Conclusions and
some possible future research directions are provided in
Section 5.
2 Related Works
The internet represents an industrialized concept,
growing under the impetus of an industrial chain.
Online news is quickly delivered between parties via this
capable carrier. Facebook users share more than
2,600,000 items per minute (i/min), video links on
Vine reach approximately 8,333 i/min, and links on
Twitter are shared at approximately 300,000 i/min.
Online news tracking with real-time coverage prevails
over traditional physical media. Understanding how the
public respond to specific issues (the popularity of a
specific article) is a burgeoning research branch.
Many encouraging media tracking and prediction
models have been achieved from a time-based analysis
perspective, and some researchers have posited that
popularity rating is time-sensitive: numbers of YouTube
video viewers fluctuate within 24 hours, and sharing times
constantly vary [3]. U.S. online newspapers exhibit
hysteresis relative to user activities, and structural
equations corresponding to this phenomenon can be
determined [4]. A survival model has been applied to the
lifetime analysis of online news by setting a threshold for
comparison computing over an observation period of less
than seven days [5].
Characteristics analysis is another exciting direction
for social news analysis, which focuses on the news itself
(represented by global sharing) and eliminates other
possible influential factors such as time, comments, and
emoticons. It is a reasonable approach because sharing is
continuously increasing, its impact on the public fluctuates
day-by-day, and it will exceed certain time limits (several
hours or a couple of days). Moreover, this research aspect
is diverse due to the multiplicity of media formats. Video
clips, music albums, and text news are spreading at various
speeds [6, 7]. In regard to online news prediction in
particular, an article comprises many characteristics,
leading some experts to forecast trends by dividing it into
smaller units such as subject, content sensitivity, and
semantic networks, so that the news contains more local
variables for prediction. This method is valuable because it can
capture detailed information in one piece of news.
Mathematical methods of online news popularity
prediction are evolving from simple function regressions
to intelligent identification. Variable prediction accuracy
is not easy to control due to system characteristics and
sampling datasets. For a stable media company, variable
prediction accuracy indicates gradual data trends, which
provide an effective basis for function approximation;
typical examples are shown in [8]. However, in most
circumstances, due to the uncertain system status of users
and operations, online media prediction models are
nonlinear; thus, there is a need to develop better
identification mechanisms, including exponential
functions, differential algorithms, and support vector
regressions[9]. However, a significant problem is that the
complexity of advanced prediction methods requires
advanced computing facilities, which may provide
unattainable solutions [10, 11].
Based on the discussion above, it is clear that online
news prediction is a complex compromise between
variable selection and computing modes. Achieving
acceptable prediction results requires elaborate efforts
including: a) data preparation and data cleaning, which
require experimental and data processing techniques; b)
selection of representable independent variables from
irrelevant ones; c) confirmation of suitable models or
algorithms; and d) adjustment of simulation algorithm
adaptability for different circumstances.
3 Popularity Prediction Modeling
In this section, datasets are introduced and the overall
structure of a popularity prediction model with neural
networks based on factor methodology is proposed.
3.1 Dataset Description
This dataset summarizes a heterogeneous set of
features of articles published by Mashable over a period of
two years. The dataset is an open-source donation to the
UC Irvine (UCI) Machine Learning Repository. The goal
is to predict the number of
shares in social networks (popularity). A rough description
is provided in Table 1, and some variables are illustrated in
Figs. 1 through 4.
Fig. 1. Variable descriptions (1st to 13th of 39,797 samples): (a) n_tokens_title#1 represents the number of words in the title; (b) n_tokens_content#2 represents the number of words in the content; (c) n_unique_tokens#3 represents the rate of unique words in the content; (d) n_non_stop_words#4
represents the rate of non-stop words in the content, and n_non_stop_unique_tokens#5 represents the rate of unique non-stop words in the content; (e)
num_hrefs#6 represents the number of links; (f) num_self_hrefs#7 represents the number of links to other articles published by Mashable; (g) num_imgs#8
represents the number of images; (h) num_videos#9 represents the number of videos; (i) average_token_length#10 represents the average length of the
words in the content; (j) num_keywords#11 represents the number of keywords in the metadata; (k) rate_positive_words#12 represents the rate of positive words among non-neutral tokens; (l) rate_negative_words#13 represents the rate of negative words among non-neutral tokens.
Fig. 2. Variable descriptions (14th to 32nd of 39,797 samples): (a) Data channel is lifestyle#14, entertainment#15, business#16, socmed#17, tech#18 and world#19. (b)
Article published date Monday#20, Tuesday#21, Wednesday#22, Thursday#23, Friday#24, Saturday#25, Sunday#26 and Weekend#27. The answer is negative/no
when the x value is less than 1, the answer is positive/yes when the x value is greater than 1. (c) The article closeness to LDA topics of LDA_00#28,
LDA_01#29, LDA_02#30, LDA_03#31, LDA_04#32.
Fig. 3. Variable descriptions (33rd to 48th of 39,797 samples): (a) kw_min_min#33 represents the worst keyword (minimum shares); (b) kw_max_min#34 represents the worst keyword (maximum shares); (c) kw_avg_min#35 represents the worst keyword (average shares); (d) kw_min_max#36 represents the best
keyword (minimum shares); (e) kw_max_max#37 represents the best keyword (maximum shares); (f) kw_avg_max#38 represents the best keyword (average shares); (g) kw_min_avg#39 represents the average keyword (minimum shares); (h) kw_max_avg#40 represents the average keyword (maximum shares); (i)
kw_avg_avg#41 represents the average keyword (average shares); (j) self_reference_min_shares#42 represents the minimum shares of referenced articles in
Mashable; (k) self_reference_max_shares#43 represents the maximum shares of referenced articles in Mashable; (l) self_reference_avg_sharess#44
represents the average shares of referenced articles in Mashable; (m) global_subjectivity#45 represents text subjectivity; (n) global_sentiment_polarity#46
represents text sentiment polarity; (o) global_rate_positive_words#47 represents the rate of positive words; (p) global_rate_negative_words#48 represents the rate of negative words in the content.
Fig. 4. Variable descriptions (49th to 58th of 39,797 samples): (a) avg_positive_polarity#49 represents the average polarity of positive words; (b) min_positive_polarity#50 represents the minimum polarity of positive words; (c) max_positive_polarity#51 represents the maximum polarity of positive
words; (d) avg_negative_polarity#52 represents the average polarity of negative words; (e) min_negative_polarity#53 represents the minimum polarity of negative words; (f) max_negative_polarity#54 represents the maximum polarity of negative words; (g) title_subjectivity#55 and abs_title_subjectivity#56
represent the title subjectivity and absolute subjectivity level of the article, respectively; (h) title_sentiment_polarity#57 and abs_title_sentiment_polarity#58
represent the title polarity and absolute polarity level of the article, respectively.
The dataset in Table 1 contains 61 attributes, with 58
predictive attributes and 1 target value; the dataset is an
open source file provided by the UCI Machine Learning
Repository. This file contains no missing or incorrect
values, so the data-cleaning burden is negligible. Although
the published-date attributes produce an excess of sparse
(0 or 1) values, each specific data point is non-removable.
As shown in Fig. 1, attributes such as n_tokens_title#1,
average_token_length#10, num_keywords#11,
rate_positive_words#12, and rate_negative_words#13
indicate that the maximum share is achieved when these
factors reach specific appropriate values, neither too large
nor too small. Some variables such as n_tokens_content#2,
num_hrefs#6, num_self_hrefs#7, num_imgs#8, and num_videos#9 can be
simulated as F-distributions. As shown in Fig. 2, the
published date and data channel are discrete attributes; this
type of data is acceptable for artificial model simulation.
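As a sketch of how such a table can be separated into predictive attributes and the target, the following Python fragment mimics the UCI column layout with a tiny synthetic table (the column names follow the public UCI description of this dataset; all values and the synthetic layout below are made up for illustration):

```python
import numpy as np

# Hypothetical sketch: the UCI "Online News Popularity" file has 61 attributes;
# url and timedelta are non-predictive, 58 columns are predictive, and the
# target value is "shares". We mimic that split with a tiny synthetic table.
rng = np.random.default_rng(0)
n = 5
table = {
    "url": [f"article-{i}" for i in range(n)],  # non-predictive identifier
    "timedelta": rng.integers(8, 731, n),       # non-predictive
    "n_tokens_title": rng.integers(5, 20, n),   # predictive attribute
    "num_imgs": rng.integers(0, 10, n),         # predictive attribute
    "shares": rng.integers(100, 10000, n),      # target value (popularity)
}

non_predictive = {"url", "timedelta", "shares"}
# Parameter matrix P: one row per sample, one column per predictive attribute.
P = np.column_stack(
    [v for k, v in table.items() if k not in non_predictive]
).astype(float)
# Target matrix T: the shares column.
T = np.asarray(table["shares"], dtype=float).reshape(-1, 1)
print(P.shape, T.shape)
```

In practice the same drop/select step would be applied to the full 39,797-row file downloaded from the UCI repository.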
3.2 Problem Formulation & System Structure
As demonstrated in the previous section, let the 1st to
58th variables form a parameter matrix P(39,797×58), and let
shares form the target matrix T(39,797×1). The system
function is then determined as follows:
T = F(P)    (1)
System F is the given form of true transmission from P to
T (data pairs). The primary research goal is to build an
approximate and robust Fζ to replace the unknown real
system with an acceptable error ε; this error can be
calculated according to M with Euclidean distance,
mean-squared error, and many effective equations, and all
processes are expected to finish in limited time t0.
‖F(P) − Fζ(P)‖M ≤ ε    (2)
Let a temporal data sequence Sζ be produced by system
Yζ, in which Yζ is measurable and bounded, Yζ ∈ Ω, Ω is a
compact set, and the system status is represented by
regression deterministic tracking [12].
(3)
With regard to the aforementioned recorded time series,
news sharing T is a bounded series, and all variables vary
within limited bounds, i.e., let Tζ → T, T = [t(1),t(2),...,t(n)]
∈ Rm×n, where Tζ=[tζ(1), tζ(2),..., tζ(m)], i = 1,2,...,m.
Model prediction based on offline training provides a
strong basis for online forecasting [13]. Online
prediction models are often derived from offline models,
and rely heavily on training and preliminary results. The
research goal is to predict the popularity of news with a
given dataset in an offline model design, which provides a
useful structure for further advancements. The collection
of 39,797 samples with 61 attributes can be divided into
two parts: let the majority serve as training items and the
remaining samples be used as testing instances.
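The training/testing division described above can be sketched as follows (the 80/20 proportion matches the simulations in Section 4; the shuffle flag corresponds to the random-selection cases there, and the function name is our own):

```python
import numpy as np

def split_train_test(P, T, train_frac=0.8, shuffle=True, seed=0):
    """Split sample rows into a training majority and a testing remainder."""
    n = P.shape[0]
    idx = np.arange(n)
    if shuffle:  # random selection of training samples
        np.random.default_rng(seed).shuffle(idx)
    cut = int(n * train_frac)
    tr, te = idx[:cut], idx[cut:]
    return P[tr], T[tr], P[te], T[te]

# Toy demonstration with the paper's proportions (58 attributes, 80/20 split):
rng = np.random.default_rng(1)
P = rng.normal(size=(100, 58))
T = rng.normal(size=(100, 1))
Ptr, Ttr, Pte, Tte = split_train_test(P, T)
print(Ptr.shape, Pte.shape)  # (80, 58) (20, 58)
```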
3.3 Principal Component Abstraction of the Dataset
The matrix P contains 58 variables related to online
news, but some may not be necessary for system
simulation. As shown in Figs. 1(c) and 1(d),
n_unique_tokens#3, n_non_stop_words#4, and
n_non_stop_unique_tokens#5 represent sharing with identical
trends, therefore indicating similar features in most
circumstances. This phenomenon is not rare in this matrix:
the kw_max_min#34 and kw_avg_min#35 in Figs. 3(b) and 3(c);
self_reference_min_shares#42, self_reference_max_shares#43, and
self_reference_avg_sharess#44 in Figs. 3(j), 3(k), and 3(l). For
modeling systems, these variables convey specific
influences on shares, but there is no need to include them
all at the expense of algorithm convergence speed.
Removal of repetitive variables from P(39,797×58) is a
preliminary procedure before further investigation and
modeling, which requires elimination of irrelevant and
similar parameters.
Principal component analysis (PCA) is a mathematical
method designed to reassemble independent variables x1,
x2, …, xp from a full matrix X, effective for dimensionality
reduction. For an observation dataset with p variables, X
can be arranged as follows:
X = (x1, x2, …, xp) = (xij)n×p    (4)
where xj = (x1j, x2j, …, xnj)T, j = 1, 2, …, p. The principal
component transform can be expressed as follows:
Fi = a1i x1 + a2i x2 + … + api xp,  i = 1, 2, …, p    (5)
In all linear combinations of xj, Fi is the first principal
component because it effectively conveys information
with maximum variance value; it requires the following:
Var(F1) = max{ Var(aTx) : aTa = 1 }    (6)
The operation details of the current research include the
following: let F0 fully represent the knowledge contained
in P(39,797×58), so that:
(7)
Step 1: Normalize the dataset by
x*ij = (xij − x̄j) / sj,  i = 1, 2, …, n; j = 1, 2, …, p    (8)
where
x̄j = (1/n) Σi xij,  sj² = (1/(n−1)) Σi (xij − x̄j)²    (9)
Step 2: Calculate the correlation matrix of the dataset
R = (rij)p×p,  rij = (1/(n−1)) Σk x*ki x*kj    (10)
Step 3: Calculate eigenvalues λ1, λ2, …, λp and
eigenvectors ai1, ai2, …, aip of R according to the Jacobi
method, and select principal components for further
investigation.
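Steps 1 through 3 can be sketched in Python with NumPy, using `numpy.linalg.eigh` in place of an explicit Jacobi iteration, since both diagonalize the symmetric correlation matrix R (the toy data below are random, not the Mashable attributes):

```python
import numpy as np

def pca_on_correlation(X):
    """Steps 1-3: standardize, form the correlation matrix R, eigen-decompose it."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Step 1: normalize columns
    n = X.shape[0]
    R = (Xs.T @ Xs) / (n - 1)                          # Step 2: correlation matrix
    lam, A = np.linalg.eigh(R)                         # Step 3: eigenvalues/vectors
    order = np.argsort(lam)[::-1]                      # sort by descending variance
    return lam[order], A[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))   # 200 samples, 6 variables (toy stand-in for 58)
lam, A = pca_on_correlation(X)
print(lam.sum())                # eigenvalues of a correlation matrix sum to p
```

A useful sanity check on this route is that the eigenvalues of a p-variable correlation matrix always sum to p, since each standardized variable contributes unit variance.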
3.4 Prediction Based on Neural Networks
A nonlinear system realization requires identification
models with more elaborate structures, in which artificial
intelligence is promising. It provides an effective
computing mode derived from biology and neural science,
and has recently been embraced by the field of computer
science. Achievements in engineering areas such as
nonlinear control, pattern recognition, optimization, signal
analysis and processing, aerospace, and intelligent
monitoring have provided inspiring results which impact
daily life, companies, and projects[14].
A mixed structure is necessary to describe the relationship
between influential factors due to the interconnection
among neurons; identification modeling based on dynamic
neural networks can be adopted as an appropriate
simulation; the basic form of neural networks is shown in
Fig. 5.
Fig. 5. Structure of artificial neural networks: (a) neuron connection and activation signal; (b) neural networks with three layers (with M nodes in
the input layer, K nodes in the hidden layer, and L nodes in the output layer).
As shown in Fig. 5, the perception layer can receive input
signals from real circumstances, and these real data are
transmitted to the hidden layer via nonlinear mapping.
The linear weighting procedure is executed in the output
layer, computing results from the hidden layer. The final
stage is the application level, which introduces the results
from the output layer as control/prediction attributes for
real applications.
Let input x=(x1,x2,…,xM)T, x∈ RM. Data from real
systems is accepted; this dataset is sent to the hidden layer
for nonlinear mapping, and the output is y=(y1,y2,…,yL)T , y
∈ RL. The activation functions adopted here include
sigmoid, Gaussian, piecewise linear, and threshold forms
[15]:
f(x) = 1 / (1 + e^(−ax))    (11)
where the logistic function is determined by the slope
coefficient a, and ci represents the set of center nodes.
Neural networks are considered to approach the
continuous function u(x) in the bounded compact set Ω if
the center nodes are reasonably distributed, and the
approximation error is arbitrarily small. The output of
node j can be expressed as follows:
uj(x) = exp(−‖x − cj‖² / (2σj²))    (12)
where σj is the normalization constant of node j, cj is the
center vector of node j, and c = (c1, c2, …, cM)T, c ∈ RM.
The linear mapping process uj(x) →yk is achieved from
the hidden layer to the output layer, i.e., the output of the
output layer node k is represented as follows:
yk = Σj wjk uj(x) + bk,  k = 1, 2, …, L    (13)
where wjk represents the adjusted weights from the hidden
layer to the output layer, and bk is the bias. Here, yk is a
response signal to the corresponding input; these data are
transferred to the workspace depending on the application
background.
Multilayer perceptrons are well suited to this nonlinear
parameter structure, among which back-propagation
neural networks (BPNNs) and radial basis function neural
networks (RBFNNs) are widely applied, with several
advantages including massively parallel distributed
architecture and self-learning adaptive ability [16].
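A minimal sketch of the forward pass described by Eqs. (12) and (13), assuming Gaussian hidden nodes and a linear output layer (all parameter values below are illustrative, not trained):

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W, b):
    """Forward pass of an RBF network: Gaussian hidden layer, linear output."""
    # Hidden layer, eq. (12): u_j(x) = exp(-||x - c_j||^2 / (2 sigma_j^2))
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)
    u = np.exp(-d2 / (2.0 * sigmas ** 2))
    # Output layer, eq. (13): y_k = sum_j w_jk u_j(x) + b_k
    return W.T @ u + b

rng = np.random.default_rng(0)
M, K, L = 4, 3, 1                   # input, hidden, and output layer sizes
centers = rng.normal(size=(K, M))   # center vectors c_j
sigmas = np.ones(K)                 # normalization constants sigma_j
W = rng.normal(size=(K, L))         # hidden-to-output weights w_jk
b = np.zeros(L)                     # output bias b_k
y = rbf_forward(rng.normal(size=M), centers, sigmas, W, b)
print(y.shape)  # (1,)
```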
4 Simulation and Discussion
The prediction model is shown in Fig. 6; it consists of
three portions: (1) Reduction of data scale to decrease
irrelevant and secondary parameters by factor analysis; (2)
Building of a prediction model based on neural networks
using factors achieved in the previous stage; (3)
Evaluation of system performance by dataset validation,
and improvement of the accuracy and adaptability of the
algorithm by adjusting the scale of factors and the network
structure.
4.1 Factor Abstraction of the Dataset
Factor abstraction is a compulsory step in system
modeling; the reasoning is provided in Section 3.3. For all
labeled variables of matrix P(39,797×58), the factors are
abstracted in Table 2.
The information contained in Fi decreases with the
decreasing variance of each principal component. As
shown in Table 2, variables can be sorted by their
contribution rate Ci, calculated as follows:
Ci = (λ1 + λ2 + … + λi) / (λ1 + λ2 + … + λp)    (14)
Fig. 6. Popularity prediction model of online news with three interconnected parts: factor abstraction, identification with artificial intelligence and model evaluation (note: the identification model in the third stage results from stage 2).
The number of principal components is determined by
Ci. In most cases, Ci must equal or exceed 80% to
ensure that the combined variables can represent the
majority of information from the original variables; the
accumulative total variance is summarized in Table 3.
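The 80% rule for choosing the number of principal components can be sketched as follows (the eigenvalue spectrum below is a toy example, not the values of Table 3):

```python
import numpy as np

def cumulative_contribution(eigvals):
    """Cumulative contribution rate C_i: sum of the i largest eigenvalues
    divided by the sum of all eigenvalues, per eq. (14)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return np.cumsum(lam) / lam.sum()

# Toy eigenvalue spectrum; keep components until C_i first reaches 80%.
C = cumulative_contribution([4.0, 2.0, 1.5, 1.0, 0.8, 0.7])
k = int(np.searchsorted(C, 0.80)) + 1   # smallest i with C_i >= 0.80
print(C.round(2), k)
```

Applied to the 58-variable correlation spectrum, the same threshold search is what yields the 22 retained components discussed below.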
According to the accumulated C22 = 81.923% in Table 3,
one possible option is to select the first 22 variables from
Table 2 as the PCA parameters Pd.
Pd = (p1, p2, …, p22)    (15)
where pj = (p1j, p2j, …, pnj)T, j = 1, 2, …, 22; n = 1, 2, …,
39,797. Specifically, the parameter Pd is achieved by the
22 variables described in Table 4. This result indicates that
the popularity level is affected by the published date, data
channel, LDA closeness, and other variables. This
connected relationship is vague and beyond subjective
judgments; meanwhile, a tree model is provided in Fig. 7
to visually indicate these results. Certain differences exist
between the tree model and the PCA method; however, the
variables selected are similar.
As shown in Fig. 7, the news popularity can be roughly
determined by variables. For example, if the news
publishing date is a weekend, then it may receive more
shares with an average of 4186.155 shares; however, most
articles (91.6% of instances) are not published at this
prime time, receiving an average of 2351.337 shares.
Based on node 3, it is suggested that if the news is
published on social media channels, the shares will be
twice as frequent (4310.918) as the other share types
(2211.531). Similarly, lifestyle channels are more popular
than other channels.
Fig. 7. Decision tree representing shares (by Chi-Square Automatic Interaction Detection increasing method).
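The kind of comparison each tree node in Fig. 7 summarizes, a group mean of shares conditioned on one binary attribute, can be sketched as follows (the synthetic proportions and means below are loosely modeled on the figures quoted above, not computed from the real dataset):

```python
import numpy as np

# Hypothetical sketch of one tree split: group shares by a binary attribute
# (here, weekend publication) and compare the group means, which is what
# each node of the CHAID tree reports.
rng = np.random.default_rng(0)
n = 2000
is_weekend = rng.random(n) < 0.084           # roughly 8.4% weekend articles
shares = np.where(is_weekend,
                  rng.normal(4200.0, 500.0, n),   # weekend group (toy values)
                  rng.normal(2350.0, 500.0, n))   # weekday group (toy values)
mean_weekend = shares[is_weekend].mean()
mean_weekday = shares[~is_weekend].mean()
print(mean_weekend > mean_weekday)
```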
4.2 Prediction Model Based on Neural Networks
4.2.1 Modeling Based on Neural Networks
Training samples with correct outputs (goals) are
indispensable for identification with neural networks.
Fortunately, matrices Pd and T are competent as
supervisors to guide the offline learning processing. This
adjusting process includes two primary phases [17, 18].
The first step is unsupervised learning, which
determines the center vectors cj (from the hidden layer)
and the normalization constant by clustering all samples,
in which the K-means algorithm is executed as follows:
Step 1: Initialize cj(0)=[c1j(0),c2j(0),…,cMj(0)]T,
(j=1,2,…,K), learning rate β (0), and the termination
condition of error calculation ε.
Step 2: Calculate the Euclidean distance to confirm
node r with minimum distance
r = arg minj ‖x(p) − cj(p−1)‖,  j = 1, 2, …, K    (16)
where p is the sample index, and r is the node whose
center cj(p−1) has the minimum distance to x(p).
Step 3: Center adjustment
cr(p) = cr(p−1) + β(p)[x(p) − cr(p−1)]    (17)
where β(p) is the learning rate, and int() indicates a
number-rounding operation.
Step 4: Evaluation of clustering quality: for all samples
p(1,2,…,N), execute steps 2 and 3 until
maxj ‖cj(p) − cj(p−1)‖ ≤ ε    (18)
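Steps 1 through 4 of the unsupervised stage can be sketched as a plain K-means routine (the batch-style center update and the deterministic initialization below are simplifications of the sequential rule in Eqs. (16) and (17); the two-cluster data are a toy example):

```python
import numpy as np

def kmeans_centers(X, K, iters=20):
    """Unsupervised stage: K-means clustering to place the RBF centers c_j."""
    # Simple deterministic initialization: K samples spread across the dataset.
    c = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(iters):
        # Step 2: assign every sample to its nearest center (Euclidean distance).
        r = np.argmin(((X[:, None, :] - c[None, :, :]) ** 2).sum(-1), axis=1)
        # Step 3: move each center to the mean of its assigned samples.
        for j in range(K):
            if np.any(r == j):
                c[j] = X[r == j].mean(axis=0)
    return c

# Toy data: two well-separated clusters; the centers should land near 0 and 5.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centers = kmeans_centers(X, K=2)
print(np.sort(centers[:, 0]).round(1))
```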
The second stage consists of supervised learning. The
goal is to train the weights wjk by the least mean square
method or delta rules.
Step 1: Initialize weights wjk(0), j = 1, 2, …, K; k = 1,
2, …, L.
Step 2: Define the input-output data pairs; the desired
output is yk* (k = 1, 2, …, L).
Step 3: The output from the hidden layer node j is
expressed as follows (current input is group p):
uj(x(p)) = exp(−‖x(p) − cj‖² / (2σj²))    (19)
The output of node k in the output layer is expressed as
follows:
yk(x(p)) = Σj wjk(p) uj(x(p))    (20)
Step 4: Delta rule (weight adjustment rule):
wjk(p+1) = wjk(p) + η[yk*(x(p)) − yk(x(p))]uj(x(p))    (21)
where u(x(p)) = [u1(x(p)), u2(x(p)), …, uK(x(p))]T, and
η is the learning rate.
Let the desired output be yk*(x(p)) (k = 1, 2, …, L;
p = 1, 2, …, N); then, the local error function and global
total error function are expressed as follows:
ek(p) = yk*(x(p)) − yk(x(p)),  J = (1/2)ΣpΣk ek(p)²    (22)
This loop terminates once J converges to within ε.
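The supervised stage can be sketched as follows, assuming the hidden-layer outputs u_j(x(p)) have already been computed, and treating the delta rule as a least-mean-squares update (the function name and the toy linear targets are our own; biases are omitted for brevity):

```python
import numpy as np

def train_output_weights(U, Y_star, eta=0.1, epochs=200, eps=1e-6):
    """Supervised stage: delta-rule (LMS) updates of the hidden-to-output weights."""
    K, L = U.shape[1], Y_star.shape[1]
    W = np.zeros((K, L))
    for _ in range(epochs):
        J = 0.0
        for u, y_star in zip(U, Y_star):
            y = W.T @ u                   # output-layer response for this sample
            e = y_star - y                # local error y_k* - y_k
            W += eta * np.outer(u, e)     # delta rule: dw_jk = eta * e_k * u_j
            J += 0.5 * float(e @ e)       # accumulate the global total error J
        if J < eps:                       # stop once J converges to eps
            break
    return W, J

# Toy supervision: targets come from a known linear map, which LMS recovers.
rng = np.random.default_rng(0)
U = rng.normal(size=(50, 3))                 # hidden-layer outputs u_j(x(p))
W_true = np.array([[1.0], [-2.0], [0.5]])    # "true" hidden-to-output weights
W, J = train_output_weights(U, U @ W_true)
print(np.allclose(W, W_true, atol=1e-2))
```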
4.2.2 Simulation Results and Discussion
Identifying the relationship between the decisional
variables of news and shares with a BPNN is well
established. In consideration of 22 independent variables,
this model does not provide linear function expressions
and thus, variables cannot be separated from one another
because some are interconnected (e.g., the number of
statistics).
Let the input matrix be Pd. It is sent to the hidden layer
for the mapping operation, with adjusted weights. Finally,
popularity (represented by shares) is derived from the
output layer. Prediction results are shown in Fig. 8.
Fig. 8. Prediction based on back-propagation neural networks (model parameters: ε = 10^-6, η = 0.001).
case 1: (a) share comparison between predictions and expectations, K=20, selected previous 80% samples trained, the remaining 20% tested; (b) errors; (c) relative error coefficient.
case 2: (d) share comparison between predictions and expectations, K=20, random selected 80% samples trained, the remaining 20% tested;
(e) errors; (f) relative error coefficient.
case 3: (g) share comparison between predictions and expectations, K=30, selected previous 80% samples trained, the remaining 20% tested;
(h) errors; (i) relative error coefficient.
case 4: (j) share comparison between predictions and expectations, K=30, random selected 80% samples trained, the remaining 20% tested;
(k) errors; (l) relative error coefficient.
The simulation results in Fig. 8 provide acceptable
share predictions under various conditions: the relative
error coefficient in (c) shows that only five points
exceed 10%. The overall model accuracy is 95%, with
detailed data depicted in Table 5.
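The relative error coefficient and a within-10% accuracy figure of this kind can be computed as follows (the five share values below are illustrative, not taken from Table 5):

```python
import numpy as np

def relative_error_coefficient(pred, actual):
    """Per-sample relative error |pred - actual| / |actual|, as plotted in Fig. 8."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(pred - actual) / np.abs(actual)

# Hypothetical check: fraction of test points whose relative error stays under 10%.
actual = np.array([1000.0, 2000.0, 1500.0, 3000.0, 2500.0])
pred = np.array([1050.0, 1900.0, 1600.0, 3400.0, 2450.0])
rel = relative_error_coefficient(pred, actual)
accuracy = float((rel <= 0.10).mean())
print(rel.round(3), accuracy)
```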
5 Conclusions
Online news popularity prediction with neural network
modeling is feasible and valuable. The PCA method can be
used as an effective tool for factor analysis and selection.
The supervised training of radial basis function neural
networks requires a proper dataset with known target
outputs (supervisors), so this nonlinear system is
well-suited to data forecasting and self-evolution. As shown by the
simulation results in the previous section, this
identification method (influential factor selection and
network simulation) can be popularized into other
simulation circumstances. Parallel computing and
distributed file systems can be adopted in data processing;
however, factor abstraction is a preliminary procedure,
which requires intervention by human guides.
Acknowledgements
This research was supported by the following grants:
National Science Foundation of China (Grants No.
61162010, No. 61440019, No. 61462022, and No.
71161007), the projects of the Ministry of Science and
Technology of China (Grant No. S2013HR0034L), the
100 Talents Project of the Chinese Academy of Sciences
(Grant No. SIDSSE-BR-201304), and the Hainan Science
Foundation (Grant No. 614228).
*Corresponding author.
References
[1] Tatar A, Antoniadis P, De Amorim M D, et al. From
popularity prediction to ranking online news. Social
Network Analysis and Mining, Vol. 4, No. 1, pp. 1-12,
April, 2014.
[2] Liu Q, Zhou M, Zhao X. Understanding News 2.0: A
framework for explaining the number of comments
from readers on online news. Information &
Management, Vol. 52, No. 7, pp. 764-776, April, 2015.
[3] Pinto H, Almeida J M, Gonçalves M A. Using early
view patterns to predict the popularity of youtube
video. The sixth ACM international conference on
Web search and data mining. Rome, Italy, 2013, pp.
365-374.
[4] Lee J G, Moon S, Salamatian K. An approach to model
and predict the popularity of online contents with
explanatory factors. IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent
Technology. Toronto, Canada, 2010, pp. 623-630.
[5] Lee A M, Lewis S C, Powers M. Audience Clicks and
News Placement A Study of Time-Lagged Influence in
Online Journalism. Communication Research, Vol. 41,
No. 4, pp. 505-530, November, 2014.
[6] Bhaskar A, Gyani J, Narsimha G. A novel approach to
predict the popularity of the video. IEEE Region 10
Symposium. Kuala Lumpur, Malaysia, 2014, pp.
578-583.
[7] Ren Y, Shen J, Wang J, et al. Mutual verifiable
provable data auditing in public cloud storage. Journal
of Internet Technology, Vol. 16, No. 2, pp. 317-323,
March, 2015.
[8] Nuutinen T, Ray C, Roos E. Do computer use, TV
viewing, and the presence of the media in the bedroom
predict school-aged children's sleep habits in a
longitudinal study. BMC Public Health, Vol. 13, No. 1,
pp. 684-685, March, 2013.
[9] Figueiredo F, Almeida J M, Gonçalves M A, et al. On
the dynamics of social media popularity: a YouTube
case study. ACM Transactions on Internet Technology
(TOIT), Vol. 14, No. 4, pp. 1-22, December, 2014.
[10] Shen C C. Maximum Likelihood DOA Estimation
Using Particle Swarm Optimization under Sensor
Perturbation Conditions. Journal of Internet
Technology, Vol. 16, No. 5, pp. 847-855, September,
2015.
[11] Du C, Zhou Z B, Ying S, et al. An efficient indexing
and query mechanism for ubiquitous IoT services.
International Journal of Ad Hoc and Ubiquitous
Computing, Vol. 18, No. 4, pp. 245-255, June, 2015.
[12]Wang C, Hill D J. Deterministic learning and rapid
dynamical pattern recognition. IEEE Transactions on
Neural Networks, Vol. 18, No.3, pp. 617-630. May,
2007.
[13]Mohanty S, Chattopadhyay A, Peralta P, et al.
Bayesian statistic based multivariate Gaussian process
approach for offline/online fatigue crack growth
prediction. Experimental mechanics, Vol. 51, No.6, pp.
833-843, July, 2011.
[14]Hornik K, Stinchcombe M, White H. Multilayer
feedforward networks are universal approximators.
Neural networks, Vol. 2, No.5, pp. 359-366, May,
1989.
[15]Seshagiri S, Khalil H K. Output feedback control of
nonlinear systems using RBF neural networks, IEEE
Transactions on Neural Networks, Vol. 11, No. 1, pp.
69-79, January, 2000.
[16] Shen B, Hu B W, Zhang H. Method for the analysis of
the preferences of network users. IET Networks, Vol.
5, No. 1, pp. 8-12, January, 2016.
[17] Karia D C, Lande B K, Daruwala R D. Performance
analysis of HMM–and ANN–based spectrum vacancy
predictor behavior for cognitive radios. International
Journal of Ad Hoc and Ubiquitous Computing, Vol. 11,
No. 4, pp. 206-213, May, 2012.
[18] Schmidhuber J. Deep learning in neural networks: An
overview. Neural Networks, Vol. 61, pp. 85-117,
January, 2015.
Biographies
Wei Wu received the B.Sc. from
Hubei University of Science and
Technology and the Master's degree
from the College of Information
Science and Technology, Hainan
University. He is now serving as
full-time faculty at the Institute of
Deep-sea Science and Engineering,
Chinese Academy of Sciences. His research interests cover
artificial intelligence and big data theory and application.
Wencai Du received the B.Sc. from
Peking University, China, two
Master's degrees from Twente
University (ITC), The Netherlands,
and Hohai University, China,
respectively, and the Ph.D. degree
from the University of South
Australia, Australia, followed by a
postdoctoral fellowship at the
Israel Institute of Technology,
Haifa, Israel. He is a Professor of ICT, working in
Hainan University and City University of Macau. His
expertise covers broad areas of information and
communication technologies, social networking and
e-service. His research interests are in the areas of
maritime communication, information management,
and marketing, the focus especially being on tourism
industry operating in the domains of social media
marketing, e-Commerce and e-Education.
Hongzhou Xu received the B.Sc. from
Ocean University of China, Master
Degree and Ph.D. degree from South
China Sea Institute of Oceanology,
Chinese Academy of Sciences. Now he
is serving as a full time faculty in the
Institute of Deep-sea Science and
Engineering, Chinese Academy of
Sciences. His research interest covers
massive data modeling and simulation, ocean circulation
observation and numerical simulation.
Hui Zhou received the B.S. degree in
computer science from University of
Science and Technology of China in
2002, the PhD degree in computer
software and technology from
Graduate University of Chinese
Academy of Sciences (GUCAS) in
2008. Hui Zhou worked at the IBM
Research & Development Center
(Beijing) from July 2008, and joined Hainan University as
a faculty member in May 2011. Hui Zhou's research
interests include computer network, digital tourism, and
cluster file system.
Mengxing Huang received the
PhD degree from the School of
Automation, Northwestern
Polytechnical University, and
completed a postdoctoral fellowship
in Computer Science and
Technology at Tsinghua University. Now he is
serving as a full time faculty in the
College of Information Science
and Technology from Hainan University. His research
interest covers data and knowledge engineering, big data
and cloud computing, Internet of Things.