my model genuines.
IoT, my small AI version. Scope of analytics, 2015: learning, thinking, practicing.
For any other thoughts, please feel free to contact: luxiaoteng0 (at) gmail (dot) com
Data has no shadow. However it is sculpted, it is what it is. Having landed the completion of model-100, TENG’s self-assignment of self-learning in modeling techniques and predictive analytics, TENG is able to decipher hybrid data solutions: a crossing of generic modeling techniques, such as the familiar hierarchical clustering and many others, with domain knowledge (i.e. banking, retail, mobile, etc.), further empowered by advanced data mining applied with machine learning. These have been built into a three-dimensional data-analysis matrix. Within these genres, TENG uniquely designs data analysis models, not only learning from traditional analysis modules but also consolidating multiple techniques to address realistic business needs. This rolls out a positive loop in which algorithms can be best fitted wherever they are needed to improve performance in well-recognized business situations. <data UNME> converges the usages of text mining, semantic analysis, and recommendation systems. Each of these analytics modules has evidently been extracted from high-profile sources. It consolidates movie/drama/program viewership patterns across multiple platforms x devices. Individuals can be classified by tag through the output scoring counts, a process applied consistently with supervised learning following Google’s tagged word-embedding methodology (a minimal sketch follows below). Brands can leverage the viewing score to plan optimum brand awareness in a customer-centric way. Furthermore, it extends across multiple channel intelligences because it
makes variable standardization available across online video, time-shifted TV, and social media, plus influences from movies on air.
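To make the tag-scoring step concrete, here is a minimal sketch under stated assumptions (the embedding table, tag names, and viewer words below are all hypothetical; a real system would use pre-trained word2vec-style vectors): a viewer's watched-program descriptors are averaged into a profile vector and scored against each tag by cosine similarity.

```python
import numpy as np

# Hypothetical pre-trained word embeddings (word -> vector).
embeddings = {
    "thriller": np.array([0.9, 0.1, 0.0]),
    "crime":    np.array([0.8, 0.2, 0.1]),
    "romance":  np.array([0.1, 0.9, 0.2]),
    "comedy":   np.array([0.2, 0.7, 0.6]),
}

def profile(words):
    """Average the embeddings of known words into one viewer profile vector."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Descriptors drawn from the programs one viewer watched across platforms.
viewer = profile(["thriller", "crime", "comedy"])

# Score the viewer against each candidate tag; higher = stronger affinity.
scores = {tag: cosine(viewer, vec) for tag, vec in embeddings.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```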
Case study, NLP adoption: a famous early example of the use of cognitive technology to improve a product offering is the recommendation feature of the Netflix online movie rental service, which uses machine learning to predict which movies a customer will like. This feature has had a significant impact on customers’ use of the service; it accounts for as much as 75 percent of Netflix usage.
To improve marketing and customer service, BBVA Compass bank uses a social media sentiment monitoring tool to track and understand what consumers are saying about the bank and its competitors. The tool, which incorporates natural language processing technology, automatically identifies salient topics of consumer chatter and the sentiments surrounding those topics. These insights influence the bank’s decisions on setting fees and offering consumer perks, and how customer service representatives should respond to certain customer inquiries about services and fees.
Source: Deloitte Review
Seed Program (1)# <data UNME> follows this working principle: since content consumption dominates online video viewing, it is possible to reinvent viewership measurement according to program preferences. With the chosen variables, the target market can be defined, and brands attribute according to methods of content connection instead of digital metrics only. This makes a feasible translation from data into a virtual channel-mix strategy, and so it fulfills an end-to-end data solution covering performance, segmentation, and engagement:
• It tracks the influences generated from each viewership;
• It operates within a defined framework;
• Viewership is measured as a link to the contents, counted by the chosen variables;
• Besides monitoring actions on programs, it can also decipher how similar programs’ influences differ across platforms;
• In other words, platform performance can be read as the adoption of a variety of programs associated with the observed variables;
• Preferences for programs reflect the quality of users, and user attributes are convenient to quantify.
Further releases link social network analysis to a tangible conditional-probability predictive method (see the affinity-index sketch after [fig.2] below). Meanwhile, it has other ascendancies. First, it can be synergized with well-established segmentation. According to a log-linear model, target groups’ viewing habits can be drawn simultaneously with pre-option affinity variables; besides viewership data, reviews, and favorites, there are more such as program type, host, director, actor, geography, timing, and devices. Over time, it becomes obvious how to maximize the significance of the bond between brand and
consumer features on one side and viewing indexation on the other. Consequently, the value of the data insight can be further applied to optimizing brand awareness. How much does it correlate with revenue? Only when this solid awareness combination can be visualized can the brand-consumer interaction equations, learned from data analysis guru Dawn Iacobucci [fig.1], be measured along a learning path, all against the bigger background of forming an intelligent enterprise [fig.2].
[fig.1]
Ad exposure → Brand awareness → Attitude toward ad → Buying intention → Purchase
Ad exposure → Brand awareness → Attitude toward brand → Buying intention → Purchase
Price → Buying intention
[fig.2]
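As a minimal sketch of the affinity idea above (all program names and counts are hypothetical, and this is a simple cross-tab index rather than a full log-linear model): the affinity index divides the conditional probability of viewing a program given a segment by the program's base rate; values above 1 mark an over-indexing preference.

```python
import numpy as np

# Hypothetical viewership counts: rows = segments, columns = programs.
programs = ["drama_A", "variety_B", "news_C"]
segments = ["millennials", "families"]
counts = np.array([
    [800, 150,  50],   # millennials
    [300, 400, 300],   # families
])

p_program = counts.sum(axis=0) / counts.sum()                  # P(program)
p_prog_given_seg = counts / counts.sum(axis=1, keepdims=True)  # P(program | segment)

# Affinity index: > 1 means the segment over-indexes on that program.
affinity = p_prog_given_seg / p_program
for i, seg in enumerate(segments):
    for j, prog in enumerate(programs):
        print(f"{seg:12s} {prog:10s} index = {affinity[i, j]:.2f}")
```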
Thanks are due to the following references and their theoretical proofs:
1. Natural Language Processing (Almost) from Scratch, Journal of Machine Learning Research 12 (2011) 2493-2537, Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa.
2. Deep Learning & NLP - Graphs to the Rescue (or not yet!), SICS, Stockholm, October 21, 2014, Roelof Pieters, KTH/CSC, Graph Technologies R&D.
3. Deep Learning for NLP: An Introduction to Neural Word Embeddings (and some more fun stuff…), KTH, December 4, 2014, Roelof Pieters, PhD candidate, KTH/CSC; CIO/CTO, Feeda AB.
4. Three New Graphical Models for Statistical Language Modelling, Andriy Mnih, Geoffrey Hinton, Department of Computer Science, University of Toronto, Canada.
5. Statistical Language Models Based on Neural Networks, Tomáš Mikolov, Google, Mountain View, April 2, 2012; and Strategies for Training Large Scale Neural Network Language Models, Tomáš Mikolov, Anoop Deoras, Daniel Povey, Lukáš Burget, Jan “Honza” Černocký, Microsoft Research, Redmond, WA, USA, and Brno University of Technology, Speech@FIT, Brno, Czech Republic.
6. DeViSE: A Deep Visual-Semantic Embedding Model, Andrea Frome, Greg S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, Tomas Mikolov, Google, Inc., Mountain View, CA, USA.
Even if it is probably not a game-changer, it still transforms me into a model designer pursuing my own best practices. Thank you.
Is it formidable? What if it is simply too much truth? Who just plays with autocorrelation? An unsuspected social site easily becomes assistive. How about time-shifted TV? It gears up followers ahead of time. Is it awesome? Whenever you think about a brand tailoring its own broadcasting station, the secret recipe is virtual channel * type * program. Please contact Teng: the engagement will cover hypotheses, experiments, deciphering the data analysis methodology, an insight report, and your brand’s quotient, everything in the case treating viewership, clicks, and comments as the series of actions caused by users. There is a correlation between a certain group of viewers and their preferences, and the preferences are discerned from program-viewing habits. The brand can then overarch these findings along their veins and exert its inherent influence on viewers identified as worth targeting according to the prior program classifications. [fig.3]
By observing conditional probability merged with social mining, unstructured data utilization can first be standardized, in sequence, so that hierarchical Bayesian methods can be applied. This is consistent with Seed Program (2)# e-chain™: an extreme focus on the top three conversions in the data stream, e-commerce, engine, email. Traditionally, this veins into impression, click, acquisition, transaction. After developing the bag of words, two algorithms, Gibbs sampling and Metropolis-Hastings, are particularly useful for drawing the posterior distribution (a minimal sketch follows). This is a crucial conclusion drawn from my own research and theoretical learning; it is probably familiar to others, but for me it took some effort. From my point of view, adopting this data solution has four strengths:
1. It achieves complete data tracking in each outcome layer;
2. It follows contextual measurement without losing focus on the 90% of conversions along the e-business backbone;
3. It works under a hive philosophy, involving touch points as beneficial extensions;
4. It merges effectively with other analysis modules, e.g. brand simulation, in advance.
It can also utilize deep learning; I recall reading somewhere that as many as 55 layers have feasibly been applied in one of Google’s analyses.
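Since Metropolis-Hastings is named above for drawing a posterior, here is a minimal sketch under illustrative assumptions (a Beta(2, 2) prior on a conversion rate with binomial data; the counts are hypothetical, and a real e-chain model would be richer):

```python
import math
import random

# Hypothetical data: 90 conversions out of 1000 impressions.
conversions, trials = 90, 1000

def log_posterior(p):
    """Log of Beta(2, 2) prior times binomial likelihood, up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    log_prior = math.log(p) + math.log(1.0 - p)  # Beta(2, 2) kernel
    log_lik = conversions * math.log(p) + (trials - conversions) * math.log(1.0 - p)
    return log_prior + log_lik

def metropolis_hastings(n_samples=20000, step=0.02, p0=0.5):
    samples, p = [], p0
    for _ in range(n_samples):
        proposal = p + random.gauss(0.0, step)   # symmetric random-walk proposal
        log_ratio = log_posterior(proposal) - log_posterior(p)
        # Accept with probability min(1, posterior ratio).
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            p = proposal
        samples.append(p)
    return samples[5000:]                        # drop burn-in

draws = metropolis_hastings()
print("posterior mean =", sum(draws) / len(draws))  # near (90 + 2) / (1000 + 4)
```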
[fig.3]
• About the Spark Internet Button / SIB*: {if this, then that} data-analysis integration inside <data UNME>, applied with some reliable proofs of my own in data analysis theories. [fig.3]
More scenarios will follow soon; the excerpt below is included specifically for CIO references.
What are the differences between supervised learning and unsupervised learning in terms of social mining?
• A new study has revealed a way to do sentiment analysis on a large number of social media images using unsupervised learning.
• Unsupervised learning in AI is a step above supervised learning where machines have to work with unlabelled data, observe and make sense of it, and provide an outcome. Supervised learning, on the other hand, gives machines labelled data or examples to learn from when carrying out certain tasks such as classifying an object or predicting future outcomes. The study, Unsupervised Sentiment Analysis for Social Media Images, was released as part of the International Joint Conference on Artificial Intelligence in Argentina this week. It reveals a novel framework, called Unsupervised Sentiment Analysis (USEA), that uses both textual and visual data in a single model for learning.
• Images from social media sites offer rich data to work with when doing sentiment analysis. However, manually labelling millions of images is too labour- and time-intensive, meaning this data often goes untapped. This is why the study's authors focused their efforts on unsupervised learning.
• “In order to utilise the vast amount of unlabelled social media images, an unsupervised approach would be much more desirable,” researchers from Arizona State University wrote in their paper.
• “As of 2013, 87 million users had registered with Flickr. Also, it was estimated that about 20 billion Instagram photos had been shared as of 2014.
• “To our best knowledge, USEA is the first unsupervised sentiment analysis framework for social media images.”
• The framework infers sentiments by combining visual data with accompanying textual data. As textual data is often incomplete with hardly any tags or noisy with irrelevant comments, relying on it alone is difficult when doing sentiment analysis.
6
• Therefore, the researchers used the supporting textual data to provide semantic information on the images to enable unsupervised learning.
• “Textual information bridges the semantic gap between visual features and sentiment labels.”
• The researchers crawled images from Flickr and Instagram users, collecting 140,221 images from Flickr and 131,224 from Instagram.
• They built a framework to classify images into three categories or class labels – positive, negative and neutral, looking at image captions and comments associated with the images.
• “Some words may contain sentiment polarities. For example, some words are positive such as ‘happy’ and ‘terrific’; while others are negative such as ‘gloomy’ and ‘disappointed’.
• “The sentiment polarities of words can be obtained via some public sentiment lexicons. For example, the sentiment lexicon MPQA [Multiple Perspective Question Answering] contains 7,504 human labeled words which are commonly used in the daily life with 2,721 positive words and 4,783 negative words.
• “Second, some abbreviations and emoticons are strong sentiment indicators. For example, ‘lol’ [laugh out loud] is a positive indicator while ‘:(’ is a negative indicator.”
• Visual features from the images were extracted by large-scale visual attribute detectors, with term frequency and stop-word removal (dropping words like ‘a’ and ‘the’) used to form text-based features (a minimal lexicon-scoring sketch follows this excerpt).
• The framework was compared to other sentiment analysis algorithms such as Senti API for unsupervised sentiment prediction and a variant of the framework, USEA-T, which only takes textual data into account when doing sentiment analysis.
• Other methods that were also compared with the USEA framework were Sentibank with K-means clustering, which uses large-scale visual attribute detectors and adjective-and-noun visual sentiment description pairs; EL with K-means clustering, which is a topical graphical model for sentiment analysis; and Random, which randomly guesses to predict sentiment labels of images.
• The results show that USEA performed better than all the other algorithms tested, receiving 56.18 per cent accuracy with the Flickr dataset compared to Senti API at 34.15 per cent and USEA-T at 40.22 per cent. With the Instagram dataset, it received 59.94 per cent accuracy compared to Senti API at 37.80 per cent and USEA-T at 36.41 per cent.
• “The proposed framework often obtains better performance than baseline methods. There are two major reasons. First, textual information provides semantic meanings and sentiment signals for images. Second, we combine visual and textual information for sentiment analysis.”
• The research pointed out that deep learning approaches (many hidden layers in artificial neural networks) have been shown to be effective for this, but are still mostly used in a supervised learning way, which depends on the availability of a good training dataset with labels.
• “In the future, we will exploit more social media sources, such as link information, user history, geo-location, etc., for sentiment analysis.”
• Source: http://www.cio.com.au/article/580602/study-uncovers-unsupervised-learning-framework-image-sentiment-analysis/?fp=16&fpid=1
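To make the lexicon and emoticon signals above concrete, here is a minimal unsupervised scoring sketch (this is not the USEA framework itself, which couples visual and textual data in one model; the tiny lexicon, emoticon table, and captions are hypothetical stand-ins for resources like MPQA):

```python
# Illustrative lexicon-based sentiment scoring for image captions.
POSITIVE = {"happy", "terrific", "love", "great", "lol"}
NEGATIVE = {"gloomy", "disappointed", "sad", "awful"}
EMOTICONS = {":)": 1, ":(": -1}
STOP_WORDS = {"a", "an", "the", "is", "so", "with"}

def caption_sentiment(caption: str) -> str:
    score = 0
    for token in caption.lower().split():
        if token in STOP_WORDS:
            continue                      # stop words carry no sentiment
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
        elif token in EMOTICONS:
            score += EMOTICONS[token]
    # Map the net lexicon score to the three class labels used in the study.
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(caption_sentiment("so happy with the terrific sunset :)"))  # positive
print(caption_sentiment("gloomy day :("))                         # negative
```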
Professor Miller offers some opinions about text mining:
Supervised vs. unsupervised: unsupervised text analytics problems are those for which there is no response or class to be predicted. Rather, as we showed with the movie taglines, the task is to identify common patterns or trends in the data. As part of the task, we may define text measures describing the documents in the corpus. For supervised text analytics problems there is a response or class of documents to be predicted. We build a model on a training set and test it on a test set. Text
classification problems are common. Spam filtering has long been a subject of interest as a classification problem, and many e-mail users have benefitted from the efficient algorithms that have evolved in this area. In the context of information retrieval, search engines classify documents as being relevant to the search or not. Useful modeling techniques for text classification include logistic regression, linear discriminant function analysis, classification trees, and support vector machines; various ensemble or committee methods may also be employed (see the sketch after this passage). Automatic text summarization is an area of research and development that can help with information management. Imagine a text processing program with the ability to read each document in a collection and summarize it in a sentence or two, perhaps quoting from the document itself. Today’s search engines provide partial analysis of documents prior to their being displayed. They create automated summaries for fast information retrieval. They recognize common text strings associated with user requests. These applications of text analysis comprise tools of information search that we take for granted as part of our daily lives.
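As a minimal sketch of supervised text classification with one of the techniques named above, logistic regression over TF-IDF features (the toy spam/ham corpus is hypothetical, and scikit-learn is an assumed dependency):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training corpus: 1 = spam, 0 = legitimate e-mail.
messages = [
    "win a free prize now", "cheap loans click here", "limited offer win cash",
    "meeting moved to friday", "please review the attached report", "lunch at noon tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, labels)

print(model.predict(["free cash offer, click now"]))    # expect spam (1)
print(model.predict(["see the report before friday"]))  # expect legitimate (0)
```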
Seed Program (3)# Data analysis in general + banking in particular (call it symphony analysis) >>> my unpublished first book, <Data Analysis is a Symphony in the Big Data Jungle>.
For an analysis list about banking data, let me bring my version of the list* live. It is my little behemoth, raised with all my nurturing from learning, while I remain fully and deeply respectful of the behemoths who have been authoritative in bank data analysis for over 50 years, especially as approached with visible continuous advances, e.g. machine learning and predictive analytics.
• Customer portfolio management
• Customer segmentation
• RFM models & migration
• Market basket analysis
• Recommendation tool
• Existing customer analysis
• Customer acquisition
• Customer retention, including churn analysis, and the risk-management side (over 50 years’ professions in FICO & others)
• Cross-selling & up-selling
• Multiple channel planning
• ROI modeler
• Customer lifetime value system
• Techniques in predictive analytics, machine learning & neural networks
• Risk analysis (over 50 years’ professions in FICO & SAS & others)
• Fraud analysis (over 50 years’ professions in FICO & SAS & others)
• Credit score (over 50 years’ professions in FICO & others)
(& a lot more about financial data areas that I probably don’t know, related to over 50 years’ professions in FICO & SAS & others)
• Hypothesis * experiments
• *About the list: it is suggested to remove naming limitations here for convenient reading, even though the naming system in the list is simply consistent with e-business. My learning has come through data mining methodology, so that, within my available data capabilities, I can match a verified data situation to the specific data analysis techniques behind each name. The list also covers some instances of data analysis essences I have learned, e.g. logit, conjoint analysis, and a lot more, even though they cannot be told from the listed names.
And there are more small modelers related to modeling techniques.
Seed Program (4)# needs highlighting: the innovation analytics would fold into the holistic analytics list, resonating with the industry shift in both technology and the bank network, including the bank-urbanization phenomenon, why IoT is highly relevant to the banking business, the influences of millennials, mobile banking, e-wallets, etc. This part will involve continuous deciphering of industry insight. Similar analytics could be expanded into other sectors, Seed Program (5)#, such as retail, telecom, travel/hotel, and restaurants. Nonetheless, it is still necessary to tackle the equation**.
From TENG: “Data is the new currency. Bank it.” A roadmap is one of the topics; [fig.4] shows mock-up data for a category survey, for instance Newer/Driver/Challenger.
[fig.4] (chart mock-up) TENG <data UNME> framework ~ 9th pillar. Category survey on a 0-5 scale across Newer / Driver / Challenger. Caption: the Data Analytics Matrix consolidates the bank's dynamic insights (Newer/Driver/Challenger) through comprehension of the shifting friction caused by millennials' distinction. Framework pillars: Industry Insight; E-Business; Consumer +/- Channel; Social Network; Brand +/- Consumer; Movie/Drama Indexation; Modeling Techniques; Machine Learning/Deep Learning; Business Savvy; Biz Model. My contacts: erinteng (at) hotmail (dot) com, 139 1862 0956.

My one case connects to bank analysis. A few years ago, in a chat with an ex-colleague, he talked about one hurdle that
happened in performance marketing while he was working in insurance services. The challenge was deciding how much scaling is most appropriate when targeting: there is a phenomenon whereby the quality of acquisitions drops, in every case, when the recruitment base is enlarged.
Case study: it is all about logistic regression fixing the targeting paradox.
All other things being equal, the customers with the highest predicted sales should be the ones the sales team will approach first. Alternatively, we could set a cutoff for predicted sales. Customers above the cutoff are the customers who get sales calls—these are the targets. Customers below the cutoff are not given calls.
When evaluating a regression model using data from the previous year, we can determine how close the predicted sales are to the actual/observed sales. We can find out the sum of the absolute values of the residuals (observed minus predicted sales) or the sum of the squared residuals.
Another way to evaluate a regression model is to correlate the observed and predicted response values. Or, better still, we can compute the squared correlation of the observed and predicted response values. This last measure is called the coefficient of determination, and it shows the proportion of response variance accounted for by the linear regression model. This is a number that varies between zero and one, with one being perfect prediction.
If we plotted observed sales on the horizontal axis and predicted sales on the vertical axis, then the higher the squared correlation between observed sales and predicted sales, the closer the points in the plot will fall along a straight line. When the points fall along a straight line exactly, the squared correlation is equal to one, and the regression model is providing a perfect prediction of sales, which is to say that 100 percent of sales response is accounted for by the model. When we build a regression model, we try to obtain a high value for the proportion of response variance accounted for. All other things being equal, higher squared correlations are preferred.
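As a quick worked example of these evaluation measures (the observed and predicted sales below are hypothetical):

```python
import numpy as np

# Hypothetical observed vs. model-predicted sales for five customers.
observed  = np.array([12.0, 30.0, 22.0, 45.0, 18.0])
predicted = np.array([15.0, 28.0, 20.0, 41.0, 21.0])

residuals = observed - predicted
print("sum of |residuals| =", np.abs(residuals).sum())
print("sum of squared residuals =", (residuals ** 2).sum())

# Coefficient of determination as the squared correlation (0 = none, 1 = perfect).
r = np.corrcoef(observed, predicted)[0, 1]
print("squared correlation =", round(r ** 2, 3))
```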
The focus can be on predicting sales or on predicting cost of sales, cost of support, profitability, or overall customer lifetime value. There are many possible response variables to pursue with regression methods.
To develop a classification model for targeting, we proceed in much the same way as with a regression, except the response variable is now a category or class. For each customer, a logistic regression model, for example, would provide a predicted probability of response. We employ a cut-off value for the probability of response and classify responses accordingly. If the cut-off were set at 0.50, for example, then we would target the customer if the predicted probability of response is greater than 0.50, and not target otherwise. Or we could target all customers who have a predicted probability of response of 0.40, or 0.30, and so on. The value of the cut-off will vary from one problem to the next.
When observed binary responses or choices are about equally split between yes and no, for example, we would use a cut-off probability of 0.50. That is, when the predicted probability of responding yes is greater than 0.50, we predict yes. Otherwise, we predict no.
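A minimal sketch of the cut-off rule (the predicted probabilities are hypothetical):

```python
# Hypothetical predicted response probabilities from a logistic regression.
probs = [0.72, 0.08, 0.55, 0.31, 0.12, 0.44]

def targets(probabilities, cutoff):
    """Indices of customers whose predicted probability exceeds the cut-off."""
    return [i for i, p in enumerate(probabilities) if p > cutoff]

print(targets(probs, 0.50))  # [0, 2]        -- cut-off for roughly balanced responses
print(targets(probs, 0.30))  # [0, 2, 3, 5]  -- lower cut-off casts a wider net
```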
Logistic regression provides a means for estimating the probability of a favorable (yes) response to the offer. The density lattice in figure 3.6 provides a pictorial representation of the model and a glimpse at model performance.
To evaluate the performance of this targeting model, we look at a two-by-two contingency table or confusion matrix showing the predicted and observed response values. A 50 percent cut-off does not work in the Bank Marketing Study, given the low base rate of responses to the offer.
A 50 percent cut-off will not work for the bank, but using a 10 percent cutoff for the response variable (accepting the term deposit offer or not), yields 65.9 percent accuracy in classification. The confusion matrix for the logistic regression and 10 percent cut-off is shown.
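To show how such a confusion matrix and its accuracy are computed at a given cut-off (the response vectors here are hypothetical; the 65.9 percent figure above comes from the source's Bank Marketing Study data):

```python
import numpy as np

# Hypothetical observed responses (1 = subscribed) and predicted probabilities.
observed = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
probs    = np.array([0.05, 0.22, 0.41, 0.08, 0.17, 0.03, 0.12, 0.09, 0.35, 0.28])

predicted = (probs > 0.10).astype(int)  # 10 percent cut-off

# Two-by-two confusion matrix: rows = observed class, columns = predicted class.
tn = int(((observed == 0) & (predicted == 0)).sum())
fp = int(((observed == 0) & (predicted == 1)).sum())
fn = int(((observed == 1) & (predicted == 0)).sum())
tp = int(((observed == 1) & (predicted == 1)).sum())
print([[tn, fp], [fn, tp]])                     # [[4, 3], [0, 3]]
print("accuracy =", (tp + tn) / observed.size)  # 0.7
```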
The Bank Marketing Study is typical of target marketing problems. Response rates are low, much lower than 0.50, so a 50 percent cut-off performs poorly. In fact, if bank analysts were to use a 50 percent cut-off, they would predict that every client would respond no, and the bank would target no one. Too high a cut-off means the bank will miss out on many potential sales.
Too low a cut-off presents problems as well. Too low a cut-off means the bank will pursue sales with large numbers of clients, many of whom will never subscribe to the term deposit offer. It is wise to pick a cut-off that maximizes profit, given the unit revenues and costs associated with each cell of the confusion matrix. Target marketing, employed in the right situations and with the right cut-offs, yields higher profits for a company.
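A minimal sketch of picking a profit-maximizing cut-off, assuming illustrative unit revenue and contact cost (all numbers hypothetical; in practice they come from the economics of each confusion-matrix cell):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical validation set with a low response base rate, as in target marketing.
probs = rng.uniform(0.0, 0.6, size=1000)                # model's predicted probabilities
observed = (rng.uniform(size=1000) < 0.4 * probs).astype(int)

REVENUE_PER_SALE = 100.0  # revenue when a targeted client subscribes (true positive)
COST_PER_CONTACT = 5.0    # cost of every sales contact (true and false positives)

def profit(cutoff):
    targeted = probs > cutoff
    sales = int((targeted & (observed == 1)).sum())
    return REVENUE_PER_SALE * sales - COST_PER_CONTACT * int(targeted.sum())

cutoffs = np.arange(0.05, 0.55, 0.05)
best = max(cutoffs, key=profit)
print(f"best cut-off = {best:.2f}, expected profit = {profit(best):.0f}")
```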
Source: <Modeling Techniques in Predictive Analytics>, Thomas W. Miller
Passing through the enterprise & innovation equation**:
Business value = ecosystem × business model × category pacing × data skill × resources
End. Thank you. My ‘thank you’ goes, by my rough count, to around 1,000 data analysis gurus and some 5,000 professionals, authors, and contributors, besides corporations, universities, institutions, organizations, and other team members. Mirroring infusive magnets, we can use data analysis capability to refine data into tangible data models, which enables us to decode human beings’ new information-adoption patterns as they embed into a shifting lifestyle movement. It is ready to embark on a learning mode of the new experiences. Multiple-dimensional relationships between customer and brand have been found, a reciprocity that travels along with the disruptive technology revolution.