OTC and the Analytics Framework Social Media Analytics T-45 days GTRI - Proprietary.

Page 1

OTC and the Analytics Framework

Social Media Analytics

T-45 days

GTRI - Proprietary

Page 2

Yay, an IRAD that helps us do our jobs

DataLoader API

TwitterOTC

TwitterFollowers

Analytics API

HierNMF

Others?

Visualization API

Happy Hour

FB Feelings

Reach

Timeline Deliveries

Platform

Tweets/min

Other Hashtags

Top Links

And so on…

Workflow: Hourly topic models

Supporting architecture and data management

Page 3

So here’s what we need to build

DataLoader API

TwitterOTC

TwitterFollowers

Analytics API

HierNMF

Others?

Visualization API

Happy Hour

FB Feelings

Reach

Timeline Deliveries

Platform

Tweets/min

Other Hashtags

Top Links

And so on…

Workflow: Hourly topic models

Supporting architecture and data management

Page 4

Let’s use our words

Ingest Modules

• TwitterOTC
  – Define where to store tweets in MongoDB
  – Define where to store counts in MySQL
  – All specific counts and transformations (sentiment and dates) are hardcoded into the ingest module (these could later become filters)

• TwitterFollowers
  – Define where to store followers in MySQL
  – GET the user's followers from the Twitter API
  – For each hour, counts of unfollow and follow actions
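One way the hourly follow and unfollow counts could be derived is by diffing consecutive snapshots of follower IDs. The sketch below assumes that approach; the function names, the snapshot layout, and the sample IDs are all hypothetical and are not taken from the TwitterFollowers module. A user who follows and then unfollows between two snapshots would not be counted.

```python
def follower_changes(previous_ids, current_ids):
    """Return (follows, unfollows) between two follower-ID snapshots."""
    previous, current = set(previous_ids), set(current_ids)
    follows = len(current - previous)    # IDs that appeared since the last snapshot
    unfollows = len(previous - current)  # IDs that disappeared since the last snapshot
    return follows, unfollows


def hourly_counts(snapshots):
    """Map each hour (after the first) to its follow/unfollow counts.

    snapshots: list of (hour_label, follower_ids) pairs, ordered by time.
    """
    counts = {}
    for (_, prev_ids), (hour, cur_ids) in zip(snapshots, snapshots[1:]):
        follows, unfollows = follower_changes(prev_ids, cur_ids)
        counts[hour] = {"follows": follows, "unfollows": unfollows}
    return counts


# Made-up sample data: one snapshot of follower IDs per hour.
snapshots = [
    ("10:00", [1, 2, 3]),
    ("11:00", [2, 3, 4, 5]),
    ("12:00", [3, 4, 5]),
]
result = hourly_counts(snapshots)
```

Each entry in `result` is the kind of per-hour row that could then be written to the MySQL counts table.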

Workflow Prototype

• Hourly topic models
  – Every hour, create a matrix for the previous hour from the TwitterOTC source
  – Once a new matrix shows up, HierNMF clustering will kick off to create new topic models
  – Once new models show up, the MySQL variable storing the most recent model path will be updated
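The first step of the hourly workflow (building a matrix for the previous hour) might look roughly like the sketch below. It assumes a simple term-by-tweet count matrix and whitespace tokenization; the function names and sample tweets are hypothetical. The matrix format HierNMF actually expects is not described here, so the clustering step is left out.

```python
from collections import Counter
from datetime import datetime, timedelta


def previous_hour_window(now):
    """Return (start, end) of the last full clock hour before `now`."""
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def build_term_matrix(tweets, start, end):
    """Build a term-by-tweet count matrix for tweets with start <= time < end.

    tweets: iterable of (timestamp, text) pairs.
    Returns (vocabulary, matrix), where matrix[i][j] is the number of times
    vocabulary[i] occurs in the j-th selected tweet.
    """
    selected = [text for ts, text in tweets if start <= ts < end]
    counts = [Counter(text.lower().split()) for text in selected]
    vocabulary = sorted(set().union(*counts)) if counts else []
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Made-up sample tweets; the third falls outside the previous hour.
tweets = [
    (datetime(2014, 1, 1, 11, 5), "data data science"),
    (datetime(2014, 1, 1, 11, 40), "Science fair"),
    (datetime(2014, 1, 1, 12, 1), "too late"),
]
start, end = previous_hour_window(datetime(2014, 1, 1, 12, 20))
vocabulary, matrix = build_term_matrix(tweets, start, end)
```

A scheduler would call this once per hour, write the matrix where the clustering step watches for it, and let the later steps update the stored model path.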

Repositories now have otc_demo branches for our use

Page 5

Creating Visualizations

1. Create the Python handler
  – Use the visualization interface as a template
    • otc_vis/VISUALIZATION_INTERFACE.py
  – All visualizations are located here:
    • otc_vis/
    • For reference: analytics-framework/visualization/python/Visualization/vis/

2. Create the directive, if you are not using the default directive
    • src/js/directives/VisDirectives.js

3. Test in a single page with known data
  – Use sample page as starting template
    • src/widgets/topics.html
  – Modify the createVis function to use your inputs and send the request to your newly created vis

Page 6

Creating Ingest Modules

1. Create the Python handler
  – Place in otc_ingest/
  – Use existing TwitterOTC as starting point
  – Register with the analytics framework:

    python analytics-framework/configure.py --api ingest --mode add --filename otc_ingest.[your_python_file].py

Page 7

Our friend, Vincent Vega

We do this in Vincent:

    scatter = (vincent.Scatter(self.matrix, iter_idx=0)
               .colors(brew='Set1')
               .axis_titles(x=self.features[0], y='')
               .legend(title='Features')
               .to_json())

We get this in Vega:

{"title": "Features", "offset": 0, "properties": {}, "fill": "color"}], "scales": [{"range": "width", "domain": {"field": "data.idx", "data": "table"}, "type": "linear", "name": "x", "zero": false}, {"range": "height", "domain": {"field": "data.val", "data": "table"}, "name": "y", "nice": true}, {"range": ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628", "#f781bf", "#999999"], "domain": {"field": "data.col", "data": "table"}, "type": "ordinal", "name": "color"}], "axes": [{"scale": "x", "type": "x", "title": "Petal length"}, {"scale": "y", "type": "y", "title": ""}], "height": 500, "padding": "auto", "width": 960, "marks": [{"type": "group", "from": {"data": "table", "transform": [{"keys": ["data.col"], "type": "facet"}]}, "marks": [{"type": "symbol", "properties": {"enter": {"y": {"field": "data.val", "scale": "y"}, "x": {"field": "data.idx", "scale": "x"}, "size": {"value": 100}, "fill": {"field": "data.col", "scale": "color"}}}}]}], "data": [{"values": [{"val": 1.4, "col": "Petal length", "idx": 0}, {"val": 0.2, "col": "Petal width", "idx": 0}, {"val": 5.1, "col": "Sepal length", "idx": 0}, {"val": 3.5, "col": "Sepal width", "idx": 0}, {"val": 1.4, "col": "Petal length", "idx": 1}, {"val": 0.2, "col": "Petal width", "idx": 1},…

Page 8

Getting to know Vincent Vega

• Various basic, predefined vis (see Scatter.py, Linechart.py…)
  – http://vincent.readthedocs.org/en/latest/

• Basic building block is a Chart (see ClusterScatter.py)
  – Chart.data is a list of the data sources (each with a unique name), which can be created from basic Python types or pandas and numpy types
  – Chart.scales is a list of Scale objects, which use a data source and can define x range, y range, and colors
  – Chart.axes is a list of Axis objects, which identify Scale objects to use for defining the axes
  – Chart.marks is a list of Mark objects, each of which defines a particular set of data to display in a particular way
  – Chart.legend defines what to call the legend (the content of the legend itself will be drawn from Chart.marks)

• Vega is a higher-level visualization specification language on top of D3
  – In general, if you can do it in Vega, you can do it in Vincent via kwargs
  – https://github.com/trifacta/vega/wiki/Vega-and-D3
  – http://trifacta.github.io/vega/
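To make the data / scales / axes / marks split concrete, the sketch below assembles a small scatter spec as a plain Python dict, in the shape of the Vega excerpt shown on Page 7. It does not use Vincent itself; `scatter_spec` is a hypothetical helper, the three-color palette is just the first three entries of the range on that slide, and the two sample rows are copied from it.

```python
import json


def scatter_spec(rows, x_title, width=960, height=500):
    """Assemble a scatter spec as a plain dict with data, scales, axes, and marks.

    rows: list of {"idx": ..., "val": ..., "col": ...} dicts.
    """
    return {
        "width": width,
        "height": height,
        # data: named sources that scales and marks refer to
        "data": [{"name": "table", "values": rows}],
        # scales: map data fields to pixel ranges and colors
        "scales": [
            {"name": "x", "type": "linear", "range": "width",
             "domain": {"data": "table", "field": "data.idx"}},
            {"name": "y", "range": "height", "nice": True,
             "domain": {"data": "table", "field": "data.val"}},
            {"name": "color", "type": "ordinal",
             "range": ["#e41a1c", "#377eb8", "#4daf4a"],
             "domain": {"data": "table", "field": "data.col"}},
        ],
        # axes: each axis names the scale it draws
        "axes": [
            {"type": "x", "scale": "x", "title": x_title},
            {"type": "y", "scale": "y", "title": ""},
        ],
        # marks: how each data point is drawn
        "marks": [{
            "type": "symbol",
            "from": {"data": "table"},
            "properties": {"enter": {
                "x": {"field": "data.idx", "scale": "x"},
                "y": {"field": "data.val", "scale": "y"},
                "size": {"value": 100},
                "fill": {"field": "data.col", "scale": "color"},
            }},
        }],
    }


rows = [
    {"val": 1.4, "col": "Petal length", "idx": 0},
    {"val": 0.2, "col": "Petal width", "idx": 0},
]
spec = scatter_spec(rows, x_title="Petal length")
spec_json = json.dumps(spec)  # the string a front end would receive
```

Vincent builds the same kind of structure through its Chart, Scale, Axis, and Mark objects, so reading a spec like this one is a quick way to see which part of a chart a given kwarg ends up in.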