Future of data visualization

Post on 15-Apr-2017


hadoopsphere

Future of Data Visualization

HadoopSphere Virtual Conclave

August 2015


Commonly understood components of data visualization

• Graphs, maps, tables, shapes

• WYSIWYG editors

• Dashboards

• HTML5 views

• Infographics


Defining data visualization

• Data visualization is the presentation of data in a pictorial or graphical format. - Wikipedia

• Data visualization is a visual representation of the insights gained from your analysis. - Datameer


Emerging Trends

• New Channels: Mobile, VR devices

• More interactive charts: Redraw, filter, annotations

• Multidimensional visual: VR, GL

• Network visualization: Social, Linkages

• Collaborations: Share, Review, Workflow

• And we may have 'audiolizations' as well: Audio narrations


Process of data visualization

Prepare

Explore

Design

Deliver


Challenges

• Access to data

• Parse data

• Central data access

• Fast queries

• Complex visual types

• Linked views

• Data mining

• Collaboration

• Workflow


Introducing Apache Zeppelin

[Diagram: Zeppelin in the big data stack, alongside HDFS / Data Store, YARN, Spark / Flink / Tajo …, Operations, and Governance/Security]

• Apache Zeppelin is a web-based multi-purpose notebook for interactive data analysis.

• It is a 100% open source incubator project of the Apache Software Foundation.

• As per HadoopSphere, Apache Zeppelin is going to influence big data visualization tools for the next 2 years or more.


Zeppelin Notebook

• A web-based notebook that enables interactive data analytics.

• You can type in code in SQL, Scala and more in the notebook.

• Run the commands directly from the notebook.

Source for this slide and subsequent slides:

(1) http://zeppelin.apache.org

(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015


Zeppelin user interface


Behind the scenes

• Java-based backend

• Active development community

- Built-in Apache Spark integration

- Uses AngularJS, D3.js

- Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x


Zeppelin features - Visualization

• Some basic charts are currently included in Zeppelin, and more will be added in the future.

• Visualizations are not limited to Spark SQL query results; relational output from many other language backends can be recognized and visualized.
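As an illustration of how relational output from any backend can be recognized, Zeppelin's display system treats paragraph output that begins with %table as a table, with columns separated by tabs and rows by newlines. The following is a minimal Python sketch of producing such output; the helper name to_zeppelin_table and the sample rows are made up for illustration and are not part of Zeppelin.

```python
def to_zeppelin_table(header, rows):
    """Format a header and rows as %table text.

    Columns are joined with tabs and rows with newlines, which is the
    layout the %table display type expects.
    """
    lines = ["\t".join(str(h) for h in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    print(to_zeppelin_table(["city", "visits"], [["Pune", 120], ["Oslo", 75]]))
```

Printing a string of this shape from a notebook paragraph is what lets the built-in charts pick up the result.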


Zeppelin features - Pivots

• With simple drag and drop, Zeppelin aggregates the values and displays them in a pivot chart.

• You can easily create a chart with multiple aggregated values, including sum, count, average, min, and max.
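The aggregation behind a pivot chart can be sketched in a few lines of Python. This is only an illustration of grouping rows by a key and computing sum, count, average, min and max per group; it is not Zeppelin's implementation, and the function name pivot_aggregate and the column names are hypothetical.

```python
from collections import defaultdict


def pivot_aggregate(rows, key, value):
    """Group rows (dicts) by `key` and aggregate the `value` column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {
            "sum": sum(values),
            "count": len(values),
            "average": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    sample = [
        {"region": "east", "sales": 10},
        {"region": "east", "sales": 30},
        {"region": "west", "sales": 5},
    ]
    for region, stats in pivot_aggregate(sample, "region", "sales").items():
        print(region, stats)
```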


Zeppelin features – Dynamic forms

• Zeppelin can dynamically take inputs in forms as part of the notebook.

• These dynamic forms can be used to see input-based results or render charts.
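In Zeppelin, a text input form is written inside a paragraph as ${formName} or ${formName=defaultValue}. The Python sketch below imitates only that substitution step to show the idea; it is an illustration, not Zeppelin's code, it does not handle select forms with option lists, and the function name fill_forms and the sample query are hypothetical.

```python
import re

# Matches ${name} and ${name=default}
FORM_PATTERN = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_forms(template, inputs):
    """Replace each form placeholder with the user's input, else its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in inputs:
            return str(inputs[name])
        return default if default is not None else ""
    return FORM_PATTERN.sub(replace, template)


if __name__ == "__main__":
    query = "select age, count(1) from bank where age < ${maxAge=30} group by age"
    print(fill_forms(query, {}))              # falls back to the default, 30
    print(fill_forms(query, {"maxAge": 45}))  # uses the value typed into the form
```

Re-running the paragraph with a new form value is what produces the input-based results or refreshed chart.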


Zeppelin features – Collaboration and publishing

• The notebook URL can be shared among collaborators. Zeppelin then broadcasts any changes in real time, just like collaboration in Google Docs.

• Zeppelin provides a URL that displays only the results and can easily be embedded as an iframe inside a web page.
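Embedding a published result comes down to wrapping its URL in an iframe tag. A minimal Python sketch follows; the function name iframe_snippet is hypothetical and the URL in the usage example is a placeholder, since the actual link is copied from the notebook itself.

```python
from html import escape


def iframe_snippet(result_url, width=600, height=400):
    """Build an HTML iframe tag that embeds a results-only URL."""
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(result_url, quote=True), int(width), int(height)
    )


if __name__ == "__main__":
    # Placeholder URL; use the link Zeppelin provides for the paragraph.
    print(iframe_snippet("http://example.com/results-only-link"))
```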


Zeppelin interpreter architecture

• A Zeppelin Interpreter is a connector between Zeppelin and a backend data processing system. For example, to use Scala code in Zeppelin, you need the Scala interpreter.

• Every Interpreter belongs to an InterpreterGroup, which is the unit in which interpreters are started and stopped. Interpreters in the same InterpreterGroup can reference each other. For example, SparkSqlInterpreter can reference SparkInterpreter to get the SparkContext from it, since they are in the same group.

[Diagram: ZeppelinServer connects via Thrift to an InterpreterGroup running in a separate JVM process. The Spark group holds the Spark, PySpark, SparkSQL and Dep interpreters, which share a single SparkDriver connected to the Spark cluster; libraries are loaded from a Maven repository.]
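The group relationship described above can be modeled with a small Python sketch. It is a conceptual illustration only: the real interpreters are Java classes running in a separate JVM, and the class and method names below (other than the concept names InterpreterGroup, SparkInterpreter and SparkSqlInterpreter taken from the slide) are assumptions made for the example.

```python
class Interpreter:
    """Base class: an interpreter knows its name and the group it belongs to."""

    def __init__(self, name):
        self.name = name
        self.group = None
        self.running = False


class InterpreterGroup:
    """Unit of start/stop; members can look each other up by name."""

    def __init__(self):
        self.members = {}

    def add(self, interpreter):
        interpreter.group = self
        self.members[interpreter.name] = interpreter

    def get(self, name):
        return self.members[name]

    def start(self):
        for interpreter in self.members.values():
            interpreter.running = True

    def stop(self):
        for interpreter in self.members.values():
            interpreter.running = False


class SparkInterpreter(Interpreter):
    def __init__(self):
        super().__init__("spark")
        self.context = object()  # stand-in for a real SparkContext


class SparkSqlInterpreter(Interpreter):
    def __init__(self):
        super().__init__("sql")

    def spark_context(self):
        # Reuse the context owned by the Spark interpreter in the same group.
        return self.group.get("spark").context


if __name__ == "__main__":
    group = InterpreterGroup()
    spark, sql = SparkInterpreter(), SparkSqlInterpreter()
    group.add(spark)
    group.add(sql)
    group.start()
    print(sql.spark_context() is spark.context)
```

Because both interpreters live in one group, they share a single context and are started and stopped together.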


Zeppelin interaction ecosystem

* includes future roadmap components


Getting involved with Zeppelin

• http://zeppelin.apache.org/

• http://github.com/apache/incubator-zeppelin

Installation reference:

• http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/

• http://nflabs.github.io/z-manager/

Mailing List

• users@zeppelin.incubator.apache.org


Other Notebook options

• IPython Notebook

• Beaker

• Spark-Notebook

• Databricks Cloud Notebook


Thank you

scale@hadoopsphere.com

Twitter: @hadoopsphere