Future of data visualization

hadoopsphere Future of Data Visualization HadoopSphere Virtual Conclave August 2015

Page 1: Future of data visualization

hadoopsphere

Future of Data Visualization

HadoopSphere Virtual Conclave

August 2015

Page 2: Future of data visualization

Commonly understood components of data visualization

• Graphs, maps, tables, shapes

• WYSIWYG editors

• Dashboards

• HTML5 views

• Infographics

Page 3: Future of data visualization

Defining data visualization

• Data visualization is the presentation of data in a pictorial or graphical format. - Wikipedia

• Data visualization is a visual representation of the insights gained from your analysis. - Datameer

Page 4: Future of data visualization

Emerging Trends

• New channels: mobile, VR devices

• More interactive charts: redraw, filter, annotations

• Multidimensional visuals: VR, GL

• Network visualization: social, linkages

• Collaboration: share, review, workflow

• And we may have ‘audiolizations’ as well: audio narrations

Page 5: Future of data visualization

Process of data visualization

Prepare

Explore

Design

Deliver

Page 6: Future of data visualization

Challenges

• Access to data

• Parse data

• Central data access

• Fast queries

• Complex visual types

• Linked views

• Data mining

• Collaboration

• Workflow

Page 7: Future of data visualization

Introducing Apache Zeppelin

[Diagram labels: HDFS / Data Store; YARN; Spark / Flink / Tajo …; Operations; Governance/Security]

• Apache Zeppelin is a web-based multi-purpose notebook for interactive data analysis.

• It is a 100% open source incubator project of the Apache Software Foundation.

• According to HadoopSphere, Apache Zeppelin is going to influence big data visualization tools for the next 2 years or more.

Page 8: Future of data visualization

Zeppelin Notebook

• A web-based notebook that enables interactive data analytics.

• You can type in code in SQL, Scala and more in the notebook.

• Run the commands directly from the notebook.

Source for this slide and subsequent slides:

(1) http://zeppelin.apache.org

(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015
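A notebook that accepts several languages has to decide which backend runs each paragraph. In Zeppelin a paragraph can begin with a directive such as %sql or %md to name the interpreter. The snippet below is only a rough sketch of that routing step in Python; the function name split_paragraph and the default interpreter name "spark" are assumptions made for illustration, not Zeppelin's own code.

```python
def split_paragraph(text, default="spark"):
    """Split a notebook paragraph into (interpreter name, code body).

    `default` is a hypothetical fallback used when no %directive is present.
    """
    stripped = text.lstrip()
    if stripped.startswith("%"):
        # The directive occupies the start of the first line.
        head, _, body = stripped.partition("\n")
        name, _, rest = head[1:].partition(" ")
        # Code may follow the directive on the same line or on later lines.
        code = "\n".join(part for part in (rest.strip(), body) if part)
        return name, code
    return default, stripped


print(split_paragraph("%sql select age, count(1) from bank group by age"))
print(split_paragraph("val x = 1"))
```

The returned name would then be used to look up the matching interpreter, and the body is what gets executed.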

Page 9: Future of data visualization

Zeppelin user interface

Page 10: Future of data visualization

Behind the scenes

• Java-based backend

• Active development community

- Built-in Apache Spark integration

- Uses AngularJS, D3.js

- Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x

Page 11: Future of data visualization

Zeppelin features - Visualization

• Some basic charts are currently included in Zeppelin, and more will be added in the future.

• Visualizations are not limited to Spark SQL queries - relational output from many other language backends can be recognized and visualized.
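To make "relational output from other backends" concrete: Zeppelin's display system treats text that starts with %table as a table, with columns separated by tabs and rows separated by newlines. The Python sketch below builds such a string; the helper name to_zeppelin_table and the sample data are hypothetical, and the snippet only produces the text, it does not talk to Zeppelin.

```python
def to_zeppelin_table(header, rows):
    """Render a header and rows as %table text.

    Columns are tab-separated and rows are newline-separated.
    """
    lines = ["\t".join(str(cell) for cell in header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "%table " + "\n".join(lines)


# Hypothetical sample data, for illustration only.
out = to_zeppelin_table(["age", "count"], [(30, 12), (31, 7)])
print(out)
```

Any interpreter that prints text in this shape can have its result picked up by the built-in charts.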

Page 12: Future of data visualization

Zeppelin features - Pivots

• With simple drag and drop, Zeppelin aggregates the values and displays them in a pivot chart.

• You can easily create a chart with multiple aggregated values, including sum, count, average, min and max.
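The aggregation behind a pivot chart can be sketched in a few lines of plain Python: group the rows by a key column, then reduce each group with the chosen function. This is an illustrative sketch only, not Zeppelin's implementation; the pivot function, the AGGREGATES table and the sample rows are made up for the example.

```python
from collections import defaultdict

# The five aggregations named on the slide.
AGGREGATES = {
    "sum": sum,
    "count": len,
    "average": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def pivot(rows, key, value, how="sum"):
    """Group `rows` by `key` and aggregate the `value` column with `how`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    aggregate = AGGREGATES[how]
    return {group: aggregate(values) for group, values in groups.items()}


# Hypothetical sample data.
rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 30},
    {"region": "west", "sales": 5},
]
print(pivot(rows, "region", "sales", "sum"))
print(pivot(rows, "region", "sales", "average"))
```

Dragging a different field onto the chart corresponds to changing `key` or `value`; picking another aggregate corresponds to changing `how`.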

Page 13: Future of data visualization

Zeppelin features – Dynamic forms

• Zeppelin can dynamically take inputs through forms that are part of the notebook.

• These dynamic forms can be used to show input-based results or render charts.
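In Zeppelin, a text input form is declared inside the paragraph with a ${name=default} placeholder, and the value the user types replaces the placeholder before the paragraph runs. The Python sketch below mimics only that substitution step for simple text inputs; fill_forms, the regular expression and the sample query are assumptions for illustration, not Zeppelin's parser.

```python
import re

# Matches ${name} or ${name=default}; select and checkbox forms are not handled.
FORM = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_forms(template, inputs):
    """Replace each ${name=default} with the user's input, or the default."""
    def replace(match):
        name, default = match.group(1), match.group(2) or ""
        return str(inputs.get(name, default))
    return FORM.sub(replace, template)


# Hypothetical query against a hypothetical table named "bank".
query = "select age, count(1) from bank where age < ${maxAge=30} group by age"
print(fill_forms(query, {}))
print(fill_forms(query, {"maxAge": 45}))
```

Re-running the paragraph with a new input value is what makes the resulting table or chart change.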

Page 14: Future of data visualization

Zeppelin features – Collaboration and publishing

• The notebook URL can be shared among collaborators. Zeppelin then broadcasts any changes in real time, much like collaboration in Google Docs.

• Zeppelin provides a URL that displays only the results, which can easily be embedded as an iframe inside a web page.

Page 15: Future of data visualization

Zeppelin interpreter architecture

• A Zeppelin Interpreter is a connector between Zeppelin and a backend data processing system. For example, to use Scala code in Zeppelin, you need a Scala interpreter.

• Every Interpreter belongs to an InterpreterGroup, which is the unit for starting and stopping interpreters. Interpreters in the same InterpreterGroup can reference each other. For example, SparkSqlInterpreter can reference SparkInterpreter to get the SparkContext from it while they're in the same group.

[Diagram labels: ZeppelinServer; InterpreterGroup; Separate JVM process; Interpreter (×3); Spark; Spark, PySpark, SparkSQL, Dep; Load libraries; Maven repository; Spark cluster; Share single SparkDriver; Thrift]
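The grouping idea described on this slide can be illustrated with a toy model: interpreters registered in the same group can look each other up, so the SQL interpreter reuses the context created by the Spark interpreter instead of building its own. This is a simplified Python sketch, not Zeppelin's Java API; the class bodies, the `name` attributes and the placeholder context object are all assumptions.

```python
class InterpreterGroup:
    """Toy stand-in for a set of interpreters that start and stop together."""

    def __init__(self):
        self.interpreters = {}
        self.running = False

    def add(self, interpreter):
        interpreter.group = self
        self.interpreters[interpreter.name] = interpreter

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class SparkInterpreter:
    name = "spark"

    def __init__(self):
        self.group = None
        self.context = object()  # placeholder for a real SparkContext


class SparkSqlInterpreter:
    name = "sql"

    def __init__(self):
        self.group = None

    def context(self):
        # Borrow the context from the Spark interpreter in the same group.
        return self.group.interpreters["spark"].context


group = InterpreterGroup()
spark, sql = SparkInterpreter(), SparkSqlInterpreter()
group.add(spark)
group.add(sql)
group.start()
print(sql.context() is spark.context)
```

Sharing one context per group is what lets a table registered from Scala code be queried from a SQL paragraph in the same notebook.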

Page 16: Future of data visualization

Zeppelin interaction ecosystem

* includes future roadmap components

Page 17: Future of data visualization

Getting involved with Zeppelin

• http://zeppelin.apache.org/

• http://github.com/apache/incubator-zeppelin

Installation reference:

• http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/

• http://nflabs.github.io/z-manager/

Mailing List

• [email protected]

Page 18: Future of data visualization

Other Notebook options

• IPython Notebook

• Beaker

• Spark-Notebook

• Databricks Cloud Notebook

Page 19: Future of data visualization

Thank you

[email protected]

Twitter: @hadoopsphere