PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder


Presentation PL-4047 by Monir Mozumder at the AMD Developer Summit (APU13) November 11-13, 2013.


BIG DATA WORKLOAD ANALYSIS USING SWAT AND IPYTHON NOTEBOOKS

MONIR MOZUMDER

AMD RESEARCH

THANKS:

JAY OWEN (AMD RESEARCH)

LEONARDO PIGA (AMD RESEARCH)

MAURICIO BRETERNITZ (AMD RESEARCH)

SABARISHYAM SRINIVASARAJU (AMD RESEARCH)

I also want to acknowledge and thank KEITH LOWERY for his contributions to the development of the SWAT tool at AMD

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL2

AGENDA

SWAT OVERVIEW

IPython overview

Performance Bottleneck Analysis using IPython Notebooks


SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT): OVERVIEW

SWAT: Software platform for automating the creation, deployment, execution, and data gathering of synthetic compute workloads on clusters of arbitrary sizes

Allows deployment of workloads on virtual clusters (Amazon EC2) or physical in-house clusters (SeaMicro server, bare-hardware cluster, ...)

Supports benchmark workloads from CloudSuite and some research workloads such as GraphLab, HadoopCL, etc.

Gathers various system statistics during a run and collects them, along with workload logs, in a batch folder that is stored on the UI box for later analysis/visualization


SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT): COMPONENTS


SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)

Front End

‒ Houses the Control Box of the SWAT tool

‒ Separate from the actual cluster that runs the workloads

‒ Does not need the benchmark workloads to be installed

‒ Manages the cluster; if needed, boots the nodes with configuration options chosen by the user

‒ Stores logs generated for the runs for later analysis

Cluster nodes

‒ Run the actual workloads as directed by the Front End

‒ Need the workloads installed on each node

‒ Can reside in a cloud service provider's data center (Amazon) or in a local internal cluster


SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT): FRONT END (UI)

Major steps in running a workload

‒ Select nodes (instances)

‒ Select workload container

    ‒ Hadoop flavor

    ‒ Other container (memcached)

‒ Select actual workload

    ‒ Basic Hadoop jobs

    ‒ CloudSuite benchmarks:

        ‒ Data Analytics (Mahout)

        ‒ Data Serving (Cassandra)

        ‒ Media Streaming (Darwin)

        ‒ Software Testing (Cloud9)

        ‒ Web Search (Nutch)

        ‒ Web Serving (Olio, Faban)

    ‒ GraphLab

    ‒ McBlaster (memcached)

‒ Start selected job in batch mode or standalone mode


SWAT WORKFLOW STEPS

1. Cluster selection

2. Workload container selection

3. Workload selection

4. Job initiation, progress, and termination


IPYTHON BASICS

What is IPython?

‒ IPython stands for Interactive Python

‒ A better Python interpreter

‒ Interactive IDE for Python (QtConsole, ...)

‒ Better web-based front end for interactive analysis of scientific data (IPython notebooks)

‒ Also has a parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ On Windows®, install as an all-in-one package: Enthought Canopy, Anaconda, ActivePython, Python(x,y)

‒ On Linux®, we need to install the components separately (assuming setuptools is already installed)

    ‒ easy_install ipython[zmq,qtconsole,notebook]

    ‒ easy_install pandas

    ‒ easy_install matplotlib


IPYTHON INTERFACES

Starting IPython

ipython

‒ opens the default shell-like console

ipython qtconsole --pylab=inline

‒ opens the graphical Qt-based console

ipython notebook --pylab=inline

‒ starts a server and opens a browser window pointing to the server instance

‒ The server dashboard lists the notebooks that exist on the machine and offers ways to create new ones

‒ Users can connect to this server instance remotely and collaborate by working on the same notebook

‒ We use this feature for our SWAT log analysis and bottleneck-finding experiments


IPYTHON NOTEBOOK DASHBOARD

Note:

‒ All three modes offer similar functionality

‒ We use the notebook interface because it is easy to share and lets multiple users collaborate remotely


IPYTHON NOTEBOOK EXAMPLE SESSION

Any Python expression can be run at the console prompt

    In [1]: a = range(1, 10)
    In [2]: a
    Out[2]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Also run basic shell commands

    In [3]: pwd
    Out[3]: ucmonirapu13sumatra
    In [4]: cd
    cmonir

You can even capture the output of any shell command or script into a Python variable
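
As a rough illustration of capturing command output in a variable: inside IPython the shorthand is an assignment such as files = !ls, which stores the output lines in a list-like object. The sketch below shows a plain-Python equivalent built on the standard subprocess module. The helper name capture_lines and the sample command are illustrative assumptions, not taken from the slides.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command (given as an argument list) and return its stdout as a list of lines."""
    output = subprocess.check_output(command, universal_newlines=True)
    return output.splitlines()


# Hypothetical example command: ask the current Python interpreter to print two lines.
lines = capture_lines([sys.executable, "-c", "print('cpu'); print('disk')"])
print(lines)
```

The returned list can then be fed straight into further Python processing, which is the point of capturing the output rather than only displaying it.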


A COMMON PATTERN IN OUR DATA ANALYSIS

Read data from sources

• CSV, JSON, Excel, raw log file, DB connection, ...

Munge data into a data structure suitable for plotting

• Get it into a DataFrame (pandas library)

Do final plotting

• df.plot()

Do any sub-range plotting for further details
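
The read, munge, plot pattern above might look roughly like the following sketch. The inline CSV sample, its column names, and the helper load_timeline are assumptions made for illustration; real SWAT logs may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical vmstat-style sample; real SWAT logs may use different columns.
SAMPLE_CSV = """date,time,user,sys,idle
2013-02-05,10:00:00,10,5,85
2013-02-05,10:00:01,40,10,50
2013-02-05,10:00:02,70,20,10
2013-02-05,10:00:03,30,10,60
"""


def load_timeline(source, columns):
    """Read a CSV source and return the chosen columns indexed by timestamp."""
    df = pd.read_csv(source)
    # Munge: merge the separate date and time columns into one datetime index.
    timestamps = pd.to_datetime(df["date"] + " " + df["time"])
    df = df.set_index(pd.DatetimeIndex(timestamps, name="date_time"))
    return df[columns].copy()


df = load_timeline(io.StringIO(SAMPLE_CSV), ["user", "sys", "idle"])
# Derived series: treat everything that is not idle as busy CPU time.
df["busy"] = 100 - df["idle"]
print(df)
```

In a notebook, calling df.plot() on the result would draw the timeline (this needs matplotlib installed), and slicing the frame by time before plotting gives the sub-range view.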


TYPICAL DATA ANALYSIS SESSION

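    import pandas as pd
    import matplotlib.pyplot as plt
    ...
    # Root of the SWAT log archive
    swat_archive = '/var/www/html/repo'
    ...
    # IPython cd magic; $name expands a Python variable
    cd $swat_archive/Job_TerasortExperiment_Feb5/job_00_00
    ...
    # Merge the date and time columns into one parsed date_time column
    # (recent pandas releases deprecate this nested-list form of parse_dates)
    df1 = pd.read_csv('vmstat.csv', parse_dates=[['date', 'time']], usecols=['date', 'time', 'idle'])
    df1 = df1.set_index('date_time')
    df1.plot()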


TYPICAL DATA ANALYSIS SESSION - 2


REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions, put them in a script

The script has to have ipython as its interpreter

    #!/usr/bin/ipython
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend: render to image files
    import matplotlib.pyplot as plt
    import pandas as pd
    ...
    task_list = [
        'CPU_Timeline_User_plus_Sys',
        'Network_Timeline_Tx_plus_Rxw',
        'DISK_Timeline_Read_Write_MBps',
        'Compare_metrics_bottleneck',
        'Compare_metrics_bottleneck_smoothed',
        'Webserver_metrics',
        # ... about 20 total tasks
    ]

    def bottleneck(job, mode='cpu', img_name=None, nodes='All',
                   alt_repo=None, nb=True, cat=None, arg=17, ifc='eth0',
                   smooth=None):
        # create a custom graph and put it under images folder
        ...

    def timeline(args, ...):
        # similar function
        ...

    def main():
        # parse command line args and call appropriate graphing function
        ...


SWAT POST RUN

SWAT post-job-completion script now calls our graphing script

‒ graph_command_line.ipy current_batch_num current_job_num

‒ The graphing script can also be run manually to create more fine-tuned graphs

‒ The same can be done from an IPython notebook
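
A command-line entry point of this shape could be sketched with the standard argparse module, as below. This is only an assumption about how such a script might be organized: the --graph and --mode options, the GRAPHERS table, and the simplified placeholder bodies of bottleneck and timeline are hypothetical and do not reproduce the actual SWAT graphing script.

```python
import argparse


def bottleneck(batch, job, mode="cpu"):
    """Placeholder: a real version would draw a bottleneck graph for the batch and job."""
    return "bottleneck graph for batch %s job %s (mode=%s)" % (batch, job, mode)


def timeline(batch, job, mode="cpu"):
    """Placeholder: a real version would draw a timeline graph for the batch and job."""
    return "timeline graph for batch %s job %s (mode=%s)" % (batch, job, mode)


# Hypothetical dispatch table from graph type to graphing function.
GRAPHERS = {"bottleneck": bottleneck, "timeline": timeline}


def main(argv=None):
    """Parse the command line and call the matching graphing function."""
    parser = argparse.ArgumentParser(description="Create graphs for one SWAT job")
    parser.add_argument("batch_num")
    parser.add_argument("job_num")
    parser.add_argument("--graph", choices=sorted(GRAPHERS), default="bottleneck")
    parser.add_argument("--mode", default="cpu")
    args = parser.parse_args(argv)
    return GRAPHERS[args.graph](args.batch_num, args.job_num, mode=args.mode)


print(main(["3", "01_00", "--graph", "timeline", "--mode", "disk"]))
```

Keeping the graphing functions importable and the argument parsing in main lets the same functions be called from a post-run hook, from a shell, or from a notebook cell.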


DATA IS BEAUTIFUL: CPU UTILIZATION FROM ONE RUN

DATA IS BEAUTIFUL - 2: CPU UTILIZATION DATA FROM TWO RUNS

DATA IS BEAUTIFUL - 3: NETWORK UTILIZATION

DATA IS BEAUTIFUL - 4: PERFORMANCE COUNTER DATA

DATA IS BEAUTIFUL: PER-CORE CPU UTILIZATION


LIMITATIONS

Images are static; no zooming

Zooming can be approximated by interactively plotting a chosen sub-range of the series

Example sub-range plot
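
Sub-range selection of this kind can be sketched with pandas time-based slicing. The series values, the timestamps, and the helper name zoom below are made up for illustration.

```python
import pandas as pd

# Hypothetical per-second CPU utilization series (percent) for one node.
index = pd.date_range("2013-08-08 18:48:06", periods=10, freq=pd.Timedelta(seconds=1))
cpu = pd.Series([5, 12, 48, 95, 97, 99, 96, 40, 10, 4], index=index, name="cpu_busy")


def zoom(series, start, end):
    """Return the part of a time-indexed series between start and end (both inclusive)."""
    return series.loc[start:end]


window = zoom(cpu, "2013-08-08 18:48:09", "2013-08-08 18:48:12")
print(window)
```

In a notebook, window.plot() would then redraw only the selected interval, which stands in for zooming on a static image.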


HOW IS THIS HELPING US?


BOTTLENECK ANALYSIS

Workloads utilize the system resources

Certain workloads are CPU bound, while others are disk, network, or memory bound

IPython helps explore system logs and create graphs showing resource utilization

‒ A resource that is near 100% utilization is the bottleneck

‒ If the system is not at a bottleneck, increase the load by reconfiguring the workload and repeat

‒ If at a bottleneck, try to add more resources and repeat

‒ Example scenarios in the following slides

Enables us to optimize/characterize systems for certain workloads
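
The near-100% rule of thumb can be expressed as a small check over per-resource utilization. The sketch below is an assumption-laden illustration: the sample numbers, the 90% threshold standing in for "near 100", and the function names are all hypothetical.

```python
# Hypothetical utilization samples (percent) per resource for one run.
samples = {
    "cpu": [35, 42, 40, 38],
    "disk": [96, 99, 98, 97],
    "network": [20, 25, 22, 18],
}


def find_bottleneck(utilization, threshold=90.0):
    """Return the busiest resource if its mean utilization reaches the threshold, else None."""
    means = {name: sum(values) / float(len(values)) for name, values in utilization.items()}
    busiest = max(means, key=means.get)
    return busiest if means[busiest] >= threshold else None


def next_action(utilization, threshold=90.0):
    """Map the rule of thumb onto a suggested next step."""
    resource = find_bottleneck(utilization, threshold)
    if resource is None:
        return "no bottleneck: increase load and repeat"
    return "bottleneck on %s: add more %s resources and repeat" % (resource, resource)


print(next_action(samples))
```

A mean over the whole run is the simplest possible summary; a real analysis would more likely look at the plotted timeline, since a resource can saturate only during one phase of a job.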


BOTTLENECK ANALYSIS: CPU

BOTTLENECK ANALYSIS: DISK

[Figure: BATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08.PNG]


WORKLOAD TUNING (WORK IN PROGRESS)

1. Analyze workload logs of past SWAT runs from an IPython notebook

2. Check resource utilization/performance of a run

3. Analyze the associated workload configuration and correlate

4. Create a new workload configuration based on observations and insights on tuning

5. Push the new config to the SWAT template library

6. Initiate a new run from SWAT using the new config

7. Repeat as needed to find the optimal configuration
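
The tuning steps form a loop, which might be sketched as below. Everything here is hypothetical: the tune function, the callables it takes, the toy scoring function, and the map_slots setting only illustrate the run, measure, reconfigure cycle and do not represent SWAT's actual interface.

```python
def tune(run_workload, propose_config, initial_config, max_rounds=5):
    """Repeat run -> measure -> reconfigure, keeping the best configuration seen."""
    best_config, best_score = None, None
    config = initial_config
    for _ in range(max_rounds):
        score = run_workload(config)  # run the workload and measure performance
        if best_score is None or score > best_score:
            best_config, best_score = config, score
        new_config = propose_config(config, score)  # derive a new configuration
        if new_config is None:  # nothing left to try
            break
        config = new_config  # use it for the next run
    return best_config, best_score


# Toy stand-ins: the score peaks at 8 map slots; each round doubles the slots.
def fake_run(config):
    return 100 - abs(config["map_slots"] - 8) * 10


def double_slots(config, score):
    slots = config["map_slots"] * 2
    return {"map_slots": slots} if slots <= 32 else None


print(tune(fake_run, double_slots, {"map_slots": 2}))
```

Passing the run and proposal steps in as functions keeps the loop independent of how a run is actually launched or how the next configuration is chosen.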


CONCLUSION

IPython notebook is a great tool for interactive data analysis

‒ For short exploratory sessions only, not for developing a large code base

‒ Once done exploring, put the code into scripts for reuse

Has its limitations; there are alternatives

‒ Tableau®: interactive, a la carte plots

    ‒ needs licensing

    ‒ no custom graphing, but the menu has lots of choices

‒ d3.js: needs coding

‒ OpenTSDB: time series charts updated dynamically

‒ etc.


REFERENCES

IPython tutorials

‒ PyCon 2013: "IPython in-depth: high-productivity interactive and parallel python"

    http://www.youtube.com/watch?v=bP8ydKBCZiY

‒ SciPy 2013: "IPython in Depth"

    http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook


QUESTIONS / DEMO


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds, Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.


Page 3: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL3

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)

SWATSoftware platform for automating creation deployment execution and data gathering of

synthetic compute workloads on clusters of arbitrary sizes

Allows deployment of workloads on Virtual Clusters (Amazon EC2) or physical in-house clusters (Seamicro server Bare hardware clusterhellip)

Supports benchmark workloads from CloudSuite and some research workloads like GraphLab HadoopCL etc

Gathers various system statistics during run and collects them along with workload logs in a batch folder which are stored in the UI box for later analysisvisualization

OVERVIEW

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL4

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)COMPONENTS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL5

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)

Front End

Houses the Control Box of the SWAT tool

Separate from the actual cluster that runs the workloads

Does not need the benchmark workloads to be installed

Manages the cluster if needed boots them with configuration options chosen by User

Stores logs generated for the runs for later analysis

Cluster nodes

Runs the actual workloads as directed by Front End

Needs workloads installed on each node

Can reside on cloud service providerrsquos data center (Amazon) or in a local internal cluster

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL6

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)FRONT END (UI)

Major steps in running a workload

Select nodes (instances)

Select Workload Container

Hadoop flavor

Other container (memcached)

Select Actual Workload

Basic hadoop jobs

Cloudsuite benchmarks Data Analytics (Mahout)

Data Serving (Cassandra)

Media Streaming (darwin)

Software Testing (Cloud9)

Web Search (nutch)

Web Serving (Oilo Faban)

GraphLab

McBlaster (memcached)

Start selected job in batch mode or standalone mode

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL7

SWAT WORKFLOW STEPS

1 Cluster selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL8

SWAT WORKFLOW STEPS

2 Workload Container Selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL9

SWAT WORKFLOW STEPS

3 Workload selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL10

SWAT WORKFLOW STEPS

4 Job initiation progress and termination

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL11

IPYTHON BASICS

What is Ipython

‒ Ipython stands for Interactive Python‒ A better python interpreter

‒ Interactive IDE for python (QTConsolehellip)

‒ Better web based front end for interactive analysis of scientific data (IPython notebooks)

‒ Also has parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ Install as all-in-one package in Windowsreg Enthought Canopy Anaconda ActivePython pythonxy

‒ In Linuxreg we need to install the components separately (assuming setuptools already installed)

‒easy_install ipython[zmqqtconsolenotebook]

‒easy_install pandas

‒easy_install matplotlib

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL12

IPYTHON INTERFACES

Starting ipython

ipython

opens up default shell like console

ipython qtconsole --pylab=inline

opens up graphical qt-based console

ipython notebook --pylab=inline

instantiates server and a browser window pointing to the server instance

Server dashboard points to notebooks existing in machine and ways to create new ones

Users can connect to this server instance remotely and collaborate by working on the same notebook

We use this feature for our SWAT log analysis and bottleneck finding experiments

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL13

IPYTHON NOTEBOOK DASHBOARD

Note

All three modes offer similar functionality

We use the notebook interface due to easy sharing for collaborating remotely with multiple users

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL14

IPYTHON NOTEBOOK EXAMPLE SESSION

Any python expression can be run at the console prompt

In [1] a=range(110)

In [2] a

Out[2] [1 2 3 4 5 6 7 8 9]

Also run basic shell commandsIn [3] pwd

Out [3] ucmonirapu13sumatra

In [4] cd

cmonir

Even capture the output of any shell command or script into python variable

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL15

A COMMON PATTERN IN OUR DATA ANALYSIS

Read Data from sources

bull CSV JSON Excel Raw log file DB connectionhellip

Munge data into data structure suitable for plotting

bull Get it into a DataFrame (pandas library)

Do final plotting

bull dfplot()

Do any sub-range plotting for further details

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL16

TYPICAL DATA ANALYSIS SESSION

import pandas as pd

import matplotlibpyplot as plt

helliphelliphellip

swat_archive = varwwwhtmlrepolsquo

helliphelliphellip

cd $swat_archiveJob_TerasortExperiment_Feb5job_00_00

helliphelliphellip

df1 = pdread_csv(vmstatcsv parse_dates=[[datetime]] usecols=[date time idle])

df1 = df1set_index(date_time)

df1plot()

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL17

TYPICAL DATA ANALYSIS SESSION - 2

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL18

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions put them in a script

Script has to have ipython as interpreter

usrbinipython

import matplotlib

matplotlibuse(Agg)

import matplotlibpyplot as plt

import pandas as pd

helliphelliphellip

task_list = [

CPU_Timeline_User_plus_Sys

Network_Timeline_Tx_plus_Rxw

DISK_Timeline_Read_Write_MBps

Compare_metrics_bottleneck

Compare_metrics_bottleneck_smoothed

Webserver_metrics

helliphelliphellip about 20 total tasks

]

def bottleneck( job mode=cpu img_name=None nodes=All

alt_repo=None nb=True cat=None arg=17 ifc=eth0

smooth=None)

create a custom graph and put it under images folder

hellip

def timeline ( argshelliphellip)

similar function

hellip

def main()

parse command line args and call appropriate graphing function


Page 4: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT): COMPONENTS

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)

Front End
‒ Houses the Control Box of the SWAT tool
‒ Separate from the actual cluster that runs the workloads
‒ Does not need the benchmark workloads to be installed
‒ Manages the cluster and, if needed, boots the nodes with configuration options chosen by the user
‒ Stores the logs generated by the runs for later analysis

Cluster nodes
‒ Run the actual workloads as directed by the Front End
‒ Need the workloads installed on each node
‒ Can reside in a cloud service provider's data center (Amazon) or in a local internal cluster

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT): FRONT END (UI)

Major steps in running a workload

1. Select nodes (instances)
2. Select Workload Container
   ‒ Hadoop flavor
   ‒ Other container (memcached)
3. Select Actual Workload
   ‒ Basic Hadoop jobs
   ‒ CloudSuite benchmarks: Data Analytics (Mahout), Data Serving (Cassandra), Media Streaming (Darwin), Software Testing (Cloud9), Web Search (Nutch), Web Serving (Olio, Faban)
   ‒ GraphLab
   ‒ McBlaster (memcached)
4. Start the selected job in batch mode or standalone mode

SWAT WORKFLOW STEPS

1. Cluster selection
2. Workload container selection
3. Workload selection
4. Job initiation, progress, and termination

IPYTHON BASICS

What is IPython?
‒ IPython stands for Interactive Python
‒ A better Python interpreter
‒ Interactive IDE for Python (QtConsole…)
‒ Better web-based front end for interactive analysis of scientific data (IPython notebooks)
‒ Also has a parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing
‒ On Windows®, install as an all-in-one package: Enthought Canopy, Anaconda, ActivePython, Python(x,y)
‒ On Linux®, the components need to be installed separately (assuming setuptools is already installed):
  easy_install ipython[zmq,qtconsole,notebook]
  easy_install pandas
  easy_install matplotlib

IPYTHON INTERFACES

Starting IPython

ipython
‒ opens the default shell-like console

ipython qtconsole --pylab=inline
‒ opens a graphical Qt-based console

ipython notebook --pylab=inline
‒ starts a server and opens a browser window pointing to the server instance
‒ The server dashboard lists the notebooks existing on the machine and offers ways to create new ones
‒ Users can connect to this server instance remotely and collaborate by working on the same notebook
‒ We use this feature for our SWAT log analysis and bottleneck-finding experiments

IPYTHON NOTEBOOK DASHBOARD

Note:
‒ All three modes offer similar functionality
‒ We use the notebook interface because it is easy to share and lets multiple users collaborate remotely

IPYTHON NOTEBOOK EXAMPLE SESSION

Any Python expression can be run at the console prompt

In [1]: a = range(1, 10)
In [2]: a
Out[2]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Basic shell commands can also be run

In [3]: pwd
Out[3]: u'c:\\monir\\apu13\\sumatra'
In [4]: cd
c:\monir

The output of any shell command or script can even be captured into a Python variable
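In an IPython session this capture is typically written as an assignment from a shell escape, for example `files = !ls`. Outside IPython, a rough equivalent can be sketched with the standard library. The helper name `capture_lines` is an assumption made for illustration, and the child Python process below only stands in for a real shell command.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command (given as a list of arguments) and return its stdout as a list of lines."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to str
        check=True,               # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.splitlines()


# The running Python interpreter is used as a portable stand-in for a shell command.
lines = capture_lines([sys.executable, "-c", "print('node01'); print('node02')"])
print(lines)
```

The result is an ordinary list of strings, so it can be filtered or parsed like any other Python data before being handed to the plotting steps.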

A COMMON PATTERN IN OUR DATA ANALYSIS

Read data from sources
• CSV, JSON, Excel, raw log file, DB connection…

Munge the data into a data structure suitable for plotting
• Get it into a DataFrame (pandas library)

Do the final plotting
• df.plot()

Do any sub-range plotting for further details

TYPICAL DATA ANALYSIS SESSION

import pandas as pd
import matplotlib.pyplot as plt
...
swat_archive = '/var/www/html/repo/'
...
cd $swat_archive/Job_TerasortExperiment_Feb5/job_00_00
...
df1 = pd.read_csv('vmstat.csv', parse_dates=[['date', 'time']], usecols=['date', 'time', 'idle'])
df1 = df1.set_index('date_time')
df1.plot()
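The session above can be tried without a SWAT archive by feeding pandas an in-memory CSV. The following is only a sketch under stated assumptions: the sample rows, the `load_idle` helper, and the derived `busy` column are invented for illustration, and the combined timestamp is built explicitly with `pd.to_datetime` rather than through the nested `parse_dates` list used on the slide. It also shows the sub-range selection that stands in for zooming.

```python
import io

import pandas as pd

# Hypothetical stand-in for vmstat.csv; the real SWAT column layout may differ.
SAMPLE_CSV = """date,time,idle
2013-02-05,10:00:00,95
2013-02-05,10:00:05,40
2013-02-05,10:00:10,5
2013-02-05,10:00:15,3
2013-02-05,10:00:20,60
"""


def load_idle(source):
    """Read the date/time/idle columns and index the rows by a combined timestamp."""
    df = pd.read_csv(source, usecols=["date", "time", "idle"])
    df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"])
    return df.set_index("date_time")[["idle"]]


df1 = load_idle(io.StringIO(SAMPLE_CSV))
df1["busy"] = 100 - df1["idle"]  # non-idle CPU percentage

# Sub-range "zoom": label slicing on a DatetimeIndex includes both endpoints.
zoom = df1.loc["2013-02-05 10:00:05":"2013-02-05 10:00:15"]
print(zoom)
# In a notebook, df1.plot() and zoom.plot() would draw the full and the zoomed series.
```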

TYPICAL DATA ANALYSIS SESSION - 2

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions, put them in a script. The script has to have ipython as its interpreter.

#!/usr/bin/ipython
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
...
task_list = [
    'CPU_Timeline_User_plus_Sys',
    'Network_Timeline_Tx_plus_Rxw',
    'DISK_Timeline_Read_Write_MBps',
    'Compare_metrics_bottleneck',
    'Compare_metrics_bottleneck_smoothed',
    'Webserver_metrics',
    # ... about 20 total tasks
]

def bottleneck(job, mode='cpu', img_name=None, nodes='All',
               alt_repo=None, nb=True, cat=None, arg=17, ifc='eth0',
               smooth=None):
    # create a custom graph and put it under the images folder
    ...

def timeline(args ...):
    # similar function
    ...

def main():
    # parse command line args and call the appropriate graphing function
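The slide leaves `main()` as a one-line description. One way such a dispatcher could look is sketched below with `argparse`. Everything beyond the two positional arguments (batch and job) is an assumption: the `--graph`, `--mode` and `--smooth` flags, the `GRAPH_FUNCTIONS` registry, and the placeholder function bodies are hypothetical and are not taken from the actual SWAT script.

```python
import argparse


def bottleneck(job, mode="cpu", smooth=None):
    """Placeholder: the real function renders a graph into the images folder."""
    return "bottleneck job={} mode={} smooth={}".format(job, mode, smooth)


def timeline(job, mode="cpu", smooth=None):
    """Placeholder for the similar timeline graphing function."""
    return "timeline job={} mode={} smooth={}".format(job, mode, smooth)


# Hypothetical registry mapping a command-line name to a graphing function.
GRAPH_FUNCTIONS = {"bottleneck": bottleneck, "timeline": timeline}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Generate graphs for one SWAT job")
    parser.add_argument("batch", help="batch identifier")
    parser.add_argument("job", help="job identifier within the batch")
    parser.add_argument("--graph", choices=sorted(GRAPH_FUNCTIONS), default="bottleneck")
    parser.add_argument("--mode", default="cpu")
    parser.add_argument("--smooth", type=int, default=None)
    args = parser.parse_args(argv)

    job_id = "{}/{}".format(args.batch, args.job)
    return GRAPH_FUNCTIONS[args.graph](job_id, mode=args.mode, smooth=args.smooth)


# Passing an explicit argument list keeps the example independent of sys.argv.
print(main(["batch_01", "job_00_00", "--mode", "disk", "--smooth", "5"]))
```

Keeping the graphing functions in a dictionary means a new task only needs a new entry, and the same functions stay importable from a notebook.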

SWAT POST RUN

The SWAT post-job-completion script now calls our graphing script
‒ graph_command_line.ipy current_batch_num current_job_num
‒ The graphing script can also be run manually to create more fine-tuned graphs
‒ The same can be done from an IPython notebook

DATA IS BEAUTIFUL: CPU UTILIZATION FROM ONE RUN

DATA IS BEAUTIFUL - 2: CPU UTILIZATION DATA FROM TWO RUNS

DATA IS BEAUTIFUL - 3: NETWORK UTILIZATION

DATA IS BEAUTIFUL - 4: PERFORMANCE COUNTER DATA

DATA IS BEAUTIFUL: PER-CORE CPU UTILIZATION

LIMITATIONS

‒ Images are static; there is no zooming
‒ Zooming can be approximated by interactively plotting a chosen sub-range of the series
‒ Example: sub-range plot

HOW IS THIS HELPING US?

BOTTLENECK ANALYSIS

Workloads utilize the system resources

Certain workloads are CPU bound, while others are disk, network, or memory bound

IPython helps explore system logs and create graphs showing resource utilization
‒ The resource that is near 100% utilization is the bottleneck
‒ If the system is not at a bottleneck, increase the load by reconfiguring the workload and repeat
‒ If it is at a bottleneck, try adding more resources and repeat
‒ Example scenarios are in the following slides

Enables us to optimize/characterize systems for certain workloads
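The decision rule described here (the resource closest to 100% utilization is the bottleneck, otherwise increase the load) can be sketched in a few lines. This is an illustration only: the function names, the 90% threshold standing in for "near 100", the use of the mean over a run, and the sample numbers are all assumptions rather than part of the SWAT tooling.

```python
from statistics import mean


def find_bottleneck(utilization, threshold=90.0):
    """Return the resource with the highest mean utilization if it reaches the threshold, else None."""
    averages = {name: mean(samples) for name, samples in utilization.items() if samples}
    if not averages:
        return None
    busiest = max(averages, key=averages.get)
    return busiest if averages[busiest] >= threshold else None


def next_step(utilization, threshold=90.0):
    """Translate the bottleneck check into the action suggested on the slide."""
    resource = find_bottleneck(utilization, threshold)
    if resource is None:
        return "increase load and repeat"
    return "add more {} resources and repeat".format(resource)


# Made-up utilization percentages sampled during one run.
samples = {
    "cpu": [97.0, 99.0, 98.0],
    "disk": [35.0, 40.0, 42.0],
    "network": [12.0, 15.0, 11.0],
}
print(find_bottleneck(samples))
print(next_step(samples))
```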

BOTTLENECK ANALYSIS: CPU

BOTTLENECK ANALYSIS: DISK

Image: BATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08.PNG (stored under REPOS/BATCH_THU_08_AUG_2013_18_48_06/JOB_01_00/IMAGES)

WORKLOAD TUNING (WORK IN PROGRESS)

1. Analyze workload logs of past SWAT runs from an IPython notebook
2. Check the resource utilization/performance of a run
3. Analyze the associated workload configuration and correlate
4. Create a new workload configuration based on observations and insights on tuning
5. Push the new config to the SWAT template library
6. Initiate a new run from SWAT using the new config
7. Repeat as needed to find the optimal configuration
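The tuning steps form a feedback loop, which can be sketched in a very reduced form as below. The sketch varies only a single numeric load parameter, and `run_workload` is a hypothetical callable standing in for "push the config and start a SWAT run"; the `fake_run` model, the doubling step, and the 90% threshold are assumptions chosen purely so the example runs on its own.

```python
def tune(run_workload, initial_load, max_runs=10, threshold=90.0, step=2):
    """Raise the load until some resource saturates.

    Returns (last load tried, saturated resource or None, history of runs).
    """
    load = initial_load
    history = []
    for _ in range(max_runs):
        utilization = run_workload(load)  # {resource name: percent} for this run
        history.append((load, utilization))
        resource, percent = max(utilization.items(), key=lambda item: item[1])
        if percent >= threshold:
            return load, resource, history
        load *= step  # new configuration for the next run
    last_load = history[-1][0] if history else initial_load
    return last_load, None, history


def fake_run(load):
    """Stand-in for a real run: utilization grows linearly with load, capped at 100."""
    return {"cpu": min(100.0, 12.0 * load), "disk": min(100.0, 5.0 * load)}


load, resource, history = tune(fake_run, initial_load=1)
print(load, resource, len(history))
```

A real loop would compare complete workload configurations rather than one number, but the control flow (run, inspect utilization, adjust, repeat) is the same.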

CONCLUSION

IPython notebook is a great tool for interactive data analysis
‒ Best for short exploratory sessions, not for writing a huge code base
‒ Once done exploring, put the code into scripts for re-use

It has its limitations, and there are alternatives
‒ Tableau®: interactive, à la carte plots; needs licensing; no custom graphing, but the menu has lots of choices
‒ d3.js: needs coding
‒ OpenTSDB: time series charts updated dynamically
‒ etc.

REFERENCES

IPython tutorials
‒ PyCon 2013: "IPython in-depth: high-productivity interactive and parallel python"
  http://www.youtube.com/watch?v=bP8ydKBCZiY
‒ SciPy 2013: "IPython in Depth"
  http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org
‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS / DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds, Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 5: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL5

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)

Front End

Houses the Control Box of the SWAT tool

Separate from the actual cluster that runs the workloads

Does not need the benchmark workloads to be installed

Manages the cluster if needed boots them with configuration options chosen by User

Stores logs generated for the runs for later analysis

Cluster nodes

Runs the actual workloads as directed by Front End

Needs workloads installed on each node

Can reside on cloud service providerrsquos data center (Amazon) or in a local internal cluster

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL6

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)FRONT END (UI)

Major steps in running a workload

Select nodes (instances)

Select Workload Container

Hadoop flavor

Other container (memcached)

Select Actual Workload

Basic hadoop jobs

Cloudsuite benchmarks Data Analytics (Mahout)

Data Serving (Cassandra)

Media Streaming (darwin)

Software Testing (Cloud9)

Web Search (nutch)

Web Serving (Oilo Faban)

GraphLab

McBlaster (memcached)

Start selected job in batch mode or standalone mode

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL7

SWAT WORKFLOW STEPS

1 Cluster selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL8

SWAT WORKFLOW STEPS

2 Workload Container Selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL9

SWAT WORKFLOW STEPS

3 Workload selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL10

SWAT WORKFLOW STEPS

4 Job initiation progress and termination

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL11

IPYTHON BASICS

What is Ipython

‒ Ipython stands for Interactive Python‒ A better python interpreter

‒ Interactive IDE for python (QTConsolehellip)

‒ Better web based front end for interactive analysis of scientific data (IPython notebooks)

‒ Also has parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ Install as all-in-one package in Windowsreg Enthought Canopy Anaconda ActivePython pythonxy

‒ In Linuxreg we need to install the components separately (assuming setuptools already installed)

‒easy_install ipython[zmqqtconsolenotebook]

‒easy_install pandas

‒easy_install matplotlib

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL12

IPYTHON INTERFACES

Starting ipython

ipython

opens up default shell like console

ipython qtconsole --pylab=inline

opens up graphical qt-based console

ipython notebook --pylab=inline

instantiates server and a browser window pointing to the server instance

Server dashboard points to notebooks existing in machine and ways to create new ones

Users can connect to this server instance remotely and collaborate by working on the same notebook

We use this feature for our SWAT log analysis and bottleneck finding experiments

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL13

IPYTHON NOTEBOOK DASHBOARD

Note

All three modes offer similar functionality

We use the notebook interface due to easy sharing for collaborating remotely with multiple users

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL14

IPYTHON NOTEBOOK EXAMPLE SESSION

Any python expression can be run at the console prompt

In [1] a=range(110)

In [2] a

Out[2] [1 2 3 4 5 6 7 8 9]

Also run basic shell commandsIn [3] pwd

Out [3] ucmonirapu13sumatra

In [4] cd

cmonir

Even capture the output of any shell command or script into python variable

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL15

A COMMON PATTERN IN OUR DATA ANALYSIS

Read Data from sources

bull CSV JSON Excel Raw log file DB connectionhellip

Munge data into data structure suitable for plotting

bull Get it into a DataFrame (pandas library)

Do final plotting

bull dfplot()

Do any sub-range plotting for further details

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL16

TYPICAL DATA ANALYSIS SESSION

import pandas as pd

import matplotlibpyplot as plt

helliphelliphellip

swat_archive = varwwwhtmlrepolsquo

helliphelliphellip

cd $swat_archiveJob_TerasortExperiment_Feb5job_00_00

helliphelliphellip

df1 = pdread_csv(vmstatcsv parse_dates=[[datetime]] usecols=[date time idle])

df1 = df1set_index(date_time)

df1plot()

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL17

TYPICAL DATA ANALYSIS SESSION - 2

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL18

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions put them in a script

Script has to have ipython as interpreter

usrbinipython

import matplotlib

matplotlibuse(Agg)

import matplotlibpyplot as plt

import pandas as pd

helliphelliphellip

task_list = [

CPU_Timeline_User_plus_Sys

Network_Timeline_Tx_plus_Rxw

DISK_Timeline_Read_Write_MBps

Compare_metrics_bottleneck

Compare_metrics_bottleneck_smoothed

Webserver_metrics

helliphelliphellip about 20 total tasks

]

def bottleneck( job mode=cpu img_name=None nodes=All

alt_repo=None nb=True cat=None arg=17 ifc=eth0

smooth=None)

create a custom graph and put it under images folder

hellip

def timeline ( argshelliphellip)

similar function

hellip

def main()

parse command line args and call appropriate graphing function

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL19

SWAT POST RUN Swat post job completion script now calls our graphing script

‒ graph_command_lineipy current_batch_num current_job_num

‒ Graphing script can also be run manually to create more fine tuned graphs

‒ Same can be done from Ipython notebook

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL20

DATA IS BEAUTIFULCPU UTILIZATION FROM ONE RUN

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL21

DATA IS BEAUTIFUL - 2CPU UTILIZATION DATA FROM TWO RUNS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL22

DATA IS BEAUTIFUL - 3NETWORK UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL23

DATA IS BEAUTIFUL - 4

PERFORMANCE COUNTER DATA

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL24

DATA IS BEAUTIFUL PER-CORE CPU UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL25

LIMITATIONS

Images are static no zooming

Zooming functionality can be done by interactively asking to plot certain sub-range of the series

Example sub-range plot

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL26

HOW IS THIS HELPING US

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL27

Workloads utilize the system resources

Certain workloads are CPU bound while certain others are DiskNetworkMemory bound

Ipython helps explore system logs and create nice graphs showing resource utilization‒Resource that is near 100 utilization is the bottleneck

‒ If system is not at bottleneck increase load by configuring workload and repeat

‒ If at bottleneck try to add more resources and repeat

‒Example scenarios in following slides

Enables us to optimizecharacterize systems for certain workloads

BOTTLENECK ANALYSIS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL28

BOTTLENECK ANALYSIS CPU

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL29

BOTTLENECK ANALYSIS DISKHTTP1023612134REPOSBATCH_THU_08_AUG_2013_18_48_06JOB_01_00IMAGESBATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08PNG

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL30

WORKLOAD TUNING

1 Analyze workload logs of past SWAT runs from IPython notebook

2 Check resource utilizationperformance of a run

3 Analyze associated workload configuration and correlate

4 Create new workload configuration based on observation and insights on tuning

5 Push new config to SWAT template library

6 Initiate new run from SWAT using the new config

7 Repeat as needed to find optimal configuration

(WORK IN PROGRESS)

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL31

CONCLUSION

Ipython notebook is a great tool for interactive data analysis

For short exploratory sessions only not to code huge code base

Once done exploring - put into scripts for re-use

Has its limitations there are alternativesTableaureg interactive a la carte plots

needs licensing

no custom graphing but menu has lots of choices

d3js needs coding

OpenTSDB time series charts updated dynamically

etc

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL32

REFERENCES

Ipython tutorials

Pycon 2013 ldquoIPython in-depth high-productivity interactive and parallel pythonrdquo

httpwwwyoutubecomwatchv=bP8ydKBCZiY

SciPy2013 ldquoIPython in Depthrdquo

httpwwwyoutubecomwatchv=xe_ATRmw0KM

Ipython website httpipythonorg

Check the videos link for more tutorials

Support Stackoverflow tag ipython ipython-notebook

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL33

QUESTIONS DEMO

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL34

DISCLAIMER amp ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies omissions and typographical errors

The information contained herein is subject to change and may be rendered inaccurate for many reasons including but not limited to product and roadmap changes component and motherboard version changes new model andor product releases product differences between differing manufacturers software changes BIOS flashes firmware upgrades or the like AMD assumes no obligation to update or otherwise correct or revise this information However AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT INDIRECT SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATIONCONTAINED HEREIN EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES

ATTRIBUTION

copy 2013 Advanced Micro Devices Inc All rights reserved AMD the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices Inc in the United States andor other jurisdictions Linux is a registered trademark of Linus Torvalds Windows is a registered trademark of Microsoft Corporation and Tableau is a registered trademark of Tableau Software Other names are for informational purposes only and may be trademarks of their respective owners

Page 6: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL6

SYNTHETIC WORKLOAD ANALYSIS TOOLKIT (SWAT)FRONT END (UI)

Major steps in running a workload

Select nodes (instances)

Select Workload Container

Hadoop flavor

Other container (memcached)

Select Actual Workload

Basic hadoop jobs

Cloudsuite benchmarks Data Analytics (Mahout)

Data Serving (Cassandra)

Media Streaming (darwin)

Software Testing (Cloud9)

Web Search (nutch)

Web Serving (Oilo Faban)

GraphLab

McBlaster (memcached)

Start selected job in batch mode or standalone mode

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL7

SWAT WORKFLOW STEPS

1 Cluster selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL8

SWAT WORKFLOW STEPS

2 Workload Container Selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL9

SWAT WORKFLOW STEPS

3 Workload selection

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL10

SWAT WORKFLOW STEPS

4 Job initiation progress and termination

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL11

IPYTHON BASICS

What is Ipython

‒ Ipython stands for Interactive Python‒ A better python interpreter

‒ Interactive IDE for python (QTConsolehellip)

‒ Better web based front end for interactive analysis of scientific data (IPython notebooks)

‒ Also has parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ Install as all-in-one package in Windowsreg Enthought Canopy Anaconda ActivePython pythonxy

‒ In Linuxreg we need to install the components separately (assuming setuptools already installed)

‒easy_install ipython[zmqqtconsolenotebook]

‒easy_install pandas

‒easy_install matplotlib

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL12

IPYTHON INTERFACES

Starting ipython

ipython

opens up default shell like console

ipython qtconsole --pylab=inline

opens up graphical qt-based console

ipython notebook --pylab=inline

instantiates server and a browser window pointing to the server instance

Server dashboard points to notebooks existing in machine and ways to create new ones

Users can connect to this server instance remotely and collaborate by working on the same notebook

We use this feature for our SWAT log analysis and bottleneck finding experiments

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL13

IPYTHON NOTEBOOK DASHBOARD

Note

All three modes offer similar functionality

We use the notebook interface due to easy sharing for collaborating remotely with multiple users

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL14

IPYTHON NOTEBOOK EXAMPLE SESSION

Any python expression can be run at the console prompt

In [1] a=range(110)

In [2] a

Out[2] [1 2 3 4 5 6 7 8 9]

Also run basic shell commandsIn [3] pwd

Out [3] ucmonirapu13sumatra

In [4] cd

cmonir

Even capture the output of any shell command or script into python variable

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL15

A COMMON PATTERN IN OUR DATA ANALYSIS

Read Data from sources

bull CSV JSON Excel Raw log file DB connectionhellip

Munge data into data structure suitable for plotting

bull Get it into a DataFrame (pandas library)

Do final plotting

bull dfplot()

Do any sub-range plotting for further details

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL16

TYPICAL DATA ANALYSIS SESSION

import pandas as pd

import matplotlibpyplot as plt

helliphelliphellip

swat_archive = varwwwhtmlrepolsquo

helliphelliphellip

cd $swat_archiveJob_TerasortExperiment_Feb5job_00_00

helliphelliphellip

df1 = pdread_csv(vmstatcsv parse_dates=[[datetime]] usecols=[date time idle])

df1 = df1set_index(date_time)

df1plot()

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL17

TYPICAL DATA ANALYSIS SESSION - 2

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL18

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions put them in a script

Script has to have ipython as interpreter

usrbinipython

import matplotlib

matplotlibuse(Agg)

import matplotlibpyplot as plt

import pandas as pd

helliphelliphellip

task_list = [

CPU_Timeline_User_plus_Sys

Network_Timeline_Tx_plus_Rxw

DISK_Timeline_Read_Write_MBps

Compare_metrics_bottleneck

Compare_metrics_bottleneck_smoothed

Webserver_metrics

helliphelliphellip about 20 total tasks

]

def bottleneck( job mode=cpu img_name=None nodes=All

alt_repo=None nb=True cat=None arg=17 ifc=eth0

smooth=None)

create a custom graph and put it under images folder

hellip

def timeline ( argshelliphellip)

similar function

hellip

def main()

parse command line args and call appropriate graphing function

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL19

SWAT POST RUN Swat post job completion script now calls our graphing script

‒ graph_command_lineipy current_batch_num current_job_num

‒ Graphing script can also be run manually to create more fine tuned graphs

‒ Same can be done from Ipython notebook

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL20

DATA IS BEAUTIFULCPU UTILIZATION FROM ONE RUN

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL21

DATA IS BEAUTIFUL - 2CPU UTILIZATION DATA FROM TWO RUNS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL22

DATA IS BEAUTIFUL - 3NETWORK UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL23

DATA IS BEAUTIFUL - 4

PERFORMANCE COUNTER DATA

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL24

DATA IS BEAUTIFUL PER-CORE CPU UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL25

LIMITATIONS

Images are static no zooming

Zooming functionality can be done by interactively asking to plot certain sub-range of the series

Example sub-range plot

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL26

HOW IS THIS HELPING US

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL27

Workloads utilize the system resources

Certain workloads are CPU bound while certain others are DiskNetworkMemory bound

Ipython helps explore system logs and create nice graphs showing resource utilization‒Resource that is near 100 utilization is the bottleneck

‒ If system is not at bottleneck increase load by configuring workload and repeat

‒ If at bottleneck try to add more resources and repeat

‒Example scenarios in following slides

Enables us to optimizecharacterize systems for certain workloads

BOTTLENECK ANALYSIS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL28

BOTTLENECK ANALYSIS CPU

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL29

BOTTLENECK ANALYSIS DISKHTTP1023612134REPOSBATCH_THU_08_AUG_2013_18_48_06JOB_01_00IMAGESBATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08PNG

WORKLOAD TUNING (WORK IN PROGRESS)

1. Analyze workload logs of past SWAT runs from an IPython notebook

2. Check the resource utilization/performance of a run

3. Analyze the associated workload configuration and correlate

4. Create a new workload configuration based on observations and tuning insights

5. Push the new config to the SWAT template library

6. Initiate a new run from SWAT using the new config

7. Repeat as needed to find the optimal configuration

CONCLUSION

IPython notebook is a great tool for interactive data analysis

‒ Best for short exploratory sessions, not for writing a large code base

‒ Once done exploring, put the code into scripts for re-use

It has its limitations, and there are alternatives:

‒ Tableau®: interactive, a la carte plots; needs licensing; no custom graphing, but the menu has lots of choices

‒ d3.js: needs coding

‒ OpenTSDB: time series charts updated dynamically

‒ etc.

REFERENCES

IPython tutorials

‒ PyCon 2013, "IPython in-depth: high-productivity interactive and parallel python"
  http://www.youtube.com/watch?v=bP8ydKBCZiY

‒ SciPy 2013, "IPython in Depth"
  http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS / DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds, Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 7: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

SWAT WORKFLOW STEPS

1. Cluster selection

2. Workload container selection

3. Workload selection

4. Job initiation, progress, and termination

IPYTHON BASICS

What is IPython?

‒ IPython stands for Interactive Python

‒ A better Python interpreter

‒ An interactive IDE for Python (QtConsole…)

‒ A better web-based front end for interactive analysis of scientific data (IPython notebooks)

‒ Also has a parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ On Windows®, install an all-in-one package: Enthought Canopy, Anaconda, ActivePython, Python(x,y)

‒ On Linux®, install the components separately (assuming setuptools is already installed):

‒ easy_install ipython[zmq,qtconsole,notebook]

‒ easy_install pandas

‒ easy_install matplotlib
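After installing, a quick way to confirm that the three components are importable is sketched below. The helper name `missing_packages` is hypothetical; the check relies only on the standard library and simply looks each package up on the import path without importing it.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means IPython, pandas and matplotlib are all installed.
print(missing_packages(["IPython", "pandas", "matplotlib"]))
```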

IPYTHON INTERFACES

Starting IPython

ipython
  opens the default shell-like console

ipython qtconsole --pylab=inline
  opens a graphical Qt-based console

ipython notebook --pylab=inline
  starts a server and opens a browser window pointing to the server instance

The server dashboard lists the notebooks that exist on the machine and offers ways to create new ones

Users can connect to this server instance remotely and collaborate by working on the same notebook

We use this feature for our SWAT log analysis and bottleneck-finding experiments

IPYTHON NOTEBOOK DASHBOARD

Note:

All three modes offer similar functionality

We use the notebook interface because it is easy to share and lets multiple users collaborate remotely

IPYTHON NOTEBOOK EXAMPLE SESSION

Any Python expression can be run at the console prompt:

In [1]: a = range(1, 10)
In [2]: a
Out[2]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Basic shell commands also work:

In [3]: pwd
Out[3]: u'c:\\monir\\apu13\\sumatra'
In [4]: cd
c:\monir

You can even capture the output of any shell command or script into a Python variable
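Inside IPython, capturing shell output is written as an assignment from a `!` command, for example `files = !ls`. For scripts that must also run under plain Python, a roughly equivalent sketch uses the standard `subprocess` module. The helper name `capture` and the child command are made up for illustration.

```python
import subprocess
import sys


def capture(cmd):
    """Run a command (given as an argument list) and return its stdout as a list of lines."""
    out = subprocess.check_output(cmd, universal_newlines=True)
    return out.splitlines()


# Use the current interpreter as a portable stand-in for a shell command.
lines = capture([sys.executable, "-c", "print('hello from a child process')"])
```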

A COMMON PATTERN IN OUR DATA ANALYSIS

Read data from sources

• CSV, JSON, Excel, raw log files, DB connections…

Munge the data into a data structure suitable for plotting

• Get it into a DataFrame (pandas library)

Do the final plotting

• df.plot()

Do any sub-range plotting for further detail
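The four steps above can be sketched end to end on a tiny in-memory CSV. Everything specific here is an assumption for illustration: the column names `timestamp`, `user` and `sys`, the sample values, and the derived `user_plus_sys` column are not taken from an actual SWAT log.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

RAW = """timestamp,user,sys
2013-08-08 18:48:00,20,5
2013-08-08 18:48:10,55,10
2013-08-08 18:48:20,70,15
2013-08-08 18:48:30,30,5
"""

# 1. Read data from a source (an in-memory CSV stands in for a log file).
df = pd.read_csv(io.StringIO(RAW), parse_dates=["timestamp"])

# 2. Munge: index by time and derive the column to plot.
df = df.set_index("timestamp")
df["user_plus_sys"] = df["user"] + df["sys"]

# 3. Final plot, rendered to an in-memory PNG (swap in a file name to keep it).
fig, ax = plt.subplots()
df.plot(ax=ax)
fig.savefig(io.BytesIO(), format="png")
plt.close(fig)

# 4. Sub-range for further detail: label-based slice on the time index.
detail = df.loc["2013-08-08 18:48:10":"2013-08-08 18:48:20"]
```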

TYPICAL DATA ANALYSIS SESSION

import pandas as pd
import matplotlib.pyplot as plt
…
swat_archive = '/var/www/html/repo'
…
cd $swat_archive/Job_TerasortExperiment_Feb5/job_00_00
…
df1 = pd.read_csv('vmstat.csv', parse_dates=[['date', 'time']], usecols=['date', 'time', 'idle'])
df1 = df1.set_index('date_time')
df1.plot()
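The session plots the vmstat `idle` column, while a utilization view is often easier to read. The sketch below derives a busy percentage as 100 minus idle. The sample rows are invented, and the timestamp is built with `pd.to_datetime` because recent pandas releases deprecate the nested-list form of `parse_dates`.

```python
import io

import pandas as pd

# Hypothetical excerpt with the same three columns the session reads.
SAMPLE = """date,time,idle
2013-02-05,10:00:00,90
2013-02-05,10:00:05,40
2013-02-05,10:00:10,10
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Combine the date and time columns into one timestamp and index by it.
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"])
df = df.set_index("date_time")[["idle"]]

# CPU busy percentage is whatever is not idle.
df["busy"] = 100 - df["idle"]
peak = df["busy"].idxmax()  # timestamp of the busiest sample
```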


TYPICAL DATA ANALYSIS SESSION - 2

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions, put them in a script

The script has to have ipython as its interpreter:

#!/usr/bin/ipython
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, selected before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd
# …
task_list = [
    'CPU_Timeline_User_plus_Sys',
    'Network_Timeline_Tx_plus_Rxw',
    'DISK_Timeline_Read_Write_MBps',
    'Compare_metrics_bottleneck',
    'Compare_metrics_bottleneck_smoothed',
    'Webserver_metrics',
    # … about 20 total tasks
]

def bottleneck(job, mode='cpu', img_name=None, nodes='All',
               alt_repo=None, nb=True, cat=None, arg=17, ifc='eth0',
               smooth=None):
    # create a custom graph and put it under the images folder
    ...

def timeline(*args, **kwargs):  # arguments elided in the original
    # similar function
    ...

def main():
    # parse command-line args and call the appropriate graphing function
    ...
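The body of `main()` is not shown on the slide. One way such a dispatcher could look is sketched below with `argparse`. The task names, the stub graphing functions, and the `--task` option are hypothetical stand-ins, not the actual SWAT script interface.

```python
import argparse


def cpu_timeline(batch, job):
    """Stub standing in for a real graphing function."""
    return "cpu timeline for batch %s job %s" % (batch, job)


def disk_timeline(batch, job):
    """Stub standing in for a real graphing function."""
    return "disk timeline for batch %s job %s" % (batch, job)


# Map task names accepted on the command line to graphing functions.
TASKS = {"cpu": cpu_timeline, "disk": disk_timeline}


def main(argv=None):
    """Parse command-line args and call the matching graphing function."""
    parser = argparse.ArgumentParser(description="Dispatch a graphing task")
    parser.add_argument("batch", help="batch number")
    parser.add_argument("job", help="job number")
    parser.add_argument("--task", choices=sorted(TASKS), default="cpu")
    args = parser.parse_args(argv)
    return TASKS[args.task](args.batch, args.job)


result = main(["12", "3", "--task", "disk"])
```

Passing `argv` explicitly keeps the function callable from a notebook cell as well as from the command line, where `main()` would fall back to `sys.argv`.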




Page 10: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL10

SWAT WORKFLOW STEPS

4 Job initiation progress and termination

IPYTHON BASICS

What is IPython?

‒ IPython stands for Interactive Python
‒ A better Python interpreter
‒ Interactive IDE for Python (QtConsole, …)
‒ Better web-based front end for interactive analysis of scientific data (IPython notebooks)
‒ Also has a parallel execution engine for running workloads on a cluster (not as full-featured as SWAT)

Installing

‒ On Windows®, install as an all-in-one package: Enthought Canopy, Anaconda, ActivePython, Python(x,y)
‒ On Linux®, we need to install the components separately (assuming setuptools is already installed):
    easy_install ipython[zmq,qtconsole,notebook]
    easy_install pandas
    easy_install matplotlib
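After installing, a quick check that the three packages can be found may save time later. The following is a minimal sketch, not part of the original slides; the helper name `missing_packages` is made up for illustration.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ["IPython", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the subset of `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty list means every package was found; anything listed needs to be installed before the notebook examples will run.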

IPYTHON INTERFACES

Starting IPython

‒ ipython
  opens the default shell-like console
‒ ipython qtconsole --pylab=inline
  opens a graphical Qt-based console
‒ ipython notebook --pylab=inline
  starts a server and opens a browser window pointing to the server instance

The server dashboard lists the notebooks that exist on the machine and offers ways to create new ones.

Users can connect to this server instance remotely and collaborate by working on the same notebook.

We use this feature for our SWAT log analysis and bottleneck-finding experiments.

IPYTHON NOTEBOOK DASHBOARD

Note:

‒ All three modes offer similar functionality
‒ We use the notebook interface because it makes sharing easy when collaborating remotely with multiple users

IPYTHON NOTEBOOK EXAMPLE SESSION

Any Python expression can be run at the console prompt:

In [1]: a = range(1, 10)
In [2]: a
Out[2]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Basic shell commands can be run too:

In [3]: pwd
Out[3]: u'c:\\monir\\apu13\\sumatra'
In [4]: cd
c:\monir

You can even capture the output of any shell command or script in a Python variable.
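Inside IPython, the `lines = !some_command` form captures a command's output as a list of lines. Outside IPython, the standard library can do roughly the same. The sketch below is an illustration only: the helper name `capture_lines` and the sample command are assumptions, not taken from the slides.

```python
import subprocess
import sys


def capture_lines(command):
    """Run `command` (a list of arguments) and return its stdout as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Illustrative command: the current interpreter is used so the sketch runs anywhere.
lines = capture_lines([sys.executable, "-c", "print('node01'); print('node02')"])
print(lines)
```

Once the output is in a Python list, it can be filtered, parsed, or loaded into a DataFrame like any other data.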

A COMMON PATTERN IN OUR DATA ANALYSIS

Read data from sources
• CSV, JSON, Excel, raw log file, DB connection, …

Munge data into a data structure suitable for plotting
• Get it into a DataFrame (pandas library)

Do final plotting
• df.plot()

Do any sub-range plotting for further details
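The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the CSV content is made-up sample data in a vmstat-like shape, and the column names, file names, and time window are hypothetical rather than taken from real SWAT logs.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import pandas as pd

# Made-up sample standing in for a real log file.
RAW_LOG = """date,time,idle
2013-02-05,10:00:00,95
2013-02-05,10:00:05,40
2013-02-05,10:00:10,12
2013-02-05,10:00:15,8
2013-02-05,10:00:20,70
"""

# 1. Read data from a source (an in-memory CSV here).
df = pd.read_csv(io.StringIO(RAW_LOG))

# 2. Munge: build a timestamp index and derive a "busy" column from "idle".
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])
df = df.set_index("timestamp")[["idle"]]
df["busy"] = 100 - df["idle"]

# 3. Final plotting, saved as an image.
out_dir = tempfile.mkdtemp()
ax = df.plot(title="CPU idle vs. busy (%)")
ax.figure.savefig(os.path.join(out_dir, "cpu_overview.png"))

# 4. Sub-range plotting: slice the time index, then plot only that window.
window = df.loc["2013-02-05 10:00:05":"2013-02-05 10:00:15"]
ax = window.plot(title="CPU idle vs. busy (%), sub-range")
ax.figure.savefig(os.path.join(out_dir, "cpu_window.png"))
```

Slicing the time-indexed DataFrame before plotting is what stands in for zooming, since the rendered images themselves are static.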

TYPICAL DATA ANALYSIS SESSION

import pandas as pd
import matplotlib.pyplot as plt
………
swat_archive = '/var/www/html/repo'
………
cd $swat_archive/Job_TerasortExperiment_Feb5/job_00_00
………
df1 = pd.read_csv('vmstat.csv', parse_dates=[['date', 'time']], usecols=['date', 'time', 'idle'])
df1 = df1.set_index('date_time')
df1.plot()

TYPICAL DATA ANALYSIS SESSION - 2

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions, put them in a script.

The script has to have ipython as its interpreter:

#!/usr/bin/ipython
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
………
task_list = [
    'CPU_Timeline_User_plus_Sys',
    'Network_Timeline_Tx_plus_Rxw',
    'DISK_Timeline_Read_Write_MBps',
    'Compare_metrics_bottleneck',
    'Compare_metrics_bottleneck_smoothed',
    'Webserver_metrics',
    # ……… about 20 total tasks
]

def bottleneck(job, mode='cpu', img_name=None, nodes='All',
               alt_repo=None, nb=True, cat=None, arg=17, ifc='eth0',
               smooth=None):
    # create a custom graph and put it under the images folder
    …

def timeline(args……):
    # similar function
    …

def main():
    # parse command line args and call the appropriate graphing function
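The slides leave the body of `main()` out. One possible shape for it, using the standard `argparse` module, is sketched below. Everything here is an assumption for illustration: the option names, the simplified `bottleneck` and `timeline` signatures, and the returned strings are hypothetical stand-ins, not the actual SWAT graphing script.

```python
import argparse


# Hypothetical stand-ins for the real graphing functions; they only describe
# what would be drawn instead of producing an image.
def bottleneck(job, mode="cpu", smooth=None):
    return "bottleneck graph for {} (mode={}, smooth={})".format(job, mode, smooth)


def timeline(job, mode="cpu"):
    return "timeline graph for {} (mode={})".format(job, mode)


def main(argv=None):
    """Parse command-line arguments and call the matching graphing function."""
    parser = argparse.ArgumentParser(description="Create graphs for one SWAT job")
    parser.add_argument("batch", help="batch identifier")
    parser.add_argument("job", help="job identifier within the batch")
    parser.add_argument("--graph", choices=["bottleneck", "timeline"],
                        default="bottleneck")
    parser.add_argument("--mode", default="cpu")
    parser.add_argument("--smooth", type=int, default=None,
                        help="smoothing window, if any")
    args = parser.parse_args(argv)

    job_id = "{}/{}".format(args.batch, args.job)
    if args.graph == "bottleneck":
        return bottleneck(job_id, mode=args.mode, smooth=args.smooth)
    return timeline(job_id, mode=args.mode)


# Example invocation with an explicit argument list (made-up identifiers).
print(main(["batch_01", "job_00_00", "--graph", "timeline", "--mode", "network"]))
```

Passing `argv` explicitly keeps the function callable from a notebook cell as well as from the command line.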

SWAT POST RUN

The SWAT post-job-completion script now calls our graphing script:

‒ graph_command_line.ipy current_batch_num current_job_num
‒ The graphing script can also be run manually to create more fine-tuned graphs
‒ The same can be done from an IPython notebook

DATA IS BEAUTIFUL: CPU UTILIZATION FROM ONE RUN

DATA IS BEAUTIFUL - 2: CPU UTILIZATION DATA FROM TWO RUNS

DATA IS BEAUTIFUL - 3: NETWORK UTILIZATION

DATA IS BEAUTIFUL - 4: PERFORMANCE COUNTER DATA

DATA IS BEAUTIFUL: PER-CORE CPU UTILIZATION

LIMITATIONS

Images are static; there is no zooming.

Zooming can be approximated by interactively plotting a sub-range of the series.

Example sub-range plot

HOW IS THIS HELPING US?

BOTTLENECK ANALYSIS

Workloads utilize the system resources.

Certain workloads are CPU-bound, while others are disk-, network-, or memory-bound.

IPython helps us explore system logs and create graphs showing resource utilization:

‒ A resource that is near 100% utilization is the bottleneck
‒ If the system is not at a bottleneck, increase the load by reconfiguring the workload and repeat
‒ If it is at a bottleneck, try adding more resources and repeat
‒ Example scenarios are in the following slides

This enables us to optimize/characterize systems for certain workloads.
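The decision rule described on this slide can be written down in a few lines. The sketch below is only an illustration: the 90% threshold standing in for "near 100%", the function names, and the sample numbers are all assumptions.

```python
def find_bottleneck(utilization, threshold=90.0):
    """Return the most-utilized resource at or above `threshold` percent, else None.

    `utilization` maps a resource name to its utilization in percent.
    """
    if not utilization:
        return None
    name, value = max(utilization.items(), key=lambda item: item[1])
    return name if value >= threshold else None


def next_step(utilization, threshold=90.0):
    """Suggest the next experiment, following the rule on the slide."""
    resource = find_bottleneck(utilization, threshold)
    if resource is None:
        return "increase load and repeat"
    return "add more {} resources and repeat".format(resource)


# Made-up utilization figures for one run.
sample = {"cpu": 97.5, "disk": 41.0, "network": 12.3, "memory": 63.0}
print(find_bottleneck(sample), "->", next_step(sample))
```

In practice the utilization figures would come from the per-resource graphs or the DataFrames behind them, for example the mean over the steady-state part of a run.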

BOTTLENECK ANALYSIS: CPU

BOTTLENECK ANALYSIS: DISK

HTTP://1023612134/REPOS/BATCH_THU_08_AUG_2013_18_48_06/JOB_01_00/IMAGES/BATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08.PNG

WORKLOAD TUNING

1. Analyze workload logs of past SWAT runs from an IPython notebook
2. Check the resource utilization/performance of a run
3. Analyze the associated workload configuration and correlate
4. Create a new workload configuration based on observations and tuning insights
5. Push the new config to the SWAT template library
6. Initiate a new run from SWAT using the new config
7. Repeat as needed to find the optimal configuration

(WORK IN PROGRESS)
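The tuning steps above form a loop: run, inspect, propose a new configuration, and run again. A rough sketch of that loop follows. It is an illustration only, since the slides mark this workflow as work in progress: `run_job`, `propose`, the `map_slots` setting, and the toy numbers are all hypothetical and do not reflect a real SWAT interface.

```python
def tune(run_job, propose, config, max_rounds=10):
    """Repeat run -> inspect -> propose until `propose` returns None.

    Returns the (config, metrics) pair with the highest throughput, plus the
    full history of rounds.
    """
    history = []
    for _ in range(max_rounds):
        metrics = run_job(config)           # run with the current config, collect metrics
        history.append((config, metrics))
        config = propose(config, metrics)   # derive a new config from what was observed
        if config is None:
            break
    best = max(history, key=lambda pair: pair[1]["throughput"])
    return best, history


# Toy stand-in for a workload run: CPU utilization grows with the number of
# map slots until it saturates at 100%.
def fake_run(config):
    cpu = min(100.0, 12.0 * config["map_slots"])
    return {"cpu": cpu, "throughput": cpu * 10}


# Toy tuning rule: keep doubling the slots until the CPU is close to saturated.
def double_until_busy(config, metrics):
    if metrics["cpu"] >= 90.0:
        return None
    return {"map_slots": config["map_slots"] * 2}


best, history = tune(fake_run, double_until_busy, {"map_slots": 2})
print(best)
```

Keeping the full history makes it possible to correlate each configuration with its results afterwards, which is the point of steps 2 and 3.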

CONCLUSION

IPython notebook is a great tool for interactive data analysis

‒ For short exploratory sessions only, not for writing a huge code base
‒ Once done exploring, put the code into scripts for re-use

It has its limitations; there are alternatives:

‒ Tableau®: interactive, à la carte plots
  ‒ needs licensing
  ‒ no custom graphing, but the menu has lots of choices
‒ d3.js: needs coding
‒ OpenTSDB: time series charts updated dynamically
‒ etc.

REFERENCES

IPython tutorials

‒ PyCon 2013: “IPython in-depth: high-productivity interactive and parallel python”
  http://www.youtube.com/watch?v=bP8ydKBCZiY
‒ SciPy 2013: “IPython in Depth”
  http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.



Page 13: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL13

IPYTHON NOTEBOOK DASHBOARD

Note

All three modes offer similar functionality

We use the notebook interface due to easy sharing for collaborating remotely with multiple users

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL14

IPYTHON NOTEBOOK EXAMPLE SESSION

Any python expression can be run at the console prompt

In [1] a=range(110)

In [2] a

Out[2] [1 2 3 4 5 6 7 8 9]

Also run basic shell commandsIn [3] pwd

Out [3] ucmonirapu13sumatra

In [4] cd

cmonir

Even capture the output of any shell command or script into python variable

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL15

A COMMON PATTERN IN OUR DATA ANALYSIS

Read Data from sources

bull CSV JSON Excel Raw log file DB connectionhellip

Munge data into data structure suitable for plotting

bull Get it into a DataFrame (pandas library)

Do final plotting

bull dfplot()

Do any sub-range plotting for further details

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL16

TYPICAL DATA ANALYSIS SESSION

import pandas as pd

import matplotlibpyplot as plt

helliphelliphellip

swat_archive = varwwwhtmlrepolsquo

helliphelliphellip

cd $swat_archiveJob_TerasortExperiment_Feb5job_00_00

helliphelliphellip

df1 = pdread_csv(vmstatcsv parse_dates=[[datetime]] usecols=[date time idle])

df1 = df1set_index(date_time)

df1plot()

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL17

TYPICAL DATA ANALYSIS SESSION - 2

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL18

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions put them in a script

Script has to have ipython as interpreter

usrbinipython

import matplotlib

matplotlibuse(Agg)

import matplotlibpyplot as plt

import pandas as pd

helliphelliphellip

task_list = [

CPU_Timeline_User_plus_Sys

Network_Timeline_Tx_plus_Rxw

DISK_Timeline_Read_Write_MBps

Compare_metrics_bottleneck

Compare_metrics_bottleneck_smoothed

Webserver_metrics

helliphelliphellip about 20 total tasks

]

def bottleneck( job mode=cpu img_name=None nodes=All

alt_repo=None nb=True cat=None arg=17 ifc=eth0

smooth=None)

create a custom graph and put it under images folder

hellip

def timeline ( argshelliphellip)

similar function

hellip

def main()

parse command line args and call appropriate graphing function

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL19

SWAT POST RUN Swat post job completion script now calls our graphing script

‒ graph_command_lineipy current_batch_num current_job_num

‒ Graphing script can also be run manually to create more fine tuned graphs

‒ Same can be done from Ipython notebook

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL20

DATA IS BEAUTIFULCPU UTILIZATION FROM ONE RUN

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL21

DATA IS BEAUTIFUL - 2CPU UTILIZATION DATA FROM TWO RUNS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL22

DATA IS BEAUTIFUL - 3NETWORK UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL23

DATA IS BEAUTIFUL - 4

PERFORMANCE COUNTER DATA

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL24

DATA IS BEAUTIFUL PER-CORE CPU UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL25

LIMITATIONS

Images are static no zooming

Zooming functionality can be done by interactively asking to plot certain sub-range of the series

Example sub-range plot

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL26

HOW IS THIS HELPING US

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL27

Workloads utilize the system resources

Certain workloads are CPU bound while certain others are DiskNetworkMemory bound

Ipython helps explore system logs and create nice graphs showing resource utilization‒Resource that is near 100 utilization is the bottleneck

‒ If system is not at bottleneck increase load by configuring workload and repeat

‒ If at bottleneck try to add more resources and repeat

‒Example scenarios in following slides

Enables us to optimizecharacterize systems for certain workloads

BOTTLENECK ANALYSIS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL28

BOTTLENECK ANALYSIS CPU

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL29

BOTTLENECK ANALYSIS DISKHTTP1023612134REPOSBATCH_THU_08_AUG_2013_18_48_06JOB_01_00IMAGESBATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08PNG

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL30

WORKLOAD TUNING

1 Analyze workload logs of past SWAT runs from IPython notebook

2 Check resource utilizationperformance of a run

3 Analyze associated workload configuration and correlate

4 Create new workload configuration based on observation and insights on tuning

5 Push new config to SWAT template library

6 Initiate new run from SWAT using the new config

7 Repeat as needed to find optimal configuration

(WORK IN PROGRESS)

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL31

CONCLUSION

Ipython notebook is a great tool for interactive data analysis

For short exploratory sessions only not to code huge code base

Once done exploring - put into scripts for re-use

Has its limitations there are alternativesTableaureg interactive a la carte plots

needs licensing

no custom graphing but menu has lots of choices

d3js needs coding

OpenTSDB time series charts updated dynamically

etc

REFERENCES

IPython tutorials

‒ PyCon 2013: “IPython in-depth: high-productivity interactive and parallel python”

http://www.youtube.com/watch?v=bP8ydKBCZiY

‒ SciPy 2013: “IPython in Depth”

http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds, Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 14: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

IPYTHON NOTEBOOK EXAMPLE SESSION

Any Python expression can be run at the console prompt

In [1]: a = range(1, 10)
In [2]: a
Out[2]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

You can also run basic shell commands

In [3]: pwd
Out[3]: ucmonirapu13sumatra
In [4]: cd
cmonir

You can even capture the output of any shell command or script into a Python variable
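Inside an IPython session, shell output is captured with the `!` syntax, for example `lines = !some_command`. Because that syntax only works inside IPython, the sketch below shows a plain-Python equivalent using the standard `subprocess` module. The command and its output are made up for the example; the Python interpreter itself is used as the command so nothing depends on a particular shell tool.

```python
import subprocess
import sys

# In an IPython session the same capture is written as:  lines = !some_command
# Plain-Python equivalent, using the Python interpreter itself as the command
# so the example does not depend on any particular shell tool being installed.
result = subprocess.run(
    [sys.executable, "-c", "print('node01'); print('node02')"],
    capture_output=True,
    text=True,
    check=True,
)

# One list entry per output line, similar to what IPython's capture provides.
lines = result.stdout.splitlines()
print(lines)
```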

A COMMON PATTERN IN OUR DATA ANALYSIS

Read data from sources

• CSV, JSON, Excel, raw log file, DB connection…

Munge data into a data structure suitable for plotting

• Get it into a DataFrame (pandas library)

Do final plotting

• df.plot()

Do any sub-range plotting for further details
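The read, munge, plot pattern above can be sketched for the raw-log case. This is a minimal illustration assuming a made-up log format; the field layout, the `cpu_busy` metric, and the node name are not taken from SWAT.

```python
import re

import pandas as pd

# Made-up raw log lines standing in for a workload log.
raw_log = """\
2013-02-05 10:00:00 node01 cpu_busy=35
2013-02-05 10:00:05 node01 cpu_busy=80
2013-02-05 10:00:10 node01 cpu_busy=95
"""

# 1. Read: pull the timestamp, node and metric out of each line.
pattern = re.compile(r"^(\S+ \S+) (\S+) cpu_busy=(\d+)$")
rows = [pattern.match(line).groups() for line in raw_log.splitlines()]

# 2. Munge: build a DataFrame indexed by time, with numeric values.
df = pd.DataFrame(rows, columns=["timestamp", "node", "cpu_busy"])
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["cpu_busy"] = df["cpu_busy"].astype(int)
df = df.set_index("timestamp")

# 3. Plot: df["cpu_busy"].plot() would draw the series (requires matplotlib).
print(df)
```

Indexing by timestamp in the munge step is what makes the later sub-range plotting a simple slice on the index.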

TYPICAL DATA ANALYSIS SESSION

import pandas as pd
import matplotlib.pyplot as plt
………
swat_archive = '/var/www/html/repo'
………
cd $swat_archive/Job_TerasortExperiment_Feb5/job_00_00
………
df1 = pd.read_csv('vmstat.csv', parse_dates=[['date', 'time']], usecols=['date', 'time', 'idle'])
df1 = df1.set_index('date_time')
df1.plot()
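The session above merges the date and time columns through a nested `parse_dates` list, which recent pandas releases deprecate in favour of calling `pd.to_datetime` after loading. The sketch below shows that alternative under stated assumptions: the CSV content is a made-up stand-in for vmstat.csv, and the `user` and `sys` columns and the derived `busy` column are added only for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for vmstat.csv; the real file's layout is an assumption.
csv_text = """\
date,time,user,sys,idle
2013-02-05,10:00:00,20,5,75
2013-02-05,10:00:05,60,10,30
2013-02-05,10:00:10,85,10,5
"""

df1 = pd.read_csv(io.StringIO(csv_text), usecols=["date", "time", "idle"])

# Combine the two text columns into one timestamp and use it as the index.
df1["date_time"] = pd.to_datetime(df1["date"] + " " + df1["time"])
df1 = df1.drop(columns=["date", "time"]).set_index("date_time")

# CPU busy time is whatever is not idle.
df1["busy"] = 100 - df1["idle"]
print(df1)
# df1.plot() would then draw both columns against time (requires matplotlib).
```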

TYPICAL DATA ANALYSIS SESSION - 2

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions, put them in a script

The script has to use ipython as its interpreter

#!/usr/bin/ipython
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to image files
import matplotlib.pyplot as plt
import pandas as pd
………
task_list = [
    'CPU_Timeline_User_plus_Sys',
    'Network_Timeline_Tx_plus_Rxw',
    'DISK_Timeline_Read_Write_MBps',
    'Compare_metrics_bottleneck',
    'Compare_metrics_bottleneck_smoothed',
    'Webserver_metrics',
    # ……… about 20 total tasks
]

def bottleneck(job, mode='cpu', img_name=None, nodes='All',
               alt_repo=None, nb=True, cat=None, arg=17, ifc='eth0',
               smooth=None):
    # create a custom graph and put it under the images folder
    …

def timeline(args……):
    # similar function
    …

def main():
    # parse command line args and call the appropriate graphing function
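The `main()` outline above (parse the command line, then call the matching graphing function) could look roughly like the following sketch built on the standard `argparse` module. Everything beyond the task names is an assumption: the mapping of tasks to functions, the `--task` option, the `smooth=5` value, and the placeholder function bodies, which return a description instead of drawing a graph.

```python
import argparse


def bottleneck(job, mode="cpu", smooth=None):
    # Placeholder body: a real function would draw and save a graph.
    return "bottleneck graph for {} (mode={}, smooth={})".format(job, mode, smooth)


def timeline(job, mode="cpu"):
    # Placeholder body, same idea as bottleneck().
    return "timeline graph for {} (mode={})".format(job, mode)


# Map each task name to the function and keyword arguments that produce it.
TASKS = {
    "CPU_Timeline_User_plus_Sys": (timeline, {"mode": "cpu"}),
    "Compare_metrics_bottleneck": (bottleneck, {"mode": "cpu"}),
    "Compare_metrics_bottleneck_smoothed": (bottleneck, {"mode": "cpu", "smooth": 5}),
}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Create graphs for one job")
    parser.add_argument("batch_num")
    parser.add_argument("job_num")
    parser.add_argument("--task", choices=sorted(TASKS), action="append")
    args = parser.parse_args(argv)

    job = "{}/{}".format(args.batch_num, args.job_num)
    results = []
    # With no --task given, run every known task.
    for name in args.task or sorted(TASKS):
        func, kwargs = TASKS[name]
        results.append(func(job, **kwargs))
    return results


for line in main(["batch_01", "job_00_00", "--task", "Compare_metrics_bottleneck_smoothed"]):
    print(line)
```

A dispatch table keeps the list of tasks and the code that runs them in one place, so adding a task does not require touching `main()`.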

SWAT POST RUN

The SWAT post-job-completion script now calls our graphing script

‒ graph_command_line.ipy current_batch_num current_job_num

‒ The graphing script can also be run manually to create more fine-tuned graphs

‒ The same can be done from an IPython notebook

DATA IS BEAUTIFUL: CPU UTILIZATION FROM ONE RUN

DATA IS BEAUTIFUL - 2: CPU UTILIZATION DATA FROM TWO RUNS

DATA IS BEAUTIFUL - 3: NETWORK UTILIZATION

DATA IS BEAUTIFUL - 4: PERFORMANCE COUNTER DATA

DATA IS BEAUTIFUL: PER-CORE CPU UTILIZATION

LIMITATIONS

Images are static; there is no zooming

Zooming can be approximated by interactively plotting a chosen sub-range of the series

Example sub-range plot
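Plotting a sub-range amounts to slicing the time-indexed data before plotting it. The sketch below assumes a made-up series sampled every 5 seconds; the variable names and the chosen window are illustrative only.

```python
import pandas as pd

# Made-up series: one CPU-busy sample every 5 seconds for 10 minutes.
index = pd.date_range(
    "2013-02-05 10:00:00", periods=120, freq=pd.Timedelta(seconds=5)
)
busy = pd.Series(range(120), index=index, name="busy")

# "Zoom" into one minute by slicing on the time index; both ends are inclusive.
window = busy.loc["2013-02-05 10:02:00":"2013-02-05 10:03:00"]
print(len(window), window.index[0], window.index[-1])
# window.plot() would redraw just this sub-range (requires matplotlib).
```

Because the slice is an ordinary Series, it can be passed to the same plotting code as the full run.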

HOW IS THIS HELPING US?

Page 18: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL18

REUSE INTERACTIONS AS GENERIC FUNCTION

Once you are satisfied with your interactions put them in a script

Script has to have ipython as interpreter

usrbinipython

import matplotlib

matplotlibuse(Agg)

import matplotlibpyplot as plt

import pandas as pd

helliphelliphellip

task_list = [

CPU_Timeline_User_plus_Sys

Network_Timeline_Tx_plus_Rxw

DISK_Timeline_Read_Write_MBps

Compare_metrics_bottleneck

Compare_metrics_bottleneck_smoothed

Webserver_metrics

helliphelliphellip about 20 total tasks

]

def bottleneck( job mode=cpu img_name=None nodes=All

alt_repo=None nb=True cat=None arg=17 ifc=eth0

smooth=None)

create a custom graph and put it under images folder

hellip

def timeline ( argshelliphellip)

similar function

hellip

def main()

parse command line args and call appropriate graphing function

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL19

SWAT POST RUN Swat post job completion script now calls our graphing script

‒ graph_command_lineipy current_batch_num current_job_num

‒ Graphing script can also be run manually to create more fine tuned graphs

‒ Same can be done from Ipython notebook

DATA IS BEAUTIFUL: CPU UTILIZATION FROM ONE RUN

DATA IS BEAUTIFUL - 2: CPU UTILIZATION DATA FROM TWO RUNS

DATA IS BEAUTIFUL - 3: NETWORK UTILIZATION

DATA IS BEAUTIFUL - 4: PERFORMANCE COUNTER DATA

DATA IS BEAUTIFUL: PER-CORE CPU UTILIZATION

LIMITATIONS

Images are static; there is no zooming

Zooming can be approximated by interactively asking for a plot of a certain sub-range of the series

Example sub-range plot
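Zooming by sub-range can be sketched with pandas label slicing. This is an illustration under assumptions, not the SWAT graphing script: the series is synthetic, and plot_subrange, its parameters, and the output file name are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_subrange(series, start, end, out_path):
    """Plot only the samples whose index lies in [start, end] and save a PNG."""
    zoomed = series.loc[start:end]  # label-based slice, both ends inclusive
    fig, ax = plt.subplots()
    zoomed.plot(ax=ax)
    ax.set_xlabel("seconds since start of run")
    ax.set_ylabel("CPU utilization (%)")
    fig.savefig(out_path)
    plt.close(fig)
    return zoomed


# Synthetic one-sample-per-second CPU trace standing in for a real log.
cpu = pd.Series([float(i % 100) for i in range(600)], index=range(600))
out_file = os.path.join(tempfile.gettempdir(), "cpu_zoom_120_180.png")
zoomed = plot_subrange(cpu, 120, 180, out_file)
print(len(zoomed))
```

In a notebook, re-running the cell with a different start and end gives a new static image of that window, which is the interactive substitute for zooming described above.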

HOW IS THIS HELPING US?

BOTTLENECK ANALYSIS

Workloads utilize the system resources

Certain workloads are CPU bound, while others are disk/network/memory bound

IPython helps explore system logs and create graphs showing resource utilization

‒ A resource that is near 100% utilization is the bottleneck

‒ If the system is not at a bottleneck, increase the load by reconfiguring the workload and repeat

‒ If it is at a bottleneck, try adding more resources and repeat

‒ Example scenarios are in the following slides

This enables us to optimize/characterize systems for certain workloads
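The rule of thumb above (a resource near 100% utilization is the bottleneck) can be sketched in a few lines. The function name find_bottlenecks, the 90% threshold, and the sample numbers are assumptions chosen for illustration; the slides do not state what cutoff counts as "near 100%".

```python
def find_bottlenecks(utilization, threshold=90.0):
    """Return (resource, average %) pairs at or above the threshold, busiest first.

    `utilization` maps a resource name to a list of utilization samples in percent.
    """
    averages = {
        name: sum(samples) / len(samples)
        for name, samples in utilization.items()
        if samples  # skip resources with no samples
    }
    hot = [(name, avg) for name, avg in averages.items() if avg >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Made-up samples for one run: CPU is saturated, disk and network are not.
samples = {
    "cpu": [97.0, 99.0, 98.0],
    "disk": [35.0, 40.0, 45.0],
    "network": [10.0, 12.0, 14.0],
}
print(find_bottlenecks(samples))
```

An empty result corresponds to the "not at a bottleneck" branch, where the next step is to increase the load and run again.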

BOTTLENECK ANALYSIS: CPU

BOTTLENECK ANALYSIS: DISK

[Image: REPOS/BATCH_THU_08_AUG_2013_18_48_06/JOB_01_00/IMAGES/BATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08.PNG]

WORKLOAD TUNING (WORK IN PROGRESS)

1. Analyze workload logs of past SWAT runs from an IPython notebook

2. Check resource utilization/performance of a run

3. Analyze the associated workload configuration and correlate

4. Create a new workload configuration based on observations and tuning insights

5. Push the new config to the SWAT template library

6. Initiate a new run from SWAT using the new config

7. Repeat as needed to find the optimal configuration

CONCLUSION

IPython notebook is a great tool for interactive data analysis

For short exploratory sessions only, not for coding a huge code base

Once done exploring, put the code into scripts for re-use

It has its limitations, and there are alternatives:

‒ Tableau®: interactive, a la carte plots; needs licensing; no custom graphing, but the menu has lots of choices

‒ d3.js: needs coding

‒ OpenTSDB: time series charts updated dynamically

‒ etc.

REFERENCES

IPython tutorials

‒ PyCon 2013, "IPython in-depth: high-productivity interactive and parallel python"
  http://www.youtube.com/watch?v=bP8ydKBCZiY

‒ SciPy 2013, "IPython in Depth"
  http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

‒ Check the videos link for more tutorials

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 24: PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Monir Mozumder

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL24

DATA IS BEAUTIFUL PER-CORE CPU UTILIZATION

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL25

LIMITATIONS

Images are static no zooming

Zooming functionality can be done by interactively asking to plot certain sub-range of the series

Example sub-range plot

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL26

HOW IS THIS HELPING US

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL27

Workloads utilize the system resources

Certain workloads are CPU bound while certain others are DiskNetworkMemory bound

Ipython helps explore system logs and create nice graphs showing resource utilization‒Resource that is near 100 utilization is the bottleneck

‒ If system is not at bottleneck increase load by configuring workload and repeat

‒ If at bottleneck try to add more resources and repeat

‒Example scenarios in following slides

Enables us to optimizecharacterize systems for certain workloads

BOTTLENECK ANALYSIS

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL28

BOTTLENECK ANALYSIS CPU

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL29

BOTTLENECK ANALYSIS DISKHTTP1023612134REPOSBATCH_THU_08_AUG_2013_18_48_06JOB_01_00IMAGESBATCH_THU_08_AUG_2013_18_48_06_JOB_01_00_DISK_BOTTLENECK_SMOOTHED_SEAMICRO08PNG

| PRESENTATION TITLE | NOVEMBER 14 2013 | CONFIDENTIAL30

WORKLOAD TUNING

1 Analyze workload logs of past SWAT runs from IPython notebook

2 Check resource utilizationperformance of a run

3 Analyze associated workload configuration and correlate

4 Create new workload configuration based on observation and insights on tuning

5 Push new config to SWAT template library

6 Initiate new run from SWAT using the new config

7 Repeat as needed to find optimal configuration

(WORK IN PROGRESS)

CONCLUSION

IPython notebook is a great tool for interactive data analysis.

‒ It suits short exploratory sessions, not developing a large code base.

‒ Once done exploring, move the code into scripts for reuse.

It has its limitations, and there are alternatives:

Tableau®: interactive, à la carte plots

‒ needs licensing

‒ no custom graphing, but the menu has lots of choices

d3.js: needs coding

OpenTSDB: time series charts updated dynamically

etc.

REFERENCES

IPython tutorials

PyCon 2013: “IPython in-depth: high-productivity interactive and parallel python”

http://www.youtube.com/watch?v=bP8ydKBCZiY

SciPy 2013: “IPython in Depth”

http://www.youtube.com/watch?v=xe_ATRmw0KM

IPython website: http://ipython.org

Check the videos link for more tutorials.

Support: Stack Overflow tags ipython, ipython-notebook

QUESTIONS / DEMO

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds, Windows is a registered trademark of Microsoft Corporation, and Tableau is a registered trademark of Tableau Software. Other names are for informational purposes only and may be trademarks of their respective owners.

