PyData & Apache Spark
2017 / 2 / 10 Sapporo TechBar #7
▸ Facebook: Ryuji Tamagawa
▸ Twitter: tamagawa_ryuji
▸ FB
techbar
▸ FB
▸ Twitter
Python
PyData
Apache Spark
Jupyter Notebook
2017 and the future
pandas
1 / 5 : PyData
PyData.org
PyData
▸ Anaconda (Python distribution)
▸ Blaze ("NumPy and pandas interface to Big Data")
▸ dask
▸ Bokeh
▸ Canopy (Python distribution)
▸ IPython
▸ matplotlib
▸ nose
▸ numba (JIT compiler)
▸ NumPy
▸ SciPy
▸ Statsmodels
▸ SymPy
▸ pandas (builds on NumPy and SciPy)
▸ scikit-image
▸ scikit-learn
2 / 5 : pandas
pandas
▸ NumPy SciPy
▸ DataFrame
pandas, created by Wes McKinney
DataFrame
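A minimal sketch of what working with a pandas DataFrame looks like. The data and the column names (`city`, `sales`) are invented for illustration and do not come from the talk.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "city": ["Sapporo", "Tokyo", "Sapporo", "Osaka"],
    "sales": [120, 300, 80, 150],
})

# A boolean mask keeps only the rows where the condition holds.
sapporo = df[df["city"] == "Sapporo"]

# groupby + sum aggregates the sales column per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Filtering and grouping are expressed on whole columns rather than in explicit Python loops, which is the style the DataFrame API encourages.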
▸ Python
▸ PyData pandas
3 / 5 : Jupyter Notebook
IPython Notebook
▸ Jupyter Notebook
▸ Julia Python R
▸ JupyterCon
pandas / matplotlib
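A rough sketch of plotting a pandas DataFrame with matplotlib, as one might do in a notebook cell. The data and column names are invented. The `Agg` backend is an assumption so the code also runs outside a notebook; in a notebook the figure is typically shown inline instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so the sketch runs anywhere
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"month": [1, 2, 3, 4], "visitors": [10, 25, 18, 30]})

# DataFrame.plot delegates to matplotlib and returns the Axes it drew on.
ax = df.plot(x="month", y="visitors", kind="line", title="Visitors per month")
ax.set_ylabel("visitors")

# Render to an in-memory PNG instead of a file on disk.
buf = io.BytesIO()
ax.get_figure().savefig(buf, format="png")
```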
Interactive Widget
▸ Learning Jupyter
4 / 5 : Apache Spark
Hadoop
▸ MapReduce Spark
▸ 2010: Hadoop = MapReduce + HDFS
▸ Hadoop
[Diagram: Hadoop ecosystem layers]
▸ OS
▸ HDFS
▸ Hive, etc.
▸ HBase, MapReduce
▸ YARN
▸ Impala, etc. (in-memory SQL engine)
▸ Spark (Spark Streaming, MLlib, GraphX, Spark SQL)
▸ Hadoop
▸ HDFS / S3
▸ YARN / Mesos
Apache Spark PyData pandas
Apache Spark pandas
JVM Python
× dask
I/O
Scala, Java, Python, R
JVM / Python
Spark
▸ 1 PC
Hadoop / MapReduce
DataFrame
▸ SSD
▸ Spark Parquet
▸ Performance comparison of different file formats and storage engines in the Hadoop ecosystem
▸ Parquet Python
Apache Spark
▸ Parquet
Machine Learning
▸ scikit-learn
▸ Spark MLlib / ML
▸ TensorFlow
▸ Python
2017 and the future
5 / 5 : 2017 and the future
PyData
▸ Spark - pandas
▸ pandas → Spark …
Wes McKinney's blog
▸ pandas Apache Arrow
▸ Blog
▸ PyData Blog
Wes OK
▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis
http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
High speed Apache Parquet for Python
▸ Parquet
▸ Spark
▸ Python
▸ Fastparquet
▸ pyarrow
Apache Arrow
▸ Apache Arrow
▸ PyData / OSS
▸ /