BigSQL = Postgres + Hadoop - OpenSCG
BigSQL = Postgres + Hadoop
Denis Lussier
BigSQL Best of Both Worlds
Agenda
• OpenSCG Facts
• Community Distributions
• BigSQL = Hadoop + Postgres
• Demo
• Moving Forward
• Q&A
OpenSCG Facts
• Started operations in early 2010
• Profitable, no outside investment
• 40+ member team
• Headquartered in Bridgewater, NJ
• Offices in San Mateo, CA and Hyderabad, India
• Healthy, controlled growth
Experts committed to helping our clients gain strategic advantage leveraging PostgreSQL, Hadoop and Java. Keen focus on high availability and performance.
OpenSCG Community Projects
• OpenJDK: cross-platform x86 installers
• Eclipse BIRT: for Java & report developers
• pgOn: operations network web console
• tPostgres: Postgres with SQL Server compatibility
• PostgresHA: PostgreSQL that’s highly available
• BenchmarkSQL: TPC-C-like benchmark for major RDBMSs
• pgHive: Postgres & Hadoop connector
• PostgreSQL RPM & DEB packages
• And now BigSQL!
Postgres
• World’s most advanced open source database solution
• Enterprise class, including MVCC, streaming replication & rich data type support (to name a few!)
• Robust transaction support with strong ANSI-SQL compliance
Hadoop
• Big data distributed framework
• Reliable, massively scalable & proven
• Failures handled at the application layer, allowing commodity hardware
BigSQL Best of Both Worlds
POSTGRESQL: World’s leading Open Source RDBMS
HADOOP: World’s leading Big Data distributed framework
BigSQL: Always 100% Open & Free! No Strings!
BigSQL Postgres Components
• PostgreSQL: Advanced RDBMS (postgresql.org)
• pgHA: Postgres high availability (postgresha.org)
• pgBouncer: PG connection controller (pgfoundry/pgbouncer)
• pgJDBC: JDBC4 PostgreSQL driver (jdbc.postgresql.org)
• PostGIS: PG spatial DB extender (postgis.net)
BigSQL Hadoop Components
• Hadoop: Big Data distributed framework (hadoop.apache.org)
• Hive: SQL-like queries via map reduce (hive.apache.org)
• Zookeeper: Cluster coordinator & lock manager (zookeeper.apache.org)
• Sqoop: RDBMS/HDFS data transfer (sqoop.apache.org)
• Flume: Streaming log data into HDFS (flume.apache.org)
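The Hive entry on this slide describes SQL-like queries executed via map reduce. As a rough illustration only, the pure-Python sketch below mimics how a query such as `SELECT level, COUNT(*) FROM logs GROUP BY level` could be split into map, shuffle and reduce phases. The `logs` table, its columns and every function name here are hypothetical; nothing in it calls a Hive or Hadoop API, and real Hive plans are considerably more involved.

```python
from collections import defaultdict

# Hypothetical rows of a "logs" table (illustrative data, not from the slides).
LOGS = [
    {"level": "INFO", "msg": "started"},
    {"level": "ERROR", "msg": "disk full"},
    {"level": "INFO", "msg": "stopped"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a mapper for COUNT(*) GROUP BY level would."""
    for row in rows:
        yield row["level"], 1


def shuffle_phase(pairs):
    """Group the emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key to get the per-group counts."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_level(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_level(LOGS))
```

On a real cluster the map and reduce phases would run in parallel across many nodes; here they run in one process purely to show the data flow.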
BigSQL Hadoop Components
• Hbase: Random, real-time IO over HDFS (hbase.apache.org)
• Pig: Platform for parallel data analysis (pig.apache.org)
• HCatalog: Share Hive schemas with Pig (apache.org/hcatalog)
• Avro: Data serialization system (avro.apache.org)
BigSQL Additional Components
• BenchmarkSQL: Java benchmark example (sourceforge/benchmarksql)
• Ambari: Provision and manage Hadoop clusters (incubator.apache.org/ambari)
• Ganglia: Cluster-wide metrics collection (ganglia.info)
• Nagios: Monitoring and alerting (nagios.org)
BigSQL Architecture 2013
• HIVE: Web UI, Console UI, Driver (Compiler, Optimizer, Executor), Postgres Metastore
• SQL Parallel Query
• HADOOP Cluster (HDFS + Map-Reduce): Name Node, Job Tracker, Data Node, Task Tracker
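The architecture slide places a Postgres metastore behind Hive. The slides do not show BigSQL's actual configuration, so the following is only a sketch of how the standard Hive metastore connection properties could be generated for a PostgreSQL backend. The host, port, database name and credentials are placeholder assumptions, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def metastore_properties(host, port, database, user, password):
    """Build the standard Hive metastore JDBC settings for a PostgreSQL backend.

    All argument values are supplied by the caller; none come from the slides.
    """
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:postgresql://{host}:{port}/{database}",
        "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def to_hive_site_xml(properties):
    """Render the properties as a hive-site.xml style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    props = metastore_properties("localhost", 5432, "hive_meta", "hive", "secret")
    print(to_hive_site_xml(props))
```

Keeping the metastore in Postgres means Hive's table and schema definitions live in an ordinary relational database, which is one concrete point where the two halves of the stack meet.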
BigSQL Moving Forward
• Out-of-the-box configurations for:
  • HBase & Pig
  • Oozie
  • Ambari, Ganglia & Nagios
• Deeper Postgres and Hadoop Integration
• Additional Examples
BigSQL Q & A
www.openscg.com
[email protected]
1200 Rt 22 East – Suite 2000
Bridgewater, NJ 08807
(908) 203-4725