Sqoop 2 Introduction
Mengwei Ding, Software Engineer Intern at Cloudera
What is Sqoop?
• Apache Top-Level Project
• Name: SQl + hadOOP
• Transfers large volumes of data in bulk
  o From relational databases and data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza
  o To the Hadoop ecosystem: HDFS, Hive, HBase, Avro
  o And vice versa
• Two release lines: Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)
Sqoop 1 Challenges
• Command-line tool, configured entirely through command-line arguments (60+!)
• Connector-driven:
  o Connectors are responsible for both metadata lookups and data transfer
  o JDBC vocabulary is enforced (--connect)
  o Connector selection is implicit
• Non-uniform functionality, duplicated across connectors
• Client accesses Hadoop configuration and databases directly
• Security concerns:
  o Client needs to know the database credentials
• Type mapping is not clearly defined
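To make the first few challenges concrete, here is a sketch of how a Sqoop 1 import is typically assembled: the JDBC URL, the database password, the table, and the HDFS target directory all travel together as one flat list of client-side arguments. The flags (`--connect`, `--username`, `--password`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1 `import` options; the helper `build_sqoop1_import`, the host, table, path, and credentials are hypothetical placeholders, and the code only builds the argument list rather than running `sqoop`.

```python
from typing import List


def build_sqoop1_import(jdbc_url: str, user: str, password: str,
                        table: str, target_dir: str,
                        mappers: int = 4) -> List[str]:
    """Hypothetical helper: assemble (not run) a Sqoop 1 import command."""
    return [
        "sqoop", "import",
        # Connection details, expressed in JDBC vocabulary via --connect;
        # by default Sqoop 1 infers the connector from the URL prefix.
        "--connect", jdbc_url,
        "--username", user,
        "--password", password,  # the client must hold the database password
        # Job details for this particular transfer
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


argv = build_sqoop1_import(
    "jdbc:mysql://db.example.com/sales",  # hypothetical database
    "etl_user", "s3cret",                  # hypothetical credentials
    "ORDERS", "/user/etl/orders",          # hypothetical table and HDFS path
)
print(" ".join(argv))
```

Every new transfer repeats the connection arguments, and Sqoop 1's `-P` option (prompt for the password) only changes how the password is entered, not who must know it.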
Sqoop 2 - Design Goals
• Same goal: transfer data between databases and Hadoop
• Ease of Use
  o Sqoop as a service
  o Domain-specific interactions instead of dozens of arguments
• Ease of Extension
  o No low-level Hadoop knowledge needed to write a connector
  o Uniform functionality across connectors, with no functional overlap between them
• Security and Separation of Concerns
  o Role-based access and use
Sqoop 2 - Connection vs Job Metadata
• There are two distinct sets of options:
  o Connection (distinct per database)
  o Job (distinct per table)
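One way to picture this split: connection options (where the database is and how to log in) are recorded once per database, while each job records only per-table options and refers to a connection by ID, so many jobs can share one connection. The classes and field names below are hypothetical and are not Sqoop 2's actual metadata schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Connection:
    """Options that are distinct per database (hypothetical fields)."""
    conn_id: int
    jdbc_url: str
    username: str
    password: str


@dataclass
class Job:
    """Options that are distinct per table; refers to a connection by ID."""
    job_id: int
    conn_id: int
    table: str
    output_dir: str


connections: Dict[int, Connection] = {
    1: Connection(1, "jdbc:mysql://db.example.com/sales", "etl_user", "s3cret"),
}

# Two jobs reuse the same connection; neither repeats the URL or credentials.
jobs: List[Job] = [
    Job(10, conn_id=1, table="ORDERS", output_dir="/user/etl/orders"),
    Job(11, conn_id=1, table="CUSTOMERS", output_dir="/user/etl/customers"),
]

for job in jobs:
    conn = connections[job.conn_id]
    print(f"job {job.job_id}: {job.table} from {conn.jdbc_url}")
```

Changing the database host or password then means editing one connection record rather than every job.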
• Options also fall into another two distinct sets:
  o Connector-specific
  o Shared across all connectors
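The second split can be sketched as two groups of options for the same job: a group that only one connector understands, and a shared group that every connector accepts and that can therefore be validated in one place. The option names (`table`, `output_dir`, `extractors`), the connector key `generic-jdbc`, and all the validator functions are illustrative assumptions, not Sqoop 2 APIs.

```python
from typing import Callable, Dict, List

# Shared options: the same keys are accepted for every connector.
SHARED_KEYS = {"output_dir", "extractors"}


def validate_shared(opts: Dict[str, str]) -> List[str]:
    """One validator for shared options, reused across all connectors."""
    errors = [f"missing shared option: {k}"
              for k in sorted(SHARED_KEYS - set(opts))]
    if "extractors" in opts and not opts["extractors"].isdigit():
        errors.append("extractors must be a non-negative integer")
    return errors


def validate_generic_jdbc(opts: Dict[str, str]) -> List[str]:
    """Hypothetical connector-specific validator for a JDBC connector."""
    return [] if "table" in opts else ["missing connector option: table"]


CONNECTOR_VALIDATORS: Dict[str, Callable[[Dict[str, str]], List[str]]] = {
    "generic-jdbc": validate_generic_jdbc,
}


def validate_job(connector: str, connector_opts: Dict[str, str],
                 shared_opts: Dict[str, str]) -> List[str]:
    # Each connector checks only its own options; shared checks are common code.
    return (CONNECTOR_VALIDATORS[connector](connector_opts)
            + validate_shared(shared_opts))


print(validate_job("generic-jdbc", {"table": "ORDERS"},
                   {"output_dir": "/user/etl/orders", "extractors": "4"}))  # []
```

Keeping shared options out of each connector is what removes the functional overlap between connectors: a new connector only has to define and check its own group.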
Sqoop 2 - Security
• Support for secure access to external systems via role-based access to connection objects
  o Administrators create, edit, and delete connections
  o Operators use connections
• Connections encapsulate credentials
  o A connection is created once, then reused
  o Created by an administrator and used by operators, so end users never have access to the credentials
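A minimal sketch of this separation, assuming two roles (`admin` and `operator`) and a server-side store: only administrators may create, edit, or delete connections, while operators can list connections and run jobs against them by ID without ever seeing the stored password. The class, method, and role names are hypothetical; Sqoop 2's own authorization model may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


class AccessDenied(Exception):
    """Raised when a role attempts an action it is not allowed to perform."""


@dataclass
class StoredConnection:
    name: str
    jdbc_url: str
    username: str
    password: str  # never returned to operators


class ConnectionStore:
    """Hypothetical server-side store with role-based access to connections."""

    def __init__(self) -> None:
        self._conns: Dict[int, StoredConnection] = {}
        self._next_id = 1

    @staticmethod
    def _require_admin(role: str) -> None:
        if role != "admin":
            raise AccessDenied(f"role '{role}' cannot manage connections")

    # Administrator-only operations: create / edit / delete
    def create(self, role: str, conn: StoredConnection) -> int:
        self._require_admin(role)
        conn_id = self._next_id
        self._conns[conn_id] = conn
        self._next_id += 1
        return conn_id

    def edit(self, role: str, conn_id: int, conn: StoredConnection) -> None:
        self._require_admin(role)
        if conn_id not in self._conns:
            raise KeyError(conn_id)
        self._conns[conn_id] = conn

    def delete(self, role: str, conn_id: int) -> None:
        self._require_admin(role)
        del self._conns[conn_id]

    # Operator-facing operations: no credentials leave the server
    def list_connections(self) -> List[Dict[str, object]]:
        return [{"id": i, "name": c.name} for i, c in self._conns.items()]

    def run_job(self, conn_id: int, table: str) -> str:
        conn = self._conns[conn_id]  # credentials resolved server-side
        return f"importing {table} via connection {conn.name}"


store = ConnectionStore()
cid = store.create("admin", StoredConnection(
    "sales-db", "jdbc:mysql://db.example.com/sales", "etl_user", "s3cret"))
print(store.list_connections())  # [{'id': 1, 'name': 'sales-db'}]
print(store.run_job(cid, "ORDERS"))
try:
    store.delete("operator", cid)
except AccessDenied as err:
    print(err)
```

Because the password only exists inside the store, handing an operator a connection ID grants use of the database without granting knowledge of its credentials.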
Sqoop 2 - Resource Management
• Connections allow specification of a resource policy
  o Administrators can limit the total number of physical connections open at one time
  o Connections can be disabled
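The resource policy can be sketched as a per-connection cap enforced with a semaphore, plus an `enabled` flag that an administrator can switch off. `ManagedConnection`, `max_open`, and `open_session` are hypothetical names, and the string yielded here is only a placeholder for a real physical database connection.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ConnectionUnavailable(Exception):
    """Raised when a connection is disabled or its cap is reached."""


class ManagedConnection:
    """Hypothetical connection object carrying a resource policy."""

    def __init__(self, name: str, max_open: int) -> None:
        self.name = name
        self.enabled = True  # administrators can set this to False
        self._slots = threading.BoundedSemaphore(max_open)

    @contextmanager
    def open_session(self, timeout: float = 0.0) -> Iterator[str]:
        if not self.enabled:
            raise ConnectionUnavailable(f"{self.name} is disabled")
        # Wait up to `timeout` seconds for a free slot under the cap.
        if not self._slots.acquire(timeout=timeout):
            raise ConnectionUnavailable(f"{self.name}: too many open connections")
        try:
            yield f"session on {self.name}"  # placeholder for a physical connection
        finally:
            self._slots.release()  # slot is returned even if the job fails


db = ManagedConnection("sales-db", max_open=2)
with db.open_session() as s1, db.open_session() as s2:
    try:
        with db.open_session():
            pass
    except ConnectionUnavailable as err:
        print(err)  # sales-db: too many open connections
db.enabled = False  # administrator disables the connection
```

Enforcing the cap on the server, where all jobs share the same connection object, is what lets an administrator bound the load on the database regardless of how many operators submit jobs.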
Sqoop 2 - Current Status
• Primary focus of the Sqoop community
• Second cut: 1.99.2
  o Binaries and documentation: http://sqoop.apache.org