Hortonworks Technical Workshop: Interactive Query with Apache Hive
Transcript of Hortonworks Technical Workshop: Interactive Query with Apache Hive
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Interactive Query With Apache Hive
Dec 4, 2014
Ajay Singh
Page 2
Agenda
• HDP 2.2
• Apache Hive & Stinger Initiative
• Stinger.Next
• Putting It Together
• Q&A
Page 3
HDP 2.2 Generally Available
Hortonworks Data Platform 2.2
YARN: Data Operating System (cluster resource management), running over HDFS (Hadoop Distributed File System)
Batch, interactive & real-time data access engines:
• Script: Pig (on Tez)
• SQL: Hive (on Tez)
• Java/Scala: Cascading (on Tez)
• Others: ISV engines
• Stream: Storm (on Slider)
• Search: Solr (on Slider)
• NoSQL: HBase, Accumulo (on Slider)
• In-Memory: Spark
Data workflow, lifecycle & governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
Operations: Ambari (provision, manage & monitor), ZooKeeper, Oozie (scheduling)
Security: authentication, authorization, accounting and data protection across Storage (HDFS), Resources (YARN), Access (Hive, …), Pipeline (Falcon) and Cluster (Knox, Ranger)
Deployment choice: Linux, Windows, on-premises, cloud
YARN is the architectural center of HDP
Enables batch, interactive and real-time workloads
Provides comprehensive enterprise capabilities
The widest range of deployment options
Delivered Completely in the OPEN
Page 4
HDP IS Apache Hadoop
There is ONE Enterprise Hadoop: everything else is a vendor derivation
Hortonworks Data Platform 2.2
[Component/version matrix] Components span Data Management, Data Access, Governance & Integration, Security and Operations: Hadoop & YARN, Tez, Slider, Pig, Hive & HCatalog, HBase, Phoenix, Accumulo, Storm, Solr, Spark, Falcon, Sqoop, Flume, Kafka, Oozie, Knox, Ranger, ZooKeeper and Ambari, with versions advancing across HDP 2.0 (October 2013), HDP 2.1 (April 2014) and HDP 2.2 (October 2014), e.g. Hadoop 2.2.0 → 2.4.0 → 2.6.0 and Hive 0.12.0 → 0.13.0 → 0.14.0.
* version numbers are targets and subject to change at time of general availability in accordance with ASF release process
Data Access Governance & Integration Security Operations
Page 5
Complete List of New Features in HDP 2.2

Apache Hadoop YARN
• Slide existing services onto YARN through 'Slider'
• GA release of HBase, Accumulo, and Storm on YARN
• Support for long-running services: handling of logs, containers not killed when the AM dies, secure token renewal, YARN Labels for tagging nodes for specific workloads
• Support for CPU scheduling and CPU resource isolation through CGroups

Apache Hadoop HDFS
• Heterogeneous storage: support for archival
• Rolling Upgrade (applies to the entire HDP stack: YARN, Hive, HBase, everything; comprehensive Rolling Upgrade is now supported across the HDP stack)
• Multi-NIC support
• Heterogeneous storage: support for memory as a storage tier (TP)
• HDFS Transparent Data Encryption (TP)

Apache Hive, Apache Pig, and Apache Tez
• Hive Cost Based Optimizer: function pushdown & join re-ordering support for other join types (star & bushy)
• Hive SQL enhancements including ACID support (insert, update, delete), temporary tables, and metadata-only queries that return instantly
• Pig on Tez, including DataFu for use with Pig
• Vectorized shuffle
• Tez debug tooling & UI

Hue
• Support for HiveServer2
• Support for ResourceManager HA

Apache Spark
• Refreshed Tech Preview to Spark 1.1.0 (available now)
• ORC file support & Hive 0.13 integration
• Planned for GA of Spark 1.2.0
• Operations integration via YARN ATS and Ambari
• Security: authentication

Apache Solr
• Added Banana, a rich and flexible UI for visualizing time series data indexed in Solr

Cascading
• Cascading 3.0 on Tez distributed with HDP (coming soon)

Apache Falcon
• Authentication integration
• Lineage, now GA (previously a tech preview feature)
• Improved UI for pipeline management & editing: list, detail, and create new (from existing elements)
• Replicate to cloud: Azure & S3

Apache Sqoop, Apache Flume & Apache Oozie
• Sqoop import support for Hive types via HCatalog
• Secure Windows cluster support: Sqoop, Flume, Oozie
• Flume streaming support: sink to HCat on a secure cluster
• Oozie HA now supports secure clusters
• Oozie Rolling Upgrade
• Operational improvements for Oozie to better support Falcon: capture workflow job logs in HDFS, don't start new workflows for re-run, allow job property updates on running jobs

Apache HBase, Apache Phoenix, & Apache Accumulo
• HBase & Accumulo on YARN via Slider
• HBase HA: replicas update in real time, fully supports region split/merge, Scan API now supports standby RegionServers
• HBase block cache compression
• HBase optimizations for low latency
• Phoenix robust secondary indexes
• Performance enhancements for bulk import into Phoenix
• Hive over HBase snapshots
• Hive connector to Accumulo
• HBase & Accumulo wire-level encryption
• Accumulo multi-datacenter replication

Apache Storm
• Storm-on-YARN via Slider
• Ingest & notification for JMS (IBM MQ not supported)
• Kafka bolt for Storm, supporting sophisticated chaining of topologies through Kafka
• Kerberos support
• Hive update support: streaming ingest
• Connector improvements for HBase and HDFS
• Kafka delivered as a companion component: install, start/stop via Ambari; security authorization integration with Ranger

Apache Slider
• Create and run different versions of heterogeneous applications on demand
• Configure different application instances differently
• Manage the operational lifecycle of application instances
• Expand / shrink application instances
• Application registry for publish and discovery

Apache Knox & Apache Ranger (Argus) & HDP Security
• Apache Ranger: authorization and auditing for Storm and Knox
• REST APIs for managing policies in Apache Ranger
• Apache Ranger: native grant/revoke permissions in Hive and HBase
• Apache Ranger: Oracle DB support and storing of audit logs in HDFS
• Apache Ranger runs on Windows environments
• Apache Knox protection for the YARN ResourceManager
• Apache Knox support for HDFS HA
• Apache Ambari install, start/stop of Knox

Apache Ambari
• Support for the HDP 2.2 stack, including support for Kafka, Knox and Slider
• Enhancements to Ambari Web configuration management: versioning, history and revert, setting final properties and downloading client configurations
• Launch and monitor HDFS rebalance
• Perform Capacity Scheduler queue refresh
• Configure High Availability for ResourceManager
• Ambari Administration framework for managing user and group access to Ambari
• Ambari Views development framework for customizing the Ambari Web user experience
• Ambari Stacks for extending Ambari to bring custom services under Ambari management
• Ambari Blueprints for automating cluster deployments
• Performance improvements and enterprise usability guardrails
Page 6
Just How Many New Features are in HDP 2.2?
88. An astonishing amount of innovation in the OPEN Apache community.
HDP is Apache Hadoop
Page 7
Apache Hive & Stinger Initiative
Page 8
Hive – Single tool for all SQL use cases
Data sources: OLTP, ERP and CRM systems; unstructured documents and emails; clickstream; server logs; sentiment and web data; sensor and machine data; geolocation.
Workloads served through Hive SQL: interactive analytics; batch reports / deep analytics; ETL / ELT.
Page 9
Hive Scales To Any Workload
" The original developers of Hive. " More data than existing RDBMS could handle. " 100+ PB of data under management. " 15+ TB of data loaded daily. " 60,000+ Hive queries per day. " More than 1,000 users per day.
Page 10
Hive Join Strategies
Type: Shuffle Join
• Approach: join keys are shuffled using map/reduce and the join is performed reduce-side
• Pros: works regardless of data size or layout
• Cons: the most resource-intensive and slowest join type

Type: Broadcast Join
• Approach: small tables are loaded into memory on all nodes; the mapper scans through the large table and joins
• Pros: very fast; a single scan through the largest table
• Cons: all but one table must be small enough to fit in RAM

Type: Sort-Merge-Bucket Join
• Approach: mappers take advantage of co-location of keys to do efficient joins
• Pros: very fast for tables of any size
• Cons: data must be bucketed ahead of time
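Which strategy Hive picks can be steered through configuration. A minimal sketch of the relevant settings (property names as of Hive 0.13; the table names `big_table` and `small_table` are illustrative):

```sql
-- Broadcast (map) join: let Hive convert joins automatically when the
-- small side fits under the size threshold (in bytes).
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Sort-Merge-Bucket join: both tables must be bucketed and sorted on the
-- join key with compatible bucket counts.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- When neither optimization applies, Hive falls back to a shuffle join.
SELECT s.id, b.val
FROM small_table s JOIN big_table b ON s.id = b.id;
```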
Page 11
Stinger Initiative
• Stinger Initiative – DELIVERED: next-generation SQL-based interactive query in Hadoop
Speed: Hive query performance improved by 100x to allow for interactive query times (seconds)
Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
SQL: Support for the broadest range of SQL semantics for analytic applications running against Hadoop
An Open Community at its finest: Apache Hive Contribution
1,672 Jira Tickets Closed
145 Developers
44 Companies
360,000 Lines Of Code Added… (2.5x)
[Architecture] Business analytics tools and custom apps issue SQL to Apache Hive, which runs on Apache Tez (or Apache MapReduce) over Apache YARN and HDFS (Hadoop Distributed File System).
13 months: queries that took 100s to 1000s of seconds in Hive 0.10 run in seconds in Hive 0.13. Dramatically faster queries speed time to insight.
Page 12
Stinger Initiative - Key Innovations
File Format (ORCFile) + Execution Engine (Tez) + Query Planner (CBO) = 100X
Page 13
Tez (“Speed”)
• What is it? – A data processing framework as an alternative to MapReduce
• Who else is involved? – Hortonworks, Facebook, Twitter, Yahoo, Microsoft
• Why does it matter? – Widens the platform for Hadoop use cases – Crucial to improving the performance of low-latency applications – Core to the Stinger initiative – Evidence of Hortonworks leading the community in the evolution of Enterprise Hadoop
Page 14
Comparing: Hive/MR vs. Hive/Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
[DAG comparison] With Hive on MapReduce, the query runs as a chain of separate MR jobs: one job joins a and b, another joins the result with c, and a final job performs the GROUP BY and aggregations (COUNT(*), AVG(c.price)), with each job writing its intermediate output to HDFS. With Hive on Tez, the same query runs as a single DAG of map and reduce vertices, so no intermediate results are written to HDFS between stages.
Tez avoids unneeded writes to HDFS
Page 15
ORCFile – Columnar Storage for Hive
• Columns stored separately
• Knows types – Uses type-specific encoders – Stores statistics (min, max, sum, count)
• Has light-weight index – Skip over blocks of rows that don’t matter
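As a sketch, the ORC properties described above can be set directly in the table DDL; the table and column names below are illustrative, and the TBLPROPERTIES shown are the standard ORC knobs (compression codec, stripe size, light-weight index):

```sql
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress" = "ZLIB",          -- per-stripe compression codec
  "orc.stripe.size" = "268435456",  -- 256 MB stripes, large-block friendly
  "orc.create.index" = "true"       -- light-weight index for skipping row groups
);
```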
Page 16
ORCFile – Columnar Storage for Hive
Large block size ideal for map/reduce.
Columnar format enables high compression and high performance.
Page 17
Query Planner – Cost Based Optimizer in Hive
The Cost-Based Optimizer (CBO) uses statistics within Hive tables to produce optimal query plans
Why cost-based optimization?
• Ease of use: join reordering
• Reduces the need for specialists to tune queries
• More efficient query plans lead to better cluster utilization
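A minimal sketch of turning the optimizer on (these properties exist in Hive 0.13/0.14; statistics must already have been gathered for the CBO to use them):

```sql
SET hive.cbo.enable=true;                  -- enable the cost-based optimizer
SET hive.compute.query.using.stats=true;   -- answer metadata-only queries from stats
SET hive.stats.fetch.column.stats=true;    -- let the planner read column stats
SET hive.stats.fetch.partition.stats=true; -- let the planner read partition stats
```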
Page 18
Statistics: Foundations for CBO
Kinds of statistics
Table statistics, collected on load, per partition:
• Uncompressed size
• Number of rows
• Number of files
Column statistics, required by the CBO:
• NDV (Number of Distinct Values)
• Nulls, Min, Max

Usability: how does the data get statistics?
The ANALYZE TABLE command:
• Analyze the entire table
• Run the command per partition
• Run it for some partitions and the compiler will extrapolate statistics
Collecting statistics on load:
• Table stats can be collected on insert via Hive with set hive.stats.autogather=true
• Not with LOAD DATA
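The commands behind the bullets above, sketched against a hypothetical table `t` partitioned by `dt` (table name and partition value are illustrative):

```sql
-- Gather table stats automatically on INSERT (has no effect on LOAD DATA).
SET hive.stats.autogather=true;

-- Statistics for one partition.
ANALYZE TABLE t PARTITION (dt='2014-12-04') COMPUTE STATISTICS;

-- Column statistics for that partition, required by the CBO.
ANALYZE TABLE t PARTITION (dt='2014-12-04') COMPUTE STATISTICS FOR COLUMNS;
```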
Page 19
A Journey to SQL Compliance
Evolution of SQL Compliance in Hive

SQL Datatypes                   SQL Semantics
INT/TINYINT/SMALLINT/BIGINT     SELECT, INSERT
FLOAT/DOUBLE                    GROUP BY, ORDER BY, HAVING
BOOLEAN                         JOIN on explicit join key
ARRAY, MAP, STRUCT, UNION       Inner, outer, cross and semi joins
STRING                          Sub-queries in the FROM clause
BINARY                          ROLLUP and CUBE
TIMESTAMP                       UNION
DECIMAL                         Standard aggregations (sum, avg, etc.)
DATE                            Custom Java UDFs
VARCHAR                         Windowing functions (OVER, RANK, etc.)
CHAR                            Advanced UDFs (ngram, XPath, URL)
                                JOINs in the WHERE clause
                                Sub-queries for IN/NOT IN, HAVING

Legend (features are shaded by release on the original slide): Hive 0.10 or earlier, Hive 0.11, Hive 0.12, Hive 0.13
Page 20
Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the
end of the beginning. -Winston Churchill
Hive 0.13
Page 21
Stinger.Next
Page 22
Stinger.Next: Delivery Themes
Hive 0.14
• Transactions with ACID, allowing insert, update and delete
• Streaming ingest
• Cost Based Optimizer optimizes star and bushy join queries

Sub-Second (1st half 2015)
• Sub-second queries with LLAP
• Hive-Spark machine learning integration
• Operational reporting with Hive streaming ingest and transactions

Richer Analytics (2nd half 2015)
• Toward SQL:2011 analytics
• Materialized views
• Cross-geo queries
• Workload management via YARN and LLAP integration
Page 23
Transaction Use Cases

Reporting with analytics (YES)
• Reporting on data with occasional updates
• Corrections to the fact tables, evolving dimension tables
• Low-concurrency updates, low TPS

Operational reporting (YES)
• High-throughput ingest from an operational (OLTP) database
• Periodic inserts every 5-30 minutes
• Requires tool support and changes in our transactions

Operational (OLTP) database (NO)
• Small transactions, each doing single-line inserts
• High concurrency: hundreds to thousands of connections

(Diagram: the OLTP database replicates into Hive, which serves analytics and modifications; high-concurrency OLTP itself stays outside Hive.)
Page 24
Deep Dive: Transactions
Transaction support in Hive with ACID semantics
• Hive native support for INSERT, UPDATE, DELETE
• Split into phases:
  • Phase 1: Hive Streaming Ingest (append) [Done]
  • Phase 2: INSERT / UPDATE / DELETE support [Done]
  • Phase 3: BEGIN / COMMIT / ROLLBACK transactions [Next]
1. Original file: a task reads the latest read-optimized ORCFile.
2. Edits made: a task reads the ORCFile and merges in the delta file containing the edits.
3. Edits merged: a task reads the updated read-optimized ORCFile.
The Hive ACID compactor periodically merges the delta files in the background.
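With Phase 2 in place (Hive 0.14), the DML below runs against a bucketed, transactional ORC table; the table name, partition values and data are illustrative, and the transaction settings from Step 1 on Page 27 must be in effect:

```sql
-- Requires a bucketed ORC table created with "transactional"="true".
INSERT INTO TABLE test PARTITION (year='2014', month='12', day='04')
VALUES (1, 'a'), (2, 'b');

-- Updates may not touch bucketing or partition columns.
UPDATE test SET val = 'c' WHERE id = 1;

DELETE FROM test WHERE id = 2;
```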
Page 25
Transactions - Requirements
• The table must be declared with the transactional table property
• The table must be in ORC format
• The table must be bucketed
Page 26
Putting It Together
Page 27
Step 1 - Turn On Transactions Hive Configuration
§ hive.support.concurrency=true
§ hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
§ hive.compactor.initiator.on=true
§ hive.compactor.worker.threads=2
§ hive.enforce.bucketing=true
§ hive.exec.dynamic.partition.mode=nonstrict
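With these properties set, the transaction machinery can be sanity-checked from the Hive CLI or Beeline; SHOW COMPACTIONS exists from Hive 0.13, and SHOW TRANSACTIONS from Hive 0.14:

```sql
SHOW TRANSACTIONS;  -- open/aborted transactions known to the DbTxnManager
SHOW COMPACTIONS;   -- compaction requests queued or running for delta files
```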
Page 28
Step 2 – Enable Concurrency By Defining Queues
YARN Configuration
§ yarn.scheduler.capacity.root.default.capacity=50
§ yarn.scheduler.capacity.root.hiveserver.capacity=50
§ yarn.scheduler.capacity.root.hiveserver.hive1.capacity=50
§ yarn.scheduler.capacity.root.hiveserver.hive1.user-limit-factor=4
§ yarn.scheduler.capacity.root.hiveserver.hive2.capacity=50
§ yarn.scheduler.capacity.root.hiveserver.hive2.user-limit-factor=4
§ yarn.scheduler.capacity.root.hiveserver.queues=hive1,hive2
§ yarn.scheduler.capacity.root.queues=default,hiveserver
Page 29
Step 3 – Deliver Capacity Guarantees by Enabling YARN Preemption
YARN Configuration
§ yarn.resourcemanager.scheduler.monitor.enable=true
§ yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
§ yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=1000
§ yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=5000
§ yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.4
Page 30
Step 4 – Enable Tez Execution Engine & Tez Sessions
Enable sessions for the Hive queues.
Hive Configuration
§ hive.execution.engine=tez
§ hive.server2.tez.initialize.default.sessions=true
§ hive.server2.tez.default.queues=hive1,hive2
§ hive.server2.tez.sessions.per.default.queue=1
§ hive.server2.enable.doAs=false
§ hive.vectorized.groupby.maxentries=10240
§ hive.vectorized.groupby.flush.percent=0.1
Page 31
Step 5 - Create Partitioned & Bucketed ORC Tables
Create table if not exists test (id int, val string)
partitioned by (year string,month string,day string)
clustered by (id) into 7 buckets
stored as orc TBLPROPERTIES ("transactional"="true");
Note:
§ Transactions require bucketed tables in ORC format; tables cannot be sorted
§ "transactional"="true" must be set in the table properties
§ For performance, table partitioning is recommended but not mandatory
§ Partition on filter columns with low cardinality
§ For optimal performance stay below 1000 partitions
§ Cluster on join columns
§ The number of buckets is contingent on dataset size
Page 32
Step 6 - Loading Data into an ORC Table
§ Sqoop, Flume & Storm support direct ingestion into ORC tables
§ Have a text file? Load it into a table stored as textfile, then transfer it to the ORC table using a Hive INSERT statement
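The text-file path sketched as HiveQL; the staging table, file path and columns are illustrative, and the target `test` table is the partitioned ORC table created in Step 5:

```sql
-- 1. Staging table over the raw delimited file.
CREATE TABLE test_staging (id INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/tmp/test.csv' INTO TABLE test_staging;

-- 2. Rewrite into the ORC table; partition values come from the SELECT
--    (hive.exec.dynamic.partition.mode=nonstrict was set in Step 1).
INSERT INTO TABLE test PARTITION (year, month, day)
SELECT id, val, '2014', '12', '04' FROM test_staging;
```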
Page 33
Step 7 - Compute Statistics
§ Compute table stats:
analyze table test partition(year,month,day) compute statistics;
§ Compute column stats:
analyze table test partition(year,month,day) compute statistics for columns;
§ Keep stats updated: speed up computation by limiting it to partitions that have changed

Note:
§ In Hive 0.14, column stats can be calculated for all partitions in a single statement
§ To limit computation to a specific partition, specify the partition keys
Page 34
Sample Code – Sqoop Import To ORC Table
sqoop import --verbose \
  --connect 'jdbc:mysql://localhost/people' \
  --table persons --username root \
  --hcatalog-table persons \
  --hcatalog-storage-stanza "stored as orc" \
  -m 1

Use HCatalog to import into an ORC table.
Page 35
Sample Code – Flume Configuration For Hive Streaming Ingest

## Agent
agent.sources = csvfile
agent.sources.csvfile.type = exec
agent.sources.csvfile.command = tail -F /root/test.txt
agent.sources.csvfile.batchSize = 1
agent.sources.csvfile.channels = memoryChannel
agent.sources.csvfile.interceptors = intercepttime
agent.sources.csvfile.interceptors.intercepttime.type = timestamp
## Channels
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
## Hive Streaming Sink
agent.sinks = hiveout
agent.sinks.hiveout.type = hive
agent.sinks.hiveout.hive.metastore=thrift://localhost:9083
agent.sinks.hiveout.hive.database=default
agent.sinks.hiveout.hive.table=test
agent.sinks.hiveout.hive.partition=%Y,%m,%d
agent.sinks.hiveout.serializer = DELIMITED
agent.sinks.hiveout.serializer.fieldnames =id,val
agent.sinks.hiveout.channel = memoryChannel
Page 36
Q&A