Vertica joins a refresher

3/3/2016 Vertica Joins: A Refresher HPE Developer Community https://community.dev.hpe.com/t5/VerticaBlog/VerticaJoinsARefresher/bap/234901 1/7


Vertica Joins: A Refresher

By VickiL, a week ago

As a Vertica user, you know that using joins can improve query performance by combining records from one or more tables. But sometimes, you need to develop complex joins. Vertica supports many different kinds of joins that perform different functions based on your needs.

In this blog, we’ll give you a refresher on join algorithms and predicates, as well as the following Vertica join types: inner joins; left, right, and full outer joins; natural joins; and cross joins.

In Vertica, we refer to the tables participating in the join as left or right. The left table is the one specified first in the join statement, and the right table is the one specified second.

Examples used in this blog

For this blog, we’ll use the following scenario:

Your organization wants employees to increase or refresh their computer programming skills. To accomplish this, you put together a list of courses from a local university that your employees can take. To help keep track of employees and completed classes, you reference two tables, employees and courses, described in more detail below.

employees: A table with detailed employee information for all 87 of your organization’s employees. To save space, the following outputs show only a portion of each table:

=> SELECT * FROM employees ORDER BY personal_id ASC;
 personal_id | employee_name | employee_age | employee_gender
-------------+---------------+--------------+-----------------
           1 | John          |           43 | M
           2 | Dan           |           52 | M
           3 | Lori          |           38 | F
           4 | Tom           |           55 | M
           5 | Mary          |           39 | F
           6 | Virginia      |           66 | F
           7 | Gary          |           63 | M
           8 | Rebecca       |           19 | F
           9 | Steven        |           32 | M
          10 | Jessica       |           25 | F

courses: A table the university uses to record each time someone takes a course. This includes people from outside your organization.

=> SELECT * FROM courses ORDER BY record_id ASC;
 record_id | personal_id | course_id | course_name           | date_taken
-----------+-------------+-----------+-----------------------+------------
         1 |          10 |         1 | Intro to Comp Sci     | 20150105
         2 |           7 |         4 | Java 303              | 20150405
         3 |          53 |         3 | Database Architecture | 20150803
         4 |           4 |         2 | SQL 101               | 20160113
         5 |           6 |         4 | Java 303              | 20140910
         6 |           6 |         1 | Intro to Comp Sci     | 20150329
         7 |           2 |         2 | SQL 101               | 20141030
         8 |          27 |         2 | SQL 101               | 20130704
         9 |           8 |         4 | Java 303              | 20160105
        10 |           3 |         3 | Database Architecture | 20150713

You can gain valuable insight into the relationship between the employees and courses by performing a variety of joins. All the examples in this blog use the employees and courses tables.

How does Vertica perform a join?

Vertica uses two basic algorithms to perform a join operation: the hash join and the merge join.

The good news is that you don’t necessarily have to pay much attention to the join algorithms, because the Vertica optimizer automatically chooses the most appropriate algorithm given the query and the projections in the system. You can facilitate a merge join by adding a projection that is sorted on the join keys.

Join predicates

In a SQL JOIN, the join predicate specifies how Vertica should join the tables. You specify the join predicate in the ON clause, using a relational operator (for example <, <=, >, >=, <>, =, or <=>) together with the columns from the left and right tables that should be compared.

Vertica supports any arbitrary join expression with both matching and non-matching column values. If the clause uses an equality predicate, indicated with an equal sign (=), the join is considered an equi-join. Conversely, non-equi-joins use predicates other than the equal sign, for example, the greater than sign (>).

For the examples in this blog, we’ll use the equality predicate (=).
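To make the difference between an equi-join and a non-equi-join concrete, here is a minimal Python sketch. It is not Vertica code: the `nested_loop_join` helper and the sample values are hypothetical, chosen only for illustration. The helper compares every left value with every right value and keeps the pairs that satisfy the predicate.

```python
import operator

def nested_loop_join(left, right, predicate):
    """Return every (l, r) pair for which predicate(l, r) is true."""
    return [(l, r) for l in left for r in right if predicate(l, r)]

# Hypothetical join-column values, not taken from the blog's tables.
left_values = [1, 2, 3]
right_values = [2, 3]

# Equi-join: the predicate is equality (=).
equi = nested_loop_join(left_values, right_values, operator.eq)

# Non-equi-join: the predicate is another comparison, here less-than (<).
non_equi = nested_loop_join(left_values, right_values, operator.lt)

print(equi)      # [(2, 2), (3, 3)]
print(non_equi)  # [(1, 2), (1, 3), (2, 3)]
```

Swapping in a different operator is the only change needed to move between the two kinds of join, which mirrors how only the operator in the ON clause changes in SQL.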

Now that you have a basic understanding of what a join is for, let’s look at the different types of joins you can perform with Vertica.

Inner joins

An inner join combines records from two tables based on a join predicate and returns only the rows that satisfy that predicate, with the columns specified in the SELECT list. Since we’re using the equality predicate, an inner join returns the rows in which the values of the columns named in the ON clause match. If a row in the left table matches three rows in the right table, the join returns three rows. When a value in the joined column appears in one table but not the other, that row is not returned.

In the diagram below, the green shaded area represents values that are returned by an inner join. Keep in mind that duplicate values may be returned.


The two join algorithms Vertica uses are:

Hash join: A hash join is used to join large data sets. In a hash join, Vertica uses the smaller (or inner) table to build an in-memory hash table on the join column. The Vertica execution engine then scans the larger (or outer) table and examines the hash table to look for matches. The table size is determined by the number of rows times the size of each row. The optimizer chooses a hash join when projections are not sorted on the join columns. Although there are no sort requirements, the cost of performing a hash join can rise if the entire hash table can’t fit in memory.

Merge join: If both inputs are pre-sorted on the join column, the optimizer chooses a merge join.
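The two algorithms can be sketched in a few lines of Python. This is an illustration of the general hash-join and merge-join techniques under simplifying assumptions (rows are tuples, everything fits in memory), not Vertica's implementation. The function names `hash_join` and `merge_join` are hypothetical, and the rows are the sample rows from the employees and courses listings, reduced to a few columns.

```python
from collections import defaultdict

# (personal_id, employee_name)
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
# (record_id, personal_id, course_name)
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def hash_join(small, large, key_small, key_large):
    """Build a hash table on the smaller input, then probe it with the larger."""
    table = defaultdict(list)
    for row in small:                          # build phase
        table[row[key_small]].append(row)
    out = []
    for row in large:                          # probe phase
        for match in table.get(row[key_large], []):
            out.append((match, row))
    return out

def merge_join(left, right, key_left, key_right):
    """Join two inputs that are already sorted on their join keys."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair left[i] with every right row that shares this key.
            k = j
            while k < len(right) and right[k][key_right] == lk:
                out.append((left[i], right[k]))
                k += 1
            i += 1  # keep j: the next left row may carry the same key
    return out

hash_result = hash_join(EMPLOYEES, COURSES, 0, 1)
merge_result = merge_join(sorted(EMPLOYEES),
                          sorted(COURSES, key=lambda c: c[1]), 0, 1)

print(len(hash_result), len(merge_result))
print(sorted(hash_result) == sorted(merge_result))
```

Both functions produce the same pairs. The difference is in the preconditions: `merge_join` needs both inputs sorted on the join key and uses no extra table, while `hash_join` accepts unsorted inputs but must hold the hash table in memory.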


The syntax for an inner join looks like this:

=> SELECT <column list>
   FROM <left-joined table>
   [INNER] JOIN <right-joined table>
   ON <join condition>

An inner join is the most commonly used type of join, and it is the default, so the INNER keyword is optional.

Example

If you want to list your employees who have taken classes, along with information about these classes, you can join the two tables on their common column, personal_id. This inner join includes rows from both tables where the personal_id from the employees table matches a value in the courses table’s personal_id column.

=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS c_personal_id,
          c.course_name, c.date_taken
   FROM employees e INNER JOIN courses c ON e.personal_id = c.personal_id
   ORDER BY e.personal_id ASC;

Some employees, like Virginia, show up multiple times because they have taken multiple classes (that is, her personal_id appears twice in the courses table). Other employees, like Mary, don’t show up at all because they haven’t taken any classes yet (that is, her personal_id isn’t in the courses table).
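If you don't have a Vertica database at hand, the following Python sketch reproduces the inner join on the sample rows. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for Vertica, and it keeps only a few columns of each table; the `make_db` helper is hypothetical.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
inner_rows = conn.execute(
    "SELECT e.personal_id, e.employee_name, c.course_name "
    "FROM employees e INNER JOIN courses c ON e.personal_id = c.personal_id "
    "ORDER BY e.personal_id, c.record_id").fetchall()

for row in inner_rows:
    print(row)

names = [name for _, name, _ in inner_rows]
print("Virginia appears", names.count("Virginia"), "times")  # 2 times
print("Mary in result:", "Mary" in names)                     # False
```

With the ten sample rows of each table, the join returns eight rows: one per course record whose personal_id belongs to an employee.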

But what if you want all the rows from one or both tables to be included in your result set? For that, you can use an outer join.

Outer joins

Outer joins extend the functionality of inner joins by letting you preserve rows in one or both tables that do not have matching rows in the other table. Create outer joins using the following syntax:

=> SELECT <column list>
   FROM <left-joined table>
   [ LEFT | RIGHT | FULL ] OUTER JOIN <right-joined table>
   ON <join condition>

Vertica gives you the option of performing left outer joins, right outer joins, or full outer joins.

Left outer joins

A left outer join preserves a complete set of records from the left (preserved) table, along with any matched records in the right (non-preserved) table. Where Vertica finds no match, it returns null values for the right table’s columns.

In the diagram below, the green area represents values that are returned by a left outer join:

Example

Using our previous example, let’s say you want to see a list of all your employees along with the classes they’ve taken, if applicable. The result of a left outer join contains a record for every employee, with null values in the columns from the courses table if that employee hasn’t taken a course.

=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS c_personal_id,
          c.course_name, c.date_taken
   FROM employees e LEFT OUTER JOIN courses c ON e.personal_id = c.personal_id
   ORDER BY e.personal_id ASC;
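The effect of the left outer join on the sample rows can be checked with a short Python sketch. As an assumption for portability, it uses SQLite (Python's built-in `sqlite3` module) rather than Vertica, and only a few columns of each table; the `make_db` helper is hypothetical. SQL nulls come back as Python `None`.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
left_rows = conn.execute(
    "SELECT e.personal_id, e.employee_name, c.course_name "
    "FROM employees e LEFT OUTER JOIN courses c ON e.personal_id = c.personal_id "
    "ORDER BY e.personal_id, c.record_id").fetchall()

# Employees with no course record keep their row; course_name is None (NULL).
no_course = [name for _, name, course in left_rows if course is None]
print(len(left_rows))  # 11: ten employees, with Virginia listed twice
print(no_course)       # ['John', 'Mary', 'Steven']
```

Compared with the inner join, the three employees without a course record are now present, each padded with a null in the course column.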


Right outer joins

When performing a right outer join, Vertica returns a complete set of records from the right-joined (preserved) table, as well as matched values from the left-joined (non-preserved) table. Where Vertica finds no match, it returns null values for the left table’s columns.

In the diagram below, the green area represents values that are returned by a right outer join:

Example

You can use this example to list every course record, whether or not the person who took the course is one of your employees.

=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS c_personal_id,
          c.course_name, c.date_taken
   FROM employees e RIGHT OUTER JOIN courses c ON e.personal_id = c.personal_id
   ORDER BY e.personal_id ASC;
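A right outer join is a left outer join with the table roles swapped, which the following Python sketch uses to reproduce the result on the sample rows. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in for Vertica; because SQLite only gained native RIGHT OUTER JOIN support in version 3.39, the sketch writes the query as `courses LEFT OUTER JOIN employees` instead. The `make_db` helper is hypothetical and only a few columns are kept.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
# Equivalent to: employees e RIGHT OUTER JOIN courses c ON e.personal_id = c.personal_id
right_rows = conn.execute(
    "SELECT e.employee_name, c.personal_id, c.course_name "
    "FROM courses c LEFT OUTER JOIN employees e ON e.personal_id = c.personal_id "
    "ORDER BY c.record_id").fetchall()

# Course records taken by people outside the organization have no employee name.
outsiders = [pid for name, pid, _ in right_rows if name is None]
print(len(right_rows))  # 10: every course record is preserved
print(outsiders)        # [53, 27]
```

Every course record survives the join, and the two records with personal_id 53 and 27 carry a null employee name because those people are not in the employees table.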

Full outer joins

A full outer join combines the results of a left outer join and a right outer join. It returns all records from both tables in the join, with null values in the columns where one table has no match in the other.

In the diagram below, the green area represents values that are returned by a full outer join:

Example

A full outer join is useful if you want to see, for example, each employee who has taken a class together with that class, but you also want to see all the employees who have not taken any courses, as well as any course record that does not belong to one of your employees:

=> SELECT e.personal_id AS e_personal_id, employee_name, c.personal_id AS c_personal_id,
          c.course_name, c.date_taken
   FROM employees e FULL OUTER JOIN courses c ON e.personal_id = c.personal_id
   ORDER BY e.personal_id ASC;
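The following Python sketch shows what the full outer join returns for the sample rows. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in for Vertica. Since SQLite versions before 3.39 have no FULL OUTER JOIN, the sketch emulates it as a left outer join plus the course records that match no employee, which is an equivalent formulation for this data, not the query from the post. The `make_db` helper is hypothetical.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
full_rows = conn.execute(
    # Part 1: every employee, with matching course records where they exist.
    "SELECT e.personal_id, e.employee_name, c.personal_id, c.course_name "
    "FROM employees e LEFT OUTER JOIN courses c ON e.personal_id = c.personal_id "
    "UNION ALL "
    # Part 2: course records whose personal_id matches no employee.
    "SELECT NULL, NULL, c.personal_id, c.course_name "
    "FROM courses c "
    "WHERE NOT EXISTS (SELECT 1 FROM employees e "
    "                  WHERE e.personal_id = c.personal_id)").fetchall()

employees_without_courses = [r for r in full_rows if r[2] is None]
courses_without_employees = [r for r in full_rows if r[0] is None]
print(len(full_rows))                  # 13
print(len(employees_without_courses))  # 3 (John, Mary, Steven)
print(len(courses_without_employees))  # 2 (personal_id 53 and 27)
```

The 13 rows are the 8 matched pairs, the 3 employees with no course record, and the 2 course records with no matching employee.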


Natural Joins

A natural join is just a join with an implicit join predicate. An implicit join predicate doesn’t include any explicit join conditions, such as e.personal_id = c.personal_id. Instead, implicit connections are formed by matching all pairs of columns in the tables that have the same name and compatible data types. (If the data types are incompatible, Vertica returns an error.) The result set contains only one column representing each pair of equally named columns.

Natural joins take the following form:

=> SELECT <column list>
   FROM <left-joined table>
   NATURAL [ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER ] JOIN <right-joined table>

By default, Vertica performs a natural join as an inner join on the columns common to both tables. This is useful both as a syntactic shortcut and when you don’t know the names or commonality of columns.

Example

For example, this query naturally joins the two personal_id columns.

=> SELECT * FROM employees e NATURAL JOIN courses c;
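The Python sketch below runs a natural join on the sample rows and inspects the result columns. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in for Vertica and keeps only a few columns per table, so the only column name the two tables share is personal_id; the `make_db` helper is hypothetical.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
cursor = conn.execute(
    "SELECT * FROM employees e NATURAL JOIN courses c ORDER BY record_id")
natural_columns = [col[0] for col in cursor.description]
natural_rows = cursor.fetchall()

print(natural_columns)    # personal_id is listed once, not twice
print(len(natural_rows))  # 8, the same rows an inner join on personal_id returns
```

Because no join condition is written out, adding a second column with a shared name to either table would silently change what the join matches on, which is worth keeping in mind before relying on natural joins.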

Cross Joins

Use a cross join when you want to return all possible combinations of the rows of one table with the rows of another.

Cross joins take the following form:

=> SELECT <column list>
   FROM <left-joined table>
   CROSS JOIN <right-joined table>

Example

For example, you may want to compare every employee with every course record to see who they’ve taken classes with (identified by the personal_id number). In the result set, Vertica retrieves a record from the employees table and creates a row for every record in the courses table. It then does the same for the rest of the records in the employees table until each row in the employees table is displayed with each row of the courses table. The number of rows in the result equals the number of rows in the first table multiplied by the number of rows in the second table. In our example, using the 10 sample rows shown for each table, the output is 100 rows (10 * 10). The example output below only shows a sample:

=> SELECT * FROM employees e CROSS JOIN courses ORDER BY e.personal_id ASC;
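The row-count arithmetic can be checked with a short Python sketch. It assumes SQLite (Python's built-in `sqlite3` module) as a stand-in for Vertica and loads only the ten sample rows shown for each table, reduced to a few columns; the `make_db` helper is hypothetical.

```python
import sqlite3

# Sample rows from the listings in this post, reduced to a few columns.
EMPLOYEES = [(1, "John"), (2, "Dan"), (3, "Lori"), (4, "Tom"), (5, "Mary"),
             (6, "Virginia"), (7, "Gary"), (8, "Rebecca"), (9, "Steven"),
             (10, "Jessica")]
COURSES = [(1, 10, "Intro to Comp Sci"), (2, 7, "Java 303"),
           (3, 53, "Database Architecture"), (4, 4, "SQL 101"),
           (5, 6, "Java 303"), (6, 6, "Intro to Comp Sci"),
           (7, 2, "SQL 101"), (8, 27, "SQL 101"), (9, 8, "Java 303"),
           (10, 3, "Database Architecture")]

def make_db():
    """Create an in-memory SQLite database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (personal_id INTEGER, employee_name TEXT)")
    conn.execute("CREATE TABLE courses "
                 "(record_id INTEGER, personal_id INTEGER, course_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO courses VALUES (?, ?, ?)", COURSES)
    return conn

conn = make_db()
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees e CROSS JOIN courses c").fetchone()[0]

# A cross join has no predicate, so every employee row pairs with every course row.
print(cross_count)                    # 100
print(len(EMPLOYEES) * len(COURSES))  # 100
```

The product grows quickly: with all 87 employees in the employees table and the same ten course records, the cross join would return 870 rows.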

More Info

For more information about joins, including restrictions and optimization, see the Vertica documentation.

Also read our blog on range joins!
