SAP HANA SPS10- Series Data/ TimeSeries


Transcript of SAP HANA SPS10- Series Data/ TimeSeries

Page 1: SAP HANA SPS10- Series Data/ TimeSeries

© 2014 SAP AG or an SAP affiliate company. All rights reserved.

SAP HANA SPS 10 - What’s New? Series Data / TimeSeries

SAP HANA Product Management June, 2015

(Delta from SPS 09 to SPS 10)

Page 2: SAP HANA SPS10- Series Data/ TimeSeries

© 2015 SAP SE or an SAP affiliate company. All rights reserved. Public

Disclaimer

This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP.

SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.

This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.

Page 3: SAP HANA SPS10- Series Data/ TimeSeries

Agenda

Overview

– Series Data Overview

– SPS09 Summary

– SPS10 Overview

Store Enhancements

– Enhanced Support for Equidistant Series

– Support for Equidistant Series with Multiple Increments, Offsets

Query Enhancements

– Updates to SERIES_ROUND

Analytic Enhancements

– New Analytic Functions

Page 4: SAP HANA SPS10- Series Data/ TimeSeries

Overview

Page 5: SAP HANA SPS10- Series Data/ TimeSeries

Series Data Overview

Series Data synonymous with Time Series

Series Data support introduced in SPS09 as a core SAP HANA capability

Series Data - What is it?

– Ordered sequence of data points/measurements

– Measured at points in time or within time intervals

o E.g. Discrete measurement taken from a sensor at every 10s

o E.g. Energy consumed by a home for every 15 minute interval (smart metering)

Series Data - What do we do with it?

– Analyze and predict

o Extract useful statistical information

o Forecasting

Series Data – Relevance?

– Foundational technology for IoT

o Industry 4.0 / Industrial Internet of Things (IIoT)

o IT/OT Convergence

Page 6: SAP HANA SPS10- Series Data/ TimeSeries

Series Data – SPS09 Review

Support very high volumes of data using effective compression techniques

– Non-lossy compression; all values originally inserted are accessible for auditing/regulatory purpose

Support both equidistant and non-equidistant data

– Often, source data will be non-equidistant; it will then be “snapped” to an equidistant “grid” for analysis, model

fitting, etc.

Allow time series manipulation, cleaning, and analytic operations to be expressed naturally in SQL while

maintaining high performance

– Table Creation via CREATE COLUMN TABLE extensions for Series Data

– Efficient grouping to different granularities (GROUP BY SERIES_ROUND(…))

– Built in SQL functions for efficient handling of Series Data

o SERIES_GENERATE; SERIES_DISAGGREGATE; SERIES_ROUND; SERIES_PERIOD_TO_ELEMENT;

SERIES_ELEMENT_TO_PERIOD

– New Analytical SQL Functions

o CORR; CORR_SPEARMAN; LINEAR_APPROX; MEDIAN

Page 7: SAP HANA SPS10- Series Data/ TimeSeries

Series Data – SPS10 Overview

Handle timestamp data that is NOT equidistant with a single offset in the entire table

– With good compression for reduced memory consumption

– Range block indexes for efficient handling of range queries

Enhance SERIES_ROUND with new rounding modes and to accept an offset

– Enhanced usability and greater expressive power for querying series data

New aggregate and window functions

CDS Support

Store

– Equidistant series w/ any alignment

– Generated rounded columns

– Piecewise equidistant series

Query

– Round to computed interval

– Granulize (any offset)

Analyze

– AUTO_CORR, CROSS_CORR

– BINNING

– CUBIC_SPLINE_APPROX

– DFT

– RANDOM_PARTITION

– SERIES_FILTER

– WEIGHTED_AVG

– Sliding window support

– {FIRST/NTH/LAST}_VALUE

Page 8: SAP HANA SPS10- Series Data/ TimeSeries

Store Enhancements: Enhanced Support for Equidistant Series

Page 9: SAP HANA SPS10- Series Data/ TimeSeries

Enhanced Support for Equidistant Series: Limitations of SPS09

Restrictions/Limitations in SPS09 on Equidistant Series

– Only one equidistant property per table

o i.e. Only a single INCREMENT BY is supported; Defined at table creation time; Applies to all of the series in the table

o Efficient compression can be provided on the timestamp column (but it had to be exactly aligned on the increment

boundary). i.e. no support for any offset

o Can be encoded as a line t = mx (i.e. single slope ‘m’, no offset from the INCREMENT boundary)

– Data needed to be ordered on INSERT (ordered by ‘Series Key, TimeStamp’) for good compression

SPS09 Equidistant series support works great for series data and use cases that meet the above criteria

Page 10: SAP HANA SPS10- Series Data/ TimeSeries

Enhanced Support for Equidistant Series: Many use cases require more flexible handling of timestamps/periods

However, many use cases involve:

– ‘runs of data’ where timestamps for consecutive data points differ by a constant interval

o i.e. data effectively has multiple INCREMENTs

o can be due to different intervals for different series in table

o can be due to different intervals within single series in table

– timestamps are not necessarily aligned to INCREMENT boundaries

o i.e. offsets can exist from the INCREMENT boundaries

– often there may be slight local variations in the timestamp, i.e. some “jitter”

Page 11: SAP HANA SPS10- Series Data/ TimeSeries

New Representation For Timestamps

Encode series timestamps/periods as t = mx + b + j

• x is an integer value (monotonically increasing)

• m represents the slope (i.e. represents INCREMENT BY)

• b is an offset value (locally constant)

• j is a jitter value (can have few distinct values)

Offers good compression even with different slopes and offsets in the series

– Slight differences between the ideal line representation and the recorded timestamps (j) are represented efficiently with n-bit compression

Enables support for alternate periods

– Useful when the period column needs to be offset by some constant

o e.g. for time zone differences; for daylight savings time; differences in starting day of week etc.
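As an illustration of the t = mx + b + j idea, here is a minimal Python sketch that splits a sorted list of timestamps into runs with a constant increment and stores each run as an offset, a slope and a point count. It is not SAP HANA's implementation: the function names are hypothetical, jitter j is assumed to be 0, and x restarts at 0 for every run, which is a simplification.

```python
from datetime import datetime, timedelta

def encode_piecewise(timestamps):
    """Split sorted timestamps into runs (b, m, n) with t = b + m * x, x = 0..n-1.

    b is the first timestamp of the run (offset), m the constant step in
    seconds (slope), n the number of points. Jitter is assumed to be 0.
    """
    runs = []
    i = 0
    while i < len(timestamps):
        b = timestamps[i]
        if i + 1 == len(timestamps):          # a single trailing point
            runs.append((b, 0, 1))
            break
        m = (timestamps[i + 1] - b).total_seconds()
        n = 2
        # extend the run while the step between neighbours stays the same
        while (i + n < len(timestamps)
               and (timestamps[i + n] - timestamps[i + n - 1]).total_seconds() == m):
            n += 1
        runs.append((b, m, n))
        i += n
    return runs

def decode_piecewise(runs):
    """Recompute every timestamp from its run components."""
    return [b + timedelta(seconds=m * x) for b, m, n in runs for x in range(n)]

# Four readings every 10 s (offset by 3 s), then three readings every 15 min.
first = [datetime(2015, 6, 1, 0, 0, 3) + timedelta(seconds=10 * k) for k in range(4)]
second = [datetime(2015, 6, 1, 1, 0, 0) + timedelta(minutes=15 * k) for k in range(3)]
runs = encode_piecewise(first + second)
```

Seven timestamps collapse into two runs here, and decoding the runs reproduces the original values, which is the property that allows the raw timestamp column to be dropped after encoding.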

Page 12: SAP HANA SPS10- Series Data/ TimeSeries

Grammar Updates to Support Equidistant Piecewise Series: Supported via CDS

Note: New syntax currently only supported via CDS and not via CREATE TABLE

– CREATE TABLE support may be provided in a future version

– Use of syntax via SQL statement will give errors

series_definition := SERIES '(' series_spec_list ')'

series_spec_list: SERIES KEY '(' column_name_list ')'
    | NO MINVALUE | MINVALUE str_const
    | NO MAXVALUE | MAXVALUE str_const
    | PERIOD FOR SERIES '(' {column|NULL} [',' {column|NULL}] ')'
    | series_equidistant_definition
    | reorganize_process
    | ALTERNATE PERIOD FOR SERIES (column [, column ...])

series_equidistant_definition:
      NOT EQUIDISTANT
    | EQUIDISTANT INCREMENT BY constant [MISSING ELEMENTS [NOT] ALLOWED]
    | EQUIDISTANT PIECEWISE

Page 13: SAP HANA SPS10- Series Data/ TimeSeries

Grammar Updates to Support Equidistant Piecewise Series: Supported via CDS

CDS specification:

entity Weather {
    station_id String(3) not null;
    ts_utc UTCTimestamp not null; -- UTC time at start of period
    ts_local UTCTimestamp not null; -- local time at start of period
    temp Decimal(3,1) not null; -- mean temp ℃
    wind_speed Decimal(2) null; -- wind speed (Km/h)
    ts_utc_month UTCTimestamp not null; -- period rounded to months
        GENERATED ALWAYS AS SERIES_ROUND(ts_utc, 'INTERVAL 1 MONTH');
} SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT PIECEWISE
    PERIOD FOR SERIES (ts_utc)
    ALTERNATE PERIOD FOR SERIES(ts_local)
)

On activation of the CDS document, the following representations are created.

Physical representation of the series table:

CREATE COLUMN TABLE Weather_(
    station_id varchar(3) NOT NULL,
    ts_utc_ timestamp NULL,
    ts_utc_x_ integer default 0 NOT NULL,
    ts_utc_m_ decimal default 1 NOT NULL,
    ts_utc_b_ timestamp default timestamp'0001-01-01 00:00' NOT NULL,
    ts_utc_j_ decimal default 0 NOT NULL,
    ts_local_ timestamp NULL,
    ts_local_d_ decimal default 1 NOT NULL,
    temp decimal(3,1) NOT NULL,
    wind_speed decimal(2) NULL,
    ts_utc_month TIMESTAMP NOT NULL
        GENERATED ALWAYS AS
            SERIES_ROUND(
                COALESCE(ts_utc,
                    ADD_SECONDS(_series_b, _series_m*_series_x + _series_j)),
                'INTERVAL 1 MONTH'),
    flags_ int default 0 not null
) SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT INCREMENT BY 1
    PERIOD FOR SERIES (ts_utc_x)
)

Logical representation of the series table:

CREATE VIEW Weather AS
SELECT station_id,
    COALESCE(ts_utc_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_)
    ) as ts_utc,
    COALESCE(ts_local_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_ + ts_local_o_)
    ) as ts_local,
    temp, wind_speed, ts_utc_month
FROM Weather_;

Page 14: SAP HANA SPS10- Series Data/ TimeSeries

Representation for Equidistant Piecewise Series

Physical representation of the series table:

CREATE COLUMN TABLE Weather_(
    station_id varchar(3) NOT NULL,
    ts_utc_ timestamp NULL,
    ts_utc_x_ integer default 0 NOT NULL,
    ts_utc_m_ decimal default 1 NOT NULL,
    ts_utc_b_ timestamp default timestamp'0001-01-01 00:00' NOT NULL,
    ts_utc_j_ decimal default 0 NOT NULL,
    ts_local_ timestamp NULL,
    ts_local_d_ decimal default 1 NOT NULL,
    temp decimal(3,1) NOT NULL,
    wind_speed decimal(2) NULL,
    ts_utc_month TIMESTAMP NOT NULL
        GENERATED ALWAYS AS
            SERIES_ROUND(
                COALESCE(ts_utc,
                    ADD_SECONDS(_series_b, _series_m*_series_x + _series_j)),
                'INTERVAL 1 MONTH'),
    flags_ int default 0 not null
) SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT INCREMENT BY 1
    PERIOD FOR SERIES (ts_utc_x)
)

• On first insert, ts_utc_ is stored unmodified

• After a reorg step, the x, m, b, j components (ts_utc_x_, etc.) are calculated, and ts_utc_ is set to NULL

• The view is defined, using COALESCE, to correctly read either the original timestamp value or the timestamp value calculated after the reorganization

• Reorg is via the ALTER TABLE SERIES REORGANIZE command

  • Needs to be initiated by the user

• Generated rounded columns: use rounded period columns for good performance on range queries

Note: in SPS10, the j component is not yet realized. It is set to 0. This will be fixed in a subsequent release.

Page 15: SAP HANA SPS10- Series Data/ TimeSeries

Equidistant Piecewise Series – Reorg step: ALTER TABLE SERIES REORGANIZE for compression

• On first INSERT, the period columns (including alternate period columns) are stored as is (i.e. in uncompressed form)

• ALTER TABLE SERIES REORGANIZE is required to store timestamps in their equidistant piecewise form (i.e. x, m, b, j components), which provides compression

  • Reorders the rows by (series key, period) by deleting existing rows (deletion gives good $rowid$ compression by ensuring that row IDs follow chronological order)

  • Equidistant piecewise representation components (i.e. m, x, b, j) are calculated to give good compression while maintaining the correct timestamp value

  • Sets the period column to NULL (after this, the timestamps are calculated from the components)

• ALTER TABLE SERIES REORGANIZE

  • Needs to be initiated by the user

  • Can be run against subsets of data (e.g. partitions) and be limited to processing a fixed number of rows during a run

  • Will find the rows that are not optimally encoded and process them

  • Should be run against sufficiently large sets of rows (thousands to hundreds of thousands) for good compression

  • Is resource intensive, so it is best run during quiet periods

• The M_SERIES_TABLE monitor view returns various statistics on series tables, including the number of rows reorganized

Page 16: SAP HANA SPS10- Series Data/ TimeSeries

Generated Rounded Columns: Rounded Period Columns for Better Performance on Range Predicates and OLAP Queries

• Generated rounded columns can be used to store period or alternate period columns rounded to a coarser level (e.g. day, week, month)

  • Have great compression

  • Are optional

  • Multiple such columns can be created (on different period columns, different levels of coarseness)

• Used automatically by the server for improved performance of range predicates on the original column, as well as for OLAP queries (the server can limit the number of rows for which exact timestamps need to be calculated)

CREATE COLUMN TABLE Weather_(
    station_id varchar(3) NOT NULL,
    ts_utc_ timestamp NULL, -- ,
    ts_utc_month TIMESTAMP NOT NULL
        GENERATED ALWAYS AS
            SERIES_ROUND(
                COALESCE(ts_utc,
                    ADD_SECONDS(_series_b, _series_m*_series_x + _series_j)),
                'INTERVAL 1 MONTH')
) SERIES (
    SERIES KEY(station_id)
    EQUIDISTANT INCREMENT BY 1
    PERIOD FOR SERIES (ts_utc_x)
)

• Generated rounded columns: these store values rounded to a coarser interval

Page 17: SAP HANA SPS10- Series Data/ TimeSeries

Summary: Benefits of Equidistant Piecewise Representation

Order-Independent INSERT w/ no degradation in compression

Good compression for multiple INCREMENT BY scenarios

Good compression for scenarios with multiple offsets from zero in the timestamp

Good Compression for scenarios where timestamps have jitter

Support for local time variations w/ good compression

Efficient range comparisons on timestamp columns

Efficient GROUP BY for timestamp columns

Page 18: SAP HANA SPS10- Series Data/ TimeSeries

Query Enhancements: SERIES_ROUND Updates

Page 19: SAP HANA SPS10- Series Data/ TimeSeries

SERIES_ROUND: New Rounding Modes & Non-Zero Alignment

SERIES_ROUND (<value>, {<increment_by> | SERIES TABLE <series_table>} [, <rounding_mode> [, <alignment_expression>]])

• New rounding modes are especially useful for intervals of months and years, because months and years have variable lengths

• The default rounding mode is ROUND_HALF_UP

• The <alignment_expression> allows specification of a non-zero alignment for the interval datatype

  • Allows MINVALUE to have a non-zero offset

  • E.g. allows for summarizing weeks that begin on Mondays (as opposed to Saturdays, which is the natural zero, 0001-01-01, for the datetime data type)

• Interval widths (INCREMENT BY) can be dynamically specified

New Rounding Modes

Mode             Semantics

ROUND_HALF_UP    Default value. The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded up, away from zero.

ROUND_HALF_DOWN  The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded down, towards zero.

ROUND_HALF_EVEN  The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded to the even series value, based on element number.

ROUND_UP         The value is always rounded away from zero, to the larger series value.

ROUND_DOWN       The value is always rounded towards zero, to the smaller series value.

ROUND_CEILING    The value is always rounded in a positive direction, to the larger series value.

ROUND_FLOOR      The value is always rounded in a negative direction, to the smaller series value.
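The seven rounding modes and the alignment parameter can be sketched for plain numbers as follows. This is an illustrative Python model of the semantics in the table, not the SAP HANA implementation. Two points are assumptions: "away from zero" is judged by the sign of the input value, and ROUND_HALF_EVEN picks the grid point with an even element number counted from the alignment.

```python
import math

MODES = {"ROUND_HALF_UP", "ROUND_HALF_DOWN", "ROUND_HALF_EVEN",
         "ROUND_UP", "ROUND_DOWN", "ROUND_CEILING", "ROUND_FLOOR"}

def series_round(value, increment, mode="ROUND_HALF_UP", alignment=0):
    """Round value onto the grid alignment + k * increment (k an integer)."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    q = (value - alignment) / increment       # position measured in grid steps
    lo, hi = math.floor(q), math.ceil(q)      # neighbouring element numbers
    # assumption: direction relative to zero is decided by the sign of value
    away, toward = (hi, lo) if value >= 0 else (lo, hi)
    if lo == hi:                              # already on a grid point
        k = lo
    elif mode == "ROUND_CEILING":
        k = hi
    elif mode == "ROUND_FLOOR":
        k = lo
    elif mode == "ROUND_UP":
        k = away
    elif mode == "ROUND_DOWN":
        k = toward
    elif q - lo != 0.5:                       # no tie: the nearest neighbour wins
        k = lo if q - lo < 0.5 else hi
    elif mode == "ROUND_HALF_UP":
        k = away
    elif mode == "ROUND_HALF_DOWN":
        k = toward
    else:                                     # ROUND_HALF_EVEN
        k = lo if lo % 2 == 0 else hi
    return alignment + k * increment

# Grid 3, 13, 23, ... (increment 10, alignment 3): 8 is exactly halfway.
half_up = series_round(8, 10, alignment=3)
half_down = series_round(8, 10, "ROUND_HALF_DOWN", alignment=3)
```

With these inputs the tie at 8 resolves to 13 under ROUND_HALF_UP and to 3 under ROUND_HALF_DOWN. The sketch takes the alignment as a keyword argument to avoid committing to a positional order.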

Page 20: SAP HANA SPS10- Series Data/ TimeSeries

SERIES_ROUND: Examples of Rounding with Month and Year Intervals

All results use the default rounding mode, ROUND_HALF_UP.

Period Length           Expression                                                Result

28 days                 SERIES_ROUND('2014-02-14 23:59:59', 'INTERVAL 1 MONTH')   2014-02-01 00:00:00
                        SERIES_ROUND('2014-02-15 00:00:00', 'INTERVAL 1 MONTH')   2014-03-01 00:00:00

29 days                 SERIES_ROUND('2012-02-15 11:59:59', 'INTERVAL 1 MONTH')   2012-02-01 00:00:00
                        SERIES_ROUND('2012-02-15 12:00:00', 'INTERVAL 1 MONTH')   2012-03-01 00:00:00

30 days                 SERIES_ROUND('2014-04-15 23:59:59', 'INTERVAL 1 MONTH')   2014-04-01 00:00:00
                        SERIES_ROUND('2014-04-16 00:00:00', 'INTERVAL 1 MONTH')   2014-05-01 00:00:00

31 days                 SERIES_ROUND('2014-01-16 11:59:59', 'INTERVAL 1 MONTH')   2014-01-01 00:00:00
                        SERIES_ROUND('2014-01-16 12:00:00', 'INTERVAL 1 MONTH')   2014-02-01 00:00:00

59 days (31+28)         SERIES_ROUND('2014-01-30 11:59:59', 'INTERVAL 2 MONTH')   2014-01-01 00:00:00
                        SERIES_ROUND('2014-01-30 12:00:00', 'INTERVAL 2 MONTH')   2014-03-01 00:00:00

92 days (31+31+30)      SERIES_ROUND('2014-08-15 23:59:59', 'INTERVAL 3 MONTH')   2014-07-01 00:00:00
                        SERIES_ROUND('2014-08-16 00:00:00', 'INTERVAL 3 MONTH')   2014-10-01 00:00:00

365 days                SERIES_ROUND('2014-07-02 11:59:59', 'INTERVAL 1 YEAR')    2014-01-01 00:00:00
                        SERIES_ROUND('2014-07-02 12:00:00', 'INTERVAL 1 YEAR')    2015-01-01 00:00:00

366 days                SERIES_ROUND('2012-07-01 23:59:59', 'INTERVAL 1 YEAR')    2012-01-01 00:00:00
                        SERIES_ROUND('2012-07-02 00:00:00', 'INTERVAL 1 YEAR')    2013-01-01 00:00:00

730 days (365+365)      SERIES_ROUND('2014-12-31 23:59:59', 'INTERVAL 2 YEAR')    2014-01-01 00:00:00
                        SERIES_ROUND('2015-01-01 00:00:00', 'INTERVAL 2 YEAR')    2016-01-01 00:00:00

731 days (366+365)      SERIES_ROUND('2012-12-31 11:59:59', 'INTERVAL 2 YEAR')    2012-01-01 00:00:00
                        SERIES_ROUND('2012-12-31 12:00:00', 'INTERVAL 2 YEAR')    2014-01-01 00:00:00

Note that the rounding result depends on the number of days in the period!
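Why the tipping point moves with the period length can be checked with a small Python sketch. It models ROUND_HALF_UP for a one-month interval by taking the midpoint between the first of the month and the first of the next month. The function name is hypothetical and this is only a model of the behaviour in the examples, not HANA code.

```python
from datetime import datetime

def round_to_month_half_up(ts):
    """Round ts to the nearer month start; exact midpoints round up."""
    start = datetime(ts.year, ts.month, 1)
    # first day of the following month (December rolls over to January)
    nxt = datetime(ts.year + (ts.month == 12), ts.month % 12 + 1, 1)
    midpoint = start + (nxt - start) / 2      # depends on 28/29/30/31 days
    return nxt if ts >= midpoint else start

feb_2014 = round_to_month_half_up(datetime(2014, 2, 15, 0, 0, 0))
feb_2012 = round_to_month_half_up(datetime(2012, 2, 15, 0, 0, 0))
```

The same input day gives different results: 2014-02-15 00:00:00 is the midpoint of a 28-day February and rounds up, while in the 29-day February of 2012 the midpoint is twelve hours later, so the value still rounds down.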

Page 21: SAP HANA SPS10- Series Data/ TimeSeries

SERIES_ROUND: Examples of Rounding with Specified Alignment Values

Expression: SERIES_ROUND(8, 10, 3)
Result: 13
Explanation: because 8 is the midpoint between 3 and 13 and the default rounding mode ROUND_HALF_UP rounds away from 0.

Expression: SERIES_ROUND(5, 10, 3)
Result: 3
Explanation: because 5 is closer to 3 than 13

Expression: SERIES_ROUND(12, 10, 3)
Result: 13
Explanation: because 12 is closer to 13 than 3

Expression: SERIES_ROUND(19, 10, 3)
Result: 23
Explanation: because 19 is closer to 23 than 13

Expression: SERIES_ROUND('2015-02-27', 'INTERVAL 7 DAY', '2015-01-05 09:00:00', ROUND_UP)
Result: '2015-03-02 09:00:00'
Explanation: because 2015-01-05 is a Monday, and 2015-02-27 is a Friday that is closer to Monday 2015-03-02 than to Monday 2015-02-23.

Expression: SERIES_ROUND('2015-03-01', 'INTERVAL 2 MONTH', '2014-02-01')
Result: '2015-02-01'
Explanation: because '2015-03-01' lies closer to '2015-02-01' than to '2015-04-01'

Page 22: SAP HANA SPS10- Series Data/ TimeSeries

SERIES_ROUND: Rounding to an Evaluated Interval Width

Some use cases require a dynamic granularity for the interval width

E.g. To split data into n buckets per year (where n is a variable):

SELECT bucket, max(value)
FROM (
    SELECT SERIES_ROUND(ts, 'interval ' || 3600*24*365/n || ' second') as bucket, value
    FROM T ) D
GROUP BY bucket

Page 23: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Enhancements: New Analytic Functions

Page 24: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Summary of New Functions

AUTO_CORR(col, maxlag {SERIES(…) | ORDER BY c1, …})
    Aggregate that computes all autocorrelation coefficients for a given input column.

DFT(col, N {SERIES(…) | ORDER BY c1, …}).{REAL|IMAGINARY|AMPLITUDE|PHASE}
    Aggregate that computes the Discrete Fourier Transform of a column for the first N values and returns an array with exactly N elements.

FIRST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
    Aggregate function to return the first value (with given ordering).

LAST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
    Aggregate function to return the last value (with given ordering).

NTH_VALUE(col, n ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]
    Aggregate function to return the n'th value (with given ordering).

CUBIC_SPLINE_APPROX(col, type, mode, par1, par2) OVER (PARTITION BY <…> ORDER BY <…>)
    Window function to replace NULL values with a cubic spline approximation.

CROSS_CORR(col1, col2, N ORDER BY …)
    Computes the correlation between two value columns for a given number of lags.

BINNING(col, name => val) OVER(…)
    Window function assigning input into bins using different algorithms.

RANDOM_PARTITION(n1, n2, n3, seed) OVER(…)
    Window function to assign input randomly to different sets (training/validation/test).

WEIGHTED_AVG(col, weight_array) OVER(…)
    Window function to compute a weighted moving average with the provided weight values.

SERIES_FILTER(col, filter) OVER(…)
    Window function that applies filtering or smoothing, for example exponential smoothing or an autoregressive filter.

SERIES_FORECAST(model).{FITTED | LOW95 | HIGH95 | LOW80 | HIGH80} OVER (…)
    Forecast based on a model built using PAL.

Page 25: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: First, Last, Nth Value Aggregate Functions

Changing the time granularity from days to months (SAP Stock Price)

SELECT min("date") as "date",
       first_value("open" order by "date") as "open",
       last_value("close" order by "date") as "close",
       max("high") as "high",
       min("low") as "low",
       sum("volume") as "volume"
FROM "I058576"."sap_stock_price"
GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)
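What the aggregation above computes can be modelled in a few lines of Python: rows are grouped by month (the effect of rounding the date down to a one-month interval), the first open and last close are taken in date order, and high, low and volume are aggregated. The function name and the tuple layout are assumptions made for this sketch.

```python
from datetime import date

def to_monthly(rows):
    """rows: (date, open, high, low, close, volume) tuples in any order."""
    months = {}
    for d, o, h, l, c, v in sorted(rows, key=lambda r: r[0]):
        key = (d.year, d.month)               # month bucket, like ROUND_DOWN
        if key not in months:
            # first row of the month in date order supplies date and open
            months[key] = {"date": d, "open": o, "high": h,
                           "low": l, "close": c, "volume": v}
        else:
            m = months[key]
            m["high"] = max(m["high"], h)
            m["low"] = min(m["low"], l)
            m["close"] = c                    # last row in date order wins
            m["volume"] += v
    return list(months.values())

daily = [
    (date(2015, 1, 5), 58.0, 59.0, 57.5, 58.5, 100),
    (date(2015, 1, 2), 57.0, 58.2, 56.8, 58.0, 150),
    (date(2015, 2, 2), 60.0, 61.0, 59.0, 60.5, 200),
]
monthly = to_monthly(daily)
```

The January bucket takes its open from 2 January and its close from 5 January even though the rows arrive out of order, which is the role of the ORDER BY inside first_value and last_value.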

Page 26: SAP HANA SPS10- Series Data/ TimeSeries

select distinct GF_ISIN,

TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) AS bin_datetime,

FIRST_VALUE(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as open_price,

max(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as high_price,

min(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as low_price,

LAST_VALUE(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as close_price,

COUNT(GF_LAST) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as num_trades,

sum(GF_LAST_VOL) over (partition by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000',CAST(

CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER))) order by GF_ISIN, TO_TIMESTAMP(GF_DATE || ' ' ||

ADD_SECONDS('09:00:00.000',CAST( CEIL(SECONDS_BETWEEN('09:00:00.000',GF_TIME) /

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200)) *

CEIL(1+SECONDS_BETWEEN('09:00:00.000','18:00:00.000' ) /200) as INTEGER)))) as bin_vol

from RAP_USER.GF_TICKS

where GF_TIME >= '08:59:59.999'

and GF_TIME <= '18:00:00.001'

and GF_DATE ='2012-01-13'

and GF_LAST_VOL > 0

and GF_ISIN = 'DE0007164600'

order by GF_ISIN, bin_datetime;

The query above is written without the series feature.

Same query with series feature:

SELECT min("date") as "date",
       first_value("open" order by "date") as "open",
       last_value("close" order by "date") as "close",
       max("high") as "high",
       min("low") as "low",
       sum("volume") as "volume"
FROM "I058576"."sap_stock_price"
GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)

Page 27: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Cubic Spline Approximation

Replacement of null values by interpolating the gaps and extrapolating any leading or trailing null values.

Interpolation can be done by

– Linear interpolation

– Cubic spline interpolation

SELECT "ts", "temperature",
       linear_approx("temperature") OVER (ORDER BY "ts"),
       cubic_spline_approx("temperature") OVER (ORDER BY "ts")
FROM "weather"
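The gap-filling idea can be sketched in Python for the simpler linear case. The sketch assumes equidistant points (the list index serves as the x value) and extrapolates leading or trailing gaps along the line through the two nearest known points. Both choices are assumptions of this example rather than documented behaviour of the HANA functions, and the cubic spline variant is not shown.

```python
def linear_approx(values):
    """Replace None entries by linear interpolation or extrapolation."""
    known = [i for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two known values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = [k for k in known if k < i]
        right = [k for k in known if k > i]
        if left and right:                    # gap inside the series
            a, b = left[-1], right[0]
        elif right:                           # leading gap: extrapolate
            a, b = right[0], right[1]
        else:                                 # trailing gap: extrapolate
            a, b = left[-2], left[-1]
        slope = (values[b] - values[a]) / (b - a)
        out[i] = values[a] + slope * (i - a)
    return out

filled = linear_approx([None, 1.0, None, 3.0, None])
```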

Page 28: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Auto Correlation and Cross Correlation

Series data functions used to find periodic patterns in the data, such as seasonality.

Auto-correlation looks for periodicity between values of the same series as a function of the time lag between them.

Cross-correlation looks for periodicity between values of different series as a function of the time lag between them.

SELECT corr, ordinality AS lag
FROM unnest((
    SELECT auto_corr(temperature, 1000 ORDER BY ts)
    FROM weather
)) WITH ORDINALITY AS tt(corr)
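To show how autocorrelation exposes a period, the following Python sketch computes the usual sample autocorrelation coefficients for lags 1 to max_lag. The normalisation (dividing by the total sum of squared deviations) is the common textbook estimator and is an assumption here; the exact formula used by AUTO_CORR may differ.

```python
def auto_corr(values, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)           # total squared deviation
    return [
        sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom
        for lag in range(1, max_lag + 1)
    ]

# A signal that repeats every 4 samples.
signal = [1, 0, -1, 0] * 8
coeffs = auto_corr(signal, 4)
```

The coefficient is strongly negative at lag 2 (half a period) and strongly positive at lag 4 (a full period), which is the kind of pattern that indicates seasonality.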

Page 29: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Weighted Moving Average

Data smoothing via weighted moving average with linearly decreasing weights.

Window frame defines the smoothing window.

SELECT "ts", "temperature",
       weighted_avg("temperature") OVER (ORDER BY "ts" ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
FROM "weather"
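A weighted moving average over a "7 preceding and current row" frame can be sketched in Python as below. The weights are assumed to fall linearly from the current row back to the oldest row in the frame (current row heaviest, oldest row weight 1). The slide does not state the exact weights, so treat them as an illustration.

```python
def weighted_moving_avg(values, preceding=7):
    """Moving average over the current row and up to `preceding` earlier rows."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        # oldest row gets weight 1, the current row the largest weight
        weights = range(1, len(window) + 1)
        out.append(sum(w * v for w, v in zip(weights, window)) / sum(weights))
    return out

smoothed = weighted_moving_avg([1.0, 2.0, 3.0], preceding=1)
```

At the start of the series the frame is shorter than requested, so the sketch simply averages over the rows that exist.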

Page 30: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Filtering of Series Data

Filter function for different filter methods

– Exponential smoothing

– Autoregressive and moving average filter

Available in SPS10

– Single exponential smoothing

– Double exponential smoothing

PAL functions integrated into series data.

Support for smoothing and forecasting.

-- single exponential smoothing with a smoothing parameter alpha = 0.2
SELECT "ts", "temperature",
       series_filter(value => "temperature", method_name => 'SINGLESMOOTH', alpha => 0.2)
           OVER (ORDER BY "ts") AS SINGLESMOOTH
FROM "weather"
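Single exponential smoothing itself is a short recurrence, sketched here in Python: each smoothed value is alpha times the new observation plus (1 - alpha) times the previous smoothed value. Starting the recurrence at the first observation is an assumption of this sketch; the PAL-based filter may initialise differently.

```python
def single_smooth(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

result = single_smooth([10.0, 20.0, 20.0], alpha=0.2)
```

With alpha = 0.2 the output moves only a fifth of the way towards each new observation, so a jump from 10 to 20 shows up as 12 and then 13.6.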

Page 31: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Binning

Binning assigns data values to bins.

Different binning methods

– Number of equal width bins

– Width of the bins

– Number of bins with equal number of records

– Number of standard deviations left and right from the mean

PAL function integrated into series data

[Figure: histogram of record counts per bin; x-axis bins 1 to 8, y-axis 0 to 35]

-- compute histogram
SELECT bin_number, count(bin_number) as cnt
FROM (
    SELECT binning(value => "open", bin_count => 8) OVER (ORDER BY "date") AS bin_number
    FROM "I058576"."sap_stock_price"
)
GROUP BY bin_number
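The "number of equal width bins" method and the histogram step can be sketched in Python as follows. Bins are numbered from 1 and the maximum value is placed in the last bin; both conventions are assumptions of this sketch.

```python
from collections import Counter

def binning_equal_width(values, bin_count):
    """Assign each value a bin number 1..bin_count using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    if width == 0:                            # all values identical
        return [1] * len(values)
    # the maximum would land in bin_count + 1, so clamp it into the last bin
    return [min(int((v - lo) / width) + 1, bin_count) for v in values]

bins = binning_equal_width(list(range(16)), bin_count=8)
histogram = Counter(bins)                     # bin number -> record count
```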

Page 32: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Random Partition

Partitioning divides the input data into three sets, a training, a validation, and a test set, that are used in machine learning.

Support for

– Random partitioning

– Stratified partitioning

PAL function integrated into series data

-- stratified partitioning with fractional partition sizes (70% training, 20% validation, 10% test)
SELECT *,
       random_partition(0.7, 0.2, 0.1, 42) OVER (PARTITION BY "weather_station") AS "PARTITION"
FROM "weather"
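Stratified partitioning means the split fractions are applied inside each group separately, so every weather station contributes the same proportions to each set. A minimal Python sketch under stated assumptions: the labels 1, 2, 3 stand for training, validation and test, 0 means unassigned, and the shuffle uses Python's own random generator, so the assignments will not match what the database produces for the same seed.

```python
import random
from collections import defaultdict

def random_partition(keys, train, valid, test, seed):
    """Return one label per row: 1 training, 2 validation, 3 test, 0 none."""
    rng = random.Random(seed)                 # seeded for reproducibility
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    labels = [0] * len(keys)
    for key in sorted(groups):                # fixed group order
        members = groups[key]
        rng.shuffle(members)
        n_train = round(train * len(members))
        n_valid = round(valid * len(members))
        n_test = round(test * len(members))
        for pos, idx in enumerate(members):
            if pos < n_train:
                labels[idx] = 1
            elif pos < n_train + n_valid:
                labels[idx] = 2
            elif pos < n_train + n_valid + n_test:
                labels[idx] = 3
    return labels

stations = ["A"] * 10 + ["B"] * 20
labels = random_partition(stations, 0.7, 0.2, 0.1, seed=42)
```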

Page 33: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Discrete Fourier Transform

Discrete Fourier transforms are used in spectral analysis of series data, e.g. in vibration analysis.

Computation uses the FFT algorithm and returns

– Amplitude / phase

– Real part / imaginary part

SELECT ordinality AS "frequency", "amplitude"/4096 AS "amplitude"
FROM unnest ((
    SELECT dft("amplitude", 4096 order by "ts").amplitude
    FROM "vibration"
)) WITH ORDINALITY AS tt(amplitude)
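The amplitude spectrum that such a query returns can be illustrated with a direct Python implementation of the DFT definition. This sketch uses the naive O(N²) sum rather than the FFT algorithm mentioned above, which is fine for small N and gives the same values. The function name is hypothetical.

```python
import cmath
import math

def dft_amplitude(values):
    """Amplitude |X[k]| of the discrete Fourier transform for k = 0..N-1."""
    n = len(values)
    amplitudes = []
    for k in range(n):
        # X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N)
        x_k = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        amplitudes.append(abs(x_k))
    return amplitudes

# A cosine with exactly 3 cycles in 32 samples.
n = 32
wave = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
amp = dft_amplitude(wave)
peak = max(range(1, n // 2), key=amp.__getitem__)
```

The peak sits at frequency index 3, and dividing its amplitude by N gives 0.5, which mirrors the division by 4096 in the SQL example.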

Page 34: SAP HANA SPS10- Series Data/ TimeSeries

Analytic Functions: Miscellaneous Updates

MEDIAN as window function with arbitrary window frames

CORR_SPEARMAN for character columns

Aggregate functions in the series library

• Standard deviation (sample and population)

• Variance (sample and population)

• Co-Variance (sample and population)

Page 35: SAP HANA SPS10- Series Data/ TimeSeries

Thank you

Contact information

Raj Rathee

SAP HANA Product Management

[email protected]