Pentaho 8.0 and Beyond

Matt Howard, Pentaho Sr. Director of Product Management, Hitachi Vantara

Safe Harbor Statement

The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remains at the sole discretion of Pentaho.

Pentaho 8.0 and Beyond

1. Product Vision
2. Pentaho 8.0
3. Product Roadmap

Product Vision

The Power of Three

HITACHI DATA SYSTEMS > Content platform > Storage solutions

PENTAHO > Data Integration > Business Analytics

HITACHI INSIGHT GROUP > Lumada IoT

Operational Data | Big Data | Data Stream | Public/Private Clouds

Consumer | Business Analyst | Data Analyst/Data Scientist | Data Engineer

Custom and Self-Service Dashboards

Interactive Query and Analysis

Production Reporting

Pentaho Data Integration: Data Preparation | Integrated Machine Learning

OPEN AND EMBEDDABLE

Operational Data | Big Data | Data Stream | Public/Private Clouds

Consumer | Business Analyst | Data Analyst/Data Scientist | Data Engineer

Custom and Self-Service Dashboards

Interactive Query and Analysis

Production Reporting

Pentaho Data Integration: Data Preparation | Integrated Machine Learning

Pentaho Business Analytics Platform

OPEN AND EMBEDDABLE

Future Vision: A Single Consistent Experience

Data Prep | Data Engineering | Analytics

Ingestion | Processing | Blending | Data Delivery | Data Discovery/Analysis

Analysis & Dashboards

Administration | Security | Lifecycle Management

Data Provenance

Dynamic Data Pipeline | Monitoring | Automation

Pentaho 8.0

Introducing Pentaho 8.0

Challenge #1: Data volumes and velocity are growing exponentially

Challenge #2: Processing and storage resources are constrained

Challenge #3: Shortage of Big Data talent and lack of productivity

Pentaho 8.0 broadens connectivity to streaming data sources
• Connect to Kafka streams
• Stream processing with Spark
• Big data security with Knox

Pentaho 8.0 optimizes processing resources
• Enhanced Adaptive Execution (AEL)
• Native Avro and Parquet handling
• Worker nodes for “Scale-out”

Pentaho 8.0 boosts team productivity across the pipeline
• Data Explorer filters
• Improved repository UX
• Extended operations mart

Streaming for Time-Sensitive Insight

Enable use cases that require real-time processing, monitoring and aggregation
• Real-time device monitoring
• Log-file aggregation
• Notifications
• And more…

NEW in Pentaho 8.0
✓ Kafka Producer step
✓ Kafka Consumer step
✓ Get records from stream step
✓ Spark streaming via AEL
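
To make the "log-file aggregation" use case concrete, here is a minimal Python sketch of a fixed-size (tumbling) window count over a stream of log records. It is an illustration only and does not use or represent the Pentaho Kafka steps or AEL. The `LogRecord` fields, the `tumbling_window_counts` function, and the 60-second window are all hypothetical names and values chosen for the example, and records are assumed to arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Tuple


class LogRecord(NamedTuple):
    """Hypothetical log record; field names are assumptions for this sketch."""
    timestamp: float  # seconds since some epoch
    device_id: str
    level: str        # e.g. "INFO" or "ERROR"


def tumbling_window_counts(
    records: Iterable[LogRecord], window_seconds: float = 60
) -> Iterator[Tuple[float, Counter]]:
    """Yield (window_start, counts per log level) for each fixed-size window.

    Assumes records arrive in non-decreasing timestamp order, so a window
    can be emitted as soon as a record from a later window shows up.
    """
    current_start = None
    counts: Counter = Counter()
    for rec in records:
        # Align the record's timestamp to the start of its window.
        start = rec.timestamp - (rec.timestamp % window_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[rec.level] += 1
    # Emit the final, possibly partial, window.
    if current_start is not None:
        yield current_start, counts
```

Because the function is a generator, it can consume an unbounded iterator (for example, lines read from a message queue client) and emit one aggregate per window without holding the whole stream in memory.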

Pentaho 7.1 – Adaptive Execution for Spark

✓ No Coding
✓ Build Once
✓ Execute on Any* Engine

PDI

Pentaho Kettle

*Currently Available Engines

Enhanced Adaptive Execution

Simplified setup
• Eliminated “Zookeeper” component
• Reduced number of setup steps

Hardened deployment
• Fail-over at the edge
• Kerberos impersonation for client

More flexible
• Support multiple run configurations
• Customize cluster settings per job type

PDI Client

Spark/Hadoop Processing Nodes

HADOOP CLUSTER

AEL-Spark Engine (Spark Driver)

AEL-Spark Daemon on Edge Nodes

Hadoop/Spark Compatible Storage Cluster

HDFS | Azure Storage | Amazon S3 | Etc…

Spark Executors

Worker Nodes for Scaling Out

Scale work items across multiple nodes (containers)
• Easily add and remove resources as required
• Monitor and balance changing workloads
• Deploy on premise, cloud and hybrid

Worker Node (a)

Worker Node (b)

Worker Node (c…)

Distribute and Scale

NEW in Pentaho 8.0
✓ Container framework
✓ Orchestration framework
✓ Node monitoring
✓ Enhanced HA implementation

Worker Nodes Architecture

WORKER NODES

Orchestration Framework

Container Framework

Pentaho Server

WN 1, e.g. KJB

WN 2, e.g. KTR

WN …n “Executor”

Orchestration (Scheduler, monitoring, security, etc.)

Controller (HA)

Master (Standby)

Master (Standby)

Master (Working)

Pentaho Repository

Pentaho Clients

Powered by…

Pentaho 7.0 – Data Explorer

Access visualizations during data prep for inspection and prototyping

Data Explorer Filters

Enhanced data inspection in PDI
• Identify data to be cleaned or removed
• Deliver data to the business more quickly

ENHANCED in Pentaho 8.0
✓ Numeric filters
✓ String filters
✓ Include/Exclude data points
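
The three filter kinds listed above (numeric, string, include/exclude) can be pictured as composable row predicates. The following Python sketch is only a conceptual illustration, not the Data Explorer implementation or any PDI API; every function name and the sample field names (`city`, `sales`) are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def numeric_between(field: str, low: float, high: float) -> Predicate:
    """Numeric filter: keep rows whose field lies in [low, high]."""
    return lambda row: low <= row[field] <= high


def string_contains(field: str, text: str) -> Predicate:
    """String filter: case-insensitive substring match."""
    needle = text.lower()
    return lambda row: needle in str(row[field]).lower()


def include_values(field: str, values: Iterable[Any]) -> Predicate:
    """Include filter: keep only rows whose field is one of the values."""
    allowed = set(values)
    return lambda row: row[field] in allowed


def exclude_values(field: str, values: Iterable[Any]) -> Predicate:
    """Exclude filter: drop rows whose field is one of the values."""
    blocked = set(values)
    return lambda row: row[field] not in blocked


def apply_filters(rows: Iterable[Row], predicates: List[Predicate]) -> List[Row]:
    """Keep the rows that satisfy every predicate (logical AND)."""
    return [row for row in rows if all(p(row) for p in predicates)]
```

A possible usage, with made-up sample data: `apply_filters(rows, [numeric_between("sales", 50, 250), exclude_values("city", ["Chicago"])])` keeps rows with sales between 50 and 250 that are not from Chicago.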

Pentaho 8.0 – Complete Data Integration
• Filters in Data Explorer for enhanced data inspection during prep
• New PDI Repository Dialogs for better usability
• Run Configurations for Jobs for seamless user experience

Big Data
• Stream Data Processing to simplify near-real-time integration with Kafka
• Enhanced AEL for reliability, performance, and security
• Big Data File Formats to support crucial Hadoop use cases
• Big Data Security with HDP Knox Gateway
• VFS Improvements for named Hadoop clusters

Enterprise Platform
• Worker Nodes Scale-Out to drive superior agility and TCO for enterprises
• Ruby Theme – new platform branding

Additional Items
• Ops Mart for Oracle, MySQL, SQL Server
• Big Data Sandbox VM updates
• Platform password security improvements
• PDI Mavenization for infra alignment
• Documentation improvements on help.pentaho.com

Product Roadmap

Scale-out Deployment

Metadata Management

Operations Management

Cloud Deployment

Adaptive Execution

Spark Execution

Stream Processing

Machine Learning

Data Exploration

Visual Data Prep

Embedded Analytics

Data Catalog

Enterprise Platform

Big Data Processing

Visual Data Experience

EMERGING TRENDS AND TECHNOLOGY: Advanced Analytics | Real-time

PENTAHO FOUNDATIONAL INVESTMENT AREAS

Roadmap Initiatives

Strengthening the Bridge Between Data and Insight

DATA EXPLORER
✓ Visual data inspection
✓ Intuitive data prep
✓ Advanced visualization

CATALOG
✓ Governed access
✓ Searchable metadata
✓ Collaboration

Source 1 | Source 2 | Source 3 | Source 4 | Source 5

Inline Data Prep – Vision

Intuitive, Excel-like transformation design

Field Statistics
Field Type: Integer
Records: 10,000
Cardinality: 273
Min <count>: 1
Max <count>: 23
Bin Size (%): Quintile

Integrated Profiling

Inline Model

Merge Fields

Inline Transformation
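
As a rough illustration of the kind of numbers shown in the Field Statistics panel, the Python sketch below profiles one integer column. It is not the Pentaho profiling implementation. The function name `field_statistics` is hypothetical, and it assumes that "Min <count>" and "Max <count>" refer to how often the rarest and most common distinct values occur, and that "Quintile" means splitting the values into five equal-frequency bins; the slide does not define either term.

```python
from collections import Counter
from statistics import quantiles
from typing import Any, Dict, List


def field_statistics(values: List[int]) -> Dict[str, Any]:
    """Profile a single integer field (needs at least two values)."""
    counts = Counter(values)  # occurrences of each distinct value
    return {
        "records": len(values),
        "cardinality": len(counts),          # number of distinct values
        "min_count": min(counts.values()),   # assumed meaning of "Min <count>"
        "max_count": max(counts.values()),   # assumed meaning of "Max <count>"
        # Four cut points that divide the data into five quintile bins.
        "quintile_cut_points": quantiles(values, n=5),
    }
```

Calling `field_statistics` on a column returns a plain dictionary, which keeps the sketch easy to print or compare against the figures a profiling tool reports.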

Pentaho Machine Learning Orchestration

Data Explorer

Notebook Integrations

Native Algorithms

Catalog

Adaptive Execution

Roadmap projects that serve emerging needs of data scientists.

Pentaho Roadmap

Features and dates are subject to change.

Nov 2017 | 1H18 (8.1) | Future

VISUAL DATA EXPERIENCE

• Data Explorer Filters • Catalog I • Visual Profiling

• Catalog Search • Data Prep from DET • Layout Manager

• New User Console • Data Science Viz • Real-time Viz

(BIG) DATA PROCESSING

• Kafka Interface • Spark Streaming • Parquet and Avro • Enhanced AEL

• Streaming II • Enhanced JSON/XML/ORC • AEL – extend distros

• Advanced Profiling • Rules Validator • Native ML algorithms • AEL – Flink

• Thin Kettle (Composer) • Web Designer • Data Operations Mgr. • AEL – Next

ENTERPRISE PLATFORM

• Scale-out Framework • Foundry Integration

• Unified Monitoring • Harden Metadata Bridges • Vantara Integrations

• Enhanced Upgrade • Enhanced Security • New Content Lifecycle • Vantara Integrations

• Metadata Manager • Business Glossary • Multi-tenancy • Vantara Integrations

ECOSYSTEM

• AEL HDP, MapR • Google Cloud Platform • Cassandra/NoSQL Update

• Multi-cloud Orchestration • Cloud App Connectors

• Mainframe • Enhanced SAP and SFDC

Hitachi Vantara Portfolio

Foundry Service Platform: Workflow | Scheduling | Security | Clustering | Monitoring | Repository | Search

Application Studio: Dashboards | Visualization | Notifications | App Development

Storage: Converged Infrastructure | Automated Management | Data Protection | Flash Storage

Data Integration | Asset Management | Analytics | Edge Processing

• Asset registry
• Data catalog
• Metadata management
• Modeling and lineage
• Governance

• Data connectors
• Transformation engines
• Profiling and quality
• Data blending
• Data preparation

• Business analytics
• Content analytics
• Artificial intelligence
• Batch and stream

Software Platform

Application Framework

Storage

Edge Processing | Asset Management | Analytics | Data Integration

IoT Solutions – from Edge to Outcomes

Sensors

Things

People

Fog Layer | Core

IoT Data Pipeline

Telemetry

Edge

Asset Registry

Stream Queues

Edge | Core

Sensors

Things

People

Edge

Filtering

Asset Registry

Stream Queues

Lumada IoT Data Pipeline

Insights | Outcomes

Ingest

Process

Visualize

Model

Predict

Notify

IoT Analytic Processor

SMART CITY

SMART BUSINESS

SMART DATA CENTER

SMART INDUSTRY

Unlock the Business Value in YOUR Data

YOUR DATA

Video, Image and Audio | Email and Documents | Transactional Data | IT, Sensor and Machine Logs | Social Media

Hitachi Content Platform

YOUR STRATEGY

Need for Better Insights To Achieve Better Outcomes

Big Data Analytics

Content Exploration

Pentaho

Hitachi Content Intelligence

YOUR INSIGHTS

The Power of Three

HITACHI DATA SYSTEMS > Content platform > Storage solutions

PENTAHO > Data Integration > Business Analytics

HITACHI INSIGHT GROUP > Lumada IoT

Summary

What we covered today:
• Product Vision
• Pentaho 8.0 Release
• Product Roadmap

Next Steps

Want to learn more about Pentaho 8.0 and product roadmap?

• Other recommended breakout sessions:
  – Processing Big Data with Pentaho: Rakesh Saha
  – Operating Pentaho at Scale: Jens Bleul

• Solution Expo
  – Pentaho 8.0 and Beyond
  – Lumada IoT Platform
  – Hitachi Content Platform
  – Spark Processing
  – And more…

Pentaho 8.1 – Preview

Some Candidate Projects
• Enhanced Streaming
• Enhanced Profiling
• Google Cloud Platform
• Unified Monitoring and Logging
• Enhanced Metadata Handling

Pentaho 8.1 Expected Availability

Q2 2018