Copyright © 2015 KNIME.com AG 2
• Two feature releases last year: 2.10 & 2.11
• Documented in “What‘s new summary”, YouTube, Changelogs
Copyright © 2015 KNIME.com AG 6
Release Stats
• KNIME 2.10
– 23 new nodes (node sets)
– 89 enhancements
• KNIME 2.11
– 35 new nodes (node sets)
– 68 enhancements
Copyright © 2015 KNIME.com AG 7
Outline
• Interactive feature demos
• … by the developers*
• Questions? Right after the session
*) with exceptions
Copyright © 2015 KNIME.com AG 8
Agenda
• Social Media Connectors: Google API & Twitter
• Database & Big Data
• Analytics: PMML
• Analytics: Custom Distance Measures, etc
• Enhanced Time Series Analysis
• Interactive Python Scripting
• Tips & Tricks: UI & Utility Nodes
• JSON Processing
Copyright © 2015 KNIME.com AG 11
Twitter Nodes
Twitter API Connector
Twitter Timeline
Twitter Search
Twitter Users
11
Copyright © 2015 KNIME.com AG 16
Database Connectors
• Preconfigured connectors
– DB specific dialog
– DB specific behavior/capability• SQL dialects
• Aggregation methods
– Bundle necessary JDBC drivers
• General database connector
– Can connect to any JDBC source
16
Copyright © 2015 KNIME.com AG 17
New Utility Nodes
• Drop table
– missing table handling
– cascade option
• Execute any SQL statement also DDL
• Manipulate existing queries
17
Copyright © 2015 KNIME.com AG 18
New Manipulation Nodes
• Filter rows and columns
• Join tables/queries
• Sort your data
• Aggregate your data
• User friendly dialogs
18
Copyright © 2015 KNIME.com AG 20
Pre-processing on Hadoop
• KNIME Big Data Extension (commercial)
– Package required drivers/libraries for specific HDFS, Hive, Impala access
– Runs on Hadoop
• Preconfigured connectors
– Hive (Big Data Extension)
– Cloudera Impala (Big Data Extension)
– HP Vertica (OS)
– ParStream (Partner)
Copyright © 2015 KNIME.com AG 21
Hive/Impala Loader
21
• Upload a KNIME data table to Hive/Impala
• Part of the commercial Big Data Extension
Copyright © 2015 KNIME.com AG 22
Pre-processing Example Smart Meters: Classic
Irish Smart Energy Meter Trials• July 2009 – Dec 2010• 6000 meters• roughly 176m rows of data
Copyright © 2015 KNIME.com AG 23
Pre-processing Example Smart Meters: on Hadoop
Transformation/Aggregation• Before: >1 day• Hadoop/ParStream: <.5h
http://www.knime.org/white-papers#bigdata
Copyright © 2015 KNIME.com AG 24
HDFS File Handling
• New nodes
– HDFS Connection
– HDFS File Permission
• Utilize the existing remote file handling nodes
– Upload/download files
– Create/list directories
– Delete files
24
Copyright © 2015 KNIME.com AG 27
Modular PMML
• Predictive Model Markup Language
• Documents your preprocessing and modeling results
• Built in a modular fashion parallel to the data flow
• Start with an empty PMML document,add transformations and finally a model
• Consumable by various predictors
Copyright © 2015 KNIME.com AG 28
PMML Translation
• PMML to SQL
– Execution of models directly in the database
– No data transfer to KNIME required
• PMML to Java
– Model is compiled into Java bytecode and executed
– Up to 4 times faster execution
– Preprocessing and model evaluation in one node
Copyright © 2015 KNIME.com AG 30
Distance Functions
• Used in many data mining algorithms
– Clustering
– Classification (kNN)
• Similarity search
• (Address) Deduplication
• New: Extensible framework to define such measures
Copyright © 2015 KNIME.com AG 31
Custom Distance Functions
• Separate port object representing distance function
• Parameterized distance functions, e.g.Tanimoto vs. Dice Tversky
31
Copyright © 2015 KNIME.com AG 32
Target Shuffling
• Assess validity of your data mining result
• Easier identify “overlearning” with many variables
• Idea: Resample target in training data to break relationship between in- and output
• Learn & score a model
• Do this many times and compareto model on unmodified data
Copyright © 2015 KNIME.com AG 36
Python: Integrated into KNIME
• General-purpose language
• Wide spread use in chemistry and life science
• Wide range of libraries, including:
– Scientific computing
– Text processing
– Image processing
Copyright © 2015 KNIME.com AG 37
Python: Pandas as data representation
• Provides data structures and data analysis tools
• Builds on the scientific computing library NumPy
• KNIME table ↔ Pandas DataFrame
Copyright © 2015 KNIME.com AG 38
Python: Nodes
• Source
• Script
• Script (2:1)
• View
• Learner
• Predictor
• Object Reader
• Object Writer
Copyright © 2015 KNIME.com AG 41
UI Improvements
• Welcome Page: Overview of updates & Shortcuts to most recently used workflows
• Quick Node Insertion: Try “Ctrl-Space”Search and insert nodes quickly
• Auto-Save: Keep shadow copy of workflow
Copyright © 2015 KNIME.com AG 42
Utility Nodes: Data Validation
• Assert table structure at runtime - fail or fix
– Useful in workflow automatization
– Useful in conjunction with “to Report” nodes
Copyright © 2015 KNIME.com AG 43
Utility Nodes: Auto Type Cast
• Guess (primitive) column types for all-string tables
• Useful for parsing configuration files
Copyright © 2015 KNIME.com AG 44
JavaScript Nodes
44
• Same representation in Client and Webportal
• Interactivity
Copyright © 2015 KNIME.com AG 48
Bit & Byte Vector Nodes
• Nodes to parse/write bit & byte vectors
• Distance calculation & association rule learning
48
Copyright © 2015 KNIME.com AG 49
JSON Type & Utility Nodes
• JSON – JavaScript Object Notation
• Hierarchical data format
Copyright © 2015 KNIME.com AG 50
JSON Type & Utility Nodes - Applications
• Web Service consumption ((K)REST)
• NoSQL databases / MongoDB
• KNIME Web Service
Top Related