JESSICA Holding Fund in Bulgaria - Status, types and selection of UDFs – Next steps
Processing Java UDFs in a C++ Environment - user.tu-berlin.de · Wildfire 3 • HTAP prototype...
Transcript of Processing Java UDFs in a C++ Environment - user.tu-berlin.de · Wildfire 3 • HTAP prototype...
-
Viktor Rosenfeld, Rene Mueller, Pınar Tözün, Fatma Özcan IBM Research – Almaden
SoCC '17, Santa Clara – September 27, 2017
Processing Java UDFs in a C++ Environment
-
Programming languages split
2
liberal use of UDFs
JVM-based data analytics ecosystem
Software components that do not use the JVM
“native code”
-
Wildfire
3
• HTAP prototype from IBM Research [CIDR2017] • C++ columnar engine • Spark as user-facing front end • data analytics through SparkSQL queries • SparkSQL queries can contain Scala UDFs
Goal: Execute Scala UDFs inside SparkSQL queries on the Wildfire C++ engine.
-
Java Native Interface
4
native codeJVM
Java Native Interface (JNI)
-
JNI overheads
5
• Simple Java function called in a loop • 1 billion iterations • JNI overhead: 2 orders of magnitude • Splitting the loop (strided execution)
hides the overhead
int udf(int i) { return i; }
●
●
●
●
● ● ● ● ● ●
1
10
100
100 101 102 103 104 105 106 107 108 109
Stride size [logscale]
Seco
nds
[logs
cale
]
Loop in Java
Loop in C / Function called via JNI
Split loop / 1 JNI call per stride
• JNI calls have significant overhead • Execute tuple-based UDF in a strided fashion
-
SparkSQL UDFs
6
var offset = 10 sqlContext.udf.register("add_offset", (i: Int) => i + offset) sqlContext.sql("SELECT add_offset(i) FROM table").show()
Usage in Spark
Java class representationpublic final class SparkProgramm$$anonfun$run$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable { public SparkProgram$$anonfun$run$1(scala.runtime.IntRef); public final int apply(int); ... }
var offset = 10 sqlContext.udf.register("add_offset", (i: Int) => i + offset) sqlContext.sql("SELECT add_offset(i) FROM table").show()
public final class SparkProgramm$$anonfun$run$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable { public SparkProgram$$anonfun$run$1(scala.runtime.IntRef); public final int apply(int); ... }
var offset = 10 sqlContext.udf.register("add_offset", (i: Int) => i + offset) sqlContext.sql("SELECT add_offset(i) FROM table").show()
public final class SparkProgramm$$anonfun$run$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable { public SparkProgram$$anonfun$run$1(scala.runtime.IntRef); public final int apply(int); ... }
SparkSQL UDFs represented both as Java class and instance.
-
Spark
UDF
Bytecode
Instance
• Instantiate UDF in embedded JVM • Instantiate UDF in embedded JVM • JIT-compile strided execution wrapper • Instantiate UDF in embedded JVM • JIT-compile strided execution wrapper • Wrap engine buffers as Java direct ByteBuffers
Wildfire engine nodeEmbedded JVM
Bytecode copy
Instance copy
Embedded JVM
7
Strided execution wrapper
Java direct ByteBuffers
Buffers
Output
Inputs
-
Embedded JVM performance
8
• Comparable performance to execution in Spark • Strided execution is key to fast performance
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
1 0.65
8.91
0123456789
10(bandwidth−bound)Range query
10.89
1.07
0
1
(compute−bound)Vincenty formulae
1 0.83
3.77
0
1
2
3
4
(object creation)Word length
Spark−only Strided execution Unstrided execution
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
1 0.65
8.91
0123456789
10(bandwidth−bound)Range query
10.89
1.07
0
1
(compute−bound)Vincenty formulae
1 0.83
3.77
0
1
2
3
4
(object creation)Word length
Spark−only Strided execution Unstrided execution
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
1 0.65
8.91
0123456789
10(bandwidth−bound)Range query
10.89
1.07
0
1
(compute−bound)Vincenty formulae
1 0.83
3.77
0
1
2
3
4
(object creation)Word length
Spark−only Strided execution Unstrided execution
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
1 0.65
8.91
0123456789
10(bandwidth−bound)Range query
10.89
1.07
0
1
(compute−bound)Vincenty formulae
1 0.83
3.77
0
1
2
3
4
(object creation)Word length
Spark−only Strided execution Unstrided execution
-
Can we do better?
9
-
Wildfire
Object codeLLVM MCJIT
JIT compilation to machine code
• Compile bytecode to LLVM IR
•10
Spark
Bytecode
Instance
Wrapper LLVM IR
Bytecode LLVM IR
BugVM compiler
Instance pointer
BugVM JVM
• Compile bytecode to LLVM IR • Generate wrapper LLVM IR • Compile bytecode to LLVM IR • Generate wrapper LLVM IR • Link both to object code
• Compile bytecode to LLVM IR • Generate wrapper LLVM IR • Link both to object code
• Deserialize instance • Compile bytecode to LLVM IR • Generate wrapper LLVM IR • Link both to object code
• Deserialize instance • Dynamically load and
execute object code
-
JIT compilation performance
11
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
0.65
0.51
0.75
0
0.5
1
(bandwidth−bound)Range query
0.89
0.51 0.51
0
0.5
1
(compute−bound)Vincenty formulae
0.83
0.39
1.39
0
0.5
1
1.5
(object creation)Word length
Embedded JVM Hand−written JIT−compiled Spark−only
Compilation to machine code is beneficial if UDF is computationally heavy and does not create objects.
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
0.65
0.51
0.75
0
0.5
1
(bandwidth−bound)Range query
0.89
0.51 0.51
0
0.5
1
(compute−bound)Vincenty formulae
0.83
0.39
1.39
0
0.5
1
1.5
(object creation)Word length
Embedded JVM Hand−written JIT−compiled Spark−only
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
0.65
0.51
0.75
0
0.5
1
(bandwidth−bound)Range query
0.89
0.51 0.51
0
0.5
1
(compute−bound)Vincenty formulae
0.83
0.39
1.39
0
0.5
1
1.5
(object creation)Word length
Embedded JVM Hand−written JIT−compiled Spark−only
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
0.65
0.51
0.75
0
0.5
1
(bandwidth−bound)Range query
0.89
0.51 0.51
0
0.5
1
(compute−bound)Vincenty formulae
0.83
0.39
1.39
0
0.5
1
1.5
(object creation)Word length
Embedded JVM Hand−written JIT−compiled Spark−only
-
Word length UDF optimizations
for i ← 1 to size of input do javaString ← CreateJavaString(inputi) outputi ← WordLengthUdf(inputi) CheckForJavaException() ReleaseJavaObject(javaString) end
12
These optimizations violate Java language guarantees (e.g., immutability of Strings)!
We can apply them because we know that the UDF does not leak or retain the String object.
for i ← 1 to size of input do javaString ← CreateJavaString(inputi) outputi ← WordLengthUdf(javaString) CheckForJavaException() ReleaseJavaObject(javaString) end
for i ← 1 to size of input do javaString ← CreateJavaString(inputi) outputi ← WordLengthUdf(javaString) CheckForJavaException() ReleaseJavaObject(javaString) end
UDF wrapper Optimizations
1. Reuse javaString object 2. Eliminate data copies
when creating javaString 3. Remove memory fence 4. Relax exception handling
1. Reuse javaString object 2. Eliminate data copies
when creating javaString 3. Remove memory fence
-
Word length UDF performance
13
1.391.12
0.55 0.5 0.42 0.39
0.0
0.5
1.0
1.5
JIT−compiled ReuseStringobject
Wrapenginebuffer
Relaxexceptionhandling
Removememory
fence
Hand−written
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
Spark−only
These optimizations violate Java language guarantees!
-
Summary
• Transparent strided execution of tuple-based SparkSQL UDFs inside a C++ engine
• Comparable performance to execution in Spark • Java direct ByteBuffers simplify passing data between
C++ code and the embedded JVM • Compilation to machine code is beneficial if UDF is
computationally heavy and does not create objects
14
-
Backup
15
-
Comparison with SQL
16
Range query with UDF predicate -- def filter(a: Int, low: Int, high: Int) -- = low < a && a < high SELECT sum(a) FROM R WHERE filter(a, min, max)
Range query with SQL predicate SELECT sum(a) FROM R WHERE min < a AND a < max
0.65
0.51
0
1
UDFpredicate
SQLpredicate
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
-
Embedded JVM performance
17
• 2.5B integers (10 GB) • Stride size 4096
• 64M points (1 GB) • Stride size 512
• 80M words (1 GB) • Stride size 1024 • Word length follows
poisson distribution• Intel Xeon X7560, 2.27 GHz, 512 GB RAM • Oracle JDK 112 • Data stored as uncompressed Parquet files with RLE/PLAIN encoding
Wal
l−cl
ock
time
[rela
tive
to S
park−o
nly]
1 0.65
8.91
0123456789
10(bandwidth−bound)Range query
10.89
1.07
0
1
(compute−bound)Vincenty formulae
1 0.83
3.77
0
1
2
3
4
(object creation)Word length
Spark−only Strided execution Unstrided execution
-
State of the art
• Impala: supports C++ UDFs (fast) and unmodified Hive UDFs (tuple-based, slow, discouraged)
• Vertica: User-defined Transform Functions that process multiple rows via an iterator
• Tupleware: UDFs in LLVM-based languages
18
-
Wildfire system architecture
19
Analytics
tolerate slightly stale data
Analytics
require most recent data
Transactions
high volume
Wildfireengine
Spark executor
Spark executor
Wildfireengine
Wildfireengine
Wildfireengine
Spark executor
Spark executor
Spark executor
Shared file system
Spark applications
Wildfireengine
-
Java direct ByteBuffers
• Wraps memory that is not managed by the JVM • Access in Java through typed getter and setter methods • JVM will make “best effort” to avoid unnecessary data copies • Primitive types: equivalent to typed array • String objects: data must be copied into temporary array
20
-
Strided execution wrapper
21
public class StridedExecutionWrapper { public static void wrapUdf( UdfClass udfInstance, int numRows, ByteBuffer output, ByteBuffer auxiliaryOutput, ByteBuffer[] inputs, ByteBuffer[] auxiliaryInputs) { for (int i = 0; i < numRows; ++i) { output.putX(udfInstance.apply( inputs[0].getX(), … ,inputs[N].getX())); } }
-
Supported types
Supported • Primitive types (no extra copies) • Strings (copy into temporary array)
Not supported in prototype • User-defined types require object creation (straightforward
if type can be decomposed into primitives) • Complex nested types cannot be used with scalar UDFs
22
-
Security considerations
• UDFs must not crash database • typically executed in separate process • JVM offers similar separation • errors conditions are signalled as exceptions • UDFs can further be restricted through Java
SecurityManagers
23
-
Thread scaling
24
●
●
●
●
●
●●
1
2
4
8
16
1 2 4 8 16 32 64Number of threads [logscale]
Spee
dup
[logs
cale
] ● SQL predicateUDF predicate
-
Auxiliary buffer
ValueLength
CustomString Object
Length
Pointer
set
wrap
1 Object allocation
BugVM JVMEngine
Modified Java String implementation
25
Auxiliary buffer
ValueLength
String Object
Length
Char Array
set
copy
2 Object allocations
BugVM JVMEngine
Default implementation
Auxiliary buffer
ValueLength
ReusedString Object
Length
Char Array
set
copy
1 Object allocation
BugVM JVMEngine
Reuse string buffer
Wrap engine buffer