Do's and Don'ts in Java
Transcript of Do's and Don'ts in Java
1
It was a long way – more than 10 years since SAP started with Java!
And back then Java was still young – version 1.2 at the time!
3
SAP has been developing Java-based software for more than 10 years.
In 2010 the SAP Java Application Server registered a success with the SPECjEnterprise
benchmark – a proof that the SAP NetWeaver Java Server is scalable and
competitive and is an excellent basis for scalable and competitive Java applications!
The Java virtual machine itself provides performance optimizations with every
new release. The SPECjbb2005 benchmark is an advanced simulation of a
purchase–delivery process. A comparison with this benchmark between Java 5
and Java 6 shows a remarkable increase in throughput.
The same benchmark, executed to compare different competitor JVMs with the
SAP JVM, shows that on many platforms the SAP JVM is better.
The SAP JVM has its own performance optimizations, for example string
concatenation, compressed object references (oops), and tiered compilation (tier 1
and tier 2). Details can be found on the SAP JVM web pages.
Even with the old NetWeaver 7.0, which is still running at customer sites,
customers can now switch to and benefit from the stable and solid SAP JVM 1.4 as
a replacement for the Java 1.4 versions that are no longer supported by other vendors.
An application is scalable if it is able to serve additional users with additional
hardware capacity and extended software configuration.
7
The hardware capacity can be extended by adding resources, such as memory
or an additional/upgraded CPU, to the same machine (so-called vertical scaling),
or by adding more machines to the landscape (so-called horizontal scaling).
NetWeaver Java Server 7.30, similar to previous releases, supports the concept
of extendable Java instances and a cluster behind a central load balancer. The load
balancer is only required if more than one Java instance is used. With only
one instance, the load balancing between the Java server nodes is done by the ICM.
Due to its multi-threaded architecture, one Java server can utilize a multicore machine
pretty well. A new Java server node (vertical scaling) is added to bring more
memory to the cluster to handle higher load.
A new Java instance on another machine (horizontal scaling) is usually added to
the cluster when the capacity of the already allocated machines is utilized more than
65%.
The components indicated as "central" are the message server, the enqueue
server and the relational database. The cluster throughput can be limited if those
components are not able to scale further.
The thread management system is the backbone of NW AS Java. The thread pool concept ensures parallel processing of requests without the runtime overhead of expensive creation of new threads. Wrong usage of threads can harm scalability.
Resources like DB connections, JCo connections, etc. are handled by pools of opened physical connections which are logically reused by the applications in parallel threads. Inappropriate usage of connections can impact scalability: opening and closing a physical connection at runtime is very expensive and should be avoided.
The Java memory management with its garbage collection mechanisms is a factor which heavily impacts scalability, especially when garbage collections take longer.
For high throughput, a cluster is required. The server nodes in the cluster need to be consistent with one another. Cluster communication APIs are available to exchange data in the cluster, such as membership notifications, replication/updates/invalidations, etc. If business logic needs to be consistent in the cluster, in many cases logical locking should be applied. Java Server 7.30 provides a Locking Service which manages locking via the SAP enqueue server. Logical locking is required, for example, for deployment or startup of applications in the cluster. Database locking is valid for transactional applications, where the transaction management mechanisms of the application server should be used.
10
It is not recommended to start new threads from applications. Not only is the creation
of a new thread expensive (it is an OS resource), but too many threads can also
cause stability problems because other shared resources, like memory and
connections, can run short. For better runtime performance the thread pool size
can be configured with min = initial = max number of threads: expanding or
shrinking the pool requires synchronization of incoming tasks until the resize is
completed.
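The min = initial = max configuration can be sketched with the standard `java.util.concurrent.ThreadPoolExecutor` (the pool size 4 is illustrative; a NetWeaver server uses its own thread management, this only shows the resizing-free shape):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a fixed-size pool (min = initial = max): the pool never expands
// or shrinks at runtime, so no resize synchronization is ever needed.
public class FixedPool {

    public static ThreadPoolExecutor create(int size) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size,                      // core == max: no expanding/shrinking
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());    // excess tasks queue up instead
        pool.prestartAllCoreThreads();           // "initial" == max: threads created up front
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4);
        System.out.println("pool size: " + pool.getPoolSize()); // already at full size
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```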
The threads in the pool are precious resources: do not block them for periodic
tasks, because if threads are blocked for a long time, the capacity of the system
for processing parallel requests is affected. For periodic tasks or asynchronous
requests rather use scheduling mechanisms (in Java Server 7.30: the Job Scheduler),
timeout management (register a task class and it will be executed in dedicated
threads by the timeout service), and for J2EE applications use Message Driven Beans,
which are taken care of by the platform.
A multithreaded environment requires thread-safe coding. Too much synchronization may cause contention. Wrong usage of synchronization can cause deadlocks. Too little synchronization can cause data inconsistency, and there is a danger of infinite loops.
If you ever tried to resolve a contention problem, perhaps you experienced the "cascading" effect: the first level of synchronization hides the next level of synchronization. If you provide a fix for the first level, it can happen that at the next level the contention is even more severe than the one which was fixed, and
the throughput is not increased but decreased (!) after the fix. Load test verification is mandatory for Java applications when changing synchronization.
11
One example of how contention looks in a thread dump: many threads are
waiting to lock an object which is locked by another thread. If each thread locks
the object for 10 ms and there are 10 threads waiting to lock the resource, the
last thread adds at least 90 ms of wait time to the response time it delivers to
the end user.
In this concrete example the reason for contention is related to getting
connections from the DB connection pool. Normally this synchronization is very
cheap, but if there are no free database connections in the pool, the wait
time might be unpredictably long.
SAP delivers an appropriate profiler – the so-called SAP JVM Profiler – with a
"synchronization trace" feature which can be applied to identify such hotspots.
This trace is best used with single user requests (not under load).
A deadlock occurs when synchronization (locking) on objects by different threads
is done in different order. In all execution paths of the program, the "shared"
structures or objects should be locked in the same order. It is a good idea to take
special care of this and to evaluate the required synchronization theoretically in
the architecture and development phase.
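The lock-ordering rule above can be sketched as follows (a minimal illustration with invented `Account` names; the point is that every code path acquires the two monitors in the same global order, here by id):

```java
// Illustrative sketch: acquire two monitors in a globally consistent order so
// that concurrent transfers in opposite directions cannot deadlock.
public class OrderedLocking {

    public static class Account {
        private final int id;   // the unique id defines the global lock order
        private long balance;

        public Account(int id, long balance) { this.id = id; this.balance = balance; }
        public long getBalance() { return balance; }
    }

    // Every execution path locks the account with the smaller id first.
    public static void transfer(Account from, Account to, long amount) {
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 100), b = new Account(2, 100);
        // Two threads transferring in opposite directions: with ordered locking
        // neither thread can hold one monitor while waiting for the other.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(b, a, 1); });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(a.getBalance() + b.getBalance()); // total is preserved: 200
    }
}
```

Without the `first`/`second` ordering, one thread locking `a` then `b` while the other locks `b` then `a` would be exactly the cross-order situation that deadlocks.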
Insufficient locking can lead to infinite loops or to hard-to-reproduce and hard-to-debug
data inconsistencies. An infinite loop is a scalability problem because each thread
that falls into it reduces the system capacity by at least one full core/CPU.
An infinite loop can be stopped only by a restart of the complete Java server,
which is not a situation that is affordable for customers.
Opening and closing physical connections at runtime is expensive. It is
recommended that the DB connection pool is configured with a fixed size. Never
allocate JDBC/JCo/other connections in recursive methods or in loops: connections
which are taken cannot be released because the method is not finished, but it cannot
finish because it waits to get yet another connection. Some persistency
implementations try to automatically detect such application behavior and handle it
automatically, but this is additional code to be executed (maybe error prone), so
applications should better not rely on such mechanisms but do correct coding.
Caching connections in the application layer, outside the main connection pool, is a
bad idea; it hardly makes any sense.
Connections should be explicitly closed in a try-finally block so they return to the pool
and become available for other threads.
15
Example on how to correctly close connections and resources.
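The original slide's code is not reproduced in this transcript, so here is a minimal sketch of the pattern. A stub resource stands in for a pooled JDBC connection; with `java.sql` the same close-in-finally guarantee applies to `Connection`, `Statement` and `ResultSet`:

```java
// Sketch of the close-in-finally pattern with a stub pooled resource.
public class CloseInFinally {

    public static class PooledConnection implements AutoCloseable {
        private boolean open = true;
        public boolean isOpen() { return open; }
        public String query() { return "result"; }
        @Override public void close() { open = false; } // returns the connection to the pool
    }

    // Classic pattern: close in finally, so the connection goes back to the
    // pool even if the work in between throws.
    public static String useWithFinally(PooledConnection con) {
        try {
            return con.query();
        } finally {
            con.close();
        }
    }

    // Since Java 7, try-with-resources expresses the same guarantee more compactly.
    public static String useWithTryWithResources(PooledConnection con) {
        try (PooledConnection c = con) {
            return c.query();
        }
    }

    public static void main(String[] args) {
        PooledConnection con = new PooledConnection();
        System.out.println(useWithFinally(con) + " open=" + con.isOpen());
    }
}
```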
There are different situations in which an OutOfMemoryError can happen. For
example, there could be a memory leak, or there could simply be a memory
shortage due to wrong memory sizing. A memory shortage is resolved by adding
more Java servers. Adding more Java server nodes cannot resolve, but only
postpone, a crash due to a memory leak.
Typically the reason for memory leaks is a wrong decision about the scope of
objects. For example, if user-session-specific data is added to central structures,
it will most probably remain on the server after the session is terminated. Soft
and weak references are mechanisms to handle this problem, but they bring a
different problem: there is no exact control over what is kept in memory
and what is not, and some frequently used cached data might have to be
regenerated after collection, which can require even more resources.
The execution of the finalize() method increases the duration of garbage collection
and can also contribute to stability problems.
17
The garbage collection time influences the end user response time. With
frequent full GCs the end user response times are affected more
by the wait times. Full garbage collection is responsible only for peaks in
response time and a temporary reduction of throughput; it does not harm the
overall availability, stability and scalability of the system.
18
Less cluster communication always improves scalability. The APIs for communication
must be selected appropriately: with Java Server 7.30 no message bodies should be sent
through the message server; instead, the lazy channels between the different server nodes
should be used. The volume of exchanged data should be kept small in all applicable
ways.
A Java server may sometimes be unreliable as a receiver of data packages from other
server nodes: thus it is almost forbidden to use notifications which require the return of
a result (an answer from the receiving server).
19
When transferring messages, the system uses one of the following types of
communication:
● Message Server Communication – the communication is established
through the Message Server, used as a dispatcher when sending messages.
On Cluster Manager level, the message body size is checked against a
threshold value. If the size is below the threshold value, the message is
sent through the Message Server. If the size is above this value, the connection
goes through a specially opened lazy communication channel.
● Lazy Communication – lazy communication is used when transferring large
messages.
This function allows large amounts of information to be exchanged quickly
between two servers without using the Message Server as an intermediary.
Instead, the information is transported through sockets that are opened on both
servers. The main goal is to avoid overloading the Message Server.
20
The most essential aspect of locking with regard to scalability and
performance is the duration of the lock: lock as short as possible and as long
as required.
There are shared locks ("reader" locks) and exclusive locks ("writer"
locks). If a shared lock is used, readers can still access the data
concurrently. If the isolation is stronger (exclusive lock), throughput and
scalability decrease.
There are different locking techniques:
· Database locks – a locking technique provided by the database
vendor. For more information, see the documentation of the database, because
database vendors do not offer uniform semantics for locks.
· Logical locks – a locking technique provided and managed centrally
by the Web AS Java. Logical locks are managed by the Enqueue Server via a
central lock table.
21
J2EE applications use the LogicalLocking and TableLocking interfaces provided
by the Locking Adapter Service. These interfaces access the Locking Manager,
which in turn communicates with the Enqueue Server.
If the lifetime of the locks is "user session", then, considering the usual default
session duration of 30 minutes, those locks might be held really long. It is
recommended not to use such coarse granularity.
22
The architecture of the application determines to a very high degree the resource
consumption (as part of TCO): with a good architecture the customer will have lower
hardware and maintenance/administration costs.
By optimizing the software we reduce the TCO for the customer! The best and
most efficient optimizations are achieved when
- scenario execution paths are optimized by cutting all function calls which
can be avoided
- all function results which can be reused are reused, to avoid repeated
calculation
An architecture which is based on multiple software components and involves a
lot of remote calls will not have good performance.
A design which is based on multiple software layers and data structures with
high access/time complexity algorithms cannot have good performance.
23
A major impact on the resource consumption of an application comes from the user
interface design, the number of calls and the volumes of data which are remotely
sent/received between the different systems/clients, the design of the service
and component APIs involved in the processing, and the appropriate
design/choice of data structures, alignment of data types and correct scope of
data.
24
When designing the user interface, the main concern should be the
volume of data which is transferred to the user UI. Correct planning and
minimization of the exchanged data volumes guarantees the lowest possible
resource consumption in all application layers. The main areas where
unnecessary waste is typically observed are:
- Technical keys, IDs and page layout, which are "overhead" information calculated
by the server on every request. Especially unnecessary is when applications (UI
frameworks) use "human readable" / unnecessarily long UI element IDs,
as in the given example.
- Displaying data which requires "scroll bars" on the end user UI wastes
resources, because data is fetched which might never be used. It is better to
provide "pagination" functionality and minimize the data required for one page.
- Similar is the case when the data volume is not predictable (for example the number
of "found search results"). Correct handling should be planned there.
25
No "boomerang" calls to the system itself should be sent: if an HTTP, web service
or other call is sent from the server to itself, there is a danger of blocking the
system under higher load.
The goal should be to always achieve minimum remote communication (to DB,
to ABAP server, to MDM, to HANA, to TREX, to cloud services, etc.).
Minimal data exchange saves not just network resources, but also memory and
CPU resources.
"Compression" should be planned only where the savings outweigh the cost: it is a
trade-off between the size of the exchanged data and the resources
required for compression/decompression.
Some protocol optimizations, such as MTOM (Message Transmission
Optimization Mechanism), reduce the volume (bytes sent/received) of web
services, but additional CPU and memory resources for the encoding are
required.
26
Performance and scalability depend on the quality of service interfaces and
component APIs.
Merged data APIs: The server-side APIs need to be planned in sync with the
screens shown in the end user UI, making sure that for the
most performance-critical screens optimal APIs for data retrieval (with
one remote call) are always available.
(Sorted) pagination: The existence of such APIs is absolutely essential, especially
because memory in Java is a shared resource and should never be challenged to
hold an unpredictably high amount of data, even in the scope of one request. The
pagination implementation is typically provided by the source-of-information
layer (be it a database, TREX, HANA, a remote cloud service, etc.).
Bulk APIs (also called "mass calls") are very important for scalability – they
save remote communication and greatly simplify the execution flow of the
applications: millions of singleton method invocations can be avoided.
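The pagination idea can be sketched in a few lines (an illustrative in-memory version with invented names; a real implementation would push the paging down to the data source, so that only one page ever reaches the Java heap):

```java
import java.util.List;

// Illustrative pagination sketch: the server-side API hands out one page at a
// time instead of the full (potentially huge) result set.
public class PaginationSketch {

    public static <T> List<T> page(List<T> sortedResults, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, sortedResults.size());
        int to   = Math.min(from + pageSize, sortedResults.size());
        return sortedResults.subList(from, to); // only one page is materialized for the UI
    }

    public static void main(String[] args) {
        List<Integer> results = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(results, 1, 3)); // second page: [4, 5, 6]
    }
}
```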
27
The implementation of a bulk API should not be fake! It is very simple to
program the mass call as a loop around the singleton method, but this does not
help performance and scalability at all. Mass calls should have their
own implementation, optimized as much as possible. In this
example the correct implementation of the mass call is a database statement
with multiple members in the "where" clause, which takes advantage of the
database's optimized algorithms for mass data retrieval and avoids
multiple remote calls to the database.
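A real bulk implementation along these lines could look as follows (the SQL text and table names are illustrative, not taken from the original slide):

```java
// Sketch of a real bulk implementation: one statement with multiple members
// in the WHERE clause instead of N singleton calls.
public class BulkSelect {

    // Builds e.g. "SELECT name FROM employees WHERE id IN (?,?,?)" for n keys,
    // to be executed once as a PreparedStatement with n bound parameters.
    public static String buildInClause(String baseSelect, int n) {
        StringBuilder sql = new StringBuilder(baseSelect).append(" IN (");
        for (int i = 0; i < n; i++) {
            if (i > 0) sql.append(',');
            sql.append('?');
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        String sql = buildInClause("SELECT name FROM employees WHERE id", 3);
        System.out.println(sql);
        // With java.sql the statement would then be executed once:
        //   PreparedStatement ps = con.prepareStatement(sql);
        //   for (int i = 0; i < ids.length; i++) ps.setInt(i + 1, ids[i]);
        //   ResultSet rs = ps.executeQuery();   // one remote call for all keys
    }
}
```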
28
The overall decision about the scope of data is one of the most important decisions for
achieving optimal memory allocation by the application and improving
scalability with multiple concurrent users.
Stateless applications scale better, but usually an application needs to be stateful
for functional reasons. If there is a possibility to provide some application screens
in anonymous (stateless) mode, this opportunity should be used. To keep the
number of active sessions low, an appropriate timeout should be chosen by the
applications.
Caching is a trade-off between CPU utilization and memory usage, and the
ideal balance for this trade-off depends on how much memory is available. With
too little caching, the desired performance benefit will not be achieved; with too
much, performance may suffer because too much memory is being expended
on caching and therefore not enough is available for other purposes.
29
If the cache is too big, more memory is consumed by cached objects,
which reduces the available free heap space. If free heap space is low, full GCs
will happen more frequently, because cached objects are typically located in
the tenured space. The application server may even run out of memory although
there is no real memory leak.
If the cache is too small, lots of persistency accesses, regeneration of objects,
etc. will happen, with a corresponding heavy load on CPU and disk I/O.
A general guideline is that data structures which cost a lot of memory and CPU
in runtime processing should be used rarely, only when no alternatives are
appropriate.
30
31
The software performance KPIs are defined along the hardware resources:
CPU, memory, disk and network.
The Java KPIs do not differ majorly, apart from the breakdown of the
memory KPI, which is extended – for the logical purpose of the memory
allocation from the application standpoint and in the context of the special JVM
memory management – into "processing memory", "session memory" and
"framework memory".
Different layers and applications can be integrated to work together on the
same Java server. Only very early measurements, evaluations and addressing of
optimization requirements to badly performing and non-scalable components can
give a chance for success.
In modern Infrastructure as a Service (IaaS), some KPIs, for example
CPU and throughput, are often taken into consideration for "billable quotas", and if
the total allowed CPU or throughput quota is exceeded, the application might no
longer be accessible to end users. This emphasizes the need to minimize and
optimize resource consumption.
32
For practical reasons the memory KPIs are split into categories: framework
memory, session memory and processing memory. The framework memory
consists of objects initialized at start-up and warm-up of the Java server, which
live as long as the Java server is running. The session memory contains only
objects which live while the user is active on the system, until the session is
destroyed or times out. This size is usually almost identical to the size of the
serialized session object. The processing memory is only valid in the scope of
one request and is expected to be garbage collected already in the Eden space.
If we have to face a choice, it is better to increase processing memory
consumption rather than session memory consumption, because session
memory stays allocated even if the user does not send requests for a while,
until the user logs out or the session times out. Processing memory is only
allocated while the user is actively sending requests. The intervals of minor
garbage collections, which are supposed to free the processing memory, may
vary in certain ranges without too significant an impact. In any case, it is best to
optimize both session memory and processing memory to the lowest possible
value.
34
The Java Distributed Statistical Records (JDSR) are available in SAP NetWeaver
Administrator in the section for troubleshooting and analysis. All KPIs related
to resource consumption on the Java server itself, as well as the number of calls and
bytes exchanged with other systems, are provided out of the box (no additional
configuration is required).
Only the processing memory per dialog step is measured with JDSR. The session
memory and the framework memory are calculated by analyzing heap
dumps with the SAP Memory Analyzer tool.
For Java applications load testing is simply mandatory. No static check or code
review can guarantee that a concurrency issue has not found its way into the
deliverables. Even a load test cannot give a 100% guarantee, as not all possible
execution paths can be tested, but at least one should execute load tests for
the "bread and butter" scenarios!
The load simulation tool alone is not sufficient. The metrics collected in the
load generation tool are "black box" data which gives no insights in
case of problems. To save time and get more confidence from the test results,
an automation environment might be used to operate the load generation tool.
Together with the operation of the testing flow, specific log files and other
sources of information can be collected from the system under test.
For load testing in general it is very important to build reliable, realistically
parameterized and randomized scripts.
36
37
For every typical performance complaint SAP offers a tool for deep analysis
and breakdown of the problem.
The Java Distributed Statistical Records provide a breakdown of the response time to
identify whether the slow-down is on the Java server side or related to a remote component
or service. If the problem is on the Java server side itself, the SAP JVM Profiler
can be applied.
38
The Eclipse Memory Analyzer, developed by SAP, is the tool to analyze any
heap-related issues, like a too big session space, a too big framework space,
memory leaks and so on.
39
Wily Introscope is extendable in the development environment too. Probes can be
added on demand to narrow down the problem. It is a good idea to instrument the
APIs which are used for communication between different application layers. Do
not instrument low-granularity methods: it does not make sense to instrument
methods with an execution time of less than 50 ms – the overhead will be higher
than the benefit.
40
Reference documentation on the mentioned tools is available on the web.
41
Most of our precious time is still spent in arguing and in doubts about our own
capability to resolve performance bugs.
Unlike functional correctness fixes, performance fixes are typically – and
wrongly – perceived by management as risks and many times avoided! In
many such situations the pressure to provide the fix comes later, at an unplanned
time, escalated by some customer…
42
Here are some coding examples.
The reasons why it was coded this way are unknown: there are more memory-
efficient ways to implement this!
Especially generating log messages without checking the log level is
waste which can easily be avoided by doing boolean checks like beTrace(),
beError(), beDebug() etc. before the actual message concatenation.
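The same guard pattern can be sketched with `java.util.logging` (the SAP logging API offers the analogous beDebug()/beError() checks mentioned above; the counter here is only for illustration):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of guarding expensive message construction behind a log level check.
public class GuardedLogging {

    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    static int expensiveCalls = 0; // counts how often the message is really built

    static String expensiveDump(Object state) {
        expensiveCalls++;
        return "state=" + state; // stands in for costly concatenation/serialization
    }

    public static void trace(Object state) {
        // Without the guard, expensiveDump() would run on every call even
        // though FINE messages are discarded at the default INFO level.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDump(state));
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) trace(i);
        System.out.println("expensive calls: " + expensiveCalls); // 0 at the default level
    }
}
```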
43
Object by object, even with a small delta of a few bytes, the Java heap gets full.
There are millions of objects generated on every end user request! We should
not allow waste and should have a good reason for every object we create.
44
A very prominent bug is the concatenation of strings in frequently executed
methods.
In this example, before every access to the cached value, the programmer
generates the access key to the cache.
Caches are intended to be accessed frequently – the more accesses, the more
efficient the cache is – so here it is not a problem that the cache is accessed
3336 times with the get() method. The problem is that the same key is generated
3336 times, wasting in total about 35 MB of processing memory!
The situation would be similar if a frequently used toString() method of an object
internally created string objects on every invocation.
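The original code is not part of this transcript, so the anti-pattern can be sketched as follows (names and the key counter are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the anti-pattern: the cache itself is fine, but the key string is
// re-concatenated on every get(). Building the key once removes millions of
// short-lived String/StringBuilder objects.
public class CacheKeyReuse {

    private static final Map<String, String> cache = new HashMap<>();
    static int keyBuilds = 0; // counts key constructions, for illustration only

    static String buildKey(String tenant, String user) {
        keyBuilds++;
        return tenant + "/" + user; // allocates a new String every time
    }

    // Anti-pattern: the key is rebuilt on every single access.
    public static String getWasteful(String tenant, String user) {
        return cache.get(buildKey(tenant, user));
    }

    // Better: the caller builds the key once and reuses it for all accesses.
    public static int accessManyTimes(String key, int times) {
        int hits = 0;
        for (int i = 0; i < times; i++) {
            if (cache.get(key) != null) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        cache.put("acme/alice", "profile");
        String key = buildKey("acme", "alice"); // built exactly once
        System.out.println(accessManyTimes(key, 3336) + " hits, " + keyBuilds + " key build(s)");
    }
}
```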
45
To minimize the generation of objects, "ThreadLocal"-based pools can be created
for the most frequently used objects. In this way a lower memory footprint can be
achieved with no synchronization and no central pool size tuning.
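A minimal sketch of such a ThreadLocal-based "pool of one per thread" (the buffer size and method names are illustrative):

```java
// Sketch of a ThreadLocal-based pool: each thread reuses its own StringBuilder
// instead of allocating a new one per request, with no synchronization and no
// central pool to tune.
public class ThreadLocalBuffer {

    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    public static String render(String user, int items) {
        StringBuilder sb = BUFFER.get(); // same instance on every call in this thread
        sb.setLength(0);                 // reset instead of reallocating
        sb.append("user=").append(user).append(" items=").append(items);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("alice", 3));
        System.out.println(render("bob", 5)); // reuses the same per-thread buffer
    }
}
```

One caveat: in pooled-thread environments the value outlives the request, so the stored object must be reset (or removed) before each reuse, as `setLength(0)` does here.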
Behind the PreparedStatement class stands the prepared statement cache,
which reduces the runtime and memory consumption for the preparation of SQL
statements. This cache does not work if just Statement is used in the code.
46
To take advantage of the cache, the statement should be implemented
appropriately, avoiding hardcoded parameters in the SQL statement string
and using the "setter methods" instead.
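The difference can be sketched as follows (table and column names are invented; the key point is that the statement cache is keyed by the SQL text, so only the parameterized form is reused):

```java
// Sketch of why hardcoded parameters defeat the prepared-statement cache.
public class StatementShape {

    // Anti-pattern: a new SQL string (and a new cache entry) per value.
    public static String hardcoded(int customerId) {
        return "SELECT * FROM orders WHERE customer_id = " + customerId;
    }

    // Cache-friendly: the SQL text is constant; values are bound via setters.
    public static final String PARAMETERIZED =
            "SELECT * FROM orders WHERE customer_id = ?";

    public static void main(String[] args) {
        System.out.println(hardcoded(7).equals(hardcoded(8))); // false: new text per value
        System.out.println(PARAMETERIZED);
        // With java.sql the parameterized form is executed as:
        //   PreparedStatement ps = con.prepareStatement(PARAMETERIZED);
        //   ps.setInt(1, customerId);   // value bound, SQL text unchanged
        //   ResultSet rs = ps.executeQuery();
    }
}
```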
Network/disk access should be implemented with "buffered" reading and an
appropriate buffer size.
Compression should be used when appropriate.
48
This is an example of how correct reading of a file should be implemented,
reusing the buffer and closing the stream appropriately in a finally block.
When deciding on an appropriate buffer size, the following example can help:
with the big SDA/SCA files (several hundred MB in size) which are used to
pack SAP software for deployment to the Java server, it would be totally ineffective
to read with a buffer which is only a few KB big. An appropriate buffer size in this
case could be some tens of MB. For an HTTP request/response read channel, on
the other hand, even a 1 MB buffer would be highly oversized, because the HTTP
communication between browser and server is typically only 20-30 KB per request.
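Since the slide's code is not in the transcript, here is a minimal sketch of the pattern (the 64 KB buffer is illustrative and, as discussed above, should be chosen relative to the expected file size):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of buffered reading with an explicit buffer size, closing the stream
// in a finally block (try-with-resources gives the same guarantee).
public class BufferedRead {

    public static long countBytes(Path file, int bufferSize) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()), bufferSize);
        try {
            byte[] chunk = new byte[bufferSize]; // reused for every read, not reallocated
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return total;
        } finally {
            in.close(); // always release the OS file handle
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[200_000]);
        System.out.println(countBytes(tmp, 64 * 1024)); // 200000
        Files.delete(tmp);
    }
}
```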
49
There is no need to implement your own algorithms as alternatives to algorithms
which are provided by the Java Virtual Machine. If something is missing there
or needs to be optimized, address it to the SAP JVM team, so that everyone
can later benefit from the optimization.
50
51
Teams are often reluctant to measure performance in the test environment, with
the argument that the machines used for development and functional testing are
not compliant with the reference performance landscapes. This is true. On the
other hand, early measurements reduce the risk of finding performance issues
and degradations very late, after the code is submitted and consumed by other
developers.
Development has to ensure there is no degradation in CPU, memory, number of
roundtrips and number of bytes sent/received from one change list to another. If an
optimization is requested, it can be verified on a relative comparison basis: for
example, the CPU time was reduced by 15% due to the submitted optimized code.
KPIs such as the "end-to-end response time", the "remote call duration to the ABAP
server", etc., which may experience network latency delays, should be
evaluated carefully to prevent "false" alarms and wasted time.
52
JLin already has a history of more than 10 years at SAP. It has proven its value for
the identification of basic performance anti-patterns in Java code.
It cannot troubleshoot all potential performance problems of Java applications,
but it ensures that we do not waste time in the testing phases discovering and
repairing "amateur" mistakes.
53
Many different variants of automation are possible. The number of variants is
driven mainly by the goal to reuse as much as possible the already existing
functional tests for performance measurements and thus ensure no additional
effort in test maintenance, but only the added value of performance checks.
When the functional tests are implemented with JUnit, the performance
measurements can be done by integrating the SAP JVM API or the JDSR API.
54
When the functional test automation is based on the Selenium framework, it can
be extended to collect performance KPIs from the SAP JVM API and/or the JDSR APIs.
55
LoadRunner-based automation makes sense if we have functions with no user UI,
such as web services or proprietary remote interfaces, which are intended
for reuse in both single-call tests and load tests.
56
Java gives us the opportunity to implement scalable and fast applications: it
depends entirely on the developers and architects to use this opportunity!
57
59