Do's and Don'ts in Java
Transcript of Do's and Don'ts in Java
1
It was a long way – more than 10 years since SAP started with Java!
And back then Java was still young – version 1.2 at the time!
3
SAP has been developing Java-based software for more than 10 years.
In 2010 the SAP Java Application Server registered a success with the SPECjEnterprise
benchmark – a proof that the SAP NetWeaver Java Server is scalable and
competitive and is an excellent basis for scalable and competitive Java applications!
The Java virtual machine itself provides performance optimizations with every
new release. The SPECjbb2005 benchmark is an advanced simulation of a
purchase–delivery process. A comparison with this benchmark between Java 5
and Java 6 shows a remarkable increase in throughput.
The same benchmark, executed to compare different competitor JVMs with the
SAP JVM, shows that on many platforms the SAP JVM is better.
The SAP JVM has its own performance optimizations, for example string
concatenation, compressed object references (oops), and tiered compilation (tier 1
and tier 2). Details can be found on the SAP JVM web pages.
Even with the old NetWeaver 7.0, which is still running at customer sites,
customers can now switch to and benefit from the stable and solid SAP JVM 1.4 as
a replacement for the Java 1.4 versions that are no longer supported by other vendors.
An application is scalable if it is able to serve additional users with additional
hardware capacity and extended software configuration.
7
The hardware capacity can be extended by adding resources, such as memory
or an additional/upgraded CPU, to the same machine (so-called vertical scaling),
or by adding more machines to the landscape (so-called horizontal scaling).
NetWeaver Java Server 7.30, similar to previous releases, supports the concept
of extendable Java instances and a cluster behind a central load balancer. The load
balancer is only required if more than one Java instance is used. With only
one instance, the load balancing between the Java server nodes is done by the ICM.
Due to its multi-threaded architecture, one Java server can utilize a multicore machine
pretty well. A new Java server node (vertical scaling) is added to bring more
memory to the cluster to handle higher load.
A new Java instance on another machine (horizontal scaling) is usually added to
the cluster when the capacity of the already allocated machines is utilized more than
65%.
The components indicated as "central" are the message server, the enqueue
server and the relational database. The cluster throughput can be limited if those
components are not able to scale further.
The thread management system is the backbone of NW AS Java. The thread pool concept ensures parallel processing of requests without the runtime overhead of expensive creation of new threads. Wrong usage of threads can harm scalability.
Resources like DB connections, JCo connections, etc. are handled by pools of opened physical connections which are logically reused by the applications in parallel threads. Inappropriate usage of connections can impact scalability: opening and closing a physical connection at runtime is very expensive and should be avoided.
The Java memory management with its garbage collection mechanisms is a factor which heavily impacts scalability, especially when garbage collections take longer.
For high throughput, a cluster is required. The server nodes in the cluster need to be consistent with one another. Cluster communication APIs are available to exchange data in the cluster, such as membership notifications, replication/updates/invalidations, etc. If business logic needs to be consistent in the cluster, in many cases logical locking should be applied. Java Server 7.30 provides a Locking Service which manages locking via the SAP enqueue server. Logical locking is required, for example, for deployment or startup of applications in the cluster. Database locking is valid for transactional applications, where the transaction management mechanisms of the application server should be used.
10
It is not recommended to start new threads from applications. Not only is the creation
of a new thread expensive (it is an OS resource), but too many threads can also
cause stability problems because other shared resources, like memory and
connections, can run short. For better runtime performance the thread pool size
can be configured with min = initial = max number of threads: expanding or
shrinking the pool requires synchronization of incoming tasks until the resize is
completed.
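The min = initial = max configuration can be sketched with the standard `java.util.concurrent.ThreadPoolExecutor` (the pool size 4 is illustrative; a NetWeaver server uses its own thread management, this only shows the resizing-free shape):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a fixed-size pool (min = initial = max): the pool never expands
// or shrinks at runtime, so no resize synchronization is ever needed.
public class FixedPool {

    public static ThreadPoolExecutor create(int size) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size,                      // core == max: no expanding/shrinking
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());    // excess tasks queue up instead
        pool.prestartAllCoreThreads();           // "initial" == max: threads created up front
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4);
        System.out.println("pool size: " + pool.getPoolSize()); // already at full size
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```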
The threads in the pool are precious resources: do not block them for periodic
tasks, because if threads are blocked for a long time, the capacity of the system
for processing parallel requests is affected. For periodic tasks or asynchronous
requests rather use scheduling mechanisms (in Java Server 7.30: the Job Scheduler),
timeout management (register a task class and it will be executed in dedicated
threads by the timeout service), and for J2EE applications use Message Driven Beans,
which are taken care of by the platform.
A multithreaded environment requires thread-safe coding. Too much synchronization may cause contention. Wrong usage of synchronization can cause deadlocks. Too little synchronization can cause data inconsistency, and there is a danger of infinite loops.
If you ever tried to resolve a contention problem, perhaps you experienced the "cascading" effect: the first level of synchronization hides the next level of synchronization. If you provide a fix for the first level, it can happen that at the next level the contention is even more severe than the one which was fixed, and
the throughput is not increased but decreased (!) after the fix. Load test verification is mandatory for Java applications when changing synchronization.
11
One example of how contention looks in a thread dump: many threads are
waiting to lock an object which is locked by another thread. If each thread locks
the object for 10 ms and there are 10 threads waiting to lock the resource, the
last thread adds at least 90 ms of wait time to the response time it delivers to
the end user.
In this concrete example the reason for contention is related to getting
connections from the DB connection pool. Normally this synchronization is very
cheap, but if there are no free database connections in the pool, the wait
time might be unpredictably long.
SAP delivers an appropriate profiler – the so-called SAP JVM Profiler – with a
"synchronization trace" feature which can be applied to identify such hotspots.
This trace is best used with single user requests (not under load).
A deadlock occurs when synchronization (locking) on objects by different threads
is done in different order. In all execution paths of the program, the "shared"
structures or objects should be locked in the same order. It is a good idea to take
special care of this and to evaluate the required synchronization theoretically in
the architecture and development phase.
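The lock-ordering rule above can be sketched as follows (a minimal illustration with invented `Account` names; the point is that every code path acquires the two monitors in the same global order, here by id):

```java
// Illustrative sketch: acquire two monitors in a globally consistent order so
// that concurrent transfers in opposite directions cannot deadlock.
public class OrderedLocking {

    public static class Account {
        private final int id;   // the unique id defines the global lock order
        private long balance;

        public Account(int id, long balance) { this.id = id; this.balance = balance; }
        public long getBalance() { return balance; }
    }

    // Every execution path locks the account with the smaller id first.
    public static void transfer(Account from, Account to, long amount) {
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 100), b = new Account(2, 100);
        // Two threads transferring in opposite directions: with ordered locking
        // neither thread can hold one monitor while waiting for the other.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(b, a, 1); });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(a.getBalance() + b.getBalance()); // total is preserved: 200
    }
}
```

Without the `first`/`second` ordering, one thread locking `a` then `b` while the other locks `b` then `a` would be exactly the cross-order situation that deadlocks.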
Insufficient locking can lead to infinite loops or to hard-to-reproduce and hard-to-debug
data inconsistencies. An infinite loop is a scalability problem because each thread
that falls into it reduces the system capacity by at least one full core/CPU.
An infinite loop can be stopped only by a restart of the complete Java server,
which is not a situation that is affordable for customers.
Opening and closing physical connections at runtime is expensive. It is
recommended that the DB connection pool is configured with a fixed size. Never
allocate JDBC/JCo/other connections in recursive methods or in loops: connections
which are taken cannot be released because the method is not finished, but it cannot
finish because it waits to get yet another connection. Some persistency
implementations try to automatically detect such application behavior and handle it
automatically, but this is additional code to be executed (maybe error prone), so
applications should better not rely on such mechanisms but do correct coding.
Caching connections in the application layer, outside the main connection pool, is a
bad idea; it hardly makes any sense.
Connections should be explicitly closed in a try-finally block so they return to the pool
and become available for other threads.
15
Example on how to correctly close connections and resources.
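The original slide's code is not reproduced in this transcript, so here is a minimal sketch of the pattern. A stub resource stands in for a pooled JDBC connection; with `java.sql` the same close-in-finally guarantee applies to `Connection`, `Statement` and `ResultSet`:

```java
// Sketch of the close-in-finally pattern with a stub pooled resource.
public class CloseInFinally {

    public static class PooledConnection implements AutoCloseable {
        private boolean open = true;
        public boolean isOpen() { return open; }
        public String query() { return "result"; }
        @Override public void close() { open = false; } // returns the connection to the pool
    }

    // Classic pattern: close in finally, so the connection goes back to the
    // pool even if the work in between throws.
    public static String useWithFinally(PooledConnection con) {
        try {
            return con.query();
        } finally {
            con.close();
        }
    }

    // Since Java 7, try-with-resources expresses the same guarantee more compactly.
    public static String useWithTryWithResources(PooledConnection con) {
        try (PooledConnection c = con) {
            return c.query();
        }
    }

    public static void main(String[] args) {
        PooledConnection con = new PooledConnection();
        System.out.println(useWithFinally(con) + " open=" + con.isOpen());
    }
}
```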
There are different situations in which an OutOfMemoryError can happen. For
example, there could be a memory leak, or there could simply be a memory
shortage due to wrong memory sizing. A memory shortage is resolved by adding
more Java servers. Adding more Java server nodes cannot resolve, but only
postpone, a crash due to a memory leak.
Typically the reason for memory leaks is a wrong decision about the scope of
objects. For example, if user-session-specific data is added to central structures,
it will most probably remain on the server after the session is terminated. Soft
and weak references are mechanisms to handle this problem, but they bring a
different problem: there is no exact control over what is kept in memory
and what is not, and some frequently used cached data might have to be
regenerated after collection, which can require even more resources.
The execution of the finalize() method increases the duration of garbage collection
and can also contribute to stability problems.
17
The garbage collection time influences the end user response time. With
frequent full GCs the end user response times are affected more
by the wait times. Full garbage collection is responsible only for peaks in
response time and a temporary reduction of throughput; it does not harm the
overall availability, stability and scalability of the system.
18
Less cluster communication always improves scalability. The APIs for communication
must be selected appropriately: with Java Server 7.30 no message bodies should be sent
through the message server; instead, the lazy channels between the different server nodes
should be used. The volume of exchanged data should be kept small in all applicable
ways.
A Java server may sometimes be unreliable as a receiver of data packages from other
server nodes: thus it is almost forbidden to use notifications which require the return of
a result (an answer from the receiving server).
19
When transferring messages, the system uses one of the following types of
communication:
● Message Server Communication – the communication is established
through the Message Server, used as a dispatcher when sending messages.
On Cluster Manager level, the message body size is checked against a
threshold value. If the size is below the threshold value, the message is
sent through the Message Server. If the size is above this value, the connection
goes through a specially opened lazy communication channel.
● Lazy Communication – lazy communication is used when transferring large
messages.
This function allows large amounts of information to be exchanged quickly
between two servers without using the Message Server as an intermediary.
Instead, the information is transported through sockets that are opened on both
servers. The main goal is to avoid overloading the Message Server.
20
The most essential aspect of locking with regard to scalability and
performance is the duration of the lock: lock as short as possible and as long
as required.
There are shared locks ("reader" locks) and exclusive locks ("writer"
locks). If a shared lock is used, readers can still access the data
concurrently. If the isolation is stronger (exclusive lock), throughput and
scalability decrease.
There are different locking techniques:
· Database locks – a locking technique provided by the database
vendor. For more information, see the documentation of the database, because
database vendors do not offer uniform semantics for locks.
· Logical locks – a locking technique provided and managed centrally
by the Web AS Java. Logical locks are managed by the Enqueue Server via a
central lock table.
21
J2EE applications use the LogicalLocking and TableLocking interfaces provided
by the Locking Adapter Service. These interfaces access the Locking Manager,
which in turn communicates with the Enqueue Server.
If the lifetime of the locks is "user session", then, considering the usual default
session duration of 30 minutes, those locks might be held really long. It is
recommended not to use such coarse granularity.
22
The architecture of the application determines to a very high degree the resource
consumption (as part of TCO): with a good architecture the customer will have lower
hardware and maintenance/administration costs.
By optimizing the software we reduce the TCO for the customer! The best and
most efficient optimizations are achieved when
- scenario execution paths are optimized by cutting all function calls which
can be avoided
- all function results which can be reused are reused, to avoid repeated
calculation
An architecture which is based on multiple software components and involves a
lot of remote calls will not have good performance.
A design which is based on multiple software layers and data structures with
high access/time complexity algorithms cannot have good performance.
23
A major impact on the resource consumption of an application comes from the user
interface design, the number of calls and the volumes of data which are remotely
sent/received between the different systems/clients, the design of the service
and component APIs involved in the processing, and the appropriate
design/choice of data structures, alignment of data types and correct scope of
data.
24
When designing the user interface, the main concern should be the
volume of data which is transferred to the user UI. Correct planning and
minimization of the exchanged data volumes guarantees the lowest possible
resource consumption in all application layers. The main areas where
unnecessary waste is typically observed are:
- Technical keys, IDs and page layout, which are "overhead" information calculated
by the server on every request. Especially unnecessary is when applications (UI
frameworks) use "human readable" / unnecessarily long UI element IDs,
as in the given example.
- Displaying data which requires "scroll bars" on the end user UI wastes
resources, because data is fetched which might never be used. It is better to
provide "pagination" functionality and minimize the data required for one page.
- Similar is the case when the data volume is not predictable (for example the number
of "found search results"). Correct handling should be planned there.
25
No "boomerang" calls to the system itself should be sent: if an HTTP, web service
or other call is sent from the server to itself, there is a danger of blocking the
system under higher load.
The goal should be to always achieve minimum remote communication (to DB,
to ABAP server, to MDM, to HANA, to TREX, to cloud services, etc.).
Minimal data exchange saves not just network resources, but also memory and
CPU resources.
"Compression" should be planned only where the savings outweigh the cost: it is a
trade-off between the size of the exchanged data and the resources
required for compression/decompression.
Some protocol optimizations, such as MTOM (Message Transmission
Optimization Mechanism), reduce the volume (bytes sent/received) of web
services, but additional CPU and memory resources for the encoding are
required.
26
Performance and scalability depend on the quality of service interfaces and
component APIs.
Merged data APIs: The server-side APIs need to be planned in sync with the
screens shown in the end user UI, making sure that for the
most performance-critical screens optimal APIs for data retrieval (with
one remote call) are always available.
(Sorted) pagination: The existence of such APIs is absolutely essential, especially
because memory in Java is a shared resource and should never be challenged to
hold an unpredictably high amount of data, even in the scope of one request. The
pagination implementation is typically provided by the source-of-information
layer (be it a database, TREX, HANA, a remote cloud service, etc.).
Bulk APIs (also called "mass calls") are very important for scalability – they
save remote communication and greatly simplify the execution flow of the
applications: millions of singleton method invocations can be avoided.
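The pagination idea can be sketched in a few lines (an illustrative in-memory version with invented names; a real implementation would push the paging down to the data source, so that only one page ever reaches the Java heap):

```java
import java.util.List;

// Illustrative pagination sketch: the server-side API hands out one page at a
// time instead of the full (potentially huge) result set.
public class PaginationSketch {

    public static <T> List<T> page(List<T> sortedResults, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, sortedResults.size());
        int to   = Math.min(from + pageSize, sortedResults.size());
        return sortedResults.subList(from, to); // only one page is materialized for the UI
    }

    public static void main(String[] args) {
        List<Integer> results = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(results, 1, 3)); // second page: [4, 5, 6]
    }
}
```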
27
The implementation of a bulk API should not be fake! It is very simple to
program the mass call as a loop around the singleton method, but this does not
help performance and scalability at all. Mass calls should have their
own implementation, optimized as much as possible. In this
example the correct implementation of the mass call is a database statement
with multiple members in the "where" clause, which takes advantage of the
database's optimized algorithms for mass data retrieval and avoids
multiple remote calls to the database.
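A real bulk implementation along these lines could look as follows (the SQL text and table names are illustrative, not taken from the original slide):

```java
// Sketch of a real bulk implementation: one statement with multiple members
// in the WHERE clause instead of N singleton calls.
public class BulkSelect {

    // Builds e.g. "SELECT name FROM employees WHERE id IN (?,?,?)" for n keys,
    // to be executed once as a PreparedStatement with n bound parameters.
    public static String buildInClause(String baseSelect, int n) {
        StringBuilder sql = new StringBuilder(baseSelect).append(" IN (");
        for (int i = 0; i < n; i++) {
            if (i > 0) sql.append(',');
            sql.append('?');
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        String sql = buildInClause("SELECT name FROM employees WHERE id", 3);
        System.out.println(sql);
        // With java.sql the statement would then be executed once:
        //   PreparedStatement ps = con.prepareStatement(sql);
        //   for (int i = 0; i < ids.length; i++) ps.setInt(i + 1, ids[i]);
        //   ResultSet rs = ps.executeQuery();   // one remote call for all keys
    }
}
```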
28
The overall decision about the scope of data is one of the most important decisions for
achieving optimal memory allocation by the application and improving
scalability with multiple concurrent users.
Stateless applications scale better, but usually an application needs to be stateful
for functional reasons. If there is a possibility to provide some application screens
in anonymous (stateless) mode, this opportunity should be used. To keep the
number of active sessions low, an appropriate timeout should be chosen by the
applications.
Caching is a trade-off between CPU utilization and memory usage, and the
ideal balance for this trade-off depends on how much memory is available. With
too little caching, the desired performance benefit will not be achieved; with too
much, performance may suffer because too much memory is being expended
on caching and therefore not enough is available for other purposes.
29
If the cache is too big, more memory is consumed by cached objects,
which reduces the available free heap space. If free heap space is low, full GCs
will happen more frequently, because cached objects are typically located in
the tenured space. The application server may even run out of memory although
there is no real memory leak.
If the cache is too small, lots of persistency accesses, regeneration of objects,
etc. will happen, with a corresponding heavy load on CPU and disk I/O.
A general guideline is that data structures which cost a lot of memory and CPU
in runtime processing should be used rarely, only when no alternatives are
appropriate.
30
31
The software performance KPIs are defined along the hardware resources:
CPU, memory, disk and network.
The Java KPIs do not differ majorly, apart from the breakdown of the
memory KPI, which is extended – for the logical purpose of the memory
allocation from the application standpoint and in the context of the special JVM
memory management – into "processing memory", "session memory" and
"framework memory".
Different layers and applications can be integrated to work together on the
same Java server. Only very early measurements, evaluations and addressing of
optimization requirements to badly performing and non-scalable components can
give a chance for success.
In modern Infrastructure as a Service (IaaS), some KPIs, for example
CPU and throughput, are often taken into consideration for "billable quotas", and if
the total allowed CPU or throughput quota is exceeded, the application might no
longer be accessible to end users. This emphasizes the need to minimize and
optimize resource consumption.
32
For practical reasons the memory KPIs are split into categories: framework
memory, session memory and processing memory. The framework memory
consists of objects initialized at start-up and warm-up of the Java server, which
live as long as the Java server is running. The session memory contains only
objects which live while the user is active on the system, until the session is
destroyed or times out. This size is usually almost identical to the size of the
serialized session object. The processing memory is only valid in the scope of
one request and is expected to be garbage collected already in the Eden space.
If we have to face a choice, it is better to increase processing memory
consumption rather than session memory consumption, because session
memory stays allocated even if the user does not send requests for a while,
until the user logs out or the session times out. Processing memory is only
allocated while the user is actively sending requests. The intervals of minor
garbage collections, which are supposed to free the processing memory, may
vary in certain ranges without too significant an impact. In any case, it is best to
optimize both session memory and processing memory to the lowest possible
value.
34
The Java Distributed Statistical Records (JDSR) are available in SAP NetWeaver
Administrator in the section for troubleshooting and analysis. All KPIs related
to resource consumption on the Java server itself, as well as the number of calls and
bytes exchanged with other systems, are provided out of the box (no additional
configuration is required).
Only the processing memory per dialog step is measured with JDSR. The session
memory and the framework memory are calculated by analyzing heap
dumps with the SAP Memory Analyzer tool.
For Java applications load testing is simply mandatory. No static check or code
review can guarantee that a concurrency issue has not found its way into the
deliverables. Even a load test cannot give a 100% guarantee, as not all possible
execution paths can be tested, but at least one should execute load tests for
the "bread and butter" scenarios!
The load simulation tool alone is not sufficient. The metrics collected in the
load generation tool are "black box" data which gives no insights in
case of problems. To save time and get more confidence from the test results,
an automation environment might be used to operate the load generation tool.
Together with the operation of the testing flow, specific log files and other
sources of information can be collected from the system under test.
For load testing in general it is very important to build reliable, realistically
parameterized and randomized scripts.
36
37
For every typical performance complaint SAP offers a tool for deep analysis
and breakdown of the problem.
The Java Distributed Statistical Records provide a breakdown of the response time to
identify whether the slow-down is on the Java server side or related to a remote component
or service. If the problem is on the Java server side itself, the SAP JVM Profiler
can be applied.
38
The Eclipse Memory Analyzer, developed by SAP, is the tool to analyze any
heap-related issues, like a too big session space, a too big framework space,
memory leaks and so on.
39
Wily Introscope is extendable in the development environment too. Probes can be
added on demand to narrow down the problem. It is a good idea to instrument the
APIs which are used for communication between different application layers. Do
not instrument low-granularity methods: it does not make sense to instrument
methods with an execution time of less than 50 ms – the overhead will be higher
than the benefit.
40
Reference documentation on the mentioned tools is available on the web.
41
Most of our precious time is still spent in arguing and in doubts about our own
capability to resolve performance bugs.
Unlike functional correctness fixes, performance fixes are typically – and
wrongly – perceived by management as risks and many times avoided! In
many such situations the pressure to provide the fix comes later, at an unplanned
time, escalated by some customer…
42
Here are some coding examples.
The reasons why it was coded this way are unknown: there are more memory-
efficient ways to implement this!
Especially generating log messages without checking the log level is
waste which can easily be avoided by doing boolean checks like beTrace(),
beError(), beDebug() etc. before the actual message concatenation.
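The same guard pattern can be sketched with `java.util.logging` (the SAP logging API offers the analogous beDebug()/beError() checks mentioned above; the counter here is only for illustration):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of guarding expensive message construction behind a log level check.
public class GuardedLogging {

    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    static int expensiveCalls = 0; // counts how often the message is really built

    static String expensiveDump(Object state) {
        expensiveCalls++;
        return "state=" + state; // stands in for costly concatenation/serialization
    }

    public static void trace(Object state) {
        // Without the guard, expensiveDump() would run on every call even
        // though FINE messages are discarded at the default INFO level.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDump(state));
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) trace(i);
        System.out.println("expensive calls: " + expensiveCalls); // 0 at the default level
    }
}
```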
43
Object by object, even with a small delta of a few bytes, the Java heap gets full.
There are millions of objects generated on every end user request! We should
not allow waste and should have a good reason for every object we create.
44
A very prominent bug is the concatenation of strings in frequently executed
methods.
In this example, before every access to the cached value, the programmer
generates the access key to the cache.
Caches are intended to be accessed frequently – the more accesses, the more
efficient the cache is – so here it is not a problem that the cache is accessed
3336 times with the get() method. The problem is that the same key is generated
3336 times, wasting in total about 35 MB of processing memory!
The situation would be similar if a frequently used toString() method of an object
internally created string objects on every invocation.
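The original code is not part of this transcript, so the anti-pattern can be sketched as follows (names and the key counter are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the anti-pattern: the cache itself is fine, but the key string is
// re-concatenated on every get(). Building the key once removes millions of
// short-lived String/StringBuilder objects.
public class CacheKeyReuse {

    private static final Map<String, String> cache = new HashMap<>();
    static int keyBuilds = 0; // counts key constructions, for illustration only

    static String buildKey(String tenant, String user) {
        keyBuilds++;
        return tenant + "/" + user; // allocates a new String every time
    }

    // Anti-pattern: the key is rebuilt on every single access.
    public static String getWasteful(String tenant, String user) {
        return cache.get(buildKey(tenant, user));
    }

    // Better: the caller builds the key once and reuses it for all accesses.
    public static int accessManyTimes(String key, int times) {
        int hits = 0;
        for (int i = 0; i < times; i++) {
            if (cache.get(key) != null) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        cache.put("acme/alice", "profile");
        String key = buildKey("acme", "alice"); // built exactly once
        System.out.println(accessManyTimes(key, 3336) + " hits, " + keyBuilds + " key build(s)");
    }
}
```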
45
To minimize the generation of objects, "ThreadLocal"-based pools can be created
for the most frequently used objects. In this way a lower memory footprint can be
achieved with no synchronization and no central pool size tuning.
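A minimal sketch of such a ThreadLocal-based "pool of one per thread" (the buffer size and method names are illustrative):

```java
// Sketch of a ThreadLocal-based pool: each thread reuses its own StringBuilder
// instead of allocating a new one per request, with no synchronization and no
// central pool to tune.
public class ThreadLocalBuffer {

    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    public static String render(String user, int items) {
        StringBuilder sb = BUFFER.get(); // same instance on every call in this thread
        sb.setLength(0);                 // reset instead of reallocating
        sb.append("user=").append(user).append(" items=").append(items);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("alice", 3));
        System.out.println(render("bob", 5)); // reuses the same per-thread buffer
    }
}
```

One caveat: in pooled-thread environments the value outlives the request, so the stored object must be reset (or removed) before each reuse, as `setLength(0)` does here.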
Behind the PreparedStatement class stands the prepared statement cache,
which reduces the runtime and memory consumption for the preparation of SQL
statements. This cache does not work if just Statement is used in the code.
46
To take advantage of the cache, the statement should be implemented
appropriately, avoiding hardcoded parameters in the SQL statement string
and using the "setter methods" instead.
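The difference can be sketched as follows (table and column names are invented; the key point is that the statement cache is keyed by the SQL text, so only the parameterized form is reused):

```java
// Sketch of why hardcoded parameters defeat the prepared-statement cache.
public class StatementShape {

    // Anti-pattern: a new SQL string (and a new cache entry) per value.
    public static String hardcoded(int customerId) {
        return "SELECT * FROM orders WHERE customer_id = " + customerId;
    }

    // Cache-friendly: the SQL text is constant; values are bound via setters.
    public static final String PARAMETERIZED =
            "SELECT * FROM orders WHERE customer_id = ?";

    public static void main(String[] args) {
        System.out.println(hardcoded(7).equals(hardcoded(8))); // false: new text per value
        System.out.println(PARAMETERIZED);
        // With java.sql the parameterized form is executed as:
        //   PreparedStatement ps = con.prepareStatement(PARAMETERIZED);
        //   ps.setInt(1, customerId);   // value bound, SQL text unchanged
        //   ResultSet rs = ps.executeQuery();
    }
}
```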
Network/disk access should be implemented with "buffered" reading and an
appropriate buffer size.
Compression should be used when appropriate.
48
This is an example of how correct reading of a file should be implemented,
reusing the buffer and closing the stream appropriately in a finally block.
When deciding on an appropriate buffer size, the following example can help:
with the big SDA/SCA files (several hundred MB in size) which are used to
pack SAP software for deployment to the Java server, it would be totally ineffective
to read with a buffer which is only a few KB big. An appropriate buffer size in this
case could be some tens of MB. For an HTTP request/response read channel, on
the other hand, even a 1 MB buffer would be highly oversized, because the HTTP
communication between browser and server is typically only 20-30 KB per request.
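Since the slide's code is not in the transcript, here is a minimal sketch of the pattern (the 64 KB buffer is illustrative and, as discussed above, should be chosen relative to the expected file size):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of buffered reading with an explicit buffer size, closing the stream
// in a finally block (try-with-resources gives the same guarantee).
public class BufferedRead {

    public static long countBytes(Path file, int bufferSize) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()), bufferSize);
        try {
            byte[] chunk = new byte[bufferSize]; // reused for every read, not reallocated
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return total;
        } finally {
            in.close(); // always release the OS file handle
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[200_000]);
        System.out.println(countBytes(tmp, 64 * 1024)); // 200000
        Files.delete(tmp);
    }
}
```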
49
There is no need to implement your own algorithms as alternatives to algorithms
which are provided by the Java Virtual Machine. If something is missing there
or needs to be optimized, address it to the SAP JVM team, so that everyone
can later benefit from the optimization.
50
51
Teams are often reluctant to measure performance in the test environment, with
the argument that the machines used for development and functional testing are
not compliant with the reference performance landscapes. This is true. On the
other hand, early measurements reduce the risk of finding performance issues
and degradations very late, after the code is submitted and consumed by other
developers.
Development has to ensure there is no degradation in CPU, memory, number of
roundtrips and number of bytes sent/received from one change list to another. If an
optimization is requested, it can be verified on a relative comparison basis: for
example, the CPU time was reduced by 15% due to the submitted optimized code.
KPIs such as the "end-to-end response time", the "remote call duration to the ABAP
server", etc., which may experience network latency delays, should be
evaluated carefully to prevent "false" alarms and wasted time.
52
JLin already has a history of more than 10 years at SAP. It has proven its value for
the identification of basic performance anti-patterns in Java code.
It cannot troubleshoot all potential performance problems of Java applications,
but it ensures that we do not waste time in the testing phases discovering and
repairing "amateur" mistakes.
53
Many different variants of automation are possible. The number of variants is
driven mainly by the goal to reuse as much as possible the already existing
functional tests for performance measurements and thus ensure no additional
effort in test maintenance, but only the added value of performance checks.
When the functional tests are implemented with JUnit, the performance
measurements can be done by integrating the SAP JVM API or the JDSR API.
54
When the functional test automation is based on the Selenium framework, it can
be extended to collect performance KPIs from the SAP JVM API and/or the JDSR APIs.
55
LoadRunner-based automation makes sense if we have functions with no user UI,
such as web services or proprietary remote interfaces, which are intended
for reuse in both single-call tests and load tests.
56
Java gives us the opportunity to implement scalable and fast applications: it
depends entirely on the developers and architects to use this opportunity!
57
59