Achieving Naver-Level Application Visibility with Pinpoint

김성욱, 정현길

Index

• Microservice Architecture

• Observability in Naver

• Enhance Observability with Pinpoint

• Open source Pinpoint

• Open source Community

• Troubleshooting Distributed Systems

• How Pinpoint Works

• Troubleshooting with Pinpoint

• Future Plans

Microservice Architecture

● Development

● Maintenance

● Reliability

● Scalability

● Cost

● Deployment

● Releasing

Microservice-based architectures facilitate continuous delivery and deployment.

https://dzone.com/articles/right-strategies-for-microservices-deployment

Cloud Platform

Messenger

Search Engine

Cryptocurrency Market

R & D Center

Camera Mobile App

Webtoon Platform

AI Speaker

Business Platform

Video Streaming Service

● Deployment Management

● Debugging

Observability in Naver

Observability ↑

• Tracing

• Logging

• Metric

• And people

Enhance Observability

• Bird's-Eye View

• Finding Slow Transactions

• Distributed Tracing

• DevOps

• Scalable

• Minimum Overload

Bird's-Eye View

• Server Map

Finding Slow Transaction

• Scatter Chart

- Y axis: Response Time of each Transaction

- X axis: Date & Time

- Slow Transactions

• Malfunctioning in Real Server

- Reboot

• Response Summary chart

• Load chart

Distributed Tracing

• Call Stack

- List of Selected Transactions

- Code-level Call Stack

DevOps

• Realtime Active Thread Chart

- Can be used as a health check

• Inspector

- Basic Info of the Instance

- Heap, Non-Heap Memory

- JVM/SYSTEM CPU

- JVM GC

- TPS, Active Thread

- Response Time

- File Descriptor

- Direct/Mapped Buffer

- Data Source

Scalable

• Fully functioning in Naver

[Diagram: Applications (each with an agent) -> collectors -> HBase (x 4)]

- 1,800+ Applications

- 12,000+ Pinpoint Agents

- 17 Collectors

- 63 region servers

- 70 billion Span Chunks (trace segments) per day

- Peak TPS : 870K

- Write TPS : 1M

Minimum Overload

• Execute Integration Tests Periodically

- Less than 3% difference in performance

- Sampling ‘No-Agent’, ‘5%’, ‘100%’
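The '5%' and '100%' settings refer to the fraction of transactions that get traced. As a rough illustration only, a 1-in-N counting sampler could look like the sketch below. `CountingSampler` and its methods are hypothetical names for this example and are not taken from Pinpoint's code base; a rate of 20 corresponds to 5%, a rate of 1 to 100%.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical 1-in-N sampler: traces every N-th transaction. */
class CountingSampler {
    private final int rate;
    private final AtomicLong counter = new AtomicLong();

    CountingSampler(int rate) {
        if (rate < 1) {
            throw new IllegalArgumentException("rate must be >= 1");
        }
        this.rate = rate;
    }

    /** Returns true for the 1st, (N+1)-th, (2N+1)-th ... call. */
    boolean isSampled() {
        return counter.getAndIncrement() % rate == 0;
    }

    /** Helper for the demo: counts sampled transactions out of `total`. */
    static int countSampled(int rate, int total) {
        CountingSampler sampler = new CountingSampler(rate);
        int sampled = 0;
        for (int i = 0; i < total; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println(countSampled(20, 1000)); // 5% sampling
        System.out.println(countSampled(1, 1000));  // 100% sampling
    }
}
```

A counter-based sampler keeps the per-request cost to a single atomic increment, which fits the goal of keeping the agent's overhead small.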

Open Source Pinpoint

2012.07 Development start

2014.01 First use of Pinpoint

2014.12 90% of teams switched from commercial APM

2015.01 Released as open source

2017.12 Reached 5,000 stars

• Many of the top 10 IT companies in China

• Global enterprises in Korea

• Various IT companies in the USA

• Companies in the financial industry

Open Source Community

GOOD HOSTING!!!!

Microservice Challenges

• More areas that can fail

- 99% success rate over 10 components -> approx. 90.4% success rate

- Network errors become much more relevant

• Increase in latency

- 1s 99th percentile over 10 components -> approx. 80th percentile

• Observability greatly reduced

- Individual servers do not tell the whole story

- Traditional way of troubleshooting no longer works
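The success-rate figure is plain compounding: if a request has to pass through 10 components and each succeeds 99% of the time, the end-to-end rate is 0.99 multiplied by itself 10 times. A minimal sketch of that arithmetic, assuming the components fail independently (the class and method names are made up for this example):

```java
import java.util.Locale;

class CompoundSuccessRate {
    /** End-to-end success probability across `components` independent hops. */
    static double endToEndSuccess(double perComponentSuccess, int components) {
        return Math.pow(perComponentSuccess, components);
    }

    public static void main(String[] args) {
        double rate = endToEndSuccess(0.99, 10);
        // Prints "90.4%"
        System.out.println(String.format(Locale.ROOT, "%.1f%%", rate * 100));
    }
}
```

The independence assumption is a simplification; correlated failures (for example a shared network segment) would change the number.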

Troubleshooting Microservices

[Diagram: USER -> ApiGateway -> Shopping-API; label: Auth. etc]

Your API is slow

[Diagram: USER -> ApiGateway -> Shopping-API -> Order, Product, and further unknown services (???)]

Distributed Transactions

- GET /shopping/products/12345

- POST /shopping/orders

@honest_update (Twitter)

www.slideshare.net/alvarosanchezmariscal/stateless-authentication-for-microservices

So what do we need?

• Context

- Identify as part of the same request within a single service instance

- Identify as part of the same transaction within distributed service instances

- Context propagation across thread or process boundaries

• Order

- Timestamp not enough

- Distributed Nodes - time skew

- Asynchronous Processing - delayed execution

• Structure

- Order - 1 dimensional

- Call tree requires depth as well

Pinpoint

• Call Stack Trace

- Traces everything that happens in a single instance

- Context propagated via thread-local

- Order and structure inherently provided by emulating the call stack

• Distributed Transaction Trace

- Stitches multiple call stacks under the same transaction

- Context propagated via RPC

- Order and structure via parent-child relationship of distributed call stacks

Server Map

Distributed Call Tree

[Diagram: FRONT-WEB HttpClient.execute() -> BACKEND-WEB Tomcat.receive()]

How it all works

Call Stack Trace

Distributed Transaction Trace

Call Stack Trace

invoke(); -> doGet(); -> demo2();

invoke() {
    doGet();
}

doGet() {
    demo2();
}

demo2() {
}

Seen as one nested call stack:

invoke() {
    doGet() {
        demo2() {
        }
    }
}

Sequence  Depth  Event
0         0      invoke()
1         1      doGet()
2         2      demo2()

Same sequence, but with demo2() at depth 1:

Sequence  Depth  Event
0         0      invoke()
1         1      doGet()
2         1      demo2()

foo() {
    fooInterceptor.before();
    // original method body
    fooInterceptor.after();
}

invoke() {
    invokeInterceptor.before();
    doGet() {
        doGetInterceptor.before();
        demo2() {
            demo2Interceptor.before();
            demo2Interceptor.after();
        }
        doGetInterceptor.after();
    }
    invokeInterceptor.after();
}

before()

- Increment sequence

- Push on to the call stack

after()

- Pop off the call stack

- Buffer to write queue

Step  Call                        Action  Call stack (Seq. Event Depth)              Write Queue
1     invokeInterceptor.before()  push    0 invoke() 0                               (empty)
2     doGetInterceptor.before()   push    0 invoke() 0, 1 doGet() 1                  (empty)
3     demo2Interceptor.before()   push    0 invoke() 0, 1 doGet() 1, 2 demo2() 2     (empty)
4     demo2Interceptor.after()    pop     0 invoke() 0, 1 doGet() 1                  demo2()
5     doGetInterceptor.after()    pop     0 invoke() 0                               demo2(), doGet()
6     invokeInterceptor.after()   pop     (empty)                                    demo2(), doGet(), invoke()
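The before()/after() steps can be sketched in a few lines of Java. This is an illustrative model only, not Pinpoint's agent code: `CallStackTracer`, `Event`, and `CURRENT` are hypothetical names, and the depth is simply taken as the stack size at the moment of the push. The thread-local holder mirrors the point that context within one instance is propagated via thread-local.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CallStackTracer {
    /** One traced call: order of entry (sequence) and nesting level (depth). */
    static final class Event {
        final int sequence;
        final int depth;
        final String method;

        Event(int sequence, int depth, String method) {
            this.sequence = sequence;
            this.depth = depth;
            this.method = method;
        }
    }

    // Hypothetical per-thread holder, so interceptors on the same thread share one tracer.
    static final ThreadLocal<CallStackTracer> CURRENT =
            ThreadLocal.withInitial(CallStackTracer::new);

    private final Deque<Event> callStack = new ArrayDeque<>();
    private final List<Event> writeQueue = new ArrayList<>();
    private int nextSequence = 0;

    /** before(): take the next sequence number and push onto the call stack. */
    void before(String method) {
        int depth = callStack.size();
        callStack.push(new Event(nextSequence++, depth, method));
    }

    /** after(): pop the finished call and buffer it to the write queue. */
    void after() {
        writeQueue.add(callStack.pop());
    }

    List<Event> writeQueue() {
        return writeQueue;
    }

    public static void main(String[] args) {
        CallStackTracer tracer = CURRENT.get();
        tracer.before("invoke()");
        tracer.before("doGet()");
        tracer.before("demo2()");
        tracer.after(); // demo2() finishes first
        tracer.after();
        tracer.after();
        for (Event e : tracer.writeQueue()) {
            System.out.println(e.sequence + " " + e.method + " " + e.depth);
        }
    }
}
```

Events reach the write queue in completion order (demo2(), doGet(), invoke()), but each one carries its sequence and depth, so the call tree can be rebuilt later.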

Distributed Transaction Trace

Find out the relationship between nodes connected by RPCs for a given transaction

Trace tag added to each request

- HTTP : HttpHeader

[Diagram: Node 1 -> Node 2 (RPC 1), Node 2 -> Node 3 (RPC 2), Node 2 -> Node 4 (RPC 3)]

TraceId

- TransactionId (TxId): Globally unique ID for a single transaction

- SpanId, Parent SpanId: IDs used to encode the parent-child relationship between nodes

Node    TxId          SpanId  Parent SpanId
Node 1  Node1^Time^1  1       -1
Node 2  Node1^Time^1  2       1
Node 3  Node1^Time^1  3       2
Node 4  Node1^Time^1  4       2
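To make the ID scheme concrete, here is a minimal sketch of how a trace tag might travel in HTTP headers from one node to the next. It is an assumption-laden illustration, not Pinpoint's implementation: the `X-Demo-*` header names, the `TraceIdDemo` class, and passing the SpanId in explicitly (to reproduce the 1 to 4 values of the example) are all choices made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class TraceIdDemo {
    // Hypothetical header names, used only in this sketch.
    static final String TX_HEADER = "X-Demo-TxId";
    static final String SPAN_HEADER = "X-Demo-SpanId";
    static final String PARENT_HEADER = "X-Demo-ParentSpanId";

    static final class TraceId {
        final String txId;
        final long spanId;
        final long parentSpanId;

        TraceId(String txId, long spanId, long parentSpanId) {
            this.txId = txId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }

        /** First node of a transaction: there is no parent, so Parent SpanId = -1. */
        static TraceId newRoot(String node, long time, long sequence, long spanId) {
            return new TraceId(node + "^" + time + "^" + sequence, spanId, -1L);
        }

        /** Tag for an outgoing RPC: same TxId, new SpanId, parent = the caller's SpanId. */
        TraceId nextChild(long childSpanId) {
            return new TraceId(txId, childSpanId, spanId);
        }

        /** Caller side: write the tag into the request headers. */
        Map<String, String> toHeaders() {
            Map<String, String> headers = new HashMap<>();
            headers.put(TX_HEADER, txId);
            headers.put(SPAN_HEADER, Long.toString(spanId));
            headers.put(PARENT_HEADER, Long.toString(parentSpanId));
            return headers;
        }

        /** Callee side: read the tag back from the request headers. */
        static TraceId fromHeaders(Map<String, String> headers) {
            return new TraceId(
                    headers.get(TX_HEADER),
                    Long.parseLong(headers.get(SPAN_HEADER)),
                    Long.parseLong(headers.get(PARENT_HEADER)));
        }

        @Override
        public String toString() {
            return "TxId=" + txId + " SpanId=" + spanId + " ParentSpanId=" + parentSpanId;
        }
    }

    public static void main(String[] args) {
        TraceId node1 = TraceId.newRoot("Node1", 1000L, 1, 1);
        TraceId node2 = TraceId.fromHeaders(node1.nextChild(2).toHeaders()); // RPC 1
        TraceId node3 = TraceId.fromHeaders(node2.nextChild(3).toHeaders()); // RPC 2
        TraceId node4 = TraceId.fromHeaders(node2.nextChild(4).toHeaders()); // RPC 3
        System.out.println(node1);
        System.out.println(node2);
        System.out.println(node3);
        System.out.println(node4);
    }
}
```

Because every node reports the same TxId together with its own SpanId and Parent SpanId, a collector can group spans by TxId and link each span to its parent to rebuild the distributed call tree.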

Troubleshooting with Pinpoint

USER ApiGateway Shopping-API ???

Let’s troubleshoot our system using Pinpoint

(There’s a link @ www.github.com/naver/pinpoint)

Future Plans

Thank you

Homepage : https://naver.github.io/pinpoint

Github : https://github.com/naver/pinpoint

Twitter : https://twitter.com/Pinpoint_APM

E-mail : [email protected], [email protected]

Q & A