HKG15-201: Upstream Kernel Validation

Transcript of HKG15-201: Upstream Kernel Validation

Upstream Kernel Validation: The kernelci.org project

Presented by: Tyler Baker, Kevin Hilman, Milo Casagrande

Date: 10-Feb-15

Event: Linaro Connect HKG15

Why is validation needed?

● As of 3.18, more than 500 ARM platforms are supported

● With our current resources we are only able to cover 10% of these platforms

● This is not good enough

Dashboard - kernelci.org

The landing page describes the current state of the upstream trees, with a focus on failures:

● What builds are failing?
● What boots are failing?
● What are the trending searches?

Dashboard - kernelci.org/job/

What is a job?

● A linked collection of build/boot/test data for a given tree/branch
● At a glance it displays:
  ○ Tree/Branch
  ○ Build Status (Passed/Failed Defconfigs)
  ○ Boot Status (Passed/Failed Boots/Tests)
● Hierarchy: Job -> Build -> Boots/Tests (a rough sketch of walking this hierarchy follows below)
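As a rough illustration of that hierarchy, the sketch below pulls the boot results linked to a single job/kernel pair through the dashboard API described later in this presentation. The tree, kernel, and 'Authorization' token values are placeholders, and the shape of the JSON response (a 'result' list) is an assumption, not something shown on the slides.

import requests

# Placeholder token; a real one is issued by the backend.
headers = {'Authorization': 'token'}

# Fetch the boot reports attached to one job (tree/branch) and kernel build,
# mirroring the query parameters used in the curl examples later on.
params = {'job': 'next', 'kernel': 'next-20150105'}
response = requests.get('http://api.kernelci.org/boot',
                        params=params, headers=headers)

# Assumed response shape: a JSON document whose 'result' list holds one
# entry per boot attempt for that job/kernel combination.
for boot in response.json().get('result', []):
    print(boot)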

Dashboard - kernelci.org/build/

Displays build status:

● Tree/Branch
● Kernel
● Board
● Defconfig
● Arch
● Queries supported
  ○ http://kernelci.org/build/?arm64

Dashboard - kernelci.org/boot/

Displays boot status:

● Tree/Branch
● Kernel
● Defconfig
● Arch
● Queries supported
  ○ http://kernelci.org/boot/?sunxi

Dashboard - Bisection

Builds and Boots

● Displays bisection table
● Bad Commit SHA1
● Good Commit SHA1
● Link to bisection script (a generic sketch follows below)
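The slides only link to the bisection script, so the following is purely a generic illustration of driving git bisect between the good and bad SHA1s shown in the table; it is not the kernelci bisection script, and the build-and-boot command is a placeholder.

import subprocess

# SHA1s taken from the dashboard bisection table (placeholders here).
bad_commit = 'BAD_SHA1'
good_commit = 'GOOD_SHA1'

# Start bisecting between the known-bad and known-good commits.
subprocess.check_call(['git', 'bisect', 'start', bad_commit, good_commit])

# 'git bisect run' re-runs the given command at every step; a real script
# would build the kernel and boot it on a board, exiting 0 on success.
subprocess.check_call(['git', 'bisect', 'run', './build_and_boot.sh'])

# Return the tree to its original state once the offending commit is found.
subprocess.check_call(['git', 'bisect', 'reset'])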

Email Reporting

Build Reports
● Sent to the mailing list when the build(s) are finished

Boot Reports
● Sent to the mailing list one hour after the builds complete

Upstream Kernel CI

[Diagram: patch series are merged into the monitored kernel trees, which triggers the CI process; kernel builds, boot tests, and in-kernel tests feed the dashboard, and reports go out to the mailing list.]

● 22 tree/branch combinations monitored
● 126 kernel builds (121 ARM, 4 ARM64, 1 x86), taking ~30 minutes
● 5 distributed labs, 86 device types, 208 boots, taking 15-45 minutes per lab

Who, how, and with what urgency are regressions addressed?

[Diagram: the development workflow feeding boot tests, in-kernel tests, and depth tests, with results reported on the dashboard.]

Dashboard Architecture - Frontend

● Data driven
● Uses the backend API
● Acts as a proxy to the backend
● Stateless web app (for now)

[Diagram: Web Server (nginx) -> Web App (Flask + uwsgi) -> Backend Cache (Redis)]
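As a minimal sketch only, and not the actual kernelci frontend code, a stateless Flask view that proxies a request to the backend API and caches the response in Redis might look like this; the route, cache key, and timeout are assumptions.

import redis
import requests
from flask import Flask, Response

app = Flask(__name__)
cache = redis.StrictRedis(host='localhost', port=6379)

BACKEND = 'http://api.kernelci.org'


@app.route('/boot/')
def boot_proxy():
    # Serve from the Redis cache when possible, otherwise hit the backend API.
    cached = cache.get('boot-page')
    if cached is not None:
        return Response(cached, mimetype='application/json')
    upstream = requests.get(BACKEND + '/boot')
    cache.setex('boot-page', 300, upstream.content)  # cache for five minutes
    return Response(upstream.content, mimetype='application/json')


if __name__ == '__main__':
    app.run()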

Dashboard Architecture - Backend (http://api.kernelci.org)

● Provides a REST API
● Token-based authentication/authorization

[Diagram: Web Server (nginx) -> Web App (Tornado Framework) -> Database (mongodb), with a Task Queue (Celery) and Task Broker (Redis)]
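Again as a rough sketch rather than the real backend code, a Tornado handler that checks the Authorization header used in the curl examples below and answers in JSON could look like the following; the '/example' resource, the token list, and the response body are all hypothetical.

import json

import tornado.ioloop
import tornado.web

VALID_TOKENS = {'token'}  # placeholder; real tokens would live in the database


class ExampleHandler(tornado.web.RequestHandler):
    def get(self):
        # Reject requests that do not carry a known Authorization token.
        token = self.request.headers.get('Authorization')
        if token not in VALID_TOKENS:
            raise tornado.web.HTTPError(403)
        self.set_header('Content-Type', 'application/json')
        self.write(json.dumps({'result': []}))  # illustrative empty result


application = tornado.web.Application([
    (r'/example', ExampleHandler),
])

if __name__ == '__main__':
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()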

Dashboard API

● REST based API
● Speaks and reads only JSON
● Implements GET, POST, DELETE

Examples (curl):

curl -X GET http://api.kernelci.org/version

curl -X GET -H 'Authorization: 12' 'http://api.kernelci.org/boot?job=next&kernel=next-20150105&status=FAIL'

curl -X POST -H 'Authorization: 12' http://api.kernelci.org/job -d "{ … }"

Dashboard API

Examples (Python):

import requests

headers = {
    'Authorization': 'token'
}

params = {
    'job': 'next',
    'kernel': 'next-20150105',
    'status': 'FAIL'
}

response = requests.get('http://api.kernelci.org/boot', params=params, headers=headers)

print(response.content)

More Python examples at: http://api.kernelci.org/examples.html
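The body of the POST request in the curl example above is not shown on the slides; as a rough sketch only, a token-authenticated POST from Python might look like the following, where the payload field names are hypothetical and not taken from the slides.

import json

import requests

headers = {
    'Authorization': 'token',
    'Content-Type': 'application/json'
}

# Hypothetical payload: these field names are illustrative only.
payload = {
    'job': 'next',
    'kernel': 'next-20150105'
}

response = requests.post('http://api.kernelci.org/job',
                         data=json.dumps(payload), headers=headers)
print(response.status_code)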

Summary of where we are today

○ Still in ~Alpha/Beta~ status
  ■ The system is still subject to large changes and/or pivots

○ Build & Boot testing using dev boards and VMs
  ■ Leveraging 5 independent labs
  ■ There are still conflicts with Linaro CI needs that limit our use of LAVA lab hardware

○ Testing Statistics
  ■ ___ defconfigs
  ■ ___ unique devices
  ■ ___ xyz

Lessons Learned

● Reporting
  ○ Reporting only failures will help keep the noise to a minimum
  ○ Especially important when there are 200+ boots for each tree

● Automation is hard
  ○ Workarounds and hardware hacking

Future Plans

● Dashboard
  ○ Test Capabilities
    ■ In Kernel Testing
      ● Locking validation
      ● OF Unit Tests
      ● kselftest
    ■ Community Test Suites
      ● LKP
      ● LTP
      ● Trinity

● Automated Bisection

Future Plans Continued

● Log Searching
  ○ Perform structured queries to help track regressions
    ■ e.g. show all occurrences of 'DEBUG_LOCKS_WARN_ON'
    ■ Filter based on job, kernel, arch, defconfig, etc.

● Virtualization

FAQ

● How are lab-specific failures dealt with?
  ○ This is a key reason the project is still in beta: this scenario happens often, and we want to ensure the reporting is clean and as noise-free as possible

● How can I get my tree tested?

We need your help!

● Monitoring, analysis, and reporting of failures

● Getting started
  ○ https://wiki.linaro.org/ProductTechnology/kernelci.org

Open Discussion

● How do we track issues?
● Infrastructure capacity concerns