
Huawei HiAI: Unlock the Future

Eric Zhou
Kirin Chipset Solution Planning Director
zhouchen@huawei.com

AI Is Coming Around

● Get AI for your apps
● "The 25 Best Apps of the Year So Far" (June 2017): 9 of the 25 use AI

Why On-device AI?

● Real-time
● Privacy

AI knowledge arrives in instant bursts, so processing that knowledge in real time is the key point.

Mobile AI sits at the intersection of knowledge and processing.

A Real App Needs More Resources

● CPU: task scheduling, load balancing, memory allocation
● GPU: UI rendering, graphics processing
● ISP/DSP: camera 3A, image processing
● NPU: AI computing

The workload for an AI app spans the CPU little cores, CPU big cores, GPU, and NPU.

Two Key Questions for AI Apps

● What performance do we need?
● Who will make the apps?

Real-time Processing is the Requirement

5 fps → 15 fps → >25 fps
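Those frame rates translate directly into per-frame latency budgets; a quick sketch of the arithmetic:

```python
# Frame-time budget implied by the frame rates above: at N fps, the whole
# per-frame AI pipeline must finish within 1000/N milliseconds.
for fps in (5, 15, 25):
    budget_ms = 1000 / fps
    print(f"{fps:>2} fps -> {budget_ms:.1f} ms per frame")
```

Above roughly 25 fps the budget falls to under 40 ms per frame, which is the core argument for dedicated on-device acceleration rather than CPU-only inference.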

To Match the Real-time AI Processing Requirement

● The NPU supports a dedicated set of AI instructions for neural network model operations, allowing many neural network operators to execute efficiently in parallel within minimal clock cycles.

Convolution, Deconvolution, Pooling, ReLU, Normalize, BatchNorm, FullConnection, Sigmoid
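As an illustration (plain NumPy, not HiAI code), two of the listed operators show the kind of element-wise and windowed arithmetic these AI instructions parallelize:

```python
import numpy as np

def relu(x):
    # ReLU: element-wise max(0, x) - trivially parallel, one op per element
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 on an (H, W) feature map
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., -2., 3., 0.],
                 [-1., 5., -6., 2.],
                 [0., 1., 2., -3.],
                 [4., -1., 0., 1.]])
print(relu(fmap))          # negatives clamped to 0
print(max_pool_2x2(fmap))  # block-wise maxima: [[5, 3], [4, 2]]
```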

CPU = Scalar
● General-purpose computing
● Logic control

GPU = Vector
● Image processing
● Large-scale parallel computing

NPU = Tensor
● Knowledge model processing
● Dedicated AI instruction set
● Large-scale parallel computing

High Performance Density

5.5 billion transistors

       Die Size   NN Perf.
NPU    3%         100%
CPU    6%         5%
GPU    18%        60%
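Dividing the NN performance numbers by the die-area share makes the density claim concrete (simple arithmetic on the slide's own figures):

```python
# Performance-density arithmetic from the table above: relative NN
# performance (%) divided by die-area share (%), per compute block.
blocks = {"NPU": (3, 100), "CPU": (6, 5), "GPU": (18, 60)}  # (die %, perf %)
density = {name: perf / area for name, (area, perf) in blocks.items()}
for name in ("NPU", "GPU", "CPU"):
    print(f"{name}: {density[name]:.1f}x NN perf per % of die area")
# NPU ~33.3x, GPU ~3.3x, CPU ~0.8x: roughly an order of magnitude per step
```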

What Do Developers Care About?

● Development cost
  ○ Porting cost: operators, tools, documents, and FAE
● Benefits
  ○ Users, downloads, marketing
● Intellectual property protection
  ○ Model (algorithm) encryption

HiAI Mobile Computing Platform

● Open AI ecosystem
● Third-party apps empowered by the HiAI mobile computing platform

Stack (top to bottom): apps → frameworks (Caffe/Caffe2…, TensorFlow/TensorFlow Lite) → HiAI APIs → HiAI model and resource management / runtime → CPU, GPU, DSP, NPU

What Does HiAI Provide?

● Frequently used artificial intelligence function APIs that run efficiently on mobile devices.
● Support for popular frameworks.
● Processor-independent acceleration APIs that allow the Kirin hardware to accelerate model and operator computing.
● Flexible scheduling of heterogeneous resources so developers can accelerate neural network model and operator computing.

Tool chain, complete documents, abundant APIs, easy-to-use source code.

HiAI SDK V100
● Framework: Caffe, TensorFlow
● Operators: 42
● Tools: command line
● Documents and support: user manual, source code, FAE

HiAI SDK V150
● Framework: Caffe, TensorFlow, TensorFlow Lite, Huawei HiAI SDK, Android NN
● Operators: 99
● Tools: Android Studio plug-in, compile tool, sparsifying tool, automatic code generation, log tools
● Documents and support: FAQ / open class, new sample code and documents, more FAE support

What Kinds of Apps Can HiAI Unlock?

● Short videos and live streaming: facial recognition, gesture recognition, portrait segmentation, posture recognition, and video stylization
● Social platforms: photo classification, image recognition, and super-resolution imaging
● AR: depth estimation, light estimation, environmental understanding, and simultaneous localization and mapping (SLAM)
● Photo taking and retouching: facial beautification and image enhancement
● Shopping: shopping through image recognition
● Translation and word processing: instant camera translation, optical character recognition (OCR), word splitting, named entity recognition (NER), textual emotion recognition, and intelligent textual response

Demo: Portrait Segmentation (HiAI vs. CPU)

Demo: Shopping Through Image Recognition on Device

● The HiAI mobile computing platform supports sparse model acceleration. The NPU can skip multiply-add operations whose coefficient is zero, which greatly improves computing efficiency and reduces bandwidth while maintaining computing precision.
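The idea can be sketched in plain Python/NumPy (illustrative only; the real NPU skips these multiply-adds in hardware):

```python
import numpy as np

def sparse_dot(weights, x):
    """Dot product that skips multiply-adds where the weight coefficient is
    zero, mimicking in spirit the NPU's sparse-model acceleration."""
    nz = np.nonzero(weights)[0]              # indices of non-zero coefficients
    return float(np.dot(weights[nz], x[nz])), len(nz)

w = np.array([0., 2., 0., 0., -1., 0., 3., 0.])  # 62.5% of weights are zero
x = np.arange(8, dtype=float)
result, macs = sparse_dot(w, x)
print(result, macs)  # 16.0 computed with 3 multiply-adds instead of 8
```

The result is bit-identical to the dense dot product; only the zero-coefficient work is skipped, which is why precision is maintained.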

● The HiAI mobile computing platform supports 8-bit and 1-bit quantization, effectively reducing the computing bandwidth and storage consumption and improving energy efficiency.
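A minimal sketch of symmetric 8-bit quantization (one common scheme; the slide does not specify HiAI's exact method) shows the 4x storage reduction for float32 weights:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric 8-bit quantization: one scale factor maps floats to int8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original floats
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
print(q.nbytes, w.nbytes)  # 4 bytes vs 16 bytes: 4x less storage/bandwidth
err = np.max(np.abs(dequantize(q, s) - w))
print(err)                 # small reconstruction error
```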

HiAI Mobile Computing Platform Technologies


Energy Efficiency Ratio During CNN Model Execution

HiAI Mobile Computing Platform Execution

● As shown in the figure below, conversion tools turn a trained neural network model into an offline model that can be executed efficiently on the HiAI mobile computing platform, output as a binary file. The main purpose of converting a standard neural network model (such as Caffe) into an offline model is to optimize the network configuration. After conversion, the optimized target file, namely the offline model, is generated, serialized, and stored on disk. Inference then uses this optimized file, which is faster.

Offline Model Generation

HiAI Mobile Computing Platform Execution

● As shown in the figure below, during offline model computing the offline model is loaded from its file, and user input data such as images is copied to HiAI's NPU for computing. For each inference, user data must be transferred from the DDR SDRAM to the NPU and the result returned to the DDR SDRAM.

Offline Model Computing

(Figure: the offline model is loaded to the NPU while input data is copied from the data layer.)

How to Integrate Your Knowledge Model

1. Operator compatibility assessment
2. Model conversion
3. Model loading
4. Model execution
5. Model unloading
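The five-step flow can be sketched as follows; every class and method name here is a hypothetical stand-in for illustration, not the real HiAI API:

```python
# Illustrative stand-in for the five-step integration flow above; these
# names are hypothetical and do NOT match the actual HiAI SDK.
SUPPORTED_OPS = {"Convolution", "Pooling", "ReLU", "FullConnection"}

def assess(ops):
    """Step 1: operator compatibility assessment - report unsupported ops."""
    return [op for op in ops if op not in SUPPORTED_OPS]

class OfflineModel:
    def __init__(self, source_model):
        # Step 2: model conversion - trained model -> optimized offline binary
        self.binary = f"compiled({source_model})"
        self.loaded = False

    def load(self):
        # Step 3: model loading from the serialized offline file
        self.loaded = True

    def execute(self, data):
        # Step 4: model execution - copy input in, run inference, copy out
        assert self.loaded, "model must be loaded before execution"
        return f"inference({data})"

    def unload(self):
        # Step 5: model unloading - release NPU resources
        self.loaded = False

print(assess(["Convolution", "ReLU", "CustomOp"]))  # ['CustomOp']
model = OfflineModel("squeezenet.caffemodel")
model.load()
print(model.execute("frame0"))
model.unload()
```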

HiAI Mobile Computing Platform Advantages

● HiAI compiles offline models without occupying the runtime, which improves runtime efficiency.
● HiAI supports multiple frameworks such as Caffe and TensorFlow/TensorFlow Lite, and will support Caffe2 and ONNX in the future.
● HiAI supports multiple operators with rapid iteration, and will support operator customization in later versions.
● HiAI supports the Kirin 970 and later chip platforms across a rapidly growing number of devices, and is widely used.
● HiAI is also integrated into the HiKey970, the third generation of the HiKey series on 96Boards.

More and More

The New HiKey970 AI Development Board Is Coming

● Huawei HiAI SDK
● Up to 25x performance and 50x power efficiency
● Dedicated neural-network processing unit (NPU)
● Heterogeneous resource management

Stronger compute capability, more hardware interfaces, mainstream OS choices, popular AI stacks.

HiKey970 Spec

Software
● OS choices: Ubuntu, Debian, Android Master
● Stacks: Huawei HiAI, Android NN, OpenGL ES, OpenCL
● UEFI + ARM Trusted Firmware
● Kernel 4.9
● CAN driver, CSI driver, WiFi enabled, video codec enabled

Hardware
● CPU: 4 x A73 @ 2.36 GHz + 4 x A53 @ 1.8 GHz
● GPU: Mali G72 MP12
● NPU: 256 MAC/cycle
● Memory: 6 GB four-channel LPDDR4X @ 1866 MHz
● Camera: 4-lane CSI + 2-lane CSI
● CAN V2.0B up to 1 Mbps
● Video decode up to H.265 3840x2160@60fps

Shining Star Plan

● Talent cultivation: campus project, training, certification
● Development support: innovation lab, development environment, testing tools
● Innovation support: funding, cloud-client resources, contest awards
● Marketing: user growth, co-marketing, promotion

How to Get the HiAI SDK and HiKey970

● http://developer.huawei.com/consumer/en/home
● https://www.96boards.org/
● http://www.hihope.org
● Amazon channel for the worldwide market, with two Amazon stores: "Hoperun" and "HiHope"; Secure96 is on sale now
● Taobao channel for the China market: https://shop596522926.taobao.com/index.htm?spm=2013.1.w5002-17633002258.2.32b43ca7TxdF52
● http://www.lenovator.com/

Available in April 2018 at $299 (Huawei Developer Website)

Thank You

#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org