Hardware-Software Interface Specification for Verification ...

37
Hardware-Software Interface Specification for Verification in Accelerator-Rich Platforms Sharad Malik Princeton University LATTE ’21, April 15, 2021 This work was supported in part by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co- sponsored by SRC and DARPA; by the DARPA POSH and SSITH programs; and by NSF XPS Grant No. 1628926.

Transcript of Hardware-Software Interface Specification for Verification ...

Hardware-Software Interface Specification for Verification in Accelerator-Rich Platforms

Sharad Malik

Princeton University

LATTE ’21, April 15, 2021

This work was supported in part by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-

sponsored by SRC and DARPA; by the DARPA POSH and SSITH programs; and by NSF XPS Grant No. 1628926.

2

ILA Team + Collaborators

Bo-Yuan HuangPramod Subramanyan

Yakir Vizel Weikun Yang Grigory Fedyukovich

Aarti Gupta

Yue Xing Yu ZengHuaixi LuYi Li

Sharad Malik

Hongce Zhang

Caroline Trippel Yatin A. Manerkar

Margaret Martonosi

3

Hardware-Software Interface

ISA

Software

Hardware

Abstraction for software

Specification for hardware

Pentium® III ProcessorSingle-core uniprocessor

Source: pdcfaculty.org

Instruction-Set Architecture

4

Hardware-Software Interface

ISA + MCM

Software

Hardware

Abstraction for software

Specification for hardware

Intel SkylakeMulti-core uniprocessor

Source: wikichip.org

Instruction-Set Architecture +

Memory Consistency Model

5

Hardware-Software Interface

8-Core GPU

4 Firestorm

Cores + 12MB

L2

4 Icestorm

Core

+ 4MB L2

SLC Cache

16–Core

Neural

Engine

Apple M1 Die Photo

Source: AnandTech

https://www.anandtech.com/show/16226

/apple-silicon-m1-a14-deep-dive

???

Software

Hardware

Abstraction for software

Specification for hardware

Heterogenous System-on-Chip

6

Accelerator Interface

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

// Load instruction

// Store instruction

Triggering operations

Firmware C code

Accelerator

Address Register

0xff00 status

0xff02 enable

… …

Accessing registers

7

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

8

• Merits (similar to ISA)• Software-visible “architectural”

state variables

• Modular: set of instructions• Per-instruction state-update

• More than ISA• Formal

• Hierarchical

• Generalizes ISA to include accelerators

Instruction-Level Abstraction (ILA)

Inte

rface

Length

Key

Counter

State

Address

Inte

rco

nn

ect

AES Block Encryption

Interface Commands ≝ Instructions

Visible State

MMIO accesses Commands

Write, 0xff00, 0x1 START_ENCRYPT

Write, 0xff02, data WRITE_ADDRESS

… …

module Top_rtl (clk, rst, interrupt,if_axi_rd_r_msg, if_axi_rd_r_rdy,if_axi_wr_w_msg, if_axi_wr_w_rdy// other AXI interface ports

);

assign and_192 = and_6& (fsm_out[2]);assign or_117 = (fsm_out[2]) | (fsm_out[5]);

always @(posedge clk or negedge rst_bar) beginif (~rst_bar) beginaddrBound_4_1_equal <= 1’b0;

endelse if (run_w_wen & (~while_case_0)) beginaddrBound_4_1_equal <= addrBound_1_1_tmp_7;

endend

{WR 0x33000100 val (SetWeightAddr val),WR 0x33000120 val (SetTensorSize val),WR 0x33000130 val (SetLSTMConfig val),WR 0x33000200 0x1 (StartLSTM), ...

}

SetWeightAddr 0xff100100SetDataAddr 0xff100200SetOutputAddr 0xff100300SetTensorSize 0x20SetNumTimestep 0x10SetLSTMConfig 0x02cd87f9StartLSTM

(a) Accelerator RTL implementation snippet.

(b) Accelerator instruction set (partial).

Each HW operation triggered by an

MMIO access is modeled as an

individual instruction.

(c) A sequence of accelerator

instructions per- forming an LSTM

layer.

Instruction-Level Abstraction (ILA)

FlexASR Accelerator (Speech Recognition)

Harvard Wei Group

10

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

11

• Formal verification of RTL implementation

Application of ILA: Hardware Verification

Add: rd=rs1+rs2

pc+=4, …

Jump:

pc= rs1+imm, …

Processor or accelerator RTL

ILA specification

SetLength:if !busy: length=axi_wdata

SetKey:if !busy: key=axi_wdata

(for processors or accelerators)

Refinement

Mapping

“instructions”: [{“instruction” : “WR_ADDR”,“ready_signal” : “RTL.xram_ack_delay_1”,“max_bound” : 20 }, …]

“state_mapping”: {“aes_address” : “RTL.aes_reg_opaddr_i.reg_out”,“aes_length” : “RTL.aes_reg_oplen_i.reg_out”,… }

12

• Formal verification of RTL implementation• Auto-generate complete formal properties for each instruction

Application of ILA: Hardware Verification (cont’d)

ILA specification

Instruction i: S’=F(S)

Implementation

Instruction i

State S

Instruction i

State S’

Check: match?Assume: match

Symbolically execute the instruction

13

• Formal verification of RTL implementation• Modular verification — per-instruction checking

• Automating modular verification

Application of ILA: Hardware Verification (cont’d)

f = RTL state transitions

f*f*V’V

U U’𝑖ILA

𝑟 𝑟

𝑖 = 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛

𝑟 = refinement relations

RTL

f*

For each instruction, check:

• in all valid RTL starting states (environment)

14

• Formal verification of RTL implementation• Modular verification — per-instruction checking

• Automating modular verification

Application of ILA: Hardware Verification (cont’d)

f = RTL state transitions

f*

V’V

U U’𝑖ILA

𝑟 𝑟

𝑖 = 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛

𝑟 = refinement relations

RTL

For each instruction, check:

• in all valid RTL starting states (environment)

• Environment constraint auto-generation

𝑪

15

• Formal verification of RTL implementation

Application of ILA: Hardware Verification (cont’d)

Category Designs

Design Size ILA Size RTL size vs.

ILA size

(ratio of LoC)

Proof Coverage (in 1 hour on

double Xeon 5222

+ 256GB RAM)RTL

LoC

#. of state bit(wo. Mem. Abs)

DSL

LoC

#. of instr./cmd.

Processors

8051 micro-controller 8.9k 0.6k (2.6k) 0.6k 256 15x 88%

Rocket 18.2k 4.6k 0.7k 37 26x 71%

Piccolo 11.4k 7.8k (79.2k) 0.7k 37 16x 80%

Nibbler 2.7k 2.7k 0.6k 15 5x 100%

Accelerators

AES encryption 1.1k 7.5k 0.5k 16 2x 100%

RBM 10.6k 6.3k (184.6k) 1.0k 17 11x 100%

Gaussian blur image processing 6.9k 36.8k 0.5k 2 14x 100%

Memory system OpenPiton L1.5 cache 12.0k 5.9k (104.7k) 1.5k 19 8x 100%

Communication

interfaces

LeWiz Ethernet MAC core (TX) 12.5k 12.9k (428.7k) 2.3k 29 5x 97%

EMesh AXI interfaces 0.9k 0.45k 0.5k 22 2x 100%

16

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

17

• Simulation-based Validation• ILA supports auto-generation of simulation model

Application of ILA: Hardware Verification (cont’d)

RTL executable model (RTEM)RTL Impl.

ILA Spec. ILAtor

Existing RTL

Compiler*

Instruction level executable model (ILEM)

Tool’s inputGenerated

ExecutableILA toolchain e.g. Verilator

18

• Simulation-based validation (traditional)• Activate the RTL Execution Model (RTEM) with test stimuli (a sequence of RTL inputs)

• After the simulation, check if the RTL signal/register value matches the reference result

Application of ILA: Hardware Verification (cont’d)

RTL Design

Execution ModelTest Stimuli Reference

result

High-Level

Functional Model

Manually

constructed

Deficiency: need to run and analyze the full test regardless of when bug

is triggered – instead, should stop when bug is triggered

checka = 6;b = 2;

res = a/b = 3;

res = 3;res = 2;

res = 3;

19

• Simulation-based validation (tandem simulation)• After simulating each instruction, check if the RTEM Architectural Variables (RTAV) match

ILEM Architectural Variables (ILAV)

Application of ILA: Hardware Verification (cont’d)

RTL Design

Execution Model

Instruction

Sequence Stimuli

ISA Execution

Model (ILEM)

RTAV

valuesILAV

values

AV-check

Identify bugs right at the instruction that causes AV deviations

add r1, r2, r3;

jmp;

add

add AV-check

jmp

jmp

Use Refinement Map

20

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

21

FW/HW co-verification in SoCs

• Communicating (heterogeneous) IPs • Processor

• Firmware

• Specialized accelerators

• Hard to design, hard to verify• FW/HW co-design

• Heterogeneity

• Security critical

22

• FW/HW interaction• Hardware functions not captured

• RTL models not practical (too complex)

• HW/SW semantics gap

• Scalability• Bit-precise reasoning

• Concurrency

• message handling, synchronization

Challenges

ILA for Hardware

+

SW Verification Tech.

23

Co-verification methodology

•Modeling• ILA for specialized HW

• Source-level (LLVM) modeling of SW

•Verification• Software verification

techniques

Program 1 Program 2

Interacting

program threads

Using ILA Context

bounding

VerifiedbyBoogie/Corral

Single programIP1

HW 1a

FW 1

HW 1b

IP2

FW 2

HW 2a

HW 2b

24

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

25

• Simulation-based• hardware-software co-simulation using ILA as verified abstraction

Hardware-Software Co-Verification (cont’d)

ILA ModelInstruction-level

simulator

Verilog

design

ILAtor

Formally-verified

abstraction

HW/SW co-simulation

(QEMU)

High speed

simulation

Low speed simulation

26

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

27

• Shared memory accesses – memory consistency issues

Hardware-Software Co-Verification with MCM

Driver DeviceCrypto

Accelerator

Shared Memory

System-wide Verification

28

• Shared memory accesses – memory consistency issues• Memory consistency issues: memory access reordering

Hardware-Software Co-Verification with MCM (cont’d)

Thread 1 (T1):

1: st [x], 1

2: ld r1, [y]

Thread 2 (T2):

1’: st [y], 1

2’: ld r2, [x]

r1 == 0 and r2 == 0

Observable under Total Store Order

(TSO: adopted by x86)

Reordering visible to

the other thread

Example:

29

• Shared memory accesses – memory consistency issues• ILA-MCM framework: security and correctness verification using ILA and MCM

Driver DeviceCrypto

Accelerator

Shared Memory

Device Crypto-Acc.Driver

ILA ILA ILA

Memory Consistency Model

Property

State Update Values

State Update Orderings

System-wide Verification w. MCM

Hardware-Software Co-Verification with MCM (cont’d)

30

• Shared memory accesses – memory consistency issues• ILA: operational, state update functions (values)

• MCM: axiomatic & relational, state update re-ordering effects (ordering)

• Integrating ILA and MCM• Facet and facet event

• Captures different entities’ different observations

xx.T1 (observed by T1)

x.T2 (observed by T2)

Shared (memory)

variablesObserved by some entityDenoted as <variable.entity>

Hardware-Software Co-Verification with MCM (cont’d)

31

• Shared memory accesses – memory consistency issues• ILA: operational, state update functions (values)

• MCM: axiomatic & relational, state update re-ordering effects (ordering)

• Integrating ILA and MCM• Facet and facet event

• Captures different entities’ different observations

ILA

Facet

integrate values and orderings

a+b 0

vorderings

MCM

valuesa+b 0 v

Hardware-Software Co-Verification with MCM (cont’d)

32

• Verifying existing hardware-interacting programs

• Synthesize new programs (malicious exploits, buggy trace and etc…)

• Program sketch• Set of ILA instructions

• Partial ordering over instances of

instructions (optional)

IO_WRITE 0x100, 1 (SendResetCmd)

memcpy FwAddr,SM? (WriteImage)

IO_WRITE 0x104, 1 (SendLdFwCmd)

Driver Device Crypto-Acc.

Hardware-Software Co-Verification with MCM (cont’d)

33

Hardware-Software Co-Verification with MCM (cont’d)

• But TOCTOU

exists under

TSO!

SndRst. @1 2Rst. @3

SndRst. @11 12

StFwSM. @7 8

MemCpy. @15 16

SndAuthRq. @16 17

LdCmd. @10

SndLdCmd. @8 9

StFwSM. @12 16

SndLdCmd. @16 17

Rst. @17

LdCmd. @18

MemCpy. @23 26

SndAuthRq. @33 35

Lock.

SndRsp. @34 36

Auth. @21

ChkSig. @25

RdStsBit @38

SndInt. @39

PASS:JMP. @40

HandleRsp. @37

Set the lock of device to

prevent changes to FW image

Overwritten is still

allowed, but visible later

Core problem:

The overwritten could be visible

late due to relaxed MCM, thus it

circumvents the checking

Driver Device Crypto-Acc

Okay in SC

And example of the

security exploit found

by our verification:

34

Summary: Hardware-Software Interface

8-Core GPU

4 Firestorm

Cores + 12MB

L2

4 Icestorm

Core

+ 4MB L2

SLC Cache

16–Core

Neural

Engine

Apple M1 Die Photo

Source: AnandTech

https://www.anandtech.com/show/16226

/apple-silicon-m1-a14-deep-dive

ILA+MCM

Software

Hardware

Abstraction for software

Specification for hardware

Heterogenous System-on-Chip

35

• Accelerator implementation verification

• Formal

• Simulation-based

• Firmware hardware co-verification• Formal

• Simulation-based

• Shared memory accesses and memory consistency

Summary: ILA Based SoC Verification

On-chip Interconnect

CPU GPU Flash

DMA …MMU+DRAM

Microcontroller + Firmware

Memory

HW accelerators

NoC interface

36

The ILAng Tool

GitHub: https://github.com/PrincetonUniversity/ILAng

Wiki: https://bo-yuan-huang.gitbook.io/ilang/

Docker: https://hub.docker.com/r/byhuang/ilang/

37

• ILAng: A Modeling and Verification Platform for SoCs using Instruction-

Level Abstractions. [TACAS19]

• Integrating Memory Consistency Models with Instruction-Level

Abstractions for Heterogeneous System-on-Chip Verification. [FMCAD18]

• Formal Security Verification of Concurrent Firmware in SoCs using

Instruction-Level Abstraction for Hardware. [DAC18]

• Instruction-Level Abstraction (ILA): A Uniform Specification for System-

on-Chip (SoC) Verification. [TODAES18] (Best Paper Award)

Selected Bibliography