Implementation of ProDrive Model Ran Katzur 10-8-2014.

Page 1

Implementation of ProDrive Model

Ran Katzur 10-8-2014

Page 2

Demo Goals
1. Demonstrate the ability of a DSP core to copy data from the 66AK2H12 DDR into its own DDR
2. Demonstrate the ability of a DSP core to copy data from its own DDR into the 66AK2H12 DDR
3. Demonstrate the ability of a DSP core to process data and return results to the ARM
4. Demonstrate the IPC model that is described in this presentation

Page 3

Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo

Page 4

Basic Card

Figure: block diagram of the card: the 66AK2H12 (ARM CorePac and DSP CorePacs), two C6678 devices, an FPGA, and three DDR memory banks.

Page 5

Management Communication

Figure: the same card block diagram (66AK2H12 with ARM CorePac and DSP CorePacs, two C6678 devices, FPGA, three DDR memory banks).

Message types:
1. Data address for the next load
2. Finish loading

Message media:
1. SRIO Type 11
2. SRIO DirectIO
3. Ethernet

Page 6

SRIO
• Short messages: less than 256 bytes
  – Message type (2 bytes)
  – Sender ID (2 bytes)
  – Destination ID (2 bytes)
  – Destination address (4 bytes)
  – Other information as needed
• Type 11
  – Up to 64 mailboxes and 4 letters (single-packet model)
  – Hardware-protected messages: each message is acknowledged
  – Access through sockets
  – Each ARM thread can have its own mailbox (socket)
• DirectIO
  – Requires a protocol structure to be defined
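To make the short-message bullet concrete, here is a sketch of packing the listed header fields into a byte buffer. It is a hypothetical illustration, not an SRIO driver API: the struct and function names, the field order on the wire, and the little-endian byte order are all assumptions that a real design would fix in its own protocol definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical short-message layout: 10-byte header + payload, < 256 bytes. */
#define SRIO_SHORT_MSG_MAX 255u
#define SRIO_HDR_SIZE      10u

typedef struct {
    uint16_t msg_type;   /* message type (2 bytes) */
    uint16_t sender_id;  /* sender ID (2 bytes) */
    uint16_t dest_id;    /* destination ID (2 bytes) */
    uint32_t dest_addr;  /* destination address (4 bytes) */
} srio_short_hdr;

/* Byte order on the wire is an assumption (little-endian here). */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

static void put32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Packs header + payload at fixed offsets, so struct padding never reaches
 * the wire. Returns the message length, or 0 if it would not be "short". */
static size_t srio_pack(uint8_t *buf, const srio_short_hdr *h,
                        const uint8_t *payload, size_t payload_len)
{
    assert(buf != NULL && h != NULL);
    if (SRIO_HDR_SIZE + payload_len > SRIO_SHORT_MSG_MAX)
        return 0;
    put16(buf + 0, h->msg_type);
    put16(buf + 2, h->sender_id);
    put16(buf + 4, h->dest_id);
    put32(buf + 6, h->dest_addr);
    if (payload_len > 0)
        memcpy(buf + SRIO_HDR_SIZE, payload, payload_len);
    return SRIO_HDR_SIZE + payload_len;
}
```

Serializing field by field, rather than sending the struct with `memcpy`, keeps the wire format independent of compiler padding on the ARM and DSP sides.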

Page 7

IPC Control Communication

From ARM thread to DSP core:
1. Copy my memory to your memory
2. Copy your memory to my memory
3. Execute a function

From DSP core to ARM:
4. Finished copying
5. Finished processing, with results

Figure: the same card block diagram (66AK2H12 with ARM CorePac and DSP CorePacs, two C6678 devices, FPGA, three DDR memory banks).

Page 8

IPC over Hyperlink: Simple Model
• Each thread is associated with one DSP core
• Simple “messageQ”-type model with a single writer
• No interrupts; messages are always pulled (polled)
• Multiple message buffers, with a simple state machine for the write side and for the read side
• Each side of the transaction keeps track of which buffer it should read next and which buffer it should write next
• Each side takes care of cache coherency
• Communicating with the DSPs that are on the 66AK2H12
  – Same algorithm, using direct reads and writes with cache coherency
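The write-side and read-side bookkeeping described above can be sketched as a fixed ring of message slots. This is a minimal single-process model under stated assumptions: each slot carries a `full` flag that the single writer sets and the reader clears, both indexes wrap modulo the slot count, and the cache write-back/invalidate (and memory barriers) that each side needs on real hardware appear only as comments. The names `msg_ring`, `RING_SLOTS`, and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u     /* hypothetical queue depth */
#define SLOT_BYTES 128u   /* message size used later in this deck */

typedef struct {
    volatile uint32_t full[RING_SLOTS];   /* 1 = written, not yet consumed */
    uint8_t data[RING_SLOTS][SLOT_BYTES];
} msg_ring;

typedef struct { uint32_t next_write; } ring_writer;  /* writer-private state */
typedef struct { uint32_t next_read; } ring_reader;   /* reader-private state */

/* Writer side: returns false if the next slot has not been released yet. */
static bool ring_put(msg_ring *r, ring_writer *w, const void *msg, size_t len)
{
    assert(len <= SLOT_BYTES);
    uint32_t i = w->next_write;
    if (r->full[i])
        return false;                 /* queue full: reader has not pulled it */
    memcpy(r->data[i], msg, len);
    /* Real hardware: write back the slot (and barrier) before the flag. */
    r->full[i] = 1;
    w->next_write = (i + 1u) % RING_SLOTS;
    return true;
}

/* Reader side: pure polling, no interrupts; returns false if nothing new. */
static bool ring_get(msg_ring *r, ring_reader *rd, void *out, size_t len)
{
    assert(len <= SLOT_BYTES);
    uint32_t i = rd->next_read;
    /* Real hardware: invalidate the cached flag and slot before reading. */
    if (!r->full[i])
        return false;
    memcpy(out, r->data[i], len);
    r->full[i] = 0;                   /* release the slot to the writer */
    rd->next_read = (i + 1u) % RING_SLOTS;
    return true;
}
```

Because only the writer sets a flag and only the reader clears it, no lock is needed between the two sides; the number of slots is the depth of the processing queue.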

Page 9

ARM Thread – DSP Core Messages

1. The thread sends a message to the DSP core.
2. The DSP reads and executes the message.
3. The DSP sends an acknowledgment to the thread.
   a. Buffer 0 is released.
4. The thread sends the next message to the DSP.
   a. This can happen before step 3.
5. The DSP reads and processes the message.
6. The DSP sends an acknowledgment to the thread.
   a. Buffer 1 is released.

Figure: the ARM thread and the DSP core, each with Buffer 0 and Buffer 1; arrows mark steps 1, 3, 4, and 6.

Notes:
1. The number of message buffers is the depth of the processing queue. The ARM thread keeps track of the number of available (free) message buffers.
2. The thread checks the message number to detect whether a DSP message was overwritten (that is, the DSP did not release the ARM message buffer).
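Note 2 can be illustrated with a modulo sequence check. The sketch assumes the message number wraps modulo 16 (the width used by the message structure later in the deck); the function names are hypothetical. The thread remembers the number it expects next, and any gap means at least one DSP message was overwritten before it was read.

```c
#include <assert.h>
#include <stdint.h>

#define MSG_NUM_MODULO 16u

/* Number of messages skipped between the expected and received numbers
 * (0 = in sequence). A non-zero result means a message was overwritten.
 * Note: a gap of exactly 16 aliases to 0 and cannot be detected. */
static uint32_t msg_num_gap(uint32_t expected, uint32_t received)
{
    assert(expected < MSG_NUM_MODULO && received < MSG_NUM_MODULO);
    return (received + MSG_NUM_MODULO - expected) % MSG_NUM_MODULO;
}

/* Message number the thread should expect after n. */
static uint32_t msg_num_next(uint32_t n)
{
    assert(n < MSG_NUM_MODULO);
    return (n + 1u) % MSG_NUM_MODULO;
}
```

With 8 message buffers per DSP, a modulo-16 counter is wide enough to tell a stale buffer from a fresh one, since the writer cannot run more than one full ring ahead without the reader noticing.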

Page 10

Copy Data From Thread to DSP Core

1. The DSP core gets a message from the thread with the source logical address, the destination logical address, and the size.

2. The DSP initiates an EDMA transfer via the Hyperlink and waits for the EDMA completion.

3. At the completion of the transfer, the DSP sends a message to the thread.

4. The MPAX and Hyperlink configuration is discussed later.

Figure: ARM thread and DSP core. The 66AK2H12 DDR and the 6678 DDR are both addressed by logical addresses; translation to physical addresses is done by MPAX.

Page 11

Copy Data From DSP Core to Thread

1. The DSP core gets a message from the thread with the source logical address, the destination logical address, and the size.

2. The DSP initiates an EDMA transfer via the Hyperlink and waits for the EDMA completion.

3. At the completion of the transfer, the DSP sends a message to the thread.

4. The MPAX and Hyperlink configuration is discussed later.

Figure: ARM thread and DSP core. The 6678 DDR and the 66AK2H12 DDR are both addressed by logical addresses; translation to physical addresses is done by MPAX.

Page 12

DSP Core Real-time State Machine
1. The DSP waits for a new message to arrive.
2. When a message arrives, the DSP executes the function that is associated with the message.
3. Upon completion of execution, the DSP sends a message back to the thread and updates the buffer number for the next message.
4. The DSP returns to the waiting state. If a message is already waiting, it continues with step 2; otherwise it keeps waiting.

Figure: state diagram with three states: waiting for a new message; executing the message that arrives (on a message from the thread); sending a message to the thread and changing the buffer number.
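The state machine above can be sketched as a single polling step that a DSP main loop would call repeatedly. This is a hypothetical sketch: the `dsp_msg` layout, the `MAGIC_NEW`/`MAGIC_OLD` values, the handler table indexed by message ID, and the `reply` callback that stands in for "send message to the thread" are all assumptions; it follows the magic-number handshake of the DSP flow at the end of the deck.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS     8u
#define MAGIC_NEW 1u   /* hypothetical "TRUE": new message */
#define MAGIC_OLD 0u   /* hypothetical "FALSE": already consumed */

typedef struct {
    volatile uint32_t magic;
    uint32_t msg_id;     /* selects the function to run */
    uint32_t arg;
    uint32_t result;
} dsp_msg;

typedef uint32_t (*msg_handler)(uint32_t arg);

typedef struct {
    dsp_msg *slots;                /* incoming message buffers */
    uint32_t next;                 /* buffer number of the next message */
    const msg_handler *handlers;   /* indexed by msg_id */
    uint32_t n_handlers;
    void (*reply)(uint32_t msg_id, uint32_t result);  /* message to thread */
} dsp_state;

/* One pass of the waiting state: returns 1 if a message was executed. */
static int dsp_poll_once(dsp_state *s)
{
    dsp_msg *m = &s->slots[s->next];
    if (m->magic != MAGIC_NEW)
        return 0;                                   /* keep waiting */
    assert(m->msg_id < s->n_handlers);
    m->result = s->handlers[m->msg_id](m->arg);     /* execute */
    s->reply(m->msg_id, m->result);                 /* message to the thread */
    m->magic = MAGIC_OLD;                           /* mark as consumed */
    s->next = (s->next + 1u) % SLOTS;               /* next buffer number */
    return 1;
}

/* Demo-only handlers and reply hook used to exercise the step above. */
static uint32_t last_reply;
static void demo_reply(uint32_t id, uint32_t r) { (void)id; last_reply = r; }
static uint32_t op_copy(uint32_t a) { return a; }
static uint32_t op_square(uint32_t a) { return a * a; }
static const msg_handler demo_handlers[] = { op_copy, op_square };
```

Calling `dsp_poll_once` in a loop gives step 4 for free: if another message is already waiting in the next buffer, the next call executes it immediately.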

Page 13

The Thread Real-time Algorithm (assume the ARM manages DSP data memory)

1. The thread checks whether a new message from the FPGA has arrived.
   a. If a message arrived, it processes the message and then checks for a new message from the DSP core.
   b. If no message arrived, it checks whether a new message arrived from the DSP.
2. The thread checks whether a new message from the DSP has arrived.
   a. If a message arrived, it processes the message and then checks for a new message from the FPGA.
   b. If no message arrived, it checks whether a new message arrived from the FPGA.

Figure: flowchart alternating between checking for a new message from the FPGA and checking for a new message from the DSP; on Yes, each branch processes the message (see details), and the DSP branch also performs post-processing if needed.
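The alternating check can be sketched as one non-blocking round of the thread loop. The polling and processing callbacks in `thread_io` are hypothetical placeholders for the FPGA and DSP message paths; the demo stubs only count messages so the round can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical polling interface: each poll returns true and fills *msg
 * if a new message was waiting; processing callbacks handle one message. */
typedef struct {
    bool (*poll_fpga)(int *msg);
    bool (*poll_dsp)(int *msg);
    void (*process_fpga)(int msg);
    void (*process_dsp)(int msg);   /* includes post-processing if needed */
} thread_io;

/* One round: FPGA first, then DSP, never blocking on either source.
 * Returns the number of messages handled in this round. */
static int thread_round(const thread_io *io)
{
    int msg, handled = 0;
    assert(io != NULL);
    if (io->poll_fpga(&msg)) { io->process_fpga(msg); handled++; }
    if (io->poll_dsp(&msg))  { io->process_dsp(msg);  handled++; }
    return handled;
}

/* Demo-only stubs: pending counters stand in for message queues. */
static int fpga_pending, dsp_pending, fpga_done, dsp_done;
static bool demo_poll_fpga(int *m)
{ if (fpga_pending == 0) return false; fpga_pending--; *m = 1; return true; }
static bool demo_poll_dsp(int *m)
{ if (dsp_pending == 0) return false; dsp_pending--; *m = 2; return true; }
static void demo_process_fpga(int m) { (void)m; fpga_done++; }
static void demo_process_dsp(int m)  { (void)m; dsp_done++; }
```

Whether or not the FPGA had a message, the round always goes on to check the DSP, which is what keeps either source from starving the other.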

Page 14

Processing FPGA Messages (assume the ARM manages DSP data memory)

1. Messages to the DSP include the source and destination logical addresses, and a scratch logical address if needed.

2. The logical scratch address and the destination address are managed by the ARM thread and can be used for post-processing (post-mortem) and to load new tables and constants.

Messages from the FPGA:
1. Load message: which memory buffer was loaded, and the loading size
2. Error message: there is data to process but no buffer is available to load it**

Message type and procedure:
• Error message: error procedure
• Load message:
  1. Update the buffers utilization table
  2. Send a message to the DSP core. If the DSP message queue is full, report an error**
  3. Update the message buffers utilization

Page 15

Processing DSP Messages (assume the ARM manages DSP data memory)

Messages from the DSP:
1. Finished Copying: DSP logical address, ARM logical address, size, message ID, and message buffer number of the initiating message
2. Execution Finished: return value(s), source buffer address, scratch buffer address, message ID, and message buffer number of the initiating message
3. Error message: error code, message ID, and message buffer number of the initiating message

Message type and procedure:
• Error message: error procedure
• All other messages:
  1. Process the return values
  2. Update the buffers utilization table (if the buffers are not needed for post-processing) and the message buffers utilization
  3. If needed, start post-processing (post-mortem processing):
     a. Initiate an upload or download of data between ARM logical memory and DSP logical memory

Page 16

Thread Post-Processing (assume the ARM manages DSP data memory)

1. There are two types of messages: load from the ARM logical memory, or upload from the DSP logical memory
2. Both messages specify the message ID, the source address, and the destination address
3. Send a message to the DSP core. If the DSP message queue is full, report an error**
4. Update the buffers utilization table and the message buffers utilization

When the DSP core reads the message, it does the following:
1. Initiates the data transfer using a predefined EDMA channel
2. May continue to process other messages (or not) while waiting for the EDMA completion interrupt
3. After receiving the EDMA interrupt, sends a completion message to the thread

After receiving the completion message of an upload, the thread can process the uploaded data and/or write it to an external disk.

Page 17

Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo

Page 18

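C6678 Memory Management

Shannon DDR data partition (physical addresses): 4GB of total DDR memory dedicated to 8 cores. Each core has 384MB of private memory, and 1GB is accessed by all DSP cores.

| Region | Physical start | Physical end (exclusive) |
|---|---|---|
| 384MB dedicated for core 0 | 0x08 8000 0000 | 0x08 9800 0000 |
| 384MB dedicated for core 1 | 0x08 9800 0000 | 0x08 B000 0000 |
| 384MB dedicated for core 2 | 0x08 B000 0000 | 0x08 C800 0000 |
| 384MB dedicated for core 3 | 0x08 C800 0000 | 0x08 E000 0000 |
| 384MB dedicated for core 4 | 0x08 E000 0000 | 0x08 F800 0000 |
| 384MB dedicated for core 5 | 0x08 F800 0000 | 0x09 1000 0000 |
| 384MB dedicated for core 6 | 0x09 1000 0000 | 0x09 2800 0000 |
| 384MB dedicated for core 7 | 0x09 2800 0000 | 0x09 4000 0000 |
| 1GB shared between all cores (code, constants, etc.) | 0x09 4000 0000 | 0x09 8000 0000 |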

Page 19

C6678 Memory Segments

| Physical address | Size | Description | Logical address for the core | Comment |
|---|---|---|---|---|
| 0x0 0C00 0000 | 4MB | MSMC shared memory | 0x0C00 0000 | Used for IPC; all DSP cores can see this memory |
| 0x8 8000 0000 | 384MB | DSP 0 private memory | 0x8000 0000 | Accessed only by DSP 0 |
| 0x8 9800 0000 | 384MB | DSP 1 private memory | 0x8000 0000 | Accessed only by DSP 1 |
| 0x8 B000 0000 | 384MB | DSP 2 private memory | 0x8000 0000 | Accessed only by DSP 2 |
| 0x8 C800 0000 | 384MB | DSP 3 private memory | 0x8000 0000 | Accessed only by DSP 3 |
| 0x8 E000 0000 | 384MB | DSP 4 private memory | 0x8000 0000 | Accessed only by DSP 4 |
| 0x8 F800 0000 | 384MB | DSP 5 private memory | 0x8000 0000 | Accessed only by DSP 5 |
| 0x9 1000 0000 | 384MB | DSP 6 private memory | 0x8000 0000 | Accessed only by DSP 6 |
| 0x9 2800 0000 | 384MB | DSP 7 private memory | 0x8000 0000 | Accessed only by DSP 7 |
| 0x9 4000 0000 | 1GB | Shared memory for all cores | 0xC000 0000 | Accessed by all cores; holds code, constants, and so on |
| 0x8 8000 0000** | 1GB − 384MB = 640MB (0x2800 0000) | No core has access except to its own region | 0x9800 0000 | No read, write, or execute permission for any core, to prevent one core from overwriting the data of another core |

** For each core the start address is different; the implementation is described in the MPAX section.
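The private regions in the memory map follow a single rule, base + i × 384MB, which the sketch below checks and uses to translate a core-local logical address in the private window (0x8000 0000 to 0x97FF FFFF) into a physical address. Treating the 256MB and 128MB MPAX segments as one contiguous 384MB window is an assumption that matches the MPAX table on the following slides; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DDR_PRIV_BASE 0x880000000ull   /* 0x8 8000 0000: core 0 private */
#define PRIV_SIZE     0x18000000ull    /* 384MB per core */
#define NUM_CORES     8u
#define SHARED_BASE   (DDR_PRIV_BASE + NUM_CORES * PRIV_SIZE)  /* shared 1GB */
#define SHARED_SIZE   0x40000000ull    /* 1GB */
#define PRIV_LOGICAL  0x80000000u      /* same logical base on every core */

static uint64_t core_private_base(unsigned core)
{
    assert(core < NUM_CORES);
    return DDR_PRIV_BASE + (uint64_t)core * PRIV_SIZE;
}

/* What the per-core MPAX entries do for the private window. */
static uint64_t private_logical_to_phys(unsigned core, uint32_t logical)
{
    assert(logical >= PRIV_LOGICAL && logical < PRIV_LOGICAL + PRIV_SIZE);
    return core_private_base(core) + (logical - PRIV_LOGICAL);
}
```

For example, DSP 7 at logical 0x8200 0000 lands at physical 0x9 2A00 0000, and the shared gigabyte ends exactly at 0x9 8000 0000.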

Page 20

MPAX Registers: Shannon Side
• Each DSP core has its own set of MPAX registers.
• TeraNet has multiple sets of SES and SMS MPAX registers.
• Since the EDMA inherits the PrivID of the DSP core that initiates the transfer, each core configures its own MPAX registers and the SES and SMS MPAX registers that are associated with its PrivID.
• Multiple MPAX registers may map the same logical address, each one to a different physical address. In that case, the actual translation is done by the MPAX register with the higher ID number. This feature is used to prevent a DSP core from accessing the private memory of another core.
• The default setting of the MPAX registers uses MPAX register 0 to map all internal device addresses (logical address MSB is 0) to internal memory by prefixing 4 zero bits, and maps 2GB of external memory (logical address MSB is 1) to 2GB of physical addresses starting at 0x8 0000 0000. The SES and SMS default registers are similar. These registers are not modified.

Page 21

C6678 MPAX Registers

The setting of the MPAX registers for DSP core i, i = 0 to 7 (C6678 only). Logical and physical values are addresses shifted right by 12 bits; size codes follow size = 2^(code + 1) bytes.

| Value | MPAX2 | MPAX3 | MPAX4 | MPAX5 |
|---|---|---|---|---|
| Logical | 0x80000 | 0x80000 | 0x90000 | 0xC0000 |
| Physical | 0x880000 | 0x880000 + i × 0x18000 | 0x890000 + i × 0x18000 | 0x940000 |
| Size | 0x1D (1GB) | 0x1B (256MB) | 0x1A (128MB) | 0x1D (1GB) |
| Permission | 0x00 | 0x3F | 0x3F | 0x3F |
| Comment | Permissions are all zero: cannot read, write, or execute | Configures the private memory, overriding MPAX2 | Configures the private memory, overriding MPAX2 | Shared memory |
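To turn the table into register words, the sketch below assumes the C66x CorePac XMPAXH/XMPAXL layout: XMPAXH holds BADDR in bits 31:12 and SEGSZ in bits 4:0; XMPAXL holds RADDR (physical address bits 35:12) in bits 31:8 and PERM in bits 7:0, with size = 2^(SEGSZ + 1) bytes. Under that encoding 1GB is 0x1D, 256MB is 0x1B, and 128MB is 0x1A. Verify the layout against the device documentation before use; the function names are hypothetical, and the code only computes values, it does not write hardware registers.

```c
#include <assert.h>
#include <stdint.h>

/* SEGSZ such that size == 2^(SEGSZ + 1); size is a power of two >= 4KB. */
static uint32_t mpax_segsz(uint64_t size)
{
    assert(size >= 0x1000u && (size & (size - 1u)) == 0);
    uint32_t log2 = 0;
    while ((1ull << log2) < size)
        log2++;
    return log2 - 1u;
}

/* XMPAXH: logical base (4KB aligned) | segment size code. */
static uint32_t mpax_high(uint32_t logical_base, uint64_t size)
{
    assert((logical_base & 0xFFFu) == 0);
    return (logical_base & 0xFFFFF000u) | mpax_segsz(size);
}

/* XMPAXL: 36-bit physical base >> 12 in bits 31:8 | permission bits. */
static uint32_t mpax_low(uint64_t phys_base, uint32_t perm)
{
    assert((phys_base & 0xFFFull) == 0 && phys_base < (1ull << 36));
    assert(perm <= 0xFFu);
    return (uint32_t)((phys_base >> 12) << 8) | perm;
}
```

For instance, MPAX3 for DSP 7 (physical 0x9 2800 0000, permission 0x3F) encodes as XMPAXL = 0x9280003F.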

Page 22

C6678 MPAX Registers

The setting of the SES registers for PrivID i, i = 0 to 7 (C6678 only). Logical and physical values are addresses shifted right by 12 bits; size codes follow size = 2^(code + 1) bytes.

| Value | SES 1 for PrivID i | SES 2 for PrivID i | SES 3 for PrivID i | SES 4 for PrivID i |
|---|---|---|---|---|
| Logical | 0x80000 | 0x80000 | 0x90000 | 0xC0000 |
| Physical | 0x880000 | 0x880000 + i × 0x18000 | 0x890000 + i × 0x18000 | 0x940000 |
| Size | 0x1D (1GB) | 0x1B (256MB) | 0x1A (128MB) | 0x1D (1GB) |
| Permission | 0x00 | 0x3F | 0x3F | 0x3F |
| Comment | Permissions are all zero: cannot read, write, or execute | Configures the private memory, overriding SES 1 | Configures the private memory, overriding SES 1 | Shared memory |

Page 23

C6678 MPAX Registers

The setting of the SMS registers for PrivID i, i = 0 to 7, stays at the default.

Page 24

Hyperlink Considerations
• Each CorePac can access up to 256MB of memory through Hyperlink (128MB through Hyperlink 1 on the 66AK2H12)
• Using an ARM thread to move data to and from the Shannon limits the data to 256MB (128MB) for all 8 cores (no run-time reconfiguration of Hyperlink, please)

• When the system uses Shannon cores to move data to and from the 66AK2H12, each core can address up to 256MB

• If two Shannons use Hyperlink to access remote memory, the accessible DDR memory is limited to 2GB (31-bit address; the MSB is always 1), in addition to internal device MMRs and memories (MSMC, L2, L1, MMR)

Page 25

Hyperlink Considerations (2)
• To increase efficiency and reduce complexity, it is very important to allow parallel data movements to and from the 66AK2H12 DDR

• 8 ARM threads may exchange data between the ARM and the DSP cores within the 66AK2H12. This work does not cover internal data movement

• 16 threads move data via the Hyperlink, so the Hyperlink size limit is very important

Page 26

Hyperlink Considerations (3)
• Message buffers are located in the MSMC memory. All MSMC memory can be accessed by Hyperlink
• 2GB of DDR memory can be accessed by Hyperlink
• Each DSP core can access up to 128MB (2GB / 16 cores)
• The following slides analyze the Hyperlink configuration that is needed to support Shannon access to 66AK2H12 memories

• 66AK2H12 access into the Shannon (for messages) is discussed later

Page 27

Hyperlink Considerations (4)
• We assume that messages reside in MSMC memory
• To get 128MB of DDR for each core, the PrivID must be overlaid on the look-up table index
• On the remote side, the look-up table holds the base address of each memory segment. The index into the look-up table is part of the address value that is sent from the local side to the remote side

• The following figure shows the structure of the address value for 1GB of total access from the Shannon (each core: 128MB, as 4 buffers of 32MB each)

Page 28

C6678 Hyperlink Address Structure

This is the address that the Shannon sends to the 66AK2H12 Hyperlink:
• Bits 31–28: PrivID (overlaid by the Tx side)
• Bits 30–25: index into the look-up table (bits 30–28 are shared with the PrivID)
• Bits 24–0: offset within the buffer; 32MB buffers require a 25-bit offset
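The address structure can be followed end to end with a small sketch. It assumes the DSP reaches buffer b at 0x4000 0000 + b × 32MB inside its local Hyperlink window, that the Tx side keeps bits 27:0 (txigmask = 11) and overlays the PrivID on bits 31:28 (txprividovl = 12), and that the remote side takes the look-up table index from bits 30:25 (rxsegsel = 9). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HL_WINDOW_BASE 0x40000000u  /* local Hyperlink data window */
#define SEG_SHIFT      25u          /* 32MB buffers: 25-bit offset */

/* Address the DSP core uses inside its local Hyperlink window. */
static uint32_t hl_local_addr(uint32_t buffer, uint32_t offset)
{
    assert(buffer < 8u && offset < (1u << SEG_SHIFT));
    return HL_WINDOW_BASE + (buffer << SEG_SHIFT) + offset;
}

/* Tx side: keep bits 27:0, overlay the PrivID on bits 31:28. */
static uint32_t hl_tx_addr(uint32_t local, uint32_t privid)
{
    assert(privid < 16u);
    return (privid << 28) | (local & 0x0FFFFFFFu);
}

/* Rx side: look-up table index from bits 30:25, offset from bits 24:0. */
static uint32_t hl_rx_index(uint32_t wire)
{
    return (wire >> SEG_SHIFT) & 0x3Fu;
}

static uint32_t hl_rx_offset(uint32_t wire)
{
    return wire & ((1u << SEG_SHIFT) - 1u);
}
```

Because the low three PrivID bits land in the top of the index field, core i always hits look-up table lines 8i to 8i + 7; for example, core 3 writing buffer 2 selects line 26.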

Page 29

Tx Address Overlay Control Register

• The user configures the PrivID / security bit overlay in this register

• The register is at address HyperLinkCfgBase + 0x1c. For the 6678, that is 0x2140_001c

• If using the HyperLink LLD, hyplnkTXAddrOvlyReg_s represents this register

| Bits | 31–20 | 19–16 | 15–12 | 11–8 | 7–4 | 3–0 |
|---|---|---|---|---|---|---|
| Field | Reserved | txsecovl | Reserved | txprividovl | Reserved | txigmask |
| Access | R | R/W | R | R/W | R | R/W |

Address Manipulation: Tx Side Registers

Register configuration:

• txsecovl = 0: security bit not overlaid

• txprividovl = 12 (bits 31 to 28)

• txigmask = 11 (mask = 0x0FFF FFFF)

Page 30

Rx Address Selector Control Register

• The register is at address HyperLinkCfgBase + 0x2c. For the 6678, that is 0x2140_002c

• If using the HyperLink LLD, hyplnkRXAddrSelReg_s represents this register

| Bits | 31–26 | 25 | 24 | 23–20 | 19–16 | 15–12 | 11–8 | 7–4 | 3–0 |
|---|---|---|---|---|---|---|---|---|---|
| Field | Reserved | rxsechi | rxseclo | Reserved | rxsecsel | Reserved | rxprividsel | Reserved | rxsegsel |
| Access | R | R/W | R/W | R | R/W | R | R/W | R | R/W |

Address Translation: Rx Side Registers

Register configuration:

• rxsechi, rxseclo, and rxsecsel are all zero

• rxprividsel = 12 (bits 31 to 28)

• rxsegsel = 9 (bits 30 to 25)
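The Tx overlay and Rx selector settings can be packed into their 32-bit register words as a quick check of the field positions. The sketch takes the bit positions from the two register figures; it only computes the values and does not write HyperLinkCfgBase + 0x1c / + 0x2c or use the LLD's `hyplnkTXAddrOvlyReg_s` / `hyplnkRXAddrSelReg_s` structures. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tx Address Overlay Control Register word (fields per the Tx figure). */
static uint32_t tx_addr_ovly(uint32_t txsecovl, uint32_t txprividovl,
                             uint32_t txigmask)
{
    assert(txsecovl <= 0xFu && txprividovl <= 0xFu && txigmask <= 0xFu);
    return (txsecovl << 16) | (txprividovl << 8) | txigmask;
}

/* Rx Address Selector Control Register word (fields per the Rx figure). */
static uint32_t rx_addr_sel(uint32_t rxsechi, uint32_t rxseclo,
                            uint32_t rxsecsel, uint32_t rxprividsel,
                            uint32_t rxsegsel)
{
    assert(rxsechi <= 1u && rxseclo <= 1u);
    assert(rxsecsel <= 0xFu && rxprividsel <= 0xFu && rxsegsel <= 0xFu);
    return (rxsechi << 25) | (rxseclo << 24) | (rxsecsel << 16) |
           (rxprividsel << 8) | rxsegsel;
}
```

With the Shannon-side configuration (txsecovl = 0, txprividovl = 12, txigmask = 11; rxprividsel = 12, rxsegsel = 9) the words are 0x00000C0B and 0x00000C09; the 66AK2H12-to-Shannon direction, with rxsegsel = 6, gives 0x00000C06.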

Page 31

Hyperlink Look-up Table
• Each Shannon core has 8 lines in the look-up table (there are 64 lines in each Hyperlink, and 8 cores)

• 4 lines point to 4 segments of remote memory, 32MB each; the fifth line points to the MSMC memory

• The last 3 lines are empty (they can be configured to non-existing memory to prevent access to memory that is not accessible to the Shannon)

• Translation from logical addresses to physical addresses is done by the 66AK2H12 Hyperlink MPAX registers (set E)

Page 32

Hyperlink Look-up Table, Shannon 0 (DSP internal addresses from 0x4000 0000 to 0x47FF FFFF)

| Lines (binary) | CorePac | Logical base addresses |
|---|---|---|
| 000000 to 000111 | 0 | 0x8000 0000, 0x8200 0000, 0x8400 0000, 0x8600 0000, 0x0C00 0000 |
| 001000 to 001111 | 1 | 0x8800 0000, 0x8A00 0000, 0x8C00 0000, 0x8E00 0000, 0x0C00 0000 |
| 010000 to 010111 | 2 | 0x9000 0000, 0x9200 0000, 0x9400 0000, 0x9600 0000, 0x0C00 0000 |
| 011000 to 011111 | 3 | 0x9800 0000, 0x9A00 0000, 0x9C00 0000, 0x9E00 0000, 0x0C00 0000 |

Size and purpose for every CorePac: 24 (32MB) for the first 4 segments, which are used for data copy and are mapped to DDR physical memory by the SES MPAX; 21 (4MB) for the last segment, which is the MSMC area dedicated to IPC.

Page 33

Hyperlink Look-up Table, Shannon 0, continued (DSP internal addresses from 0x4000 0000 to 0x47FF FFFF)

| Lines (binary) | CorePac | Logical base addresses |
|---|---|---|
| 100000 to 100111 | 4 | 0xA000 0000, 0xA200 0000, 0xA400 0000, 0xA600 0000, 0x0C00 0000 |
| 101000 to 101111 | 5 | 0xA800 0000, 0xAA00 0000, 0xAC00 0000, 0xAE00 0000, 0x0C00 0000 |
| 110000 to 110111 | 6 | 0xB000 0000, 0xB200 0000, 0xB400 0000, 0xB600 0000, 0x0C00 0000 |
| 111000 to 111111 | 7 | 0xB800 0000, 0xBA00 0000, 0xBC00 0000, 0xBE00 0000, 0x0C00 0000 |

Size and purpose for every CorePac: 24 (32MB) for the first 4 segments, which are used for data copy and are mapped to DDR physical memory by the SES MPAX; 21 (4MB) for the last segment, which is the MSMC area dedicated to IPC.

Page 34

Hyperlink Look-up Table, Shannon 1 (DSP internal addresses from 0x4000 0000 to 0x47FF FFFF)

| Lines (binary) | CorePac | Logical base addresses |
|---|---|---|
| 000000 to 000111 | 0 | 0xC000 0000, 0xC200 0000, 0xC400 0000, 0xC600 0000, 0x0C00 0000 |
| 001000 to 001111 | 1 | 0xC800 0000, 0xCA00 0000, 0xCC00 0000, 0xCE00 0000, 0x0C00 0000 |
| 010000 to 010111 | 2 | 0xD000 0000, 0xD200 0000, 0xD400 0000, 0xD600 0000, 0x0C00 0000 |
| 011000 to 011111 | 3 | 0xD800 0000, 0xDA00 0000, 0xDC00 0000, 0xDE00 0000, 0x0C00 0000 |

Size and purpose for every CorePac: 24 (32MB) for the first 4 segments, which are used for data copy and are mapped to DDR physical memory by the SES MPAX; 21 (4MB) for the last segment, which is the MSMC area dedicated to IPC.

Page 35

Hyperlink Look-up Table, Shannon 1, continued (DSP internal addresses from 0x4000 0000 to 0x47FF FFFF)

| Lines (binary) | CorePac | Logical base addresses |
|---|---|---|
| 100000 to 100111 | 4 | 0xE000 0000, 0xE200 0000, 0xE400 0000, 0xE600 0000, 0x0C00 0000 |
| 101000 to 101111 | 5 | 0xE800 0000, 0xEA00 0000, 0xEC00 0000, 0xEE00 0000, 0x0C00 0000 |
| 110000 to 110111 | 6 | 0xF000 0000, 0xF200 0000, 0xF400 0000, 0xF600 0000, 0x0C00 0000 |
| 111000 to 111111 | 7 | 0xF800 0000, 0xFA00 0000, 0xFC00 0000, 0xFE00 0000, 0x0C00 0000 |

Size and purpose for every CorePac: 24 (32MB) for the first 4 segments, which are used for data copy and are mapped to DDR physical memory by the SES MPAX; 21 (4MB) for the last segment, which is the MSMC area dedicated to IPC.
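The four look-up table slides follow one rule: each core owns 128MB starting at 0x8000 0000 + Shannon × 1GB + core × 128MB, split into four 32MB buffers, and line 8 × core + 4 points at the 4MB MSMC area. The sketch below generates a table from that rule. The `lut_line` struct is hypothetical and only records the base address and size code; it does not program the Hyperlink Rx segment registers, and the empty lines are simply marked unused here.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_LINES     64u
#define SEG_32MB_CODE 24u            /* 2^(24 + 1) = 32MB */
#define SEG_4MB_CODE  21u            /* 2^(21 + 1) = 4MB  */
#define MSMC_BASE     0x0C000000u

typedef struct {
    uint32_t base;       /* logical base address of the segment */
    uint32_t size_code;  /* Hyperlink segment size code */
    int      used;       /* 0 = empty line */
} lut_line;

/* Fill the look-up table for Shannon 0 or 1: line 8*core + b holds
 * buffer b (0..3), line 8*core + 4 the MSMC, lines 8*core + 5..7 empty. */
static void build_lut(unsigned shannon, lut_line lut[LUT_LINES])
{
    assert(shannon < 2u);
    uint32_t shannon_base = 0x80000000u + shannon * 0x40000000u;
    for (unsigned core = 0; core < 8u; core++) {
        uint32_t core_base = shannon_base + core * 0x08000000u;  /* 128MB */
        for (unsigned b = 0; b < 8u; b++) {
            lut_line *l = &lut[core * 8u + b];
            if (b < 4u)
                *l = (lut_line){ core_base + b * 0x02000000u, SEG_32MB_CODE, 1 };
            else if (b == 4u)
                *l = (lut_line){ MSMC_BASE, SEG_4MB_CODE, 1 };
            else
                *l = (lut_line){ 0u, 0u, 0 };
        }
    }
}
```

Generating the table from the rule, rather than typing 64 entries by hand, keeps every core's four buffers exactly 32MB apart.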

Page 36

Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Implementation Details
• Building the Demo

Page 37

66AK2H12 Physical Addresses

• The 66AK2H12 dedicates 1GB of DDR memory to each Shannon to support data movement (reads and writes) between that Shannon and the ARM over Hyperlink

• Assume that Shannon 0 has the dedicated physical address range 0x9 0000 0000 to 0x9 3FFF FFFF

• Assume that Shannon 1 has the dedicated physical address range 0x9 C000 0000 to 0x9 FFFF FFFF

• Access to the memory for IPC (messages) is described later

Page 38

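66AK2H12 physical addresses dedicated to the first Shannon device: 1GB of total DDR memory dedicated to moving data to 8 cores (128MB per core).

| Region | Physical start | Physical end (exclusive) |
|---|---|---|
| 128MB dedicated for core 0 | 0x09 0000 0000 | 0x09 0800 0000 |
| 128MB dedicated for core 1 | 0x09 0800 0000 | 0x09 1000 0000 |
| 128MB dedicated for core 2 | 0x09 1000 0000 | 0x09 1800 0000 |
| 128MB dedicated for core 3 | 0x09 1800 0000 | 0x09 2000 0000 |
| 128MB dedicated for core 4 | 0x09 2000 0000 | 0x09 2800 0000 |
| 128MB dedicated for core 5 | 0x09 2800 0000 | 0x09 3000 0000 |
| 128MB dedicated for core 6 | 0x09 3000 0000 | 0x09 3800 0000 |
| 128MB dedicated for core 7 | 0x09 3800 0000 | 0x09 4000 0000 |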

Page 39

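66AK2H12 physical addresses dedicated to the second Shannon device: 1GB of total DDR memory dedicated to moving data to 8 cores (128MB per core).

| Region | Physical start | Physical end (exclusive) |
|---|---|---|
| 128MB dedicated for core 0 | 0x09 C000 0000 | 0x09 C800 0000 |
| 128MB dedicated for core 1 | 0x09 C800 0000 | 0x09 D000 0000 |
| 128MB dedicated for core 2 | 0x09 D000 0000 | 0x09 D800 0000 |
| 128MB dedicated for core 3 | 0x09 D800 0000 | 0x09 E000 0000 |
| 128MB dedicated for core 4 | 0x09 E000 0000 | 0x09 E800 0000 |
| 128MB dedicated for core 5 | 0x09 E800 0000 | 0x09 F000 0000 |
| 128MB dedicated for core 6 | 0x09 F000 0000 | 0x09 F800 0000 |
| 128MB dedicated for core 7 | 0x09 F800 0000 | 0x0A 0000 0000 |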

Page 40

MPAX Registers: Hyperlink on the 66AK2H12
• The Hyperlink configuration on the 66AK2H12
  – Shannon 0 logical memory: 0x8000 0000 to 0xBFFF FFFF
  – Shannon 1 logical memory: 0xC000 0000 to 0xFFFF FFFF
• The physical memory configuration of the 66AK2H12
  – Shannon 0: 0x9 0000 0000 to 0x9 3FFF FFFF
  – Shannon 1: 0x9 C000 0000 to 0x9 FFFF FFFF

Page 41

66AK2H12 Hyperlink MPAX Registers

| Value | SES 1 for PrivID 0xE | SES 2 for PrivID 0xE |
|---|---|---|
| Logical | 0x80000 | 0xC0000 |
| Physical | 0x900000 | 0x9C0000 |
| Size | 0x1D (1GB) | 0x1D (1GB) |
| Permission | 0x3F | 0x3F |
| Comment | First Shannon starts at address 0x9 0000 0000 | Second Shannon starts at address 0x9 C000 0000 |
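Combining the two SES windows for PrivID 0xE gives the full path from a Shannon-side logical address (as held in the Hyperlink look-up table) to a 66AK2H12 physical address. The sketch below assumes each window is 1GB and that addresses outside both windows (such as the MSMC at 0x0C00 0000) are not translated by these entries; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One SES MPAX window for PrivID 0xE, as listed in the table above. */
typedef struct {
    uint32_t logical;
    uint64_t physical;
    uint64_t size;
} mpax_window;

static const mpax_window k2h_hl_windows[] = {
    { 0x80000000u, 0x900000000ull, 0x40000000ull },  /* Shannon 0 */
    { 0xC0000000u, 0x9C0000000ull, 0x40000000ull },  /* Shannon 1 */
};

/* Returns the 66AK2H12 physical address, or 0 if no window matches. */
static uint64_t k2h_translate(uint32_t logical)
{
    unsigned n = sizeof k2h_hl_windows / sizeof k2h_hl_windows[0];
    for (unsigned i = 0; i < n; i++) {
        const mpax_window *w = &k2h_hl_windows[i];
        if (logical >= w->logical && (uint64_t)logical < w->logical + w->size)
            return w->physical + (logical - w->logical);
    }
    return 0;
}
```

For example, Shannon 0 CorePac 1 starts at logical 0x8800 0000, which lands at 0x9 0800 0000, the start of that core's 128MB region in the physical map.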

Page 42

66AK2H12 Hyperlink MPAX Registers

The setting of the SMS registers for PrivID 0xE stays at the default.

Page 43

66AK2H12 to Shannon Communication Considerations

• In the model described here, the only reads or writes that the 66AK2H12 performs on the Shannon devices are for sending messages

• The 66AK2H12 messages area (for messages from the Shannon to the 66AK2H12) is chosen to be in the MSMC
  – If the messages were in DDR, it would reduce the size of the buffer that is dedicated to each DSP
  – The Hyperlink and MPAX settings were already covered

• The Shannon's messages memory is chosen to be in the MSMC memory
  – Otherwise it would reduce the size of the DDR buffers that are currently used by a DSP core

Page 44

Configuration Considerations
• The messages memory is statically divided between the DSP cores in the application. In terms of the Hyperlink configuration and the MPAX registers, all cores in all Shannons can access the entire messages memory (again, the limitations are in the application)

• The next few slides show the proposed message structure

Page 45

Message Structure (size 128 bytes)

1. Magic number
2. Message ID
3. Message number (modulo 16)
4. Source name
5. Destination name
6. Execution code (name)
7. Source logical address
8. Destination logical address
9. Auxiliary address (logical)

Words 10 to 32: additional parameters / return values
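The 128-byte layout maps naturally onto 32 words of 32 bits. The sketch below is a hypothetical C rendering: the field names, the `MSG_MAGIC_NEW` value, and carrying the copy size in `params[0]` are assumptions, while the word order follows the list above. Compile-time checks pin the size and the start of the parameter area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAGIC_NEW 1u   /* hypothetical "new message" value */

typedef struct {
    uint32_t magic;        /* word 1: magic number */
    uint32_t msg_id;       /* word 2: message ID */
    uint32_t msg_number;   /* word 3: message number (modulo 16) */
    uint32_t src_name;     /* word 4: source name */
    uint32_t dst_name;     /* word 5: destination name */
    uint32_t exec_code;    /* word 6: execution code (name) */
    uint32_t src_addr;     /* word 7: source logical address */
    uint32_t dst_addr;     /* word 8: destination logical address */
    uint32_t aux_addr;     /* word 9: auxiliary address (logical) */
    uint32_t params[23];   /* words 10-32: parameters / return values */
} ipc_msg;

_Static_assert(sizeof(ipc_msg) == 128, "message must be 128 bytes");
_Static_assert(offsetof(ipc_msg, params) == 36, "parameters start at word 10");

/* Fill a copy request. The magic number is written last so the reader
 * never sees a half-written message; real hardware also needs a cache
 * write-back and barrier before that store. */
static void ipc_msg_copy_request(ipc_msg *m, uint32_t id, uint32_t number,
                                 uint32_t src, uint32_t dst, uint32_t size)
{
    assert(m != NULL && number < 16u);
    memset(m, 0, sizeof *m);
    m->msg_id = id;
    m->msg_number = number;
    m->src_addr = src;
    m->dst_addr = dst;
    m->params[0] = size;
    m->magic = MSG_MAGIC_NEW;
}
```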

Page 46

Messages Control

Read control structure:
• Base address of the messages
• Next message to read
• Last message ID
• Number of messages in the buffer
• Message size

Write control structure:
• Base address of the messages
• Next message to write
• Last message ID
• Number of messages in the buffer
• Message size
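Each control structure is enough to locate the current slot and to advance to the next one. In the sketch below, "number of messages in the buffer" is assumed to mean the buffer's capacity in messages, and the struct and helper names are hypothetical. The same code serves both the read side and the write side.

```c
#include <assert.h>
#include <stdint.h>

/* One control structure per direction (read or write). */
typedef struct {
    uintptr_t base;      /* base address of the message buffers */
    uint32_t  next;      /* next message to read (or write) */
    uint32_t  last_id;   /* last message ID handled */
    uint32_t  count;     /* number of messages in the buffer (capacity) */
    uint32_t  msg_size;  /* message size in bytes */
} msg_ctrl;

/* Address of the slot that will be read (or written) next. */
static uintptr_t msg_slot_addr(const msg_ctrl *c)
{
    assert(c->count > 0u && c->next < c->count);
    return c->base + (uintptr_t)c->next * c->msg_size;
}

/* Record the handled message ID and move to the next slot, wrapping. */
static void msg_advance(msg_ctrl *c, uint32_t id)
{
    assert(c->count > 0u);
    c->last_id = id;
    c->next = (c->next + 1u) % c->count;
}
```

Keeping these structures private to each side is what allows the lock-free single-writer scheme: neither side ever writes the other's index.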

Page 47

Shannon MSMC Messages Structure

Single DSP: 1KB of total memory holds 8 messages for one DSP core, 128 bytes each (message 0 at the base address, followed by messages 1 to 7).

All DSPs: 8KB of total memory for 8 DSP cores:

| DSP messages buffer | Address |
|---|---|
| DSP 0 | Base address |
| DSP 1 | Base address + 0x400 |
| DSP 2 | Base address + 0x800 |
| DSP 3 | Base address + 0xC00 |
| DSP 4 | Base address + 0x1000 |
| DSP 5 | Base address + 0x1400 |
| DSP 6 | Base address + 0x1800 |
| DSP 7 | Base address + 0x1C00 |

Each DSP can keep track of its address using DNUM, or the MPAX registers can be used to give all DSPs the same logical address.

Page 48

66AK2H12 Hyperlink Address Structure

This is the address that the 66AK2H12 sends to the Shannon Hyperlink:
• Bits 31–28: PrivID
• Bits 27–22: index into the look-up table
• Bits 21–0: offset; 4MB buffers require a 22-bit offset

Page 49

Tx Address Overlay Control Register

• The user configures the PrivID / security bit overlay in this register

• The register is at address HyperLinkCfgBase + 0x1c. For the 6678, that is 0x2140_001c

• If using the HyperLink LLD, hyplnkTXAddrOvlyReg_s represents this register

| Bits | 31–20 | 19–16 | 15–12 | 11–8 | 7–4 | 3–0 |
|---|---|---|---|---|---|---|
| Field | Reserved | txsecovl | Reserved | txprividovl | Reserved | txigmask |
| Access | R | R/W | R | R/W | R | R/W |

Address Manipulation: Tx Side Registers

Register configuration:

• txsecovl = 0: security bit not overlaid

• txprividovl = 12 (bits 31 to 28)

• txigmask = 11 (mask = 0x0FFF FFFF)

Page 50

Rx Address Selector Control Register

• The register is at address HyperLinkCfgBase + 0x2c. For the 6678, that is 0x2140_002c

• If using the HyperLink LLD, hyplnkRXAddrSelReg_s represents this register

| Bits | 31–26 | 25 | 24 | 23–20 | 19–16 | 15–12 | 11–8 | 7–4 | 3–0 |
|---|---|---|---|---|---|---|---|---|---|
| Field | Reserved | rxsechi | rxseclo | Reserved | rxsecsel | Reserved | rxprividsel | Reserved | rxsegsel |
| Access | R | R/W | R/W | R | R/W | R | R/W | R | R/W |

Address Translation: Rx Side Registers

Register configuration:

• rxsechi, rxseclo, and rxsecsel are all zero

• rxprividsel = 12 (bits 31 to 28)

• rxsegsel = 6 (bits 27 to 22)

Page 51

Hyperlink Look-up Table
• Since the PrivID and the index into the look-up table do not overlap, only one line in the look-up table is needed

• If the model is changed and more Shannon memory becomes visible to the 66AK2H12, more lines will be added (and the configuration might change)

• The SMS MPAX registers on the 66AK2H12 for Hyperlink keep their default values

Page 52

Hyperlink Look-up Table

| Line (binary) | CorePac | Logical base address | Size | Purpose |
|---|---|---|---|---|
| 000000 | ARM CorePac | 0x0C00 0000 | 21 (4MB) | MSMC holding the message buffers, 8KB in total for each Shannon; the base address can be anywhere in the 4MB area |

Page 53

Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo

Page 54

Demo Goals
1. Demonstrate the ability of a DSP core to copy data from the 66AK2H12 DDR into its own DDR
2. Demonstrate the ability of a DSP core to copy data from its own DDR into the 66AK2H12 DDR
3. Demonstrate the ability of a DSP core to process data and return results to the ARM
4. Demonstrate the IPC model that is described in this presentation
5. Usage of the 66AK2H12 DSP cores is not covered in the demo
6. Hyperlink boot of the Shannon device is not covered by the demo
7. Hyperlink speed is not an issue in the demo

Page 55

Demo Flow
• Load a program on 66AK2H12 DSP core 0 that generates data into 8 × 8 (predefined) buffers in the 66AK2H12 DDR memory
• One other core configures the MPAX registers to enable peeking into the DDR memory that is dedicated to Shannon 0
• Start the ARM process (thread 0), do the initialization, and then spawn 8 threads (thread 0 to thread 7); follow the SMP lab in the workshop to start multiple identical threads. All threads wait on a flag value; thread 0 starts with its flag TRUE
• All threads run the same algorithm

Page 56

ARM Initialization

• Initialize all global variables
• Reboot the Shannon device
• Initialize the global flag array
• Spawn 8 threads

| Flag index | State |
|---|---|
| 0 | TRUE |
| 1 | FALSE |
| 2 | FALSE |
| 3 | FALSE |
| 4 | FALSE |
| 5 | FALSE |
| 6 | FALSE |
| 7 | FALSE |

Page 57

Thread (i) Initialization

• Initialize all sets of buffers that are associated with the DSP that is controlled by this thread:
  – Raw data buffers
  – Output buffers
  – Scratch area buffers
  – Mailbox buffers
• Other initialization (thread variables, etc.)
• Wait on the flag

| Buffer index | Logical address | State |
|---|---|---|
| 0 | 0x | 0 |
| 1 | 0x | 0 |
| 2 | 0x | 0 |
| 3 | 0x | 0 |

Page 58: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Thread (i) Flow

1. Read the volatile global variable flag[i]
2. Is it TRUE? No: delay, then go back to block 1. Yes: continue to block 3
3. Set flag[i] back to FALSE and start a terminal dialogue with the user:
   – What function should the Shannon perform next?
   – Supply the parameters for this function
   – Next thread j to start after this thread (can be the same one)
4. Write the message to the DSP and wait for the completion message back from the DSP
5. Print the DSP message on the terminal and insert a delay
6. Set the volatile flag[j] to TRUE to start the next thread

Note – If parallel threads are supported in the demo, block 6 will move up to follow block 3.

Page 59: Implementation of ProDrive Model Ran Katzur 10-8-2014.

DSP Flow

1. Read the magic number of the next message
2. Is it a new (TRUE) message? No: go back to block 1. Yes: continue to block 3
3. Read the message ID, jump to the function that performs this message based on the ID, and read the message parameters
4. Perform the operation that was assigned in the message
5. Find the next available ARM mailbox and send the completion message to the thread
6. Change the magic number of the message to old message (FALSE) and update the next message pointer

Page 60: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Questions?

Page 61: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Back up

Page 62: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Example Memory Allocation for DSP 7: 4 x 32MB Row Data Buffers

Logical addresses: the first 128MB, starting at logical address 0x8000 0000

Physical addresses (DSP 7): start at 0x9 2800 0000

Buffer | Logical Address | Physical Address (DSP 7)
0 | 0x8000 0000 | 0x9 2800 0000
1 | 0x8200 0000 | 0x9 2A00 0000
2 | 0x8400 0000 | 0x9 2C00 0000
3 | 0x8600 0000 | 0x9 2E00 0000

Note – Each buffer will be loaded with 1024 values before the program starts. Each value is 0x1000 0000 * DSP number + 0x0010 0000 * buffer number + I, where I goes from 0 to 1023.

Page 63: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Example Memory Allocation for DSP 7: 4 x 32MB Output Data Buffers

Logical addresses: the next 128MB, starting at logical address 0x8800 0000

Physical addresses (DSP 7): the DSP 7 area starts at 0x9 2800 0000; the output buffers start 128MB into it, at 0x9 3000 0000

Buffer | Logical Address | Physical Address (DSP 7)
0 | 0x8800 0000 | 0x9 3000 0000
1 | 0x8A00 0000 | 0x9 3200 0000
2 | 0x8C00 0000 | 0x9 3400 0000
3 | 0x8E00 0000 | 0x9 3600 0000

Note – These buffers will be used to move data back to the 66AK2H12. One of the DSP functions will multiply the row data values by a constant and write the results to these buffers.

Page 64: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Example Memory Allocation for DSP 7: 4 x 32MB Scratch Data Buffers

Logical addresses: the next 128MB, starting at logical address 0x9000 0000

Physical addresses (DSP 7): the DSP 7 area starts at 0x9 2800 0000; the scratch buffers start 256MB into it, at 0x9 3800 0000

Buffer | Logical Address | Physical Address (DSP 7)
0 | 0x9000 0000 | 0x9 3800 0000
1 | 0x9200 0000 | 0x9 3A00 0000
2 | 0x9400 0000 | 0x9 3C00 0000
3 | 0x9600 0000 | 0x9 3E00 0000

Note – These buffers will be used as a private scratch area if needed.

Page 65: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Mailbox Allocation in Shannon

Assume base address 0x0C00 0000 (logical), 0x0 0C00 0000 (physical)

Message Number | Logical Address
0 | 0x0C00 0000
1 | 0x0C00 0080
2 | 0x0C00 0100
3 | 0x0C00 0180
4 | 0x0C00 0200
5 | 0x0C00 0280
6 | 0x0C00 0300
7 | 0x0C00 0380

Note – Each message slot is 128 bytes (0x80), so message n starts at the base address + n * 0x80.

Page 66: Implementation of ProDrive Model Ran Katzur 10-8-2014.

Shannon MSMC Messages Structure

1K total memory for 8 messages for one DSP core (Message 0 starts at the Base Address):
128B Message 0
128B Message 1
128B Message 2
128B Message 3
128B Message 4
128B Message 5
128B Message 6
128B Message 7

8K total memory for 8 DSP cores (1K = 0x400 per core):
DSP 0 Messages Buffer: Base Address
DSP 1 Messages Buffer: Base Address + 0x400
DSP 2 Messages Buffer: Base Address + 0x800
DSP 3 Messages Buffer: Base Address + 0xC00
DSP 4 Messages Buffer: Base Address + 0x1000
DSP 5 Messages Buffer: Base Address + 0x1400
DSP 6 Messages Buffer: Base Address + 0x1800
DSP 7 Messages Buffer: Base Address + 0x1C00

Each DSP can keep track of its own address using DNUM, or we can use the MPAX registers to give all DSPs the same logical address.

Page 67: Implementation of ProDrive Model Ran Katzur 10-8-2014.

C6678 Hyperlink and Memory – EDMA

Page 68: Implementation of ProDrive Model Ran Katzur 10-8-2014.

66AK2H12 Hyperlink and Memory – EDMA

Page 69: Implementation of ProDrive Model Ran Katzur 10-8-2014.

66AK2H12 Hyperlink and Memory – EDMA