Implementation of ProDrive Model Ran Katzur 10-8-2014.
Implementation of ProDrive Model
Ran Katzur 10-8-2014
Demo Goals
1. Demonstrate the ability of a DSP core to copy data from 66AK2H12 DDR into its own DDR
2. Demonstrate the ability of a DSP core to copy data from its own DDR into 66AK2H12 DDR
3. Demonstrate the ability of a DSP core to process data and return results to the ARM
4. Demonstrate the IPC model that is described in this presentation
Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo
Basic Card
[Block diagram. Labels: 66AK2H12 (ARM CorePac, DSP CorePacs), C6678 Device (x2), FPGA, DDR Memory (x3)]
Management Communication
[Block diagram. Labels: 66AK2H12 (ARM CorePac, DSP CorePacs), C6678 Device (x2), FPGA, DDR Memory (x3)]
Message Types:
1. Data Address for the next load
2. Finish Loading
Message Media:
1. SRIO Type 11
2. SRIO DirectIO
3. Ethernet
SRIO
• Short messages – less than 256 bytes
  – Message type (2 bytes)
  – Sender ID (2 bytes)
  – Destination ID (2 bytes)
  – Destination address (4 bytes)
  – Other information needed
• Type 11
  – Up to 64 mailboxes and 4 letters (single packet model)
  – Hardware-protected messages; each message has an acknowledgment
  – Access through sockets
  – Each ARM thread can have its own mailbox (socket)
• Direct IO
  – Need to define protocol structure
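The short-message fields listed above could be packed as in the following minimal sketch. The byte order (big-endian), the field order, and the function names are assumptions made for illustration only; the slides define just the field sizes.

```python
import struct

# Assumed layout: big-endian, fields in the order listed on the slide.
# H = 2 bytes, I = 4 bytes: type, sender ID, destination ID, destination address.
HEADER = struct.Struct(">HHHI")


def pack_short_message(msg_type, sender_id, dest_id, dest_addr, payload=b""):
    """Build a short message; the whole message must stay below 256 bytes."""
    message = HEADER.pack(msg_type, sender_id, dest_id, dest_addr) + payload
    if len(message) >= 256:
        raise ValueError("short messages must be less than 256 bytes")
    return message


def unpack_short_message(message):
    """Split a short message back into its header fields and payload."""
    msg_type, sender_id, dest_id, dest_addr = HEADER.unpack_from(message)
    return msg_type, sender_id, dest_id, dest_addr, message[HEADER.size:]
```

The fixed header takes 10 bytes, which leaves up to 245 bytes for the "other information needed".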
IPC Control Communication
From ARM thread to DSP core:
1. Copy my memory to your memory
2. Copy your memory to my memory
3. Execute a function
From DSP core to ARM:
4. Finish Copying
5. Finish processing with results
[Block diagram. Labels: 66AK2H12 (ARM CorePac, DSP CorePacs), C6678 Device (x2), FPGA, DDR Memory (x3)]
IPC over Hyperlink - Simple Model
• Each thread is associated with one DSP core
• Simple “messageQ” type model, single writer
• No interrupts; messages are always polled
• Multiple buffers for messages, simple state machine for the write side and the read side
• Each side of the transaction keeps track of which buffer it should read next and which buffer it should write next
• Each side takes care of cache coherency
• Communicating with the DSPs that are on the 66AK2H12
  – Same algorithm, uses direct read and write with cache coherency
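A minimal sketch of the single-writer, polled model described above. The class names, the slot layout, and the NEW/OLD marker values are hypothetical; cache writeback and invalidate, which each side must do on real hardware, are left out.

```python
NEW, OLD = 1, 0  # assumed marker values: slot holds an unread message / slot is free


class PolledQueue:
    """Shared message buffers; in the real system these live in shared memory."""

    def __init__(self, depth):
        self.slots = [{"magic": OLD, "body": None} for _ in range(depth)]


class Writer:
    """Write side: remembers which buffer it should write next."""

    def __init__(self, queue):
        self.queue, self.next_write = queue, 0

    def try_send(self, body):
        slot = self.queue.slots[self.next_write]
        if slot["magic"] == NEW:        # reader has not consumed this slot yet
            return False
        slot["body"] = body
        slot["magic"] = NEW             # publish last, after the body is in place
        self.next_write = (self.next_write + 1) % len(self.queue.slots)
        return True


class Reader:
    """Read side: remembers which buffer it should read next, no interrupts."""

    def __init__(self, queue):
        self.queue, self.next_read = queue, 0

    def poll(self):
        slot = self.queue.slots[self.next_read]
        if slot["magic"] != NEW:        # nothing new, caller polls again later
            return None
        body = slot["body"]
        slot["magic"] = OLD             # release the buffer to the writer
        self.next_read = (self.next_read + 1) % len(self.queue.slots)
        return body
```

Because each side owns its own index and only the marker word is shared, neither side needs a lock as long as there is exactly one writer and one reader per queue.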
ARM Thread – DSP Core Messages
1. Thread sends a message to the DSP core
2. DSP reads and executes the message
3. DSP sends acknowledgment to thread
   a. Buffer 0 is released
4. Thread sends the next message to DSP
   a. Can be before step 3
5. DSP reads and processes the message
6. DSP sends acknowledgment to thread
   a. Buffer 1 is released
[Diagram: the ARM thread and the DSP core each hold Buffer 0 and Buffer 1; arrows are labeled with steps 1, 3, 4, and 6]
Note:
1. The number of message buffers is the depth of the processing queue. The ARM thread keeps track of the number of available (free) message buffers
2. The thread checks the Message Number to detect if a DSP message was overwritten (no DSP release of ARM message buffer)
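The two notes above can be sketched as a small bookkeeping class on the thread side. The class and method names are hypothetical, and the modulo of 16 is taken from the message structure slide later in the deck; how the thread recovers from a missed message is not specified in the slides, so this sketch only reports it.

```python
class ThreadSideTracker:
    """Tracks free message buffers and the expected DSP Message Number."""

    def __init__(self, depth, modulo=16):
        self.depth = depth
        self.free = depth        # number of available (free) message buffers
        self.modulo = modulo
        self.expected = 0        # Message Number expected in the next DSP message

    def on_send(self):
        """Call before writing a message to the DSP."""
        if self.free == 0:
            raise RuntimeError("DSP message queue is full")
        self.free -= 1

    def on_dsp_message(self, message_number):
        """Release one buffer; return how many DSP messages were skipped."""
        missed = (message_number - self.expected) % self.modulo
        self.expected = (message_number + 1) % self.modulo
        self.free = min(self.depth, self.free + 1)
        return missed
```

A non-zero return value means the Message Number jumped, which is the sign that an earlier DSP message was overwritten before the thread read it.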
Copy Data From Thread to DSP Core
1. DSP core gets a message from the thread with source logical address, destination logical address, and size
2. DSP initiates an EDMA transfer via the Hyperlink and waits for the EDMA completion
3. At the completion of the transfer, the DSP sends a message to the thread
4. MPAX and Hyperlink configuration will be discussed later
[Diagram: ARM thread and DSP core. C6678 DDR logical address (translation to physical is done by MPAX); 66AK2H12 DDR logical address (translation to physical is done by MPAX)]
Copy Data From DSP Core to Thread
1. DSP core gets a message from the thread with source logical address, destination logical address, and size
2. DSP initiates an EDMA transfer via the Hyperlink and waits for the EDMA completion
3. At the completion of the transfer, the DSP sends a message to the thread
4. MPAX and Hyperlink configuration will be discussed later
[Diagram: ARM thread and DSP core. C6678 DDR logical address (translation to physical is done by MPAX); 66AK2H12 DDR logical address (translation to physical is done by MPAX)]
DSP Core Real-time State Machine
1. DSP waits for a new message to arrive
2. When a message arrives, the DSP executes the function that is associated with the message
3. Upon completion of execution, the DSP sends a message back to the thread and updates the buffer number for the next message
4. DSP returns to the waiting state. If there is a message waiting, it continues with step 2; otherwise it continues waiting
[State diagram: "Waiting for a new message" → (message from the thread) → "Execute the message that arrives" → "Send message to thread, change the buffer number" → (message to the thread) → back to waiting]
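One pass of the state machine above might look like the following sketch. The message fields ("id", "code", "args"), the handler table, and the use of Python lists as inbox and outbox are all illustrative assumptions; on the DSP the inbox and outbox are the shared message buffers.

```python
def dsp_step(inbox, outbox, handlers, state):
    """Run one pass of the state machine; return True if a message was executed."""
    if not inbox:                                  # 1. waiting for a new message
        return False
    message = inbox.pop(0)
    handler = handlers[message["code"]]            # 2. function tied to the message
    result = handler(*message["args"])
    outbox.append({"id": message["id"],            # 3. message back to the thread
                   "result": result,
                   "buffer": state["buffer"]})
    state["buffer"] = (state["buffer"] + 1) % state["depth"]  # next buffer number
    return True                                    # 4. caller loops back to waiting
```

Calling `dsp_step` in a loop reproduces step 4: when it returns False the DSP simply keeps waiting.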
The Thread Real-time Algorithm
Assume ARM manages DSP Data Memory
1. Thread checks if a new message from the FPGA arrived
   a. If a message arrived, it processes the message and then checks for a new message from the DSP core
   b. If no message arrived, it checks to see if a new message arrived from the DSP
2. Thread checks if a new message from the DSP arrived
   a. If a message arrived, it processes the message and then checks for a new message from the FPGA
   b. If no message arrived, it checks to see if a new message arrived from the FPGA
[Flowchart: "Checking for a new message from the FPGA" (Yes: process the message from the FPGA, see details; No: go on) → "Checking for a new message from the DSP" (Yes: process the message from the DSP, see details, with post-processing if needed; No: go back to checking the FPGA)]
Processing FPGA Message
Assume ARM manages DSP Data Memory
1. Messages to the DSP include source and destination logical addresses, and a scratch logical address if needed
2. The logical scratch address or destination address is managed by the ARM thread and can be used for post-processing (post mortem) and to load new tables and constants
Messages from FPGA:
1. Load message – which memory buffer was loaded and the loading size
2. Error message – has data to process but no available buffer to load it**
Message Type: Procedure
• Error message: Error Procedure
• Load message:
  1. Update the buffers utilization table
  2. Send a message to the DSP core. If the DSP message queue is full, report an error**
  3. Update the message buffer utilization
Processing DSP Message
Assume ARM manages DSP Data Memory
Messages from DSP:
1. Finished Copying – DSP logical address, ARM logical address, size, Message ID, and message buffer number of the initiated message
2. Execution Finished – return value(s), source buffer address, scratch buffer address, Message ID, and message buffer number of the initiated message
3. Error message – error code, Message ID, and message buffer number of the initiated message
Message Type: Procedure
• Error message: Error Procedure
• All other messages:
  1. Process the return values
  2. Update the buffers utilization table (if not needed for post-processing) and the message buffer utilization
  3. If needed, start post-processing (post mortem processing)
     a. Initiate upload or download of data between ARM logical memory and DSP logical memory
Thread Post-Processing
Assume ARM manages DSP Data Memory
1. Two types of messages – load from the ARM logical memory or upload from the DSP logical memory
2. Both messages specify the message ID, source address, and destination address
3. Send a message to the DSP core. If the DSP message queue is full, report an error**
4. Update the buffers utilization table and the message buffer utilization
When the DSP core reads the message, it does the following:
1. Initiates data transfer using a pre-defined EDMA channel
2. May continue to process other messages (or not) and wait for the EDMA completion interrupt
3. After receiving the EDMA interrupt, sends a completion message to the thread
After receiving the completion message for uploaded data, the thread can process the uploaded data and/or write it to an external disk
Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo
C6678 Memory Management
Shannon DDR Data Partition – Physical address
4G total DDR memory dedicated to 8 cores; each has 384MB private memory, and 1G is accessed by all DSPs
Start address | Region
08 8000 0000 | 384MB dedicated for Core 0
08 9800 0000 | 384MB dedicated for Core 1
08 B000 0000 | 384MB dedicated for Core 2
08 C800 0000 | 384MB dedicated for Core 3
08 E000 0000 | 384MB dedicated for Core 4
08 F800 0000 | 384MB dedicated for Core 5
09 1000 0000 | 384MB dedicated for Core 6
09 2800 0000 | 384MB dedicated for Core 7
09 4000 0000 | 1GB shared between all cores – code, constants, etc.
09 8000 0000 | End of the partition
C6678 Memory Segment
Physical address | Size | Description | Logical address for the core | Comment
0x0 0c00 0000 | 4MB | MSMC shared memory | 0x0c00 0000 | Used for IPC; all DSP cores can see this memory
0x8 8000 0000 | 384MB | DSP 0 private memory | 0x8000 0000 | Accessed only by DSP 0
0x8 9800 0000 | 384MB | DSP 1 private memory | 0x8000 0000 | Accessed only by DSP 1
0x8 b000 0000 | 384MB | DSP 2 private memory | 0x8000 0000 | Accessed only by DSP 2
0x8 c800 0000 | 384MB | DSP 3 private memory | 0x8000 0000 | Accessed only by DSP 3
0x8 e000 0000 | 384MB | DSP 4 private memory | 0x8000 0000 | Accessed only by DSP 4
0x8 f800 0000 | 384MB | DSP 5 private memory | 0x8000 0000 | Accessed only by DSP 5
0x9 1000 0000 | 384MB | DSP 6 private memory | 0x8000 0000 | Accessed only by DSP 6
0x9 2800 0000 | 384MB | DSP 7 private memory | 0x8000 0000 | Accessed only by DSP 7
0x9 4000 0000 | 1GB | Shared memory for all cores | 0xc000 0000 | Accessed by all cores; will have code, constants, and so on
0x8 8000 0000** | 1GB – 384MB = 0x2800 0000 | No core has access except to its own region | 0x9800 0000 | This segment will have no permission to read, write, or execute for any core. This is done to prevent one core from overwriting the data of another core
** For each core the start address will be different; the implementation will be described in the MPAX implementation section
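The physical addresses in the table follow from a fixed 384MB stride, which the short sketch below reproduces. The function and constant names are illustrative only.

```python
DDR_BASE = 0x8_8000_0000      # physical start of the DSP 0 private memory
PRIVATE_SIZE = 384 * 2**20    # 384MB per core, that is 0x1800 0000
SHARED_SIZE = 2**30           # 1GB shared by all cores
NUM_CORES = 8


def private_base(core):
    """Physical start address of the private region of the given DSP core."""
    if not 0 <= core < NUM_CORES:
        raise ValueError("core must be in 0..7")
    return DDR_BASE + core * PRIVATE_SIZE


def shared_base():
    """Physical start address of the 1GB region shared by all cores."""
    return DDR_BASE + NUM_CORES * PRIVATE_SIZE
```

For example, `hex(private_base(7))` gives `0x928000000` and `hex(shared_base())` gives `0x940000000`, matching the last two regions of the table.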
MPAX registers – Shannon side
• Each DSP core has its own set of MPAX registers
• Teranet has multiple sets of SES and SMS MPAX registers
• Since the EDMA inherits the PrivID of the DSP core that initiates the transfer, each core will configure its own MPAX registers and the SES and SMS MPAX registers that are associated with its PrivID
• Multiple MPAX registers may map the same logical address, each one to a different physical address. In that case the actual translation is done based on the MPAX register with the higher ID number. This feature will be used to prevent a DSP core from accessing the private memory of another core
• The default setting of the MPAX registers uses MPAX register 0 to map all internal device addresses (logical memory MSB is 0x0) to internal memory (just add 4 bits of zero as the MSB), and maps 2G of external memory (MSB is 0x1) to 2G of physical addresses starting with address 0x8 8000 0000. The SES and SMS default registers are similar. These registers will not be modified
C6678 MPAX Registers
The setting of MPAX registers for DSP core i, i = 0..7 (C6678 only)
Value | MPAX2 | MPAX3 | MPAX4 | MPAX5
Logical | 0x80000 | 0x80000 | 0x90000 | 0xc0000
Physical | 0x880000 | 0x880000 + i * 0x18000 | 0x890000 + i * 0x18000 | 0x940000
Size | 0x1E (1G) | 0x1c (256M) | 0xb (128MB) | 0x1E (1G)
Permission | 0x00 | 0x3f | 0x3f | 0x3f
Comment | Permissions are all zero; cannot read, write, or execute | Configure the private memory, overwrite MPAX 2 | Configure the private memory, overwrite MPAX 2 | For the shared memory
In the Physical row, i is the core number
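The "higher ID wins" rule and the table above can be checked with a small software model. This is only a sketch: it uses full byte addresses and the sizes given in parentheses in the table rather than the encoded register fields, it leaves out the default registers 0 and 1, and the function names are hypothetical.

```python
MB = 2**20


def core_mpax(core):
    """Rows of (register id, logical base, physical base, size, permission)."""
    step = core * 0x1800_0000                       # i * 0x18000 in 4KB units
    return [
        (2, 0x8000_0000, 0x8_8000_0000, 1024 * MB, 0x00),        # no access
        (3, 0x8000_0000, 0x8_8000_0000 + step, 256 * MB, 0x3F),  # private, part 1
        (4, 0x9000_0000, 0x8_9000_0000 + step, 128 * MB, 0x3F),  # private, part 2
        (5, 0xC000_0000, 0x9_4000_0000, 1024 * MB, 0x3F),        # shared memory
    ]


def translate(core, logical):
    """Return the 36-bit physical address, or None if access is not permitted."""
    hit = None
    for reg in core_mpax(core):
        reg_id, base, _, size, _ = reg
        if base <= logical < base + size and (hit is None or reg_id > hit[0]):
            hit = reg                               # higher register ID wins
    if hit is None or hit[4] == 0:
        return None
    return hit[2] + (logical - hit[1])
```

With this model, core 1 reaches its own 384MB through logical 0x8000 0000 to 0x97ff ffff, while logical 0x9800 0000 falls through to MPAX2 and is rejected.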
C6678 MPAX Registers
The setting of SES registers for PrivID i, i = 0..7 (C6678 only)
Value | SES 1 for PrivID i | SES 2 for PrivID i | SES 3 for PrivID i | SES 4 for PrivID i
Logical | 0x80000 | 0x80000 | 0x90000 | 0xc0000
Physical | 0x880000 | 0x880000 + i * 0x18000 | 0x890000 + i * 0x18000 | 0x940000
Size | 0x1E (1G) | 0x1c (256M) | 0xb (128MB) | 0x1E (1G)
Permission | 0x00 | 0x3f | 0x3f | 0x3f
Comment | Permissions are all zero; cannot read, write, or execute | Configure the private memory, overwrite MPAX 2 | Configure the private memory, overwrite MPAX 2 | For the shared memory
In the Physical row, i is the PrivID number
C6678 MPAX Registers
The setting of SMS registers for PrivID i, i = 0..7, stays as the default
Hyperlink Considerations
• Each CorePac can access up to 256MB of memory (128M for Hyperlink 1 on 66AK2H12)
• Using an ARM thread to move data to and from Shannon limits the data to 256MB (128MB) for all the 8 cores (no run-time reconfiguration of Hyperlink, please)
• When the system uses Shannon cores to move data to and from the 66AK2H12, each core can address up to 256MB
• If two Shannons use Hyperlink to access remote memory, DDR accessible memory is limited to 2G (31-bit address, the MSB is always 1) in addition to internal-device MMR and memories (MSMC, L2, L1, MMR)
Hyperlink Considerations (2)
• To increase efficiency and reduce complexity, it is very important to allow parallel data movements to and from 66AK2H12 DDR
• 8 ARM threads may exchange data between the ARM and DSP cores within the 66AK2H12. This work does not cover internal data moves
• 16 threads move data via the Hyperlink, thus the size limit of Hyperlink is very important
Hyperlink Considerations (3)
• Message buffers are located in the MSMC memory. All MSMC memory can be accessed by Hyperlink
• 2G of DDR memory can be accessed by Hyperlink
• Each DSP core can access up to 128M (2G/16)
• In the following slides we analyze the Hyperlink configuration that is needed to support Shannon access to 66AK2H12 memories
• 66AK2H12 access into Shannon (for messages) will be discussed later
Hyperlink Considerations (4)
• We assume that messages reside in MSMC memory
• In order to get 128MB of DDR for each core, the PrivID must be overlaid on the look-up table index
• On the remote side, the look-up table has the base address of the memory segment. The index to the look-up table is part of the address value that is sent from the local side to the remote side
• The following figure shows the structure of the address value for 1G total access from Shannon (each core – 128MB; 4 buffers of 32MB each for each DSP core)
C6678 Hyperlink Address Structure
This is the address that the Shannon sends to the 66AK2H12 Hyperlink
Bits 31-28: PrivID
Bits 30-25: Index into look-up table (overlaps the PrivID in bits 30-28)
Bits 24-0: Offset (32MB buffers require a 25-bit offset)
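The bit layout above can be exercised with the following sketch. It assumes that the transmit side first clears the top four address bits (mask 0x0fff ffff) and then overlays the PrivID there, and that PrivID values equal the core numbers 0 to 7; the function names and the sample local address are hypothetical.

```python
IGNORE_MASK = 0x0FFF_FFFF      # address bits kept by the transmit side
OFFSET_MASK = (1 << 25) - 1    # 25-bit offset inside a 32MB segment


def tx_address(local_address, privid):
    """Address sent over Hyperlink: PrivID in bits 31-28 over the masked address."""
    return ((privid & 0xF) << 28) | (local_address & IGNORE_MASK)


def rx_decode(address):
    """Split a received address into PrivID, look-up table index, and offset."""
    privid = (address >> 28) & 0xF
    index = (address >> 25) & 0x3F     # bits 30-25, overlapping the PrivID
    offset = address & OFFSET_MASK
    return privid, index, offset


# Core 3 writes to its second 32MB segment through the local Hyperlink window.
privid, index, offset = rx_decode(tx_address(0x4200_0010, 3))
```

Because the low three PrivID bits form the top of the index, core i always lands on look-up table lines 8 * i to 8 * i + 7, which is what gives each core its own 128MB.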
Address Manipulation: Tx Side Registers
Tx Address Overlay Control Register
• User configures PrivID / Security bit overlay in this register
• Register is at address HyperLinkCfgBase + 0x1c. For 6678 that is 0x2140_001c
• If using HyperLink LLD, hyplnkTXAddrOvlyReg_s represents this register
Bits | Field | Access
31-20 | Reserved | R
19-16 | txsecovl | R/W
15-12 | Reserved | R
11-8 | txprividovl | R/W
7-4 | Reserved | R
3-0 | txigmask | R/W
Register Configuration
• txsecovl = 0 – security bit is not overlaid
• txprividovl = 12 (bits 31 to 28)
• txigmask = 11 (mask = 0x0fff ffff)
Address Translation: Rx Side Registers
Rx Address Selector Control Register
• Register is at address HyperLinkCfgBase + 0x2c. For 6678, that is 0x2140_002c
• If using HyperLink LLD, hyplnkRXAddrSelReg_s represents this register
Bits | Field | Access
31-26 | Reserved | R
25 | rxsechi | R/W
24 | rxseclo | R/W
23-20 | Reserved | R
19-16 | rxsecsel | R/W
15-12 | Reserved | R
11-8 | rxprividsel | R/W
7-4 | Reserved | R
3-0 | rxsegsel | R/W
Register Configuration
• rxsechi, rxseclo, and rxsecsel are all zero
• rxprividsel = 12 (bits 31 to 28)
• rxsegsel = 9 (bits 30 to 25)
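As a cross-check of the field positions, the two register values can be assembled from the configuration above. This is a plain bit-packing sketch with hypothetical function names; it does not use the HyperLink LLD and does not touch hardware.

```python
def tx_overlay_value(txsecovl, txprividovl, txigmask):
    """Pack the Tx Address Overlay Control fields (bits 19-16, 11-8, 3-0)."""
    return ((txsecovl & 0xF) << 16) | ((txprividovl & 0xF) << 8) | (txigmask & 0xF)


def rx_selector_value(rxsechi, rxseclo, rxsecsel, rxprividsel, rxsegsel):
    """Pack the Rx Address Selector Control fields (bits 25, 24, 19-16, 11-8, 3-0)."""
    return (((rxsechi & 1) << 25) | ((rxseclo & 1) << 24)
            | ((rxsecsel & 0xF) << 16) | ((rxprividsel & 0xF) << 8)
            | (rxsegsel & 0xF))


TX_VALUE = tx_overlay_value(txsecovl=0, txprividovl=12, txigmask=11)
RX_VALUE = rx_selector_value(0, 0, 0, rxprividsel=12, rxsegsel=9)
```

With the settings from the slides, the transmit register value comes out as 0x0C0B and the receive register value as 0x0C09.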
Hyperlink Look-up Table
• Each Shannon core will have 8 lines in the look-up table (there are 64 lines in each Hyperlink, and 8 cores)
• 4 lines point to 4 segments of remote memory, 32MB each; the fifth segment is the MSMC memory
• The last 3 lines are empty (they can be configured to point to non-existing memory to prevent access to memory that is not accessible to Shannon)
• Translation from logical addresses to physical addresses will be done by the 66AK2H12 Hyperlink MPAX registers (set E)
Hyperlink Look-up Table Shannon 0
DSP internal addresses - from 0x4000 0000 to 0x47ff ffff
Lines (index, binary) | CorePac | Logical base addresses
000000 to 000111 | 0 | 0x8000 0000, 0x8200 0000, 0x8400 0000, 0x8600 0000, 0x0c00 0000
001000 to 001111 | 1 | 0x8800 0000, 0x8b00 0000, 0x8d00 0000, 0x8e00 0000, 0x0c00 0000
010000 to 010111 | 2 | 0x9000 0000, 0x9200 0000, 0x9400 0000, 0x9600 0000, 0x0c00 0000
011000 to 011111 | 3 | 0x9800 0000, 0x9b00 0000, 0x9d00 0000, 0x9e00 0000, 0x0c00 0000
100000 to 100111 | 4 | 0xa000 0000, 0xa200 0000, 0xa400 0000, 0xa600 0000, 0x0c00 0000
101000 to 101111 | 5 | 0xa800 0000, 0xab00 0000, 0xad00 0000, 0xae00 0000, 0x0c00 0000
110000 to 110111 | 6 | 0xb000 0000, 0xb200 0000, 0xb400 0000, 0xb600 0000, 0x0c00 0000
111000 to 111111 | 7 | 0xb800 0000, 0xbb00 0000, 0xbd00 0000, 0xbe00 0000, 0x0c00 0000
Size (all CorePacs): 24 (32MB) for the first 4 segments, 21 (4MB) for the last segment dedicated to IPC
Purpose (all CorePacs): the first 4 segments are for data copy and will be mapped to DDR physical memory by SES MPAX; the last segment is the MSMC memory dedicated to IPC
Hyperlink Look-up Table Shannon 1
DSP internal addresses - from 0x4000 0000 to 0x47ff ffff
Lines (index, binary) | CorePac | Logical base addresses
000000 to 000111 | 0 | 0xc000 0000, 0xc200 0000, 0xc400 0000, 0xc600 0000, 0x0c00 0000
001000 to 001111 | 1 | 0xc800 0000, 0xca00 0000, 0xcc00 0000, 0xcd00 0000, 0x0c00 0000
010000 to 010111 | 2 | 0xd000 0000, 0xd200 0000, 0xd400 0000, 0xd600 0000, 0x0c00 0000
011000 to 011111 | 3 | 0xd800 0000, 0xda00 0000, 0xdc00 0000, 0xdd00 0000, 0x0c00 0000
100000 to 100111 | 4 | 0xe000 0000, 0xe200 0000, 0xe400 0000, 0xe600 0000, 0x0c00 0000
101000 to 101111 | 5 | 0xe800 0000, 0xea00 0000, 0xec00 0000, 0xed00 0000, 0x0c00 0000
110000 to 110111 | 6 | 0xf000 0000, 0xf200 0000, 0xf400 0000, 0xf600 0000, 0x0c00 0000
111000 to 111111 | 7 | 0xf800 0000, 0xfa00 0000, 0xfc00 0000, 0xfd00 0000, 0x0c00 0000
Size (all CorePacs): 24 (32MB) for the first 4 segments, 21 (4MB) for the last segment dedicated to IPC
Purpose (all CorePacs): the first 4 segments are for data copy and will be mapped to DDR physical memory by SES MPAX; the last segment is the MSMC memory dedicated to IPC
Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Implementation Details
• Building the Demo
66AK2H12 Physical Addresses
• 66AK2H12 dedicates 1G of DDR memory to facilitate data move (read and write) between each Shannon and the ARM using Hyperlink
• Assume that Shannon 0 has dedicated physical addresses 0x9 0000 0000 to 0x9 3fff ffff
• Assume that Shannon 1 has dedicated physical addresses 0x9 c000 0000 to 0x9 ffff ffff
• Accessing the memory for IPC (messages) will be described later
66AK2H12 Physical address dedicated to the first Shannon Device
1G total DDR memory dedicated to move data to 8 cores
Start address | Region
09 0000 0000 | 128MB dedicated for Core 0
09 0800 0000 | 128MB dedicated for Core 1
09 1000 0000 | 128MB dedicated for Core 2
09 1800 0000 | 128MB dedicated for Core 3
09 2000 0000 | 128MB dedicated for Core 4
09 2800 0000 | 128MB dedicated for Core 5
09 3000 0000 | 128MB dedicated for Core 6
09 3800 0000 | 128MB dedicated for Core 7
09 4000 0000 | End of the region
66AK2H12 Physical address dedicated to the second Shannon Device
1G total DDR memory dedicated to move data to 8 cores
Start address | Region
09 C000 0000 | 128MB dedicated for Core 0
09 C800 0000 | 128MB dedicated for Core 1
09 D000 0000 | 128MB dedicated for Core 2
09 D800 0000 | 128MB dedicated for Core 3
09 E000 0000 | 128MB dedicated for Core 4
09 E800 0000 | 128MB dedicated for Core 5
09 F000 0000 | 128MB dedicated for Core 6
09 F800 0000 | 128MB dedicated for Core 7
0A 0000 0000 | End of the region
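The per-core windows inside each 1G region follow a 128MB stride. The sketch below assumes the cores are laid out in ascending order within the region; the names are illustrative only.

```python
SHANNON_BASE = {0: 0x9_0000_0000, 1: 0x9_C000_0000}  # 66AK2H12 physical bases
CORE_WINDOW = 128 * 2**20                            # 128MB per Shannon core


def core_window(shannon, core):
    """First and last 66AK2H12 physical address dedicated to a Shannon core."""
    if not 0 <= core < 8:
        raise ValueError("core must be in 0..7")
    start = SHANNON_BASE[shannon] + core * CORE_WINDOW
    return start, start + CORE_WINDOW - 1
```

For example, `core_window(0, 5)` starts at 0x9 2800 0000, and `core_window(1, 7)` ends at 0x9 ffff ffff, the last address assumed for Shannon 1.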
MPAX registers – Hyperlink on 66AK2H12
• The Hyperlink configuration on the 66AK2H12
  – Shannon 0 logical memory 0x8000 0000 to 0xbfff ffff
  – Shannon 1 logical memory 0xc000 0000 to 0xffff ffff
• The physical memory configuration of the 66AK2H12
  – Shannon 0 - 0x9 0000 0000 to 0x9 3fff ffff
  – Shannon 1 - 0x9 C000 0000 to 0x9 ffff ffff
66AK2H12 Hyperlink MPAX Registers
Value | SES 1 for PrivID 0xE | SES 2 for PrivID 0xE
Logical | 0x80000 | 0xc0000
Physical | 0x900000 | 0x9c0000
Size | 0x1E (1G) | 0x1E (1G)
Permission | 0x3f | 0x3f
Comment | First Shannon starts at address 0x9 0000 0000 | Second Shannon starts at address 0x9 C000 0000
66AK2H12 Hyperlink MPAX Registers
The setting of SMS registers for PriviID 0xE stays as the default
66AK2H12 to Shannon Communication Considerations
• In the model that is described here, the only read or write that the 66AK2H12 does with respect to the Shannon devices is sending messages
• The 66AK2H12 messages area (from Shannon to 66AK2H12) is chosen to be in the MSMC
  – If the messages are in DDR, it reduces the size of the buffer that is dedicated to each DSP
  – The Hyperlink and MPAX settings were covered already
• The Shannon's messages memory is chosen to be in the MSMC memory
  – Otherwise it reduces the size of the DDR buffers that are currently used by a DSP core
Configuration Considerations
• The messages memory is statically divided between DSP cores in the application. In terms of the Hyperlink configuration and MPAX registers, all cores in all Shannons can access the entire messages memory (again, limitations are in the application)
• The next few slides show the proposed message structure
Message Structure
Size: 128 bytes
1. Magic Number
2. Message ID
3. Message Number (modulo 16)
4. Source Name
5. Destination Name
6. Execution Code (Name)
7. Source Logical Address
8. Destination Logical Address
9. Auxiliary Address (Logical)
Words 10 to 32: Additional parameters/return values
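A 128-byte message with 32 words could be packed as in this sketch. The 32-bit little-endian word size, the encoding of the name fields as integer codes, and the field identifiers are assumptions; the slides give only the word order and the total size.

```python
import struct

# Assumed names for words 1-9, in the order given on the slide.
FIELDS = ("magic", "message_id", "message_number", "source_name",
          "destination_name", "execution_code", "source_address",
          "destination_address", "auxiliary_address")
MESSAGE = struct.Struct("<32I")   # 32 little-endian 32-bit words = 128 bytes


def pack_message(extra=(), **fields):
    """Pack words 1-9 from keyword arguments and words 10-32 from extra."""
    words = [fields.get(name, 0) for name in FIELDS]
    words[2] %= 16                          # Message Number is kept modulo 16
    extra = list(extra)
    if len(extra) > 23:
        raise ValueError("at most 23 additional words fit in the message")
    return MESSAGE.pack(*(words + extra + [0] * (23 - len(extra))))


def unpack_message(raw):
    """Return the header as a dict plus the 23 additional words."""
    words = MESSAGE.unpack(raw)
    return dict(zip(FIELDS, words[:9])), list(words[9:])
```

Keeping the message at a fixed 128 bytes means a message slot address is always the base address plus a multiple of 0x80.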
Messages Control
Read Control Structure:
• Base Address Messages
• Next Message to Read
• Last Message ID
• Number of messages in the Buffer
• Message size
Write Control Structure:
• Base Address Messages
• Next Message to Write
• Last Message ID
• Number of messages in the Buffer
• Message size
Shannon MSMC Messages Structure
Single DSP: 1K total memory for 8 messages for one DSP core
• 128B Message 0 (at Base Address), followed by 128B Messages 1 to 7
All DSPs: 8K total memory for 8 DSP cores
• DSP 0 Messages Buffer: Base Address
• DSP 1 Messages Buffer: Base Address + 0x800
• DSP 2 Messages Buffer: Base Address + 0x1000
• DSP 3 Messages Buffer: Base Address + 0x1800
• DSP 4 Messages Buffer: Base Address + 0x2000
• DSP 5 Messages Buffer: Base Address + 0x2800
• DSP 6 Messages Buffer: Base Address + 0x3000
• DSP 7 Messages Buffer: Base Address + 0x3800
Each DSP can keep track of its address using DNUM, or we can use the MPAX registers to have the same logical address for all DSPs
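The address of a message slot can be computed from the DSP number (for example DNUM on the DSP side) and the message number. The sketch uses the 0x800 per-DSP offsets shown in the figure and the 128-byte message size; the base address 0x0c00 0000 is an assumption, as are the names.

```python
MSMC_BASE = 0x0C00_0000    # assumed base address of the messages area
MESSAGE_SIZE = 0x80        # 128-byte messages
MESSAGES_PER_DSP = 8
DSP_STRIDE = 0x800         # per-DSP offset taken from the figure


def message_address(dsp, message, base=MSMC_BASE):
    """Logical address of a message slot for a given DSP core."""
    if not 0 <= dsp < 8:
        raise ValueError("dsp must be in 0..7")
    if not 0 <= message < MESSAGES_PER_DSP:
        raise ValueError("message must be in 0..7")
    return base + dsp * DSP_STRIDE + message * MESSAGE_SIZE
```

If the MPAX registers are used instead to give every DSP the same logical address, the `dsp * DSP_STRIDE` term moves into the physical mapping and the code on each core can always pass 0.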
66AK2H12 Hyperlink Address Structure
This is the address that the 66AK2H12 sends to the Shannon Hyperlink
Bits 31-28: PrivID
Bits 27-22: Index into look-up table
Bits 21-0: Offset (4MB buffers require a 22-bit offset)
Address Manipulation: Tx Side Registers
Tx Address Overlay Control Register
• User configures PrivID / Security bit overlay in this register
• Register is at address HyperLinkCfgBase + 0x1c. For 6678 that is 0x2140_001c
• If using HyperLink LLD, hyplnkTXAddrOvlyReg_s represents this register
Bits | Field | Access
31-20 | Reserved | R
19-16 | txsecovl | R/W
15-12 | Reserved | R
11-8 | txprividovl | R/W
7-4 | Reserved | R
3-0 | txigmask | R/W
Register Configuration
• txsecovl = 0 – security bit is not overlaid
• txprividovl = 12 (bits 31 to 28)
• txigmask = 11 (mask = 0x0fff ffff)
Address Translation: Rx Side Registers
Rx Address Selector Control Register
• Register is at address HyperLinkCfgBase + 0x2c. For 6678, that is 0x2140_002c
• If using HyperLink LLD, hyplnkRXAddrSelReg_s represents this register
Bits | Field | Access
31-26 | Reserved | R
25 | rxsechi | R/W
24 | rxseclo | R/W
23-20 | Reserved | R
19-16 | rxsecsel | R/W
15-12 | Reserved | R
11-8 | rxprividsel | R/W
7-4 | Reserved | R
3-0 | rxsegsel | R/W
Register Configuration
• rxsechi, rxseclo, and rxsecsel are all zero
• rxprividsel = 12 (bits 31 to 28)
• rxsegsel = 6 (bits 27 to 22)
Hyperlink Look-up Table
• Since there is no overlay between the PrivID and the index to the look-up table, only one line in the look-up table is needed
• If the model is changed, and more Shannon memory is visible to the 66AK2H12, then more lines will be added (and the configuration might be changed)
• The SMS MPAX registers on the 66AK2H12 for Hyperlink are the default
Hyperlink Look-up Table
Line (index, binary) | CorePac | Logical base address | Size | Purpose
000000 | ARM CorePac | 0x0c00 0000 | 21 (4MB) for the MSMC | Holds the message buffers, 8K altogether for each Shannon. The base address can be anywhere in the 4MB area
Agenda
• Demo Model
• Shannon Copy Implementation Details
• 66AK2H12 Messages Implementation Details
• Building the Demo
Demo Goals
1. Demonstrate the ability of a DSP core to copy data from 66AK2H12 DDR into its own DDR
2. Demonstrate the ability of a DSP core to copy data from its own DDR into 66AK2H12 DDR
3. Demonstrate the ability of a DSP core to process data and return results to the ARM
4. Demonstrate the IPC model that is described in this presentation
5. Usage of the 66AK2H12 DSP cores is not covered in the demo
6. Hyperlink boot of the Shannon device is not covered by the demo
7. Hyperlink speed is not an issue in the demo
Demo Flow
• Load a 66AK2H12 DSP Core 0 program that generates data into 8x8 (pre-defined) buffers in the 66AK2H12 DDR memory
• One other core will configure the MPAX registers to enable peeking into the DDR memory that is dedicated to Shannon 0
• Follow the SMP Lab in the workshop to start multiple identical threads
• Start the ARM process, do initialization, and then spawn 8 threads (Thread 0 to Thread 7). All threads wait on a flag value; thread zero has its flag TRUE
• All threads run the same algorithm
ARM Initialization
• Initializes all global variables
• Reboots the Shannon device
• Initializes the global Flag array
• Spawns 8 threads
Flag Index | State
0 | TRUE
1 | FALSE
2 | FALSE
3 | FALSE
4 | FALSE
5 | FALSE
6 | FALSE
7 | FALSE
Thread (i) Initialization
• Initializes all sets of buffers that are associated with the DSP that is controlled by this thread
  – Row Data Buffers
  – Output buffers
  – Scratch area buffers
  – Mailbox buffers
• Other initialization, thread variables, etc.
• Wait on the flag
Buffer Index | Logical Address | State
0 | 0x | 0
1 | 0x | 0
2 | 0x | 0
3 | 0x | 0
Thread (i) Flow
1. Read the volatile global variable flag[i]
2. Is it TRUE? If no, delay and go back to block 1. If yes, continue
3. Set flag[i] back to FALSE. Start a terminal dialogue with the user:
   1. What function should the Shannon perform next?
   2. Supply the parameters for this function
   3. Next thread j to start after this thread (can be the same one)
4. Write the message to the DSP and wait for the completion message back from the DSP
5. Print the DSP message on the terminal. Insert a delay
6. Set the volatile flag[j] to TRUE to start the next thread
Note – If parallel threads are supported in the demo, the instruction in block 6 will move up after block 3
DSP Flow
1. Read the magic number of the next message
2. Is it a new (TRUE) message? If no, go back to block 1. If yes, continue
3. Read the message ID. Based on the message ID, jump to the function that performs this message. Read the message parameters
4. Perform the operation that was assigned in the message
5. Find the next available ARM mailbox and send the completion message to the thread
6. Change the magic number of the message to old message (FALSE). Update the next message pointer
Questions?
Back up
Example Memory Allocation for DSP 7
4 x 32MB row data buffers
Logical address: first 128MB, starting at logical address 0x8000 0000
Physical address (DSP 7): physical address starts at 0x9 2800 0000
Buffer | Logical Address | Physical Address
0 | 0x8000 0000 | 0x9 2800 0000
1 | 0x8200 0000 | 0x9 2A00 0000
2 | 0x8400 0000 | 0x9 2C00 0000
3 | 0x8600 0000 | 0x9 2E00 0000
Note – each buffer will be loaded with 1024 values before the program starts
Each value is 0x1000 0000 * DSP number + 0x0010 0000 * buffer number + I, where I goes from 0 to 1023
Example Memory Allocation for DSP 7
4 x 32MB output data buffers
Logical address: next 128MB, starting at logical address 0x8800 0000
Physical address (DSP 7): physical address starts at 0x9 2800 0000
Buffer | Logical Address | Physical Address
0 | 0x8800 0000 | 0x9 3000 0000
1 | 0x8A00 0000 | 0x9 3200 0000
2 | 0x8C00 0000 | 0x9 3400 0000
3 | 0x8E00 0000 | 0x9 3600 0000
Note – These buffers will be used to move data back to the 66AK2H12
One of the DSP functions will multiply the row data values by a constant and write the results to these buffers
Example Memory Allocation for DSP 7
4 x 32MB scratch data buffers
Logical address: next 128MB, starting at logical address 0x9000 0000
Physical address (DSP 7): physical address starts at 0x9 2800 0000
Buffer | Logical Address | Physical Address
0 | 0x9000 0000 | 0x9 3800 0000
1 | 0x9200 0000 | 0x9 3A00 0000
2 | 0x9400 0000 | 0x9 3C00 0000
3 | 0x9600 0000 | 0x9 3E00 0000
Note – These buffers will be used as private scratch area if needed
Mailbox Allocation in Shannon
Assume base address 0x0c00 0000 (logical), 0x0 0c00 0000 (physical)
Message Number | Logical Address
0 | 0x0C00 0000
1 | 0x0C00 0080
2 | 0x0C00 0100
3 | 0x0C00 0180
4 | 0x0C00 0200
5 | 0x0C00 0280
6 | 0x0C00 0300
7 | 0x0C00 0380
Note – These buffers will be used as private scratch area if needed
C6678 Hyperlink and Memory – EDMA
66AK2H12 Hyperlink and Memory – EDMA