
March 2012 P2020 Dual Core SEE Test Report
NASA Electronic Parts and Packaging Program
System-on-a-Chip Devices

Prepared by: Steve Guertin
Radiation Effects Group
January 1, 2013

Jet Propulsion Laboratory
California Institute of Technology


CONTENTS

Summary
1.0  Purpose
2.0  Context
3.0  Test System Description
     3.1  General Test Setup
     3.2  Test Hardware
          3.2.1  Test Board
          3.2.2  Test Interface Computer
          3.2.3  Power Supplies
          3.2.4  P2020 Exposure
     3.3  Test Software
          3.3.1  Previously Used Algorithms
          3.3.2  Testing on Multiple Cores
          3.3.3  Write/Read Test
          3.3.4  L2 Error Map Extraction
4.0  Test Plan
     4.1  DUTs
     4.2  Beams Used
     4.3  Test Algorithms
     4.4  Basic Test Procedure
5.0  Test Results
     5.1  L1 Cache Results
     5.2  L2 Cache Results and Error Maps
     5.3  Register Test Results
     5.4  Dual Core Write/Read Test
     5.5  Core Crashes
6.0  Future Work
     6.1  Core-to-Core Transfer of Data
          6.1.1  Message Passing
          6.1.2  Memory Coherence
     6.2  P5020 Efforts
     6.3  Test of Memory Interface
     6.4  Test of SERDES, Ethernet, or Other I/O
7.0  References
8.0  Acknowledgement
9.0  Test Log for P2020 Exposures


Summary

P2020 dual core Freescale e500 processors were tested for heavy ion single event effects (SEE) at Texas A&M University (TAMU) on March 23-24, 2012, under the NASA Electronic Parts and Packaging (NEPP) Program. This testing was intended to extend the testing reported in [1] and [2] by adding dual core operation and by testing with an algorithm that sensitizes the memory controller on the P2020. The approach used enabled testing of more components in the system-on-a-chip (SOC) structure and explored the nature of upsets in the dual core environment. Note that we also intended to test P5020s, but devices were not ready in time. P5020 testing was then pushed to the end of FY12 and has since been delayed to FY13 due to facility issues.

For this test, earlier test software was upgraded to enable more of the P2020’s hardware components. This included the development of dual-core test code, which allows for direct, simultaneous communication with both cores. Dual-core operation was accomplished by using a flat memory structure and having the second core run from a slightly different address, which allows the cores to identify their roles. The memory controller was tested by performing write and read operations during irradiation; however, there was no particular emphasis put on which of the memory systems was being stressed, and the results primarily apply to the portion of the memory management system associated with the processor caches.

The testing performed here was tied to the earlier testing by confirming that the L1 and L2 cell cross sections remained the same. The cross section for bit upsets was the same in both cores, with an onset linear energy transfer (LET) of approximately 1 MeV-cm²/mg and a saturated cross section of 1×10⁻⁹ cm²/bit. No significant upset sensitivity of the memory controller was observed (the test places an upper limit of ~1×10⁻⁷ cm²/device). Core-to-core communications were not directly tested. During testing, cores were predominantly observed to crash independently of each other. The SEE sensitivity for crashes showed a threshold at or below 1 MeV-cm²/mg, with a saturated cross section of about 2×10⁻⁶ cm²/device.

1.0 Purpose

This report covers follow-up testing of P2020 processors for the NEPP program. Earlier testing established the general SEE sensitivity of this device for heavy ions. However, it did not address multicore operation and provided very limited peripheral control data. The earlier data was primarily isolated to L1 and L2 cache sensitivity in one of the e500 cores, along with limited register data. The testing covered here extends the previous testing by providing data obtained from other parts of the device. The additional data was collected while operating both cores simultaneously, and performing repeated write and read operations to memory.

For this testing some key developments were made. First, the hardware platform was moved from the P2020RDB used in earlier testing [1,2] to the P2020RDB-PCA, which is an updated version of the test hardware. The test software on the P2020 was updated to enable both of the cores to be operated simultaneously. An alternate version of the software was developed to enable memory operations during the dwell portion of the test algorithm. Additionally, data analysis capabilities were expanded to enable collection of the error pattern from the P2020 L2 cache.

This report provides detail on the software and hardware developments, as well as information about the test system. It also includes the test plan, followed by the results. It concludes with recommendations on future work for achieving a reasonable termination point for SEE evaluation of the P2020 processor and extension to other SOCs.

2.0 Context

The testing reported here is part of an effort by the NEPP SOC task to determine radiation qualification methods for modern SOCs, and concerns testing of the P2020 dual core e500 microprocessor only.

Previous testing of the P2020 included proton testing [1] and heavy ion testing [2]. Heavy ion testing proved to be difficult because the P2020 has a copper heat spreader on top of the die that, when opened, poses a significant damage risk to the bond wires below it. Results from static testing showed that the register, L1 cache, and L2 cache SEE sensitivities are in line with the responses previously seen on Freescale devices. The NEPP SOC task seeks to build on this type of basic device-sensitivity data collection by testing to identify the relative (or, where possible, absolute) sensitivity of peripheral devices and of dynamic use of the P2020.

The present work provides heavy ion test data that extends that in [2] by adding some peripheral testing (memory management unit—MMU) and by collecting data from both cores of the processor simultaneously. Thus far we have not performed core-to-core communications testing, and have not performed extensive input/output (I/O) testing.

Originally this testing was to include the P5020; however, getting devices ready for testing took more effort than planned, and the hardware was not functional at the time of the testing.

3.0 Test System Description

This section discusses the test software and hardware used to operate the devices under test (DUTs) and collect data for this test.

3.1 General Test Setup

The test system from [2] was used as the basis for the testing performed here. The general approach was to leverage inexpensive boards and directly modify the DUT on the board. For this testing, a couple of changes were made to the software and hardware operation derived from [1,2]. In this section we discuss the test system, providing details on the test software and hardware.

The test boards used in this work are functional computers with many support components. For this work, only the P2020 was tested for heavy ion SEE. Based on the earlier testing, we did not anticipate single event latchup (SEL) complications and decided that we could use the test boards in their normal operating configuration.

3.2 Test Hardware

The test system is derived directly from that in the earlier testing [2]. For the current testing, we moved from the older and less available P2020RDB to the P2020RDB-PCA board. The new board required a slightly different effort to remove shielding material from the processor. However, the rest of the test system remains essentially the same.


3.2.1 Test Board

The test boards were the P2020RDB-PCA, with modifications to the mounted P2020 processor. The block diagram of the test board is provided in Figure 3.2-1. This board is essentially a full computer. As indicated above, however, only the processor was tested for SEE.

Figure 3.2-1. Block diagram of the P2020RDB-PCA. Note that it is a functional computer that uses the P2020 as the processor and primary bus for peripheral I/O.

The test boards were mounted to the standard TAMU mounting bracket using an adapter plate made from a modified P2020RDB-PCA chassis.

3.2.2 Test Interface Computer

A laptop with custom Visual Basic code was used to communicate with the DUT over the RS232 interface, enabling recording and automation of key sequences to speed up test system operation. This computer enabled upload of test software and communicated with the P2020 under test. Both universal asynchronous receiver/transmitter (UART) ports on the P2020 were connected to this computer because each core has direct communication through its own UART under the test software (see section 3.3). The P2020RDB-PCA setup is shown in Figure 3.2-2.


Figure 3.2-2. Schematic of the test system. Note that the actual power supply was the P2020RDB-PCA power unit. The system was not monitored for SEL, and none was expected.

3.2.3 Power Supplies

For the P2020RDB-PCA, standard wall power was applied to the board's dedicated power supply, which provided power for some of the power planes on the P2020RDB-PCA. The board also generates some voltages using on-board regulation. Power was not monitored, and no damage has been observed in any P2020 heavy ion or proton exposure [1,2].

3.2.4 P2020 Exposure

The P2020 is designed with the integrated circuit (IC) in an upright configuration (active surface facing up), with wire bonds going from the IC surface to an outer-edge lead frame that connects to the ball grid array on the bottom surface. Above the IC is an unidentified filler material, and above that is a copper heat spreader. For heavy ion exposures the heat spreader must be thinned or removed; we removed it by mechanical grinding. An example DUT after grinding is shown in Figure 3.2-3. Note that, for thermal control during testing, we directed the nitrogen exhaust line provided by TAMU at the exposed device. However, heating has not been observed to be a major issue. This is believed to be because we do not operate many of the high-speed I/Os (such as the serial ATA and serializer/deserializer [SERDES] ports) under the test software used for this test.


Figure 3.2-3. P2020 in a P2020RDB-PCA, with heat spreader removed. Bond wires are very close to the bottom of the heat spreader, so care is required to ensure the part functions after the spreader is opened.

3.3 Test Software

The test software is architecturally similar to the software used to test PowerPC chips in the past, and is similar to the software developed under the NEPP SOC task for testing complex SOCs. The general test software structure is seen in Figure 3.3-1.

Figure 3.3-1. Flow chart for standard SOC test algorithms. Note that the wait period T can be used to perform dynamic testing.


The algorithm in Figure 3.3-1 is based on a control loop with a fixed interval for checking when the test run is complete. This approach is possible on the P2020 because the test software was designed to perform its own I/O. In earlier SOC and microprocessor test algorithms the I/O is performed through kernel code that usually carries significantly higher crash sensitivity than the test code. Since the P2020 test code performs its own I/O and does not call any kernel functions, the risk of kernel code exposure—and associated crashes—is dramatically reduced, making it viable for the test code to use input from the serial port during the test loop. This also allows periodic data output, reducing the amount of data lost when a processor core crash occurs.

The algorithm presented above is a general-use algorithm on which additional test algorithms can be developed. In this section we discuss the capabilities of the general algorithm and the extensions created to test individual parts of the P2020.
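For illustration, the control flow of Figure 3.3-1 can be sketched in C as shown below. This is a minimal sketch rather than the actual test code; the uart_*, preload_patterns, dwell, and report_static_counts helpers are hypothetical stand-ins for the self-contained I/O and test routines described above.

    /* Minimal sketch (not the actual test code) of the Figure 3.3-1 control loop.
     * The helpers are hypothetical; the real code performs its own UART I/O and
     * never calls kernel services. */
    #include <stdint.h>

    #define DWELL_TICKS 1000000u               /* hypothetical wait period T */

    extern int  uart_rx_ready(void);           /* nonzero if a command byte is waiting */
    extern char uart_getc(void);
    extern void uart_puts(const char *s);
    extern void preload_patterns(int ones);    /* set registers/caches to all 0s or 1s */
    extern void dwell(uint32_t ticks);         /* wait period T; dynamic tests run here */
    extern void report_static_counts(void);    /* dump register/L1/L2 upset counts */

    void test_main(int pattern_ones)
    {
        preload_patterns(pattern_ones);        /* prepare DUT state before exposure */

        for (;;) {
            dwell(DWELL_TICKS);
            uart_puts(".");                    /* periodic output limits data lost on a crash */

            /* Because the test code performs its own I/O, the serial port can be
             * polled inside the loop to see whether the run should end. */
            if (uart_rx_ready() && uart_getc() == 'q')
                break;
        }

        report_static_counts();                /* post-exposure readout */
    }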

3.3.1 Previously Used Algorithms

3.3.1.1 Register Test Algorithm

The general use test algorithm automatically prepares unused registers with a set data pattern before irradiation, and then queries them after the exposure (not on each iteration of the dwell loop). Any changes are recorded as single, double, triple, or more-than-triple-bit errors in a given register. The data pattern used depends on the 0s-or-1s global setting for the test algorithm. (The bits are set logically to all 0s or all 1s, respectively; however, the actual bias configuration of the individual storage elements is not known.)

The test algorithm uses general purpose registers (GPRs) 0, 1, 2, 7, 8, 9, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, and 29, for a total of twenty-two 32-bit registers, or 704 bits. During the testing reported here, both e500 cores were tested for register upsets, so the total number of register bits tested for SEE is 1,408.
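A minimal sketch of the post-exposure register check is given below. The preload of the GPRs (done in assembly in any real implementation) is omitted; read_saved_gpr is a hypothetical helper returning the value recovered from a given GPR, and the classification mirrors the single/double/triple/more-than-triple binning described above.

    /* Sketch of the post-exposure register check (hypothetical read_saved_gpr). */
    #include <stdint.h>

    static const int test_gprs[22] = { 0, 1, 2, 7, 8, 9, 10, 15, 16, 17, 18,
                                       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 };

    extern uint32_t read_saved_gpr(int n);       /* value recovered from GPR n */

    void check_registers(uint32_t pattern, int counts[4])  /* 1-, 2-, 3-, >3-bit bins */
    {
        for (int i = 0; i < 22; i++) {
            uint32_t diff = read_saved_gpr(test_gprs[i]) ^ pattern;
            if (diff == 0)
                continue;
            int flips = __builtin_popcount(diff);           /* number of upset bits */
            if (flips > 3)
                flips = 4;                                  /* "more than triple" bin */
            counts[flips - 1]++;
        }
    }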

3.3.1.2 L1 Test Algorithm

The general use test algorithm can be configured to perform L1 cache testing. For the reported testing, both cores were enabled to test their respective L1 data caches separately. (L1 instruction cache upsets cannot be observed.) Previous testing [1,2] showed that when an L1 upset occurs the cache line with the upset experiences a miss and the line is reloaded from the memory hierarchy (in this case the target cache line exists only in the main memory after an L1 miss, so a memory fetch is performed).

The primary test algorithm consists of three main operations:

1. Before heavy ion exposure, load the L1 data cache with 0s or 1s based on the global setting for the test algorithm.

2. Also before heavy ion exposure, disable the L1 data cache and load the main memory corresponding to the address range held in the L1 data cache with a known pattern that is different from all 1s or all 0s.

3. After exposure, activate the L1 data cache and observe how many cache lines result in misses and show the main memory image instead of the test data pattern. (Detailed information on which bits are in error cannot be collected due to the cache miss and associated cache line load.)


The L1 data cache consists of 32,768 bytes arranged in 8-byte cache lines. Each cache line has address information and cache line status bits in addition to the basic data and parity bits. We approximate the total number of sensitive bits in the L1 data cache that contribute to cache misses through parity failure as 278,528 bits. There are only 4,096 unique cache lines, which limits the number of upsets that can be detected during a single beam exposure. Each line contains more than 80 bits, so once the number of upsets in the cache reaches a significant fraction of the 4,096 cache lines, multiple upsets within a single line cannot reliably be distinguished. Thus, we try to restrict the number of upsets in the cache at any time to about 1,000 to keep the risk of undercounting below approximately 30%. It should be noted that the lack of double-bit upsets observed in the cache (only a handful observed when hundreds of errors are in the cache) suggests that the parity granularity is much smaller than the full cache line; we suspect at least one parity bit is used for every eight data bits in a cache line.

The L1 data caches in the two cores are identical, and each was tested independently by software running on its own core.
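The three-step L1 procedure can be sketched as follows. The l1_dcache_* helpers and the 0x5A memory pattern are hypothetical placeholders; the real test performs these steps through the e500 cache-control facilities.

    /* Sketch of the three-step L1 data cache test (hypothetical cache helpers). */
    #include <stdint.h>

    #define L1_BYTES 32768u
    extern volatile uint8_t l1_test_region[L1_BYTES];    /* memory backing the cached range */

    extern void l1_dcache_fill(volatile uint8_t *base, uint32_t len, uint8_t val);
    extern void l1_dcache_enable(void);
    extern void l1_dcache_disable(void);

    void l1_prepare(int ones)
    {
        /* Step 1: load the cache lines covering the test region with all 0s or 1s. */
        l1_dcache_fill(l1_test_region, L1_BYTES, ones ? 0xFF : 0x00);

        /* Step 2: with the cache disabled, put a different pattern in main memory. */
        l1_dcache_disable();
        for (uint32_t i = 0; i < L1_BYTES; i++)
            l1_test_region[i] = 0x5A;
    }

    uint32_t l1_count_misses(void)
    {
        /* Step 3: re-enable the cache; a line that took a parity hit misses and
         * shows the 0x5A memory image instead of the original pattern. */
        l1_dcache_enable();
        uint32_t misses = 0;
        for (uint32_t i = 0; i < L1_BYTES; i += 8)        /* 8-byte lines per the report */
            if (l1_test_region[i] == 0x5A)
                misses++;
        return misses;
    }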

3.3.1.3 L2 Test Algorithm

Testing of the L2 cache was performed with the cache configured in cache as random-access memory (RAM) mode. In order to directly observe upsets, we disabled the error detection and correction (EDAC) protection on the L2 cache. With these settings in place, the L2 test algorithm proceeds as follows:

1. Before exposure, the L2 is prepared as all 1s or all 0s, based on the global setting for the test algorithm.

2. After exposure, each 32-bit word in the L2 cache is checked for errors. Errors are classified as 1, 2, 3, or more than 3 bits in error. All errors in the cache can be dumped with address and bit-level information to enable pattern analysis.

The L2 cache is 512kB. With the cache in cache as RAM mode, and with the EDAC disabled, the number of target bits is simply eight bits per byte, or 4,194,304 bits.

The L2 cache is shared by the two cores. In this test algorithm, only the first test core (core 0) performs L2 testing.
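A sketch of the post-exposure L2 scan is shown below, assuming the cache-as-RAM window is mapped at 0xC0000000 (the base address seen in the example dump of section 5.2, used here only as an illustrative assumption); the binning follows the classification described above.

    /* Sketch of the L2 scan with the cache in cache-as-RAM mode and EDAC disabled. */
    #include <stdint.h>

    #define L2_BASE   0xC0000000u                 /* assumed base address */
    #define L2_WORDS  (512u * 1024u / 4u)         /* 512 kB as 32-bit words */

    void scan_l2(uint32_t pattern, int counts[4]) /* 1-, 2-, 3-, >3-bit bins */
    {
        volatile uint32_t *l2 = (volatile uint32_t *)L2_BASE;
        for (uint32_t w = 0; w < L2_WORDS; w++) {
            uint32_t diff = l2[w] ^ pattern;
            if (diff == 0)
                continue;
            int flips = __builtin_popcount(diff);
            if (flips > 3)
                flips = 4;
            counts[flips - 1]++;
            /* address and bit-level detail can also be logged here for pattern analysis */
        }
    }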

3.3.2 Testing on Multiple Cores

For this test we developed test code that runs on both cores of the P2020. At boot up only core 0 is operational. The code was designed to use the second-core (core 1) startup utilities of the U-Boot software loaded on the P2020RDB-PCA. Core 1 is not initialized during the U-Boot start-up sequence, so the test software had to initialize core 1's memory management and I/O systems.

Under the test software used here, core 1 communicates through the second UART port. Aside from the boot operations, where core 0 has supervisor control over core 1, both cores operate autonomously. This is supported by giving the two cores non-overlapping memory maps; core 1 is not used to test the L2 cache.
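The dual-core startup can be pictured with the sketch below. The helper names are hypothetical; the point is that each core's entry point selects its role and its UART, and that core 1 must set up its own MMU and I/O because U-Boot does not.

    /* Conceptual sketch of per-core startup (hypothetical helpers). */
    #include <stdint.h>

    extern void uart_init(int port);          /* UART0 for core 0, UART1 for core 1 */
    extern void core1_init_mmu_and_io(void);  /* core 1 is not set up by U-Boot */
    extern void run_tests(int core_id);

    void entry_core0(void)
    {
        uart_init(0);
        run_tests(0);                         /* core 0 also handles the L2 test */
    }

    void entry_core1(void)                    /* released by U-Boot at a different address */
    {
        core1_init_mmu_and_io();              /* test code must do this itself */
        uart_init(1);                         /* core 1 talks over the second UART */
        run_tests(1);                         /* afterwards the cores run autonomously */
    }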

3.3.3 Write/Read Test

A write/read test was developed to explore potential SEE sensitivity in the memory management unit, as it pertains to writing and reading data from main memory and to operations with the L1 cache. Two versions of this test were developed. The first, referred to as "DualCoreWriteReadTest_A," focuses on off-chip operations, while the second (with the "A" replaced by "B," "C," and "D" for the various versions used throughout testing) focuses on operations entirely contained in the L1 data cache.

This test operates as a functional "plug-in" to the general test algorithm. That is, the testing is exactly the same as for the static tests until execution reaches the "wait for period T" step in Figure 3.3-1. Once in the wait period, the cores execute memory write and read operations. Any error in a memory operation is reported when it is observed, and the algorithm finishes when all the test memory has been written and then read.

This algorithm targets either purely on-chip or purely off-chip operations, depending on the configuration of the L1 cache. For high-speed data operations another version of this algorithm may be required: high-speed transfers only occur when cache lines are read from and written back to main memory, and these are much more rapid when handled by the cache controller than when the test is either isolated on-chip or forced to make individual external memory accesses. A functional speedup of 8× was observed when running entirely out of the cache compared to running entirely to off-chip memory. If the caches are enabled and memory accesses exercise both the cache and main memory, the speed is not expected to be significantly reduced; more memory management resources and data buffer bandwidth would therefore be used during testing, increasing the observed SEE sensitivity.
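A sketch of the write/read dwell "plug-in" is shown below. Here test_buf and report_error are hypothetical, and whether the buffer is off-chip (version A) or cache-resident (versions B-D) is determined by where the buffer is placed and how the L1 cache is configured.

    /* Sketch of the write/read operations executed during the wait period T. */
    #include <stdint.h>

    #define TEST_WORDS 4096u                       /* illustrative buffer size */
    extern volatile uint32_t *test_buf;            /* off-chip DRAM or cache-resident */
    extern void report_error(uint32_t index, uint32_t got, uint32_t want);

    void write_read_dwell(uint32_t pattern)
    {
        for (uint32_t i = 0; i < TEST_WORDS; i++)  /* write pass */
            test_buf[i] = pattern;

        for (uint32_t i = 0; i < TEST_WORDS; i++) {  /* read pass */
            uint32_t got = test_buf[i];
            if (got != pattern)                    /* errors are reported as observed */
                report_error(i, got, pattern);
        }
    }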

3.3.4 L2 Error Map Extraction

The L2 test and data extraction algorithms were modified to enable bit-level detail on observed errors. This was accomplished by adding a report option where the L2 could be dumped directly, while suppressing any cache region with no errors. By doing this we were able to obtain a list of all bit upsets in the L2 cache without using a lot of test I/O bandwidth. This information can be used for cluster analysis.
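The dump format can be pictured with the sketch below, which prints one line per 32-byte cache region containing at least one mismatching nibble and suppresses everything else. The exact output format of the real tool is not documented here; this version simply mimics the style of the listing in section 5.2, and the base address is an assumption.

    /* Sketch of the L2 error-map dump with error-free regions suppressed. */
    #include <stdint.h>
    #include <stdio.h>

    #define L2_BASE   0xC0000000u                  /* assumed cache-as-RAM base */
    #define L2_BYTES  (512u * 1024u)
    #define LINE_B    32u                          /* 32-byte regions, 64 nibbles each */

    void dump_l2_error_map(uint8_t expect)         /* 0x00 or 0xFF global pattern */
    {
        volatile uint8_t *l2 = (volatile uint8_t *)L2_BASE;
        for (uint32_t off = 0; off < L2_BYTES; off += LINE_B) {
            int dirty = 0;
            for (uint32_t b = 0; b < LINE_B; b++)
                if (l2[off + b] != expect) { dirty = 1; break; }
            if (!dirty)
                continue;                          /* suppress error-free regions */

            printf(">0x%08x]", L2_BASE + off);
            for (uint32_t b = 0; b < LINE_B; b++) {
                uint8_t diff = l2[off + b] ^ expect;
                /* print the flipped bits of each nibble as a hex digit, '.' if clean */
                printf("%c", (diff >> 4)  ? "0123456789abcdef"[diff >> 4]  : '.');
                printf("%c", (diff & 0xF) ? "0123456789abcdef"[diff & 0xF] : '.');
            }
            printf(">\n");
        }
    }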

4.0 Test Plan

The test plan for this work was straightforward. We had a matrix of beams for exposure, and a set of test algorithms to use under each beam.

4.1 DUTs

All testing was performed on the DUT labeled "7." The results (see section 5) are consistent with testing of the earlier DUT labeled "B" from [2], suggesting that little additional value would have been gained by testing more DUTs. This DUT is shown in Figure 4.1-1; the markings visible are from before the heat spreader was removed.


Figure 4.1-1. P2020 DUT in P2020RDB-7. The image shows the markings before removal of the heat spreader.

4.2 Beams Used

The exposure list is given in Table 4.2-1.

Table 4.2-1. Ions used for testing of the P2020.

  Ion (E in MeV)   LETinc (MeV-cm²/mg)   Angle (deg)   LETeff (MeV-cm²/mg)   Fluence (ions/cm²)
  Ne-40            1.5                   0             1.5                   2.23E+06
  Ne-40            4                     0             4                     2.91E+06
  Ne-40            2.2                   60            4.4                   1.55E+06
  Ne-40            8                     60            8                     2.21E+06
  Ar-40            7.6                   0             7.6                   1.04E+07
  Ar-40            13.9                  0             13.9                  9.83E+06

4.3 Test Algorithms

The test algorithms used with each beam include:

1. Basic dual-core test—for collecting static cache, register, and L2 RAM data. Note that many of these data are collected under the other test algorithms as well.

2. Dual-core write/read test A—for examining errors that come out of intensive write/read operations.

3. Dual-core write/read test B-D—versions B through D of this test are constrained to the on-chip cache for write/read operations.

An approximately 50/50 split of the global data pattern (1s or 0s) was used for testing.


4.4 Basic Test Procedure

Testing was performed by having the DUT test itself. The test code used is discussed in section 3.3. Since the goals were primarily proof-of-concept for applying the test approach to heavy ions, the test software was originally designed to test only one core and to sensitize the microprocessor register and cache bits for static single event upset (SEU).

The test mode was executed as follows:

1. Power up the DUT.
2. Load the test code from the on-board flash.
3. Instruct U-Boot to begin core 1 execution.
4. Start the test code on core 0.
5. Configure the test code.
6. Start test execution and wait for initialization (usually by noting the completion of a dwell period, which is indicated by printing a "." to the serial port).
7. Perform beam exposure.
8. End exposure if either core stops functioning as expected, or sufficient fluence is achieved.
9. Terminate test code with input through the serial port.
10. Collect and report on register, L1, and L2 static upset counts.
11. Request additional data dumps if desired.

5.0 Test Results

In this section we present the test results for the five main SEE categories of interest to this task: L1 cache, L2 cache, processor registers, memory interface, and dual core operations. Also included in these results are error maps, discussed in the test software section 3.3.

5.1 L1 Cache Results

The L1 data caches in each of the two e500 cores were tested for upsets. As in the earlier work, no individual upsets were observed, but the data cache lines were observed to become invalid and be refetched from main memory. The caches were tested for both 1 to 0 and 0 to 1 upsets. The results are shown in Figure 5.1-1. Here it is seen that data simultaneously collected from cores 0 and 1 show no significant difference. Similarly, there is no evidence that the data pattern 0s or 1s showed any bias in SEE sensitivity.


Figure 5.1-1. L1 data cache results for cores 0 and 1. The cores perform similarly to each other and show no bias toward 1 to 0 or 0 to 1 errors.

The results of this cache testing are a threshold of approximately 1 MeV-cm²/mg and a saturated cross section of approximately 1×10⁻⁹ cm²/bit.
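For reference, the per-bit cross sections plotted here follow from the run data in the conventional way (this reduction is assumed; the report does not state it explicitly):

    σ_bit = N_upsets / (Φ_eff × N_bits),   with   Φ_eff = Φ · cos θ   and   LET_eff = LET_inc / cos θ,

where Φ is the measured fluence, θ the beam angle of incidence, and N_bits the number of sensitive bits exercised (for example, approximately 278,528 bits per L1 data cache).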

5.2 L2 Cache Results and Error Maps

As in earlier testing, the L2 cache was tested with the cache placed into RAM mode and with error correction disabled so that it could be directly examined for upsets. The overall results for the L2 cache are shown in Figure 5.2-1. Note that there is disagreement in the center of the graph, with the 2nd, 3rd, and 5th LET points from the left showing the greatest disagreement. These are the LETs achieved with Ne ions at an angle or with degraded beams, indicating a possible range impact and/or differences in the amount of packaging material crossed to reach each region of the device. The fourth set of points from the left was taken with normal-incidence, undegraded Ar. However, the degraded Ar beam shows good agreement at LET = 13.6 MeV-cm²/mg, so the interpretation is not clear.


Figure 5.2-1. L2 cache SEU results for 1 to 0 and 0 to 1 upsets compared to L1. Note that the results differ the most at the LETs achieved with Ne at an angle or with a degrader (LETs 2, 3, and 5 from the left).

The L2 cache SEU address information was collected to enable cluster analysis. No clustering algorithm has yet been constructed for automated analysis, but we have cursorily examined the data and found a small amount of correlation that suggests higher LETs will give more complex clustering. An example of the address and bit data collected from a run with 37 L2 upsets is given below. >0xc0000000]1234567812345678................................................> >0xc0005bc0].....................................1..........................> >0xc00097c0]...............4................................................> >0xc000d960]......................4.........................................> >0xc000da40]...........................................4....................> >0xc0014200]..........................8.....................................> >0xc001ad00].........8......................................................> >0xc0020e40]......................................2.........................> >0xc0025180]..........8.....................................................> >0xc00259a0].......................4........................................> >0xc00271c0]........................................1.......................> >0xc002d440]..................2.............................................> >0xc0030040].......................................1........................> >0xc0031320]........................1.......................................> >0xc0031d20].........................4......................................> >0xc0038700]...................8............................................> >0xc003e580]......................................4.........................> >0xc0043120].........................................................4......> >0xc0049180]............................................................2...> >0xc0049e60]2...............................................................> >0xc004c800]..........................................1.....................> >0xc004ca20].....................4..........................................> >0xc004e4e0].........2......................................................> >0xc0055540].........1......................................................> >0xc0055560].........1......................................................> >0xc0056ec0].........................................8......................> >0xc0061cc0].........2......................................................>


>0xc00633c0]....................................................2...........> >0xc0063ea0]....................................................8...........> >0xc0068640].................................1..............................> >0xc006adc0].............................2..................................> >0xc006afe0]....................................................8...........> >0xc006f4c0]...........1....................................................> >0xc0070c80].........................................8......................> >0xc0075460].....................................2..........................> >0xc0079800]........................1.......................................> >0xc007abc0]......................................................1.........> >0xc007cc40].........................................................8......>

It can be seen in the address and bit data that there might be a double-bit error at 0xc0055540 and 0xc0055560. The probability of two back-to-back cache lines having the same bit in error as a random occurrence is 1 in 256 (0.4%) for back-to-back lines with upsets. The probability of having two lines in a row with errors, given 37 SBUs and 16,384 32-byte cache lines (as reported), is approximately one minus the probability that the 37 upset cache lines are all at least one line apart. The latter is approximately ∏_{i=1}^{36} (1 − 3i/16384) ≈ 0.885 (ignoring edge cases where a line has only one neighbor, and noting that two errors on the same cache line would also be counted as suspicious). This indicates the probability of having two adjacent lines with an upset is about 11.5%, i.e., less than 12%, for 37 SBUs. Thus, the overall chance of observing the paired upsets in the above dataset is approximately 0.05%.
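The arithmetic above can be verified with a short standalone calculation (not part of the test software):

    /* Check of the adjacency probability: 37 upsets over 16,384 cache lines, with
     * each new upset "suspicious" if it lands on an already-hit line or one of its
     * two neighbors (~3 forbidden lines per previous hit). */
    #include <stdio.h>

    int main(void)
    {
        const int lines = 16384, upsets = 37;
        double p_apart = 1.0;
        for (int i = 1; i < upsets; i++)
            p_apart *= 1.0 - 3.0 * i / lines;       /* all upsets at least a line apart */

        double p_adjacent = 1.0 - p_apart;           /* ~0.115, i.e., < 12% */
        double p_same_bit = p_adjacent / 256.0;      /* same one of 256 bits in both lines */
        printf("P(apart) = %.3f, P(adjacent) = %.3f, P(pair, same bit) = %.5f\n",
               p_apart, p_adjacent, p_same_bit);     /* ~0.885, ~0.115, ~0.00045 (~0.05%) */
        return 0;
    }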

5.3 Register Test Results

During testing the general test algorithm sensitizes 22 of the general purpose registers (GPRs) in each e500 core, for a total of 1,408 bits. Few bit upsets were observed in these registers during testing (primarily due to the limited number of test bits). The results are presented in Figure 5.3-1.

Figure 5.3-1. Register SBU results for GPRs.


Register clobbers were also observed (where the contents of a register change by more than a few bits, and appear to have been modified by processor action). These are believed to be due to upsets in control operations or instruction execution, leading to modified registers. It is not believed that register clobbers are due to SEU in the register showing the upset.

5.4 Dual Core Write/Read Test

The final test type was the write/read test run on both cores, to and from main memory or their L1 data caches. The only observed events appeared to be due to upsets of registers involved in the execution of the test code. No clear evidence of upsets to the memory management operations was observed. Based on the exposure at the highest LET, the limit of the device sensitivity for this test is ~1×10⁻⁷ cm²/device. Because of duty-cycle limitations, the effective sensitivity of the test may actually be lower (i.e., the test algorithm may exercise only 10% or less of the resources available in the memory management unit).

5.5 Core Crashes

During testing, we observed the individual e500 cores of the P2020 crash. A total of eight core crashes occurred over a total fluence of 1.64×10⁷/cm², spread across seven different runs. The one run in which both cores crashed carried 1.3×10⁵/cm² of the total fluence; assuming independent events, the chance of two crashes falling within about 1% of the total fluence is about 8%, which is not high but is not prohibitively low. Thus, we do not have definitive evidence that any of the core crashes affected the operation of the other processor core. Crashes were observed at the lowest and highest test LETs, and the difference in cross section was not large, though the measurements have limited statistics. The threshold is apparently below LET = 1 MeV-cm²/mg, and the saturated cross section is approximately 2×10⁻⁶ cm²/device. The cross section plot for core crashes is shown in Figure 5.5-1. Of the eight crashes, six were on core 1 and two were on core 0. Given the small sample, this distribution is not a clear deviation from an equal chance of crashes occurring on either core.


Figure 5.5-1. Cross section for core crashes. A total of eight crashes were observed from LET = 1.4 to 13.6 MeV-cm²/mg.

6.0 Future Work

In this section we discuss the recommended additional testing of the P2020, and associated P5020 SOCs, under this NEPP task.

6.1 Core-to-Core Transfer of Data

Core-to-core communication is a critical part of the operation of a parallel computer system, regardless of whether it is a message passing interface (MPI) system or a memory coherence system. A key component to understanding SOC devices is the inherent sensitivity of hypervisors (both in hardware and software). By analyzing the P2020’s MPI and memory coherence capabilities, we can better assess the operation of the P2020 in a parallel computing environment.

6.1.1 Message Passing

Although the P2020 primarily supports memory coherence as the core-to-core communications approach, it may be possible to configure it such that the core complex bus (CCB) or the e500 coherency module (ECM) supports direct communications between cores. This approach should only be attempted if a viable example of using this approach for hypervisor operations is found (this is not expected).

6.1.2 Memory Coherence

Shared-memory multiprocessors (SMPs) handle parallelism largely through coherency of the memory architecture [6]. A very important aspect of the SEE sensitivity of a multiprocessor such as the P2020 is therefore the SEE sensitivity of the memory coherence operations. A good way to test this in the future is to examine the reliability of data transfer between the two cores of the P2020 using cache coherency. One way to achieve this would be to designate a region of memory that both cores cache locally and that maps to the same physical range in both cores, so that a write by one core invalidates the cached copy in the other core, and a subsequent read by the other core fetches the newly written data. In this way data can be transferred. Determining the SEE rate for core-to-core data transfer, and establishing whether it is above or below the level of sensitivity set by the intrinsic L1 parity event rate, would be valuable. Similarly, if the L1 caches are operated in write-through mode, an alternate sensitivity could be determined. (For testing in a high-reliability configuration, it may be necessary to map the L2 cache appropriately to support local caching of write-through data, so that off-chip writes are not always performed as a result of memory write operations when the L1 caches are in write-through mode.)
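A conceptual sketch of such a coherence-based transfer test is given below. It is illustrative only: the shared region, its mapping, and the synchronization are assumptions, and a real implementation would need proper memory barriers on the e500.

    /* Conceptual sketch of core-to-core transfer through a coherent shared region:
     * core 0 writes data and bumps a sequence counter; the coherency hardware
     * invalidates core 1's cached copy, so core 1's next reads see the new data. */
    #include <stdint.h>

    #define BUF_WORDS 256u

    struct shared_region {
        volatile uint32_t data[BUF_WORDS];
        volatile uint32_t sequence;          /* incremented after each complete write */
    };

    extern struct shared_region *shared;     /* assumed mapped cacheable on both cores */

    void producer_core0(uint32_t pattern)
    {
        for (uint32_t i = 0; i < BUF_WORDS; i++)
            shared->data[i] = pattern + i;
        shared->sequence++;                  /* a real test would add a sync/barrier here */
    }

    int consumer_core1(uint32_t pattern, uint32_t last_seq)
    {
        while (shared->sequence == last_seq)
            ;                                /* wait for a new transfer */
        int errors = 0;
        for (uint32_t i = 0; i < BUF_WORDS; i++)
            if (shared->data[i] != pattern + i)
                errors++;                    /* possible SEE-induced corruption */
        return errors;
    }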

6.2 P5020 Efforts

P5020 testing was not performed during this test because of problems with DUT preparation. The P5020, and in particular the e5500 core, is of considerable interest to aerospace users because of the licensing agreement between BAE Systems and Freescale for the e5500 64-bit core and associated QorIQ elements [5].

Future testing of the P5020 should be similar to that of the P2020, except that it is known that the P5020 is considerably more difficult to keep under thermal control. Hence, testing may have to be targeted at individual architectural elements that can be exposed, while maintaining good thermal management on the rest of the DUT.

6.3 Test of Memory Interface

The testing presented here provides results of limited applicability to the actual memory interface of the P2020. An upgraded test that achieves higher throughput and more memory transfer operations is desired for a more complete memory interface test. The main change to the existing write/read test is that it should perform high-speed memory operations, while the current test either is isolated to a core's L1 cache or performs direct memory accesses off-chip. The former case occurs in the L1-enabled test, which performs cache accesses within the designated size of the L1 cache. The latter case is inherently very slow, as code execution must wait for reads and writes to complete to and from main memory.

Although the two existing forms of the test algorithm are useful for testing cache operations and direct memory access, they do not require significant throughput on the memory controller. Throughput can be achieved by allowing the cache controller to perform background memory operations while test code operates on memory that is automatically read into the cache and written to main memory using accelerated hardware operations.

6.4 Test of SERDES, Ethernet, or Other I/O

A high speed data transfer interface should be tested as an example of SEE testing of high speed I/O.

7.0 References

[1] Guertin, S., "P2020 Proton Test Report," JPL Report, March 24, 2011.

[2] Guertin, S., "P2020 and P5020 Heavy Ion Test Report," JPL Report, August 27, 2011.

[3] "P2020 Reference Manual," Freescale Semiconductor, March 2011.


[4] Logan, J., “Migrating PowerQUICC III Processors to QorIQ Platforms,” Freescale Semiconductor, June 2010.

[5] “Freescale’s Power Architecture® Technology Licensed for Space Missions,” http://www.baesystems.com/article/BAES_043443/freescales-power-architecture-technology-licensed-for-space-missions.

[6] Hennessy, J. L., and D. A. Patterson, Computer Architecture: A Quantitative Approach, Fifth Edition, Morgan Kaufmann, Waltham, MA, 2011.

8.0 Acknowledgement

This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the National Aeronautics and Space Administration Electronic Parts and Packaging (NEPP) Program.

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

Copyright 2013. California Institute of Technology. Government sponsorship acknowledged.


9.0 Test Log for P2020 Exposures

