Dynamically Reconfigurable Neuron Architecture for the Implementation of Self-
Organizing Learning Array
Janusz A. Starzyk, Yongtao Guo, and Zhineng Zhu
School of Electrical Engineering & Computer Science
Ohio University, Athens, OH 45701
{starzyk, gyt, zhineng}@bobcat.ent.ohiou.edu
Abstract: In this paper, we describe a new dynamically reconfigurable neuron hardware architecture based on a modified Xilinx PicoBlaze microcontroller and the self-organizing learning array (SOLAR) algorithm reported earlier. The architecture aims to use hundreds of traditional reconfigurable field programmable gate arrays (FPGAs) to build the SOLAR learning machine. SOLAR has many advantages over traditional neural network hardware implementations. Neurons are optimized for area and speed, and the whole system is dynamically self-reconfigurable at runtime. The system architecture is expandable to a large multiple-chip system.
1. Introduction
The immense computing power of the brain is believed to be the result of the parallel and distributed computing performed by approximately $10^{11}$ neurons, each with an average of $10^3$-$10^4$ connections. To prototype networks that mimic biological neurons in hardware, FPGA technology can be used, with its flexible and programmable interconnect and computing resources.
During the last decade, a number of attempts have been made to apply reconfigurable hardware to neural networks [1,2]. However, these implementations are
focused on densely interconnected, weight dominated
architectures which are not easily expandable to multiple
FPGAs with identical cell architecture. Due to the
additional hardware cost of adjusting interconnection
weights of traditional neural networks, often their
connections and functionalities are fixed and predefined by
off-line simulation. The gradient method is difficult to implement in hardware, and therefore it is hard to expand such networks to a large-scale system. One exception is FPGA-based arrays of identical cells implementing evolutionary computing and genetic algorithms [3,4,5]. New FPGA structures adopting pulse density [6], pulse position, and time-difference simultaneous perturbation have been presented to avoid hardware gradient evaluation in NNs. Also,
recently, cellular neural networks used in image
processing [7] and their hardware implementation have
attracted a lot of attention.
In this paper, we present a dynamically reconfigurable
(i.e. during runtime), data-based and expandable neuron
architecture to implement a self-organizing learning array
(SOLAR). The neuron’s functionality is represented in the
form of flexible connections and organization of an array
of evolvable signal processing blocks. Each neuron can
implement various arithmetic and logic functions. The
neurons can self-reconfigure and use local interconnect for
maximum performance.
The rest of the paper is organized as follows. Section 2 reviews our previous work on SOLAR. Section 3 deals with the architecture of the dynamically reconfigurable neuron. Section 4 describes the routing and addressing scheme. Section 5 reports the simulation and experimental results. Section 6 presents the 3D SOLAR expansion. Finally, Section 7 concludes this paper.
2. Previous Work
Self-organization is important in artificial neural
networks (ANNs) and machine learning. Previously, a
self-organizing learning algorithm that combines neural
networks and information theory was presented in [8].
This entropy-based neural network learning algorithm was
simulated on standard benchmarks and proved to be
advantageous over many existing neural networks and
machine learning algorithms in a wide range of technical
applications [9], in particular while dealing with noisy or
incomplete data. Based on this algorithm, we proposed the SOLAR system, which differs from classical ANNs in the way it is organized and how it learns. SOLAR self-
organizes its hardware resources to perform classification
and recognition tasks. Similar to the structure of cellular
neural networks, SOLAR has a fixed array of elemental
processing units acting as single neurons, and
programmable interconnections between them. Initially,
SOLAR neurons are randomly connected to previously
generated neurons. They learn adaptively using both
primary inputs and inputs from other neurons. Controlled
by signals from other neurons, they perform basic
transformations of their input signals. A neuron's parameters and connections are reconfigured as a result of training; effectively, SOLAR's structure self-organizes, establishing its final wiring.
Earlier work mainly focused on SOLAR algorithm
simulation, and its hardware implementation was limited
to a single FPGA chip with the cooperation from software
[10]. This hardware implementation explored the
adaptability of SOLAR to FPGA implementation,
although the cellular neuron architecture was not fully explored, since a shared block memory was used for all neurons. In the SOLAR simulation, we adopted a feed-forward network structure for its stability and fast learning. The SOLAR architecture is divided into three main layers, as shown in Fig. 1: input, processing, and output layers.

Fig. 1 Basic SOLAR structure
The interconnections are randomly initialized at the
beginning of the network training. The basic building
blocks of the architecture are the small neurons, trained
using the entropy-based algorithm. The simplified learning
algorithm is introduced as follows:
Let $S = \{(\nu, c)\}$ be a training set of $N$ vectors, where $\nu \in \mathbb{R}^n$ is a feature vector and $c \in Z$ is its class label from an index set $Z$. A classifier is a mapping $C: \mathbb{R}^n \rightarrow Z$, which assigns a class label in $Z$ to each vector in $\mathbb{R}^n$. A training pair $(\nu, c) \in S$ is misclassified if $C(\nu) \neq c$. The performance measure of the classifier is the probability of error, i.e. the fraction of the training set that it misclassifies. Our goal is to minimize this misclassification probability, and it is achieved through a simple threshold search based on a maximum information index calculated from estimated subspace probability and local class probabilities. The set of training data is searched sequentially class by class to establish the optimum thresholds, which best separate the signals of various training classes. A partition quality is measured by the entropy-based information index defined by:

$$I = 1 - \frac{\Delta E}{E_{\max}}$$

where

$$\Delta E = -\sum_{s}\sum_{c} P_{sc}\log(P_{sc}) + \sum_{s} P_{s}\log(P_{s})$$

and

$$E_{\max} = -\sum_{c} P_{c}\log(P_{c}),$$

where $P_c$, $P_s$, $P_{sc}$ represent the probabilities of each class, attribute probability, and joint probability,
respectively. The summation is performed with respect to
all classes and subspaces. The information index should
be maximized to provide an optimum separation of the
input training data. The entropy-based information index is also used to evolve the neural interconnect structure and to select the neuron's transformation function.
Neuron learning is realized by maximizing the information index. If a neuron reaches a specified information index threshold, it becomes a voting neuron. Voting neurons' outputs are weighted together to solve a given problem.
3. Neuron Architecture
The hardware architecture presented in this paper is based on identical neuron modules. The Constant (K) Coded Programmable State Machine (KCPSM) [11], an 8-bit microcontroller developed by Xilinx, has been modified and embedded into the neuron module. A block-level diagram of the single-neuron architecture is shown in Fig. 2. Each neuron module contains circuits that can be reprogrammed dynamically and that execute its program without affecting other neurons' operations. Assembly code implements the neuron's functions, and its operation is controlled by locally stored parameter values.

Fig. 2 Single neuron's schematic (KCPSM-based neural controller with dual-port program memory, a 30-input MUX, and instruction<15:0>, in_port<7:0>, out_port<7:0>, port_id<7:0>, addr_bus<7:0>, and data_bus<15:0> signals)
After translation from assembly to binary code, the binary code is written into the dual-port memory by calling Peripheral Component Interconnect (PCI) functions. Dynamic reprogramming is implemented with a dual-port 256x16-bit memory: the KCPSM reads the current program on one port while the other port can be used to store the new program. The two ports of the dual-port Random Access Memory (RAM) operate independently, and programming takes place over a bus shared among all neurons. Therefore, the self-reconfiguration process affects only the current neuron. The configuration time and contents can be controlled by software outside the chip or by configuration data located in the system's distributed memory cells. The neuron inputs are obtained either from the primary inputs or from other neurons' outputs via a 30-to-1 multiplexer (MUX). The selection signals are determined by the content of the programming dual-port memory via the execution of the programming commands for the particular neuron.
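A minimal behavioral sketch of this mechanism (ours, in Python rather than VHDL) models the two independent ports: the controller keeps fetching instructions from one port while the host stores a new program through the other, so only this neuron is affected by the update:

```python
class DualPortProgramRAM:
    """Behavioral model of the 256x16-bit dual-port program memory:
    port A is read by the neural controller every cycle, while port B
    is written over the shared programming bus (e.g. by the PCI host)."""
    def __init__(self):
        self.mem = [0] * 256                    # 256 words of 16 bits

    def read_port_a(self, pc):
        return self.mem[pc & 0xFF]              # instruction fetch

    def write_port_b(self, addr, word):
        self.mem[addr & 0xFF] = word & 0xFFFF   # program update

ram = DualPortProgramRAM()
ram.write_port_b(0x00, 0x1234)                  # hypothetical opcode word
assert ram.read_port_a(0x00) == 0x1234          # visible on the read port
```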
The single-neuron architecture is expanded to multiple neurons in a single FPGA chip. In this work, we use a Nallatech board with a Xilinx Virtex XCV800 FPGA [12]. It can contain an array of up to 28 neurons organized as shown in Fig. 3 (the 480 Virtex XCV1000 FPGAs we are using to build the 3D learning machine will contain 64 neurons on each chip).

Fig. 3 Array neurons' organization
These neurons are fully interconnected via the connection bus and a 30-to-1 MUX, thus forming a local cluster of neurons. This full interconnection mimics the dense connection scheme among neighboring neurons. A 3D expansion of these chips provides the sparse connections between remote neurons, and these inter-chip connections scale linearly with the number of neurons added. Neuron connections are decided by each neuron based on its learning results. The programming contents can be dynamically updated via the configuration bus or set locally by a neuron. The configuration bus used to configure every single neuron is divided into 16-bit data, 8-bit address, and 5-bit neuron-selection buses. To demonstrate the functionality of the 28-neuron cluster, we integrated a PCI interface controller to transfer data and configuration to the neurons via the PCI bus.
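As an illustration of this bus split, the sketch below packs one configuration transaction from the three fields; the bit ordering is our assumption, since the paper specifies only the field widths:

```python
def pack_config_word(neuron_sel, addr, data):
    """Pack one configuration-bus transaction: a 5-bit neuron-selection
    field, an 8-bit program address, and a 16-bit data word (29 bits)."""
    assert 0 <= neuron_sel < 2**5 and 0 <= addr < 2**8 and 0 <= data < 2**16
    return (neuron_sel << 24) | (addr << 16) | data

# e.g. write instruction word 0x1234 to address 0x10 of neuron 5:
word = pack_config_word(neuron_sel=5, addr=0x10, data=0x1234)
```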
4. Routing Scheme
A traditional neural network system may consist of many sub-networks, which take up a lot of resources in a hardware implementation. If the resources used to implement a single sub-network can be reused to implement multiple sub-networks sequentially, then many resources will be saved. This, of course, comes at the cost of additional processing time for sequential operations of the neural network blocks; however, this is not a problem, since a neural network response can be obtained in a fraction of the time required by a practical application. For instance, in the NTSC video standard, one full frame is displayed about every 1/30 of a second; in the PAL video standard, images are displayed at 25 frames per second, leaving more than 40 ms for recognition of each frame. A practical time limit for recognition may be even larger.
Since different networks may have different connections, a new dynamically reconfigurable routing and addressing scheme is proposed. This routing and addressing scheme is suitable for general artificial neural networks, whether their connections are local or global. Fig. 4 gives an example where 4 sub-networks are replaced by a single sub-network based on the proposed routing and addressing scheme.
Fig. 4 One network replacing multiple networks
Fig. 5 gives an example of a single sub-network with
more detailed interconnect structure.
Fig. 5 Structure of a single network
It consists of an array of 4x4 neurons. The column order of the neurons in this single network is not natural: the original order of 1, 2, 3 and 4 is replaced by 1, 4, 2 and 3. This kind of routing scheme takes advantage of the reconfigurable resources in the FPGA, which can easily be configured into a long shift register.
Scheme of the routing channel
The neural network structure shown in Fig. 5 has two basic elements: the routing channel and the neuron cell. The routing channel is implemented by RAM in the Configurable Logic Blocks (CLBs) of the FPGA chip. These RAM storage cells are configured into a closed-loop shift register, shifting data among neurons. This shift register creates a routing channel carrying the neurons' input data and their operation results. Each sub-network has its unique connections, and information about these connections is stored in individual neurons. The input data for a specific neuron in a sub-network appears on the neuron's input port after a fixed number of clock cycles. The task of addressing these data is therefore equivalent to counting the clock cycles and comparing the count against the pre-stored addresses (pre-stored numbers of clock cycles). When the clock-cycle count matches one of its pre-stored values, the neuron reads data from the routing channel. So, each neuron requires some memory to store the addresses of the neurons which connect to it, as illustrated in the sketch below.
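A minimal sketch of this addressing idea (our illustration): the neuron watches the routing channel, counts clock cycles, and latches a value whenever the count matches one of its pre-stored addresses:

```python
def latch_inputs(channel_stream, stored_addresses):
    """channel_stream yields the word on the routing channel at each
    clock cycle; stored_addresses holds the cycle counts at which this
    neuron's inputs pass by. Returns the latched input values."""
    inputs = []
    for cycle, word in enumerate(channel_stream):
        if cycle in stored_addresses:     # address match: read the channel
            inputs.append(word)
    return inputs

# a neuron whose two inputs arrive at cycles 3 and 7:
assert latch_inputs(range(10), stored_addresses={3, 7}) == [3, 7]
```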
For example, consider an original network system that consists of 2 sub-networks with a size of 4 by 5 neurons each (Fig. 6). Assume that this network system is implemented by a single sub-network, and consider the neuron in row 3 and column 4 for analysis. Assume that this neuron has 4 inputs in the first sub-network of the original network system, and 5 inputs in the second sub-network. When these 2 sub-networks are replaced by a single sub-network, this neuron will have two groups of addresses: the first group of 4 addresses storing the numbers of clock cycles corresponding to the 4 inputs in the first sub-network, and the other group of 5 addresses storing the numbers of clock cycles corresponding to the 5 inputs in the second sub-network. Output data of each neuron is handled in a similar way.
Fig. 6 Hardware reuse to implement 2 sub-networks.
In order to transfer data from one sub-network to the other, the same routing channel is used. This dynamic reconfigurability and data transfer between sub-networks can be explained as follows.
The first column of the routing channel is connected to the system's input, so the system data is applied directly to the routing channel. For the network in Fig. 5, after the first row finishes processing its input data, the routing channel corresponding to the first column of neurons stores the results of those neurons' operations. These results are shifted into the routing channel, mixed with the input signal values already in the channel, and connected to the neurons in the second column, then to the third and fourth columns. After the neurons in the fourth column have processed their input data, the routing channel connected to the fourth column stores the outputs of the first sub-network. By now, the first sub-network of the original network system has finished its operations, but at this time it is not prepared to receive the next set of input data, because this single network will change its connections to implement the original second sub-network. So, the outputs of the first sub-network, which are stored in the fourth column of the routing channel, are shifted back to the routing channel in the first column, which now represents the first column of the second sub-network of the original network system. The data is processed in the same way as described above. If the original network system has four sub-networks, then the data is iterated four times before this single sub-network receives the next set of input data (a feedback network implemented in this iterative scheme would require more than 4 iterations). Changing connections for different networks is implemented by comparing clock-cycle counts against different groups of pre-stored addresses, as sketched below.
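In code, this amounts to keeping one address group per sub-network and selecting the active group by the current pass through the shared hardware (our sketch; the address values are invented for illustration):

```python
# Stored-address groups for the row-3, column-4 neuron of the example:
address_groups = [
    {12, 20, 33, 41},       # 4 input addresses for the first sub-network
    {7, 15, 22, 38, 44},    # 5 input addresses for the second sub-network
]

def active_addresses(pass_index):
    """Changing connections between sub-networks reduces to comparing
    the cycle counter against a different group of stored addresses."""
    return address_groups[pass_index % len(address_groups)]
```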
Structure of routing channel
In SOLAR's hardware implementation, the neuron function is implemented by the PicoBlaze [11]. The PicoBlaze is a micro-core with 8-bit parallel input and output. A diagram of the routing channel connected to the input port of a PicoBlaze is given in Fig. 7.
Fig. 7 Routing channel with a microcontroller
The RAM in each CLB is configured into a shift register, with a length configurable from 1 to 16; here we choose 8. Eight such shift registers connect to the corresponding lines of the PicoBlaze's input bus to provide the input data in parallel.
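A behavioral model of this arrangement (our Python sketch, not the VHDL) treats each bus line as an 8-deep shift register whose output taps together form the parallel word seen by the PicoBlaze:

```python
from collections import deque

class RoutingChannelSlice:
    """Eight 8-deep shift registers, one per bit line; their output
    taps together form the 8-bit word on the PicoBlaze input port."""
    def __init__(self, depth=8):
        self.regs = [deque([0] * depth, maxlen=depth) for _ in range(8)]

    def shift_in(self, word):
        for bit in range(8):              # one clock: shift every bit line
            self.regs[bit].appendleft((word >> bit) & 1)

    def in_port(self):
        return sum(self.regs[bit][-1] << bit for bit in range(8))

ch = RoutingChannelSlice()
for w in [0xA5] + [0] * 7:
    ch.shift_in(w)
assert ch.in_port() == 0xA5               # the word emerges 8 clocks later
```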
In the routing scheme presented in this paper, neurons are directly connected to the local routing channels and do not need direct physical connections among themselves, whether local or global. The connections among the neurons are set up by stored addresses. For a network with random connections, the stored addresses can be generated on-line randomly. For other types of learning networks with fixed connections, the stored addresses are based on the specified network's interconnect structure. The main advantage of this kind of routing and addressing scheme is that it makes on-line learning and reconfiguration easy to implement. For on-line network reconfiguration, we do not need to change the physical connections between network components; we only need to change the pre-stored communication channel addresses according to the learning rules.
At present, the implemented neural network consists of 28 fully interconnected neurons on a single chip. In a neural network system composed of many chips, some neurons in one chip need to be connected to neurons in other chips. In order to transfer data from one chip to the others, the same routing-channel concept as shown in Fig. 5 is used. This routing channel is laid around the perimeter of each chip. Channels in different chips can be connected serially to form a long chain of shift registers. Neurons in one chip connect to this routing channel and output their operation results; these results are shifted along the channel and picked up by neurons in other chips. This way, remote connections between neurons in different chips are set up. If neurons are connected to neurons in the same neighboring row or column of a sub-network, then those neurons can be directly connected without passing through the routing channel (as illustrated in Fig. 3). One advantage of this scheme is that, with the introduction of the routing channel, the neurons can be connected in 3 dimensions: the neurons output their results to the routing channel, and those results can then be shifted to the routing channels in different layers of the neural system.
5. Simulation and Results
In this section, results from prototyping a simple SOLAR architecture on a single Virtex FPGA chip are discussed. The neurons' self-organizing learning process can be configured during chip programming or dynamically updated while running. To demonstrate this process, a simple example is given to illustrate the implementation steps of the SOLAR algorithm; the implementation of the whole SOLAR array is organized in a similar way. In this simulation example, six of the twenty-eight neurons are configured as shown in Fig. 8.
The initial connections are shown as solid lines. Every neuron simply adds its two inputs: for instance, neuron 1 adds inputs 1 and 2, neuron 3 adds input 2 and the output of neuron 1, neuron 5 adds the outputs of neurons 2 and 3, etc. Later, we dynamically reconfigure the connections of neurons 3 and 5, so that neuron 3 takes its inputs from the outputs of neurons 1 and 2, and neuron 5 from the outputs of neurons 3 and 4, as shown by the dotted lines. The results read out from the chip via the PCI bus are shown in the Matlab console. In the inset Matlab command console in Fig. 8, the "initial" values show the primary input values (6 and 2) and the neuron outputs for the two rows of neurons, while the "updated" values show the inputs and neuron outputs after the dynamic reconfiguration step. We developed Matlab DLLs to implement the I/O functions, including read and write. The results from the real experiment correspond to the VHDL simulation results shown in Fig. 9. In this plot, "Enable_bus", representing the neuron-selection signal, selects a particular neuron to be configured. Once the configuration process for all neurons is over, the outputs from the neurons are stable and ready to be read out. This way we can update any neuron's configuration information without affecting the other neurons. In this example, we only update the connections of neurons 3 and 4, represented by "Enable_bus" contents 4 and 5. The simulation results after updating correspond to the real experiment read back from the hardware as the "updated" values in Fig. 8.
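For reference, the behavior of this example can be reproduced in software. The sketch below assumes every neuron outputs the sum of its two inputs, with primary inputs 6 and 2 as in Fig. 8; the connections of neurons 2, 4, and 6, which the text leaves unstated, are our assumptions:

```python
def evaluate(conn, primary):
    """Evaluate a feed-forward net where each neuron sums two sources
    (a source is a primary input name or an earlier neuron's id)."""
    out = {}
    for n in sorted(conn):                       # feed-forward order 1..6
        a, b = (out.get(s, primary.get(s)) for s in conn[n])
        out[n] = a + b
    return out

primary = {"in1": 6, "in2": 2}
initial = {1: ("in1", "in2"), 2: ("in1", "in2"),   # neuron 2 assumed
           3: ("in2", 1), 4: (1, 2),              # neuron 4 assumed
           5: (2, 3), 6: (3, 4)}                  # neuron 6 assumed
updated = {**initial, 3: (1, 2), 5: (3, 4)}       # dynamic reconfiguration
print(evaluate(initial, primary))
print(evaluate(updated, primary))
```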
Fig. 9 VHDL simulation for partial configuration

Fig. 8 Experimental network and result (neurons 1, 3, 5 in the top row and 2, 4, 6 in the bottom row; the inset Matlab console lists the "initial" and "updated" neuron outputs)
The design has been described in VHDL and synthesized using Xilinx XST, with the Xilinx Alliance tools used for place and route. The implementation results are summarized in Fig. 10.
They show that this neural network architecture realizes a maximum parallel instruction throughput of 23.16 x 28 MIPS (about 648 MIPS) with 28 fully connected neurons. The neuron count is limited to 28, since there are only 28 BRAM modules on the chip, although other resources remain unused. If we further lower the neuron's program memory size, we can put more neurons on a single chip to overcome the BRAM bottleneck. Due to the parallel processing in hardware and the distributed network memory, the speed improvement of the FPGA implementation over software simulation is significant.
The mapping result is shown in Fig. 11. We can see how the neurons are distributed inside the chip after the mapping process. Every single neuron occupies a compact and concentrated logic area. The compactness follows from the structure-oriented hardware design: for instance, we adopt LUTs and a dedicated multiplexer module, in addition to the optimized PicoBlaze structure, to build every single neuron.
6. 3D SOLAR
The proposed neuron structure can be expanded to hundreds of FPGA chips to form a large 3D architecture. This architecture is based on many printed circuit boards (PCBs), with 4 interconnected Virtex XCV1000 FPGA chips on each PCB. Four PCBs are interconnected vertically through connectors on the upper and lower sides of each board to build the rack SOLAR architecture shown on the left side of Fig. 12. The rack SOLAR is expanded to a rack array, as shown on the right side of Fig. 12, through the flat connectors on the left and right sides of each board. In this rack array, one PCB in a specific rack is chosen as the first board, from which all signals are broadcast to the other boards in the whole rack array: identical configuration bits are sent to every PCB, and the FPGA chips are programmed through the JTAG port in a daisy chain or through Slave Serial mode in parallel. These broadcast signals contain the configuration signals and data signals, and they are buffered to assure signal integrity. The chips can communicate with each other through the data bus via various connectors. All of the chips have the identical neuron distribution shown in Fig. 3, and every neuron can be dynamically updated without affecting other neurons. Finally, a 3D SOLAR system containing 5x5 SOLAR racks, with close to 400 million gates, will be built as shown in Fig. 12. The 3D SOLAR learning machine will have a tremendous self-organizing learning ability based on numerous computing cells (neurons) working in parallel to process the incoming data.
Fig. 10 Implementation reports
Fig. 11 Mapping result
Fig. 12 RACK and CUBE SOLAR
7. Conclusions
This paper has described the architecture and chip implementation of an array of neurons aimed at implementing the SOLAR architecture. The self-organizing learning is based on a new machine-learning algorithm that combines knowledge from NNs and information theory. SOLAR represents a new idea in the hardware design of neural networks. It is a modular and expandable system, and it defines a new breed of dynamically reconfigurable architectures that can reconfigure themselves based on information contained in the input data. The presented architecture is a novel dynamically reconfigurable (via dual-port memory) neural network implementation that uses a simple general-purpose processor (KCPSM) architecture. Firstly, it has a regular, expandable, parallel architecture; therefore, its speed and learning abilities can be greatly improved compared to software simulation. Secondly, it has a data-driven self-organizing learning structure based on a new self-organizing learning algorithm. Furthermore, design flexibility is attained by exploiting the features of the self-reconfigurable neuron units. Finally, hardware reconfigurability is achieved in this self-organizing learning array by employing reconfigurable routing modules. According to the implementation results, this neuron architecture realizes a maximum parallel instruction throughput of 648 MIPS with 28 fully connected neurons, and system performance increases as more neurons are connected. The PicoBlaze for Virtex-II series FPGAs reaches performance levels of up to 55 MIPS; with up to 336 PicoBlaze nodes on a chip, SOLAR would reach a performance of about 18.5 GIPS on a single chip, so its parallel processing ability can be further improved. With the popular PowerPC on Virtex-II Pro FPGAs used as the main CPU, the PicoBlaze neurons can be used as slave peripherals to further improve the throughput for a system-on-a-chip implementation of neural networks. Hence it can be of practical use for embedded hardware applications in signal processing, wireless communications, multimedia systems, data networks, and so forth.
In our design, the number of neurons per chip is limited by the number of BRAMs, so the total logic utilization is comparatively low. The remaining LUT RAM will be used for the communication between neurons on other chips and to reuse the same PicoBlaze for several neurons, in order to fully utilize the hardware resources and accommodate more neurons in a single FPGA chip. In this respect, our design is device-dependent, since we need to carefully arrange the additional resources for different FPGAs, and each device type has to be individually optimized. In scaling our design to a 3D system, this will be necessary work in order to accommodate more neurons in a single selected FPGA chip.
References
[1] M. Misra, "Parallel Environment for Implementing Neural Networks," Neural Computing Surveys, Vol. 1, pp. 48-60, 1997.
[2] M. Glesner and W. Pochmuller, An Overview of Neural Networks in VLSI, Chapman & Hall, London, 1994.
[3] G. Tempesti, D. Mange, A. Stauffer, and C. Teuscher, "The BioWall: an Electronic Tissue for Prototyping Bio-Inspired Systems," Proc. NASA/DoD Conf. on Evolvable Hardware, Los Alamitos, Calif., pp. 221-230, 2002.
[4] L. Sekanina, "Towards Evolvable IP Cores for FPGAs," Proc. NASA/DoD Conf. on Evolvable Hardware, Los Alamitos, US, pp. 145-154, 2003.
[5] H. de Garis and M. Korkin, "The CAM-Brain Machine for Real Time Robot Control," Neurocomputing, Elsevier, Vol. 42, Issues 1-4, Feb. 2002.
[6] Y. Maeda and T. Tada, "FPGA Implementation of a Pulse Density Neural Network with Learning Ability Using Simultaneous Perturbation," IEEE Transactions on Neural Networks, Vol. 14, Issue 3, May 2003.
[7] T. Roska et al., "The Computational Infrastructure of Analogic CNN Computing - Part I: The CNN-UM Prototyping System," IEEE Trans. Circuits and Systems I, Vol. 46, pp. 261-268, 1999.
[8] J. A. Starzyk and T.-H. Liu, "Design of a Self-Organizing Learning Array System," IEEE Int. Symposium on Circuits and Systems (ISCAS), Bangkok, Thailand, May 2003.
[9] J. A. Starzyk and Z. Zhu, "Software Simulation of a Self-Organizing Learning Array System," Proc. 6th IASTED Int. Conf. on Artificial Intelligence & Soft Computing (ASC 2002), Canada, 2002.
[10] J. A. Starzyk and Y. Guo, "Dynamically Self-Reconfigurable Machine Learning Structure for FPGA Implementation," Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, Nevada, USA, June 2003.
[11] K. Chapman, "8-Bit Microcontroller for Virtex Devices," Xilinx XAPP213, online: http://www.xilinx.com/xapp, Oct. 2000.
[12] Nallatech Ltd., "Ballynuey 2 VIRTEX PCI Card User Guide," 1993-1999.