Final Presentation: Neural Network Implementation On FPGA
Supervisor: Chen Koren
Maria Nemets 309326767, Maxim Zavodchik 310623772

Posted: 21-Dec-2015

Page 1: Final Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets 309326767 Maxim Zavodchik 310623772.

Final Presentation

Neural Network Implementation On FPGA

Supervisor: Chen Koren

Maria Nemets 309326767
Maxim Zavodchik 310623772

Page 2

Project Objectives

Implementing a neural network on FPGA
Creating a modular design
Implementing in software (Matlab)
Creating a PC interface
Performance analysis:
  Area on chip
  Interconnections
  Speed vs. software implementation
  Frequency
  Cost

Page 3

Project’s Part A Objectives

Implementing a single neuron in VHDL.
Researching and integrating into the EDK environment and running the design on FPGA.
Implementing the feed-forward calculation.
Implementing the learning in Matlab.
Building a graphical user interface for friendly communication with the system.

Page 4

Testing Application

A single neuron can separate two regions by a linear boundary.
A multi-layered network is needed to recognize an image.
Implementing the AND/OR functions:

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted twice, each with a linear separating line - AND Function, OR Function]
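The AND/OR separation above can be sketched with a hard-threshold neuron. The weights and biases below are illustrative hand-picked values, not the ones the project learns in Matlab:

```python
def neuron(x, w, bias):
    """Single hard-threshold neuron: fires when the weighted sum crosses zero."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + bias
    return 1 if v > 0 else 0

# Truth tables over the four points (0,0), (0,1), (1,0), (1,1):
and_out = [neuron((a, b), [1, 1], -1.5) for a in (0, 1) for b in (0, 1)]
or_out = [neuron((a, b), [1, 1], -0.5) for a in (0, 1) for b in (0, 1)]
```

Both functions need only a single linear boundary, which is why one neuron suffices for this test application.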

Page 5

Learning in Matlab

Implementing a NN using the logsig() activation function and the 'traingdx' training algorithm.
Providing a truth table for the binary functions AND/OR as a training set.

  % Build the NN
  temp = size(inputs_vec);
  in_range = zeros(temp(1), 2);
  in_range(:, 2) = 1;
  net = newff(in_range, [1], {'logsig'}, 'traingdx');

  % Train the NN
  net.trainParam.epochs = epochs;
  net.trainParam.goal = error;
  net = train(net, inputs_vec, target_vec);

Sigmoid function: φ(v) = 1 / (1 + e^(-v))
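As a quick sketch, the activation can be computed directly; `logsig` here mirrors Matlab's function of the same name:

```python
import math

def logsig(v):
    """The logsig activation: phi(v) = 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

# phi(0) = 0.5; note that phi(-v) = 1 - phi(v), which is why the
# decision thresholds used later (0.3789 and 0.6211) sum to 1.
```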

Page 6

Hardware Description

Xilinx ML310 Development Board
  RS232 standard - FPGA UART
    Transmission rate is 115,200 bits/sec optimally
  Virtex-II Pro XC2VP30 FPGA
    2 PowerPC 405 cores - 300+ MHz
    2,448 Kbits of BRAM
    136 18x18-bit multipliers
    30,816 logic cells
    Up to 111,232 internal registers
    Up to 111,232 LUTs
  256 MB DDR DIMM

Page 7

System Interface

Inputs:
  Binary number (up to 1024 bits)
  Weights - 13 bits wide
    Fixed-point representation:
      1 sign bit
      4 integer bits
      8 fraction bits
  Sigmoid function values - 8 bits wide

Outputs:
  Two bits - the neuron’s binary result on the input number, or failure detection.
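A sketch of the weight format: with 1 sign, 4 integer and 8 fraction bits, the step is 2^-8 and the range is [-16, 16). The rounding and saturation policy below is an assumption, since the slides do not specify it:

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS               # 2^8 = 256 quantization steps per unit

def to_fixed(x):
    """Encode a real weight into the 13-bit signed fixed-point word."""
    q = round(x * SCALE)
    lo, hi = -(1 << 12), (1 << 12) - 1   # 13-bit two's-complement range
    return max(lo, min(hi, q))           # saturate out-of-range values

def from_fixed(q):
    """Decode the 13-bit word back to a real value."""
    return q / SCALE
```

For example, the bias weight -3.5 used later in the simulation is exactly representable as -896 in this format.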

Page 8

System Description

[Block diagram: a PowerPC core on the PLB controls the Single Neuron and its Weights, Input and Sigmoid memories; a PLB2OPB bridge connects the PLB to the OPB, where the UART sits]

Page 9

EDK Integration

The PPC writes the BRAMs and controls the Single Neuron through the PLB.
The Single Neuron is connected to the PLB as a User Core (IPIF).
Memories:
  PORT1: connected to the PLB as IPIF
  PORT2: connected to the Single Neuron directly
The UART (serial port) is connected to the OPB.

Page 10

Control Flow

FSM states: IDLE, Get Weights, Get Sigmoid, Load decision values, Get Inputs, Load input number, Load bias, Wait for loading bias, Calculate v = Σ_{j=1..m} w_j·x_j + w_0, Calculate φ(·), Calculate output bits, Send the result to the user.

Page 11

Architecture – Single Neuron

Multiplier, 1x13 bits
Accumulator, 13 bits wide
FSM controller
Bias/Min/Max/Inputs_num registers
Comparator:

  COMP(z) = "00"  if z < min_decision_val
            "01"  if z > max_decision_val
            "10"  else

[Block diagram: X[i] and W[i] (with the bias weight multiplexed in) feed the MULT; the accumulator produces v; the sigmoid memory returns logsig(v); the comparator checks it against the min/max decision values and registered bias/max/min/inputs_num inputs to drive the output Y]
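The comparator's mapping can be written out directly. The strictness of the inequalities is an assumption, since the slide's formula is only partially legible:

```python
def comp(z, min_decision_val, max_decision_val):
    """Map the sigmoid output z to the neuron's two result bits."""
    if z < min_decision_val:
        return "00"   # confidently a logical 0
    if z > max_decision_val:
        return "01"   # confidently a logical 1
    return "10"       # between the thresholds: failure detection
```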

Page 12

Architecture – Memories (1)

2-port BRAMs with separate clocks.
Special-sized BRAMs generated by the Xilinx Core Generator.
VHDL SRAM controller wrapping.

Inputs memory: up to 1024 binary bits (1 Kbyte)

Page 13

Architecture – Memories (2)

Weights memory (~1.6 Kbyte):
  1024 x 13 bits = 13,312 bits = 1,664 bytes
Bias weight:
  1 register for the output layer (13 bits wide)
Sigmoid memory (2 Kbyte):
  Values out of the range [-4, 4] are mapped to 0 or 1
  Memory block quantizing sigmoid values:
    11-bit input representing values in [-4, 4]
    8-bit output representing values in [0, 1]
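A sketch of how such a sigmoid table could be generated: 2^11 = 2048 addresses spanning [-4, 4), each holding an 8-bit sample, for 2 Kbyte in total. The exact quantization, round(φ·256) saturated at 255, is an assumption chosen because it reproduces the 9F = 0.6211 sample on the simulation slides:

```python
import math

def build_sigmoid_lut():
    """2^11-entry, 8-bit-wide sigmoid table, as described on the slide."""
    lut = []
    step = 8.0 / (1 << 11)                        # [-4, 4) over 2048 addresses
    for addr in range(1 << 11):
        v = -4.0 + addr * step
        phi = 1.0 / (1.0 + math.exp(-v))
        lut.append(min(255, round(phi * 256)))    # 8-bit output word
    return lut
```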

Page 14

Simulation (1)

Single Neuron VHDL simulation.
Application: AND function with 4 inputs
  Minimum decision value: 0.3789
  Maximum decision value: 0.6211
3 pipeline stages: Memories → Mult → Accumulator

Page 15

Simulation (2)

Result:
  Sigmoid answer: 9F = 10011111 = 0.6211
  The “ready” signal is asserted when done.
  Latency: 14 + |Inputs| - 1 [clocks]

  v = Σ_{i=1..4} x_i·w_i + w_0 = 4·1 + (-3.5) = 0.5
  φ(v) = φ(0.5) = 0.6225 ≈ 0.6211
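The slide's arithmetic can be checked end to end. This recomputes the example in plain Python (4 inputs of 1, unit weights, bias weight -3.5), assuming round(φ·256) as the 8-bit quantization:

```python
import math

# Recompute the simulation example: AND over 4 inputs, all x_i = 1.
x = [1, 1, 1, 1]
w = [1.0, 1.0, 1.0, 1.0]
w0 = -3.5
v = sum(xi * wi for xi, wi in zip(x, w)) + w0   # = 4*1 - 3.5 = 0.5
phi = 1.0 / (1.0 + math.exp(-v))                # logsig(0.5), about 0.6225
q = min(255, round(phi * 256))                  # 8-bit word: 159 = 0x9F
readback = q / 256.0                            # about 0.6211, the slide's value
```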

Page 16

Software

The PPC’s program controls the whole flow.
The PPC writes control words and reads result words on the PLB as 64 bits of data.

Control/Result word structure:

Memories (W/X):     [0: USER_wr_a][1÷10: USER_addr_a][11÷24: USER_dout_a][25÷63: "0"]
Memories (Sigmoid): [0: USER_wr_a][1÷11: USER_addr_a][12÷19: USER_dout_a][20÷63: "0"]
Single Neuron, from CPU: [0: load_w0][1: rst][2: start][3: w0_ready][4: load_min_val][5: load_max_val][6: load_inputs_num][7÷19: w0/min_val/max_val/inputs_number][20÷63: "0"]
Single Neuron, to CPU:   [0÷1: y][2: ready][3: w0_rd][4÷63: "0"]
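Packing the Single Neuron control word can be sketched as bit arithmetic. The field positions follow the slide, but this assumes bit 0 is the least-significant bit of the 64-bit word; PowerPC/PLB conventions often number bits from the MSB, so the real layout may be mirrored. The function name is illustrative:

```python
def make_ctrl_word(load_w0=0, rst=0, start=0, w0_ready=0,
                   load_min_val=0, load_max_val=0,
                   load_inputs_num=0, data=0):
    """Pack the 64-bit Single Neuron control word: flag bits 0..6,
    then a 13-bit data field (w0/min_val/max_val/inputs_number) at bits 7..19."""
    return (load_w0 | (rst << 1) | (start << 2) | (w0_ready << 3) |
            (load_min_val << 4) | (load_max_val << 5) |
            (load_inputs_num << 6) | ((data & 0x1FFF) << 7))
```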

Page 17

Graphical User Interface

Building a graphical user interface for friendly communication between the user and the system.
Implemented in Matlab 6.1.
The GUI enables:
  Choosing a function to be implemented
  Defining the maximum error, number of epochs and decision values
  Choosing the length of the binary input vector
  Simulating the neuron for an input vector

Page 18

Project’s Part B Objectives

Creating a multi-layered network to classify a digit.
Implementing a modular system:
  The number of neurons in the hidden layer varies from 2 to 10.
  The number of sub-networks varies.

Page 19

Project’s Part B Objectives (Cont.)

Implementing a parallel system:
  Dividing the complex fully-connected network into sub-networks.
  10 sub-networks running concurrently.
  Up to 10 neurons run concurrently in each sub-network.
  Up to 5 inputs are calculated together, depending on the number of neurons in the hidden layer.
  Parallel calculation of the output layer.

Page 20