Extending BrainScaleS OS for BrainScaleS-2


Eric Müller∗, Christian Mauch∗, Philipp Spilger∗, Oliver Julien Breitwieser∗, Johann Klähn, David Stöckel, Timo Wunderlich and Johannes Schemmel

Kirchhoff-Institute for Physics, Ruprecht-Karls-Universität Heidelberg, Germany

∗contributed equally
Email: {mueller,cmauch,pspilger,obreitwi}@kip.uni-heidelberg.de

Abstract—BrainScaleS-2 is a mixed-signal accelerated neuromorphic system targeted for research in the fields of computational neuroscience and beyond-von-Neumann computing. To augment its flexibility, the analog neural network core is accompanied by an embedded SIMD microprocessor. The BrainScaleS Operating System (BrainScaleS OS) is a software stack designed for the user-friendly operation of the BrainScaleS architectures. We present and walk through the software-architectural enhancements that were introduced for the BrainScaleS-2 architecture. Finally, using a second-version BrainScaleS-2 prototype, we demonstrate its application in an example experiment based on spike-based expectation maximization.

I. Introduction

State-of-the-art neuromorphic architectures pose many requirements in terms of system control, data preprocessing, data exchange and data analysis. In all these areas, software is involved in satisfying these requirements. Several neuromorphic systems are directly used by individual researchers in collaborations, e.g., [1–4]. In addition, some systems are operated as experiment platforms providing access for external users [2–5].

Especially the latter calls for additional measures, such as clear and concise interfaces, resource management, runtime control and —depending on data volumes— “grid-computing”-like data processing capabilities. At the same time, usability and experiment reproducibility are crucial properties of all experiment platforms, including neuromorphic systems.

Modern software engineering techniques such as code review, continuous integration and continuous deployment can help to increase platform robustness and ensure experiment reproducibility. Long-term hardware development roadmaps and experiment collaborations draw attention to platform sustainability. Technical decisions need to be evaluated for potential future impact; containing and reducing technical debt is a key objective during planning as well as development. Regardless of whether they are software-driven simulations/emulations or physical experiments, modern experiment setups increasingly depend on these additional tools and skills in order to enable reproducible, correct and successful scientific research.

In Müller et al. [6], the authors already introduced BrainScaleS OS, the operating system for BrainScaleS-1. This article describes modifications and enhancements of the BrainScaleS OS architecture in the light of the second-generation BSS-2 hardware.

Fig. 1: BrainScaleS-2 single-chip setup. The white plastic cap (top left) covers one accelerated neuromorphic chip (right) which is bonded onto the underlying chip-carrier PCB; other PCBs connect each chip to one FPGA (invisible on the back). The host computer and FPGAs are linked via 1-Gigabit Ethernet. Each BSS-2 chip comprises 512 AdEx neurons and 512 × 256 = 131 072 synapses.

The following sections introduce the hardware substrate and its envisioned exploitation model in neuroscientific modeling, machine learning and data processing in general. Section II introduces the methods and tools we employ. In section III, we discuss aspects of hybrid operation (cf. section III-A), hardware component identification and configuration as well as runtime control. Section IV exemplifies the usage of the BrainScaleS Operating System in a simple experiment and describes larger experiments carried out in the past. We close in section V with a discussion of our work and give an overview of future developments.

A. BrainScaleS-2 — an Accelerated Mixed-Signal Neuromorphic Substrate

Figure 1 depicts a BSS-2 single-chip lab setup. The main constituent is the neuromorphic mixed-signal chip, manufactured in 65 nm CMOS, carrying 512 AdEx neurons, 512 × 256 = 131 072 plastic synapses and two embedded SIMD processors capable of fast access to the synapse matrix.


A Xilinx Kintex-7 FPGA provides the I/O interface for configuration, stimulus and recorded data. The connection between BSS-2 single-chip setups and a control cluster network is established via 1-Gigabit Ethernet.

The embedded SIMD microprocessor, the Plasticity Processing Unit (PPU), is a Power [7] architecture-based single-core microprocessor with 16 KiB SRAM. It is equipped with a custom vector unit extension, developed in this group, that provides digital integer and fixed-point arithmetic matching the parallelism of the analog core and, in particular, parallel access to the synapse array [8]. In the full-sized chip, each PPU features a vector unit with 128 byte vector width, which can operate on 1 or 2 byte entries. Programs are loaded to a PPU via memory writes to the on-chip SRAM, and execution is gated via a reset pin. The PPU supports access to off-chip memory regions, e.g., the FPGA’s DRAM, for instructions as well as scalar and vector data.

B. Performing Experiments on Neuromorphic Systems

In Müller et al. [6] we introduced the BrainScaleS Operating System for BrainScaleS-1 (BSS-1). It covers aspects of large-scale neuromorphic hardware configuration, experiment runtime control and platform operation. BSS-1 is a wafer-scale neuromorphic system that is available as an experiment platform for external researchers. Compared to single-chip lab systems, the system configuration space is large, and aspects of platform operation result in additional requirements for the software stack. In addition, non-expert usability, operational robustness and experiment reproducibility are even more important when offering systems to external users. Previous efforts [9] focused mainly on the neuroscientific community and its view on describing spiking neural networks [10, 11]. However, BrainScaleS OS has been providing access to lower-level aspects of the system to expert users [6].

We are still in the early phases of the hardware development roadmap for BSS-2: the first full-sized chip arrived in the labs in 2019 after three small-chip prototypes had been produced and evaluated since 2016. Early experiments were already implemented on the small prototypes [12–15]. We make use of the same prototype version for our example experiment in section IV. However, commissioning of the first full-sized BSS-2 chip is progressing and multi-chip systems are to be expected soon. Therefore, software requirements start to extend into regions already covered by BrainScaleS OS. In particular, additional features in terms of structured neurons, plateau potentials and the embedded SIMD processors have to be handled by the software stack. Another use case of BSS-2 is its non-spiking mode resembling an analog vector-matrix multiplier that can be used in classical deep neural network experiments.

II. Methods and Tools

We already introduced functional requirements for BrainScaleS OS enabling users to perform experiments on the BrainScaleS-1 (BSS-1) neuromorphic hardware platform and additional tasks related to platform operation, cf. Müller et al. [6]. In the work at hand, we focus on software aspects of configuration and control for expert usage. We especially aim to facilitate the process of chip commissioning. In this problem setting, users need interfaces to the hardware allowing a transparent and explicit view on configuration as well as runtime control. The implementation of the system configuration layer for BSS-1 already provides some ideas for a structured encapsulation of configuration and runtime control. However, while some aspects of the interface are sufficient in terms of software architecture, e.g., the coordinate system and the strongly-typed configuration space, others needed polishing. In particular, the description of experiment control flow was too implicit, relied on many conventions and was hard to extend or modify. In BSS-2, the API now tracks the experiment control flow explicitly, see sections III-D and III-E.

A. Methodology and Foundations

In Müller et al. [6], we explained the design and implementation methodology: open-sourcing, code review, continuous integration, continuous deployment, explicit tracking of external software dependencies and containerized software development as well as user environments.

Operating custom-built experiment hardware setups poses multiple tasks: secure and fast data exchange, encoding/decoding of hardware configuration as well as result data, and the definition of experiment protocols, i.e., a series of timed events. Taken together, performance and correctness requirements favor the usage of a compiled language. On the other hand, experiment description, input data preprocessing and result data analysis benefit from interactivity and quick turn-around cycles.

For the core software stack, we chose C++, a high-performance programming language with strong support for compile-time correctness that has evolved over the last years into a multi-paradigm language. One particularly popular language for interactive usage and scripted programming is Python. Its use in the data science, computational neuroscience and machine learning communities enlarges the potential user base.

B. Python APIs

Exposing C++-based programming interfaces to Python can be accomplished in multiple ways. There are at least two libraries providing support for deep integration of Python and C++: boost::python and pybind11 [16]. Both libraries use advanced metaprogramming techniques to simplify the syntax and to reduce the required amount of additional code. Among other things, aspects of type conversion, object lifetime and polymorphism are handled. However, both libraries still need some repetitive coding that could, in principle, be reduced by using a code generator operating on the C++ API’s abstract syntax tree. For the BSS-2 software stack, we make extensive use of Clang’s C++ AST accessor library, libclang, and generate pybind11-based wrapper code directly from C++ header files. This functionality is encapsulated in the tool genpybind [17].
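
To give an impression of the kind of wrapper code that is generated, the following minimal, hand-written pybind11 binding wraps a hypothetical ranged-value type; the class and module names are illustrative and not taken from the actual BSS-2 code base.

    #include <pybind11/pybind11.h>
    #include <stdexcept>

    namespace py = pybind11;

    // Hypothetical hardware-abstraction type: a ranged synapse weight.
    struct SynapseWeight {
        explicit SynapseWeight(int value) : value_(value) {
            if (value < 0 || value > 63)  // assumed 6-bit weight range
                throw std::out_of_range("weight out of range");
        }
        int value() const { return value_; }
    private:
        int value_;
    };

    // genpybind emits bindings of this shape from annotated C++ headers;
    // writing them by hand for every container is exactly the repetitive
    // work the generator removes.
    PYBIND11_MODULE(example_hal, m) {
        py::class_<SynapseWeight>(m, "SynapseWeight")
            .def(py::init<int>())
            .def_property_readonly("value", &SynapseWeight::value);
    }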

III. Implementation

BrainScaleS-2 is a novel compute system: on the one hand, a large fraction of the chip is dedicated to neurons, synapses and model parameter storage; on the other hand, embedded SIMD processors enable conventional computing. In typical neuroscientific experiments, these two parts run closely coupled.

A. Hybrid Operation

In this hybrid operation of the on-chip spiking neural network and the embedded SIMD processors, the latter access observables, modify parameters, perform calculations and change the input of the former to affect its dynamics. In particular, support for flexible learning rules has been one of the main design goals for the system.

The scalar unit of the embedded SIMD processors is based on the Power instruction set architecture [7], allowing reuse of existing open-source software infrastructure such as the C++ language infrastructure of the GNU project. The custom vector extensions have been designed specifically for fast synaptic access. Hence, we implemented support for the custom extension to facilitate its use in experiments.

B. Support for the Embedded Processor

Creating executable programs for the embedded processor —the plasticity processor, or PPU— is the main objective of this toolchain. It comprises a C/C++ compiler, a linker and an assembler for our specific embedded processor. These tools form a cross-compilation toolchain running on a host computer and generating executables for the PPU. The scalar part of the PPU is already supported by upstream gcc. Extensions for the PPU’s custom vector unit are described in section III-B1. In addition to the core C and C++ languages, the respective standard libraries define an extensive set of additional functionality. A subset of this functionality is appropriate for embedded programming [18]. The limited support for the C and C++ standard libraries is presented in section III-B2. Device-specific runtime and hardware-access abstraction, beyond the general-purpose support provided via libc and libstdc++, is implemented in a dedicated library presented in section III-B3. Post-compilation and runtime supply of parameters to PPU programs is provided via symbolic access to sections of the program using ELF (Executable and Linking Format) information. The supported API is presented in section III-B4. The development of complex programs often necessitates non-trivial debugging techniques. However, embedded systems often lack support for directly interacting with the system. One technique addressing this problem is remote debugging, which allows problems to be debugged on a different machine than the one on which the debugged program is running. In section III-B5 we explain the custom remote debugger implementation for the PPU.

1) Compiler Toolchain: We use the GNU compiler collection (gcc) together with the binary utilities package binutils to provide this toolchain targeting C++ as the programming language [19]. Since the scalar part of the processor complies with a subset of the embedded Power instruction set architecture 2.06, we can take advantage of the existing gcc backend implementation. We support the custom vector unit by providing the operation-code set extension to the assembler. Support for the PPU vector extensions was implemented similarly to Power’s AltiVec™ [20]. Vector-unit data entities thereby become primary types on the same level as int, with synchronization handled by the compiler transparently to the language user. This greatly benefits the conception of plasticity algorithms as it allows, e.g., for functional and object-oriented algorithm design.
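
As a rough sketch of what such vector types look like at the language level, the following uses GCC’s generic vector extension as a stand-in; the actual PPU type and intrinsic names may differ, and the 16-byte width below is chosen for illustration only (the real vector unit is 128 bytes wide).

    #include <stdint.h>

    // GCC's generic vector extension as a stand-in for the PPU's
    // custom vector-unit types.
    typedef uint8_t v16u8 __attribute__((vector_size(16)));

    // Element-wise weight decay, w -= w/4: the arithmetic is expressed
    // on whole vectors and the compiler emits vector instructions, no
    // per-lane code is needed.
    void decay_weights(v16u8* weights, int n_vectors) {
        for (int i = 0; i < n_vectors; ++i) {
            weights[i] = weights[i] - (weights[i] >> 2);
        }
    }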

2) C/C++ Standard Library Support: The programs written for the PPU are freestanding programs. As a consequence, no system calls are available which would otherwise be provided by an operating system. They are required for the C and C++17 [21] standard libraries libc and libstdc++ to work, which are typically available in a hosted environment. By supporting a minimal set of required system calls —most notably page acquisition on the heap— a slim C library, newlib [22], has been integrated. The libc then provides the basis for libstdc++ [23] support. Thereby, full standard library support (except file system handling) is available to ease general-purpose computation. This library support can be used as a basic set of tools for implementing recurring tasks in the abstraction of more complex plasticity problems. For example, usage of the STL removes the additional development effort of providing custom equivalent implementations.

3) Device-specific Support Library: In order to facilitate using special features of the processor as well as the hardware, a support library has been implemented. For instance, it provides abstracted access to a wall-clock timer, vector-unit access to the synapse array, a C and C++ runtime as well as debugging functionality for stack protection facilities. This enables reuse of functionality that is frequently needed at experiment runtime or typical for programs implementing plasticity rules (e.g., synapse access).

4) ELF-symbol Lookup Functionality: Complex plasticity kernels typically consume parameters for the initial configuration of the algorithms. These may either be supplied at compile time, introducing the need to recompile on parameter change, or after compilation via memory access to predefined regions. The latter is supported by providing means to extract ELF symbol positions after compilation to the host software. ELF (Executable and Linking Format) [24] is a file format to store binary program data alongside additional information, e.g., debug symbols or program section information. By extracting section symbol names and location information, sections of the program memory layout can be annotated with symbolic names. The user-facing API allows for map-like access, e.g., program["my_param"] = 24. This allows simple symbolic access, e.g., to algorithmic parameters or to code sections, after compilation and at runtime.
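
A minimal sketch of how such map-like parameter injection could look from the host side; PpuProgram, its members and the symbol handling are hypothetical stand-ins for the actual API, with the memory transport to the chip omitted.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical host-side handle for a compiled PPU binary: ELF
    // symbol names mapped to offsets into the program's memory image.
    struct PpuProgram {
        std::map<std::string, std::size_t> symbol_offsets;
        std::vector<std::uint8_t> memory_image;  // stand-in for PPU SRAM

        // Write a scalar parameter at the symbol's location; the moral
        // equivalent of program["my_param"] = 24 from the text.
        void set(std::string const& name, std::uint32_t value) {
            std::size_t const offset = symbol_offsets.at(name);
            std::memcpy(memory_image.data() + offset, &value, sizeof(value));
        }
    };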


Fig. 2: Schematic showing the remote GDB debugger control flow on the embedded processor. A gdb instance running on the host computer communicates via the remote debugging protocol over TCP/IP with the gdbserver. Communication from the gdbserver host adaptor to the PPU is established via in-memory writes and reads. This allows transporting register data and control flow information to and from the PPU program under inspection each time the interrupt handler in the PPU program is reached.

5) Debugging Support: An increasing level of complexity in PPU programs, in conjunction with the limited resources of a microprocessor with small memory, demands runtime debugging capabilities. Since the toolchain is cross-platform, i.e., development typically happens on an x86-based host computer while the target platform is the embedded Power-based processor, execution in a debugger on the development platform is not possible. However, the GNU debugger (gdb) [25], aside from normal debugging on the same machine, also offers support for remote debugging via a TCP connection to a target platform, which also works in a cross-platform setting. The PPU, being an embedded processor, neither supports TCP natively, nor is it feasible to implement a direct client due to memory restrictions.

Figure 2 depicts the implementation of the gdbserver targeting the PPU. It is split into a minimal stub in the PPU program, which understands base commands such as dumping register content to memory, replacing an instruction with a trap or stepping one instruction, and a synchronous program on the host computer, which communicates with the PPU through in-memory read and write operations. This adaptor program converts requests from, or produces responses to, gdb via TCP. This keeps the memory footprint of a PPU program to a minimum while allowing real-time flow control and state inspection.
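
From the user’s point of view, such a session then resembles any remote gdb session; the gdb commands below are standard, while the cross-gdb binary name, host name and port are placeholders.

    $ powerpc-eabi-gdb plasticity.elf     # cross-gdb; names are placeholders
    (gdb) target remote exp-host:2345     # connect to the host adaptor
    (gdb) break run_plasticity            # stub replaces the instruction with a trap
    (gdb) continue
    (gdb) info registers                  # register dump relayed via PPU memory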

C. Coordinate System

The multitude of components on a chip leads to a large configuration space of ≈ 350 KiB.

Fig. 3: Layout schematic of the latest BrainScaleS-2 chip (4 mm × 8 mm). Various component regions are framed in yellow; logically separable regions of the chip are framed with dashed lines. Synapses and neurons are partitioned into four quadrants; two embedded SIMD processors as well as columnar ADCs are located in the upper and lower chip hemispheres. The ordering scheme of two-dimensional coordinates is shown in orange: rows, then columns.

There are over 100 distinct ranged-integer and 150 boolean registers on the hardware which need to be represented in software. To provide type safety as well as other features, e.g., range checking, we do not use the built-in numeric types of C++ but custom ranged types [26]. The high symmetry of the chip layout naturally leads to abstraction at different scales.

Figure 3 shows the layout schematic of one chip with annotations for the different component regions. The high symmetry of the component distribution is evident; e.g., both chip hemispheres are identical and simply mirrored along the x-axis. Some parts of the hemispheres are themselves again mirrored halves and are therefore called quadrants. This symmetry is reflected in a hierarchical structure of the chip’s coordinate system. For BSS-2, the coordinate framework —previously developed for the BrainScaleS OS [6]— was extended. To illustrate the idea, we use the coordinate defining a Synapse circuit. Depending on the hierarchical level, a synapse can be addressed via SynapseOnQuadrant, SynapseOnHemisphere or SynapseOnChip. This is helpful when implementing functionality which, for example, does not depend on which quadrant it is applied to. Coordinates of a higher level can be cast down to a lower level, e.g., SynapseOnChip.toSynapseOnQuadrant().


Listing 1: Example usage of a custom coordinate type

    for (auto synapse : iter_all<SynapseOnChip>()) {
        my_synapse_matrix[synapse].weight = Synapse::Weight(42);
    }

Vice versa, lower-level views can be combined to create higher-level coordinates, e.g., SynapseOnChip(SynapseOnHemisphere, HemisphereOnChip). It is also possible to convert between different components corresponding to each other. For example, one can convert from a synapse coordinate to a neuron coordinate with SynapseOnChip.toNeuronOnChip(). As components are structured differently, there is support for linear as well as two-dimensional, grid-like coordinates. Again, the synapse is an example of a grid coordinate: it is composed of SynapseRowOnQuadrant and SynapseColumnOnQuadrant.

Figure 3 also shows the addressing scheme (orange), which adheres to row-major order. Furthermore, the developed ranged types enable coordinates to be used like iterators. This facilitates, for instance, the creation of arrays with typed indexes.

Listing 1 shows an example of how this is used in C++. The implementation of this coordinate system can be found at [27].
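
The following self-contained toy illustrates both ingredients, a ranged coordinate type and an array with a typed index, without depending on the halco implementation [27]; all names and sizes are illustrative.

    #include <array>
    #include <cstddef>
    #include <stdexcept>

    // Toy ranged coordinate: values outside [0, Size) are rejected at
    // construction, mirroring the range checking described above.
    template <std::size_t Size>
    struct RangedCoordinate {
        static constexpr std::size_t size = Size;
        explicit RangedCoordinate(std::size_t v) : value(v) {
            if (v >= Size) throw std::out_of_range("coordinate out of range");
        }
        std::size_t value;
    };

    using SynapseColumnOnQuadrant = RangedCoordinate<256>;

    // Toy typed array: only the matching coordinate type is accepted
    // as index, so mixing up coordinate spaces fails to compile.
    template <typename T, typename Coordinate>
    struct TypedArray {
        T& operator[](Coordinate const& c) { return data[c.value]; }
        std::array<T, Coordinate::size> data{};
    };

    int main() {
        TypedArray<int, SynapseColumnOnQuadrant> weights;
        weights[SynapseColumnOnQuadrant(42)] = 63;  // ok
        // weights[42] = 63;                        // would not compile
    }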

D. Structuring Data for Configuration

To provide type-safe, secure configurability of hardware entities, we encapsulate the configuration in so-called containers. A container is an object storing a representation of a possible state of a specific hardware entity or entity group. Application of a represented state to the hardware, or retrieval from the hardware, is provided in a register-like fashion; the allowed operations are write and read. Depending on the layer of abstraction, the granularity of access differs. Figure 4 shows the encapsulation at different granularities. The implemented concepts are described in the following.

On the lowest level, access to a register-like memory location on the hardware is abstracted, with the user-facing API being the configuration of a variable-length —depending on the coordinate— register word. This encapsulates the state-machine behavior of a heterogeneous set of clients, e.g., SPI [28] or Omnibus [29], the latter being the on-chip bus protocol featuring multi-master operation with guaranteed master-to-client ordering. Thereby, correct usage of the underlying protocol is guaranteed, while the transported word is only restricted to the supported value range.

Building on these register-access containers, the smallest accessible continuous entities are encapsulated in containers of the so-called hardware abstraction layer (hal) [30]. For the API user, compositions of sub-word configuration, corresponding to physical entities on the circuit level, are accessible as flat or hierarchical properties. Depending on the property type, type-safe enumerations, ranged integer or boolean value types are used. Representing sub-word values with a one-to-one correspondence to hardware entities allows for in-code self-documenting parameter names; for example, a NeuronConfig might have an enable_leak boolean property or a refractory_time ranged integer value.

Fig. 4: Configurable hardware entities are modeled by nested data structures encapsulating named data elements. An algorithm visits the nested data structures and generates a hardware configuration bitstream.

Named properties inherently state intent, as opposed to manually configuring raw unnamed bits of a register word. A hal container can encapsulate a state spanning multiple words if the corresponding circuit or configurable entity makes use of distributed configuration bits. Conversely, small containers may be combined, e.g., to form larger heterogeneous or repetitive containers. In addition to type-safe access to sub-word properties, a container implements the conversion to pairs of register coordinates and payload types for write operations, called encoding, and to register coordinates for issuing a read operation together with the extraction of container state from the read-answer register-word payload, called decoding. To allow arbitrary grouping, the encoding and decoding for read and write operations is implemented using a visitor pattern that builds a linear sequence of register accesses by recursively visiting sub-containers. There may exist a 1 → N relation between a hal container and multiple register container types, since a specific entity might be accessible via different communication protocols, e.g., JTAG [31] and Omnibus. The protocol is selected upon invocation of a visitor. This allows for a unified interface for the user-facing container API and the conversion to and from register values.
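
A much-reduced sketch of the encode path described above; the container layout, register addresses and visitor interface are invented for illustration and do not mirror the haldls [30] API.

    #include <cstdint>
    #include <utility>
    #include <vector>

    using Address = std::uint32_t;
    using Word = std::uint32_t;

    // Visitor collecting a linear sequence of register writes.
    struct EncodeVisitor {
        std::vector<std::pair<Address, Word>> writes;
        void emit(Address addr, Word word) { writes.emplace_back(addr, word); }
    };

    // Toy hal-style container: named sub-word properties encoded into
    // one register word (bit layout invented for illustration).
    struct NeuronConfig {
        bool enable_leak = false;
        std::uint8_t refractory_time = 0;  // assumed 4-bit ranged value

        void encode(Address base, EncodeVisitor& v) const {
            Word const w = (Word(enable_leak) << 4) | (refractory_time & 0xF);
            v.emit(base, w);
        }
    };

    // A composite container recursively visits its sub-containers,
    // yielding one flat register-access sequence.
    struct QuadrantConfig {
        NeuronConfig neurons[128];
        void encode(Address base, EncodeVisitor& v) const {
            for (Address i = 0; i < 128; ++i) neurons[i].encode(base + i, v);
        }
    };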

E. Runtime Control

When performing experiments on a neuromorphic system, the timing of input stimuli, output data, and access to observables as well as controllables is essential.


Runtime control encapsulates the time-annotated flow of how to actually use the chip. This includes, among other things, bringing the chip into a working state, controlling the actual spiking neural network experiment, and data transfer in general.

Multiple operational modes are supported by the hardware. First, batch mode is suited for independent experiments. It is characterized by the absence of read-modify-write operations via the host computer. Conversely, in-the-loop mode makes use of an iterative usage pattern featuring read-modify-write operations from and to an experiment controller. It requires in-experiment synchronization. Finally, a spiking neural network experiment that runs concurrently and time-coupled with an experiment controller is the third operation mode, the real-time closed-loop operation mode, cf. section III-A. The controller might be located on the embedded processors or on another device. It performs read-modify-write operations, e.g., in the form of plasticity updates or environment state variable updates within a sensor-motor loop.

To perform experiments on the neuromorphic chip, experiment descriptions need to include the sequence of timed spike events, the stimulus data. However, the configuration of the chip might also require time-controlled execution for technical or experiment control reasons. For instance, experiments might involve externally-triggered changes to network parameters during runtime. In BSS-2, access to on-chip parameters is possible from multiple locations —or bus masters— the PPUs as well as the FPGA. Hence, experiment control is distributed and the timing between these bus masters needs to be synchronized.

First, we describe the software framework for timed execution that has been developed. Then, we illustrate the control flow of a typical experiment running on BSS-2. The general concept is to construct a temporally ordered stream of commands that is sent to the communication FPGA, which then handles the timed release of these commands to the chip as well as time-stamped recording of responses from the chip. Constructing such a command stream, hereafter called a playback program, is facilitated by three functions: write, read and wait. The first two functions allow, as their names suggest, issuing write and read commands of containers, see section III-D, at their respective coordinate locations. Calling the read function returns an object which provides access to the read-back data only after the experiment run, inspired by the std::future class. Write commands are likewise used to issue spike events. Timed release of commands is facilitated by the wait function, allowing commands to be delayed relative to a timer that itself can be modified via write commands. Listing 2 provides a basic usage example.

Figure 5 illustrates the flow of a typical experiment that involves external spike stimulus and concurrent PPU interaction. First, communication with the FPGA is set up and subsequently used for transferring and starting the compiled playback program. The first stage of each program is the initial configuration of all chip components. This stage has to be timed, as analog parameters may require time to settle, which makes it necessary to delay the following commands.

Listing 2: Example usage of the playback builder pattern

    PlaybackProgramBuilder builder;
    builder.write(NeuronConfigOnDLS(42), my_neuron_config);
    builder.wait_until(Timer::Value(1000));
    auto ticket = builder.read(SpikeCounterOnDLS(3));
    auto program = builder.done();
    my_executor.run(program);
    auto const read_count = ticket.get();

Fig. 5: Schematic showing the control flow of a playback program running concurrently with code on the embedded processor.

With the chip now being in a working state, the experiment is started. For this, the timer is normally reset to obtain a zero-referenced time clock. Likewise, execution of the PPU program is initiated. All read responses as well as spike and ADC data are recorded with time annotation. A special instruction defines the end of the experiment and signals the FPGA to send the recorded data to the host.

F. Preemptive Experiment Scheduling

Going one step further, this framework also allows for a separation of experiment setup, execution and analysis of hardware experiments: instead of executing experiments on locally-attached hardware, the FPGA program is constructed on the client side, transferred to and executed at a shared remote site, and its results sent back to the client. Due to the high speed-up factor of the hardware, single-experiment runtime typically ranges in the order of milliseconds of wall-clock time. Experiment assembly and result evaluation —requiring the same order of magnitude in terms of execution time— ordinarily happen in sequence with experiment execution. By relegating both tasks to the client side, we eliminate hardware down time. On the remote side, both experiment reception and result delivery happen in parallel to experiment execution and hence do not cause down time either. This allows the hardware to be shared among several experimenters executing experiments seemingly in parallel on the same chip, but also enables more densely packed parameter sweeps for a single experimenter. In addition, the chip remains interactively accessible: one experimenter is able to inject small experiments while a long parameter sweep is underway that would otherwise block anyone else from using the chip.


Overall, these measures increase experiment throughput, thereby effectively speeding up the hardware even further.
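
A compressed sketch of the resulting pipeline; build, execute and analyze are trivial stand-ins, and the remote site is assumed to serialize hardware access internally.

    #include <future>
    #include <vector>

    struct Program { int id; };
    struct Results { int id; };

    Program build(int i) { return Program{i}; }           // client side
    Results execute(Program p) { return Results{p.id}; }  // remote side
    void analyze(Results) {}                              // client side

    // Pipelined sweep: experiments are built and submitted while earlier
    // ones execute remotely; results are analyzed as they arrive, so the
    // hardware never idles between runs.
    void run_sweep(int n) {
        std::vector<std::future<Results>> in_flight;
        for (int i = 0; i < n; ++i)
            in_flight.push_back(std::async(std::launch::async, execute, build(i)));
        for (auto& f : in_flight)
            analyze(f.get());
    }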

IV. Results

Among the first experiments implemented via the hardware abstraction software framework presented in this work is the Neuromorphic Spike-based Expectation Maximization (NSEM) model. As the platform for the experiment, the second BrainScaleS-2 prototype [32, 33] is chosen, for which the hardware abstraction software framework presented in this work is fully implemented.

Network Architecture: As seen in fig. 6 (top), the cause layer —comprised of LIF neurons brought into the stochastic regime by excitatory and inhibitory Poisson input— receives input from an input layer that is modeled via Poisson spike trains. Its aim is to distinguish hidden causes in the presented input stimuli. The cause layer neurons are connected via an inhibitory population with parrot-like behavior: each spike from a cause layer neuron elicits a spike from the inhibitory population, preventing all other cause layer neurons from firing. The cause layer therefore forms a WTA-like structure representing a Boltzmann machine with very strong inhibitory weights; ideally, only one cause layer neuron responds to each presented input pattern. The weights V_ik between the input and cause layers evolve according to update rules [34, 35] adapted to the restrictions of the computing substrate. The activity of each cause neuron is kept at a predetermined value via dynamic synapses, implementing a form of spike-based homeostasis heavily inspired by Habenschuss, Puhr, and Maass [36].

Implementation: NSEM employs two plasticity rules (homeostasis and learning), acting on different synapses at different time scales. Both are executed concurrently on the single PPU of the prototype system. They are implemented separately and combined using a simple deadline scheduler. In order to facilitate plasticity occurring on different timescales, each plasticity rule has a configurable deadline after which it is applied again.
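
A condensed sketch of such a deadline scheduler; the timer source, rule kernels and all constants are placeholders rather than the experiment’s actual code.

    #include <cstdint>

    // Stand-in for the PPU's wall-clock timer: a free-running counter.
    static std::uint64_t clock_ticks = 0;
    static std::uint64_t now() { return ++clock_ticks; }

    // One schedulable plasticity rule with a configurable period.
    struct PlasticityRule {
        std::uint64_t period;    // re-application interval (timer ticks)
        std::uint64_t deadline;  // next point in time to run the kernel
        void (*run)();           // rule kernel, e.g. the homeostasis update
    };

    // Minimal deadline scheduler: run whichever rule is due and move its
    // deadline forward; rules with shorter periods fire more often.
    static void schedule(PlasticityRule* rules, int n, std::uint64_t t_end) {
        while (now() < t_end) {
            for (int i = 0; i < n; ++i) {
                if (now() >= rules[i].deadline) {
                    rules[i].run();
                    rules[i].deadline += rules[i].period;
                }
            }
        }
    }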

Results: The center sections of fig. 6 depict the different plasticity rules in action. Center left: while a single neuron receives excitatory input from a background source with varying rate, homeostasis is able to maintain a constant firing rate (top) after sudden shifts in the background rate (middle); the weight evolution of both homeostatic synapses (bottom) reflects the adaptive process. Center right: several synapses employing the NSEM rule are able to correctly infer the rate of their pre-synaptic neuron despite limited weight resolution. The dashed lines represent the expected target rate while the dotted lines represent the mean of the (colored) inferred rates. Bottom: after learning, a network of three neurons is able to differentiate between three input patterns: for every presented pattern (bottom), a different neuron is clearly most active (top). The network is hence able to infer hidden causes of its input in an unsupervised manner, all the while maintaining an activity equilibrium via homeostasis.

Fig. 6: Example experiment network architecture (top, taken from Breitwieser [37]), homeostasis mechanism (center left), learning of target rates (center right), spike data as well as classification output (bottom). See section IV for more details.

The successful implementation and execution of the experiment demonstrates the suitability of the system for complex low-level single-chip experiments concurrently employing several distinct plasticity rules running at different time scales on the PPU.

V. Discussion

This work describes the latest developments for BrainScaleS OS in the light of BSS-2. In particular, we focus on the expert-level user interfaces for system configuration as well as experiment control. Fundamental ideas of the software design were already devised by the authors in Brüderle et al. [9], the software stack for the precursor of the BrainScaleS-1 architecture [38]. Following that, and due to BrainScaleS-1 being a large-scale neuromorphic platform [5, 39], the software stack was rewritten [6]. Eventually, the second-generation BrainScaleS architecture demands increased flexibility due to its improved configurability and programmability [8, 40]. In particular, the embedded SIMD processors require additional software during both experiment design and experiment runtime.

BrainScaleS OS for BSS-2 builds upon libraries as well as development methodologies that have been developed for BSS-1. In the current version, single-chip BSS-2 systems can be robustly configured and controlled during runtime.


The software layers presented in this work focus on expert usage. Higher-level experiment description languages such as PyNN [10] are not yet supported.

We demonstrate the configuration and experiment control in a hybrid experiment setting also involving two plasticity rules, homeostasis and learning. Multiple other experiments, e.g., [12–15], demonstrate the system’s applicability in machine learning and computational neuroscience.

The group also focuses on increasing reliability and quality assurance in mixed-signal hardware development [41]. In particular, co-simulating software and hardware enables a continuous-integration-based workflow during hardware development.

VI. Future Developments

BrainScaleS OS for BSS-2 is still under development. The presented software layers have been sufficient for expert experimenters. However, we identified several aspects that need attention in the future.

Structured Data Exchange in Distributed Systems

Communication is a key element of distributed systems. Inter-operation of host computers, FPGAs and PPUs typically requires communication to exchange state as well as to provide synchronization. In particular, data is exchanged between different architectures with varying endianness and alignment constraints. For example, exchanging plasticity parameters between host and PPU already benefits from cross-platform structured data exchange. To solve these tasks, we aim for a thin message-passing library that is integrated with a platform-agnostic serialization library.

Full Stack Hardware Design Validation

Verification of the hardware design prior to manufacturing is vital, as chip production is expensive. Past experience has shown that unit testing of individual chip components alone is often insufficient. By providing a simulator backend in a lower-level communication layer, the full software stack can be used to run tests/experiments on a simulated hardware device. This will facilitate the validation of new FPGA features and —most importantly— chip designs by utilizing the complete test suite of the software stack prior to fabrication.

Algorithmic Task Offloading

Increasing complexity in plasticity and first steps towards standalone on-chip calibration algorithms demand access to arbitrary on-chip facilities. In section III-D and section III-C, we introduced an API providing this kind of access. Therefore, the lower-level parts of BrainScaleS OS —in particular the hardware abstraction layer— are to be ported to the PPU architecture. The availability of the vector unit for on-chip code enables optimized access to synapses and similar facilities which are unavailable for host or, more precisely, FPGA-based access. At the same time, size and runtime performance overhead need to be minimized.

Logical Experiment Description

BrainScaleS architectures already support a variable number —scalable to cortical connection densities— of pre-synaptic connections per “logical” neuron representing multiple linked neuron circuits. In addition, the current version of the BSS-2 architecture makes use of a similar mechanism to build structured neurons consisting of multiple compartments; see Aamir et al. [42] for a detailed description of the hardware implementation. However, while the hardware abstraction layer provides support for type-safe and correct configuration of the system, it does not itself provide a user-friendly encapsulation of configuration entities. The set of parameters bound to such a “logical” configuration entity could be included in a collection of hardware abstraction layer containers, possibly with constraints on their placement via coordinates. For the encoding and decoding of these larger, logical entities, composition of hardware abstraction layer containers is to be used. The rules under which to group parameters into “logical” configuration entities are currently under development.

Higher-level User Interfaces

For higher-level usage, e.g., as an accelerator for spiking neural network models, a topology-centric, graph-based configuration API on top of the hardware abstraction established in this document is planned. Similarly, multi-chip systems increase the need for automation of tasks related to transforming a user-defined experiment into a valid hardware configuration, hardware calibration and distributed experiment control. Inspiration is taken from the existing higher-level software infrastructure for the BSS-1 system, cf. Müller et al. [6].

Recent developments in the machine learning community affect the way people think about data flow as well as how to programmatically describe learning algorithms [43, 44]; on the other hand, the neuromorphic community has started building a bridge between deep neural networks and spiking neuromorphic substrates [45–47]. As a first step, the BSS-2 non-spiking operation mode allows for a transparent integration into typical libraries for classical neural networks. Furthermore, exploiting the same neural network libraries allows for the specification of, e.g., plasticity rules in a computational graph; full integration of BSS-2 into a neural network library would be a large step towards a high-level specification of, e.g., plasticity rules.

VII. Contributions

E. Müller is the lead developer and architect of the BrainScaleS-2 software stack. C. Mauch contributed to the software architecture. P. Spilger contributed to the final software architecture and is the main contributor of the described experiment. O. Breitwieser contributed the preemptive experiment scheduling capabilities, designed the initial experiment and contributed to the implementation on hardware. J. Klähn and D. Stöckel contributed to the initial software architecture. T. Wunderlich contributed to the remote debugger implementation.


J. Schemmel is the lead designer and architect of the BrainScaleS-2 neuromorphic system. All authors discussed and contributed to the manuscript.

Acknowledgments

The authors wish to thank all present and former members of the Electronic Vision(s) research group contributing to the BrainScaleS-2 hardware platform, software development and operation methodologies. The authors express their special gratitude towards Arthur Heimbrecht for his initial work on adding vector-unit support to the compiler, and Simon Friedmann for the implementation of the initial commissioning software. We especially express our gratefulness to the late Karlheinz Meier, who initiated and led the project for most of its time.

This work has received funding from the EU (H2020/2014-2020) under grant agreements 720270 (HBP) and 785907 (HBP).

References

[1] Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, et al. “A million spiking-neuron integrated circuit with a scalable communication network and interface”. In: Science 345.6197 (2014), pp. 668–673.

[2] Steve B. Furber, David R. Lester, Luis A. Plana, et al. “Overview of the SpiNNaker System Architecture”. In: IEEE Transactions on Computers 99.PrePrints (2012). issn: 0018-9340. doi: 10.1109/TC.2012.142.

[3] Thomas Pfeil, Andreas Grübl, Sebastian Jeltsch, et al. “Six networks on a universal neuromorphic computing substrate”. In: Frontiers in Neuroscience 7 (2013), p. 11. issn: 1662-453X. doi: 10.3389/fnins.2013.00011. url: http://www.frontiersin.org/neuromorphic_engineering/10.3389/fnins.2013.00011/abstract.

[4] Mike Davies, Narayan Srinivasa, Tsung-Han Lin, et al. “Loihi: A neuromorphic manycore processor with on-chip learning”. In: IEEE Micro 38.1 (2018), pp. 82–99.

[5] Johannes Schemmel, Daniel Brüderle, Andreas Grübl, et al. “A Wafer-Scale Neuromorphic Hardware System for Large-Scale Neural Modeling”. In: Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS). 2010, pp. 1947–1950.

[6] Eric Müller, Sebastian Schmitt, Christian Mauch, et al. “The Operating System of the Neuromorphic BrainScaleS-1 System”. In: arXiv preprint (Mar. 2020). url: TODO.

[7] PowerISA. PowerISA Version 2.06 Revision B. Tech. rep. Available at http://www.power.org/resources/reading/. power.org, July 2010.

[8] S. Friedmann, J. Schemmel, A. Grübl, et al. “Demonstrating Hybrid Learning in a Flexible Neuromorphic Hardware System”. In: IEEE Transactions on Biomedical Circuits and Systems 11.1 (2017), pp. 128–142. issn: 1932-4545. doi: 10.1109/TBCAS.2016.2579164.

[9] Daniel Brüderle, Eric Müller, Andrew Davison, et al. “Establishing a novel modeling tool: a python-based interface for a neuromorphic hardware system”. In: Frontiers in Neuroinformatics 3 (2009), p. 17. issn: 1662-5196. doi: 10.3389/neuro.11.017.2009. url: https://www.frontiersin.org/article/10.3389/neuro.11.017.2009.

[10] A. P. Davison, D. Brüderle, J. Eppler, et al. “PyNN: a common interface for neuronal network simulators”. In: Front. Neuroinform. 2.11 (2009). doi: 10.3389/neuro.11.011.2008.

[11] Jochen M. Eppler, Moritz Helias, Eilif Muller, et al. “PyNEST: a convenient interface to the NEST simulator”. In: Front. Neuroinform. 2.12 (2008).

[12] Timo Wunderlich, Akos F. Kungl, Eric Müller, et al. “Demonstrating Advantages of Neuromorphic Computation: A Pilot Study”. In: Frontiers in Neuroscience 13 (2019), p. 260. issn: 1662-453X. doi: 10.3389/fnins.2019.00260. url: https://www.frontiersin.org/article/10.3389/fnins.2019.00260.

[13] Sebastian Billaudelle, Yannik Stradmann, Korbinian Schreiber, et al. “Versatile emulation of spiking neural networks on an accelerated neuromorphic substrate”. In: arXiv preprint arXiv:1912.12980 (2019).

[14] Benjamin Cramer, David Stöckel, Markus Kreft, et al. “Control of criticality and computation in spiking neuromorphic networks with plasticity”. In: (2019). arXiv: 1909.08418 [cs.ET].

[15] Thomas Bohnstingl, Franz Scherr, Christian Pehle, et al. “Neuromorphic Hardware Learns to Learn”. In: Frontiers in Neuroscience 13 (May 2019), pp. 1–14. issn: 1662-4548.

[16] Wenzel Jakob, Jason Rhinelander, and Dean Moldovan. pybind11 – Seamless operability between C++11 and Python. https://github.com/pybind/pybind11. 2019.

[17] Johann Klähn. genpybind software v0.2.0. 2020. doi: 10.5281/zenodo.372674. url: https://github.com/kljohann/genpybind.

[18] Michael Barr. Programming Embedded Systems in C and C++. 1st ed. Sebastopol, Calif.: O’Reilly, 1999. isbn: 978-1-56592-354-6.

[19] B. Gough and R. M. Stallman. An Introduction to GCC: For the GNU Compilers Gcc and G++. Network Theory, 2005. isbn: 9780954161798. url: https://books.google.de/books?id=yIGKQAAACAAJ.

[20] J. Tyler, J. Lent, A. Mather, et al. “AltiVec™: bringing vector technology to the PowerPC™ processor family”. In: 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305). Feb. 1999, pp. 437–444. doi: 10.1109/PCCC.1999.749469.

[21] Programming languages — C++. Standard. Geneva, Switzerland: International Organization for Standardization, Dec. 2017.

[22] Bill Gatliff. “Embedding with GNU: Newlib”. In: Embedded Systems Programming 15.1 (2002), pp. 12–17.


[23] Free Software Foundation. The GNU C++ Library. url: https://gcc.gnu.org/onlinedocs/libstdc++.

[24] Tool Interface Standard Committee. Executable and Linking Format (ELF) Specification. 1995. url: http://refspecs.linuxbase.org/elf/elf.pdf.

[25] Richard Stallman, Roland Pesch, Stan Shebs, et al. Debugging with GDB. Free Software Foundation, 2011. isbn: 978-0-9831592-3-0.

[28] Susan C. Hill, Joseph Jelemensky, Mark R. Heene, et al. “Queued serial peripheral interface for use in a data processing system”. US4958277A. 1987.

[29] Simon Friedmann. Omnibus On-Chip Bus. Forked from https://github.com/five-elephants/omnibus. 2015. url: https://github.com/electronicvisions/omnibus.

[31] IEEE. “IEEE Standard Test Access Port and Boundary-Scan Architecture”. In: IEEE Std 1149.1-2001 (2001), pp. i–200. doi: 10.1109/IEEESTD.2001.92950.

[32] S. Friedmann, J. Schemmel, A. Grübl, et al. “Demonstrating Hybrid Learning in a Flexible Neuromorphic Hardware System”. In: IEEE Transactions on Biomedical Circuits and Systems 11.1 (2017), pp. 128–142. issn: 1932-4545. doi: 10.1109/TBCAS.2016.2579164.

[33] S. A. Aamir, Y. Stradmann, P. Müller, et al. “An Accelerated LIF Neuronal Network Array for a Large-Scale Mixed-Signal Neuromorphic Architecture”. In: IEEE Transactions on Circuits and Systems I: Regular Papers 65.12 (Dec. 2018), pp. 4299–4312. issn: 1549-8328. doi: 10.1109/TCSI.2018.2840718.

[34] Bernhard Nessler, Michael Pfeiffer, Lars Buesing, et al. “Bayesian Computation Emerges in Generic Cortical Microcircuits through Spike-Timing-Dependent Plasticity”. In: PLoS Computational Biology 9.4 (2013). Ed. by Olaf Sporns, e1003037.

[35] Johannes Bill, Lars Buesing, Stefan Habenschuss, et al. “Distributed Bayesian Computation and Self-Organized Learning in Sheets of Spiking Neurons with Local Lateral Inhibition”. In: PLOS ONE 10.8 (Aug. 2015), pp. 1–51. doi: 10.1371/journal.pone.0134356. url: https://doi.org/10.1371/journal.pone.0134356.

[36] Stefan Habenschuss, Helmut Puhr, and Wolfgang Maass. “Emergence of Optimal Decoding of Population Codes Through STDP”. In: Neural Computation 25.6 (2013), pp. 1371–1407.

[37] Oliver Breitwieser. “Towards a Neuromorphic Implementation of Spike-Based Expectation Maximization”. Master thesis. Ruprecht-Karls-Universität Heidelberg, 2015.

[38] J. Schemmel, A. Grübl, K. Meier, et al. “Implementing Synaptic Plasticity in a VLSI Spiking Neural Network Model”. In: Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN). IEEE Press, 2006.

[39] Sebastian Millner, Andreas Grübl, Karlheinz Meier, et al. “A VLSI Implementation of the Adaptive Exponential Integrate-and-Fire Neuron Model”. In: Advances in Neural Information Processing Systems 23. Ed. by J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, et al. 2010, pp. 1642–1650.

[40] Syed Ahmed Aamir, Paul Müller, Andreas Hartel, et al. “A highly tunable 65-nm CMOS LIF neuron for a large-scale neuromorphic system”. In: Proceedings of IEEE European Solid-State Circuits Conference (ESSCIRC). 2016.

[41] Andreas Grübl, Sebastian Billaudelle, Benjamin Cramer, et al. “Verification and Design Methods for the BrainScaleS Neuromorphic Hardware System”. In: arXiv preprint (2020). url: http://arxiv.org/abs/2003.11455.

[42] S. A. Aamir, P. Müller, G. Kiene, et al. “A Mixed-Signal Structured AdEx Neuron for Accelerated Neuromorphic Cores”. In: IEEE Transactions on Biomedical Circuits and Systems 12.5 (Oct. 2018), pp. 1027–1037. issn: 1932-4545. doi: 10.1109/TBCAS.2018.2848203.

[43] Adam Paszke, Sam Gross, Francisco Massa, et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach, H. Larochelle, A. Beygelzimer, et al. Curran Associates, Inc., 2019, pp. 8024–8035. url: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.

[44] Martín Abadi, Ashish Agarwal, Paul Barham, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. 2015. url: http://download.tensorflow.org/paper/whitepaper2015.pdf.

[45] Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, et al. “Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification”. In: Frontiers in Neuroscience 11 (2017), p. 682. issn: 1662-453X. doi: 10.3389/fnins.2017.00682. url: https://www.frontiersin.org/article/10.3389/fnins.2017.00682.

[46] B. Rueckauer and S. Liu. “Conversion of analog to spiking neural networks using sparse temporal coding”. In: 2018 IEEE International Symposium on Circuits and Systems (ISCAS). May 2018, pp. 1–5. doi: 10.1109/ISCAS.2018.8351295.

[47] Julian Göltz, Andreas Baumbach, Sebastian Billaudelle, et al. Fast and deep neuromorphic learning with time-to-first-spike coding. 2019. eprint: arXiv:1912.11443.

Own Software

[26] Sebastian Jeltsch. rant. url: https://github.com/ignatz/rant.

[27] Electronic Vision(s), Heidelberg University. halco. url: https://github.com/electronicvisions/halco.

[30] Electronic Vision(s), Heidelberg University. haldls. url: https://github.com/electronicvisions/haldls.