Nios II Embedded Processor Design Contest - Outstanding - Altera

386

Transcript of Nios II Embedded Processor Design Contest - Outstanding - Altera

Page 1: Nios II Embedded Processor Design Contest - Outstanding - Altera
Page 2: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded ProcessorDesign Contest

Outstanding Designs 2006

Page 3: Nios II Embedded Processor Design Contest - Outstanding - Altera

Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all otherwords and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarksand service marks of Altera Corporation in the U.S. and other countries. All other product or service names are theproperty of their respective holders. Altera products are protected under numerous U.S. and foreign patents andpending applications, mask work rights, and copyrights.

Page 4: Nios II Embedded Processor Design Contest - Outstanding - Altera

Foreward

Foreward

One thing hasn’t changed since Altera created the world’s first reprogrammable logic device more than 20 years ago: a commitment to innovation: ours and yours. We help you take your design ideas from concept to reality with a portfolio of products that includes the Nios® and Nios II soft-core embedded processors. Since they were introduced in June 2000, academic and professional designers worldwide have innovated using Altera’s Nios and Nios II processors, integrating them in a wide range of commercial applications from video broadcast systems and WiMAX base stations using high-performance Stratix® II FPGAs to SOHO networking equipment and automotive telematics using low-cost Cyclone® II devices. Today, more than 20,000 development kits have been shipped, and over 5,000 companies—including the world’s top 20 original equipment manufacturers (OEMs)—are licensed and implementing Nios processors in their designs.

The versatility of the Nios processor supports creativity and innovation. You can tailor Nios systems with the exact peripherals, memory, and interfaces required, and add your own proprietary functions—your own differentiation—to create a unique competitive advantage. Altera solves the IP integration problem with the SOPC Builder productivity tool that allows you to drag and drop the exact mix of functions required, so you can focus on higher, system-level requirements instead of mundane, error-prone, manual tasks. Plus, Altera® tools work seamlessly with other third-party industry-standard tools, minimizing training time and improving time to market. The soft-core processor implementation enables easy software and design upgrades, effectively making your design obsolescence-proof. With the added advantages of FPGA flexibility, fast time-to-market, and system integration, you have a risk-free path to a custom embedded solution. Your possibilities are unlimited.

Altera continually cultivates designers and supports innovation in Asia and around the world through our University Program. As part of that program, the Nios II Embedded Processor Design Contest aims to increase student interest in embedded processors, improve their design and creative abilities, and ultimately motivate the continued development of FPGA-based embedded processor designs. This year, we received 614 qualified entries—a new record. The significant growth of the program is yet more proof of how rapidly the Nios design community is expanding.

The winning entries presented in this book showcase not only the breadth of possibilities that can be addressed using Altera’s embedded solutions—including everything from aircraft navigation, traffic monitoring, robots, watermarking, and oscillographs to video processing and a software-defined radio—but also technology trends in the industry. When you have the tools and flexibility you need, your potential for great design is unlimited.

Congratulations to all the Nios II Design Contest winners and their professors. Keep the fires of innovation burning!

Jordan S. PlofskySenior Vice President, MarketingAltera Corporation

iii

Page 5: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

iv

Page 6: Nios II Embedded Processor Design Contest - Outstanding - Altera

Contents

Contents

Foreward .................................................................................................. iii

Automotive Applications......................................................................... 1Star AwardArtificially Intelligent Autonomous Aircraft Navigation System Using a Distance Transform on an FPGA.............................................................................................................................................3

Institution: Monash UniversityParticipants: Nathan William SmithInstructor: Dr. Andrew Price

Third PrizeDriver Assistance Tool ................................................................................................................15

Institution: Hanyang UniversityParticipants: Jonguk Song, Hwanjun Kang, Kangsub Kwack

Third PrizeNios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal ......................................19

Institution: Xi’an Institute of Post and TelecommunicationsParticipants: Xiao-Wu Chen, Li-Bin Liu, and Fang-Fang ZhangInstructor: Jun Liu

Consumer Applications......................................................................... 39First PrizeUnattended Wireless Search Robot .............................................................................................41

Institution: Kwangwoon UniversityParticipants: Yoongoo Kim, Younggon Lee, Jeongwook YimInstructor: Yongjin Jeong

Second PrizeSet-Top Box Capable of Real-Time Video Processing...............................................................51

Institution: Xi’an University of Electronic Science and TechnologyParticipants: Fei Xiang, Wen-Bo Ning, and Wei ZhuInstructor: Wan-You Guo

v

Page 7: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Second PrizeDigital Watermark-Based Trademark Checker ...........................................................................71

Institution: Institute of Information Science, Beijing JiaoTong UniversityParticipants: Sheng-Kai Song, Wei-Ming Li, and Li SongInstructor: Xiao-Ming Ding

Second PrizeSOPC-Based Speech-to-Text Conversion...................................................................................83

Institution: National Institute of Technology, TrichyParticipants: M.T. Bala Murugan and M. BalajiInstructor: Dr. B. Venkataramani

Second PrizeNios II Embedded Electronic Photo Album..............................................................................109

Institution: Electrical Engineering Institute, St. John’s UniversityParticipants: Hong-Zhi Zhang, Wei-Ming Yeh, and Wei-Min YangInstructor: Rui-Xi Chen

Third PrizeAutomatic Scoring System........................................................................................................123

Institution: Huazhong University of Science & TechnologyParticipants: Ya-bei Yang, Zun Li, and Yao ZhaoInstructor: Xiao Kan

Third PrizeMultimedia Decoder Using the Nios II Processor.....................................................................173

Institution: Indian Institute of ScienceParticipants: Mythri Alle, Naresh K. V., Svatantra SinghInstructor: S. K. Nandy

Third PrizeOmnidirectional Mobile Home Care Robot ..............................................................................183

Institution: Department of Electrical Engineering, National Chung-Hsing UniversityParticipants: Hsu-Chih Huang, Chia-Ming Chen, and Tung-Sheng WangInstructor: Professer Ching-Chih Tsai

Communications Applications............................................................ 195First PrizeNetwork Data Security System Design with High Security Insurance .....................................197

Institution: Department of Information Engineering, I-Shou UniversityParticipants: Jia-Wei Gong, Jian-Hong Chen, and Zih-Heng ChenInstructor: Professor Ming-Haw Jing

First PrizeSOPC Implementation of Software-Defined Radio ..................................................................235

Institution: National Institute of Technology, TrichyParticipants: A. Geethanath, Govinda Rao Locharla, V.S.N.K. ChaitanyaInstructor: Dr. B. Venkataramani

vi

Page 8: Nios II Embedded Processor Design Contest - Outstanding - Altera

Contents

Industrial Applications......................................................................... 251First PrizeNios II Processor-Based Remote Portable Multi-Function Logic Analyzer .............................253

Institution: Huazhong University of Science and TechnologyParticipants: Lian Zeng, Yong Li, and Hong-mei ZhuInstructor: Jie Luo

Second PrizeBlack Box for Robot Manipulation ...........................................................................................285

Institution: Hanyang University, Seoul National University, Yonsei UniversityParticipants: Kim Hyong Jun, Ahn Ho Seok, Baek Young Min, Sa In Kyu

Third PrizeDigital Oscillograph ..................................................................................................................295

Institution: Department of Microwave Engineering Air Force Radar AcademyParticipants: Hui Wu, Zhi-Xiong Deng, and Li-Hua GuoInstructor: Yao-Jun Chen

Third PrizeNios II-Based Air-Jet Loom Control System ............................................................................325

Institution: Donghua UniversityParticipants: Yu-Bin Lue, Hong Chen, and Bin ZhouInstructor: Ge-Jin Cui

Third PrizeNios II-Based Audio-Controlled Digital Oscillograph..............................................................337

Institution: Xian Jiao Tong UniversityParticipants: Wan Liang, Zhang Weile, and Wang WeiInstructor: Penghui Zhang

Appendix: Nios II Embedded Processor Family................................ 359

Appendix: Nios II Embedded Processor Design Contest Winners . 373

vii

Page 9: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

viii

Page 10: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automotive Applications

Automotive Applications

This section includes projects that can be used as automotive applications, including:

Artificially Intelligent Autonomous Aircraft Navigation System Using a Distance Transform on an FPGA.................................................................................................................................................. 3

Driver Assistance Tool ........................................................................................................................ 15

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal............................................... 19

1

Page 11: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

2

Page 12: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

Star Award

Artificially Intelligent Autonomous Aircraft Navigation System Using a Distance Transform on an FPGA

Institution: Monash University

Participants: Nathan William Smith

Instructor: Dr. Andrew Price

Design IntroductionThis project is an artificially intelligent navigation system for use on board unmanned aerial vehicles (UAVs). The system takes the available input data and a user-defined mission statement, integrates the data, and produces navigation reference signals that are used by other systems to control the plane. Figure 1 shows the UAV navigation system block diagram, which is implemented in an Altera® Cyclone® FPGA. The output is continually updated based on the dynamic input from changes in the UAV or obstacle positions. The navigation system produces output signals to other UAV systems, e.g., a signal indicates that the UAV should land because the mission is complete or because an emergency landing is required.

The navigation system also produces a busy signal, which is usually inactive. This signal indicates that the navigation system is busy recalculating several parameters due to complex obstacles in the flight path, and the system will not produce the required pitch and heading for some time. In this case, the busy signal causes the flight control system to enter a holding pattern (i.e., causes the UAV to travel in a circle) until the navigation calculations are complete.

3

Page 13: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. UAV Navigation System Block Diagram

The navigation system uses a novel two-level hierarchical distance transform (DT) algorithm in three dimensions to control the aircraft’s longitude, latitude, and altitude. The DT technique is commonly used in signal processing, specifically image processing. However, DT has been used more recently for the robotic navigation of land-based vehicles over low ranges. This project applies the DT for aerospace vehicles in three dimensions over long-range missions.

The project also involves significant systems integration and systems engineering. The navigation system must interface and communicate with other hardware in the overall UAV system. For example, a simple interface includes voltage-level signaling for one-bit signals, such as when to land the UAV. More complex interfacing occurs for the global positioning system (GPS) where a National Marine Electronics Association (NMEA) 183 communications standard applies, and for the RS-232 serial communications required for specific navigation signals (pitch and bearing).

I chose to implement the Nios® II system on a Cyclone FPGA for the following reasons:

■ The application must perform computationally complex algorithms quickly.

■ The application receives inputs from a variety of communications media, such as from the GPS.

■ The application has to debug each part of the design in a user-friendly environment.

Figure 2 describes the required systems engineering and interfaces from the navigation system to other UAV systems. Grey blocks show the systems that are within the project scope. The navigation system produces or communicates with the interfaces indicated by bold arrows.

Current x,y position from GPS

Current z position from altimeter

Detected obstacles (storms, mountains, etc.)

Target waypoint x,y,z (user-defined and created by UAV mission compiler)

Battery condition

Navigation SystemRequired bearing to reach waypoint

Required pitch to reach waypoint

Navigation busy (usually low, becomes high to request entering aholding pattern until more complexnavigation calculations can complete

Land UAV because the mission iscomplete or the battery is low

Enable/disable on-board cameraswhile en-route to current waypoint

4

Page 14: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

Figure 2. UAV System and Interfaces Block Diagram

Note to Figure 2:(1) Specific hardware is unknown; possible wireless TCP/IP to meterology and topographical site.

Function DescriptionThe heart of the navigation system hardware is an Altera Cyclone FPGA, which is incorporated into Monash University’s miniboard (called the Monash miniboard). Figure 3 shows the board and describes its main components. The FPGA is used as a powerful microcontroller that can implement custom logic or microprocessor systems by configuring the device using software. To configure this device on this board requires the Altera Quartus® II software version 5.1.

Figure 3. Monash Miniboard

NavigationSystem

Altera Cyclone FPGA

GPSGarmin Geko 201

AltimeterLaser Range Finder

Mission CompilerMicrosoft Excel

Long-Range Obstacle Scanning

System Note (1)

Battery ConditionIndicator

(Voltmeter & Comparator)

Current Heading

Current Position (x,y)

Current Position (z)

Navigation System Software

(Includes User-Defined Target Waypoints)

Detected Obstacles

(Long Range: Storms, Mountains, No-Fly Zones, etc.)

Battery Condition

(Low Battery Signal)

Required Heading

Enable Cameras

(& Power Down If Battery Is Low)

Required Pitch

Navigation Busy, Enter Holding Pattern

- +Flight Control

System

Altera Cyclone FPGA

-

+

Signals to Ailerons, Propeller & Other Avionics

Artificial HorizonGyroscopic

Sensor

On-Board Cameras & Image Processing

System

Altera Cyclone FPGA

Landing Zone Clearat Current Position

Request Landing(Mission Complete or

Emergency Landing Due To Critical Battery, Etc.)

Engage Landing

Detected Obstacles(Short Range: Trees, Buildings, Etc.)

CurrentPitch

Altera Cyclone FPGA

UART Connector

Power Connector

JTAG Connector

ASP Connector

Flash Memory

Voltage Regulators

Momentary Action SwitchInputs (X3). Reserved for Use as Reset Switch, Etc.

Programmable LEDIndicators (X8)

General PIO Pins

512-KByte SDRAM

5

Page 15: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4 shows the Quartus II project implementing the navigation system. To create the project, I performed the following general steps:

1. Drew a block diagram of the desired system in the Quartus II software, including a block (NAV_Processor) for a custom microprocessor that I developed with SOPC Builder.

2. Added pin assignments to the block diagram (described later in Table 1), according to the physical pins that were available on the FPGA. Pins include I/O pin blocks that communicate to specific pins on the FPGA, such as pins specific to the navigation system I/O as defined in Figure 1.

3. Used the Quartus II software to analyze and compile the design.

4. Used the Quartus II software to configure the FPGA with the configuration information.

After the configuration information was loaded into the FPGA, I could upload the specific C/C++ navigation software (provided from the mission complier) to the FPGA’s microprocessor. I could then navigate the UAV for the desired mission.

Figure 4. Quartus II Project

Figure 5 shows the Quartus II compilation report, which describes the system usage. This project uses more than 70% of the logic elements (LEs) available on the FPGA.

6

Page 16: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

Figure 5. Quartus II Compilation Report

A critical part of the system is the microprocessor, which executes the navigation system software to compute a DT and generate navigation control signals. The SOPC Builder graphical interface made it easy to select a Nios II processor and connect it to the navigation system hardware. SOPC Builder provides three processor variants, and lets you customize the selected processor. Depending on the microprocessor system being developed, SOPC Builder allows the the designer to include UART outputs, Ethernet interfaces, VGA displays, etc., provided that the selected components fit into the memory/LEs available on the target FPGA.

Figure 6 shows the navigation system’s microprocessor configuration in SOPC Builder, including all of the required hardware. The project uses the Nios II/f variant because of its speed and data cache. I selected the cache size and pipeline performance by trial and error; I wanted to use the largest possible cache configuration while still fitting the software and configuration data into the Cyclone device’s memory.

7

Page 17: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 6. Navigation System’s Processor Configuration in SOPC Builder

Table 1 shows the pin configuration for the navigation system’s test hardware, which was the Monash miniboard.

Table 1. Navigation System Pin Configuration (Part 1 of 2)

Pin Number Type Function16 Input Clock (CLK)

110 Input Reset (Reset)

3, 4 InputBidirectional

UART 0, Note (1) (RX, TX)

N/A JTAG UART GPS_UART, Note (2) (RX, TX)

108, 100, 104, 106, 107, 109, 103,1 05 Output LED PIO (LED 1-8)

143 Output Battery Low Signal PIO (BATT)

141 Output Landing Signal PIO (LAND)

96, 84, 82, 78, 60, 58, 56, 52, 51, 53, 57, 59, 61, 67,79, 83, 85, 97

Output Address Bus to 512-Kbyte SRAM (ADR 0 - 18)

74, 72, 70, 68, 69, 71, 73, 75 Bidirectional Data Bus to 512-Kbyte SRAM (DAT 0 - 7)

76 Output Chip Select to 512-Kbyte SRAM

77, 62 Output Read, Write Enable to 512-Kbyte SRAM

134, 133, 132 Output Camera Control PIO (CAM1 - CAM3)

8

Page 18: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

Notes to Table 1:(1) UART 0 is the connection from the miniboard to the PC or to the flight control board. It outputs the required pitch and bearing

control signals as a text stream for decoding by other hardware.(2) GPS_UART is the connection from the miniboard to the GPS. It uses the JTAG UART interface in the miniboard and has no

pins specified in the Quartus II software.

With the microprocessor design uploaded to the FPGA, I loaded the navigation system software, which I developed using the Nios II Integrated Development Environment (IDE), into the microprocessor. Figure 7 shows the navigation system’s output stream, which is displayed by the Nios II terminal. The system outputs more information than that required by the DT algorithm (i.e., the pitch and heading data). The extra data, such as the UAV longitude and latitude, is useful for debugging and data recoding of the actual UAV flight path.

Figure 7. Navigation System Output Stream

Performance ParametersThe most important performance parameter for the navigation system is that it performs a DT computation over a large three-dimensional (3-D) mission area (requiring two 3-D arrays in software to perform the computation) in the time that the GPS receives and outputs the next positional fix, which is roughly 2 to 3 seconds.

SOPC Builder allows the user to configure additional aspects of the microprocessor to improve computation speed, at the expense of using more system memory and LEs. Specifically, the user can control the core type (Nios II/s, Nios II/f, etc.), pipelining, hardware multiply and divide, and cache allocation. Pipelining allows multiple instructions to be fed into each stage of the microprocessor execution cycle in parallel, enabling maximum execution performance of the navigation system software. Larger caches provide more memory data storage, which makes code execute faster. A large

142 Output Navigation Busy PIO (PUSY)

140 Output Ancillary UAV Output PIO (AUX1)

Table 1. Navigation System Pin Configuration (Part 2 of 2)

Pin Number Type Function

Compute DT on Current Position

Output Required Pitch & Bearing for the Aircraft Based on the DT

Get Raw GPS Message on the Serial Interrupt Routine

Decode & Convert GPS Message to Usable DT Coordinates

Perform Other Updates on Inputs/Outputs, Such as the Battery

9

Page 19: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

cache is particularly useful for the navigation system software, which uses an incremental iterative process (i.e., values in a DT matrix are updated in a scanned incremental manner) to determine the DT. However, larger caches also use more FPGA LEs and memory, and the designer can inadvertently create a system that does not fit into the target Cyclone device.

Ultimately, I selected the cache size and pipelining based on trial and error, with the goal of maximizing the cache size while still fitting the design into the Cyclone FPGA. The variant I chose met the design’s timing requirements. Table 2 compares this variant and other hardware options.

Design ArchitectureFigure 1, shown previously, provides the hardware design, and Figure 2 shows the systems integration of the navigation system and other UAV hardware. I implented this design in the Monash miniboard and Garmin Geko 201 GPS hardware (see Figure 8).

Figure 8. Hardware Construction

The navigation system software flow chart in Figure 9 provides information about the software building blocks used to develop the project.

Table 2. Hardware Configuration and Performance Characteristics

Nios II Processor

InstructionCache Size

(Kbytes)

Data Cache

(Kbytes)

Clock Pipeline

Design Fits in Cyclone Device

Time to Perform DT Algorithm in 3-D Array

(25 x 25 x 25)Nios II/s 2 N/A Yes Yes 8 seconds

Nios II/f 2 N/A Yes Yes 4 seconds

Nios II/f 2 1 Yes No N/A, no fit

Nios II/f 1 1 Yes No N/A, no fit

Nios II/f 1 512 Yes Yes 2 seconds (meets GPS timingrequirements)

Altera ByteBlasterTM JTAGCable for FPGA-to-PCCommunication

Altera Cyclone FPGA

Monash Miniboard

Navigation SystemTest Enclosure Box

Power Cable

Serial Cable for GPS Communication

Garmin Geko 201GPS

10

Page 20: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

Figure 9. Navigation System Software Flow Chart

Design DescriptionThe design’s implementation steps are as follows:

■ Research and determine the validity of the DT technique for navigation, and develop software algorithms to support it. Research the use of Altera FPGAs using the Quartus II software, SOPC Builder, and Nios II IDE. Use the web editions of the software and documentation available from the Altera web site (www.altera.com).

■ Create a Quartus II project applicable to the device on the Monash miniboard, selecting an appropriate Nios II processor in SOPC Builder, as shown in Figures 4 and 6. Compile and debug the Quartus II design and review the compilation report shown in Figure 5. Test this project, including testing the microprocessor response, and miniboard hardware (UART communications, etc.), with the simple hello world Nios II program.

■ Create a 3-D, two-level hierarchical, DT algorithm implemented in the Nios II processor. Test the performance in the Cyclone device with the appropriate test input.

■ Update the SOPC Builder processor configuration to determine the fastest possible configuration that fits in the device’s memory and available LEs (as outlined in Table 2). Optimize the processor until the desired performance requirement is met.

■ Create interrupt-based interfaces using the Nios II IDE to control the appropriate input and output to integrate the navigation system with other systems. Test these I/O interfaces.

■ Test the complete navigation system performance using available land-based vehicles and altitude information relative to sea level provided from the GPS. A UAV was not available at this time because other systems on board the UAV were not complete.

Future work will involve completing other UAV systems as shown in Figure 2, interfacing them with the navigation system, and installing the navigation system on board a UAV. Once installed, I can perform live testing in an airborne (rather than land-based) environment.

GPS (RS-232) Interrupt

Extract Most Recent, Raw$GPGAA Messages from GPS Using a UART InterruptRoutine

Convert Latest Raw GPGAAMessage to Usable Variables(e.g., Longitude, Latitude, etc.)

Convert GPS Coordinates to DT Coordinates

Perform DT Algorithm, Including Scanning forObstacles, to Determine Pitch& Bearing to Next Required Cell

Perform 3-D, 2-Level HierarchicalDT Algorithm, Including Scanningfor Obstacles, to Determine Pitch &Bearing to Next Required Cell

Update PIO Outputs to Other Systems in Flight Based on CurrentCoordinates as Required, IncludingDetermining When to Land & ShutDown Navigation System

Read Inputs from Batteries, etc., &Make Decisions to Change MissionAccordingly

Output Relevant Pitch Bearing to Serial Port to Flight Control System

Main Navigation System Routine

11

Page 21: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design FeaturesThis design features an Altera Cyclone FPGA. While Altera also offers some higher-performance device families (e.g., Cyclone II, Stratix® and Stratix II devices), the Cyclone device used in this project meets the performance requirements of the complicated navigation system, which uses somewhat computationally and memory-intensive DT algorithms. The Cyclone device is built onto the Monash miniboard, which includes 512-Kbyte on-board static RAM for storing instruction and other program memory from the Nios II processor, as well as other electronic interfaces that support the Cyclone device (power regulators, I/O connectors, communications controllers, etc.). The design hardware also uses a Garmin Geko 201 GPS system, which interfaces with the Cyclone device. SOPC Builder makes it easy to create interfaces between hardware and the Cyclone device.

Additionally, there are several features of the software programmed onto the device, as follows:

■ A two-level, 3-D sweeping DT algorithm gives the UAV associated intrinsic artificial intelligence (AI) to plan and dynamically follow a flight path to predefined waypoints, and provides compensating vectors when the UAV flies off course.

■ A DT orbiting routine uses the DT coordinate system to orbit a waypoint once it is reached. This routine supports missions in which the UAV may take aerial surveillance photography over waypoints, or gather meteorological data at the waypoint with appropriate sensors.

■ A NMEA 183 GPS serial string decoder algorithm decodes GPS data and converts it into X/Y cell and sector equivalents for use by the DT algorithm.

■ The system provides software control of all other required I/O signals, as shown in Figure 2.

■ The system provides communications monitoring of the GPS interface, and can determine whether the GPS has updated data within the last 6 or more seconds (most likely due to a temporary loss of GPS satellite feed in flight). Should this situation occur, the navigation system signals that it is busy. It then requests a holding pattern for the flight control system until GPS communications are restored, at which point, the navigation system resumes its mission.

ConclusionUsing the DT, I developed a reliable, artificially intelligent navigation system for use on a UAV’s on-board FPGA hardware. The system can plan a flight path dynamically, based on data obtained by other hardware, such as a GPS, and the user-defined mission generated by the mission compiler. The system also makes various other AI decisions, such as when to attempt landing the UAV.

This design involves several systems integration issues, as well as a variety of communication with other systems, such as JTAG, RS-232 UART, and programmable I/O (PIO). The Quartus II software, SOPC Builder, and Nios II IDE allow the user to implement a wide variety of I/O, and program and test it quickly. With such simple, powerful systems design and integration packages, the Altera development environment, along with the FPGAs capable of supporting such designs, is the best choice to satisfy the navigation system’s requirements.

Additionally, SOPC Builder’s ability to customize the Nios II processor configuration, including cache, pipelining, hardware multipliers, devices, etc., let me develop a highly efficient custom microprocessor that is capable of computing the DT algorithm over a large mission area for UAV systems. Without this control, I would not have been able to implement the project’s navigation system with the required performance parameters.

12

Page 22: Nios II Embedded Processor Design Contest - Outstanding - Altera

Artificially Intelligent Autonomous Aircraft Navigation System Using a DT on an FPGA

ReferencesGrevera, G J, “The ‘dead reckoning’ signed distance transform,” Computer Vision and Image Understanding 95 (2004): 317-333.

Qin-Zhong Ye, “The signed Euclidean Distance Transform and its Applications” IEEE (1988): 495-499.

Jarvis.R, “An all-terrain intelligent autonomous vehicle with sensor-fusion-based navigation capabilities” Control Engineering Practice, vol. 4, no. 4, (1996): 481-486.

Wong. E, Jarvis.R, “Real Time Path Planning and Navigation for a Humanoid Robot in and Indoor Environment” Proceedings of the 2004 IEEE Conference on Robotics, Automation and Mechatronics (2004): 693-698.

Lim Chee Wang, Lim Ser Yong, Marcelo H, “Hybrid of Global Path Planning and Local Navigation implemental on a Mobile Robot in Indoor Environment” Proceedings of the 2002 IEEE International Symposium on Intelligent Control (2004): 821-826.

13

Page 23: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

14

Page 24: Nios II Embedded Processor Design Contest - Outstanding - Altera

Driver Assistance Tool

Third Prize

Driver Assistance Tool

Institution: Hanyang University

Participants: Jonguk Song, Hwanjun Kang, Kangsub Kwack

Design IntroductionThis driver assistance tool detects the presence of a human voice and adjusts the volume of a car’s audio system accordingly. Car audio system manufacturers are the primary customers for such a device.

Our project implements a complete product that can play .wav files and adjust to the system’s environment (in this case, the presence of a human voice). The design has two parts: an MP3 player and a voice-activated volume adjustment mechanism. Each part uses a Nios® II processor.

Function DescriptionWhen a human voice is detected, the system pauses playback of the MP3 audio file and then resumes playing. The volume is adjusted as the music streams. To perform this function, the design contains:

■ A Nios II processor (named CPU0) that plays the MP3 file and controls the system delay and the attenuation coefficient of the car’s audio system.

■ A secure digital (SD) memory card that contains the MP3 files to be played.

■ An audio codec that interfaces with the speaker that plays the MP3 files and the microphone that detects the driver’s voice.

■ A fast Fourier transform (FFT) block that transforms the audio signal into its real and imaginary frequency components.

■ A second Nios II processor (named CPU1) that monitors the magnitude of the driver’s voice and instructs the audio codec function to reduce the volume.

15

Page 25: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The system’s operation is described as follows:

1 One data frame (512 bytes) is read from the SD memory card and is saved into a digital-to-analog converter (DAC) buffer.

2. The data, which eventually becomes the MP3 sound, is multiplied by an attenuation coefficient, thereby setting the volume. The result is saved to a delay buffer.

3. The CPU0 processor delays the raw data by sending a delay value to the delay buffer.

4. The audio codec converts the human voice data received from the microphone and digitizes the audio using an analog-to-digital converter (ADC). The delayed data is then subtracted from the ADC data.

5. The resulting data stream feeds into the FFT block, which transforms the data into its frequency components and stores the real and imaginary components into the Real_Result_FIFO and Imaginary_Result_FIFO buffers, respectively.

6. The CPU1 processor monitiors the magnitude of the FFT result, which is filtered to respond to the audio range of the human voice.

7. If the magnitude of the FFT result (i.e., the volume of the driver’s voice), is greater than a preset trigger level, the CPU1 processor commands the audio codec to reduce the volume and tells the CPU0 processor to set another attenuation coefficient and delay.

Design ArchitectureFigure 1 shows our project’s design architecture. The .wav file to be played is saved into a buffer and the microphone receives a .wav file of a human voice. The system subtracts the .wav files with delay and gain and performs an FFT on the result. Next, the system measures the magnitude of the voice frequency region and sets the music volume for the received voice level.

Figure 1. Design Architecture

SD Card CPU0(Audio Play)

DAC FIFO

AudioCodec

Speaker

Microphone

Attenuated& DelayedData FIFO

FFT Result FIFO

FFT

CPU1(Volume Control)

Real Imaginary

16

Page 26: Nios II Embedded Processor Design Contest - Outstanding - Altera

Driver Assistance Tool

Design DescriptionWe built our design using the following general steps:

1 Using SOPC Builder, we created a DAC FIFO buffer to play the MP3 files. Additionally, we created attenuated and delayed data FIFO buffers to subtract the recorded data from the original data. The design sends the subtracted data to the FFT block.

2. We designed an I2C control module as an audio codec that is controlled by a Nios II processor. We can control the audio codec with software written using the Nios II Integrated Development Environment (IDE).

3. We incorporated the FFT MegaCore® function into our design.

4. We linked all of the modules and assigned addresses and interrupt requests (IRQs) to each module.

Design FeaturesOur design consists of two Nios II processors and several intellectual property (IP) functions.

■ One processor has an MPEG audio decoder (MAD) application, making it function as an MP3 player.

■ The other processor communicates with the correlator and FFT blocks, and determines whether the driver is speaking. This processor also controls the MP3 player’s volume.

■ The correlation block compares input sounds with MP3 sounds, and computes the delay between the input and MP3 sounds.

■ The FFT block analyzes the input sounds and decides whether the sounds are human.

We used system-on-a-programmable-chip (SOPC) concepts to implement the design. SOPC Builder helped us to select IP blocks easily. The Quartus® II Compiler automatically recognized the IP blocks selected for the design and generated the required IP block device drivers and added them to the system library that the application software uses.

ConclusionThis project was a good experience for us. We learned a variety of things during this project, including:

■ We learned how to design hardware with the Development and Education (DE2) board.

■ We learned how to use the Quartus II software, SOPC Builder, and Nios II IDE. We think they are very useful for creating digital architectures.

■ We learned how to work on a team project, including how to cooperate, how to divide tasks among members of the team, and how to create a schedule.

17

Page 27: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

18

Page 28: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Third Prize

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Institution: Xi’an Institute of Post and Telecommunications

Participants: Xiao-Wu Chen, Li-Bin Liu, and Fang-Fang Zhang

Instructor: Jun Liu

Design IntroductionAs China’s automobile industry has developed in recent years, more people are traveling in private cars to explore tourist sites. The traditional method of using maps to get to a destination does not provide enough objective details. Instead, car navigation systems are the future.

The navigation systems currently in the market provide information about roads and locations, but do not tell the driver about traffic jams, available parking lots, etc. Using a PC-based, real-time, Internet road inquiry system is not convenient or practical. To solve this problem, we designed an embedded system, the Nios II®-based multiple-core intelligent vehicle terminal. The system is convenient and practical, and can obtain real-time information, such as traffic jam and parking lot information, by wirelessly connecting to a background server via the Internet. Figure 1 shows the system schematic.

Figure 1. System Schematic Diagram

GPS Internet

Server

GPRS

CarTerminal

19

Page 29: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

We chose the Nios II processor for our design for the following reasons:

■ Using multiple Nios II processors solves a variety of problems, such as low microprocessor (MCU) processing speed, limited peripheral resources, complicated interface configuration, complex hardware, and software design and programming. With two Nios II processors, we can allocate the system controls and peripheral access, which greatly decreases the software workload.

■ Programmable logic devices (PLDs) are outstanding in lowering project development costs and for trial chip production. As we developed and built the design, we could program and reprogram the PLD.

Our Nios II-based multiple-core intelligent vehicle terminal is suitable for a variety of vehicles, but we mainly target mid-level and high-end cars.

Function DescriptionThe Nios II-based multiple-core intelligent vehicle terminal implements the following functions:

■ Global positioning system (GPS)—The system uses a GPS to note the vehicle’s position accurately.

■ Navigation—Users can scale and translate the map, find a location, and select the destination. The system displays the vehicle’s real-time surroundings (roads and buildings).

■ Real-time road message updates—As the user is driving, the system displays real-time traffic jam information on the e-map, reminding the driver to avoid it.

■ Touch screen operation—With the touch screen, the driver can interact with the system at any time to complete any operation.

■ Client-server mode—The driver can obtain locale information from the background server, such as parking lots, hotels, hospitals, shopping malls, schools, etc.

Performance ParametersThe following list describes the performance and specifications for our system.

■ Power supply

● DC voltage: 12 V

● Operating current: 1 A

● Power consumption: 12 W

■ Operating temperature: 0 to 50° C

■ Hitachi IS61LV6416-10T LCD

● LCD resolution: 640 x 480

● LCD display space: 19 cm (7.5 inches)

20

Page 30: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

● LCD display color depth: 8-bit RGB

● Viewing angle: 80 degrees

■ GS1100

● Operating voltage: 5 V

● Frequency: 1 Hz

● Data transmission velocity: 9,600 bits per second (bps)

● Positioning error: 15 m

● Features a built-in passive antenna, low power consumption, and sensitivity up to -153 dB/mW (dBm)

■ Siemens MC35I—Double-frequency general packet radio service (GPRS) module, circuit switched data (CSD) maximizing to 14.4 kilobits per second (Kbps), unstructured supplementary service data (USSD), and nontransparent mode.

■ Hitachi IS61LV6416-10T touch screen

● 19 cm (7.5 inches), same as the LCD

● 4-wire resistive touch screen

■ Epson S1D 13503F LCD controller:

● Available modes: 320 x 240 x 256 colors, 640 x 480 x 16, 8, or 4 colors, 1,024 x 768 x 2 colors

● Available memory size: 128 Kbytes maximum

■ MXB7843 touch screen controller—Programmable 8-12 route analog-to-digital (A/D) conversion, 2-route input serial peripheral interface (SPI)

Design ArchitectureThe system design is divided into hardware and software.

The hardware design includes the following items:

■ Nios II dual-core configuration

■ Peripheral driver circuit board design

■ Self-defined expanded bus

■ GPS module

■ GPRS module

■ LCD module, touch screen module, etc.

21

Page 31: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The software design includes the following items:

■ Self-defined custom instruction

■ Embedded real-time operating system μC/OS-II

■ Embedded graphics interface development software μC/GUI

■ Self-defined map messages system

■ GPS data receiving and serial driver

■ GPRS data receiving and delivering, network communication protocols (e.g., PPP, TCP/IP)

Figure 2 shows the software structure.

Figure 2. Software Structure

Table 1 shows the functions and main technologies used in our Nios II-based multiple-core intelligent vehicle terminal system.

Figure 3 shows the system hardware.

Table 1. Functions and Technologies

Function ImplementationNavigationLocation enquiry functionFunction of real-time update of the road messages

LCDTouch screenSelf-defined expanded busSelf-defined map messagesμC/GUI, μC/OS-II

Dynamic positioning function Serial communication

Client-server mode GPRS wireless communicationSelf-defined custom instructionNetwork protocolDatabase

Dual core Nios II CPU 1 operates the application (GUI)Nios II CPU 2 operates the GPS and GPRS modules

Application

Map Messages Place Name Messages

μC/GUI

GPS DataReceiving

GPRSCommunication

Serial Driver Protocol Driver

μC/OS-II

Dual Core & Communication

22

Page 32: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Figure 3. System Hardware

System StructureThe system provides customers with real-time traffic jam information. In client-server mode, the system stores static map information in the vehicle terminal and delivers dynamic information using the background server and the wireless communication module.

Hardware StructureFigure 4 shows the hardware design. CPU 1 operates the application (GUI), CPU 2 operates the GPS and GPRS modules. Three resistive cores use CPU 1 and CPU 2 to communicate. The GPS signal provides communication between the GPS module and the key GUI. The GPRS uses the GPRS_TX signal to deliver data to the GUI. The GUI uses the GPRS_RX signal to deliver data to the GPRS.

23

Page 33: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. Hardware System Structure

Software Module StructureFigure 5 shows the software module interface digram. The interface lets the application (GUI), GPS module, and GPRS module communicate.

Figure 5. Software Module Interface Diagram

Figure 6 shows the software flow.

SDRAM

8-MbyteFlash

CPUP1

CPUP2

Mutex

GPS

GPRS_TX

GPRS_RX

SRAM

FPGA

UART_0 UART_1

GPS GPRSLCD

Interface

Receive Message

GPRS

SendMessage

GUI Get GPS

SendGPS Data

ReceiveMessage

GPS

Communicate bySpecifying the Cache

Send Message

24

Page 34: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Figure 6. Software Flow Design

Design MethodologyOur methodology was to use an FPGA to construct our system architecture, including the LCD interface, touch screen interface, serial interface, etc. See Figure 7. The following sections describe our design in more detail.

SystemPower-On Reset

CPU (P2) CPU (P1)

μC/OS-II InitializationOS_Init()

System Startsthe Tasks

Task_Start()

Executethe Tasks

GPSInitialization

GPRS NetworkInitialization

Receive GPSData & SendIt to the Users

Receive theUsers'

Commands

Receive the Datafrom the GPRS

Network

Mailbox

Execute ReadConversionOperation

Command ControlWords or Data?

MessagesReturned?

Display MapMessages

SystemInitialization

Initialize theNavigation

System

Command

Data

Y

N

25

Page 35: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 7. System Architecture

Nios II Dual-Core Processor DesignAs we developed our design, we determined that implementing μC/GUI was complex, that we needed heterogeneous map data processing, and that μc/OS-II required a huge stack when deploying the GUI. These factors made for a poor final product, so we decided not to use μc/OS-II for the whole project. Instead, we operated μC/GUI without the OS and the GPS and GPRS modules with the OS. To implement it, we chose a dual-core design.

The dual-core system fully uses the Nios II processor’s features, and has the following arrangement:

■ First CPU:

● Perform human-machine interaction without the OS.

● Operate μC/GUI and the application program.

■ Second CPU:

● Set up the embedded real-time μC/OS-II OS.

● Receive the front-end GPS data through the UART1 serial interface.

● Communicate with the GPRS module through the UART2 (self-defined expansion) serial interface.

The two Nios II processors communicate through a mailbox. The dual-core resistive communication mechanism did not operate well, so we created a mailbox mechanism to allow resistive communication between the two cores.

26

Page 36: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

A mutual exclusion object (mutex) and mailbox allow the two cores to share memory, which is 2 x 32 bits (see Table 2). We set the value to 0x0000, set the reset to 1, and enable the mutex. The mailbox allows the cores to communicate. The mutex resides in the mailbox and waits to be modified by a CPU. The following code shows the structure of the shared memory and mailbox.

typedef struct {alt_u8 Flag; // avoid repeated reading and writing of the dataalt_u32 Data_Length; // data length alt_u8 * Data_Buf; // data cache

}SHARE_MEMERY;

typedef struct {void * Mutex_Base;SHARE_MEMERY * Share_Memery;

}MAIL_BOX;

Table 2 shows the memory structure.

Figure 8 shows the hardware system, which is generated by adding intellectual property cores in SOPC Builder.

Figure 8. SOPC Builder Hardware System

Table 2. Shared Memory

Offset Register Name Read/write Bit Descriptor31 … 16 15 … 1 0

0 mutex RW OWNER VALUE

1 reset RW N/A N/A RESET

27

Page 37: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

LCD and Touch ScreenThe Hitachi SX19V001-Z2A LCD and touch screen have 640 x 480 resolution and an eight-color screen. The touch screen is 4-wire resistive. The LCD controller is the Epson SED13503. The touch screen’s drive module is the analog-to-digital (A/D) sampling converter MXB7843. It connects to the 4-wire resistive touch screen, and converts the touch screen voltage value into a value we can work with more easily (the conversion is 12 bits and 1/4,096 of the X or Y direction). The MXB7843 devices sends the controller the X, Y values that have gone through A/D conversion, which is not practical without translation. The touch screen’s resistance value fluctuates in nearly a line, which means we can use the following translation formulas:

X coordinate = 640 x (X - X1) / (X2 - X1)

Y coordinate = 480 x (Y - Y1) / (Y2 - Y1)

Where the LCD’s resolution is 640 x 480, the voltage on LCD’s upper left corner is X1 and Y1, the voltage on its lower right corner is X2 and Y2, and the voltage on any point of the screen is X and Y.

We designed the LCD touch screen drive circuit, which connects to the self-defined expanded bus and SPI interface, i.e., the address, data, and control signals. All signals are implemented using the two universal input and output interfaces (JP1 and JP2) on the Altera® Development and Education (DE2) board.

GPRS ModuleWe use the Siemens MC35I device as the GPRS module. This module has passed radio and telecommunications terminal equipment (R&TTE) and Global Certification Forum (GCF) authentication. It supports end-to-end and end-to-user communication, SMS and GPRS-like data transmission and voice calls, and the GSM07.07 protocol. It also provides a variety of interfaces, including a subscriber identity module (SIM) card interface, power interface, standard RS-232 interface, etc.

Because the MC35I communication module does not support parity, we used serial communication with a 115,200 bps baud rate, 8 data bits, 1 stop bit, no parity bit, and without streaming control. This arrangement occupies minimal CPU time.

Compared to other wireless communication technologies, GPRS has real-time operation and stability. It also has the following features:

■ Always on-line—As long as GPRS service is activated, it remains on-line, which is similar to a wireless network.

■ Billing by use—Although it is always on-line, the user does not have to worry about the expense. The system only triggers billing when communication traffic is generated. The utilization-based billing mechanism is more reasonable and logical than other methods.

■ Quick login—The packet service is much faster than a dial-up connection.

■ High-speed transmission—Theoretically, GPRS’s maximum transmission speed is 171.2 Kbps. The GPRS systems currently being used support transmission speeds of about 40 Kbps.

GPS ModuleThe GPS module uses the GS1100 device. Its receiver has high sensitivity (up to -153 dBm), features low power consumption, has a built-in passive antenna, conforms to the National Marine Electronics Association (NMEA) v3.0 protocol, and transfers data through the RS-232 interface.

28

Page 38: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

CRC Check DriveBased on Altera’s cyclic redundancy code (CRC) check, we designed a new CRC check in hardware. We implemented a large network data check and accelerated the checking speed.

Hardware System DescriptionThis section describes our system’s hardware design.

μC/OS-IIμC/OS-II is the basis of the GPRS communication module. It allows communication between tasks in the communication module and performs a variety of tasks, such as Task_start (main control tasks), Task_GPS_Send (receives GPS data from the serial interface and sends GPS messages to the GUI), Task_GPRS_Send (sends the data received from the background to the GUI), and Task_GPRS_TCP_Send (sends front-end service inquiries to the background console).

μC/GUI and ApplicationμC/GUI provides interface elements (form and controls), drawing functions, excellent color management, Chinese character support, and a hardware driver. We developed our application based on the GUI. We used a compressed bitmap for the intelligent traffic navigation system interface and the storage format. See Figure 9.

Figure 9. Compressed Bitmap for Navigation Interface

Figure 10 shows the application software flow chart.

29

Page 39: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 10. Application Software Flow Chart

Notes to Figure 10:(1) Front-end command means all of the commands except the surrounding messages on the interface.(2) Background command means the surrounding messages.

Our design does not use μC/GUI and μC/OS-II together for several reasons:

■ Implementing μC/GUI is very complex.

■ Our design requires heterogeneous map data processing.

■ μC/OS-II requires a huge stack when the GUI is implemented at the same time.

■ μC/GUI and μC/OS-II have little connection.

To implement the GUI and software application we defined several files:

■ GUIConf.h—Describes the size of μC/GUI function module and dynamic storage (used for memory devices and window object), the default font setup, and other basic GUI control definitions.

SystemInitialization

CoordinateConversion

Display MapMessages

Initiate theNavigation

System

ObtainGPS

Message?

Display

Are Any Buttons Pressed?

Are Any Messages Returned?

Judge theButton Types

CommandControl Words

or Data?

Perform ReadConversionOperation

DisplayFront-EndCommand

BackgroundCommand

Is It Connected to the

Background?

Deliver Inquiryto Background

Server

ExecuteCorresponding

Front-EndOperation

Command

Data

Y

Y

Y

N

N

N

30

Page 40: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

■ LCDConf.h—Provides the LCD parameter control files such as LCD size, controller types, bus width, color selection, etc.

■ GUI/CORE/LCD_ConfDefaults.h—Describes all default configuration items, such as the LCD numbers, controller numbers, color palette, the screen’s reverse setup, etc.

■ GUITouchConf.h—Configures the touch screen by compiling the following functions according to the touch screen and the control chip (we need not the X, Y data in the software, so these functions are empty structures):

void TOUCH_X_ActivateX (void)?// Ready to measure the data on X-axisvoid TOUCH_X_ActivateY (void)?// Ready to measure the data on Y-axis

The following functions are called by GUI_TOUCH_Exec():

int TOUCH_X_MeasureX(void); // Return the value of X according to AD // conversion result

int TOUCH_X_MeasureY(void); // Return the value of Y according to AD // conversion result

Figure 11 shows the μC/GUI software structure.

Figure 11. μC/GUI Software Structure

Using the built-in μC/GU function, memc_driver, we convert the cache reading and writing into that of the memory (physical address), which allows the LCD to display the images correctly.

The run-length encoding (RLE) algorithm goes through a scan line and replaces adjacent pixels with the same color value based on the number of occurances of the color and the color value of the pixels. For example, aaabccccccddeee is replaced by 3a1b6c2d3e. Using the RLE algorithm, fully compressing the μC/GUI interface requires a large memory and a lot of CPU time. Our system does not need full compression and our map has consecutive white pixels as the dominant color. Therefore, we used an RLE-like algorithm (only compressing the white pixels) to accelerate the compression rate while slightly decreasing the compression ratio. This technique, along with using a partial compression strategy (we only compress the part that is displayed on the screen), means that we do not need as much memory.

The DE2 board’s flash device is 4 Mbytes and our map can be as much as 4.1 Mbytes. So we have to compress the map if we want to use it. With the RLE-like algorithm, we compressed our map to 2.45 Mbytes, which saves a significant amount of flash memory.

GPRS Communication ModuleThis section describes the methods the GPRS uses for communication.

Text

Numeric Value

Image

Input Devices

Font

WindowManager

TouchScreen

Mouse

Window Object

Window Controls

μC/GUIApplication Program

Driver

μC/GUIFunction Blank

LCDDriver

μC/GUIHardwareInterface

LCD

GUIConf.h

LCDConf.h

31

Page 41: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

UART Serial DriverTo avoid CPU hangs caused by the Nios II UART driver and to help determine the package’s start and end points, we compiled a UART driver suited for our system that operates in real time and is easy to use. The driver’s data structure shown below:

typedef struct { alt_u32 BASE; alt_u32 IRQ_ID; alt_u8 Rx_Buf [ Uart_Buf_Size ]; alt_u32 Rx_Bytes; alt_u32 In_Ptr; alt_u8 Rx_Star_Char; alt_u8 Rx_End_Char; alt_u16 Same_Flage; Data_Buf_Queue * Data_Queue; OS_EVENT * Sem; OS_EVENT * Rx_Sem;

}UART;

Controlling the MC35I Module’s AT InstructionThe GSM07.07 standard defines complete AT instructions. Our system only uses instructions to control a short message, phone number directory, and basic AT instructions expanded by GPRS. AT instructions have the form AT+XXXXXX. The system sends the AT instructions’ ASCII codes to the GPRS communication module, then delivers the \n ASCII code to confirm the command. After executing the command, the GPRS communication module returns the result, which is \r\nOK\r\n for success and \r\nERROR\r\n for failure. Table 3 shows the basic AT instructions we used.

Custom InstructionsThe communication module uses the following custom instructions to implement the big/small end conversion.

void IP_Assemble( IP * IP_Packet ) { IP_Packet->Total_Len = ( ALT_CI_ENDIAN ( IP_Packet->Total_Len ) >> 16 );IP_Packet->ID = ( ALT_CI_ENDIAN ( IP_Packet->ID ) >> 16 );IP_Packet->Offset = ( ALT_CI_ENDIAN ( IP_Packet->Offset ) >> 16 );IP_Packet->Ckecksum = ( ALT_CI_ENDIAN ( IP_Packet->Ckecksum ) >> 16 );IP_Packet->Src_IP = ALT_CI_ENDIAN ( IP_Packet->Src_IP );IP_Packet->Dst_IP = ALT_CI_ENDIAN ( IP_Packet->Dst_IP );

memcpy ( &PPP_Packet_Buf [ PPP_Packet_Buf_Length ], IP_Packet, sizeof ( IP ) - 8 );

}

Point-to-Point Protocol (PPP)In this system, the μC/OS-II OS controls the GPRS module to access the network, implementing point-to-point communication. PPP provides a standard method to encapsulate a multi-protocol message on

Table 3. AT Instructions

Instruction Function DescriptionAT If an OK is returned, the GPRS module supports the AT instruction and operation is normal.

AT+CREG Network registration.

AT+IPR Set data terminal device velocity.

AT+CGDCONT Define PDP context.

AT+CGACT PDP context activation.

AT+CGQMIN Service quality introduction (as small as acceptable).

AT&W Reserve all software modification of the module.

32

Page 42: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

the point-to-point link. It supports the dynamic distribution and management of IP addresses, the transmission of a synchronous (delivery of bit-oriented synchronous data module) or asynchronous (start bit + data bit + parity bit + stop bit) physical layer, network layer protocol reuse, link configuration, quality detection, and error correction, and can negotiate multiple parameters.

PPP has three parts: the link control protocol (LCP), network control protocol (NCP), and expanded protocols, such as the multi-link protocol. To establish PPP communication, the system first sends an LCP package to each side of the PPP connection to configure and test the data connection. After the connection is established, the corresponding entity is confirmed. Then, the system sends an NCP package to select one or more network layer protocols for configuration. Once the selected network layer protocol is configured, the package is delivered on the link. The link remains in a configurable state unless the LCP and NCP packages terminate the connection or other unexpected events occur (e.g., the non-activity clock is fully occupied by counting or there is interference). Figure 12 shows the PPP link phase transition.

Figure 12. Link Phase Transition

TCP/IP The system must deliver collected data to the server immediately through the Internet. The delivery must be in real time and fast, but have a small amount of data. Because of these requirements and the functions implemented using TCP/IP, we simplified the subprotocols as shown in Table 4.

Figure 13 shows the GPRS communication module flow chart according to these protocols. We implemented PPP, TCP/IP, and UDP ourselves.

Table 4. Simplified TCP/IP Subprotocols

Layer Implemented ProtocolApplication layer Subscriber self-defined

Transmission layer TCP/UDP

Network layer IP

Link layer PPP

Link UnavailabilityPhase

Link EstablishmentPhase

Link TerminationPhase

AuthenticationPhase

Failure

Network LayerProtocol Phase

33

Page 43: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 13. GPRS Communication Module Flow Chart

GPS PositioningThe vehicle navigation system is part of the intelligent traffic system. Its built-in GPS antenna receives data from at least 3 out of 24 GPS satellites orbiting the Earth. With help from the navigation system’s e-map, it converts the latitude and longitude position coordinates measured by the GPS satellite signals, which confirms the vehicle’s position on the map.

Client-Server ApplicationWe created the server-side software using Microsoft SQL2000 and Microsoft Visual Studio .NET 2003. The database contains hotels, shopping malls, hospitals, railways and bus stations, gas stations, roads, etc. The .NET system mainly deals with interface design, algorithm design, and data processing.

The client-server mode has the following advantages:

■ It reduces the front-end burden and costs by placing the information in the background.

■ It facilitates software function expansion. The driver can quickly download another city’s map (such as Xi’an).

■ It requires real-time operation. The background server can immediately send the road conditions to the driver.

■ The navigation system’s powerful background server provides real-time geographical information as well as traffic jam conditions, helping the driver learn about road conditions and to guide traffic at rush hour.

Figure 14 shows the background server’s login interface.

End

Is There aTerminatedMessage?

Enter NetworkState

Establish PPPLink Layer

GPRSInitialization

Y

N

34

Page 44: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Figure 14. Background Server Login Interface

Figure 15 shows the interface operation.

Figure 15. Interface Operation

Traffic Congestion AlgorithmThis algorithm is used in server terminals. Using client-server advantages, the vehicle terminal can be used as a data collection terminal for the traffic authority. As long as the traffic authority has the system (e.g., installed in a bus or taxi), it will be informed of road congestion according to the information sent

Intelligent traffic system of the city

About

System informa-tion enquiry

New system infor-mation registration

System informa-tion modification

Systemmanagement

Geographicalinformation

Traffic congestioninformation

Traffic lights

Vehicleterminal

Vehicleterminal

System administrator’sinformation modification

Parking lotterminal

Parking lotterminal

New system administratorauthorization

Traffic lightsterminal

Traffic lightsterminal

Road noticeboard

Road noticeboard

Parking lotaccount

Parking lotaccount

Parkingrate

Parkingrate

Parkingcard

Parkingcard

Road noticeboard

Parking lotinformation

Vehicleterminal

Parking infor-mation

Exit

35

Page 45: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

from the vehicle terminal. (It is also available if the user does not send the information.) The algorithm is defined as:

If R << R0 and N is bigger than a specified number (to reduce the possibility that parking affects the result), the road segment is considerd to be congested. Congestion increases as R decreases.

where:

■ N is the number of vehicles that are installed with this device and running on the road S.

■ R is the average vehicle speed (the specified time is longer than the duration of a red light to avoid errors).

■ R0 is the road’s speed limit.

Design FeaturesOur design has the following features:

■ Uses two Nios II processors—Using two processors solves software design problem and improves system performance. Using an FPGA to implement the two processors reduces hardware costs and improves performance.

■ Custom peripherals and instructions—The LCD, touch screen GPS and GPRS, all as custom peripheral processing, use a custom bus and SPI (the 40-pin expansion headers on the Altera DE2 board) to complete the interface design. We use custom instructions to implement the big/small end conversion of the GPRS communications module.

■ Client-server mode—The server can provide better services for drivers using client-server mode.

■ Traffic jam algorithm and best path—With the traffic jam algorithm described previously, traffic authorities can learn traffic conditions, helping them guide traffic and send traffic updates to drivers who can then choose the best route.

■ Integrating μC/GUI and μC/OS-II in the embedded system—μC/GUI is the design platform of the whole navigation module, which provides various interface elements (windows and controls), good color management, Chinese character support, and low-level driver support for the hardware. We used it to compile applications and provide a friendly user interface. We usedμC/OS-II, which is effective and reliable, to deploy system tasks.

■ Using TCP/IP and PPP in the embedded system—Applying TCP/IP and PPP to the system did not require memory management and allowed transmission of small volumes of data.

■ GPRS—To send real-time road information to drivers and to facilitate communication between drivers and the control center, the system establishes an external GPRS module and connects with the Nios II processor in the FPGA using the serial port. Drivers can directly send service requests to the back-end monitoring center and receive information. Meanwhile, the back-end learns traffic conditions in real-time by collecting the road information.

■ GPS—We used a GPS to implement real-time, accurate vehicle positioning and to provide the necessary data for the traffic jam algorithm.

36

Page 46: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

■ Touch screen LCD—Touch operations are easy and the LCD is beautiful, offering a user-friendly interface.

■ Collaborative hardware/software development—Because the hardware and software in the SOPC Builder can be clipped, we can jointly develop the hardware and software during system development. Hardware and software development begin and end almost simultaneously, saving time. Additionally, we accelerated the product launch and can support a longer product life cycle. If we need to modify some definitions during development, we can simply generate a new Nios II core, which does not impact other peripherals or Nios II programs.

ConclusionIn our two-month project, we successfully designed the Nios II-based multiple-core intelligent vehicle terminal, improving our knowledge of the Nios II embedded system. By meeting specific requirements, the multiple-core processor solution provides higher performance without adding more hardware. Using dual-core processor technology is another way to enhance a processor’s performance. Because the processor’s real performance is the sum of the instructions that processes per clock cycle, adding another processor doubles the number of instructions executed per clock cycle.

The system design is centered on one FPGA. Using a range of Nios II functions and features, the control and data processing are managed by two Nios II processors implemented in an FPGA, demonstrating the high integration of system-on-a-programmable-chip (SOPC) solutions. Designing the system with SOPC tools helped us to streamline the hardware and software system, disconnecting the software development from the hardware development. Moreover, by combining multiple processors and selecting suitable peripherals, memories, and I/O interfaces, we could build a customized embedded system with the lowest price, simplest design, highest performance, and lowest risk.

The design uses the new Nios II Integrated Development Environment (IDE), which, compared with the original environment, has simpler register operation and a more convenient operating and debugging environment. The original development tool was difficult to download and debug because it did not have a graphical compiling and debugging environment and download speeds were slow. The new Nios II IDE downloads and debugs through the JTAG-UART with a Windows-style interface.

The competition showed us how to develop a project and helped us learn a new design concept: SOPC. We believe cooperation is vital to the success of such an arduous task with many sophisticated technologies. Without a joint effort and team spirit, we would never have finished the design. We will always have good memories of the experience.

Due to limited time and resources, the system is unsatisfactory in some areas. The GPS cannot receive GPS satellite signals indoors, so after starting the indoor system normally, the GPS shows the GPS information that was saved when the system was last closed.

Due to limited storage space and data, we only made a map for Xi’an. Our design is missing data for other cities, e.g., Shanghai, and information on relevant places.

Appendix: GPRS Communications Testing DataRefer to the PDF of this paper on the Altera web site at http://www.altera.com for a detailed description of the GPRS communications testing data.

37

Page 47: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

38

Page 48: Nios II Embedded Processor Design Contest - Outstanding - Altera

Consumer Applications

Consumer Applications

This section includes projects that can be used as consumer applications, including:

Unattended Wireless Search Robot ..................................................................................................... 41

Set-Top Box Capable of Real-Time Video Processing ....................................................................... 51

Digital Watermark-Based Trademark Checker ................................................................................... 71

SOPC-Based Speech-to-Text Conversion ........................................................................................... 83

Nios II Embedded Electronic Photo Album...................................................................................... 109

Automatic Scoring System................................................................................................................ 123

Multimedia Decoder Using the Nios II Processor ............................................................................ 173

Omnidirectional Mobile Home Care Robot ...................................................................................... 183

39

Page 49: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

40

Page 50: Nios II Embedded Processor Design Contest - Outstanding - Altera

Unattended Wireless Search Robot

First Prize

Unattended Wireless Search Robot

Institution: Kwangwoon University

Participants: Yoongoo Kim, Younggon Lee, Jeongwook Yim

Instructor: Yongjin Jeong

Design Introduction Our project, an unattended wireless search robot, implements a real-time image processor using a camera and a Nios® II-based system running on an FPGA. The robot searches for objects; when it finds one, it attempts to recognize it. It then sends a scaled 160 x 120 image via Ethernet to a PC. We used a radio frequency (RF) module, which increases the range of the robot, and the system ignores losses or errors during transfer. We used Sobel edge detection and the sonde method for faster image processing. Our implementation processes and transfers two to three frames per second.

Figure 1 shows the project’s hardware system setup, including the Development and Education (DE2) board with an FPGA, National Television System Committee (NTSC) camera, 2.4-GHz RF modules, remote control (RC) car unit, and client PC. On the software side, we use the Nios II processor with a μC/OS-II RTOS, a lightweight TCP/IP network stack, and a Microsoft Foundation Class (MFC) Library. Finally, we considered various standards, e.g., NTSC, YCbCr, and RGB. We also explored concepts involving digital image processing, TCP/IP, RF theory, and RTOS.

41

Page 51: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. System Organization

Our system operates as follows:

1. Using a client PC, the user specifies the object to be searched and transfers information about the object through the Ethernet link.

2. The RF module transfers picture frames from a camera to the DE2 board, extracts Y-pixel information from the picture, and saves it to SRAM on the DE2 board.

3. The Y-pixel picture is converted to 160 x 120, and is subjected to Sobel edge detection, which converts the picture into a binary image. Noise is removed and the picture is cropped, yielding a result picture.

4. The resulting picture is compared to the information about the search object. The picture is deemed to be that of the object if the results of the comparison are a greater than 90% match.

5. The result picture is transferred to the PC via Ethernet, where a client program displays the picture.

The project’s system, named NIOS2-System, consists of an imaging device, CPU, RAM, DRAM, flash, a video codec, the Avalon® bus, an Ethernet interface, RS-232 port, and a user-defined module. We implemented the image processing module in the C language. We performed data transfer using the built-in lightweight TCP/IP (LWIP) stack configured for TCP/IP running on a μC/OS-II RTOS. We used the RS-232 connection to test the data transfer. The image recognition algorithm chooses between sonde and pixel matching, based on the simulation results.

Function DescriptionThis section provides a functional description of our project.

Imaging Device Module Figure 2 shows the system block diagram. The imaging device module, located on the right side of Figure 2, reads the NTSC video data. The 2.4-GHz RF module transfers video data to the DE2 board.

42

Page 52: Nios II Embedded Processor Design Contest - Outstanding - Altera

Unattended Wireless Search Robot

The ADV7181 video codec on DE2 board converts the NTSC video data to YCbCr. In the YCbCr standard, the original size of one frame is 720 x 525, of which only Y pixel information is required. We implemented a hardware module in Verilog HDL that saves the Y pixel data, which is then resized to 640 x 480 and stored in SRAM.

Figure 2. System Block Diagram

Image Preprocessing ModuleThe System_0 module performs the image preprocessing. After the image data is saved in SRAM, it is copied to SDRAM using the device driver functions provided by the hardware abstraction layer (HAL). The image data is resized to 160 x 120, undergoes Sobel edge detection, and is converted into binary format.

Image Recognition ModuleThe sonde method provides optical character reading. Lines are drawn across the character in question and cross-points, i.e., points where the lines intersect the character, are captured to yield their characteristic position.

For example, Figure 3 shows three lines drawn across the number three. The number three has a characteristic cross-point position, so we can recognize the number by checking the cross-point positions. The sonde method is faster and more intelligent than pixel matching. However, sonde is not as effective for complex images. Therefore, we limited our test objects to 3 shapes—triangle, circle, and square—so that we could use the sonde method.

System_0

SDRAM

Multiplexer SRAM Interface

SRAM

EthernetInterface Y_pixel_select

Ethernet ADV7181

ClientPC

DE2_NIOS

DE2 Board

RC Car

RFModule CAM

RFModule

43

Page 53: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 3. Sonde Method

Using the sonde method, we performed image recognition by comparing the features of triangles, squares, and circles, and checking to see if there is a match of better than 90%.

Image Transfer Module This module transfers images and results via Ethernet to the client PC. The Ethernet chip is a DM9000a device. We modified the Ethernet driver to operate with the μC/OS-II RTOS. The robot communicates with the client PC at 100 megabits per second (Mbps) using the LWIP network stack. The PC converts the raw image data to bitmap data and displays the resulting image.

Performance ParametersThe design uses 14% of the logic elements available in the Cyclone® II device on the DE2 board and 2,094 registers, as shown in Figure 4. The DE2 board provided sufficient resources for this project.

44

Page 54: Nios II Embedded Processor Design Contest - Outstanding - Altera

Unattended Wireless Search Robot

Figure 4. Compilation Result

Figure 5 shows the debug messages sent from the DE2 board to the PC console through the JTAG connection. The messages show the various system processes.

Figure 5. JTAG Debug Messages

45

Page 55: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 6 shows the original image and the edge-detected output resulting from the Sobel edge detection operation. Edge detection makes it easy to work with the image.

Figure 6. Sobel Edge Detection Result

We limited shapes in our test to circles, triangles, and squares. Figure 7 shows the character line and cross points for each shape.

Figure 7. Sonde Method Character Lines & Cross Points

After performing the simulation 30 times, our results showed that the error rate using the sonde method was lower than using pixel matching, as shown in Table 1. Table 2 shows statistics of the resulting 30 pictures.

Using an RS-232 connection, it took 22 seconds to transfer one frame, which includes the time required to process the image. In contrast, TCP-IP requires only 2 to 3 seconds to transfer 1 frame, a 10x decrease in transfer time. However, this transfer time is still far from real-time. Therefore, we resized the image to 160 x 120, which requires only 0.25 seconds to transfer 1 frame. Because 3 to 4 frames can be transferred in 1 second, we treated this rate as roughly real-time.

Table 1. Sonde & Pixel Matching Error Rates

Sonde Pixel MatchingError Rate 1 : 5 1 : 3

Table 2. TCP-IP & RS-232 Comparison

640 x 480 160 x 120RS-232 22 seconds per frame 2 seconds per frame

TCP-IP 2 to 3 seconds per frame 0.3 seconds per frame

46

Page 56: Nios II Embedded Processor Design Contest - Outstanding - Altera

Unattended Wireless Search Robot

Figure 8 shows the resulting screen output by the program using an MFC on the client PC. The user selects the object to look for in the Request box. The Original Frame box shows a cylinder image and the Identified Frame box shows the processed image. The Result Messages box shows the messages that appear when the robot finds an object.

Figure 8. Client PC Captured Result

Design ArchitectureFigure 9 shows a photograph of our demo and the DE2 board setup. The demo shows an RC car image captured through the camera during a demonstration of our project. The board setup shows the DE2 board and RF modules working together. We ignored any errors caused by the differences in the illumination made for demonstration.

47

Page 57: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 9. Demo and DE2 Board

Figure 10 shows SOPC Builder and its components for the Nios II system. The project has one CPU, an LED, and a switch for simulation.

Figure 10. SOPC Builder Components

Our software used multi-threading and an event-driven interrupt mechanism for speed and flexibility, as shown in Figure 11.

48

Page 58: Nios II Embedded Processor Design Contest - Outstanding - Altera

Unattended Wireless Search Robot

Figure 11. Software Flow Chart

Design FeaturesOur design features are as follows:

■ Hardware/Software Co-Design Accelerated the Design Process—Image processing implemented with the DE2 board and Nios II processor is faster than general PC applications. The Nios II processor, Quartus® II software, SOPC Builder, and Nios II Integrated Development Environment (IDE) tools make hardware/software co-design fast and easy to implement and customize.

■ TCP/IP Increases the Image Transfer Rate—It takes about 20 seconds to transfer data using an RS-232 connection. In contrast, three frames can be transfered per second using TCP-IP. We were able to provide a real-time image display by reducing the image size.

■ Wireless Network Using 2.4-GHz RF Link—Wired networks did not work for long distances. We used a 2.4-GHz RF module to overcome distance restrictions. The wireless link makes the search robot ideal for dangerous unmanned search operations.

■ Wide Application—Many unmanned search robots exist for specific purposes, e.g., military or space applications. However, this project is designed primarily for commercial applications because of its reasonable price and small size. For example, it could be used for a cleaning robot, security machine, etc. Using FPGAs and a wireless network makes processing fast and efficient.

ConclusionWe designed a system that provides image recognition and transfers three to four frames per second. By resizing the image to 160 x 120, we shortened the processing time and designed a hardware module to capture the Y-pixel information. Sobel edge detection makes image processing easier and the sonde method works quickly and enables accurate recognition. Our program is fast and flexible because it is multi-threaded and event driven. Anyone can easily use the application via the interface we designed with MFC.

We learned the following things while working on this project:

■ For those who are skilled in software development, it is not easy to design a hardware system using an FPGA. But we learned how to implement our design one step at a time and it was fun.

NN

CommandBuffer

LWIP

ExpbotTask

ImageProcess

Task

MessageSend Task

B

MessageBuffer

B

N

MessageReceive

Task

N

49

Page 59: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ We spent a long time studying clock synchronization and implementing data path design control logic using Verilog HDL.

■ Because of the small SRAM size, we could not save full-color images and we had to reduce the image size to work with the SRAM size. We also studied the YCbCr, NTSC, RGB, and bitmap standards.

■ We found that it was not easy to use Ethernet to perform real-time transfers. The RS-232 connection was not easy to control because of the baud rate, flow control, etc. There was no easy solution, but we concentrated on the project with all of our efforts and were able to find a solution that worked for us.

■ This project was an opportunity to study about image processing and RF theory.

■ We found that we seldom had time for solving problems. When we did, we solved them one at a time and are proud of ourselves.

■ We had to find a new algorithm for image recognition and image processing. We made many mistakes and did not have enough information for the task. The sonde algorithm worked well for us. It is the best algorithm for the project.

■ We have not finished yet. We want to use a new RC car and use multiple CPUs for improved performance.

The Nios II processor-based system provided good performance for use in commercial applications and it offered us a multitasking environment. SOPC Builder is useful for implementing components in hardware easily, including a CPU and other system components. The Quartus II software and Nios II IDE made it easy to co-design hardware and software. The DE2 board is for educational purposes, but it could be used for small commercial projects.

50

Page 60: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Second Prize

Set-Top Box Capable of Real-Time Video Processing

Institution: Xi’an University of Electronic Science and Technology

Participants: Fei Xiang, Wen-Bo Ning, and Wei Zhu

Instructor: Wan-You Guo

Design IntroductionThe set-top box (STB) played an important role in the early development of digital TV. As the technology develops, digital TV will undoubtedly replace analog TV. This paper describes an FPGA-based STB design capable of real-time video processing. Combining image processing theories and an extraction algorithm, the design implements real-time processing and display, audio/video playback, an e-photo album, and an infrared remote control.

BackgroundThe television, as a home appliance and communications medium, has become a necessity for every family. Digital TV is favored for its high-definition images and high-fidelity voice quality. The household digital TV penetration rate in Europe, America, and Japan is almost 50%.

In August 2006, the Chinese digital TV standards (ADTB-T and DMB-T) were officially released. In 2008, our country will offer digital media broadcasting (DMB) during the Beijing Olympic games, signaling the end of analog TV. However, the large number of analog TVs in China means that digital TVs and analog TVs will co-exist for a long time. The STB transforms analog TV signals into digital TV signals and it will play a vital role in this transition phase.

There are four types of STBs: digital TV STBs, satellite TV STBs, network STBs, and internet protocol television (IPTV) STBs. Their main functions are TV program broadcasting, electronic program guides, data broadcasting, network access, and conditional access. Most STBs in the market use ASIC chips.

This paper provides an FPGA-based STB design that implements real-time processing, display, and playback of audio/video as well as e-photo album and music playback functions. The system uses system-on-a-programmable-chip (SOPC) technology with a Nios® II processor embedded in an FPGA.

51

Page 61: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The design enhances the system’s flexibility and integrity, shortens the development period, and improves efficiency. Additionally, it is highly scalable, easy to update, cheap, energy-efficient, and highly competitive.

Development PlatformsWe used the following hardware and software development platforms to build our design.

Hardware PlatformWe used the Altera® Development and Education (DE2) board as the core platform. The monitors, speakers, and infrared remote controls are peripherals. The design processes and plays video information input by the video source in real time, and can simultaneously display pictures and play audio/video data stored in the secure digital (SD) card.

The DE2 board provides a variety of resources, including the Cyclone® II EP2C35672C6 device (which has 105 M4K RAM blocks), SRAM (512 Kbytes), SDRAM (8 Mbytes), flash (4 Mbytes), an SD card interface, a VGA digital-to-analog converter (DAC), TV decoder, Infrared Data Association (IrDA) transceiver, etc. Figure 1 shows the hardware platform.

Figure 1. Hardware Platform

Software PlatformWe used the Altera Quartus® II software, SOPC Builder, and the Nios II Integrated Development Environment (IDE) to create the logic and program files. The Quartus II software is the fourth-generation development tool for programmable logic devices, providing functions ranging from design entry to device programming. The SignalTap® II logic analyzer, which is provided with the software, is a system-level debugging tool that captures and displays real-time signal behavior. It allows engineers to observe the interactions between hardware and software in system designs.

SOPC Builder is an automatic system development tool. It enables the designer to add processors and peripheral interfaces to the Nios II processor. Additionally, SOPC Builder can automatically generate logic that the on-chip bus and bus arbiter need to connect microprocessor cores, peripherals, memories, and other IP cores, which facilitates high-performance SOPC designs.

The Nios II processor is a configurable, versatile, RISC embedded processor. It can be embedded into Altera FPGAs, and users can adjust the features and performance of the embedded system according to their needs. The Nios II IDE accelerates software development.

52

Page 62: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Using these software tools and SOPC concepts, we implemented an FPGA-based STB demonstration system that is capable of embedded real-time video processing.

Function DescriptionThis section provides a functional description of our design.

Video Processing and DisplayThe digital TV resolution and system are different from those of analog TV; therefore, the STB must convert between digital and analog TV resolution. The system converts the resolution and aspect ratio, captures pictures of the original video, and outputs it. During the output, several display effects, such as video-in-picture and picture-in-video, appear.

The original video provided by the video source is input by the audio/visual (AV) interface and is decoded by the TV decoder chip (ADV7123). It passes through the FPGA internal hardware logic for color space conversion, which converts YCbCr signals into RGB ones. Then, the system starts video processing, sends the signals to the video buffer, displays them, and outputs them via the VGA interface.

Video processing includes:

■ Aspect ratio conversion—Images convert between the 16:9 and 4:3 aspect ratios in video output. The 16:9 ratio meets ergonomic requirements, and 4:3 satisfies traditional programming needs.

■ Image freeze—The user can freeze images in the video output to view them for a long time.

■ Image capture—During video output, the user can zoom in on a selected part of the image, and display it on the whole screen.

■ Picture-in-video—To enrich the effect, the user can display pictures while video is playing. For example, if an annoying advertisement appears, the viewer can watch pictures as an alternative. Or, the user can adjust the display positions of pictures and videos.

■ Video-in-picture—In this case the user can display video with pictures. For example, the user can set pictures in the background when the video resolution is low.

■ Video resolution conversion—This function converts the original 640 x 480 video resolution into 120 x 60 or 120 x 96 to meet display requirements.

Menu DisplayThe menu display helps users learn which information is in the system, makes the system user friendly, and designs the menu’s on-screen display (OSD). This function shows the system name, resolution, aspect ratio, system, and other information.

During operation, the Nios II processor determines the video status and generates the menu information according to the system. If the system is in 4:3 mode, it generates the menu information for 4:3 mode; if it is in 16:9 mode, it generates the corresponding menu information. Then, the Nios II processor sends the information to the menu buffer, combines the menu data with the video data, outputs them through the VGA interface, and displays them on the monitor.

E-Photo AlbumMost digital devices today use external memories (such as an SD card) to store pictures and music. Our system can read the picture and audio information of external data sources. When there is no video input, the user can view pictures independently; otherwise, the user can view pictures as video-in-picture or picture-in-video. Additionally, users can enjoy music while viewing pictures.

53

Page 63: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The Nios II processor reads the pictures stored in the SD card and sends them to the SRAM, overlapped with video data in real time, and outputs them through the VGA interface. The Nios II processor also reads the audio data on the SD card while reading pictures. It performs digital-to-analog conversion and outputs the audio through the audio port.

Infrared Remote ControlWe designed an infrared remote control to help users control the system. The infrared remote control sends out control signals, which are decoded into relevant pulse signals by the infrared remote control logic module. Signals are then sent to the Nios II processor, which converts them to level control signals that control each hardware module, thus controlling the system.

Performance ParametersThis section describes the performance parameters for our design.

System PerformanceThis section describes the system performance parameters for video, picture, audio, and the menu.

Video ParametersThe video parameters are:

■ Input video system—NTSC system

■ Output video format—RGB format

■ Input video resolution—640 x 480 pixels

■ Output video resolution—120 x 60 and 120 x 96 pixels

■ Video color—18-bit color, 6 bits each for red, green, and blue

■ Aspect ratio—4:3 and 16:9

■ Frame rate—60 frames per second (fps)

Picture and Audio ParametersThe picture and audio parameters are:

■ Picture resolution of the e-photo album—240 x 180 pixels

■ Picture format of the e-photo album—Bitmap (.bmp) format

■ Picture color—15-bit color, 5 bits each for red, green, and blue

■ Audio format—.wav format

Menu ParametersThe menu parameters are:

■ Menu resolution—207 x 32 pixels

■ Menu color—3-bit color, 1 bit each for red, green, and blue. The first line of the menu is the menu; the second line is for information display.

54

Page 64: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Use of System ResourcesThis section describes the system resources our design uses.

Use of FPGA ResourcesThe FPGA resources used are:

■ Logic units—33%

■ On-chip memories—79%

■ Pins—47%

■ Phase-locked loop (PLL)—50%

DE2 Board ResourcesThe DE2 development board resources used are:

■ FPGA—Altera Cyclone II EP2C35 FPGA

■ Dynamic memorizer—8-Mbyte SDRAM

■ Static memory—512-Kbyte SDRAM

■ Flash memory—4-Mbyte flash memory

■ SD card reader

■ 50- and 27-MHz crystal oscillators

■ Switch, button, LED, seven-segment numeral tube, and LCD

■ Infrared transceiver

■ XSGA video port, XSGA 10-bit digital-to-analog converter (DAC)

■ TV decoder chip (supports the NTSC system)

■ 24-bit audio CODEC

Auxiliary ResourcesThe following additional resources were used:

■ SD card—1-Gbyte Kingston SD card

■ Infrared remote control—Epson projector infrared remote control

Design StructureThe system design fully uses the available hardware resources, including the FPGA and the DE2 board. The system includes a DVD, SD card, DE2 board, speaker, and monitor. The DE2 board provides an FPGA, SDRAM, SRAM, flash memory, SD card reader, XSGA 10-bit DAC, infrared interface, and a 24-bit audio CODEC. The system implements a variety of functions, including video processing and

55

Page 65: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

display, menu display, an e-photo album, and infrared remote control. Figure 2 shows the system structure.

Figure 2. System Structure

The DVD video source is input through the video in port. It passes through the TV decoder chip (ADV7181) on the DE2 board. The analog video signals input through the video in port are converted to NTSC-format system digital video signals that comply with the ITU-R656 standard. Then, the signals are sent to the FPGA internal video decoding module, which converts the YCbCr color-difference signals into RGB signals. Next, the signals go through the video processing module, which performs16:9 resolution processing, 4:3 resolution processing, or cutting. The system stores the processed video signals in the video buffer module. It reads the data in the buffer module, conducts analog-to-digital conversion with the XSGA 10-bit DAC chip (ADV7123), and outputs the signals through the VGA interface.

The SD card reader obtains picture and audio information and provides it to the system. The Nios II processor embedded in the FPGA reads the picture and music information provided by the SD card and outputs it to the picture buffer module (512 Kbytes of SRAM) and the 24-bit audio CODEC chip (Wolfson WM8731). The picture buffer module stores the picture data, and the display output module controls the display output. The 24-bit audio CODEC chip transforms digital audio signals into analog audio signals, which are output through the audio output interface.

The infrared remote control sends out control signals to the system. The DE2 board’s infrared transceiver (Agilent HSDL-3201) receives the remote control’s infrared signals, which are decoded by the FPGA’s infrared remote control module and sent to the Nios II processor, which generates each module’s control signals.

The LCD control module controls the LCD system name. The LED control module turns the LED on and off. The buttons and switchs function like the infrared remote control, generating signals to control the system’s operation. SDRAM and flash are as the memory and program memory of the system, respectively. Figure 3 shows the system components.

SDRAM(8 Mbytes)

Flash(4 Mbytes)

SRAM(512 Kbytes)

TVDecoder

SD CardConnector

DVD

SD Card

SDRAMControl

Interface

FlashControl

Interface

SD CardControl

Interface

I CMaster

2 AudioControl

Interface

FPGA (Cyclone II EP2C35F672C6)

Avalon Bus

IrDAInterface

LCDControl

Interface

Nios II(50 MHz)

VGAControl

Interface

IrDATransceiver LCD LED Button

& Switch

IrDAController

IrDA Control

Control

Data

24-BitAudioCodec

XSGA10-Bit DAC

Speaker

VGAMonitor

DE2 Board

YCbCRVideo Signals

Picture Data

Audio Data

YCbCRHSVS

ClockDataControl

Clock

Data

RGBHS_VSClock

RGBHS_VSClock

56

Page 66: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Figure 3. System Components

Design MethodsThis section describes our design methodology.

Function DesignOur design includes the video decoding module, video processing module, video buffer module, picture buffer module, menu buffer display module, output display module, and process control module. Figure 4 shows the block diagram.

Figure 4. Functional Block Diagram

Non-Processed Video Stream

Clipped

Video Stream

Processed

Video Stream

RGB VideoStream

VideoDecodingModule

Input VideoStream

Input Menu Information

SD Card

Audio DataStream

Picture Data Stream

Audio Buffer

Output AudioStream

Menu BufferingDisplay Module Menu

Information

PictureData Stream

DisplayOutputModule

Video ProcessingModule (Output

16:9 Video)

Video ProcessingModule (Output

4:3 Video)

Video ProcessingModule (OutputClipped Video)

Video Stream16:9 RGB

4:3 RGB

Video StreamVideo

Buffering Module

Output VideoStream

PictureBufferingModule

57

Page 67: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Hardware DesignThis section describes our hardware design.

Video Decoding ModuleThe video decoding module decodes NTSC video, converting from YCbCr to RGB and from interlacing to a progressive scan of the horizontal synchronizing signals. Meanwhile, it generates the control signals required for the VGA display. It also decodes video and controls the I2C bus.

The video decoding function outputs RGB three-route color signals and generates control signals such as vertical synchronization, horizontal synchronization, and the VGA clock. It also provides ITU-R656 video decoding and video data exchange. Because the NTSC video signal is for interlaced scanning, the ITU-R656 video decoding section mainly converts the YCbCr video signal and the synchronous signal from interlaced 4:2:2 to progressive 4:4:4. The following formula is for the video data conversion from 4:4:4 YCbCr to RGB:

R = Y + 1.402(Cr - 128) (1-a)

G = Y - 0.34414(Cb - 128) - 0.71212(Cr - 128) (1-b)

B = Y + 1.772(Cb - 128) (1-c)

The I2C bus control section outputs the I2C bus control signal and the video stream control signal, according to the video data list. Figure 5 shows video decoding block diagram.

Figure 5. Video Decoding Module Block Diagram

Video Processing ModuleFigure 6 shows the video processing block diagram.

VGA_BLANK

VGA_HS

VGA_VS

VGA_SYNC

CLK_27

TD_DATA[7..0]

TD_HS

TD_VS

iRST

ITU_r656_Decoder

HS VS Generate

YCbCr2RGB

TV_to_VGA

Y[7..0]

Cb[7..0]

Cr[7..0]

VGA_CLK

Red[9..0]

Green[9..0]

Blue[9..0]

VGA_EN

58

Page 68: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Figure 6. Video Processing Module Block Diagram

The video-processing module processes the RGB video data output by the video decoding module. There are three kinds of processing: 16:9 video processing, 4:3 video processing, and video section processing. Therfore, this module includes a 16:9 video processing module, 4:3 video processing module, video clipping processing module, and output control module.

We considered two schemes for the video resolution processing algorithm: linear interpolation and extraction. The linear interpolation algorithm is widely used for video interpolation. The pixels are obtained by weighted calculation of the adjacent domain points. In the following equation we adopt eight adjacent domain sampling weighted calculations to generate the new data.

(2)

W(X + i, y + j) are the weighted values of the 8 adjacent domain points.

According to the MATLAB software calculations, this algorithm cannot meet real-time demands. The time complexity is not available for video processing, so we chose not to use this algorithm.

The extraction algorithm is simple and implements resolution compression by dropping part of the original image data. This algorithm does not require mathematical calculations and is very fast. As seen by human eyes, this image is indistinguishable from the one generated using the linear interpolation algorithm. Therefore, to achieve real-time performance and reduce time complexity, we used the extraction algorithm with a hardware module.

The 16:9 video processing module converts the input video to 120 x 60 resolution and 16:9 aspect ratio. During extraction, the module marks a line for every 8 data lines in each frame of the original video

EN1

DATAR1[7..0]

DATAG1[7..0]

DATAB1[7..0]

EN2

DATAR2[7..0]

DATAG2[7..0]

DATAB2[7..0]

EN3

DATAR3[7..0]

DATAG3[7..0]

DATAB3[7..0]

R[9..0]

G[9..0]

B[9..0]

CLK_27

HS

VS

pause

R[9..0]

G[9..0]

B[9..0]

CLK_27

HS

VS

pause

R[9..0]

G[9..0]

B[9..0]

CLK_27

HS

VS

pause

MUX_control

16:9Video

Processing

EN1

R[7..0]

G[7..0]

B[7..0]

4:3Video

Processing

VideoCutting

Processing

Video Processing

Multiplexer

I x y,( ) I x i y j+,+( ) W x i y j+,+( )×

j 1–=

1

∑i 1–=

1

∑⎝ ⎠⎜ ⎟⎜ ⎟⎛ ⎞

=

59

Page 69: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

image, and then marks a point for every 6 points in the line. It outputs the marked points one by one to the video buffer module, reducing the video resolution and converting the aspect ratio. This module suspends the input terminal to determine whether to output the next video frame. If the terminal is at a high level, it stops outputting data to the buffer module, which freezes the picture. Otherwise, it outputs the data normally.

The 4:3 video processing module converts the input video to 120 x 96 resolution and 4:3 aspect ratio. Then, it marks a line for every 5 data lines in each frame of the original video image and marks a point for every 6 points in the line. It outputs the marked points one by one to the video buffer module, converting to the 4:3 aspect ratio and reducing the video resolution. This module also has a video input suspension terminal that plays and stops the video.

The video clipping processing module captures and displays part of the video, playing at 16:9 or 4:3 aspect ratio. The switch and infrared remote control can determine the captured video’s position in the original video. On the basis of the original video, it consecutively marks 96 lines of data in each frame of the image, and then selects 120 consecutive points from the marked lines (column marking). It outputs the selected points one by one to the video buffer module, clipping the video data. The system shifts the clipping position by changing the starting position of the line and point marking. This module also has a video input suspension terminal to control that plays and stops the video.

The output control module switches between the video data output by the three processing modules, which avoids collision.

Video Buffer ModuleThe video buffer storage module implements the video data buffer storage and output control. It stores data as 18 x 11,232 (pixel bits x buffer length) bits in size, and stores a frame of 11,232-pixel video images. This module includes a data buffer storage block and an output address generation block. Figure 7 shows the video buffer block diagram.

Figure 7. Video Buffer Module Block Diagram

After processing video, the video processing module sends a logic high to the video buffer’s WREN signal. Then, it writes the video data into the data buffer from the starting point; otherwise, it does not write anything. Figure 8 shows the read/write waveform.

CLK_27

Address Bus

Control Bus

Data_in Bus

Data_out BusOn-Chip

Memory (M4K)

Video Buffer

60

Page 70: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Figure 8. Video Buffer Module Read/Write Waveform

After completing video processing, the module waits for 1,000 clock cycles. Then it sends a logic high to the video buffer’s REN signal via a counter. The output address generation section uses the buffer length as the module cycle counting and delivers the generated address signals to the data buffer storage. This buffer outputs the stored video data according to the address.

Menu Buffer Display ModuleThe menu buffer display module stores the menu data output by the Nios II processor and controls the menu storage section output and menu display. It has a menu storage block and a menu display controlling block. The menu storage block’s read/write sequence is the same as that of the video buffer block.

When the switch or infrared remote control sends the menu display signal, the Nios II processor generates menu data and the menu storage block writes the data into the menu storage. Meanwhile, the address generator outputs the address to the menu storage block after it receives the menu display signal. The menu storage block outputs the menu information to be displayed to the menu display control block.

The menu display control block controls the menu display, implementing the video data and menu data’s weighted stacking. After the remote control sends the menu display signal, the menu display control block generates new video data according to the menu data output by the menu storage block. After menu data is sent to the input terminal, if the menu data’s weighted value is much bigger than that of the video data, the system outputs the menu data as video. If the menu data’s weighted value is much smaller than that of the video data, it outputs the original video instead.

Picture Buffer Module The picture buffer module stores the bitmap data read from the SD card and acts as a VGA control buffer. Because the pictures are stored in the SD card in clusters, the Nios II processor’s picture data is read and written into the picture buffer module according to the picture’s start address and the clusters occupied. When displaying, the address generator generates the output address to control the reading of the data in SRAM.

The picture buffer module has three parts: SRAM, a bus controller, and an address generator. We wrote the bus controller and address generator modules in Verilog HDL. Figure 9 shows bus controller block diagram.

61

Page 71: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 9. Bus Controller Block Diagram

The SRAM cannot perform dual-port reads/writes, so we designed a bus controller. When displaying the picture, the Nios II processor first sends a control signal to the bus controller, which is connected to the SRAM’s data and address lines, to the Avalon® bus. After writing the data, the Nios II processor sends another control signal to the bus controller, thereby connecting the SRAM’s data and address lines with the output display module. Meanwhile, the address generator outputs the address signal to SRAM and sets the SRAM to read mode. The system reads the data from SRAM into the output display module.

Output Display ModuleThe output display module displays the image data (overlapped or not) in each buffer while generating the synchronous signals required by the VGA monitor. It includes an output controller and the overlapped images. Figure 10 shows the output display module block diagram and Figure 11 shows the output display module waveform.

Figure 10. Output Display Module Block Diagram

R[9..0]

G[9..0]

B[9..0]

R[9..0]

G[9..0]

B[9..0]

vga_controlRGB

OutputControl

HS VSGenerator

VGA Controller

CLK_27

HS

VS

SYNC

Blank

VGA_CLK

62

Page 72: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Figure 11. Output Display Module Waveform

The dock clock iCLK is 27 MHz and the vertical frequency is 59.94 Hz. As shown in Figure 11, VS is the vertical synchronizing signal and the field duration tVS is 16.683 ms. Each field has 525 lines, of which 480 lines are valid display lines and 45 lines are field blanking intervals. Each field of the vertical synchronizing signal VS has a pulse, with the low level width tWV of 63 μs (2 lines). The field blanking interval includes the vertical synchronizing time tWV, a field blanking front porch tHV (13 lines), and field blanking back porch tVH (30 lines).

HS is the horizontal synchronizing signal, and the line period tHS is 31.78 μs. Each display line has 800 points, of which 640 are valid data periods and 160 are line blanking periods (i.e., non-display fields). Each line of the horizontal synchronizing signal HS has a pulse, with the low-level width tWH of 3.81 μs (namely 96 DCLK). The line blanking period includes the horizontal synchronizing time tWH, a line blanking front porch tHC (19 DCLK), and a line blanking back porch tCH (45 iCLK). The composite blanking signal is the logical AND of the line blanking signal and field blanking signal. The composite signal is high during a valid display period and low for non-display periods.

The output controller outputs the RGB video data signal and control signal required by the VGA monitor. According to the input clock, the output controller can generate the control signals, including field synchronization, line synchronization, composite blanking signals, etc., while the output data is the same as the input data. Additionally, it can be controlled to alter the video’s position in the monitor.

The system can overlap the picture output from the picture buffer module to implement video-in-picture, or overlap the video on the picture to realize picture-in-video. For video-in-picture, the picture is displayed instead of parts of the video image because the picture data is weighted higher than the video data. Video data is shown in locations that do not have input picture data. For picture-in-video, video is shown instead of the picture in some places because the video data is weighted higher than the picture data. Picture data is shown in locations that do not have input video data.

Process Control ModuleThe process control module generates the modules’ control signals, reads the picture and music data in the SD card, outputs the picture data into SRAM and directly outputs the music data, generates menu data, and delivers it to the menu buffer control module. Figure 12 shows the process control module symbol.

The Nios II processor implements the process control module. The module receives button and switch signals as well as signals passing through the infrared remote control module. After analyzing these signals, the module outputs control signals, such as suspension, picture hand-over, motion, menu, etc. Then, it reads the pictures in the SD card into SDRAM and transfers them into SRAM via the Avalon bus. Meanwhile it reads the audio information in the SD card and outputs it through audio interface.

63

Page 73: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 12. Process Control Module Symbol

Infrared Remote Control ModuleThe infrared remote control module receives signals from the infrared remote control and decodes them according to its defined coding rules. Figure 13 shows infrared remote control module block diagram.

Figure 13. Infrared Remote Control Module Block Diagram

According to the coding rules, the module determines whether the signal is 0 or 1 according to the infrared signal’s high-level length. It is 0 if the high-level duration is within the sampling period of 220 to 370 samples; it is a 1 if it is within 520 to 670 samples. Other lengths are invalid.

Screen Position Control

Screen Proportion Control

Cutting Position Control

Menu Display Control

Menu Select Control

Pause Control

Video Switch Control

CLK_50

IrDA_RXD

100Frequency

Divider

500 kHz

IrDADecoding

IrDA Controller

64

Page 74: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

The infrared remote control module translates the instructions from the remote control and sends them to the Nios II processor. This module has two parts:

■ Frequency division sampling, which divides the 50-MHz clock into 500-kHz to sample the remote controlled infrared signals.

■ Instruction translation, which translates the sampling signals.

Software DesignThe system software runs on the Nios II processor. It responds to the user and meanwhile sends control signals to the external hardware modules. The system software uses an inquiry method, cyclically auditing the state of the input ports upon initiating the software. It executes the corresponding operations if pulse signals are sent to the input port.

The system menu has three parts: the menu item, cursor, and menu content. The menu item includes the system name, resolution, aspect ratio, and mode, and the cursor indicates the selected item. The menu is generated by the following functions:

■ void draw_menu(int cur_position), where cur_position is the current position of the cursor.

■ void draw_item(int row, alt_u16 *title, int len), where row refers to the line of the menu, title is the content string of the menu to display, and len is the content length.

■ void draw_cursor(int xstart, int w), where xstart is the starting x coordinate and w is the cursor width.

While displaying the menu, the system uses the draw_menu function to generate the menu content and update the menu buffer according to the cursor position. The system uses the draw_item for drawing the menu items and and draw_cursor for drawing the cursor.

The menu has two lines. The first line displays the menu items, including name, resolution, aspect ratio, and mode. The second line display the menu content according to the cursor position. Figure 14 shows the menu graphics.

Figure 14. Menu Graphics

When displaying the video’s aspect ratio and resolution, the Nios II processor automatically checks the current video aspect ratio against the value of the state variable ratio_state (0 refers to the 16:9 state, and 1 refers to the 4:3 state) and displays the corresponding information. Figure 15 shows the system software procedure.

Cursor

Menu Content

Menu Item

65

Page 75: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 15. System Software Procedure

Design FeaturesThe system has the following features.

■ SOPC concepts—The system uses SOPC concepts to combine the Nios II processor, modules, and other peripheral control logic. With the custom peripherals we added using SOPC Builder, the system selects processors, buses, memories, and I/O interfaces to build a SOPC system that matches the STB design. This process reduces volume, enhances the reliability, lowers power consumption, and allows the system to be upgraded and scaled.

■ User-defined logic—We used the user-defined logic in the system design. The user-defined logic consists of an audio analog-to-digital converter (ADC), FIFO buffer, and PLL. The logic reads audio data from the SD card and outputs it to the speaker. The user-defined logic is added into the Nios II processor; therefore, we simply control the audio data reading and output with software.

■ Bus-switching—In the picture display function, we use the off-chip memory SRAM as the picture buffer as the on-chip memory is stuffed. SRAM is a single-port memory, when the system outputs one picture, the SRAM must be refreshed. We used bus switching for this function. When the picture data is updated, the system connects the SRAM to the Avalon bus; after the update, the SRAM disconnects from the Avalon bus and connects to the external bus. The display output module uses the external bus to read data in the buffer and outputs the picture data.

■ Fully uses development board resources—The system fully uses the development board resources, such as the FPGA, SDRAM, SRAM, flash, SD card reader, XSGA 10-bit DAC, infrared interface, and 24-bit audio CODEC. The design uses 79% of the on-chip memory (because of the huge volume of information required for video processing) and only about 33% of the logic.

■ Hierarchical display—The system uses hierarchical design for the menu layer, video layer, and picture layer. The video layer contains the original video layer and processed video layer. The

Y

Y

Y

Y

Y

Y

N

N

N

Switch toCross State

Page Up

Start

Initiate SystemModule

Enter Picture &Music Browse

Width-HeightRatio Handover

Display SystemMenu?

Enter Cross State?

MenuDisplay

Width-HeightRatio Handover

PageDown?

Page Up?

N

N

Page Down

66

Page 76: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

menu layer contains the menu word layer, menu background layer, and menu cursor layer. Every point of the final image is the result of real-time overlapping of corresponding points in those layers according to the image’s needs. The hierarchical design lets the system control each layer separately without affecting the display of other layers. It also facilitates data processing of the display layers and system updates.

■ On-screen display (OSD)—Human-machine interaction requires many notification messages. The state of the video must be shown synchronously with images on the screen; the system displays them using OSD to facilitate user operation.

Due to limited resources, the system does not add a bitmap character library. Instead, it stores several fixed combinations of the information to be displayed to save storage space. When the system works, the Nios II processor first determines the user’s operation, generates the menu information through the Nios II processor, writes the menu information on the menu buffer module, and adds it to the display module.

ConclusionUsing an FPGA design, SOPC concepts, bus switching technology, and hierarchical display, the system implements video processing and display, menu display, an e-photo album, and an infrared remote control. Video processing must be real-time, so the hardware circuit module implements the video decoding, processing, and display. The Nios II processor controls peripherals, generates the menu, and performs human-machine communication. The Nios II processor improves system control flexibility and simplifies the read/write memory operations.

Using the Nios II processor is a new concept in embedded system design. After several months of study, we witnessed the powerful function of the Nios II processor as an embedded processor. When designing systems with SOPC Builder, users define each component according to the system’s functional requirement. Development tools provide common IP cores for users to deploy directly. Additionally, users can add custom peripherals to simplify the hardware design, enabling us to focus on function development and algorithm design, thereby improving our design efficiency.

The EP2C35 FPGA chip on the DE2 board that we used in this competition has limited storage resources. Therefore, we only created an FPGA-based STB design. If we used an FPGA with more resources, our system could implement additional functions such as real-time decoding, network communications, video-on-demand, and conditional access of the video code stream of MPEG-2 or MPEG-4.

We are grateful to Altera Corporation for giving us the opportunity to participate in the competition.

67

Page 77: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Appendix 1Ths following photos show the display effect of the STB capable of real-time processing.

16:9 4:3Display

Snapshot

Menu

68

Page 78: Nios II Embedded Processor Design Contest - Outstanding - Altera

Set-Top Box Capable of Real-Time Video Processing

Video-in-Picture:

Video-in-Picture:

69

Page 79: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Appendix 2The following table shows the STB remote control buttons. Because the input system control signals are pulse signals, the switch should be turned down as soon as it is turned on to simulate a pulse signal while using the switch to carry out corresponding functions.

STB Remote Control Buttons

Name of Remote Control Buttons

Corresponding Buttons of DE2

Board

Meaning

NUM8 KEY3 Video position up

NUM2 KEY2 Video position down

NUM4 KEY1 Video position left

NUM6 KEY0 Video position right

NUM8 (Num = 1) SW3 Internal video up in the snapshot

NUM8 (Num = 1) SW2 Internal video down in the snapshot

NUM8 (Num = 1) SW1 Internal video left in the snapshot

NUM8 (Num = 1) SW0 Internal video right in the snapshot

ENTER SW15 Pause

ESC SW16 Snapshot and display video conversion

Page+ (NUM = 1) SW10 Processing video and original video conversion

DVI SW17 16:9 and 4:3 conversion

Video (NUM = 1) SW9 Picture-in-video and video-in-picture conversion

Freeze (NUM = 1) SW11 Picture up in video-in-picture

Comp SW12 Picture down in video-in-picture

Effect (NUM = 1) SW13 Menu on/off

Effect SW14 Menu options

70

Page 80: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

Second Prize

Digital Watermark-Based Trademark Checker

Institution: Institute of Information Science, Beijing JiaoTong University

Participants: Sheng-Kai Song, Wei-Ming Li, and Li Song

Instructor: Xiao-Ming Ding

Design IntroductionA digital watermark adds undetectable identification information to a digital product that can be verified with certain algorithms. Using watermarking to prevent counterfeit printing is of great significance for social development. Digital watermark-based anti-counterfeit technology is different from traditional anti-counterfeit techniques and has its own features: it is imperceptible, unambiguous, robust, secure, advanced, and universal.

This design leverages Altera® FPGA and Nios® II resources, implementing a digital watermark trademark checking algorithm on the Nios II platform and using hardware to implement the watermark checking module. The system has a reusable intellectual property (IP) core, which protects the algorithm IP and facilitates a new trademark anti-counterfeiting method.

The algorithm-protected object is the printed trademark image, and all such images can be added with the anti-counterfeit watermark during printing. When the checker reads the trademark through the interface between the checker and the image collector during checking, the watermark checking algorithm embedded in the checker can check the identification for the trademark.

The checking object is the printed trademark image and it identifies whether a commodity infringes the IP. The checker has an extensive user base, including supermarkets, wholesalers, and retailers who offer customers products with image trademarks. It provides an objective, convenient, quick method to check the commodity identification, which protects the IP and improves customer confidence in commodities. Additionally, with the proliferation of 3G mobile phones, this algorithm’s IP core can be applied to the camera’s mobile phone to implement remote on-line trademark checking for ordinary users.

The Nios II processor and FPGA achieve design portability and allow designers to integrate peripheral circuits and processors in the same chip, reducing system volume and scale and implementing the on-

71

Page 81: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

chip system. Additionally, the PC-developed algorithm and C language program can be rapidly migrated to the Nios II processor to shorten the system development cycle.

With this solution, we can achieve algorithm/class system development using the Simulink software, DSP Builder, and IP cores. We can design important modules—or even the whole digital signal processing (DSP) system—as custom instructions with DSP Builder, according to the requirements of the DSP algorithm. Then, we can integrate the instructions into the Nios II processor using SOPC Builder and the Quartus® II software. The design performs the DSP algorithm through software by invoking the custom instructions. This process facilitates development and replaces the high-end DSP device that is used in traditional image processing.

Function DescriptionIf the trademark image is printed or scanned (videography) once, the watermark algorithm is robust. However, if the resulting image is printed and scanned again, the algorithm is frail.

Figures 1 and 2 show the transmission process required to set the watermark image for fabrication and non-fabrication. After the transmission process, the watermark image becomes the trademark to be checked (using a camera to collect the image). The anti-counterfeit function operates as follows: if the trademark image is transmitted as shown in Figure 1, the checker indicates that the watermark exists; if the image is transmitted as shown in Figure 2, the checker indicates that the watermark does not exist.

Figure 1. Non-Fabrication Transmission Process

Figure 2. Fabrication Transmission Process

The design rapidly checks the protected trademark on- and off-line in the on-chip watermark checker. It leverages hardware to configure the reusable IP core and implement the patent algorithm for trademark image anti-counterfeit printing.

The design consists of a collection module, image display module, image preprocessing module, and watermark checking module. The Nios II CPU implements the system control function and the FPGA performs high-speed data processing. The design contains the following modules:

■ Nios II soft-core CPU—Implements the required system management logic. Dispatches the relevant algorithm and controls the whole data flow from image collection and preprocessing to watermark checking. Manages devices, users, and remote interaction.

■ Image collection module—Integrates the camera configuration module with the system bus, and becomes the self-defined peripheral that flexibly controls the camera. Collects the specific trademark image in memory through the on-chip control logic. Stores the image data in a block so that the system can transmit image data using direct memory access (DMA) mode.

■ Image display module—Saves the collected, preprocessed image in SRAM. The module’s read/write image data generates the VGA signal control ports, such as the horizontal and vertical sync signals that the monitor displays.

Anti-Counterfeit Trademark Printing Process Trademark Image to

be CheckedCamera Collection

Process

Printing Process Scanning Process

Re-Printing ProcessTrademark Image to

be Checked

Anti-CounterfeitTrademark

Camera CollectionProcess

72

Page 82: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

■ Image preprocessing module—Performs geometric correction for the collected trademark image to be checked according to the rotation, horizontal move, and zoom control operations. Sends the parts that have large calculations and long processing time to the hardware module, which accelerates preprocessing.

■ Trademark checking module—Has three parts, which are implemented in a combination of hardware and software:

● Software—Operates the watermark checking algorithm in the Nios II CPU with software.

● Software and hardware—Implements the main algorithm operation with a custom instruction and peripheral to accelerate operation.

● Hardware—Implements the algorithm with a hardware description language and integrates it as an independent processing module. Saves the preprocessed trademark image into SRAM. That module automatically interacts with the SRAM, conducts corresponding operations, and returns the result after the operation finishes.

The design also has an LCD screen, LED light, and user keys to support user interaction.

Performance ParametersFor this system, accuracy and identification speed are the most important performance parameters. Therefore, we focused on these areas. Using the Altera design platform, we were able to speed up the design without lowering the design’s complexity. A single identification, including complex preprocessing, watermark checking, accelerated C-to-hardware acceleration (C2H) preprocessing, and hardware watermark checking, takes about three seconds.

Based on the original limited print samples, we checked the trademark images of printing and reprinting. According to the experiment data, the threshold gives a 95% identification rate.

Watermark Checking Module Performance AnalysisThe operation time when implementing the module with software alone is about 20 seconds.

The operation time when implementing the design using a custom instruction and peripherals for the key algorithm and a combination of software and hardware is about 7 seconds.

The operation time when implementing the design with hardware cannot exceed 20 million clock cycles. Therefore, at 50 million clock cycles, the operation is limited to 0.5 seconds.

C2H CompilerThe Nios II C2H Compiler can automatically integrate the high-performance C program into the hardware accelerator, which is then integrated into the FPGA-based Nios II subsystem. The C2H Compiler supports standard ANSI C code, accelerates multiple application programs, and improves operational efficiency, including access to local and external memory and peripherals. We can use the Quartus® II SOPC Builder to generate a broadband Avalon® interconnected architecture, which can process the external memory and peripherals, such as pointer dispersal and array access.

The Nios II C2H Compiler accelerates implementation of memory interfaces, and generates hardware accelerator logic and the correct Avalon host and slave interface to match the memory delay. It shares the data computing and memory access functions with the Nios II processor, and lets the processor perform other tasks. Because the Avalon architecture does not limit the number of hosts and slaves in a system, the Nios II C2H Compiler can generate multiple hardware accelerators according to the target code’s transfer requirements. The C2H Compiler helps embedded system developers improve design efficiency.

73

Page 83: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

In our system, the image preprocessing function is implemented in software. Because we have high-speed identification requirements and the C software code takes a long time to perform the task, we optimized the image preprocessing module with the Nios II C2H Compiler to accelerate processing.

We tested the implementation speed. Image preprocessing and integrating all the software requires 47 seconds. If we change the code into fixed point and optimize it using the C2H Compiler, the time is reduced to 2 seconds, a speed increase of more than 20 times.

The design uses extra logic resources, however: 65% instead of the original 20%.

Nios II ProcessorThe Nios II processor’s excellent performance facilitated our design. We chose the Nios II/f CPU because we have high-speed processing requirements. We combined the processor, peripherals, memory, and I/O interface with the Nios II processor and FPGA design.

Because the Nios II processor is configurable, we can modify the system performance requirements at any time. Furthermore, we were able to improve the module performance with Nios II custom instructions.

Design ArchitectureThe hardware is composed of the watermark trademark image collection system, watermark extracting and identification system, and display system. Figures 3 and 4 show the general system and modules.

Figure 3. Data Flow

Watermark Checking Result DisplayImage Collection

Trademark to Be Checked ImagePreprocessing

& Matching

74

Page 84: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

Figure 4. Hardware Structure

Figure 5 shows the device algorithm software flow.

Figure 5. Software Flow

Hardware Checking Module

SRAM

TrademarkImage

SDRAM

UART

Flash

Avalon Bus &Tri-State General

BridgeImage

InterceptionBus Controller

Camera & Video Coding

Chip

User Key

LCD

Nios II Processor

DMA

Custom Instruction

FPGA

System Initialization

Trademark ImageCollection

Check?

ImagePreprocessing

WatermarkExtraction

Checking

Result Display

75

Page 85: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design MethodologyThis section describes our design methodology.

Design EnvironmentWe used the Altera Stratix® development board for the initial code debugging and then used the Development and Education (DE2) development platform for final implementation. The DE2 board has a variety of integrated peripheral interfaces, which were convenient to use in our design.

Software and Hardware DesignFigure 6 shows the SOPC Builder configuration.

Figure 6. SOPC Builder Configuration

Modules and ImplementationThis section describes the modules in our design and how we implemented them.

Trademark Image Collection ModuleTrademark image collection is an important part of the front-end system because the collection quality can directly impact identification. Because we need high image quality, we use a mobile camera with 1.3 million pixels and the ov9650 chip sensor to allow the system to be more portable in the future.

The camera is controlled using the serial camera control bus (SCCB). To collect the image data, the system controls the SCCB to configure the ov9650 CMOS image sensor chip parameters. The SCCB provides simple control measures. Through the SCCB interface, we can modify the ov9650 CMOS image sensor chip register values, which allows us to control the device.

The SCCB is bidirectional (two-slave or multi-slave with a serial interface standard) and has a bus arbitration system that can perform short-distance and irregular data communication between the components. Because the system uses only one CMOS image sensor chip, it uses a two-slave SCCB mechanism. To keep the system small, use the FPGA logic resources effectively, and improve the hardware system integration, we designed SCCB control core in the FPGA. The SCCB control module is integrated into the system as a user-defined peripheral, and it can configure the camera’s operations. Figure 7 shows the SCCB controller structure.

76

Page 86: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

Figure 7. SCCB Controller Structure

We write and store the camera’s image in the IP core’s SRAM, and then integrate it into the system. The system performs a variety of image collection activities. It captures the image data required by the system (the collected image is 660 x 660 in YUV format; we remove the Y data to save storage space) and stores the data into corresponding SRAM. Figure 8 shows the SCCB module’s symbol and connections.

Figure 8. SCCB Block Diagram

VGA Display ModuleVGA display module shows the original trademark images we collected and the processed images so that we can observe the effects of the front-end image collection and image preprocessing. The VGA module reads the image data from the SRAM repeatedly. The module sends the data to the VGA chip with a module-generated control signal. An image can then be displayed on the LCD.

The DE2 development board uses the ADV7123 device to transmit a 10-bit digital-to-analog (D/A) video signal, which is connected to the 15-pin interface, as the VGA output. Figure 9 shows the VGA waveform. The vertical and horizontal time cycles are in four zones: H-synchronization (a), descend h (b), ascend (d), and display (c).

Figure 9. VGA Signal Waveform

Figure 10 shows the VGA sequence standards.

Nios IIProcessor

SCCBCommand

StateMachine

SCCBSequence

StateMachine

Camera(ov9650)

Back Porch b Front Porch d

Display Interval cDATA

HSYNCSync a

77

Page 87: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 10. VGA Sequence Standards

The images we used in our system are 660 x 660 and 600 x 600, so we chose the 800 x 600 mode. Figure 11 shows the module integrated into the system.

Figure 11. VGA Module in the System

Trademark Image Preprocessing ModuleDue to illumination differences, shooting distances, and trademark position, the collected images may contain noise and geometrical changes caused by light, size, and angles. Therefore, preprocessing is very important and provides the basis for future authentication. Our preprocessing module removes illumination differences and extracts the standard trademark image. The module has the following functionality:

■ Image balance—To remove illumination differences, we balance the images by applying the same grayscale standard to them. Because the images we selected are 600 x 600, the data is comparatively large. Therefore, we used the C2H Compiler to optimize the image balancing.

78

Page 88: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

■ Trademark image extraction—We position and extract the central trademark image. First, we perform a binary process on the balanced image and separate the image from background. Then we extract the image with a geometric operation. The processing is fairly complex due to the large image size, slow operation speed, and high accuracy. Therefore, we used the C2H Compiler to optimize some functions that have large operation volume.

■ Trademark image standardization—To match the authentication interface, the extracted image is standardized and zoomed to 600 x 600. This part is also optimized with the C2H Compiler.

Figure 12 compares the original image and the standardized one. Some small deviations may exist in the original image that was collected on site.

Figure 12. Original vs. Standardized Image

Watermark Checking ModuleWe used the lab’s watermark patent algorithm in the watermark embedded algorithm and checking algorithm. With a hardware module implementing the whole process, we greatly improved the system speed. During processing, the image to be processed is placed into SRAM (we used 1-Mbyte SRAM). The module automatically interacts with SRAM (read/write) and shows the results after processing. We implemented the module in Verilog HDL. Figure 13 shows the watermark checking module symbol.

Figure 13. Watermark Checking Module Symbol

79

Page 89: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Applying System-on-a-Programmable Chip (SOPC) ConceptsAltera introduced SOPC technology and its related development platform, the Quartus II software. SOPC is the FPGA version of system-on-chip (SOC). Compared to ASIC SOC, SOPC has many unique features. Our design uses SOPC concepts in the following ways:

■ Modularized system design—At the beginning of the system design, we partitioned the system into image collection, preprocessing, and processing. The system is divided and simplified, which makes it easier to implement. According to the module interfaces, we can accurately evaluate the design’s application scope and future development at the initial design stage. We can perform market exploration and product research and development simultaneously in the practical product design, which shortens the time-to-market and accelerates enterprise development.

■ System integration—An embedded system shows its features with its size, power consumption, and integrity. Except for the expanded 1-Mbyte SRAM front-end collection module, we could implement all system functions on the development board. It is very difficult to implement such a highly integrated design without lowering the design target or using a different FPGA.

■ Various modes—We can diversify the implementation. For example, the front-end module uses an IP core, preprocessing is implemented with software, and key steps are fulfilled using the C2H Compiler to transfer operations to hardware. The trademark checking module uses hardware, software, or hardware/software with custom peripherals and instructions. Using SOPC and the excellent design tools enabled us to use these various modes.

■ Final system can be upgraded—The design can be flexibly configured and updated during the design process.

Design FeaturesOur digital watermark technology-based trademark is different from traditional trademarks, and has many real applications. Implementing the watermark checking in hardware and producing a portable device effectively promotes watermark technology for anti-counterfeit areas. Currently, watermark checking devices do not exist. Our system has the following features:

■ The IP core used for the hardware-based watermark checking algorithm significantly facilitates algorithm protection and volume applications. Additionally, the speed is high compared to the hardware/software-based or software-based watermark checking algorithms.

■ The hardware/software-based watermark checking algorithm uses custom instructions and peripherals, improving its operation speed and making the system integrated and modularized. With the existing IP core, it can implement the image processing algorithm in the FPGA.

■ The system can perform both offline trademark checking using an independent database and online checking using a central database. The independent operation uses integrated Nios II on-chip functions, including data collection, processing, display, and storage. Additionally, the system can use the Nios II extension interface to add a GPRS module that communicates with a remote server and checks the trademark online.

■ The Nios II C2H Compiler can automatically transfer the high-performance C program into the hardware accelerator and integrate it with the FPGA-based Nios II subsystem, which enhances the system’s operation speed.

■ The FPGA-based system design integrates the processor, peripherals, memory, and I/O interface in a single FPGA, lowering costs, complexity, and power consumption. The Nios II processor provides a series of CPUs, so the user can create a perfect solution that meets their system

80

Page 90: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multifunction Logic Analyzer

requirements. The processor provides reasonable performance, saves development costs, and increases the product’s competitiveness.

■ Our design implements an SOPC system, a real-time, portable checking device featuring one or multiple Nios II processor(s) as well as a hardware accelerator, custom instruction, and custom configurable peripherals.

ConclusionWe learned a lot using the Nios II processor during the competition. For example, we learned to:

■ Develop a design specification—We started by writing a design document, including the design idea and implementation procedures, which helped all the team members understand the scope of work. The design document provided general guidance and facilitated coordination during the design process. We also made the code readable and portable, which helped us understand each module’s operation and perform testing at each stage.

■ Fully use the design software—In addition to Nios II-based SOPC design and FPGAs, Altera provides good design software: the Quartus II and SOPC Builder software. The new Nios II C2H Compiler effectively enhances the embedded software’s performance, and helps developers improve efficiency and implement a successful design. The tool can automatically transfer high-performance C programs into a hardware accelerator for integration with the Nios II subsystem on the FPGA, which shortens the development cycle from several weeks to several minutes. Compared to other implementations, the Nios II C2H Compiler increases the system’s performance by 10 to 45 times, and only uses about 0.7 to 2.0 times more logic resources. With Altera design software, we can produce an excellent system in a short time.

■ Leverage resources, software, and hardware—There are many implementation methods. We needed to leverage the existing resources, software, and hardware to implement the design. We optimized the complex software algorithm using the C2H Compiler. Altera SOPC design helped us simplify the design, allowing us to focus on the system analysis and embedded design instead of the module design.

■ Cooperate and use teamwork—Guided by the design document, we could cooperate, communicate, and improve our team spirit, which was the key to success.

81

Page 91: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

82

Page 92: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Second Prize

SOPC-Based Speech-to-Text Conversion

Institution: National Institute of Technology, Trichy

Participants: M.T. Bala Murugan and M. Balaji

Instructor: Dr. B. Venkataramani

Design IntroductionFor the past several decades, designers have processed speech for a wide variety of applications ranging from mobile communications to automatic reading machines. Speech recognition reduces the overhead caused by alternate communication methods. Speech has not been used much in the field of electronics and computers due to the complexity and variety of speech signals and sounds. However, with modern processes, algorithms, and methods we can process speech signals easily and recognize the text.

ObjectiveIn our project, we developed an on-line speech-to-text engine, implemented as a system-on-a-programmable-chip (SOPC) solution. The system acquires speech at run time through a microphone and processes the sampled speech to recognize the uttered text. We used the hidden Markov model (HMM) for speech recognition, which converts the speech to text. The recognized text can be stored in a file on a PC that connects to an FPGA on a development board using a standard RS-232 serial cable.

Our speech-to-text system directly acquires and converts speech to text. It can supplement other larger systems, giving users a different choice for data entry. A speech-to-text system can also improve system accessibility by providing data entry options for blind, deaf, or physically handicapped users.

Project OutlineThe project implements a speech-to-text system using isolated word recognition with a vocabulary of ten words (digits 0 to 9) and statistical modeling (HMM) for machine speech recognition. In the training phase, the uttered digits are recorded using 16-bit pulse code modulation (PCM) with a sampling rate of 8 KHz and saved as a wave file using sound recorder software. We use the MATLAB software’s wavread command to convert the .wav files to speech samples.

83

Page 93: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Generally, a speech signal consists of noise-speech-noise. The detection of actual speech in the given samples is important. We divided the speech signal into frames of 450 samples each with an overlap of 300 samples, i.e., two-thirds of a frame length. The speech is separated from the pauses using voice activity detection (VAD) techniques, which are discussed in detail later in the paper.

The system performs speech analysis using the linear predictive coding (LPC) method. From the LPC coefficients we get the weighted cepstral coefficients and cepstral time derivatives, which form the feature vector for a frame. Then, the system performs vector quantization using a vector codebook. The resulting vectors form the observation sequence. For each word in the vocabulary, the system builds an HMM model and trains the model during the training phase. The training steps, from VAD to HMM model building, are performed using PC-based C programs. We load the resulting HMM models onto an FPGA for the recognition phase.

In the recognition phase, the speech is acquired dynamically from the microphone through a codec and is stored in the FPGA’s memory. These speech samples are preprocessed, and the probability of getting the observation sequence for each model is calculated. The uttered word is recognized based on a maximum likelihood estimation.

FPGA Design SignificanceThe trend in hardware design is towards implementing a complete system, intended for various applications, on a single chip. The advent of high-density FPGAs with high-capacity RAMs, and support for soft-core processors such as Altera’s Nios® II processor, have enabled designers to implement a complete system on a chip. FPGAs provide the following benefits:

■ FPGA systems are portable, cost effective, and consume very little power compared to PCs. A complete system can be implemented easily on a single chip because complex integrated circuits (ICs) with millions of gates are available now.

■ SOPC Builder can trim months from a design cycle by simplifying and accelerating the design process. It integrates complex system components such as intellectual property (IP) blocks, memories, and interfaces to off-chip devices including application-specific standard products (ASSPs) and ASICs on Altera® high-density FPGAs.

■ SOPC methodology gives the designer flexibility when writing code, and supports both high-level languages (HLLs) and hardware description language (HDLs). The time-critical blocks can be implemented in an HDL while the remaining blocks are implemented in an HLL. It is easy to change the existing algorithm in hardware and software by simply modifying the code.

■ FPGAs provide the best of both worlds: a microcontroller or RISC processor can efficiently perform control and decision-making operations while the FPGA can perform digital signal processing (DSP) operations and other computationally intensive tasks.

■ An FPGA supports hardware/software co-design in which the time-critical blocks are written in HDL and implemented as hardware units, while the remaining application logic is written C. The challenge is to find a good tradeoff between the two. Both the processor and the custom hardware must be optimally designed such that neither is idle or under-utilized.

Nios II-Based DesignWe decided to use an FPGA after analyzing the various requirements for a speech-to-text conversion system. A basic system requires application programs, running on a customizable processor, that can implement custom digital hardware for computationally intensive operations such as fast Fourier transform (FFT), Viterbi decoding, etc. Using a soft-core processor, we can implement and customize various interfaces, including serial, parallel, and Ethernet.

The Altera development board, user-friendly Quartus® II software, SOPC Builder, Nios II Integrated Development Environment (IDE), and associated documentation enable even a beginner to feel at ease

84

Page 94: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

with developing an SOPC design. We can perform hardware design and simulation using the Quartus II software and use SOPC Builder to create the system from readily available, easy-to-use components. With the Nios II IDE, we easily created application software for the Nios II processor with the intuitive click-to-run IDE interface. The development board’s rich features and customization, SOPC Builder’s built-in support for interfaces (such as serial, parallel, Ethernet, and USB), and the easy programming interface provided by the Nios II hardware application layer (HAL) make the Nios II processor and an FPGA the ideal platform for implementing our on-line speech-to-text system.

Application Scope and Targeted UsersThe design is very generic and can be used in a variety of applications. The basic system with the vocabulary of numbers from 0 to 9 can be used in applications such as:

■ Interactive voice response system (IVRS)

■ Voice-dialing in mobile phones and telephones

■ Hands-free dialing in wireless bluetooth headsets

■ PIN and numeric password entry modules

■ Automated teller machines (ATMs)

If we increased the system’s vocabulary using phoneme-based recognition for a particular language, e.g., English, the system could be used to replace standard input devices such as keyboards, touch pads, etc. in IT- and electronics-based applications in various fields. The design has a wide market opportunity ranging from mobile service providers to ATM makers. Some of the targeted users for the system include:

■ Value added service (VAS) providers

■ Mobile operators

■ Home and office security device providers

■ ATM manufacturers

■ Mobile phone and bluetooth headset manufacturers

■ Telephone service providers

■ Manufacturers of instruments for disabled persons

■ PC users

Function DescriptionThe project converts speech into text using an FPGA. Run-time speech data is acquired from a transducer through analog-to-digital (A/D) conversion and is sent to the FPGA, which provides speech preprocessing and recognition using the Nios II processor. The system sends text data to a PC for storage. Functionally, the project has the following blocks:

■ Speech acquisition

■ Speech preprocessing

85

Page 95: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ HMM training

■ HMM-based recognition

■ Text storage

Figure 1 shows the system’s functional block diagram

Figure 1. Functional Block Diagram

Speech AcquisitionDuring speech acquisition, speech samples are obtained from the speaker in real time and stored in memory for preprocessing. Speech acquisition requires a microphone coupled with an analog-to-digital converter (ADC) that has the proper amplification to receive the voice speech signal, sample it, and convert it into digital speech. The system sends the analog speech through a transducer, amplifies it, sends it through an ADC, and transmits it to the Nios II processor in the FPGA through the communications interface. The system needs a parallel/serial interface to the Nios II processor and an application running on the processor that acquires and stores data in memory. The received samples are stored into memory on the Altera Development and Education (DE2) development board.

We easily implemented speech acquisition with the Altera DE2 development board. The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 KHz. The codec requires initial configuration, which is performed using custom hardware implemented in the Altera Cyclone® II FPGA on the board.

The audio codec provides a serial communication interface, which is connected to a UART. We used SOPC Builder to add the UART to the Nios II processor to enable the interface. The UART is connected to the processor through the Avalon® bus. The C application running on the HAL transfers data from the UART to the SDRAM. Direct memory access (DMA) transfers data efficiently and quickly, and we may use it instead in future designs.

Speech PreprocessingThe speech signal consists of the uttered digit along with a pause period and background noise. Preprocessing reduces the amount of processing required in later stages. Generally, preprocessing involves taking the speech samples as input, blocking the samples into frames, and returning a unique pattern for each sample, as described in the following steps.

1. The system must identify useful or significant samples from the speech signal. To accomplish this goal, the system divides the speech samples into overlapped frames.

2. The system checks the frames for voice activity using endpoint detection and energy threshold calculations.

3. The speech samples are passed through a pre-emphasis filter.

4. The frames with voice activity are passed through a Hamming window.

HMM TrainingHidden Markov

Models

HMM-BasedRecognition

Speech Preprocessing

Text StorageSpeech Acquisition

C Program on PC

Nios II SOPC VB Program on PCExternal Hardware

86

Page 96: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

5. The system performs autocorrelation analysis on each frame.

6. The system finds linear predictive coding (LPC) coefficients using the Levinson and Durbin algorithm.

7. From the LPC coefficients, the system determines the cepstral coefficients and weighs them using a tapered window. The cepstral coefficients serve as feature vectors.

Using temporal cepstral derivatives improves the speech frame feature vectors. We use them in case the cepstral coefficients do not give acceptable recognition accuracy.

We implemented these steps with C programs, which execute based on the algorithms. Although we used the same C programs for training and recognition, the C code executes on a PC during training and it runs on the Nios II processor during recognition.

HMM TrainingAn important part of speech-to-text conversion using pattern recognition is training. Training involves creating a pattern representative of the features of a class using one or more test patterns that correspond to speech sounds of the same class. The resulting pattern (generally called a reference pattern) is an example or template, derived from some type of averaging technique. It can also be a model that characterizes the reference pattern statistics. Our system uses speech samples from three individuals during training.

A model commonly used for speech recognition is the HMM, which is a statistical model used for modeling an unknown system using an observed output sequence. The system trains the HMM for each digit in the vocabulary using the Baum-Welch algorithm. The codebook index created during preprocessing is the observation vector for the HMM model.

After preprocessing the input speech samples to extract feature vectors, the system builds the codebook. The codebook is the reference code space that we can use to compare input feature vectors. The weighted cepstrum matrices for various users and digits are compared with the codebook. The nearest corresponding codebook vector indices are sent to the Baum-Welch algorithm for training an HMM model.

The HMM characterizes the system using three matrices:

■ A—The state transition probability distribution.

■ B—The observation symbol probability distribution.

■ n—The initial state distribution.

Any digit is completely characterized by its corresponding A, B, and n matrices. The A, B, and n matrices are modeled using the Baum-Welch algorithm, which is an iterative procedure (we limit the iterations to 20). The Baum-Welch algorithm gives 3 matrices for each digit corresponding to the 3 users with whom we created the vocabulary set. The A, B, and n matrices are averaged over the users to generalize them for user-independent recognition.

For the design to recognize the same digit uttered by a user for which the design has not been trained, the zero probabilities in the B matrix are replaced with a low value so that it gives a non-zero value on recognition. To some extent, this arrangement overcomes the problem of less training data.

Training is a one-time process. Due to the complexity and resource requirements, it is performed using standalone PC application software that we created by compiling our C program into an executable. For recognition, we compile the same C program but target it to run on the Nios II processor instead. We

87

Page 97: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

were able to accomplish this cross-compilation because of the wide support for the C language in the Nios II processor IDE.

The C program running on the PC takes the digit speech samples from a MATLAB output file and performs preprocessing, feature vector extraction, vector quantization, Baum-Welch modeling, and averaging, which outputs the normalized A, B, and n matrices for each digit. The normalized A, B, and n matrices are then embedded in the recognition C program code and stored in the DE2 development board’s SDRAM using the Nios II IDE.

HMM-Based Recognition Recognition or pattern classification is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of similarity (distance) between the test pattern and each reference pattern. The digit is recognized using a maximum likelihood estimate, such as the Viterbi decoding algorithm, which implies that the digit whose model has the maximum probability is the spoken digit.

Preprocessing, feature vector extraction, and codebook generation are same as in HMM training. The input speech sample is preprocessed and the feature vector is extracted. Then, the index of the nearest codebook vector for each frame is sent to all digit models. The model with the maximum probability is chosen as the recognized digit.

After preprocessing in the Nios II processor, the required data is passed to the hardware for Viterbi decoding. Viterbi decoding is computationally intensive so we implemented it in the FPGA for better execution speed, taking advantage of hardware/software co-design. We wrote the Viterbi decoder in Verilog HDL and included it as a custom instruction in the Nios II processor. Data passes through the dataa and datab ports and the prefix port is used for control operations. The custom instruction copies or adds two floating-point numbers from dataa and datab, depending on the prefix input. The output (result) is sent back to the Nios II processor for further maximum likelihood estimation.

Text StorageOur speech-to-text conversion system can send the recognized digit to a PC via the serial, USB, or Ethernet interface for backup or archiving. For our testing, we used a serial cable to connect the PC and RS-232 port on the DE2 board. The Nios II processor on the DE2 board sends the digital speech data to a PC; a target program running on the PC receives the text and writes it to the disk.

We wrote the PC program using Visual Basic 6 (VB) using a Microsoft serial port control. The VB program must be run in the background for the PC to receive the data and write it to the hard disk. The Windows HyperTerminal software or any other RS-232 serial communication receiver could also be used to receive and view the data. The serial port communication runs at 115,200 bits per second (bps) with 8 data bits, 1 stop bit, and no parity. Handshaking signals are not required.

The speech-to-text conversion system can also operate as a standalone network device using an Ethernet interface for PC communication and appropriate speech recognition server software designed for the Nios II processor.

Design ArchitectureFigure 2 shows the system’s block diagram.

88

Page 98: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Figure 2. Block Diagram

The system has the following parts:

■ Training software

■ Audio input hardware

■ FPGA interface

■ Nios II software

■ PC software for storage

Design DescriptionThe design is an efficient hardware speech-to-text system with an FPGA coprocessor that recognizes speech much faster than software. With an Altera FPGA, we implemented the time-critical functions (written in VHDL/Verilog HDL) in hardware. The Nios II processor, which is downloaded onto the FPGA, executes C programs. Custom instructions let us implement the whole system on an FPGA with better software and hardware partitioning.

The phases of the project were:

■ Speech sampling at 8 KHz and encoded as 16-bit PCM.

■ Preprocessing (voice activity detection, autocorrelation analysis, and feature vector extraction).

■ Training (codebook generation, finding codebook index of uttered word, Baum-Welch for training HMM).

■ Recognition (Viterbi decoding and maximum likelihood estimation).

Audio CodecInternal

Interfacing

PCInterfacing

Nios II EmbeddedProcessor

Altera DE2 Development BoardMicrophone

Analog Speech

Digital Speech

Digital Text

Storage PC

Legend

89

Page 99: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The following table shows the methods we used and their purpose.

We used the following software and hardware to develop the design:

■ Sound recorder

■ MATLAB version 6

■ C – DevC++ with gcc and gdb

■ Quartus II version 5.1

■ SOPC Builder version 5.1

■ Nios II processor

■ Nios II IDE version 5.1

■ MegaCore® IP library

■ Altera Development and Education (DE2) board

■ Microphone and headset

■ RS-232 cable

Speech AcquisitionSpeech acquisition requires a microphone coupled with an amplified ADC to receive the voice speech signal, sample it, and convert it into digital speech for input to the FPGA. The DE2 development board has a WM8731 audio codec chip connected both to the microphone input pins and the Altera Cyclone II FPGA pins through an I2C serial controller interface. The WM8731 is a versatile audio codec that provides up to 24-bit encoding/decoding of audio signals in various sampling rates ranging from 8 to 96 KHz.

The codec clock input is generated by dividing the system clock by four using a custom hardware block. The block combines the clock divider logic and the I2S digital audio interface logic as well as options for programming the control registers. The DE2 board’s microphone input port connects the

Methods Used

Method PurposeThreshold detection Detects the voice activity.

LPC analysis Extracts the feature vector.

HMM Characterizes real signals in terms of signal models.

Viterbi decoding Detects the unknown word.

Maximum likelihood estimation Chooses the best matching digit.

90

Page 100: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

microphone and headset for speech acquisition. The following table shows the codec’s control register settings:

These settings are programmed by setting or resetting of various bits in the control registers as shown in the following table:

The I2S digital audio interface connects to the Nios II processor via a UART. The UART has two additional pins, one for serial transmit data and one for serial receive data, and SOPC Builder connects the UART to the Nios II processor via the Avalon bus. The UART is also supported by the HAL, which makes it easy to get the input from the codec through the UART by reducing communication to simple file-handling functions. The fopen() function opens the UART (/dev/micinput) as a file and the fread() function reads 16 bits by 16 bits whenever required. The data is stored directly in the SDRAM, which contains a buffer array in the heap region. The buffer array is a circular buffer that loads the speech samples and overwrites with new samples when processing finishes.

PreprocessingPreprocessing takes the speech samples as the input, blocks it into frames, and returns a unique pattern for each sample. Figure 3 shows the preprocessing block diagram.

Codec Register Settings

Register SettingADCDAT 16 bit

USB/normal mode Normal mode

Master/slave mode Master

Digital audio interface I2S interface

MCLK input 50 MHz divided by 4

Bit oversampling rate 256 fs

ADC sampling rate 8 KHz

Serial control mode 2-wire interface

Control Register Settings

Register Address Register Name ValueR0 0000000 Left line in 0017

R1 0000001 Right line in 0217

R2 0000010 Left headphone Out 047F

R3 0000011 Right headphone Out 067F

R4 0000100 Analog audio path control 0838

R6 0000110 Power down control 0C00

R7 0000111 Digital audio interface format 0EC2

R8 0001000 Sampling control 1030

R9 0001001 Active control 1201

91

Page 101: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 3. Preprocessing Block Diagram

Voice Activity DetectionThe system uses the endpoint detection algorithm to find the start and end points of the speech. The speech is sliced into frames that are 450 samples long. Next, the system finds the energy and number of zero crossings of each frame. The threshold energy and zero crossing value is determined based on the computed values and only frames crossing the threshold are considered, removing most background noise. We include a small number of frames beyond the starting and ending frames so that we do not miss starting or ending parts that do not cross the threshold but may be important for recognition.

Pre-Emphasis The digitized speech signal s(n) is put through a low-order LPF to flatten the signal spectrally and make it less susceptible to finite precision effects later in the signal processing. The filter is represented by the equation:

H(z) = 1 - az-1 where a is 0.9375.

Frame BlockingSpeech frames are formed with a duration of 56.25 ms (N = 450 sample length) and an overlap of 18.75 ms (M = 150 sample length) between adjacent frames. The overlapping ensures that the resulting LPC spectral estimates are correlated from frame to frame and are quite smooth. We use the equation:

Xq(n) = s(Mq + n)

with n = 0 to N - 1 and q = 0 to L - 1, where L is the number of frames.

Sampled Speech Input

Block intoFrames

FL M*

Finding MaximumEnergy & MaximumZero Crossing Rate

for All Frames

Finding Starting& Ending of theSpeech Sample

VAD

Pre-EmphasisHPF

OverlappingEach Frame

WindowingEach Frame

Auto-CorrelationAnalyss

M*

W

LPC/CepstralAnalysis

LPC ParameterConversion

WindowingEach FrameW

Weighted CepstrumOutput for theGiven Sample

M* - Overlap Frame Size = FL/3FL - Frame Length = 450 Samples

92

Page 102: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

WindowingWe apply a Hamming window to each frame to minimize signal discontinuities at the beginning and end of the frame according to the equation:

x’q(n) = xq(n). w(n)

where w(n) = 0.54 = 0.46 cos(2πn/N - 1).

Autocorrelation AnalysisWe perform autocorrelation analysis for each frame and find P + 1 (P = 10) autocorrelation coefficients according to the equation:

where m = 0, 1, ..., P

The zeroth autocorrelation coefficient is the energy of the frame, which was previously used for VAD.

LPCThe system performs LPC analysis using Levinson and Durbin’s algorithm to convert the autocorrelation coefficients to the LPC parameter set according to the following equations:

E(0) = rq(0)

for 1 ≤ i ≤ P

ai(i) = ki

aj(i) = aj

(i-1) - ki. a(i-j)(i-1)

E(i) = (1 - ki2) E(i-1)

where am10 1 ≤ m ≤ P are the LPC coefficients.

Cepstral Coefficient ConversionThe cepstrum coefficients provide a more robust, reliable feature set than the LPC coefficients. We use the following equations:

c0 = lnσ2

where σ is the gain of the LPC model.

rqm x'q n( )x'q n m+( )

n 0=

N 1– m–

∑=

Ki rq i( ) aji 1–( ) rq i j=( )( )⋅

j 1=

L 1–

∑⎩ ⎭⎪ ⎪⎨ ⎬⎪ ⎪⎧ ⎫

⎩ ⎭⎪ ⎪⎨ ⎬⎪ ⎪⎧ ⎫

Ei j–⁄=

93

Page 103: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

for 1 ≤ m ≤ P

for P < m ≤ Q

Parameter WeighingBecause the lower-order cepstral coefficients are sensitive to the overall slope and the higher-order coefficients are sensitive to noise, we need to weigh the cepstral coefficients with a tapered window to minimize the sensitivity. We weighed the coefficients using a band-pass filter of the form:

wm = [1 + (Q/2). sin (πm/Q)] for 1 ≤ m ≤ Q

Temporal Cepstral DerivativeWe can obtain improved feature vectors for the speech frames using temporal cepstral derivatives. We use them with the cepstral derivative if the cepstral coefficients do not have an acceptable recognition accuracy.

Vector QuantizationA codebook of size 128 is obtained by vector quantizing the weighted cepstral coefficients of all reference digits generated by all users. The advantages of vector quantization are:

■ Reduced storage for spectral analysis information.

■ Reduced computation for determining the similarity of spectral analysis vectors.

■ Discrete representation of speech sounds. By associating phonetic label(s) with each codebook vector, choosing the best codebook vector to represent a given spectral vector is the same as assigning a phonetic label to each spectral speech frame, making the recognition process more efficient.

One obvious disadvantage of vector quantization is the reduced resolution in recognition. Assigning a codebook index to an input speech vector amounts to quantizing it, which results in quantization errors. Errors increase as the codebook size decreases. Two com algorithms are commonly used for vector quantization: the K-means algorithm and the binary split algorithm.

In the K-means algorithm, a set of L training vectors can be clustered into M (<L) codebook vectors, as follows:

■ Initialization—Arbitrarily choose M vectors as the initial set of codewords in the codebook.

■ Nearest neighbour search—For each training vector, find the codeword in the current codebook that is closest and assign that vector to the corresponding cell.

cm amkm----⎝ ⎠

⎛ ⎞ ck am k–⋅ ⋅

k 1=

m 1–

∑+=

cmkm----⎝ ⎠

⎛ ⎞ ck am k–⋅ ⋅

k 1=

m 1–

∑=

94

Page 104: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

■ Centroid update—Update the codeword in each cell using the centroid of the training vectors assigned to that cell.

■ Iteration—Repeat the above two steps until the average distance falls below a preset threshold.

Our implementation uses the binary split algorithm, which is more efficient than the K-means algorithm because it builds the codebook in stages as described in the following steps:

1. Design a 1-vector codebook, which is the centroid of the entire training set and hence needs no iteration.

2. Double the codebook by splitting each current codebook yn according to the rule:

yn+ = yn(1 + e)

yn- = yn(1 - e)

where n varies from 1 to the codebook size and e is the splitting parameter.

3. Use the K-means iterative algorithm to obtain the best set of centroids for the split codebook.

4. Iterate the above two steps until the required codebook size is obtained.

TrainingThe system trains the HMM for each digit in the vocabulary. The same weighted cepstrum matrices for various users and digits are compared with the codebook and their corresponding nearest codebook vector indices is sent to the Baum-Welch algorithm to train a model for the input index sequence. The codebook index is the observation vector for the HMM model. The Baum-Welch model is an iterative procedure and our system limits the iterations to 20. After training, we have three models for each digit that correspond to the three users in our vocabulary set. We find the average of the A, B, and n matrices over the users to generalize the models. Figure 4 shows the HMM training flow chart.

95

Page 105: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. HMM Training Flow Chart

RecognitionThe system recognizes the spoken digit using a maximum likehood estimate, i.e., a Viterbi decoder. The input speech sample is preprocessed to extract the feature vector. Then, the nearest codebook vector index for each frame is sent to the digit models. The system chooses the model that has the maximum probability of a match. Figure 5 shows the recognition flow chart.

Compress the VectorSpace by Vector Quantization

Find the Index of the Nearest Code Book (in a Euclidian Sense) Vectorfor Each Frame (Vector)

of the Input Speech

Train the HMM for EachDigit for Each User &

Average the Parameters(A, B, π) over All Users

Append WeightedCepstrum for Each User

for Digit (0 - 9)

Preprocessing

Speech SampleInput for Each Digit

Trained ModelsFor Each Digit

Code Book of Size 128

Weighted Cepstrum

96

Page 106: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Figure 5. Recognition Flow Chart

PC Software for StorageData goes to the PC for storage as shown in Figure 6.

Find the Index of the Nearest Code Book (in a Euclidian Sense) Vectorfor Each Frame (Vector)

of the Input Speech

Find the Probabilityfor the Input Being

Digit K = 1 to M

Find the Model with theMaximum Probability &the Corresponding Digit

Preprocessing

Sample Speech Inputto be Recognized

Recognized Digit

Code Book Input

Trained Digit Modelfor All Digits for All Users

97

Page 107: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 6. Text Storage on PC Flow Chart

Design FeaturesOur system has the following features:

■ It is a standalone system for converting speech to text without using a PC for recognition.

■ It optimally uses resources with custom instructions implemented as custom hardware blocks and peripherals.

■ Runtime data acquisition and recognition makes the system suitable for practical use in ATMs, telephones, etc.

■ Storing the output text to a PC aids logging and archiving.

■ It uses a variety of features and components available for Nios II-based development, such as PIO, UART, and RS-232 communication.

In the future, we would like to implement a TCP/IP stack so that we can provide a standalone speech-to-text conversion engine/server on the network.

Ask Filename

Start

Stop

SerialPort DataReceived?

Open File

Copy Serial Bufferto File

Close File

Y

N

SerialPort Data

Over?

Y

N

98

Page 108: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Performance ParametersThe important performance parameters for our project are:

■ Recognition accuracy—The most important parameter in any recognition system is its accuracy. A recognition accuracy of 100% for all digits, independent of the speaker, is the goal.

■ Recognition speed—If the system takes a long time to recognize the speech, users would become restless and the system loses its significance. A recognition time of less than 1 second is required for the project.

■ Resource usage—Because the system is implemented in an FPGA, the resources required to implement the project are important. The FPGA has limited resources, such as logic blocks, memory cells, phase-locked loops (PLLs), hardware multipliers, etc. that must be used efficiently. Our project had to fit into a Cyclone II EP2C35F672C6 device, which has 33,216 logic elements, 82,016 memory bits, 475 pins, 4 PLLs, and 70 embedded multiplier elements.

Although we tried to maximize the system performance, there were some bottlenecks. The following sections describe the actual system performance.

Recognition AccuracyWe studied the recognition accuracy for four cases:

■ Case 1—Used samples from speakers 1, 2, and 3 that were also used for training.

■ Case 2—Used samples from speakers 1, 2, and 3 that were not used for training.

■ Case 3—Used samples of 3 untrained speakers (4, 5, and 6) whose voices were not used for training.

■ Case 4—Used samples from speakers 1, 2, 3, 4, 5, and 6 to obtain the overall recognition performance.

The results are summarized in the following table.

Recognition Accuracy Test Results

Digit Recognition PercentageCase 1 Case 2 Case 3 Case 4

0 100 100 85.7 95.2

1 100 85.7 71.4 85.7

2 100 85.7 57.1 80.9

3 100 85.7 57.1 80.9

4 100 100 71.4 90.5

5 100 100 71.4 90.5

6 100 85.7 57.1 80.9

7 100 100 71.4 90.5

8 100 100 57.1 85.7

9 100 100 71.4 90.5

99

Page 109: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Our system had an overall recognition accuracy of more than 85%.

Recognition SpeedThe recognition time of the system was 3.54 seconds for a vocabulary of ten digits.

Resource UsageThe Cyclone II FPGA on the DE2 board provided many more resources than we needed to implement our system: we used fewer than 20% of the available resources. The following table summarizes the resource usage of the vitercopy custom instruction and the overall system. Refer to the Screenshots section for more information.

ConclusionOur project implements a speech-to-text system in which a Nios II processor performs the recognition process. The Nios II processor is very useful for implementing the application logic using C code. We implemented computationally intensive tasks as custom blocks and the SOPC Builder helped us integrate these blocks into the Nios II system.

The Cyclone II FPGA, DE2 board, Quartus II software, Nios II IDE, and SOPC Builder provided a suitable platform for implementation our embedded system, which required both hardware and software designs.

Our speech-to-text system has a recognition time of about 3 seconds and a recognition accuracy of ~85%.

ScreenshotsThe following figures provide additional information.

vitercopy Custom Instruction Resource Usage

Resource vitercopy Custom Instruction Usage

Complete System Usage

Total combinational functions 389 3,650

Logic element usage by number of LUT inputs4 input functions3 input functions<=2 input functionsCombinational cells for routing

851681360

1,8091,1466951

Logic elements by modeNormal modeArithmetic mode

222167

3,147503

Total registers 264 2,699

I/O pins 109 47

Total memory bits 10,848 82,016

Maximum fan-out node clk 4

Maximum fan-out 281 1

Total fan-out 2593 CLOCK_50

Average fan-out 3.23 2684

100

Page 110: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

The Final Project Block Description File

101

Page 111: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Synthesis Report of Custom Hardware Block

Fitter Report of Custom Hardware Block

102

Page 112: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Compilation Report of Custom Hardware Block

103

Page 113: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Synthesis Report of Complete Design

Fitter Report of Complete Design

104

Page 114: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Compilation Report of Complete Design

105

Page 115: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Training Software and Output

Nios II Program Code Memory Usage

106

Page 116: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC-Based Speech-to-Text Conversion

Sample Output for Digit 5

VB Program for PC Storage

107

Page 117: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

ReferencesGarg, Mohit. Linear Prediction Algorithms. Indian Institute of Technology, Bombay, India, Apr 2003.

Li, Gongjun and Taiyi Huang. An Improved Training Algorithm in Hmm-Based Speech Recognition. National Laboratory of Pattern Recognition. Chinese Academy of Sciences, Beijing.

Quatieri, T.F. Discrete-time Speech Signal Processing. Prentice Hall PTR, 2001.

Rabiner, Lawrence and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall PTR, 1993.

Rabiner, Lawrence. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, vol 77, issue 2, Feb 1989 (257-286).

Turin, William. Unidirectional and Parallel Baum–Welch Algorithms. IEEE Transactions on Speech and Audio Processing. vol 6, issue 6. Nov 1998 (516-523).

Altera Nios II documentation.

AcknowledgementsWe are very grateful to our project instructor and guide, Dr. B.Venkataramani, Professor, Department of ECE, NIT, Trichy for his valuable support and ideas from start to finish.

We also want to thank Ms. Amutha, Research Assistant, Department of ECE, NIT, Trichy for always being there with us to help in the lab.

We express our gratitude to all the students, scholars, and faculty who work around-the-clock in our VLSI lab for their suggestions, help, and comments.

We also thank all peer Nios II developers from niosforum.com, other universities, and worldwide for their valuable help and support.

Finally, we thank Altera for providing with tools for the project, including the FPGA, software, board, documentation about this contest, and a platform to showcase our designs.

108

Page 118: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Second Prize

Nios II Embedded Electronic Photo Album

Institution: Electrical Engineering Institute, St. John’s University

Participants: Hong-Zhi Zhang, Wei-Ming Yeh, and Wei-Min Yang

Instructor: Rui-Xi Chen

Design IntroductionDigital photos are very popular on the Internet; unfortunately, unauthorized distribution and use of these photos is very common. To prevent unauthorized use, an author can place a watermark on the image, but this technology has not been implemented in an embedded system. Without an instant watermark and encryption option on digital cameras and e-albums, the user must manipulate online photos with a computer or leave them unmarked.

To protect the author’s copyright effectively, we used hardware/software co-design to create a Nios® II embedded photo album that provides instant raw data preset processing. Typically, digital photos are saved in compressed JPEG format. To encrypt a digital watermark, you must recover the photo from the archive in which it was saved. However, if you encrypt the photo before it is compressed, you do not have to perform as many operations. We implemented the watermark encryption technology using the μClinux v2.6.11 embedded system and a hardware drive.

Our design can be incorporated into digital cameras and camera phones that can save photos or deliver them to the Internet via a wireless network instead of a computer. If the author’s watermark is added to the photo before it is stored into an archive, the photo can then be uploaded to the Internet directly without requiring additional processing software.

Function DescriptionOur watermark encryption technology and the resulting marked photos must be transparent, robust, secure, and unambiguous. Additionally, the embedded system watermark (ESWM) photos can display

109

Page 119: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

functions of the picture. We used the Nios II processor and hardware/software co-design concepts to perform the following functions:

■ Use a discrete cosine transform (DCT) operation to separate the major images and the elements that are not sensitive to the human eye.

■ Determine which frequency domain to use when adding the watermark so that the photo is not damaged.

■ Use random number seeds to scatter the watermark.

■ Retrieve the scattered image points.

■ Store the program and photo onto a compact flash (CF) card.

■ Migrate the μClinux 2.6.11 kernel, establish a file system, create a Boa network server and application program.

■ Use a built-in network server to display the watermarked photo on a web site.

■ Directly embed the watermark.

Performance ParametersThe time it takes to compute an algorithm determines whether hardware is more efficient for the operation. We used profiling software to detect the time spent on all of the the ESWM program’s operations. Additionally, the profile software assessed the algorithm’s efficiency for each sector.

We used a standard test image of “Lena” (a 24-bit, 128 x 128 color image) as the photo to watermark. See Figure 1.

Figure 1. Standard Testing Image of Lena

We use a grayscale 32 x 32 image for the watermark (see Figure 2).

110

Page 120: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Figure 2. Watermark

When we execute the ESWM program, we receive information that can be used to apply or remove the watermark. Figures 3 and 4 show the efficiency analysis for applying and removing the watermark. The colored pie slices represent the amount of time spent on a specific operation.

Figure 3. Efficiency Analysis of the Watermark Application Program

Figure 4. Efficiency Analysis of the Watermark Removal Program

When we analyzed the efficiency of the watermark application and removal process, the color space converter (CSC), discrete cosine transform (DCT), and rearrange (embedding and quantization) operations occur most frequently and use the most resources. Seen from the program code of the photo, the CSC and DCT functions read pixel information to operate; therefore, the number of operations performed vary with the size of the photo. That is, the bigger the photo, the more time is required for the computation. The embedding and quantization process uses a random number generator. If the generator repeats numbers, it will cause collisions when the application rearranges the photo. Most collisions are resolved by re-selecting the random numbers. A larger photo leads to a greater possibility of collision, which is the source of the bottleneck.

CSC

DCT

SHA1

Rearrange

EmbedMean

CSC

DCT

SHA1

Rearrange

Extract

Mean

111

Page 121: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

When we created our Nios II system, we used our watermark application and removal program, as well as the operational analysis. When the system applies the watermark, it first treats the photo with a picture process (PICP) program that includes CSC and a forward discrete cosine transform (FDCT). Then, we apply the watermark process (WMP) operations (CSC, FDCT and quantization (Q)). Finally, we integrate the frequency domain and recover the three areas of the operation program in the time domain. Table 1 shows the watermark application process for the 128 x 128 Lena photo and the 32 x 32 watermark implemented on a Cyclone® II EP1C20 50-MHz device.

Notes to Table 1:(1) PICP: CSC (RGB to YCbCr) + FDCT(2) WMP: CSC (RGB to YCbCr) + FDCT + Q(3) Integration Process: Embedded + inverse discrete cosine transform (IDCT) + CSC (YCbCr to RGB)

To remove the watermark, the program executes the decode watermark process (DeWMP), performs dequantization, returns from the frequency to the time domain, and converts from the YCrCb to RGB color space. This process provides a deintegration operation, which occurs in two areas to remove the watermark.

Notes to Table 2:(1) Deintegration Process: Dequantization + inverse DCT (IDCT) + CSC (YCbCr to RGB)(2) DeWMP: CSC (RGB to YCbCr) + FDCT + Extracted

Design ArchitectureFigure 5 shows the system architecture.

Table 1. Watermark Application Process

Process CSC (Types)

DCT (Types)

Q (Types)

Embed (Types)

PICP (1)

WMP (2)

Integration Process (3)

Total Time

FLOAT_CESWM Float Float Float Short 24.1 3.23 10.47 37.77

FLOAT_CI _CESWM Float custom instruc-tion (CI)

Float CI Float CI Short 4.34 0.72 2.82 7.88

Table 2. Watermark Removal Process

Process CSC (Types)

DCT (Types)

Q (Types)

Extract (Types)

Deintegration (1)

DeWMP (2)

Total Time

FLOAT_CESWM float float Float Short 21.91 3.17 25.08

FLOAT_CI _CESWM float CI float CI float CI Short 4.11 0.41 4.52

112

Page 122: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Figure 5. System Architecture

Figure 6 shows the system block diagram.

Figure 6. System Block Diagram

Design MethodologyThis section describes our design methodology, including the method we used to apply the watermark and the steps we used to implement the system. To apply the watermark, we used a DCT to convert the image to the frequency domain, examined the position of the frequency band, and added random number functions to scatter the watermark throughout the photo, making it robust. To implement the system, we used a watermark algorithm as the program code. We tested each stage and did not add new functions until the established functions were stable. According to our efficiency analysis, we needed to hardware accelerate the CSC and DCT operations.

User Program

C StandardLibrary

μC LinuxAPI

HAL API

DeviceDriver

DeviceDriver

DeviceDriver...

Nios II Processor System Hardware

JTAGDebug Module

JTAG Connectionto Software Debugger

Nios II Processor Core

FloatSDRAMMemory

SDRAMController

UART

Timer1

Timer2

LCD Display Driver

General-Purpose I/O

Ethernet Interface

Compact FlashInterface

On-Chip ROM

Tri-State Bridge toOff-Chip Memory

FlashMemory

SRAMMemory

AvalonSwitchFabric

Data

Instruction

TXDRXD

LCDScreen

Buttons,LEDs, etc.

EthernetMAC/PHY

CompactFlash

Reset Clock

113

Page 123: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Implementation MethodFigure 7 shows the architecture for applying the watermark.

Figure 7. Watermark Application Architecture

We used the following method to apply the watermark.

1. Read the original image and perform a CSC operation, converting the original image from RGB to YCbCr.

2. Starting with YCrCb in the original image, perform an FDCT conversion and work out the DCT coefficient of the image. Look in the coefficients to find a medium or low frequency parameter that will be the starting position to apply the watermark.

3. Read the user’s passwords, use an encryption algorithm to create two groups of random number seeds, read in the watermark (converting it from RGB to YCrCb), and use the first group of random number seeds for embedding and Q to scatter the watermark.

4. Perform an FDCT conversion of the scattered watermark to obtain the watermark’s DCT coefficient. Take the average of the medium frequency, and let the quantification form make a digital quantization of all the coefficients, thus obtaining the quantified watermark DCT coefficient.

5. Use the second group of random number seeds on the quantified watermark DCT coefficient to replace, in random order, the low frequency parameter value of the original image.

Host Image

RGB -> YUV

FDCT

Watermark

YUV -> RGB

Random Rearrange PasswordKey1

FDCT

CalcMFMean

GenQTable

Quantization

Random RearrangeKey2

Embed

IDCT

YUV -> RGB

Output

SaveQTTable

PICP WMP

Integration

114

Page 124: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

6. Perform an IDCT conversion of the modified image to get the YUV image embedded with watermark information. Perform a CSC operation, converting from YCbCr to RGB, thus obtaining the image embedded with the watermark.

Figure 8 shows the architecture for removing the watermark.

Figure 8. Watermark Removal Architecture

We used the following method to remove the watermark.

1. Read the watermarked image and perform a CSC operation, converting from RGB to YCbCr.

2. Perform an FDCT conversion of YCbCr. Find the position of the medium and low frequency parameter and use it as the starting point to remove the watermark.

3. Read the user’s password. Create two groups of random number seeds with an encryption algorithm. Use the second group of random number seeds to remove the watermark from the medium and low frequency parameters in a non-random order.

4. Read the quantization form and perform dequantization processing of the material that is removed. Perform an IDCT operation on the processed information to obtain the original watermark.

5. Use the first group of random number seeds to non-randomly rearrange the watermark. Convert from YCbCr to RGB to the original watermark image.

Software ImplementationWhen we began this project, we started by creating a basic flow and then added functionality. We tested the software at each stage to ensure that it operated correctly. The basic flow in our first ESWM

Watermarked Image

RGB -> YUV

FDCT

Random Rearrange PasswordKey2

Dequantization

IDCT

Random Rearrange

YUV -> RGB

Output

Key1

Deintegration

DeWMP

Extract

LoadQTTable

115

Page 125: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

software version was: read the archive, perfom FDCT, incorporate the watermark into the image, perform IDCT, and output to the archive. The following sections describe the functionality we added to subsequent versions of the ESWM software.

ESWM v0.2: Reading and Managing the Watermark Image We used a bitmap (BMP) file as our watermark image. The BMP image format is easy to use and is universal. Additionally, the color information in BMP files is stored as RGB and is not compressed, making it easy to work with.

ESWM v0.4: Applying and Removing the Watermark In this version we implemented basic watermark application and removal functions, such as FDCT, IDCT, quantization, and frequency domain analysis, and experimented with different ways of applying the watermark. We needed to add a strong watermark that could resist attack. The major technologies we used are described as follows:

■ DCT—The DCT operation is divided into FDCT and IDCT, with conversion from space domain information to frequency domain information and back. This operation allowed us to distinguish the photo’s high, medium, and low frequencies and designate the correct area in which to add the watermark.

■ Quantization—Quantization adjusts the information after the watermark image goes through DCT conversion, so that the watermark information value is similar to the value of the medium and low frequency parameters of the original image to be watermarked. This process cushions the impact of the watermark on the original photo.

ESWM v0.6: Using a Random Number TechniqueIn this version, we focused on keeping the watermark from damaging the original image. In our previous software version, applying the watermark only uses FDCT and IDCT, and a slight breakage can cause massive loss of the hidden watermark media (see Figure 9). To solve this problem, we added a random number function before applying the watermark to protect the watermark even if the image is damaged. Essentially, we scattered the physical breakage to each part of the watermark so that the watermark would still be distinguished (see Figures 10 and 11).

Figure 9. Damaged Image

116

Page 126: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Figure 10. Watermark Removed without Random Number Technique

Figure 11. Watermark Removed with Random Number Technique

ESWM v0.8: Improving JPEG Compression Resistance In this version we focused on improving resistance to breakage during JPEG compression. Before applying the watermark, we used the YCbCr function and then performed the CSC operation on the original photo. Then, we converted from RGB to YCbCr and hid the watermark in the Y data, preserving the watermark when the image is JPEG compressed. Additionally, we improved software efficiency. These improvements are described as follows:

■ YCbCr—YCbCr is also a color space. JPEG compression converts the information in the RGB color space into YCbCr, in which Y refers to the brightness, and Cb and Cr refer to the chroma. The data is converted into this mode because the human eye is more sensitive to brightness than to chroma. Therefore, JPEG compression preserves the brightness data but only preserves part of the chroma by sampling some proportion of it to reduce the information. Hiding the watermark in the Y segment protects the watermark from any damage resulting from reducing the samples.

■ Quantified Breakage of JPEG Compression—JPEG compression also involves quantization processing. The JPEG built-in standard quantization form removes most of the DCT high-frequency coefficients, leaving only half of the information. That is, half of the DCT coefficient values in an 8 x 8 area are changed to 0, according to the device and adjustment standard for the compression quality. The quantization affects the amount of information saved in the images. To strengthen the ability of the watermark to resist JPEG compression, we performed numerous tests to find the most appropriate place to store the watermark information.

ESWM v1.0: Use a Hash Function for SecurityPrevious versions of the ESWM program used fixed random number seeds to execute the scattering work. In this version, we used a secure hash algorithm (SHA-1) encryption technique to ensure the

117

Page 127: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

security of the system. We added a function to request the user to enter a password. Then, we used the SHA-1 to convert the password into two groups of random number seeds to improve security.

ESWM v1.2: Improve DCT ConversionIn this version, we used the quick DCT calculating program created by Mr. Takuya Ooura to enhance software efficiency. The ESWM software can execute the algorithm correctly, read the bitmap image, and apply a watermark. Alternatively, it can remove the watermark from a watermarked image.

CSC and DCT Principles and ArchitectureThis section describes the CSC and DCT functions, which we implemented in hardware to accelerate the functionality.

CSC and Improving Logic EfficiencyThe CSC operation requires many multiplication and addition operations. Converting an RGB pixel to a YUV pixel requires 9 multiplication and 6 addition operations, which equates to 2,359,296 multiplications and 1,572,864 additions for a 512 x 512 image. Because the software spends a long time on these operations, we decided to use the Nios II processor with a CSC hardware accelerator to enhance the circuit’s efficiency. We define the CSC matrix algorithm in Petri net mode.

Petri net N = (P, T, A, w, ) of the CSC operation has the following architectures:

P = {p0, p1, p2, p3,.... p14}

T = {t1, t2, t3,.... t9}

A = {( p1, t1), (t1, p2), ( p2, t2), (t2, p2), ( p3, t3), (t3, p1), (t1, p4), (t2, p5), (t2, p6), (t3, p7), ( p4, t4), ( p5, t7), ( p6, t5), ( p7, t6), ( p8, t4), ( p8, t5), ( p8, t6), (t4, p11), (t5, p9), (t6, p10), ( p9, t8), ( p10, t9), ( p11, t7), (t7, p12), ( p12, t8), (t8, p13), ( p13, t9), (t9, p14)}

This model is concurrent, including 14 places, 9 transitions, and 28 arcs. See Figure 12.

118

Page 128: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Figure 12. Petri Net Algorithm of the CSC Accelerator

As shown in Figure 12, the CSC algorithm circuit architecture we implemented is a four-layer pipeline with a two-layer value mode adjustment and a limitation pipeline. See Figure 13.

Figure 13. CSC Implementation Architecture

Figure 14 shows the CSC simulation waveform.

119

Page 129: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 14. CSC Simulation Waveform

DCT AlgorithmAccording to the efficiency testing we performed, the DCT operation is a bottleneck in the watermark algorithm. Therefore, we decided to implement the DCT by integrating hardware and software to improve performance. For the two-dimensional DCT/IDCT, we implemented a straightforward design that compresses and decompresses the image using a Nios II processor.

1-dimensional (1D) DCT formula:

c(n)= { 1/(square root of 2) if n = 0 1, otherwise }

To simplify the cosine operation in 1D DCT conversions, it helps to establish the value of the DCT cosine ahead of time, as shown in Figure 15. Because floating-point operation is not ideal for implementation in a hardware circuit, the values in Figure 15 are multiplied by 256 (i.e., moved left 8 bits) to determine the closest fixed-point numbers (see Figure 16).

Figure 15. DCT Cosine Values

120

Page 130: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Electronic Photo Album

Figure 16. Fixed-Point Values

Figure 17 shows the equation for the case of the eight-point 1D DCT operation.

Figure 17. 8-Point 1D DCT Operation Equation

Design FeaturesOur design contains the following features:

1. Provides digital watermarking technology as the core function. Using this technology, the design allows the user to embed a signature and establish a photo authentication mechanism to protect the rights of the owner.

2. Develops the system for a Cyclone II EP1C20 device using the Nios II processor and μClinux environment.

3. Makes it difficult to break the watermark, and uses the most appropriate frequency domain for the tests.

121

Page 131: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

4. Allows image manipulation functions, such as rotating, cutting, and changing the image size. The watermark can still be identified in the edited photos.

5. Uses CF cards to store the images and display the watermarked images on a web site.

ConclusionIn our project, the Nios II embedded e-album, the watermark encryption algorithm uses YUV space conversion, as well as DCT, to convert the image to the frequency domain. Quantization processing enhances the image robustness, and we explore the appropriate frequency band in which to apply the watermark to achieve transparency and resistance to JPEG compression breakage. We added a random number function to the embedded algorithm to scatter the watermark throughout the image, protecting the watermark even if the photo is cut or damaged.

After completing the watermarking software, we analyzed the time required for each function using profile software to determine the efficiency. We added in the Nios II version 6.0 floating-point ordering command, accelerating the operational speed of the CSC, DCT, and quantization floating points. We then compared the time differences when using the floating-point ordering command, not using it, or using only software. Our result showed that using the floating-point ordering command is 5 times faster than not using it.

To fully implement the e-album, we needed to have a strong combination of input, output, and human-computer interface. Therefore, we used the μClinux version 2.6.11 kernel on a Cyclone II EP1C20 device and established a file system. Then we created the application program, controlled the CF storage device, generated a Boa web server to display the results, and used a web site as the development environment for the whole system. To implement these functions, we used the Nios II architecture to deploy the relevant algorithms, generate the major input and output (I/O) devices, and drive the program. When selecting operating system features, we needed to consider support, adjustability, and memory allocation flexibility of the Nios II system and the required device so the system can perform properly.

122

Page 132: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Third Prize

Automatic Scoring System

Institution: Huazhong University of Science & Technology

Participants: Ya-bei Yang, Zun Li, and Yao Zhao

Instructor: Xiao Kan

Design IntroductionHistory records what happened in the past. Do you remember the 23rd Olympic Games in Los Angeles? Xu Haifeng, the Chinese athlete, won China’s first gold medal for shooting. Do you remember the 1992 Olympic Games in Barcelona? Zhang Shan, a female Chinese athlete, outperformed all male athletes and became the only female champion in the history of skeet shooting. Do you remember the 1996 Olympic Games in Atlanta and the 2000 Olympic Games in Sydney? The shooting athlete Yang Ling twice won the 10 meter running target championship and became the only repeat winner until today. Do you remember the 2004 Olympic Games in Athens? In an unprecedented sweep, the Chinese athletes Du Li, Wang Yifu, Zhu Qi’nan, and Jia Zhanbo won 4 gold, 2 silver, and 3 bronze medals, and the five-starred red flag flew time and time again on the Olympic field. Over the past 50 years, Chinese shooting athletes have won 14 Olympic gold medals, 113 world championships, and broken records 117 times. In short, shooting has become a competitive Chinese sport. We strongly believe that Chinese athletes will make additional progress in the upcoming 2008 Olympic Games in Beijing.

While taking pride in Chinese athletes’ achievements, have you paid attention to the software and hardware used for Chinese shooting? We investigated which automatic targets and electronic scoring systems Chinese athletes used, and found that most are not made in China: they are imported. For example, we visited the Hubei Shooting Management Center to investigate their setup. The Hubei shooting ground only has two or three sets of devices, which are imported from Germany. Each set costs about 100 thousand RMB. Therefore, the athletes do not use the devices for daily training: they only use them for official situations. Even the automatic targets and electronic scoring systems that the athletes will use in the 2008 Olympic Games in Beijing are imported by BOCOG from Swizerland (from Sius). This research does not even consider the devices used for our troops, police, and training grounds!

We need a cheap, practical automated target and electronic scoring system. If we can develop this product, it will significantly improve the shooting skill of China’s athletes and modernize China’s shooting devices.

123

Page 133: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design ObjectiveOur goal was to develop a small, intelligent automatic scoring system for training and shooting competitions. We placed a high-resolution camera in front of the target (at the bottom) that takes a real-time image of the target plane. We then use an FPGA to analyze the shot, calculating the bullet’s location and determining it’s position on the target. The bullet hole is tracked for each shot, helping the human replacing the target to avoid danger.

Related Research and Current SituationThere are four basic types of automated scoring systems:

■ Double-layer electrode short-circuit sampling—This method has a very low reporting rate and does not work for round or short headed pistols. Additionally, it does not work for tracer bullets and has a low reporting rate for ordinary long bullets.

■ Laser diode array—In this method, a laser diode and the receiving tube are placed in a line. The bullet blocks the diode-emitted laser when the bullet passes through the diode, which generates a pulse at the laser receiving circuit. The background circuit can determine the circular number based on this signal. The system can report the target’s circular number without target paper, but it is difficult to improve the reporting accuracy because small-diameter laser diodes are not available.

■ Sound positioning—Today’s media frequently shows the sound positioning scoring system. This method uses four sound sensors on the four corners of the target. It uses the noise the bullet makes when it passes through the target to determine the bullet’s position. Because it uses a lot of technology, the sound sensor is very expensive. Additionally, the system has comparatively high on the target plane requirements, because the target paper must be made of special material so that it generates enough sound. These restrictions will prevent wide use of the system.

■ Image processing—The system determines the bullet hole’s position by analyzing the bullet’s image on the target plane. Theoretically, this method is very accurate, even better than sound positioning. With a camera that has millions of pixels, the system can meet competition demands at a low price. Provided the costs can be lowered enough, this system can be used in a wide range of applications, can help improve Chinese athlete’s shooting ability, and break the automatic scoring system monopoly.

Today, companies and institutes are researching video-processing-based automated scoring systems. However, most of them use a PC to deal with the video. Compared to an embedded system, a PC has poor stability, adaptability, and size, and is not suitable for outdoor use or the special training needs of army and police systems.

Design ArchitectureThis section describes the design architecture and system function.

System FunctionLooking at the shooting field structure, the system is composed of a target-side image processing front end and an athlete-side human-machine interaction back end. The system integrates image processing, dual-machine communication, and human-machine interaction functions. See Figure 1.

124

Page 134: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 1. General System Functions

Image Processing FunctionThe design has the following image processing functions:

■ Bullet hole identification—The FPGA controls the camera to collect the image, saves the pixel array into the SDRAM on the Development and Education (DE2) board for buffering, and then reads the data to apply a template-matching algorithm and identify the bullet hole. The function converts the coordinates to determine the bullet hole’s coordinates and circular number.

■ Bullet hole tracking (automatic adjustment of the target paper)—We used a closed-loop feedback control mechanism based on the bullet hole identification function. The system can effectively track the bullet hole and control the paper feeding mechanism to adjust the target paper automatically. This setup helps the person replacing the target avoid danger and ensures security.

Communication FunctionThe design has the following communication functions:

■ RS-485 transmission—To meet the transmission distance demand on the front or back end while ensuring correct data transmission, the system uses the RS-485 bus to provide duplex communication between the image processing front end and human-machine interaction back end.

■ Wireless transmission—To increase the system’s flexibility, we use software radio technology to wirelessly transmit data between the front and back end.

Function

LCDDisplays the

Scores

PrintedScore Record

Storage onSD Card

Audio Broadcast

RS-485Transmission

WirelessTransmission

DE2 Board DE2 Board

RS485 bus

Software radio

Bullet HoleIdentification

Bullet HoleTracking

ImageProcessing

Machine VisualSource of Light

Image Tracking

AutomaticControl

Human-MachineInteraction Back End

Image ProcessingFront End

Related Technology

AlteraCyclone II

FPGA

AlteraCyclone II

FPGA

Nios IICPU

Nios IICPU

Voice ShootingReport

TimingService

125

Page 135: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Data Exchange FunctionThe system has the following data exchange functions:

■ LCD display—The display is an important tool for providing information. This system uses a 640 x 480 LCD module to display the real-time competition scores, letting athletes easily access competition information.

■ Printer—Printing is a simple, effective way to store information and is vital rating proof in shooting competitions. Therefore, the micro-printer module is an integral part of our system. We use a TPUP-T16 series dot matrix printer to record the athletes’ scores and thereby calculate the total scores.

■ Voice reporting—To inform the audience of the competition progress, the system has an audio reporting function. The system stores audio material onto a secure digital (SD) card and delivers the data to a 24-bit audio codec when the card is read, resulting in voice broadcasting.

■ Timing service—The International Shooting Sport Federation imposes stringent shooting time requirements. We added a system clock service to the human-machine interaction back end to provide an electronic clock and use the DE2 development board’s four built-in keys to implement the time setup.

System ArchitectureThe system is composed of a target-side image processing front end and the athlete-side human-machine interaction back end. The front-end DE2 board collects and processes the image and communicates with the back-end board. The back-end DE2 board communicates with the front-end board and provides the human-machine interaction environment. Figure 2 shows the architecture.

Figure 2. Automatic Scoring System Architecture

We used two Nios® II processors for the image processing front end and human-machine interaction back end. The Nios II processors control data collection and processing, provide communication, and establish an interactive environment. The RS-485 connection and a radio allow the front and back end to communicate, making the system flexible and adaptable.

Nios II Architecture: Image Processing Front EndThe Nios II system in the image processing front end controls the image sensor, data collection and processing, paper feeding mechanism, and data transmission. Figure 3 shows the detailed architecture.

LED

Printer

Nios IINios II640 x 480

LCD

16 x 2 LCDDisplay Clock

Voice ShootingBroadcast

InputDevice

RS-232

DigitalDecoding

AnalogReceiving &

Demodulation

ConvertRS-232 toRS-485

Human-Machine InteractionBack-End Nios II System

Image-Processing Front-EndNios II System

ConvertRS-232 toRS-485

RS-232 Image Sensor Control

& Image Processing

CMOSImageSensor

ElectricMachine

Control I/O

ElectricMachine

D/A ConversionHigh-Frequency

PowerAmplifier

DigitalIntermediateFrequency

CommunicationCables

DE2 BoardDE2 Board

126

Page 136: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 3. Nios II Architecture of the Image Processing Front End

The FPGA collects an image with a CMOS camera, saves the pixel array into the DE2 board’s SDRAM for buffering, and reads the data to apply a template-matching algorithm and identify the bullet hole. The Nios II embedded processor converts the coordinates and calculates the bullet hole’s coordinates and circular number. It uses a UART peripheral to switch the RS-485 bus interface, and follows the RS-485 bus specification to make the transmission distance meet the system demand. The self-defined pulse width modulation (PWM) controller peripheral controls the direct current machine that automatically adjusts the target paper. The programmable I/O (PIO) inferface controls the wireless transmission module directly. Because the SDRAM contains the image collecting units, the system self-defines the peripherals (using an interface to user logic) to implement SRAM controller and run the programs in SRAM.

Nios II Architecture: Human-Machine Interaction Back EndThe Nios II system in the human-machine interaction back end receives data sent from the front-end system, displays and prints the results in real time, and provides a human-machine interaction interface, clock service, etc. See Figure 4.

JTAGDebug Module

Nios II Fast Processor

Core

Interfaceto User Logic

EPCSController

SRAMMemory

Tri-State BridgeOff-ChipMemory

AvalonSwitchFabric

Timer

PIO

PIO

PIO

RS-485

Clock Reset

Data

Instr.

Paper FeedingMechanism

Press Key

WirelessTransmission Module

UART

PWM

Image ProcessingUnit

CMOS Camera

System ID

DataAddress

WRRDCS.

127

Page 137: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. Nios II Architecture of the Human-Machine Interaction Back End

We take advantage of the peripherals provided with the Nios II processor: two UART peripherals implement the RS-485 switch and printer control. We use multiple PIO peripherals to build the connection between the 640 x 480 LCD controller and the wireless receiving module. A system timer generates a clock and provides a timing service for the peripheral LCD 16207 and the seven-segment numeric display. Switches and keys provide human-machine interaction.

Function DescriptionThis section describes the various modules in our design.

Image Processing ModuleImage processing module is the core of the system; it identifies and positions the target and converts the coordinates. In the module design, we applied an image processing algorithm in hardware and used software to convert the matched coordinates and track the target. Figure 5 shows the image processing hardware circuit.

JTAGDebug Module

Nios II Processor

Core

Interfaceto User Logic

EPCSController

SDRAMMemory SDRAM

Controller

AvalonSwitchFabric

UART1

Timer

User Logic

LCD DisplayDriver

PIO

PIO

PIO

RS-485

Clock Reset

Data

Instr.

LCD DisplayModule

WirelessReceiving Module

Switch

UART2

PIO

Printer

Clock

Press Key

Seven-SegmentDisplay

LCD 16207

128

Page 138: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 5. Image Processing Hardware Circuit

Image CapturingWe wanted to use a camera to capture the bullet’s image when it goes through the target. We had several options from which to choose:

■ USB camera—USB cameras are the most popular cameras on the market today, and are widely used in network and monitoring applications. However, these cameras have low resolution and short transmission distance of about 5 meters. Solutions that extend the USB line exist. For example, the UIC4101CP USB extension chip can extend a USB signal to about 50 meters, which is sufficient for an ordinary target system. However, this scheme will not work if the system must be upgraded to use longer distances. Additionally, the extended USB line supports only the USB version 1.1 protocol. Therefore, the maximum theoretical transmission speed is 12 megabits per second (Mbps) with an actual engineering transmission speed of about 8 Mbps. These speeds cannot meet our requirement for transmitting a real-time, high-resolution image.

■ AV interface camera—AV interface cameras are widely used for monitoring. These cameras have the same shortcomings as USB cameras, such as a short transmission distance of about 50 meters. Additionally, it is very hard to find a high-resolution camera.

■ CMOS camera—CMOS cameras are used for video conference and security camera applications. They are easy to connect, have high resolution, and are convenient for pixel upgrading.

After comparing our options, we decided to use a Terasic Technologies, Inc 1.3-megapixel digital camera module. This module has a 1.3-megapixel CMOS image sensor chip and a focusing lens. We can use the module to improve the camera resolution easily; it works fine as long as you replace a high-resolution image sensor with one that has the same interface. We controlled this module using the FPGA. Because we did not need an additional control chip, it was easy to develop the system and integrate the FPGA, Nios II processor, and image sensor chip.

The module’s biggest shortcoming is that it’s serial peripheral interface (SPI) signal has a very short transmission distance; the signal is typically transmitted on a board or with a short cable. To address this problem, we implemented the system using two DE2 boards that communicate using a software radio or the RS-485 bus. Using radio transmission, our system becomes very flexible and can work under various conditions. Additionally, using a software radio to perform frequency shift keying (FSK) transmissions has advantages when used with an FPGA and the Nios II processor.

In our system, we used Terasic Technologies’ camera module source code for image collection, which reduced the collection work. The image collection code can display the image on a VGA display, which

CMOSImageSensor

MT9M011

CMOS SensorData Capture

I2C Sensor

Configuration

Bayer Colorto

30-Bit RGBVGA

Controller

Multi-Port SDRAMComtroller

VGADAC

ImageProcessing

Circuits

SDRAM

DATA

FVAL

LVAL

PCLK

MCLK

SDAT

SCLK

129

Page 139: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

helped us build an image-processing platform. We added our image processing code to the collection code. The image collection operates as follows:

1. Configure the image sensor using the I2C bus.

2. Read the collected image dot matrix according to the timing attributes of the image sensor.

3. Use algorithms to convert the dot matrix Bayer color space to the RGB color space.

4. Save the dot matrix data to SDRAM.

5. The VGA display code reads the data from the SDRAM. Because the SDRAM controller has a dual-port control, our image-processing code can read data from the SDRAM.

Image ProcessingThis section describes the image processing algorithms and hardware circuit we used in our design.

Image Processing Algorithm Selection and AnalysisDue to circuit power noise and the image sensor chip’s limitations, the analog-to-digital (A/D) sensor conversion result has a lot of noise. Even if the sensor converts the same image, there is a difference in the image array data produced by different frame numbers generated at a different time. Comparing two images to judge the target position involves image fuzzy processing and other complicated algorithms.

To solve this problem, we use an innovative template matching algorithm. The algorithm has a two-dimensional (2D) filtering effect, which eliminates traditional median filtering and image smoothing and completely removes noise, which accelerates the speed. We were confident that the algorithm could achieve the target check. Figure 6 shows the template matching algorithm.

Figure 6. Template Matching Algorithm

The template matching algorithm performs target searching and positioning. The template matching step compares a specific area in the original image to a known template (in this design it is the target image) of the same size. If the compared image domain value is within a specified range, the images match. We start with the upper left point of the template and image and compare an area of the template to the same size area in the original image. Then, we move to the next pixel and perform the same operation. After all areas are compared, the area that meets our specified domain value is the target that we want.

We use the difference of the pixel domain values to measure the difference of the target template and target plane. The suggested the template size is m x n (i.e., width by height) and the image size is width x height. The coordinates of a template point are x0, y0, the point’s grey level is U (x0, y0), the

ImageHeight

Image Partto be Matched

Image Width

M

Template

130

Page 140: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

coordinates of the coincident point in the image is X0 - x0, Y0 - y0, and the grey level is V (X0 - x0, Y0 - y0), yielding a matched result of:

During matching, the point that meets our specified domain value is the result. Template matching requires considerable operations. The m x n subtraction and m x n - 1 addition must be made for each match, and the whole image requires (width - m + 1) x (height - n + 1) iterations. Additionally, the software processing template matching algorithm performs circular operations, which requires a lot of calculations. The operation volume increases rapidly as the template is enlarged. However, if we perform the process in hardware using a parallel, pipelined architecture, we can increase the processing speed remarkably. With this implementation, enlarging the template does not affect the processing speed, it simply affects the circuit size.

Validating the Image Processing AlgorithmBefore implementing the video processing in hardware, we used software to validate the algorithm. We used software because it is comparatively more mature and is more flexible for computer-based design tuning. Additionally, software provides a variety of image processing functions that we can call directly and conveniently. In contrast, there are only a few methods available for hardware tuning. We used the MATLAB software to validate the algorithm, which proved that our concept was the same as the implementation. Then, we converted the software algorithm into hardware to perform the image processing.

During the algorithm validation, we used the DE2_Control_Panel software that came with the DE2 board to read image data from SDRAM. The Image_Converter_v1.1 software translates the RGB format data from the camera into bitmap format. The MATLAB image processing function processes the picture file in bitmap format. We used the following key MATLAB functions:

Imagedata=imread('target. bmp');Twovalue=im2bw(Imagedata);Imshow(Twovalue);

Figure 7 shows the image from the camera and Figure 8 shows the image after it is converted to binary. The target position is obvious in the binary file. However, if a bright light is added in front of the target plane, the whole target plane has light reflection. In this case, the image sensor’s converted A/D result is saturated and influences the target judgment (the light directly operates the exposure registry in the image sensor chip). Controlling the camera’s exposure time reduces the sunlight input to the camera, which decreases the camera’s value after A/D conversion and makes the target identifiable. This solution solves the problem of a bright light on the target plane, making the system more adaptable and validating the advantages of using the image sensor chip directly. After we chose the right domain value, the target position in the image can be identified.

U x0 y0,( ) V X0 x0– Y0 y0–,( )–

x0 0=

m 1–

∑y0 0=

n 1–

131

Page 141: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 7. Target Plane Image

Figure 8. Target Plane Image after Binarization

Image Processing Algorithm Hardware Circuit and ConceptsThe system’s image processing module has an innovative hardware template matching algorithm. We got this concept from the software template matching algorithm and the hardware median filtering algorithm. It has the speed and agility of hardware processing, which is superior to the software template matching algorithm and the hardware median filtering algorithm regardless of the computing volume or complexity.

We used the shift register from the library of parameterized modules (LPM) in the Quartus® II software, simplifying the system design and optimizing the code. The shift register length is the length of the horizontal line of the image. We use 640 horizontal pixels in the design.

In the design we used a 4 x 4 template, i.e., 16 processing units (PEs), each of which is a domain value computing circuit synchronized with the clock. The 4-level pipeline structure provides parallel processing. When a synchronized clock signal arrives, the template processing structure generates 16 domain values that are sent to the addition circuit to compute the template’s total domain value. We compare the computed result to the preset domain value. If it is smaller than the preset value, the template is assumed to match the target. An external interrupt is sent to the Nios II CPU, allowing the CPU to read the coordinates that matched the target. The remaining work is performed in software. Figure 9 shows the hardware image processing algorithm.

132

Page 142: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 9. Hardware Image Processing Algorithm Structure

Image Processing Light Source DesignThe light source design plays a key role in the machine’s vision. In fact, the light source determines whether the machine vision is successful. Therefore, we fully consider the light source in the system design. To reduce costs, we use a general fluorescent lamp instead of a professional light source. To solve the fluorescent lamp’s light quality problems, we added a baffle plate behind the target plane to capture the bullets and diffuse the light. The baffle makes the light through the target even and bright, which improves the system’s target identification and enhances it’s adaptability to the external light. This design enhances the system’s adaptability, and light from the incandescent lamp in front of the target plane does not affect the target identification accuracy.

Changing the Coordinate SpaceThe camera must be put on the upper left, upper right, or below the target plane. In these positions, it leans at an angle, which transmutes the collected image to a certain degree. In our design, we place the camera below the target plane with an iron board to protect it. Figure 10 shows the camera position.

SDRAM

Image LineShift Register

Register 1 Register 2 Register 3 Register 4

Addition CircuitDomain Value

Computing

Compare the CircuitDomain Value

Judgement

PE PE PE PE

Meet the Requirements& Output the Result

Nios II CPU

PE

PE Circuit Structure

CLK

iData

rst_n

oData_bak

result

PE PE PE PE

PE PE PE PE

PE PE PE PE

133

Page 143: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 10. Spatial Position of the Camera Relative to the Target Plane

As shown in Figure 10, Hup > Hdown, that is, as a point moves further from the camera, the line collected by the camera becomes longer. Because the camera’s horizontal pixel point is fixed, as two pixels move further away from the camera, the distance between them becomes longer. From a vertical perspective, the two pixel’s image distances collected by the camera are different too. The camera we used has a short focus, so we ignored the focus effect and considered the camera to be a point. This point forms an angle with the upper and lower border, which is divided into the same angles by the camera’s vertical resolution. The target plane’s real distances for each corresponding angle are different. The target plane’s left and right camera also form an angle, which is affected by the vertical direction. Therefore, the problem is one of three-dimensional (3D) coordinate changes. To obtain the actual score, we need to find the actual shooting position. The design finds the actual distance between the target and the center of the target plane, and then computes the corresponding score according to the target plane standard.

Figure 11 shows the relative spatial positions of the target and target plane.

Figure 11. Relative Spatial Position of the Target and the Target Plane

We need to compute the actual target coordinates (shown as a black circle in Figure 11). The horizontal resolution is PixelH and the vertical resolution is PixelV. The coordinates collected during the target’s

Hup

Hv

Hdown

Lh

Hv

Hdown

Lh

134

Page 144: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

image identification are Xtemp, Ytemp. To obtain the actual coordinates and the score, we perform the following computations.

The actual vertical distance from the target to the lower border is:

The actual vertical distance from the target to the center of the target plane is:

The actual distance from the target to the vertical central line is:

The actual distance from the target to the center of the target plane is:

We can compute the actual score with this data. The design uses the target plane for a 25-meter air rifle, in which the actual size of the central circle is 10 mm (score: 2) and other circles are 8 mm (score: 1). The actual score can be computed with the real distance between the target and the center of the target plane. If Lcenter < 5, the actual score is:

If Lcenter >= 5, the actual score is:

Software (Nios II) and Hardware (FPGA) CooperationThe hardware (FPGA) processes the image to obtain the relative position of the target. When the hardware identifies the target, it sends an interrupt signal to the Nios II CPU. The hardware processing result is read after the CPU enters the interrupt service program generated by the image processing hardware unit, i.e., the identified X, Y coordinates. The software transforms the X, Y coordinates into actual coordinates and continues the transfer work. With this method, the system uses both hardware speed and software agility.

Image TrackingAfter the target position is identified, the system adds automated target tracking to adjust and replace the target sheet strip automatically. This process avoids confusing the targets and improves system security by preventing manual interference of the target sheet. If we used a preset distance it would waste the paper strip. Instead, we used a closed-loop control design concept to move the paper strip to the shortest position from which the bullet hole cannot be seen. Figure 12 shows the target tracking algorithm’s software/hardware cooperation.

Lv Lh TtempPixelV------------------⎝ ⎠

⎛ ⎞tan×=

Lvcenter Hv2

------- Lv–=

Lhcenter Lh2 Lv2– Ytemp PixelH

2------------------–⎝ ⎠

⎛ ⎞tan×=

Lcenter Lvcenter2 Lhcenter2+=

Number 10.9 Lcenter5

--------------------–=

Number 10 Lcenter 5–8

-----------------------------–=

135

Page 145: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 12. Target Tracking Algorithm’s Software/Hardware Cooperation

Wireless Communication ModuleTo account for different indoor/outdoor contest environments, enhance system agility, and optimize the system, we added a wireless communication module to save a redundant transmission line. This module uses a software-defined radio (SDR) for the data communication system, including transmitting, transferring, and receiving data.

The transmission part uses direct digital synthesis (DDS) to allow direct transmission in the intermediate frequency (IF) using a high speed digital-to-analog converter (DAC). For the actual shooting distance, the IF can be transferred 10 to 15 meters away. It does not require a radio frequency (RF), which eliminates the need for an upconverter that is used in a traditional SDR system. The data transfer part uses frequency shift keying (FSK) and packaging and cyclic redundancy code (CRC) check technology to minimize the communication bit-error-rate between the host and the guest and to send/receive the target coordinates between the host and the guest. Due to analog-to-digital converter (ADC) limitations, the receive section uses a simulated, doubled frequency conversion and digital phase locked loop (DPLL) to perform FSK demodulation and the connect to the transmit function. Figure 13 shows the overall module structure.

Figure 13. Overall Module Structure

Transmit ModulationFigure 14 shows the transmit modulation schematic diagram.

CameraImage Collection

ImageProcessing

Match to the Target Output

Interrupted

0.5 SecondSoftware Timing

Tracking FinishesStop the Machine

Are there 2 Interruptions?No Match to the Target

Machine RotatesMatch ID is 1

Y

N

TargetCoordinate

DigitalCoding

CRC

Image Processing Front End

SendModulation

ModuleWirelessChannel

OutputDigital

Decoding

Validation

Human-Machine Interaction Back End

ReceiveDemodulation

Module+

136

Page 146: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 14. Transmit Modulation Schematic Diagram

This module is designed in simplex mode, so the host cannot receive feedback from the slave. As a result, the information cannot be transmitted stably and the frequency band utilization is not very demanding. In commonly used binary digital modulation, the amplitude shift keying (ASK) frequency band has high utilization and low reliability. In contrast, the FSK frequency band has low utilization, strong anti-interference ability, high reliability, and is easily implemented using DDS. To ensure reliable data transmission, the signal energy can be concentrated on the analog channel. If the frequency band utilization has low requirements, the system can use FSK modulation to transmit the data.

An ordinary IF carrier can support 10- to 15-meter transmission distances, which eliminates digital frequency transformation and simplifies the module design. The ADCs and DACs currently available in the market usually work in IF, so the transmit and receive modules can be connected easily. Custom logic or the Nios II processor can control the interface between the DAC/ADC and the baseband processing unit, making the module compact.

The carrier’s frequency stability is up to 10-5 for wireless transmission. To ensure that the DDS synthesized waveform is not obviously deformed, we interpolate16 points into a signal cycle. The synthesized frequency is not required to be more than 10M. The FPGA and DAC interface data transmission rate is 10M x 16, 160 Mwords/second.

The transmission rate is hard to implement on the DE2 board, and it may cause DAC data errors due to incomplete high-speed signals. We used a MAX5858A device to remove the conflict between waveform deformation and transmission rate. It is a two-route, 10-bit 300 millions of samples per second (MSPS) DAC with a 4x/2x/1x interpolation low-pass filtering circuit. If we use 4x interpolation at its maximum output velocity, the data transmission rate is (300 Mwords/second)/4 = 75 Mwords/second.

The DDS outputs a sine wave with 75M interpolation points per second. The wave finishes the 4-step interpolation and digital low-pass filtering in the DAC. The interpolation and filtering are converted into actual voltage for output. This flow eliminates distortion of the high-frequency band output sine wave signal, lowers the data interface transmission rate, simplifies the post-DAC filtering circuit design, and improves the module’s stability. According to the MAX5858A device data sheet, when the device is in the 4-step interpolation and digital low-pass filtering state, the maximum system clock is 75M and the maximum output frequency is 31M. To achieve the system transmission distance and implement a connection with the receiving demodulation module, we selected 30.7M as the system carrier frequency with 500 Hz and 3 kHz wave frequencies.

In the FSK modulating transmission module, all digital carriers and frequency modulating basic waveforms are implemented using the DDS technology. Figure 15 shows the DDS schematic diagram.

Bullet HoleCoordinates CRC Error

CorrectionCoding

PackagingFrame Head

Parallel-SerialConversion

FSKModulation DDS DAC

DigitalIntermediateFrequency

Nios IICPU

AlteraCyclone II

FPGA

137

Page 147: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 15. DDS Schematic Diagram

The phase accumulator is the core of the DDS. Controlled by the reference clock source, the phase accumulator:

■ Linearly accumulates the frequency control words.

■ Takes the phase codes obtained as the address to search the waveform storage’s address.

■ Obtains the discrete amplitude coding.

■ Outputs the analog staircase voltage through the DAC.

■ Smooths the output waveform using the low-pass filter.

These actions result in a frequency waveform with consecutive phases.

Based on the DDS module, digital FSK modulation operates by overlapping the discrete signals generated by the frequency modulating baseband DDS on the fixed-frequency control words. Figure 16 shows the FSK digital modulation schematic diagram.

Figure 16. FSK Schematic Diagram

With multiple clocks in the modulation module, coordinating the clocks in the Verilog HDL code is very important. The clock domains must be coordinated so that the when the code is operating in the FPGA there are no logic element (LE) timing errors. We follow this principle throughout the system design: the signal traffic in any logic module depends only on one clock, which avoids multiple clock timing.

Receive DemodulationIn a traditional SDR receiver, the antenna is directly followed by a high-speed ADC, which is then followed by a channel processing module and a decoder module. The channel processing module

FrquencyControl Word

PhaseAccumulator

PhaseRegister Adder RAM

PhaseControl Word

Clock Source

FPGA

DACLow-PassFiltering

Clock SourceDIV

ClockSource

ClockSource PLL

BasebandSequence

FrequencyModulating Baseband

DDS

Frequency OffsetModulation

RAMPhaseAccumulator

Carrier FrequencyControl Word

DigitalIntermediateFrequency

138

Page 148: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

addresses display data channel (DDC) choice, filtering, and balance, and the serial cascaded integrator-comb (CIC) filter and finite impulse response (FIR) filter. The decoder module performs the decoding and checking using a predetermined protocol. Limited by the high-speed ADC, this module abandons purely digital demodulation and simulates the second frequency transformation and DPLL to perform FSK decoding. Figure 17 shows the data receiving schematic diagram.

Figure 17. Data Receiving Schematic Diagram

The second frequency transformation converts the modulation signal frequency received from the antenna into the first mid-frequency signal (such as 10.7 MHz). Then, it converts the mid-frequency signal into 455 kHz, that is, the mid-frequency signal goes through the second frequency transformation. Our design uses MC13135 tuning to receive the decoding circuit. Figure 18 shows the composition and realization.

Figure 18. Tuning Receiving and Demodulating Circuit

The radio sends the received signal to the MC13135 device and mixes first local oscillator (LO) frequency with a 20-MHz crystal oscillator to obtain the 10.7-MHz first mid-frequency signal. By frequency mixing the signal and the 10.245-MHz second LO signal, it obtainst the 455-kHz second mid-frequency signal. The MC13135 device processes this signal to generate the modulated signal.

The FSK decoding circuit uses a universal 74HC4046 CMOS DPLL. The PLL functions as a narrow-band filter in which the central frequency tracks the input signal’s frequency fluctuation. With the PLL

SimulatedReceivingFront End

FSKDemodulation

Frame HeadDetection

Serial-ParallelConversion

CRCCheck

Nios IICPU

AlteraCyclone II

FPGA

139

Page 149: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

tracking function, the carrier and the phase have the same frequency extraction with little phase difference. The narrow-band filtering feature improves the synchronous system’s noise performance, which provides low-threshold frequency authentication. In this system, the frequency of two FSK carriers, fmin, are 500 Hz and 3 kHz and the central frequency f0 = 2 kHz. The R2 and C1 values are determined by the component’s fmin - R2/Cl curve. The R2/R1 value is determined by the (fmax/fmin) - R2/R1 curve, from which we can obtain the resistance of R1.

Figure 19 shows the FSK demodulating circuit including the 74HC4046 device. The LM311 fore comparer converts the analog input frequency transformation signal into TTL levels for the 74HC4046 input. The back comparer uses a NE5532 device (implementing a 2-step, low-pass filter) to remove the high-frequency components in the demodulation output signals. Finally, we use a 74LS04 device to reshape the signal and output 0- to 5-V digital signals.

Figure 19. FSK Modem Circuit

The FSK demodulation output enters the FPGA and detects the frame header (synchronization code 8'h55) using a synchronization state machine. If a data frame is received, the system immediately enters a data receiving state, converts between serial and parallel, outputs the parallel data and CRC check, and sends the 10-bit parallel coordinate information to the Nios II processor. Figure 20 shows the hardware state diagram.

140

Page 150: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 20. Data Receiving Hardware State Diagram

Figure 21 shows the frame header detection and serial/parallel conversion module simulation using the ModelSim SE software version 6.0.

Figure 21. Frame Header Detection and Serial/Parallel Conversion Module Simulation

Encoding and Error CorrectingBecause this system uses a simplex channel, the host cannot receive feedback from the slave. Therefore, the host cannot ensure that the information is transmitted stably. To ensure correct data transmission, which is crucial to ensure fairness in the competition, the design must address code errors and missing code problems.

To solve the problem of code errors, the data transmission protocol references the user datagram protocol (UDP). Data is divided into small frames, with each frame delivered independently and added with the CRC. The slave performs a CRC check and error correction when it detects data, which ensures that each frame of received data is correct. For missing code, the system uses a three-time check method. If the slave receives the same data twice in a given time interval, the data is considered correct.

Because the system only needs to deliver the bullet hole coordinates that are processed by the host hardware, the data is minimal and does not require a high data transmission rate. Thus, the previously described scheme can work and our tests shows that is 100% correct, which is very important in an actual competition. For the frame format, each data frame includes a synchronization head, X/Y coordinate information, data load, and a check word, or 34 bits total. At a 100-Hz frequency, the system achieves a 100-bit per second (bps) data transmission rate. Figure 22 shows the data frame structure.

Figure 22. Data Frame Structure

0 0

1

1

1 1

1

0

0

0

1

1

0

0 1

0

0

01

010

1

2

3

4 5

6

7

8

Serial-Parallel Conversiontarget_p <= {target_p[22:0], serial_i}

counter <= counter + 1'b1;

Convert to 24 Bits?if counter == 23

CRC Check

01010101

0101010

010101

01010 0101

Y

N

FrameHeader

FrameSynchronization

X/Y CoordinateInformation

Data CRC Check Codes

01010101 1 Bit 9 Bit 16 Bit

141

Page 151: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Taking advantage of the FPGA hardware, the system uses the wide bit CRC-16 (x16 + x12 + x2 + 1) to check and correct errors. Despite the wide bits, the hardware’s parallel nature ensures that the transmission is accurate as the checking capability increases proportionally.

CRCCRCs are important linear grouping codes with simple encoding/decoding methods and strong error detection/correction capability. They are widely used for error control in the communications industry. To use a CRC to detect errors, we generate an r-bit supervise code (CRC code) on the delivery side that checks rules according to the k bit binary code sequence to be delivered. We attach the code at the end of the original information to form a new binary code sequence k + r, and then send the combined information. On the receiving end, we check the information and CRC code according to the defined rules, which lets us determine whether there are any delivery errors.

In the hardware circuit, the division circuit performs polynomial division using a shift register and module 2 adder (exclusive OR unit). For example, the CRC-ITU is composed of a 16-level shit register and 3 adders. See Figure 23 (used for encoding and decoding). Before encoding and decoding, all registers are set to 1 and the information bit moves in with the clock. When all information bits are input, the registers output the CRC results.

Figure 23. CRC-ITU Hardware

The serial encoding uses more clock cycles and is concerned with the input bit width, which affects the software’s flexibility. So it uses parallel coding in this system. Figure 24 shows the CRC parallel coding hardware flow chart.

15 14 13 12 11 5 3 2 1 010 9 8 7 6 4

Bit Input

142

Page 152: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 24. CRC Coding Process

Figure 25 shows the CRC coding module function’s simulation waveform in the ModelSim SE software version 6.0.

Figure 25. CRC Coding Simulation

CRC CheckThe received CRC is divided by G(x). If there are no delivery errors, the result is 0. If there are errors, the result is not 0. Figure 26 shows the hardware implementation block diagram.

crc_reg Initializationcrc_reg <= {idata, 9'h0}

Is crc_reg's FirstBit a 1?

N

Y

crc_reg <= crc_reg^Polynomial

crc_reg Moves Leftfor 1 Bit, Read a Zero

Have Seven ZerosBeen Read?

N

Y

The First 16 Bits of crc_regis the CRC to Be Calculated

Parallel-SerialConversion Output

143

Page 153: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 26. CRC Check Block Diagram

Software and Hardware InterfaceFigure 27 shows the decoder receiving module.

Figure 27. Decoding Receiving Module

Figure 28 shows the Nios II processor’s software control module and connection diagram.

Figure 28. Radio Receiving Module and CPU Connection Functions

To implement the procedure, we set a switch interrupt. When SW17 is 1, it generates a switch interrupt and sends the interrupt service program’s start_nowire_receive_sem signal to notify μC/OS-II OS

Y

N

N

Y

Y

N

Initialization of crc_bufcrc_buf<= crc_i

Isthe First Bit of crc_buf

[23:7] 1?

crc_buf <= crc_buf^ Polynomial

crc_buf Moves Left for 1 Bit

Read tothe End of crc_buf?

Iscrc_buf 0?

Do Not Process Correct Data

Nios II

AlteraCyclone II

FPGA

wireless_reset_pio

wireless_endover_pio

wireless_data_pio

Generate a High-Level PulseReset Receive Module

Generate Interrupt AfterCompleting the Data,Notifying the CPU that

Data is Being Read

8-Bit Parallel Data Interface

144

Page 154: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

to initiate nowire_receive_task. Meanwhile, it stops the RS-485 serial interrupt and starts the radio receiving interrupt.

We set a radio receiving interrupt on nowire_endover_pio. When the receive module finishes processing the data, the nowire_endover signal goes high. We check nowire_endover_pio on the rising edge, enter the interrupt service program nowire_receive_isr, read the 8-bit valid data, send the get_raw_data_sem signal, and initiate process_data_task. Figure 29 shows the prodecure details.

Figure 29. Radio Receiving Module Software Flow Chart

We use a dual Nios II processor scheme on the delivery and receiving ends. Two Nios II processors control data collection, communication, and display the collection result. A one-way data transmission system links between the delivery and receiving end, the FSK modulation, and PLL demodulation. To improve the reliability, the system uses the CRCs for package communication.

Transformation ModuleTo meet the gunshot distance demands while ensuring correct transmission of the front-end data, we use a RS-485 bus transformation scheme.

Start

Wait for Switch Interrupt

Switch Interrupt?N

Y

Send start_wireless_receive_sem

Initiate wireless_receive_task &Close Serial Interrupt. Start

wireless_endover_pio Interrupt

Wait for wireless_endover_pioInterrupt

wireless_endover_pioInterrupt?

N

Y

Receive Data & Send get_raw_data_sem

Start process_data_task

End

145

Page 155: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The RS-232, RS-422, and RS-485 serial data interface standards are produced and released by Electronic Industries Association (EIA). RS-422 is evolved from RS-232. To overcome the RS-232 interface standard’s short communication distance and low velocity shortcomings, RS-422 defines a parallel communication interface. It increases the transmission velocity to 10 Mbps and the transmission distance to 4,000 inches (the velocity is lower than 100 Kbps), and it permits up to 10 receivers to connect on a parallel bus. To expand the application scope, the EIA created the RS-485 standard, which is based on RS-422, in 1983. This standard includes multi-point, two-way communication, i.e., a multi-transmitter is connected to the same bus, adding transmitter drive capability and collision protection, and expanding the bus’s module sharing.

With the RS-485 standard’s long transmission distance, the DE2 board’s image processing front end sends the data out using the standard RS-232 interface. Then, the RS-232/RS-485 passive interface converter converts the data into an RS-485 signal, which is transmitted via a shielded twisted cable. After receiving the signal, the human-machine interaction back end delivers the signal to the RS-232/RS-485 passive interface converter to reconvert the signal into an RS-232 signal. Then, the signal goes to the DE2 development board.

The system uses Jabsco’s RS-232/RS-485 passive interface converter. It’s RS-232 and RS-485 interfaces have a DB9 connector. The converter is also equipped with terminal binding posts. Figure 30 shows the converter.

Figure 30. RS-232/RS-485 Passive Interface Converter

Table 1 shows the converter’s pins (the sixth pin on the RS-485 port is the +5 V backup power input).

The RS-485 interface supports two-line and four-line connection. Our system uses the two-line method, and the RS-485 port’s sixth +5 V backup power input can be impending due to short transmission distance (10 meters). Note that the MAX232 and RS-232/ RS-485 passive interface converters’ connection on the RS-232 port is cross hair. Figure 31 shows the connection method.

Figure 31. RS-485 Module Line Connection

Table 1. RS-232/RS-485 Passive Interface Converter Pin Assignments

Pin RS-232 Pin RS-485Second pin Data in (RXD) First pin 485+

Third pin Data out (TXD) Second pin 485-

Fifth pin Signal ground (GND) Fifth pin Ground

RS-232 RS-485

InterfaceConverter

485- 485+

DE2 DE2RXD

TXD

GND

Image ProcessingFront End

MAX232 RS-232/RS-485Passive Interface

Converter

Human-MachineInteraction Back End

MAX232RS-232/RS-485Passive Interface

Converter

RXD

TXD

GND

TXD

RXD

GND

TXD

RXD

GND

485+

485-

GND

RXD

TXD

GPIO[0]

GPIO[1]

146

Page 156: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

The data we delivered is the bullet hole’s X and Y coordinates and circular number in character form, for instance, X = 240. We send three characters: 2, 4, and 0. To prevent missing data, we opened a serial interrupt in the μC/OS-II OS and defined a set of simple protocols. Table 2 shows the serial data format.

We save the received data into the rxbuf[20] array and use the integer index to record the received characters. After receiving the data, we process it to obtain the X, Y coordinates, the circular number, the global variable result_number (which is the circular number result sequence that is used for displaying and printing), and address some abnormal conditions. Figure 32 shows the data receiving and processing procedure.

Figure 32. RS-485 Data Receiving and Processing Procedure

Table 2. Serial Data Format

Starting character “s” X coordinate Y coordinate Circular number Ending character “t”

Start

Open Serial Interrupt,Wait to Receive Data

Was an S Received?

Receive Data & Save torxbuf[20]. Increment

Index

Was a T Received?

Parity

N

Y

N

Y

N

Y

Initiate process_data_task &Add 1 to result_number

Is the IndexLarger than 10?

Send Error Informationerr, Inquire about Resending

Circular Number is Larger than 10,Get X,Y & Circular Number

Circular Number is Smaller than 10,Get X, Y & Circular Number

Send Information Trafficshow_result_sem

End

Y

N

Is ParityCorrect?

147

Page 157: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Display ModuleThe display is an important tool for transferring information to the user. The system uses the NS-TFT6448 LCD control board module from Wuhan Bokong technology Co., Ltd for the display.

The NS-TFT6448 LCD control module is designed for the TFT true color screen with a resolution of 640 x 480 and can implement 256 colors and a double-page display. It also provides a high-speed, 8-bit bus interface (I/O command mode) that connects with the CPU directly to enter the X, Y coordinates without computing the address. The module integrates the double-page display memory to provide pixel and line writing modes. It adds a 1 automatically during the write operation and does not need initialization.

LCD System StructureFigure 33 shows the LCD system structure diagram.

Figure 33. LCD System Structure Diagram

Coordinates and Pixel MappingThe column coordinate (X) value range is 0 to 639. The row coordinate (Y) value range is 0 to 479. The pixel format is R3G3B2.

Register DescriptionThere are 4 registers, including the column address register, status control register, and display data register. See Table 3.

The row and column address has to be specified before line reading/writing data process. The column address register (X) is shown below. The X coordinate’s valid value range is 0 to 639. It’s highest order X[9:8] is in the control register.

Table 3. LCD Module Register Distribution

CS A1A0 WR Functions0 00 0 Column address register

0 01 0 Row address register

0 10 0 Control register

0 11 0 Display data register

0 ×× 1 —

1 ×× × —

D7 D6 D5 D4 D3 D2 D1 D0

AlteraCyclone II

FPGA

Nios II

Data

Control

LCD ControllerTFT6448

True-ColorLCD

640 x 480

148

Page 158: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

The row address register (Y) is shown below. The Y coordinate’s valid value range is 0 to 479, its highest order Y[8] is in the control register.

The control register is shown below. P_disp chooses the display page (page 0/page 1). P_rw chooses the read/write page (page 0/page 1). P_disp and P_rw are set at will and choose the page freely. X[9:8] is the highest order of the column address. X[9:8] is the highest order of the row address.

Display the Data Read/Write ModeThe NS-TFT6448 modules provids two data display read/write modes, row and byte:

■ In row mode, it first specifies the row and column address Y, X. Then it continuously reads/writes the data beginning with the row’s X address. It does not need to reset X and Y. After each display data read/write operation, it adds a 1 automatically to the X column address. When it reads/writes a new row, it resets X and Y.

■ In byte mode, the module sets X and Y before each display data read/write. It adds a 1 to X automatically.

In the Nios II SOPC Builder system, we added LCD_ADDR_PIO, CD_CTRL_PIO and LCD_DATA_PIO, which are 2, 3, and 8 digits, respectively, to control the NS-TFT6448 LCD control board module. Figure 34 shows the system structure.

Figure 34. NS-TFT6448 LCD Control Module System

We used row mode to display the read/write data. Figures 35 and 36 show the process flow.

D7 D6 D5 D4 D3 D2 D1 D0

P_disp P_rw x x X[9] X[8] x Y[8]

Nios II

AlteraCyclone II

FPGA

LCD_ADDR_PIO

LCD_CTRL_PIO

LCD_DATA_PIO

A0, A1

/CS, /RD & /WR

D0 to D7

149

Page 159: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 35. Write Display Data Process Flow

Figure 36. LCD Refresh Process Flow

To display the target on the LCD, we drew a 480 x 480 target in Photoshop and saved it as a black-and-white bitmap with only one bit of data (0 or 1) for each pixel. Then, we translated it into a 480 x 480 0 and 1 2-dimensional (2D) integer array after processing. To save storage area and ease processing, we further processed the image by combining the 0 and 1 values of each 8, 16, or 32 pixels and saved them

Start

/WR = 0, Write Enabled

A1 A0 = 11, Choose the Display Data Register

Write Display Data

/WR = 1, Write Disabled

End

Start

Choose the Read/Write Page

Set X & Y = 0 Initially

Write Display Data, Add a 1to the Column Coordinate X

Y

N

X < 640?

N

Y

Y = Y + 1, Reset the InitialX & Y Coordinates

Y < 480?

Finish Page Refresh

End

150

Page 160: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

as integer data. We compiled the format conversion application, Project1.exe, using C++. The processing procedure is described below:

1. Choose the saved data type after conversion: char (8 bits), short (16 bits), and int (32 bits).

2. Initialize data = 0.

3. Scan the pixel value of the bitmap point by point. If it is black, set the corresponding data position as 1 and move the data to the right 1 bit.

4. Determine whether the data has reached the specified bit width. If it has, store the data and continue converting the next row.

As a result of the conversion, we obtain a 2D array saved as .h or .txt, which we can use after including it in the Nios II project. See Figure 37.

Figure 37. Format Conversion Processing

Because the LCD does not support a word library, we used a similar process to display words. We drew 2 sets of 64 x 64 and 32 x 32 pixel digits in Photoshop. Then we ran and stopped a 100 x 160 pixel picture, converted to a 2D array using the format conversion program, saved as a .h file, and included it in the Nios II project.

We read the 2D array when displaying the target and obtain the pixel value using a shift operation. If the value is a 1, we write the display data in yellow; if it is 0, we write is in red or black. With this process, we can write the data to the LCD and display the score, target setting, and other functions. See Figure 38.

151

Page 161: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 38. LCD Row Scanning Program Process

Printer ModulePrinting provides important evidence to judge the final score and is an economical and effective way to store information during shooting contests. Therefore, the micro-printer module is essential. Our system uses the TPUP-T16 series micro-dot matrix printer to record the athletes’ shooting score. This imformation is used to compute their total performance.

The printer uses the RS-232 communication mode (DB-25 connector) with an optional baud rate of 150, 300, 600, 1,200, 2,400, 4,800, 9,600, and 19,200 bps (K1 through K3). It uses asynchronous transmission mode, handshaking uses marking control or X-ON/X-OFF protocol (K4) with a parity check (K5 - K6). The uses can select the mode using the machine’s dipswitch (see the printer instructions for detailed operations). All signals use EIA logic levels.

We connected to the DE2 board’s standard RS-232 port using a cross adapter from a DB-25 connector to a DB-9 connector through the serial port. Handshaking uses the marking control mode, the baud rate is 9,600 bps with no parity check, and the data digit is 8. The data uses asynchronous transmission mode

N

Y

Y

N

Start

Set the Display PositionX & Y

Shift Codeshift_mask = 0x80000000

Read 2-Dimensional Array. Comparewith shift_mask to Obtain the Pixel

Value of Each Point

Pixel Value = 1?

Show the Under Color, shift_mask Moves 1 Bit

to the Left

Show Black, shift_mask Moves 1 Bit

to the Left

Has a Row Been Displayed?

Reset X & YDisplay the Next Row

End

152

Page 162: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

(see Table 4). Table 5 shows the marking handshaking mode (where mark is a logic 1 and space is a logic 0).

The printer supports the ESC and TPUP-T16 command codes. Our system uses the TPUP-T16 command code. Table 6 shows the general command codes.

Figure 39 shows the printer control program flow.

Table 4. Asynchronous Transmission Mode

Start Bit 0 Data Bit Parity Bit Stop Bit1 bit 7/8 1 1

Table 5. Marking Handshake Mode

Handshaking Mode Data Direction RS-232C Interface SignalMarking Control Data input is allowed Signal cable 5 and 8 in space status

Data input is not allowed Signal cable 5 and 8 in mark status

Table 6. General Command Codes

Command Code Format DescriptionHex Decimal Hex

00 0 00 n Choose the character set 1 or 2: n = 01, 02.

0A 10 0A Press the Enter key to start a new row.

0D 13 0D Press the Enter key to start a new row, the command ends.

0E 14 0E n Reprint the code before 0E for n times.

153

Page 163: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 39. Printer Control Program Flow

Paper Feeding Mechanism and Motor Control ModuleIf two bullet holes exist on a target plane, there is a possibility of target overlap that cannot be solved using image identification. We designed a paper tape feeding mechanism to roll the paper tape automatically to a position that has no bullet hole after a hole has been identified, avoiding the problem that image processing cannot identify overlapping targets. We use the printer’s paper feeding mechanism to move the paper tape, which provides excellent performance.

Our system uses a DC servo motor to drive the paper feeding mechanism because the motor’s forward/backward turning control is easy and has an auto-lock function. Generally, the DC servo motor requires a high driving voltage and current. The L298 motor drive chip’s highest working voltage is 46 V and it’s full swing output current is 4 A. It’s on/off speed is very fast and it has a powerful drive capability. Additionally, it has a low saturation voltage, overheating protection, and a strong anti-jamming capability (a logic 0 can reach 1.5 V). With it’s excellent performance, simple peripheral circuit, and a convenient motor interface, the system uses a symmetrical H-bridge motor-driving circuit with a split apparatus and uses the L298 chip to drive paper feeding mechanism directly. Figure 40 shows the circuit.

Serial PortOperation

Set Baud Rate with K1 - K3

Set the Parity Check with K5 & K6

Signal Cable 5 & 8 in Space Status?N

Y

N

Y

Send Data

Is Data Finished Sending?

End

154

Page 164: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 40. L298 Motor Driving Circuit

System Clock Service ModuleTo implement the shooting timing, the system uses the system clock service on the human-machine interaction back-end to generate an electronic clock service. Typically, when creating an electronic clock service with a single-chip microcomputer, the system first initializes the timer, selects a 1-second timer (or some other timer), counts using a counter, and uses 1 second interrupts. However, with a Nios II processor, we do not need to run this program on the hardware application layer (HAL) because the processor takes care of the initialization process and related hardware details. Instead, we start the system clock service, which generates an alarm event every other second and adds a 1 to the number of seconds in the callback function. Figure 41 shows the system clock service procedure.

Figure 41. System Clock Service Flow

Start

Call alt_alarm_start to Startthe System Clock Service

Add a 1 to the Number of Seconds in the Call-back Function my_alarm_callback & Send Signal Volume add_second_sem

Call alt_alarm_stop to Stop the System Clock Service

End

Call process_timer_task to Process theCarry of Seconds, Minutes & Hours

155

Page 165: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The HAL provides the following steps for using the system clock service:

1. Call alt_alarm_start() to start the system clock service.

2. Compile the callback function according to the format requirements (the function implements the user-defined function).

3. Call alt_alarm_stop() to stop the system clock service.

To build the time interaction setting, our system uses the DE2 development board’s four buttons. This function mainly implements the system operation status and time setting/control (e.g., setting the date and time), and controls the Nios II back end start, stop, and reset. We had to figure out how to use only 4 buttons for all of the functions: we used different statuses to implement different functions.

In the Nios II CPU, we distribute the RESET pinout to the development board’s KEY0. KEY0 implements the Nios II system back-end reset function. KEY1, KEY2, and KEY3 implement the date and time setting and some other control functions. Table 7 shows the button functions.

When implementing the program, we determined the status first and then the key value. We implemented various functions according to different key values. See Figure 42.

Table 7. Button Functions

Pin Function DistributionKEY0 Reset of the whole Nios II system back end.

KEY1 Run control status. Setting status: add 1.

KEY2 Stop control status. Setting status: subtract 1.

KEY3 Change status button.

156

Page 166: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 42. Interaction Setting Flow

Speaker ModuleThe speaker module makes the system more personalized and automated. We save the speaker data in the SD memory card and transfer it to the 24-bit audio codec using the SD card read operation for audio playing.

The multifunctional SD memory card provides large capacity, high performance, and excellent security. SD cards were developed by Panasonic, Toshiba, and Sandisk based on the multimedia card (MMC) card, which have been used in digital cameras, personal digital assistants (PDAs), mobile phones, and other portable devices.

SD Memory Card Bus ProtocolThe SD memory card protocol is question-answer. The host sends a command (CMD) and the card sends a response (RES). If there is data to be transferred, it is sent on the DATA line. The SD card has 34 commands, including copyright protection commands (26 basic and 8 special commands). The CMD command format is predefined (see Figure 43).

Start

Initialize the Button, Register Button Interrupt & Start the Interrupt

Interrupt the Service Program to Obtain the KeyValue. Send the Signal Volume to

get_key_value_sem

Start process_key_data_task to PerformDifferent Processing According to the

Key Value

Determine Status

Set Run or Stop Status Set Year, Month & Day Status Set Hour & Minute Status

Judge key_value

Key 1 Key 2 Key 3Key 1 Key 2 Key 3Key 1 Key 2 Key 3

CloseRS-485Interrupt

StartRS-485Interrupt

Enter intoSet YearStatus

Add 1 tothe Year,Month &

Day

Subtract1 from

the Year,Month &

Day

Enterinto the

NextStatusin Turn

Add 1 toHour &Minute

Subtract 1 fromHour &Minute

Enterinto the

NextStatusin Turn

Judge key_valueJudge key_value

157

Page 167: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 43. SD Card Command Format

The SD card responds to four formats, R1, R2, R3, and R6. All commands have a specified response except CMD 0. Figure 44 shows the R1 response format; other response formats are similar to R1—the only difference is the length and information carried.

Figure 44. R1 Response Format

Refer to SD memory card specifications for more information on the commands, responses, and information carried.

SD Memory Card Register DescriptionThe SD card configuration process is the read/write process of the SD card register. The card identification (CID), card-specific data (CSD), and operation conditions register (OCR) are the key registers. Table 8 describes the register function.

SD Memory Card InitializationSD card initialization generally includes following steps:

1. Check whether the card is inserted.

2. Rest the control module and card.

3. Check the card’s type.

4. Validate the card’s voltage.

5. Get the card’s CID.

Bit Position 17 16 [15:10] [39:8] [7:1] 0

Width (Bits)Value

1‘0’

1‘0’

6x

32x

7x

1‘1’

Description Start Bit Transmission Bit Command Index Card Status CRC7 End Bit

Table 8. SD Card Register Description

Register FunctionCID Include the card manufacturer and version information.

CSD Include card capacity information, block size, and whether write protection is enabled.

OCR Include the card operation voltage information. Can set the card’s working voltage by reading/writing the register.

Transmitter Bit:'1' = Host Command

Start Bit:Always '0'

0 1 Content CRC 1

Command Content: Command & Address Informationor Parameter, Protected by 7-Bit CRC Checksum

End Bit:Always '1'

Total length = 48 Bits

158

Page 168: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

6. Distribute the relative card address (RCA).

7. Set the data’s read/write block size.

SD Memory Card Read/Write OperationSD cards are usually read/written using the data block mode (BLOCK). Each read/write is an integer multiple of BLOCK. The CMD17/CMD18 and CMD24/CMD25 commands read one or more or write one or more data blocks, respectively. The data has a CRC at the end. If validation fails, the transferred data is discarded and the data read/write operation is also suspended.

Audio Encoding/DecodingThe DE2 board has the WM8731 audio encoding/decoding device, which supports 24-bit multi-bit sigma triangle analog-to-digital (A/D) and digital-to-analog (D/A) conversion, and an A/D converter (ADC) and D/A converter (DAC), all of which support inserting a digital value, 16 to 32 bits, a sampling rate of 8 to 96 kHz, stereo audio output with data cache, and digital volume adjustment. The CPU controls the chip via the I2C bus.

I2C Bus OverviewThe I2C serial bus is composed of an SDA data cable and SCL clock, which can send and receive data between the CPU and components it controlls. The maximum transfer speed between components is100 kbps. All controlled components, each of which has a unique address, are connected to the bus in parallel. During information transfer, the host controller’s control signal includes the address code and control volume. The address code selects the address, i.e., it selects the components that are expected to provide control. Therefore, all control components are on the same bus but independent of each other.

Speaker ImplementationWe include three SOPC Builder modules for the SD card, I2C bus, and audio modules. The SD card module provides data, clock, and commands using three programmable I/Os (PIOs): SD_DAT, SD_CLK, and SD_CMD. The I2C module connects to the CPU and the audio encoding/decoding device, and allows the CPU to control the audio encoding/decoding device. The audio module contains the following functions:

■ User-defined peripheral Audio_0—Provides data processing and time sequence synchronization. The module’s clock is generated by the 27-MHz external crystal oscillator using a phase locked loop.

■ AUD_FULL PIO—Generates a high-level signal when the audio encoding/decoding device data register is full. Every time it writes data to the WM8731 device’s data register, it first queries whether it is a low level. Because the SD card’s data reading speed is not the same as theWM8731 device’s decoding speed, we use another 512-byte buffer area to save the data read from SD card.

Figure 45 shows the audio playback flow.

159

Page 169: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 45. Audio Playback Program Flow

μC/OS-IIThe embedded system is important technology that is widely used in communications. It has greatly enhanced people’s quality of life and affected their lifestyle. An embedded OS is system software that runs on an embedded hardware platform and provides unified coordination, operation, and control over the system and its components. Because of it’s hardware features, diversified application environment, and unique development approach, an embedded OS is very different from an ordinary OS. Embedded OS’s have the following features:

■ Small size

■ Can be clipped

■ Real-time operation

■ High reliability

■ Portability

Start

initialize SD Card

Read Data with Data Block (512 Bytes) tothe Buffer Area

Is AUD_FULL at a Low Level?

N

Y

N

Y

Write 16-Bit Data to Audio_0

Is the Data BlockFinished Writing?

End

Read Data of the Next DataBlock to the Buffer Area

160

Page 170: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Featuring small size, high efficiency, real-time personal customization, ROM storage, etc, the embedded system is widely used in a variety of fields. Generally, the embedded OS provides the following three services to help application designers:

■ Memory management—Assigns and releases the memory for the application so that the memory can be used repeatedly.

■ Multi-tasking management—Provides a good task schedule mechanism, controlling the start, operation, suspension, ending, etc. tasks.

■ Peripheral management—Schedules and manages peripherals such as the keyboard, display, communication port, peripheral controller, etc.

Comparing μC/OS-II and μCLinuxμC/OS-II and μCLinux are both excellent embedded operating systems. They have good performance and open source code, however, one is real-time and the other is not. μC/OS-II is a real-time OS that is applicable to small control systems. It has high efficiency, a delicate structure, real-time operation, good scalability, etc. μCLinux is not real-time and it has the advantages of Linux. It is specifically designed for embedded processors: it has built-in network protocols and supports multiple file systems, and includes Linux advantages such as stability, powerful network capability, and an excellent file system.

Table 9 compares μC/OS-II and μCLinux in terms of real-time operation, task schedules, file system support, and system portability.

Our system has a strong real-time requirement and only needs a small control system, therefore, we used μC/OS-II, which is sufficient to perform the required tasks.

μC/OS-II IntroductionμC/OS-II is a priority-based, hard, preemptive real-time kernel, and has gained acceptance worldwide since its launch in 1992. The kernel is designed for embedded devices and has been ported to over 40 CPUs with different structures and 8- to 64-bit systems. The system has authenticated by the American FAA since version 2.51, and can run on very demanding systems such as aircrafts. μC/OS-II source code is available for free, which is, undoubtedly, the most economical choice for an embedded RTOS. Figure 46 shows the μC/OS-II system structure.

Table 9. Comparing μC/OS-II and μCLinux

Feature μC/OS-II μCLinuxReal-time operation Supports real-time operation. Does not support real-time operation.

Task schedules Has preemptive schedules. When a higher priority task occurs in the ready state, the schedule immediately suspends the current task’s operation (places it in the ready state) and assigns the CPU to the higher priority task.

Has non-preemptive schedules using a time slice service. The system initiates the tasks at some interval while generating quick and periodic interrupts to determine the function schedule (when the program can get its time slice).

File system support Has no support. Uses the romfs file management system.

System portability Very simple. Altera has ported μC/OS-II to the Nios II platform.

Comparatively complex and divided into structure level, platform level, and board-level portability.

161

Page 171: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 46. μC/OS-II System Structure

μC/OS-II TasksThe μC/OS-II kernel manages and schedules tasks. It can manage up to 64 tasks, including 8 system tasks and 56 user tasks. Each task is composed of a task control module, stack, and code. The task control module records the tasks’ stack pointer, current state, priority etc., connecting the task code and stack. To manage multi-tasking, μC/OS-II links all system task control modules together, forming two task chains that manage the task control module, which performs relevant operations. The task stack saves information in the CPU register or the task’s private data when the task switches or the response is interrupted. The task code implements user-defined functions, and it is an infinite cycling structure. Figure 47 shows the code for a task with an infinite cycling structure.

Figure 47. Compiled C Language Task

μC/OS-II has a preemptive task schedule, which keeps the highest priority task in a ready state and running all the time. μC/OS-II schedules the tasks according to the ready task list and the task scheduler schedules the detailed tasks. The task scheduler looks in the ready task list for the highest priority task in ready state and switches tasks. μC/OS-II has two types of task schedulers: a task-level scheduler and an interrupt-level scheduler. The task-level scheduler uses the OSSched() function and the interrupt-level scheduler uses the OSIntExt() function.

μC/OS-II InterruptsSimilar to an ordinary MCU, μC/OS-II response interrupts support interrupt nesting. The Nios II HAL interrupt is compatible with μC/OS-II, therefore, the Nios II HAL can run on μC/OS-II with a slight modification. Of particular note is that the μC/OS-II kernel is preemptive: after completing the interrupt service program, the system runs the highest priority task in the ready state, if necessary, instead of the interrupted task.

User's Application Program

Code Explaining Irrelevancebetween μC/OS-II & Processor

Code Explaining Relevance between

μC/OS-II & ApplicationProgram

OS_CFG.HINCLUDES.H

OS_CORE.COS_FLAG.COS_MBOX.COS_MEM.COS_MUTEX.C

OS_Q.COS_SEM.COS_TASK.COS_TIME.CUC/OS-II.CUS/OS-II.H

CPU Timer

Software

Hardware

μC/OS-II

Code Explaining Relevancebetween μC/OS-II & Processor

(Must Be Modified when Transplanted)

OS_CPU.HOS_CPU_A.ASMOS_CPU_C.C

162

Page 172: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

When compiling the μC/OS-II interrupt service, we use the following functions:

■ OSIntEnter() is the interrupt entering function and it usually occurs after the interrupt service program’s interrupted data but before the user’s interrupt service code operation. It adds a 1 to the OSIntNesting global variable to record the interrupt nesting layers.

■ OSIntExit() is the interrupt service out function. It subtracts a 1 from the OSIntNesting global variable and switches the ready-to-operate task if OSIntNesting is 0. If the scheduler is not locked up and the ready task list’s highest priority task is not the interrupted task, it returns to the interrupted service sub-program.

Synchronization and Communication of μC/OS-II TasksWhen performing multiple tasks, the OS must resolve two problems:

■ The shared resource can only be accessed by one task at one moment, that is, the tasks are mutually exclusive.

■ The tasks must be executed in order to perform task synchronization.

Solving these problems requires communication between the tasks. μC/OS-II achieves this communication by signal traffic, e-mail, and a message queue. Simple signal traffic communicates between the system tasks, which is what we describe here.

Signal traffic was invented in the mid 1960s. It controls access to the shared resource, presents events, and synchronizes two tasks. The μC/OS-II signal traffic operations include establishing, waiting, and delivering signal traffic, using the OSSemCreate(), OSSemPend(), and OSSemPost() functions, respectively. The initial values must be provided when the signal traffic is established. The initial value is one of three types:

■ 0—The signal traffic represents the occurrence of a event.

■ 1—The signal traffic controls access to shared resources.

■ n—The signal traffic stands for the number of resources available to be accessed.

When multiple tasks are waiting for the same signal traffic, μC/OS-II always gives it to the highest priority task.

μC/OS-II Task ArchitectureIn this system, the OS operates on the athlete side, and receives, processes, displays, and prints the matches results (7 tasks, 6 traffic signals, and 5 interrupts). Table 10 shows the task functions.

Table 10. Task Functions

Task Function Descriptioninitialize_task() Father task. Performs self-cancellation after initializing signal traffic and

establishing task functions.

process_key_data_task() Addresses differently according to different key values.

process_timer_task() Manages the date and time.

process_data_task() Processes the data received by wired or wireless cable. Gets the X, Y coordinates and circular number.

Wireless_receive_task() Initializes the wireless receiving module, opens a wireless receiving interrupt, and closes the wired receiving interrupt.

163

Page 173: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Table 11 shows the signal traffic functions.

Figure 48 shows the μC/OS-II OS flow chart.

lcd_show_task() Controls the LCD target surface, sets the target, and displays the circular number.

printer_task() Prints the match results.

speaker_task() Reports the target shooting via audio.

Table 11. Signal Traffic Functions

Signal Traffic Functionadd_second_sem Sent by the system clock service’s callback function my_alarm_callback() to

notify process_timer_task() of time processing.

get_key_value Sent by the key interrupt service sub-program button_isr(). Notifies process_key_data_task() that is should judge the key values and make processions.

printer_sem Sent by process_data_task(). Notifies printer_task() that it should print the match results when it obtains the circular number.

show_result_sem Sent by process_data_task(). Notifies lcd_show_task() that it should display the match results when it obtains the circular number.

get_raw_result_sem Sent by wired or wireless receiving interrupt. Notifies process_data_task() that it should process the X, Y coordinates and circular number upon receipt of the data.

start_wireless_sem Sent by the switch interrupt switch_pio_isr(). Notifies Wireless_receive_task() that it should open the wireless receiving interrupt.

speaker _sem Sent by process_data_task(). Notifies speaker _task() that it should broadcast the match results when it obtains the circular number.

Table 10. Task Functions

Task Function Description

164

Page 174: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Figure 48. μC/OS-II Procedure Flow Chart

System Testing and AnalysisThis section describes the tests we performed and discusses the results.

Target Identification TestTarget identification is the key technology of the system, and it directly affects the shooting reporting accuracy. In the actual report, the circular number refers to the shortest distance between the ring’s center and the target. Table 12 shows the test results.

We can place a light in front of the target. Our practice shots show that the system can identify the target even if the light is not very bright.

Table 12. Target Identification Test

Number of Times Circular NumberVisual Inspection System Test

1st time 4.2 to 4.4 4.2

2nd time 5.6 to 5.7 5.7

3rd time 6.5 to 6.7 6.6

4th time 7.5 to 7.8 7.6

5th time 8.6 to 8.8 8.6

6th time 9.5 to 9.7 9.6

7th time 10.4 to 10.6 10.5

Start

SystemInitialization

Establish Father TaskInitiate OS

Wait Interrupt

JudgeInterrupt

TimerInterrupt

Key PressInterrupt

Wired ReceiveInterrupt

SwitchInterrupt

Send Signal Trafficadd_second_sem

Send Signal Trafficget_key_value

Send Signal Trafficge_raw_result_sem

Send Signal Trafficstart_wireless_sem

Initiate Taskprocess_timer_task

Initiate Taskprocess_key_data_task

Initiate Taskprocess_data_task

Initiate Taskwireless_receive_task

165

Page 175: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Test results—When we compared the actual measurement and the system’s measurement, the system’s target identification accuracy is up to 0.1 ring, which is sufficient to meet the demands of ordinary competitions and meets the design goal.

Target Tracking TestTo improve the system’s automation and intelligence, we introduced a closed-loop feedback control mechanism to add target tracking ability. When the athlete finishes shooting and the system identifies the target position, the paper feeding mechanism rotates the target paper downward until the target disappears. Table 13 shows the test results.

Test results—The test showed that the system can track the target until it disappears, which meets the design goal.

Wireless Transmission TestTo improve the system’s flexibility for various conditions, we added a wireless transmission module. Because correct data transmission is crucial for the competition to be fair and just, the transmission distance and error code rate are strictly required. Table 14 shows the test results.

Test results—The test showed that the error code rate is next to zero, which satisfies the system demands and meets the design goal.

RS-485 TestTo ensure correct data transmission, the system supports RS-485 transmission. The data is sent from the image processing front-end via the DE2 development board’s standard RS-232 peripheral, through shielded twisted-pair cables, through the human-machine interaction module’s RS-232 peripheral (in the background), to the DE2 development board. Table 15 shows the test results.

Table 13. Target Tracking Test

Number of Times Target Position Tracking Goal Achieved?1st time About 7 rings Yes

2nd time About 8 rings Yes

3rd time About 9 rings Yes

4th time About 10 rings Yes

Table 14. Wireless Transmission Module Test

Number of Times Data Sent Data Received1st time 305 305

2nd time 535 535

3rd time 524 524

Table 15. RS-485 Bus Transmission Test

Number of Times Data Sent Data Received1st time s240240109t s240240109t

2nd time s235245950t s235245950t

3rd time s345198580t s345198580t

166

Page 176: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Test results—The test showed that the module correctly transmits the data, which satisfies the system demands and meets the design goal.

Printer TestThe printed competition scores are the proof that rates the athletes’ performance and serves as a data back-up. To ensure that the competition is fair and just, the printed scores must be accurate with no errors. Therefore, correct data delivery, transmission, and printing is vitally important. The data is sent via the DE2 board’s standard serial port and is transmitted to a micro-printer using a serial cable. Figure 49 shows the printer and cable.

Figure 49. Printer and Cable

Table 16 shows the test results.

Test results—Comparing the actual printed data to the sent data, the printer can correctly print the competition scores, which meets the design goal.

LCD Display TestThe display is an important tool for human-machine interaction. It presents the system and competition information to the users and improves the system’s visual value. The system uses three PIOs to control the LCD, which can simultaneously display the current and past scores as well as the target position. Figure 50 shows the LCD.

Table 16. Printer Test

Number of Times Data Sent Data Printed1st time 1 1

2nd time 2 2

3rd time a a

4th time b b

167

Page 177: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 50. LCD

Table 17 shows the test results.

Test results—The test shows that the LCD module can display the current and past scores as well as the target position, which meets the design goal.

Design FeaturesOur design has the following features.

■ With the approach of the 2008 Beijing Olympic Games, we combined advanced FPGA technology and an embedded SOPC with shooting to built a cheap, applied, intelligent, highly integrated auto riflery scoring system.

■ We convert the machine vision and template matching into practice and perform bullet hole positioning and recognition.

■ The system combines software and hardware to build the design in one FPGA.

■ Implementing image processing in hardware greatly improves the response speed and is important for image processing in ASICs.

■ The system applies software radio technology, adds the wireless transmission function, and enhances system flexibility.

Table 17. LCD Display Test List

Number of Times Data Sent Data Displayed1st time X 240 X 240

Y 240 Y 240

Circular number 10.9 Circular number 10.9

2nd time X 190 X 190

Y 230 Y 230

Circular number 9.3 Circular number 9.3

3rd time X 180 X 180

Y 198 Y 198

Circular number 7.1 Circular number 7.1

168

Page 178: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

■ When recognizing the bullet hole, the system introduces machine vision: using a light source design that adopts a diffused reflection in poor light to increase the hole’s brightness and adapt the light in front of the target.

■ The system pays attention to human interaction, and uses an LCD and speaker to deliver all-in-one users information.

■ Using an auto-control theory, the system introduces closed-loop feedback to adjust the system intelligently and save the maximum resources without increasing the system cost.

■ The system fully uses the Nios II processor advantages. The resources used can perform the system operation without any other controller.

■ Each part of the system is designed modularly, providing abundant peripheral interfaces for extending the system. The system provides excellent migration and utility; besides the scoring system, the application can be applied to other areas, such as image processing and software radio technology.

ThoughtsWe were very honored to participate in the Nios II contest. Not only did we learn new technology, we learned cooperation and persistance. We enthusiastically chose the automatic scoring system as our topic because we had worked on one before. In the past, however, we used a laser diode, which cannot improve precision. After participating in the contest, we decided to use audio processing. In the beginning, we knew nothing about audio processing and searched for information on the Internet and in the IEEE archives with keywords such as FPGA and IMAGE. Fortunately, we found a lot of materials and experience from others. Our algorithm originated from median filtering with an FPGA.

Of course, we did have some difficulties, e.g., the DE2 development board SDRAM was used as the camera module buffer, so our Nios II program could not run on the SDRAM. But, the FPGA’s RAM was too small, and we could only run the program on SRAM. This problem caused us difficulty for quite a long time, because the SRAM could access data in 16-bit format, but the operation program was not stable. We eventually found that if we changed the SRAM into 8-bit data format we could ensure stable program operation.

In the process of defining the custom instruction, we realized that if we did not include a custom instruction when we first built the CPU, we needed to reconstruct the CPU when we added the instruction later. We successfully migrated μClinux to the Nios II processor. But with the network and USB functions, and because we did not have to migrate a lot of programs or use network functions for our system, μC/OS-II met all of our requirements and we used it instead.

We noticed that when we combined the FPGA with the Nios II processor, the strength of the Nios II processor could be fully exerted. We could combine all of the Verilog HDL digital modules with the Nios II processor. We could select the best way to implement the whole system.

We want to thank the teachers and students who provided great support and encouragement. We want to thank Altera for the opportunity and Terasic Technologies for their support during the contest—they provided us with excellent Verilog HDL code examples that we used to learn how to create the Verilog HDL project. For example, all of the development board pinouts are defined in the top-level module. The pins are all assigned, and the following development work does not needs to worry about whether the pins have been assigned correctly, makeing development more convenient. Many of our problems had nothing to do with the technology, but they did affect the project process.

Thanks to Mr. Huang Weifeng’s (of Huazhong University of Science and Technology) image processing speech and Mr. Chen Jia’s (an excellent analog IC designer) great support. Thanks to Mr. Jin Qiang’s (from the department of mechanical engineering of Huazhong University of Science and Technology) guidance on machine design and the Huazhong University of Science and Technology’s machine factory environment support. Thanks to the great support from director Huang and Wang, the

169

Page 179: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Hubei Sports Shooting Training Center coach. Thanks to the technical guidance from KZW (you must be familiar with the name if you have visited the Nios II forum). We also want to express our appreciation to the Yin and Xiao teachers from the Electrical and Electronic Innovation Center of Huazhong University of Science and Technology. We could not complete our work without the support of so many friends. This kind of work is not only a technological achievement, but it is also a collaboration many people’s efforts.

We expended a lot of effort in our work, but we cannot say it is perfect and we plan to make improvements in the future.

ConclusionOur design implements an automated scoring system for shooting sports, which provides an effective solution for target identification, data transfer, result display, and automated target sheet replacement. The design is based on template matching, closed-loop control, software-defined radio, and other leading concepts, and successfully combines with embedded SOPC technology, image processing, wireless communication, and automatic control technology. The system is adaptable and agile, and can be used in indoor/outdoor fixed target contests and training as well as field and military training. It also improves shooting training levels and automates the shooting contest management.

It took three months to complete the design and collaboration strongly with each other during the development. We learned the difference between the Nios II embedded processor and general embedded processors, and the distinctive features of embedded SOPC technology. We focused on our project unceasingly during the three months, which helped us to learn and improve.

AcknowledgementsThanks to the contest organizers, the Hubei Undergraduate Electronic Design Contest committee, Wuhan Institute of Technology, South Central University, and Altera, who gave us the opportunity to learn and use new technology, widen our view, and encourage our innovative spirit. We want to express our appreciation.

Additionally, we want to thank Huazhong University of Science and Technology for the opportunity it gave us to participate in the contest and their support during the contest.

Thanks to Yin Shi and Xiao Kan, the Electrical and Electronic Innovation Center teachers, and Chen Yongquan for their guidance and help. We also want to thank Chen Jia from the Department of Electronic Engineering of Huazhong University of Science and Technology for his guidance on the image processing module. We could not have made the design without their guidance and encouragement. We want to express our thanks and appreciation to all of them.

ReferencesBeeckler, John Sachs, Warren J. Gross. FPGA Particle Graphics Hardware. Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2005.

Chao, Zhang and Li Fanming. The Application of an Improved Template Matching Mode on Target Identification. Shanghai: Infrared, no. 12.

Chenglian, Peng, et al. Challenge SOC - Nios II-based SOPC Design and Practice. Beijing: Tsinghua University Press, Jul., 2004.

Fenglin, Fu. Collection of Nios II Softcore Embedded Processing. Xi’an Xidian University Press, 2005.

Fuchang, Wang, Xiong Zhaofei, and Huang Benxiong. Communication Theory. Beijing: Tsinghua University Press, 2006.

170

Page 180: Nios II Embedded Processor Design Contest - Outstanding - Altera

Automatic Scoring System

Gonalez. Digital Image Processing. Beijing: Electronic Industry Press, 2003.

Inoue, Seiki. C Language Practical Image Processing. Beijing: Science Press, 2003.

Jianbo, Zhang, Lu Chaoyang, Gao Xiquan, and Ding Yumei. A Method to Improve the Shooting Automatic Scoring System Accuracy. Xi’an: Journal of Xidian University (Natural Science Edition), Jun., 2002.

Jihua, Wu and Wang Cheng. Altera FPGA/CPLD Design (Advanced). Beijing: People’s Post and Telecommunication Press, Jul., 2005.

Labrosse, J. J., et al. Translated by Shao Beibei, etc. μC/OS-II: Source Code Opened Real Time Embedded OS. Beijing: China Electric Power Press, 2001.

Ligong, Zhou, et al. Nios II SOPC Embedded System Development Guide. Guangzhou: Guangzhou Zhouligong SCM Development Co., Ltd, 2005.

Lili, Chen, Jia Yunhe, Zhao Kongxin, and Qian Feng. The Designing and Realizing of Portable Wireless Automatic Scoring System. Beijing: Science Technology and Engineering, May, 2005.

National Undergraduate Electronic Design Contest Committee. Collection of Work of National Undergraduate Electronic Design Contest Winners (2001). Beijing: Beijing Institute of Technology Press, 2003.

Peng, Song, Jiang Jie, and Zhang Guangjun. Application of FPGA Apparatus with Embedded ARM Core on Image Detection. Nanjing: Industrial Control Computer, vol. 18, no. 5, 2005.

Shujun, Guo, Wang Yuhua, and Ge Renqiu. Embedded Processor Theory and Application - Nios II System Design and C Language Programming. Beijing: Tsinghua University Press, Jun., 2004.

Song, Pan, Huang Jiye, and Zeng Yu. Practical Course of SOPC Technology. Beijing: Tsinghua University Press, 2005.

Suwen, Zhang. High Frequency Electronic Circuits. 4th ed. Higher Education Press, 2004.

Wei, Zhang and Gao Hang. The Designing and Realizing of Image Processing Technology-based Automatic Scoring System. Nanjing: Journal of Nanjing University of Aeronautics and Astronautics, no. 12, 2000.

Xiren, Xie. Computer Network. 4th ed. Beijing: Electronic Industry Press, 2003.

Yamaoka, K., T. Morimoto, H. Adachi, T. Koide, and H. J. Mattausch. Image Segmentation and Pattern Matching Based FPGA/ASIC Implementation Architecture of Real-Time Object Tracking. IEEE, 2006.

Zhe, Ren. The Principal and Application of Embedded Real Time OS μC/OS-II. Beijing: Beijng University of Aeronautics and Astronautics Press, 2005.

Zimei, Xie. Electronic Circuit Design, Experiment & Measurement. 2nd ed. Wuhan: Huazhong University of Science and Technology Press, 2000.

Altera Nios II documentation and forums.

Appendix: CodeRefer to the PDF of this paper on the Altera web site at http://www.altera.com for code that was created for this project.

171

Page 181: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

172

Page 182: Nios II Embedded Processor Design Contest - Outstanding - Altera

Multimedia Decoder Using the Nios II Processor

Third Prize

Multimedia Decoder Using the Nios II Processor

Institution: Indian Institute of Science

Participants: Mythri Alle, Naresh K. V., Svatantra Singh

Instructor: S. K. Nandy

Design Introduction Our design target was to build a low-cost, high-performance H.264 decoder with a prototype H.264 decoder created using multiple small FPGAs. H.264 is a computationally complex, advanced video standard for achieving high compression ratios. To cater to the needs of high throughput applications such as HDTV, which requires 216,000 macroblocks/second of throughput, designers usually advocate a hardware implementation. Our proposed design is targeted at consumer electronics such as digital cameras or video players. Specifically:

■ We created a novel display mechanism that is driver independent. The display output is in luminance-bandwidth-chrominance (YUV) format, which allows us to use any player that plays YUV data for any frame size.

■ The design meets high throughput requirements, even when operating at 50 MHz. This speed is possible because we designed the application to use as few as 250 clock cycles to decode one macroblock.

■ Operating at low frequencies enabled us to use multiple small, low-cost FPGAs to implement the decoder.

■ The design is partitioned into multiple modules. The modules form different stages of the decoder pipeline.

173

Page 183: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ The proposed decoder implementation outputs the decoded frame in YUV format. We use a software player that can play streaming YUV data to display the video sequence.

■ Our design is intended to meet the performance requirements of the main profile of H.264. Because we do not have an implementation of B-frame prediction, we are limited to a baseline profile.

Figure 1 shows the implementation setup.

Figure 1. Implementation Setup

High-Level Implementation OverviewThe high-level implementation involved the following items:

■ The H.264 decoder is added as custom user logic to the Nios® II processor.

■ The user logic and Nios II processor communicate using the Avalon® bus.

■ The user logic is an Avalon slave.

■ The Avalon bus width is configurable and is set to 32 bits.

■ To meet real-time video transfer from the FPGA, high-speed communication is essential. Therefore, we used the Avalon burst mode for the data transfer.

174

Page 184: Nios II Embedded Processor Design Contest - Outstanding - Altera

Multimedia Decoder Using the Nios II Processor

■ For good-quality video, we used Ethernet cables to provide high-speed communication with external devices.

■ We implemented Reference Picture Management, a non-compute-intensive part of the H.264 decoder, in software.

Why Use the Nios II Processor?We used the Nios II processor for the following reasons:

■ It was easy to load μClinux into the Nios II processor, letting us use many devices.

■ It provided a high-speed communication channel using sockets to transfer real-time video.

■ It was easy to combine FPGAs using an Ethernet cable and socket programming.

■ The Avalon bus burst mode allowed fast transfer of data to custom logic.

■ It efficiently integrated the hardware and software portions of the application.

■ We could run parallel processes with μClinux using the clone system call.

■ Because the peripherals are memory mapped, it was very easy to perform input/output for the custom peripherals using simple C programs.

Function DescriptionH.264 is a video coder/decoder (CODEC) standard defined jointly by Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG). It is aimed at achieving high-resolution video at low bit rates.

In H.264, video is compressed by exploiting similarity in video frames. The current frame is predicted from previously encoded frame(s). The frame can be predicted from other parts of the same frame by exploiting spatial similarity (intra), or it can be predicted from a different frame by exploiting temporal similarity (inter). Residual data is the difference between the actual frame and the predicted frame. This residual data is further quantized and transformed to compress it efficiently. A loop deblocking filter removes the blocking artifacts. The user selects the parameters and the residual data is encoded using different entropy encoders and is sent as an encoded bitstream. All of the units are computation-intensive. The following sections describe each module and its implementation.

High-Level Decoder DesignWe implemented a macroblock-level pipeline to achieve high throughput. The encoded bitstream contains data per macroblock. Therefore, the macroblock-level pipeline provides an elegant interface between the parser and the rest of the decoder. Figure 2 shows the parser’s high-level block diagram.

The parser processes the incoming bitstream. The motion compensation and reconstruction (MCR) module takes the prediction modes and its parameters as the input from the bitstream parser.

The residual data obtained after inverse discrete cosine transform (IDCT) serves as input to the MCR module. Residual data is added to the predicted frame and is then computed by the MCR module before it passes the data to the deblocking filter.

The communication between the IDCT and MCR modules is through a FIFO buffer. When the FIFO buffer is full, the IDCT stalls; when it is empty, the MCR stalls, ensuring synchronization. The parser and IDCT modules share two buffers in a mutually exclusive manner. The buffer access is switched for every macroblock. The MCR and deblocking module also synchronize through a FIFO buffer.

175

Page 185: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 2. Parser High-Level Block Diagram

Module DescriptionsThe following sections provide a short summary of each module.

ParserThe decoding first step is to parse the encoded bitstream. In the baseline profile, H.264 uses context-adaptive variable-lenght coding (CAVLC) and exponential-Golomb (Exp-Golomb) entropy encoders to encode various syntax elements. Both encoding schemes are variable length codes and the length of the encoding element is not in multiples of bytes. Using variable length codes makes the bitstream processing inherently sequential. To achieve high throughput, the entropy decoders must be efficiently implemented.

The parser buffers the incoming bitstream and provides a byte-aligned bitstream for different computing elements. The number of bits required to decode a syntax element is not a multiple of eight, which means data can be read from any position.

To avoid the overhead of a multiplexer, we use two-level buffering. The first-level buffer is circular and caches the incoming bitstream. Data is read and written in multiples of bytes. We use a read/write pointer to keep track of the data. We ensure that the buffer is never full by making the processing rate greater than maximum incoming bit rate. Therefore, we do not have to maintain additional data to differentiate between full and empty conditions. A small second-stage buffer reads data from the first-stage buffer. The second stage serves data for processing. The data is also byte-aligned before it goes to various computing elements.

The entropy encoders are implemented very efficiently and take less area and clock cycles compared to the best implementation described in the literature we have read.

IDCTThis module performs an inverse integer discrete cosine transform. Unlike most other standards, IDCT is performed for each 4 x 4 block, which makes it a computationally intensive module.

We implement this module using a five-stage pipeline. We used a novel approach to split it into pipeline stages. Four pixels are inverse-transformed and de-quantized for every clock cycle. This module takes residual data as input from the parser.

BitstreamParser

BufferB

BufferA Inter/Intra

RAM128 KBits

Register (128 * 2 + 104)

ReferenceFrame RAM

+

IDCTBlock

FIFOBuffer

FIFOBuffer

DeblockingFilter

MCR

176

Page 186: Nios II Embedded Processor Design Contest - Outstanding - Altera

Multimedia Decoder Using the Nios II Processor

MCRThis module predicts the frame using the previously decoded frames, depending on the parameters from the parser. This module contains two kinds of prediction, inter and intra prediction.

Inter PredictionH.264 allows the capture of very fine motion at quarter-pixel resolution using a six-tap finite impulse response (FIR) filter to interpolate accurate sub-pixel values. H.264 also let the designer choose different block sizes for specifying the motion. The block size can be 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, and 4 x 4. The encoded bitstream contains the reference frame number, motion vector, and size of each block.

Inter prediction is both compute- and memory-intensive. A six-tap FIR filter interpolates the reference picture pixels to obtain the predicted frame. H.264 also allows quarter-pixel granularity, which requires computation of the half pixel first and the quarter pixel second, which requires more memory.

Inter prediction has three units: a fetch unit, a compute unit, and a cache. These units function in parallel. In our design, we assigned a maximum of nine clock cycles for the fetch unit and four clock cycles for the compute unit. A cache buffers the data fetched from the reference frame to reduce the memory bandwidth required. The fetch unit takes the reference frame and the motion vector as inputs, and computes macroblock addresses required for the prediction of a 4 x 4 block. It fetches the blocks that are not available in the cache. A block is fetched only if the cache entry is free. We do not perform out-of-order fetching because it complicates the control logic. The compute unit performs the interpolation using a six-tap FIR filter. This unit contains a nine-stage pipeline. Nine stages are required because of the quarter-pixel interpolation. The first four pixels perform half-pixel interpolation. The remaining stages perform quarter-pixel interpolation.

Intra PredictionIntra prediction is more control-intensive. Depending on the prediction type, different computations are performed on the data.

Intra prediction has three phases. The first phase is the initialization phase, where the data required for the intra prediction is obtained. The second phase, called the compute phase, uses the data fetched by the first phase and performs the computation. In the final phase, data is passed on to a deblocking filter. Intra prediction retains the required pixels that are not passed through the deblocking filter for prediction of other blocks.

Deblocking FilterThe deblocking filter removes blocking artifacts, which improves the subjective quality of the picture. This module is the most computationally expensive block, which affects the entire design’s processing rate. This block also requires a lot of memory.

We implemented this block using an eight-stage pipeline. The pipeline processes luma and chroma components. A read/write counter controls the pipeline efficiently. We used a novel transpose buffer, which allows us to start vertical processing immediately after horizontal processing. In the existing implementations, vertical processing begins only after the entire pipeline is flushed.

Performance ParametersThe performance targets for our design were:

■ Use high communication bandwidth for real-time video display. We needed 100 megabits per second (Mbps) to display high quality video in real time.

■ Implement the design using the smallest possible area to provide low-cost solutions.

177

Page 187: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ Provide high throughput, even when operating at low frequencies. This requirement meant we needed to use fewer clock cycles.

■ Operate at low frequencies so that the design works with low-cost boards.

■ Partition the functionality efficiently so that communication between these partitions is reduced. This method enables the use of multiple small FPGAs.

We achieved the following performance:

■ The design operates at 50 MHz and meets the throughput requirements needed for large frames.

■ The modules use the following numbers of logic elements (LEs) on a Stratix® II FPGA:

● Parser—40,000 LEs

● IDCT—2,000 LEs

● MCR—59,000 LEs

● Deblocking—17,000 LEs

■ A maximum of 250 clock cycles is required to decode one macroblock.

■ We can separate our design into various partitions. The communication between the partitions is limited and therefore it does not add overhead to the partitioning,

■ Communication is through an Ethernet cable, which means the design can achieve 100 Mbps.

Design Architecture

H.264 baseline decoder is implemented in hardware. The decoder design is described earlier in this paper. The communication to and from the hardware is performed through the Nios II processor. Figure 3 shows a block diagram of the architecture.

Figure 3. Architecture Block Diagram

μClinux is loaded onto the Nios II processor, which allows us to use sockets for communication. Using sockets, the design performs high-speed data transfer that is sufficient for real-time video. The input is the encoded bitstream and the output is a decoded stream in YUV format. The output bit rate is much higher than the input bit rate. We use two sockets for communication, one for the input stream and one for the output stream.

H.264Decoder

ExternalSystem

SOPC

Nios II Processor

Linux OS

Socket

178

Page 188: Nios II Embedded Processor Design Contest - Outstanding - Altera

Multimedia Decoder Using the Nios II Processor

The data exchange between the Nios II processor and the hardware is through the Avalon bus. We programmed our decoder as an Avalon slave. The bus is 32 bits wide, therefore reads and writes are 32 bits wide. We used a C application to read and write data from the hardware. We write three bytes of data and one byte of control information, which is used for synchronization. During reads, the last byte indicates control information.

SoftwareThree parallel processes run in software.

■ One process reads the input stream from the socket and stores it in memory.

■ The second process reads from memory and provides data as the input to the custom logic. When the output of the custom logic is ready, it reads the output and writes it into a buffer. We did not make reading and writing two independent processes, because we know the approximate rate of input and output. For every two outputs from custom logic, we perform one write to the input of the custom logic. Because the peripherals are memory mapped, it is very easy to provide input and output to the custom logic through a C program.

■ The third process reads data from the output buffer and writes to the socket.

Figure 4 shows the flow.

Figure 4. Software Process Flow

Design DescriptionThis section describes the design.

Custom Logic ImplementationThe H.264 decoder is added as custom logic to the Nios II processor. The decoder and Nios II processor communicate using the Avalon bus. The bus width is 32 bits. The Avalon bus requires a particular handshaking protocol for communication. We used following signals for the handshaking protocol.

■ Clk—Clock input

■ CS—Chip select

■ Reset—System-wide reset

■ Wr—Write signal, specifies input is ready

■ Din—Data input, 32 bits

■ Rd—Read signal

■ Dout—Data output, 32 bits

Process BProcess A Process C

Custom LogicIP Buffer OP Buffer

From To

179

Page 189: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The custom peripheral is an Avalon slave.

Synchronization of I/OThe input and output to the custom logic is 32 bits wide. 3 bytes contain data and 1 byte contains control information, which is used for synchronization. The least significant bit of the control byte indicates if the data is valid for both input and output. One more bit is used as toggle bit, which is toggled on every read (write) for input (output). This bit ensures that we are not reading old data. The input data is the encoded data and the output data is the decoded video stream. Therefore, the output rate is much higher than the input rate. This I/O is performed by a C program running on the Nios II processor.

Burst ModeTo meet our real-time video data processing requirement, we used the Avalon bus burst mode to achieve high bandwidth. Burst mode transfers are only supported in the Nios II processor version 6.0 and higher.

Combining FPGAsThe H.264 decoder is complex hardware that requires multiple small FPGAs for implementation. Because we have only one Development and Education (DE2) board, we could only implement and send a demo for the deblocking filter. The process for implementing the design in multiple FPGAs is described below:

■ We separated the H.264 decoder into multiple partitions such that the communication bandwidth required between the partitions is limited.

■ The partitions communicate using the Ethernet.

■ The modules operate in a pipeline.

■ Each pipeline module can be implemented in one FPGA.

■ Because they operate in a pipeline, the communication between each module is well defined.

■ We include a mechanism to stall the pipeline when data is not available.

These features provide a very economical mechanism to synthesize an H.264 decoder that uses fewer than 80,000 LEs. An 80,000-LE FPGA costs approximately $8,000 USD. With partitioning, the decoder can be implemented using four DE2 boards, which costs about $1,200 USD, a gain of 85%. This cost savings is a very important feature of the design and has huge business impact.

Parallel Processes & SynchronizationThe input/output to the external world is handled effectively using μClinux loaded in the Nios II processor. Three processes can be run on μClinux using the clone system call. Because the processes share memory, a synchronization mechanism should be used to avoid race conditions.

The buffer shared between the processes has two parts. When a process reads from one partition, the other part writes into another partition. The writing partition writes all 0s in the first four bytes of the partition before writing data to the partition. The reading process writes all 1s in the first four bytes after it finishes reading a partition. The writing process waits for all 1s before it writes data, which ensures that it is not overwriting the data before the process reads the data. The reading process waits for all 0s before reading data to ensure that it is not reading the stale data. We tested this mechanism for various video sequences and it works correctly and does not consume much overhead. Using high-level synchronization mechanisms such as semaphores would have added a lot of overhead.

180

Page 190: Nios II Embedded Processor Design Contest - Outstanding - Altera

Multimedia Decoder Using the Nios II Processor

Design FeaturesThis section describes the design features.

Hardware/Software PartitioningIn our design, the external interaction is performed via software and the actual processing is performed in hardware. The system-on-a-programmable-chip (SOPC) methodology allows us to execute this process easily. The software part is executed on μClinux loaded in the Nios II processor. The processing unit is implemented in Verilog HDL and is added as a custom peripheral to the Nios II processor. SOPC Builder is very efficient and allows the user to add logic easily: we can add only the required peripherals, limiting the LE overhead required.

High-Speed CommunicationIt is essential to have high-speed communication to provide the user with a high-quality real-time video experience. Using the μClinux Ethernet driver makes it possible to provide a communication medium of 100 Mbps. Loading μClinux onto the Nios II processor let us use a rich set of drivers available with μClinux. With the Ethernet driver, we can use socket programming. Therefore, communication with the external system is very easy and the board can also be accessed remotely.

Fast Data I/O to the PeripheralIt is essential to have fast I/O to the peripheral to achieve high throughput. Although the processing rate is high if the I/O to the peripheral is not fast, it forms a bottleneck. The Avalon bus burst mode allows us to meet the required rate.

Combining FPGAsThe decoder is complex hardware and requires as many as 100,000 LEs on a Cyclone® FPGA. It is very economical if we can combine multiple FPGAs to implement the entire design. We proposed a novel methodology (described in an earlier section). The Ethernet driver allows us to implement our methodology.

Memory-Mapped I/OSOPC Builder uses memory-mapped I/O. The peripherals can be addresses as we address memory. The base address of a peripheral is available from the SOPC Builder user interface. The data can be read and written from the input and the output ports of the peripheral, as we would do with any memory location. It is very easy to access the peripheral I/O using a C application running on μClinux.

Device-Independent DisplayWe used a very flexible, driver-independant display mechanism. We send the decoded data stream as output; the decoded data stream is in YUV format. This format is a great advantage because any software player that can play a YUV file can be used to play the data stream. There are many streaming YUV players available, making it very easy to show a real-time demo. The YUV player takes the frame size as input and can play any frame size. We can demo any frame size without any change in the setup, unlike other formats. For example, for VGA, different data drivers must be used for different frame sizes. This option is not available in any of the existing setups we researched. The VGA has a single driver and can display only one frame size. In our proposed design we can display any frame size.

181

Page 191: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

ConclusionWe drew the following conclusions from this project:

■ SOPC Builder

● SOPC Builder is an ideal platform to build applications that have both software and hardware parts.

● SOPC Builder provides three variants of the Nios II processor.

● The user can customize the processor’s peripherals.

● It provides full control of the base address and the interrupt controller (IRQ) numbers, if desired.

● The auto assign mode assigns the addresses automatically, which should be sufficient for most uses. The SOPC Builder flow is user-friendly and is very easy to use.

● Custom logic is very easy to add. The user must follow the proper handshaking protocol, as specified in the documentation. If proper signals are not present, it is not possible to add the logic.

■ Nios II Integrated Development Environment (IDE)

● It allows the user to build μClinux, the file system, and other C applications.

● It is very easy to use.

● It is slow compared to the Linux tool flow.

● For the DE2 board, using the Nios II IDE to build Linux is very difficult. It is very easy to build it in Linux.

■ Nios II Forum

● The Nios II forum is very helpful for Nios II and μClinux related issues.

● We received very prompt responses to our postings.

● The forum are already has many postings, and we can often find a solution to our problem by searching through the postings.

182

Page 192: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

Third Prize

Omnidirectional Mobile Home Care Robot

Institution: Department of Electrical Engineering, National Chung-Hsing University

Participants: Hsu-Chih Huang, Chia-Ming Chen, and Tung-Sheng Wang

Instructor: Professer Ching-Chih Tsai

Design IntroductionIn modern manufacturing, robot arms are used in automated production, replacing humans to avoid risk, increase efficiency, and improve accuracy. Fixed robot arm technology is very well-developed worldwide. In our project, we designed a smart robot to expand the applications for which robot arms are traditionally used. The robot arm is installed on an omnidirectional mobile vehicle; a network camera is installed on the front of the arm.

Our project, the omnidirectional mobile home-care robot, provides home care for the disabled and improves their quality of life. Because of physical disabilities, a disabled person cannot move and reach for a desired item as a non-disabled person would. The smart robot can perform these tasks with the help of its network camera, the platform, and the robot arm. With our system, the user grabs the article’s image in the camera and the system directs the platform and arm to pick up the object. The robot helps the disabled person pick up objects and perform household tasks.

Figure 1 shows the omnidirectional mobile home care robot.

183

Page 193: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. Omnidirectional Mobile Home Care Robot

The robot arm and the omnidirectional mobile vehicle system uses two Altera® Nios® II test boards. The Stratix® and Stratix II devices control the system, motor position, and speed. A PC performs image processing. We used a Nios II embedded processor to implement the system design. The Nios II processor has a consistent instruction set and programmable module, and can cooperate with assorted peripheral devices to transfer information, enabling joint use of the client’s instruction and hardware acceleration unit. The Nios II processor provided excellent performance.

Unlike ordinary chips, the Nios II processor’s peripheral I/O provides an I/O interface Ethernet port for communication. We implemented communication between the Nios II processor and the PC via the TCP interface, greatly improving integration of the robot’s subsystems.

Programmable logic devices (PLDs) enhance the Nios II processor’s capabilities. The robot arm has five servomotors and the platform has three. The hardware design control circuit architecture is so complex that ordinary digital signal processing (DSP) chips cannot meet the system’s requirements. We used the Nios II processor and system-on-a-programmable-chip (SOPC) design techniques to create the smart robot hardware circuit. With the Nios II processor’s calculation capability and the Quartus® II development software, we could implement the overall system using the Altera Stratix II and Stratix development boards.

184

Page 194: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

Function DescriptionThe omnidirectional mobile home care robot uses two Altera development boards; one controls the arm and the other controls the platform. The PC processes the image and calculates the object’s position. Figure 2 shows the overall system architecture.

Figure 2. System Architecture

A PC controls the image processing and genetic algorithm and the Nios II processor (running on two development boards) controls the robot arm and the platform. The Nios II processor performs all control algorithms. Our test results show that the Nios II processor can act as the servo controller, while combining the hardware circuit and the software control program code in an FPGA accelerates research and development (R&D). Altera’s Nios II processor performs important tasks and the development boards help us perform R&D.

The development board and PC designs are described as follows.

■ PC-based image processing core—The user uses the webcam network camera to select image information, which is delivered to the PC for coordinate processing. Figure 3 shows the processing flow. We used Borland’s C++ Builder to create the image processing design, and used the video for Windows (VFW) software development kit (SDK) to implement Gray-level processing, edge detection, and binary imaging. The edge detection design uses the Sobel algorithm and the binary imaging boundary values are determined using the Otsu algorithm. When image processing finishes, the system determines the object’s three-dimensional (3-D) coordinates compared to the robot’s. Then, the system substitutes the coordinates in the genetic algorithm to obtain the rotation angle of the robot arm’s five-axis motor as well as the platform’s best rotating X and Y coordinates. The system transmits the control signal to the hub via TCP and

Webcam: Logitach QuickCam Pro 4000 (Advanced Version). Function: Captures the Video Image.

Robot Arm: Movemaster EX RV-M1 from Mitsubishihas 5 Vibration Freedoms (Excluding the RevolvingClamp) wth a Structure Similar to Human Arms.Function: Grabs the Designated Object.

7-Inch LCDFunction: Displays the System Status & FaciliatesOperation with the Human-Computer Interface.

Switch: Includes the Master, Arm & Bottom Switches.Function: Facilitates the Sub-Tests & Lowers Battery Power Consumption.

Power Supply Converter: Converts DC 24 V to AC 110 V.Function: Supplies Power for the Display, Bare System &Nios II Test Board.

Bare System: Connects the On-Line Cameras.Function: Processes Images & Computes the Motor Axis'sRotation Angle.

Two Altera Nios II Test Boards: Stratix & Stratix II Boards.Function: Responsible for the Omnidirectional MobilePlatform & the Robot Arm's Servo Control.

12-V DC Output Battery: The Two Batteries Combine toProduce a 24-V DC Voltage..Function: Supplies Power for the General System.

All-Directional Wheel: The System has Three Wheels.Function: Enables the Robot to Move Quickly in AnyDirection.

185

Page 195: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Winsockets and delivers the rotation angle and the XY coordinates to the development boards, which control the speed and position of the robot arm and platform.

Figure 3. Image Processing Procedure

■ Stratix II development board—The board receives the five-axis motor’s rotation angle from the PC. It uses the built-in lightweight IP (LWIP) interface to connect to the PC, reducing the time spent on R&D. After the system determines the rotation angle for each axis of the robot arm, the board smoothly moves the arm to its destination. The Nios II CPU calculates the trajectory. A photo encoder installed on the arm motor gives the motor position (we wrote this circuit in VHDL). The Nios II CPU controls the motor position and speed with a PI controller. For the robot arm, we use a double PI control system to achieve the best performance. Finally, we used pulse width modulation (PWM) to output the control command to the motor driver. We wrote the PWM circuit in VHDL, which is implemented in the FPGA. The arm motor control circuit hardware and the software are integrated onto the Stratix II development board. This imeplementation greatly reduces the circuit size, provides design flexibility, and gives excellent performance.

■ Stratix development board—The board receives the XY coordinates from the PC. The Nios II CPU, implemented on the board, controls the robot arm and the platform’s point-to-point motion. The programmable I/O (PIO) outputs the digital control signal to the digital-to-analog converter (DAC), which converts it into an analog signal. The DAC outputs the analog signal to drive the platform’s motor. The Nios II CPU integrates the circuit with the FPGA to control the platform.

Performance ParametersOur system performs as described below.

■ Image processing and genetic algorithm—During image processing, the system determines the object’s position. The genetic algorithm determines how to control the arm and the platform compared to the object. We created the design using the Borland C++ Builder development interface. The system displays the results on screen, which makes the interface work more smoothly. In the future, we plan to implement these functions in the Nios II processor.

■ Communication between the PC and Nios II processor—Our design uses one PC and two Altera development boards. Therefore, communication between the PC and the Nios II processors is very important. The LWIP interface, embedded in the Nios II processor, is the TCP interface for the system. On the PC side, we used a socket application programming interface (API) to write the TCP communication program, which achieves stable and fast information transfers. Based on our

ImageData

Gray LevelProcessing

EdgeDetection

BinaryImage

CircleSearching

Circle Center

& Radius

Image Processing

186

Page 196: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

test results, using the Nios II processor helped us to achieve our design goal for this part of the design.

■ Robot arm subsystem control—The subsystem design integrates hardware and software in the Altera Stratix II development board. The design performs as follows:

● Trapezoidal route planning—The built-in LWIP interface effectively receives control commands from the PC. The Nios II CPU calculates the trapezoidal route planning to facilitate smooth robot arm movement. Our tests show that the Nios II CPU has sufficient performance to ensure smooth arm movements.

● Five-axis servo motor control—After performing route planning, the Nios II CPU propels the motors on each axis to rotate the arm. The Nios II CPU calculates the PI motor position and speed control. The Nios II processor is useful for this part of the design.

● PWM output, servo motor drive—The development board’s PIO drives the arm, which grasps the object. This test was successful.

■ Subsystem platform control—The subsystem design integrates hardware and software on the Altera Stratix development board. The design performs as follows:

● Point-to-point algorithm—The built-in LWIP interface receives the platform control command from the PC. Then Nios II CPU calculates the point-to-point algorithm, so that the platform can move to the object’s correct position. Our test showed that Nios II calculation helps the robot make a successful point-to-point motion to reach the object.

● Three-axis servo motor control—The platform is controlled by three servo motors that use a PI control rule. According to our tests, the Nios II processor does well for both position and speed control.

● DAC—The Stratix development board PIO outputs a 12-bit digital control signal, and converts it in the DAC into an analog signal that controls the platform’s servo motor. Our testing shows that the digital analog conversion is accurate.

Design ArchitectureThis section describes our design architecture. Figures 4 and 5 show the hardware design block diagrams for the arm and mobile platform, respectively.

187

Page 197: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. Arm System Hardware Block Diagram

Figure 5. Mobile Platform System Hardware Block Diagram

Figure 6 shows the software flow chart, which describes the software and the control signal flow for the visual imaging system, robot arm control system, and platform system.

DriverBoard

PWM Controller QEP Circuit

Motors Encoders

Nios II /f

System Timer

Tri-State Bridge

TCP/IP

PIO

Flash

PC

Stratix II Development Board

SRAM

DACConverter

D/A Controller QEP Circuit

Motors Encoders

Nios II /f

System Timer

Tri-State Bridge

TCP/IP

PIO

Flash

PC

Stratix Development Board

SRAM

188

Page 198: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

Figure 6. Software Flow Chart

The robot’s software has four parts:

■ Image processing—After selecting the object’s image, the smart robot’s visual system (webcam) delivers the image to the PC. Then, it uses C++ Builder to analyze and process the data, and read the object’s 3-D position. Figure 7 shows the C++ Builder program interface.

PC(Image Processing & Genetic Algorithm)

Arm System

Web CAM

Image Data

Vision System

TCP/IP

Nios II/Stratix II(Motion Planning)

PWM Driver

Arm MotorsQEP

PWM Signal

PositionCommand

Mobile Base System

TCP/IP

Nios II/Stratix(Motion Planning)

D/A Converter

Mobile BaseMotors

QEP

12-Bit Digital Signal

PositionCommand

Analog Signal

189

Page 199: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 7. Borland C++ Builder Program Interface

■ Robot arm and the platform’s genetic algorithm—The PC uses the C++ Builder program to convert the XYZ coordinates into the rotation angle axes and the platform’s rotation coordinates. The PC uses the Winsocket API and TCP communication to deliver the signal to the Nios II processor on the Stratix II development board (for the arm) and the Nios II processor on the Stratix development board (for the platform).

■ Robot arm route planning—The Stratix II development board receives the rotation angle axes from the PC. We used the Nios II Integrated Development Environment (IDE) to develop the arm’s joint motion algorithm. The Nios II processor calculates the point-to-point route planning according to the axis’s largest acceleration and speed. Then, it determines the joint’s angular speed and acceleration, and outputs the PWM signal to control the arm’s rotation. The quadrature encoding pulse (QEP) produced by the motor rotation is sent to the Nios II processor to control the PI servo motor feedback. Thus, the robot arm reaches the specified position and grabs the object.

■ Omnidirectional mobile vehicle route planning—This subsystem controls the omnidirectional mobile vehicle’s Stratix development board and receives the object position command from the PC. Then, it determines the kinematic equation according to the physical structure of the vehicle. It uses the Nios II IDE to write the point-to-point route-tracking rule, determine the motor speed, and output the 12-bit digital signal that controls the vehicle motor rotation. The QEP produced by the motor rotation is sent to the Nios II processor to control the PI servo motor feedback.

After simulating the controls on a computer, we implemented the design and confirmed its accuracy using the Stratix II and Stratix development boards.

190

Page 200: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

Design MethodologyFigure 8 shows the system design methodology flow chart. Our methodology has four parts, as described in the following sections.

Figure 8. System Design Methodology Flow Chart

Manipulator DesignThe robot arm controller is the Stratix II EP2S60 development board and the Nios II processor is the microcontroller. We implemented the arm’s five-axis motor PWM circuit (see Figure 9) and the A, B phase QEP counting circuit (see Figure 10) using hardware circuits. The circuit design lets us implement the arm’s servo control system on an FPGA; therefore, it has low cost and low power, and is digital and small.

SystemDesign

ManipulatorSOPC Design

Mobile BaseSOPC Design

MechanismDesign

PC

Hardware

Software

Hardware

Software

Nios II Processor

PWM ControlCircuit

A B Phase MotorQEP Counter Circuit

Trapezium TrajectoryPlanning

PI Control

Nios II

Single-Phase MotorQEP Counter Circuit

Point-to-PointTrajectory Planning

PI Control

D/A Converter

PWM Driver

Power Supply

Hub

Web CAM

GA Algorithm

Image Processing

191

Page 201: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 9. PWM Circuit Diagram

Figure 10. A, B Phase QEP Counting Circuit Diagram

The control rules are written into the Nios II IDE and are implemented in software, including the PI servo motor feedback control (Figure 11) and the trapezoidal route planning. We used software because it facilitated adjusting the Kp and Ki control rule parameters, and we could add program interrupts, which helped save debugging time.

192

Page 202: Nios II Embedded Processor Design Contest - Outstanding - Altera

Omnidirectional Mobile Home Care Robot

Figure 11. Robot Arm PI Control Diagram

Mobile Base DesignThe platform uses the Stratix EP1S10 development board as the controller and the Nios II processor as the microcontroller. We implemented the platform’s three-axis motor QEP counting circuit (see Figure 12) using a VHDL hardware circuit. The control rules are written into the Nios II IDE and implemented in software, including the platform’s point-to-point route planning and PI servo motor feedback control. We used software because it facilitated adjusting the Kp and Ki control rule parameters as well as the platform’s rotation speed.

Figure 12. Single-Phase QEP Counting Circuit Diagram

Mechanism DesignThe part of the design concerns the analog-to-digital converter (ADC) used to drive the platform’s motor, the PWM driver for the arm’s motor, and the robot’s power supply.

PCThe PC retrieves image information via the webcam. It determines the object’s 3-D coordinates compared to the robot, the robot arm axis’s rotation angles, and the platform’s moving distance. It sends the information to the Stratix II and Stratix development boards through the hub using TCP/IP.

Design FeaturesWe integrated many functions on the Altera development boards, which greatly reduced hardware requirements and system development time. The two development boards and the PC performed image

MotionTrajectory

1S

PI PositionController

DCServomotor

DriverPI VelocityController

+

-

+

-

Θ 1

1S

PI PositionController

DCServomotor

DriverPI VelocityController

+

-

+

-

Θ 2

1S

PI PositionController

DCServomotor

DriverPI VelocityController

+

-

+

-

Θ 3

1S

PI PositionController

DCServomotor

DriverPI VelocityController

+

-

+

-

Θ 4

193

Page 203: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

processing. The Stratix II development board controls the robot arm, including the control rule calculations and command output. The other Stratix development board calculates the omnidirectional mobile vehicle’s control rules, route planning, and control command output. We implemented the control rule calculations in the FPGAs using VHDL and software.

Traditionally, designers use the RS-232 port to communicate between a PC and the Nios II processor. To increase transmission speed and accuracy, we used TCP to transmit the information. With the Altera development boards, we did not need to add additional hardware, but could directly embed the TCP interface into the FPGA. Because we used TCP, the communication between the PC and development board was secure.

The Nios II processor performed well in the system and was outstanding at performing the control calculations and generating the control command output. Because the system is complex, we needed to use two development boards. If we had used a DSP or some other chip instead of the Nios II processor, we would have spent much more time developing the system hardware.

During system development, debugging is necessary. Altera provides good HDL simulation tools to help with this task. Additionally, the LED and seven-segment display on the development boards were very helpful for debugging. With the Nios II IDE hardware debugging interface, we could simply use the Quartus II software and Nios II IDE to develop the arm and platform. These tools are very easy for beginners to learn. For image processing, we used Borland C++ Builder.

ConclusionOur R&D process shows that Altera’s Nios II processor is a very useful soft-core embedded processor system. If we had not used the Nios II processor, we would have spent more time on R&D and would not have achieved such excellent performance. With the Nios II processor as the system core, users can use the Quartus II software and SOPC Builder to create custom Nios II processors and peripherals. The resulting embedded system satisfies the user’s requirements for hardware structure, functional features, and resource consumption. For the debugging users encounter in R&D, both the Quartus II software and Nios II IDE provide convenient tools, enabling the user to find problems quickly.

Our design uses the Quartus II software to plan and design the system. The Quartus II software’s compete multi-platform design environment satisfies designer’s demands and is an all-around tool for SOPC design. Its diversified functions fully support the VHDL and Verilog HDL design process, which eases hardware circuit design. Additionally, the user can simulate and test the HDL designs using the same tool. The Quartus II software provides a simulator and also supports third-party simulation tools such as the ModelSim® software. Simulation checks whether the design circuit is correct before the design is impemented in hardware, shortening debugging time. In our project, we integrated both the software and hardware design into the FPGA, which would have been impossible in other development systems. Using Altera tools and FPGAs greatly reduced the number of wires and electronic components our design required.

The Quartus II software also includes the library of parameterized modules (LPM), which helps the user create complex, advanced systems. The LPM can be used widely in SOPC designs, or used together with ordinary Quartus II designs. The Quartus II software’s MegaWizard® Plug-In Manager helps the user create function modules, LPM functions, and intellectual property (IP) functions.

For the three parts of our design, only the image processing part does not use the Nios II CPU. We plan to work on this aspect of the design and implement the image processing in the FPGA as well. Working on this project gave us a deeper understanding of the Nios II processor, and we plan to use it to develop new designs in the future.

194

Page 204: Nios II Embedded Processor Design Contest - Outstanding - Altera

Communications Applications

Communications Applications

This section includes projects that can be used as communications applications, including:

Network Data Security System Design with High Security Insurance ............................................. 197

SOPC Implementation of Software-Defined Radio .......................................................................... 235

195

Page 205: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

196

Page 206: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

First Prize

Network Data Security System Design with High Security Insurance

Institution: Department of Information Engineering, I-Shou University

Participants: Jia-Wei Gong, Jian-Hong Chen, and Zih-Heng Chen

Instructor: Professor Ming-Haw Jing

Design IntroductionBecause of the availability of relatively large bandwidth, computers worldwide are connected via a network. Computer users can transfer information and exchange data easily on the Internet, and as a result, many network data servers have been set up. With this growth, ensuring data security during storage and transmission has become very important. In our project, a channel coding and encryption system protects data, ensuring data security in case of network invalidation. It also prevents data loss for cases in which a single packet of data is revealed (including losing an encryption key). Our goal is to make a new network data security system with high-security insurance.

A traditional network data security system (e.g., network-attached storage devices or a storage area network) achieves high security through encryption/decryption algorithms such as advanced encryption standard (AES), data encryption standard (DES), RC6, and other symmetric key cryptosystems. However, with these schemes data errors have become more frequent, particularly as network transmissions enter the gigabit per second (Gbps) range. Therefore, our new network data security system must consider data accuracy and privacy.

We used the AES block cipher to encrypt the data and we used an RSA algorithm to encrypt the AES key. The key and encypted data are encoded using a Reed-Solomon (RS) encoder and stored, via Ethernet, onto multiple archive data servers. When the data is read from the server, the reverse process occurs and decoding is performed on each server. Because of fault tolerance mechanisms, the server works normally even when data errors occur during transmission or if the server is damaged.

To develop our proposed embedded system, we used a Nios® II processor (a 32-bit RISC soft-core processor) to integrate various peripherals that we downloaded via the Internet. The design’s AES, RS encoder, RSA algorithm, and network communication protocol require many look-up tables (LUTs) and hardware design. We used system-on-a-programmable-chips (SOPC) concepts and hardware/software co-design to shorten the design process.

197

Page 207: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The Nios II processor supports user-defined custom instructions as well as hardware acceleration. It can also perform other functions such as processing, controlling, decision making, and ordering. These features allow the software designer to write the control flow program in C/C++, while most of the computing is done in hardware. The software application includes a custom instruction that operates as a high-speed, hardware-accelerated computing module. In addition to supporting custom instructions, the Nios II Integrated Development Environment (IDE), which includes the GNU C/C++ assembler and Eclipse IDE, is ideal for the entire design process. Furthermore, the designer can perform simulation, development, debugging, integration, and validation in real time on the Development and Education (DE2) board. All of these features make development efficient.

Design ConceptsUsing hardware/software co-design helped us complete the network data security system rapidly. The design can be divided into 3 parts, as shown below:

■ System—The Nios II processor performs the major system control functions. These functions include flow control, general computing, system monitoring, event decision making, receiving and sending data, user interfacing, and implementing an interrupt service routine.

■ Hardware—The hardware includes an RS encoder, AES, and RSA high-speed operation components, charge-coupled device (CCD) image capture components, dual-port SDRAM controller, Ethernet controller, VGA controller, and SRAM controller.

■ Software—The software provides a testing and validating platform for the communication between the software and the hardware. We also use it for software simulation and development.

Figure 1 shows the test platform for the system.

Figure 1. Network Data Security System Test Platform

The AES algorithm has two parts: encryption and decryption (see Figure 2). Table 1 shows the operation of these parts (which are described in detail later in this paper).

Virtual Server Virtual Server Virtual Server Virtual Server Virtual Server Virtual Server

Network

Cyclone II Device

Nios II ProcessorAES + RS

Original Data(CCD Camera)

PC(Simulation Platform)

198

Page 208: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 2. AES Encryption/Decryption Conceptual Diagram

We use four key components to implement the AES block cipher: SubBytes, ShiftRows, MixColumes, and AddRoundKey. AES security is based on the encryption/decryption key. Therefore, we transmit the key with an open-key encryption system (RSA), which is performed in hardware. See Figure 3. Because the RSA component requires a large amount of integral power and modular operation, we implemented it as a custom instruction and incorporated it into the Nios II arithmetic logic unit (ALU) to improve overall efficiency (see Figure 4). Other hardware elements include the (Inv)SubBytes, (Inv)ShiftRows, (Inv)MixColumes, and (Inv)AddRoundKey AES functions.

Figure 3. Open-Key Encryption/Decryption Conceptual Diagram

Table 1. AES Work Flow

AES Encryption AES DecryptionAddRoundKey

for round=1 to N-1SubBytesShiftRowsMixColumesAddRoundKey

end forSubBytesShiftRowAddRoundKey

(Inv)AddRoundKeyfor round=1 to N-1

(Inv)ShiftRows(Inv)SubBytes(Inv)AddRoundKey(Inv)MixColumes

end for(Inv)ShiftRows(Inv)ShbBytes(Inv)AddRoundKey

CryptographicKey

AESDecryption

Plain Text Original Plain TextCipher TextAESEncryption

Plain TextMessage, m

EncryptionAlgorithm

DecryptionAlgorithm

Plain TextMessage, m

m = d (e (m))b g

Private Decryption Key, dg

Public Encryption Key, eg

Cipher Text e (m)g

199

Page 209: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. RSA Decoder Custom Instruction

Our design implements an RS encoder and decoder (see Figures 5). Both functions are made up of basic mathematical operations, including multipliers, inverses, and finite field squares. The encoder uses Berlekamp-Massey and chain search concepts, and is implemented in hardware.

Figure 5. RS Encoder/Decoder Conceptual Diagram

The core infrastructure has the following elements, as shown in Figure 6:

■ High-Speed Data Processor—Includes the RS encoder/decoder, AES, and RSA operations.

■ CCD Controller—Connects to the CCD camera via a programmable I/O (PIO) and direct memory access (DMA) to capture images.

■ VGA Controller—Displays the image captured by the CCD camera on a computer display.

■ SDRAM Controller—Reads/writes data at high speed.

■ Network Controller—Performs network card initialization, receives packets, and sends interrupt processing and user datagram protocol (UDP) communications.

■ System Software—Provides LUT operation, hardware core component control, peripheral control, operation control, data flow control, interface control, interrupt control, etc.

Nios II Embedded Processor

+--

&

<<>>

Out

result

A

dataa

Nios IIALU

B

databCustom Logic

Combinatorial

Multi-Cycle

Parameterized

dataa

databclk

clk_enresetstart

n

a

b

c

32

32

8

5

5

5

32result

status/Flag

RSA Decoder

Encoder

I0 I 1 I k - 2 I k - 1...

C0 C 1 C k - 2 C k - 1... Cn - 2 Cn - 1...

I(x) d(x)C(x)

I(x)

(n,k) RSEncoder Decoder

C0 C 1 C k - 2 C k - 1... Cn - 2 Cn - 1...

C(x)

r(x)

(n,k) RSDecoder

r0 r 1 r k - 2 r k - 1... r n - 2 r n - 1...

200

Page 210: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 6. Core Infrastructure

The system’s communication interfaces are UART, USB, and Ethernet. They use DMA technology to increase CPU usage time, and they are connected via the Avalon® bus, allowing improved data channel and SDRAM efficiency (see Figure 7).

Figure 7. Communication Interfaces

Application ScopeOur project can be used in the following applications:

■ Distributed archive security system

■ Archive server

Nios II Processor

On-ChipDebugging

Reliable &Secure

EncodingKernel

(AES + RS)

DMAController

Data Memory

Instruction Memory

SDRAM Controller

UART

USB

PIO

Ethernet 10/100

AvalonBus

SRAM Controller

CCD Camera VGA Controller M4K RAM

On-Chip

Nios II CPU

InstructionMaster

DataMaster

DMA Controller

WriteMaster

ReadMaster

Avalon Bus Module

DataFlow

DataFlow

Avalon Bus Module

ArbitratorUARTPIOUSBetc.

SRAM Flash Ethernet MAC

201

Page 211: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ Network hard drive or mail server

■ High-speed database storage system

■ Disk array

Target UsersOur project targets the following users:

■ Companies that need a high-security data storage system

■ Designers of network data security systems that require high security insurance

■ Portable communication or storage system producers

■ Personal data storage system designers

Development BoardOur design uses the Altera® DE2 development board, which contains a Cyclone® II EP2C35F672C6 FPGA, EPCS16 serial configuration device, 8-Mbyte SDRAM, 512-Kbyte SRAM, 4-Mbyte flash, secure digital (SD) memory card slot, 10/100 Ethernet, RS-232 port, infrared port, etc. See Figure 8.

Figure 8. DE2 Development Board

Function DescriptionThis section describes the function of our design.

202

Page 212: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Development StepsWe used the following general steps to implement the design.

1. Implement the design’s operational blocks (AES, RSA, and RS encoding) in HDL (VHDL and Verilog HDL) using the Quartus® II development environment.

2. Capture an image using a CCD camera, which is displayed in real time with a VGA controller. The captured image is the data source to be encoded and encrypted. The SDRAM image data on the DE2 development board outputs from the computer over the network.

3. Transmit data for real-time encoding/decoding using Ethernet and DMA control.

4. Develop custom instructions to implement the RSA decoder. Return the result directly to the ALU.

5. Build the entire system using SOPC Builder. Complete the hardware/software co-design and demonstration.

6. Simulate AES-128 encryption and RS (6,4,1) encoding/decoding with C++ Builder.

7. Use C++ Builder to complete the network communication GUI interface with the DE2 development board.

8. Simulate several data servers with one PC (for example, it could simulate a situation in which the service of one or more servers is suspended).

9. Test the systems using test data.

ImplementationThis section describes how we implemented the design.

Create the ComponentsUsing the Quartus II software version 5.1, we created the AES, RSA and RS encoding functions in VHDL and Verilog HDL. Specifically, we:

1. Analyzed the AES algorithm and developed its four key components. We integrated the encoding and decoding functions for increased CPU efficiency and data property. This functionality is recursive.

2. Analyzed the RS function. We implemented the RS decoder and encoder using a general reliable error-correcting mechanism.

3. Created all components according to the input requirements: multiplier, square, and finite field inverse, KeyExpansion, (Inv)SubBytes, (Inv)ShiftRows, (Inv)MixColumes, (Inv)AddRoundKey, RSA decoder, RS decoder, and RS encoder.

4. Developed the VHDL components using the Quartus II software version 5.1, and performed functional validation and timing simulation.

203

Page 213: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Capture and Display the ImageWe captured an image via a CCD camera and displayed it in real time on-screen using a VGA controller. The captured image is the data source to be encoded and encrypted. The following steps describe this process.

1. Use Verilog HDL to connect the CCD controller and VGA controller with the Nios II processor (using a PIO function) as shown in Figure 9.

2. Display the captured image on-screen using a VGA controller. Alternatively, use the Nios II processor to transmit it back to the PC via Ethernet for use by the software and hardware encryption/decryption and encoding validation.

Figure 9. CCD and VGA Controller

Transmit the DataWe implement real-time data encoding/decoding transmission via Ethernet, the USB interface, and DMA control as described in the following steps.

1. Incorporate the Ethernet and USB interfaces by selecting the components in SOPC Builder as shown in Figure 10.

2. Set the arbitration parameter and connect the interfaces with the Avalon bus as shown in Figure 11.

3. Use the Ethernet port to transfer encoding/decoding data between the PC validation platform and the virtual server on the network.

DMA VGA

CCD Control & VGA Controller

GPIO Bus

PIO Bus

204

Page 214: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 10. SOPC Builder Components

Figure 11. Avalon Bus and Arbitration

Generate the SystemWe generated the entire system with SOPC Builder as described in the following steps.

1. Open the Altera-provided Cyclone II standard CPU example.

2. Add the user-defined PIO pins, as shown in Figure 12. Table 2 shows the setting of each pin.

3. Design the system infrastructure with SOPC Builder, and generate the system.h header file, peripheral drives, and EDIF file.

4. Generate the executable file using the GNUPro encoder, which is downloaded to the Cyclone II device after validation and debugging.

205

Page 215: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 12. Add User-Defined Pins

Table 2. User-Defined Pin Functions

Name Size (Bits) Direction FunctionsCCD_KEY ? Output Control the CCD control.

CCD_SW 18 Output Initialize the CCD micro-adjusting.

SRAM_SEL 1 Output Control the SDRAM.

CODE_KEY 32 Output Set the encryption key.

CODE_KEY_SEL 6 Output Control the number of keys.

CODE_REST 1 Output Initialize the encoding hardware core component.

CODE_SEL 1 Output Control encoding/decoding.

CODE_iDATA_A~F 32 Input Transmit encrypted data.

CODE_oDATA_A~F 32 Output Transmit encrypted data.

206

Page 216: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Encrypt and Encode DataWe used C++ Builder to complete 128-bit AES encryption and RS encoding/decoding as described in the following steps.

1. Write a user interface program with C++ Builder. Figure 13 shows an example of the parameter settings.

Figure 13. Parameter Settings

2. Write reference rules according to the specification of the Federal Information Processing Standard Publication 197 and RS encoding concepts.

3. Finish the software testing platform gradually and use the platform as the reference for the whole system (see Figures 14 through 20).

Figure 14. Validate Whether the Encoding/Decoding Is Correct

207

Page 217: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 15. Primary Data

Figure 16. Encode a Large Amount of Data

208

Page 218: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 17. Encoded Data (Encrypted via AES)

Figure 18. Encoded Data (Redundancy Encoded Using an RS Code)

209

Page 219: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 19. Decode the Data

Figure 20. Decoded Data

210

Page 220: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

4. The software application checks whether the encoded data is the same as the decoded data, and later validates it with the data encoded by the Nios II processor for development and debugging.

Create the GUIWe built the GUI interface that communicates with the DE2 development board using C++ Builder as described in the following steps.

1. Use the UDP communication protocol to communicate with the DE2 development board over Ethernet.

2. Transmit the CCD image from the virtual server to the Nios II processor for validation, as shown in Figure 21.

Figure 21. Transmit the CCD Image to the Validation Software

In this case, the data source of the validation software on the PC side will vary, which makes the data more flexible. Additionally, the network function makes sending/receiving of data between virtual servers possible.

SimulateWe simulated several data servers using one PC, including simulating situations in which service for one or more servers is suspended. We wrote the back-end GUI using C++ Builder, as shown in Figure 22.

211

Page 221: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 22. Back-End GUI

We simulated the situation in which service is suspended for one or more servers, as shown in Figure 23.

Figure 23. Service Suspension Simulation

212

Page 222: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Test the SystemWe used test data to test the system for the four modes shown in Table 3.

If a single server is suspended, the data is sent to the virtual server after encoding by the Nios II processor, and then is downloaded to the software validation platform for decoding through the network (as shown in Figures 24 through 27). This flow completes the mode shown in test number 3 in Table 3.

Figure 24. Primary Image Data Captured by CCD

Table 3. Test Modes

Test Number

Test Data Encoding Mode Decoding Mode Performance Comparison

1 CCD camera Nios II encoding Nios II decoding Fastest

2 CCD camera Validation software encoding Nios II decoding Second fastest

3 CCD camera Nios II encoding Validation software decoding Third fastest

4 CCD camera Validation software encoding Validation software decoding Slowest

213

Page 223: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 25. Data after Software Encoding

Figure 26. Data Sent to Virtual Server In Case of Service Suspension

214

Page 224: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 27. Decoded Result

DE2 Development Board OperationWe used the DE2 development board in this design as described below:

■ If SW1 is 0 and Key0 is unlocked, the CCD camera captures images and saves them to SDRAM.

■ If SW1 is 0 and Key1 is unlocked, the CCD camera stops capturing images.

■ If SW1 is 0 and Key2 is unlocked, the Nios II processor transmits the CCD camera image back to the validation software for display on the network.

■ If SW1 and SW0 are 0 and Key3 is unlocked, the Nios II processor encodes the CCD camera image and transmits it to the virtual server over the network.

■ If SW1 is 0, SW1 is 1, and Key3 is unlocked, the Nios II processor downloads the data from the virtual server through the network and transmits it to the validation software, which displays it after decoding.

■ If SW1 is 1 and Key0 is unlocked, the Nios II processor waits for the validation software to transmit the new AES key back so that the RSA decoding process can update the key.

Performance ParametersOur design provides channel encoding, and after encoding, the data is transmitted over the network. We analyzed the performance of the encoding/decoding process. For test purposes, we used the low-level graphic data of the CCD camera as data source (see Figure 16).

215

Page 225: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 28. CCD’s Low-Level Data Sequence

After the CCD camera captures an image, two sets of image data are generated: RG (8 bits of red and 8 bits of green) and BG (8 bits of blue and 8 bits of green). Because we set the CCD camera to capture 640 x 480 images, our encoded graphics are composed of 640 x 480 pixels at 16 bits times 2 (RG and BG). AES encoding processes 128 data bits at a time, therefore, it needs to encode 640 x 480 x 16 x 2/128 = 76,800 times total. RS encoding can process 192 data bits at a time, therefore, it needs to decode 640 x 480 x 16 x 2 x 1.5/192 = 76,800 times. Each encoding/decoding process uses a large amount of memory for the reads/writes (we use the SDRAM as the memory). Tables 4 and 5 show the performance analysis.

Table 4. Encoding Performance

Function Description Memory Expected Cycles (Times)Read (Times) Write (Times)

Load the encoding data from memory (128 bits) 4 4 200

Encoder 16 24 500

Add the encoded data into memory (192 bits) 6 6 300

Flow and peripheral control 0 0 1,000

Table 5. Decoding Performance

Function Description Memory Expected Cycles (Times)Read (Times) Write (Times)

Load the encoding data from memory (192 bits) 6 6 300

Decoder 24 16 500

Add the encoded data into memory (128 bits) 4 4 200

Flow and peripheral control 0 0 1,000

Column Readout Direction

RowReadoutDirection

...

...

G R G R G R G

B G B G B G

G R G R G R G

B G B G B G B

G R G R G R G

B G B G B G B

Pixel(26,8)

Black Pixels

B

216

Page 226: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Table 6 shows the performance for the software encoding/decoding (on the PC) and Nios II encoding/decoding. We found that the Nios II encoding/decoding performance is significantly faster than the PC’s software encoding/decoding.

Tables 7 and 8 show our performance analysis.

We set the network packet size at 1,200 bytes, and there must be some delay betwen packet transmissions. Table 9 shows the overall network transmission performance.

Although the network performance is not as good as we expected, the frequency is good enough. If we can improve the network frequency speed, the overall system performance would improve significantly.

Table 6. DE2 and PC Performance

CPU Type CPU Frequency Memory Types & FrequenciesPC AMD Athlon XP 2600+ DDRAM, 400 MHz

DE2 Nios II processor at 50 MHz SDRAM, 50 MHz

DE2 Nios II processor at 100 MHz SDRAM, 100 MHz

Table 7. Performance Anaylsis (50 MHz)

Operation Expected Nios II Implementation

Duration (s)

Nios II at 50 MHz (Time/Data Rate)

PC Software (Time/Data Rate)

Performance Improvement

(Times)Encoding [200 + 500 + 300 + 1000] x 20

ns x 76,800 = 3.072 (s)2.46 s/3.996 Mbps 101 s/0.097 Mbps 41.057

Decoding [200 + 500 + 300 + 1000] x 20 ns x 76,800 = 3.072 (s)

2.46 s/3.996 Mbps 107 s/0.092 Mbps 43.435

Table 8. Performance Analysis (100 MHz)

Operation Expected Nios II Implementation

Duration (s)

Nios II at 100 MHz (Time/Data Rate)

PC Software (Time/Data Rate)

Performance Improvement

(Times)Encoding [200 + 500 + 300 + 1000] x

10 ns x 76,800 = 1.5362.26 s/4.350 Mbps 101sec./0.097 Mbps 44.691

Decoding [200 + 500 + 300 + 1000] x 10 ns x 76,800 = 1.536

2.26 s/4.350 Mbps 107sec./0.092 Mbps 47.345

Table 9. Network Performance

Network Platform Used

Expected Network Speed

(Mbps)

Packet Delay (ms) Actual Network Speed (Mbps)

Nios II processor at 50 MHz 20 4 3

PC 20 2 to 4 3 to 6

Nios II processor at 100 MHz 20 2 6

PC 20 2 to 4 3 to 6

217

Page 227: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design ArchitectureThis section describes our design architecture for the project.

System Development FlowFigure 29 shows the project development flow.

Figure 29. Development Flow

System DesignFigure 30 shows the system design. The client must have the device acquire the functions of a new network data security system.

Compose CPU

Select & ComposeIP & Periphery

Connect Blocks

Generate

ProcessorComponent Library

PeripheralComponent Library IP Modules

Verify & DebugAssemble &Install

GNU Pro Compiler

AES + RS

SOPC Builder InterfaceDefine System

Generate SOPC System

Define System

EDIF FilesHDL Source FilesTestbench

C Header FileUser LibraryPeripheral Driver

User-Defined IPAlteraSOPC

On-ChipDebugging

JTAG Interface

Quartus II Software

User CodeLibrary RTOS

GNU Pro Tools

Hardware Development Software Development

218

Page 228: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 30. System Design

Table 10 shows the parameters in the system user link library.

Hardware DesignThis section describes the hardware design.

AES-128 Encryption/DecryptionWe designed the AES components recursively. Figure 31 shows the AES hardware diagram. Figure 32 shows the waveform after AES combines with the RS code. We used this design for the following reasons:

■ The AES algorithm repeats four specific components many times. If we used a parallel or pipelined design, it would use a substantial number of LEs (about 23,000) although the speed

Table 10. System User Link Library

Parameter Functionethernet_interrupts Interrupts the adapter.

InitHardware Sets the hardware initialization.

StartCCDToCatchImg Captures the CCD images and sends them to SDRAM.

StopCCD Stops capturing the CCD images.

CatchImgSendToPC Sends the captured CCD images to the PC verification software through the Internet.

CodeDdata Encodes and decodes data and sends/receives it to the virtual server/PC verification software.

splitPackage Splits network packets to be sent.

combinePackage Combines the received network packets.

219

Page 229: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

would be very high (throughput is about 13 Gbps). Additionally, this number of LEs does not provide real-time data. Therefore, we did not adopt this design.

■ Using the AES algorithm, encryption and decryption are almost the same. Therefore, we can co-design both functions and use a multiplexer to select which function to use.

■ By setting one round as the unit element and designing the AES algorithm recursively, we reduced the LE usage significantly (about 2,300) but the design is slower. However, the overall efficiency is sufficient because the hardware computes the AES algorithm rapidly (the CPU can receive a value 3 cycles after sending datd).

Figure 31. AES Hardware

Figure 32. AES and RS Code Waveform

(Inv)SubBytesWe implemented the (Inv)SubBytes function using a LUT. The encryption LUT is referred to as S-Box, and the decryption one is referred to as Inverse S-Box. Both LUTs are 256 bytes and are stored in M4K RAM. A muliplexer selects whether encryption or decryption is performed (see Figure 33). In contrast, parallel computing of the encryption /ecryption requires 16 LUTs and the overall design requires a memory space of 256 x 2 x 16.

220

Page 230: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 33. Realize (Inv)SubBytes Components Using a LUT

(Inv)MixColumesWe use the xtime component, mentioned by Rijndael (the author of the AES algorithm) in the specifications, to design the (Inv)MixColumes component. a(x) represents a 4-byte polynomial expression and the xtime component effect is the value of mod (x4 + 1) after multiplying x by the a(x) polynomial. We needed to use the component many times in the (Inv)MixColumes function, so we decided to co-design the encryption/decryption MixColumes components. The MixColumes component has two outputs and a multiplexer that selects which function to use. See Figures 34 and 35.

Figure 34. (Inv)MixColumes Design

xtime xtime xtime xtime

a0 a1 a2 a3

0 b1 b2 b3b0 b1 b2 b3b' ' ' '

xtime

xtime

xtime

xtime

MixColume Output (Inv)MixColume Output

221

Page 231: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 35. (Inv)MixColumes Implementation

(Inv)ShiftRow and (Inv)AddRoundKeyWe created the (Inv)ShiftRow component by changing lines of hardware (see Figure 36). (Inv)AddRoundKey is composed of XOR components (see Figure 37). These designs are quite simple, so we will not describe them further.

Figure 36. (Inv)ShiftRow Components

Figure 37. AddRoundKey Components

S0,0 S0,1 S0,2 S0,3

S1,0 S1,1 S1,2 S1,3

S2,0 S2,1 S2,2 S2,3

S3,0 S3,1 S3,2 S3,3

S

S0,0 S0,1 S0,2 S0,3

S1,0 S1,1 S1,2 S1,3

S2,0 S2,1 S2,2 S2,3

S3,0 S3,1 S3,2 S3,3

S0,0 S0,1 S0,2 S0,3

S1,3 S1,0 S1,1 S1,2

S2,2 S2,3 S2,0 S2,1

S3,1 S3,2 S3,3 S3,0

S0,0 S0,1 S0,2 S0,3

S1,1 S1,2 S1,3 S1,0

S2,2 S2,3 S2,0 S2,1

S3,3 S3,0 S3,1 S3,2

S'

Line the Wire

S0,1 S0,2 S0,3

S1,1 S1,2 S1,3

S2,1 S2,2 S2,3

S0,0

S1,0

S2,0

S3,0 S3,1 S3,2 S3,3

S'0,0 S'0,1 S'0,2 S'0,3

S'1,3 S1,0 S'1,1 S'1,2

S'2,2 S2,3 S'2,0 S'2,1

S'3,1 S3,2 S'3,3 S'3,0

S0,c

S1,c

S2,c

S3,c

l = round * NbS'0,c

S'1,c

S'2,c

S'3,c

XOR

W l Wl+2 Wl+3

Wl+c

222

Page 232: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

RS Encoder/Decoder (6,4,2)The RS algorithm uses system encoding. It replenishes the r(x) with m(x) during encoding. After encoding, the message can remain on the string; therefore, this method greatly improves the decoding efficiency.

RS EncoderFigures 38 and 39 show the RS encoder hardware design. The original data, M(X), is divided, generating r(x), which is added to m(x) and forms a new codeword. Figure 40 shows the RS encoder waveform.

Figure 38. RS Encoder Conceptual Diagram

Figure 39. RS Encoder Hardware

Figure 40. RS Encoder Waveform

RS DecoderFigures 41 and 42 show the RS decoder hardware design. The decoder decodes the codeword that was output from the encoder via the hardware kernel. The error location and value can be obtained using the

Multiply

Divide

Adder

Codeword = M(X) Xn-k R(X)

System EncodingM(X)

Xn-k

G(X)

Control Signal

Control Signal

R(X)

C(X)

223

Page 233: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Berlekamp-Massey algorithm and Forney algorithms, respectively. Figure 43 shows the RS decoder waveform.

Figure 41. RS Decoder Conceptual Diagram

Figure 42. RS Decoder Hardware

0

0

0

0

Multiplexer

G(X) Control Signal

Codeword

S1

S2

A-1

A2

A-1

Error Location

Error Value

224

Page 234: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 43. RS Decoder Waveform

RSA DecoderIn the RSA decoder, one byte is a unit. Table 11 shows the RSA parameters. RSA decoding involves integer powers and modular arithmetic, and is time consuming. Therefore, we used a multi-cycle custom instruction that operates using three cycles. Table 12 shows the pin descriptions. Figure 44 shows the RSA decoder waveform.

Table 11. RSA Software Pseudo-Code

Variable Name Valuep 31

o 11

n = pxo 341

z = (p - 1) x (q - 1) 300

e: public encryption key 79

d: primate decryption key 19

Table 12. RSA Pins

Pin Name Direction Descriptiondataa Input Operand 1

datab Input Operand 2

result Output Result

clk Input Frequency

Clk_en Input Frequency enabled

reset Input Reset

done Output Signals are finished

start Input Capture operand

225

Page 235: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 44. RSA Decoder Waveform

Software FlowThis section describes the software flow.

Function FlowFigure 45 shows the function flow. The flow is also described as follows.

■ Software encoding/decoding simulation—Send the captured CCD camera images to the DE2 development board to the verifying software. Then send the images to the virtual server through the Internet via the software-simulated coding. The virtual server then ends its active state. Download the data via the Internet to the verifying software to decode and verify the result.

■ Hardware encoding/decoding—Upload the captured CCD camera images to the DE2 development board and upload the encoded data to the virtual server via the Internet. The virtual server then ends its active state. Download the data via the Internet to the DE2 development board to decode. Send the data back to the verifying software and check the result.

■ Software encoding and hardware decoding—Send the captured CCD camera images to the DE2 development board to the verifying software. Then, upload them to the virtual server through the Internet via the software-simulated encoding. The virtual server ends its active state. Download the data via the Internet to the DE2 development board to decode. Return the data to the verifying software and check the result.

■ Hardware encoding and software decoding—Upload the captured CCD camera image to the DE2 development board and upload the encoded data to the virtual server via the Internet. The virtual server then ends its active state. Download the data via the Internet to the verifying software to decode and check the result.

226

Page 236: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 45. Software Function Flow

Software Pseudo-Code and Function DescriptionTable 13 shows the software pseudo-code.

Table 14 describes the software functions.

Table 13. Software Pseudo-Code

Encoder DecoderLoad data from DE2 (CCD).Show data on simulation platform.Encode data.Send data to virtual server.

Download data from virtual server.Decode data.Show data on simulation platform.

Table 14. Software Functions

Function DescriptionEncoder AES + RS encoding core

Decoder AES + RS decoding core

splitPackage Network package split

combinePackage Network package combine

LowDataToRGB Transform the low data of captured CCD images into RGB

Encryption AES encryption core

Decryption AES decryption core

RS-Encoder RS encoding core

RS-Decoder RS decoding core

Virtual Server(Can Shut DownAny One Server)

CCD Camera (DE2)

Encoder (DE2)

Show Data(PC Simulation Platform)

Encoder(PC Simulation Platform)

Decoder(PC Simulation Platform)

Send UDP Packet

Send UDP Packet

Download

Decoder (DE2)

Download

Show Data(PC Simulation Platform)

227

Page 237: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Function HighlightsThe function highlights are:

■ Environment setting—The user can set environment variables and increase the use of flexible spaces (see Figure 46).

■ Image processing—The data resources are diversified, as shown by the captured CCD camera images on the DE2 development board (see Figure 47).

■ Data encoding—We provide functions such as encoding/decoding time analysis optimization (see Figure 48).

■ Upload to the virtual server through the Internet—The user uploads the encoded data through the Internet (see Figure 49).

■ Virtual server stops operation—The user can easily stop the virtual server with a button (see Figure 50).

■ Download from the virtual server through the Internet—The user downloads data onto the virtual server via the Internet (see Figure 51).

■ Data decoding—The application decodes the downloaded data and provides specific time analysis and decoding optimization (see Figure 52).

Figure 46. Environment Setting

228

Page 238: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 47. Image Processing

Figure 48. Data Encoding

229

Page 239: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 49. Upload to Virtual Server

Figure 50. Stop Virtual Server

230

Page 240: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

Figure 51. Download from Virtual Server

Figure 52. Data Decoding

231

Page 241: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design MethodologyThe process we used to create our design is described as follows:

■ Define the system—Include processors, memory, peripheral components, and pins that connect the peripheral components.

■ Generate the system—Use SOPC Builder to create a .ptf file, which is used by the Nios II IDE to generate the system.h file. This file contains all of the component information.

■ Create the hardware design—Use VHDL code to write and build the required components. Combine, encode, decode, and simulate the circuit.

■ Create the software design—Use the Nios II IDE to create relevant header files and drivers. Write applications, and encode and decode them to .elf files.

■ Simulate—Use the ModelSim software to simulate. If problems occur, modify the system and redesign the software and hardware.

■ Verify—Download the software and hardware through JTAG to the RAM on the Cyclone II development board and perform physical verification.

■ Test—Test the system using the PC GUI to send test data.

SOPC Builder provides a platform to integrate software and hardware, and provides an environment for mutual development. Our design has three major parts: logic (intellectual property (IP) design), storage (RAM), and the computing core (CPU or digital signal processor). The design procedure is as follows:

■ Select the algorithms and IP functions—We use Rijndael’s AES algorithm and RS algorithm. The dynamic table generated after the addition of input variables is stored in on-chip RAM. Then we use the Nios II CPU to control the AES + RS encoding and decoding.

■ Select IP and custom IP components—For our project, we used ready-made IP components; the Altera web site provides documentation that aids in designing the software and hardware. We also developmed custom IP components according to the project requirements. These components are attached to the Avalon bus.

■ Software/hardware design—We used the hardware/software co-design for the project. Co-design is challenging because software development involves the planning and distribution of hardware resources and plays an important role in the performance of the entire system. SOPC Builder and the Nios II IDE provide an integrated hardware/software design system, which sped up the development process.

Design FeaturesOur design’s features are as follows:

■ Create dynamic tables to accelerate operation—AES encryption/decryption requires many finite field mathematical calculations. Replacing finite field mathematical calculations with LUTs can increase the design’s speed.

■ Connect ten slaves to the Avalon bus—Include data memory, instruction memory, CCD controller, SDRAM controller, UART, PIO, Ethernet, AES algorithm, and RS encoding operation components. Implement complicated IP integration and use the LEs in the Cyclone II device efficiently.

232

Page 242: Nios II Embedded Processor Design Contest - Outstanding - Altera

Network Data Security System Design with High Security Insurance

■ Accelerate instruction calculations—RSA encoding/decoding uses many multiplication and modulus calculations. Compiling this function in custom instructions and adding them to the Nios II ALU improves the effectiveness of complicated mathematical and logic calculations.

■ Connect custom IP outside of the Nios II core—The Nios II embedded processor can be modified easily, so users can design PIO pins for external communication according to their own needs. We also integrated the AES + RS hardware core to speed up encoding and decoding.

■ Provide high data security, data fault tolerance, and correction capability—Academic circles have recognized the safety of AES encryption/decryption technology. A user can only open the encrypted text if he/she possesses the golden key. The golden key is further protected by RSA encryption. Data fault tolerance is implemented using the RS coding technology, i.e., normal services can be provided when one server host is inactive. We performed sophisticated calculations on the FPGA and improved efficiency by using the Nios II CPU to control the operation components, implement custom instructions, and support peripheral resources.

■ Provide a user-friendly demo program—The user-friendly software GUI displays the flow of a whole set of AES + RS encoding/decoding, helping users quickly grasp the design concept.

■ Use storage space efficiently—Traditionally, encrypted data (such as network-attached storage, storage area networks, etc.) is stored by copying data to the file servers. Data on each file server belongs to the same stock, so the space occupation is N x S (where N is the number of servers and S refers to the occupied space). When reclaiming data, data is taken from a close server.

In our new network data security system, however, the storage encodes a stock of data with the RS code and disperses them on several file servers. The data is then drawn in an orderly fashion from the file servers. The occupied space is S because the same stock of data is dispersed to several file servers during storage. Additionally, the RS encoding technology enables normal service when one server host is inactive. In our experiments, tests can be facilitated using several I/O mechanisms, including 0/100 Ethernet, USB, and RS-232. We tested all of these connections using the DE2 development board.

ConclusionOur design team benefited greatly from the Altera Nios II Design Contest 2006. Our work included system integration, hardware development, and software design:

■ System integration—SOPC Builder and the Nios II IDE enable the flexible design and prototyping of soft CPUs. With this solution, the designer first develops network interfaces and then builds development and test platforms on PCs to accelerate the development process. This contest familiarized our design team with the importance of integration, providing faster development flow for consumer electronic products that have increasingly shorter life cycle, and reducing the costs of human resources and materials. We believe this flexible system design will become the mainstream in the near future. Our only problem was that the network performance of the DE2 development board was not good enough for our project and could not implement dynamic image processing.

■ Hardware development—Hardware/software co-design is a hot topic. The costs and timing of developing hardware are higher than that of software development; therefore, making full use of hardware is critical. In this contest, we learned that using hardware to perform repeated operations improves system efficiency. However, this effect must be achieved by using the Nios II processor because it contains highly integrated interfaces, such as the Avalon bus, PIO, and custom instructions, which integrate the hardware and software interfaces. With this combination, a designer can easily implement hardware/software co-design.

■ Software design—I feel honored to have participated in the Nios II design contest. I am responsible for the software interfaces, focusing on the communication and information exchange with the Nios II processor. We used Ethernet to do the work, and learned how to transmit

233

Page 243: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

information. We use the C++ Builder to finish the software design and use it as a verifying tool. The software verifies the design’s correctness and implementes the Nios II communication debugger to test each phase. During the design process, we learned Nios II internal communication and information exchange. Now that the contest is over, I feel that I know more about the Nios II processor hope to continue my learning process.

234

Page 244: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

First Prize

SOPC Implementation of Software-Defined Radio

Institution: National Institute of Technology, Trichy

Participants: A. Geethanath, Govinda Rao Locharla, V.S.N.K. Chaitanya

Instructor: Dr. B. Venkataramani

Design IntroductionOur project implements a software-defined radio (SDR) in a Nios® II processor. A software-defined radio is a radio that provides software control of a variety of modulation and demodulation techniques, wide-band or narrow-band operation, communications security functions (such as hopping), and waveform requirements of current and evolving standards over a broad frequency range. In this project, we used the Nios II processor—which supports easy reconfiguration and low development costs—to build a software-defined radio that supports many real-time applications. Altera®-based system-on-a-programmable-chip (SOPC) designs let designers implement real-time critical functions in hardware. Custom instructions make it easy to implement the whole system on an SOPC platform, with better software partitioning and hardware implementation of the software-defined radio.

Design PurposeSDR technology facilitates the software development of the radio system functional modules, such as modulation/demodulation, signal generation, coding, and link-layer protocols. This implementation helps designers build reconfigurable software radio systems in which parameters are selected dynamically. A complete hardware-based radio system has limited utility, because the functional module parameters are fixed. A radio system built using SDR technology extends the use of the system for a wide range of applications.

Application ScopeSDR technology can be used to implement military, commercial, and civilian radio applications. Designers can implement a wide variety of radio applications using SDR technology, such as Bluetooth, WLAN, GPS, radar, wideband code division multiple access (W-CDMA), general packet radio services (GPRS), etc. SDR has generated tremendous interest in the wireless communications industry because of the wide-ranging economic and deployment benefits it offers.

235

Page 245: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

SOPC Builder RoleThe Nios II processor offers embedded design versatility and reconfigurability. It has built-in memory, peripherals, and interfaces, as well as a variety of intellectual property (IP) functions. Productivity tools—such as SOPC Builder, the Nios II Integrated Development Environment (IDE), and Nios II compiler—combined with an FPGA target device allowed us to reduce our turnaround time and costs. The Nios II processor is a perfect fit because we can choose the peripheral, performance, processor mix, and cost that best suits the our SOPC needs. The Nios II processor supports multi-processor systems, and using SOPC Builder, we can use multiple processor cores according to our design requirements.

The Nios II processor provides design flexibility because it is compatible with both software and hardware description languages. Efficient implementation of a complete SOPC is feasible when high-density FPGAs are integrated with high-capacity RAMs and the Nios II processor.

Suitability of FPGAsDesigners can use FPGAs efficiently for digital signal processing (DSP) and other computationally intensive tasks. High-bandwidth memories, embedded DSP blocks, phase-locked loops (PLLs), and high-speed interfaces can be programmed into the FPGA, which facilitates SDR implementation. Additionally, embedded processors with FPGA co-processors enable easy design reconfiguration.

Functional DescriptionOur SDR design receives and demodulates realtime AM signals. The design can accomodate future enhancements, e.g., accommodating a transmitter in addition to the existing receiver, implementing different transmission modes, enhancing reception, and supporting the real-time requirements of civilian and military applications.

The SDR implementation involves frequency translation, analog-to-digital (A/D) conversion, and message recovery. In the frequency translation phase, the received signal frequency is translated to the intermediate frequency (IF) of 455 KHz. This part consists of an antenna, a radio frequency (RF) amplifier, a mixer, a local oscillator, a band-pass filter, and an A/D converter. We used the the MATLAB and Simulink software to implement the blocks and obtain AM signal sample values. Sampled values of the modulated signal are stored in RAM and are fed to the demodulator circuit, which is programmed into the FPGA. The demodulator circuit is based on a special sampling theorem, which makes using a mixer unnecessary. This signal then produces a demodulated output using the coordinate rotation digital computer (CORDIC) algorithm.

CORDIC AlgorithmCORDIC algorithms use shifts and adds to compute a wide range of functions, including trigonometric, hyperbolic, linear, and logarithmic functions. The CORDIC algorithm is used in diverse applications such as mathematical co-processor units, calculators, waveform generators, and digital modems.

The CORDIC algorithm uses shifts and adds to perform vector rotations iteratively. In rotation mode, CORDIC converts one vector in rectangular form to another vector in rectangular form. In vector mode, it converts a vector in rectangular form to polar form.

CORDIC Rotation ModeThe CORDIC algorithm for this mode is derived from the general rotation transform:

xfin = xincosΘ - yinsinΘ (1)

yfin = yincosΘ = xinsinΘ (2)

236

Page 246: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

The transform rotates a vector (xin, yin) in a Cartesian plane by an angle Θ to another vector with the coordinates (xfin, yfin). Rotation is achieved by performing a series of successively smaller elementary rotations Θ0, Θ1, Θ2 ... ΘN such that:

Figure 1 shows the case where rotation of a vector of magnitude 1 by an angle Θ is achieved using three elementary rotations Θ0, Θ1, and Θ2 .

Figure 1. Vector Rotation by an Angle Θi Using a Number of Steps

Rotation of the vector by an angle Θi can be rewritten as:

xi+1 = xicosΘi - yisinΘi (3)

yi+1 = yicosΘi - xisinΘi (4)

(5)

(6)

The computational complexity of equations 5 and 6 can be reduced by rewriting them as:

xi+1 = xi - yi tanΘi (7)

yi+1 = yi - xi tanΘi (8)

(9)

and performing the division by cosΘ together for all N iterations by dividing the value of (xN,yN) by:

Θ Θi

0

N

∑=

(cos(Θ), sin(Θ))Θ

Θ0

Θ1

Θ2

y

x

(1,0)

(x ,y )1 1

(x ,y )2 2

xi 1+

Θicos------------- xi yi Θitan–=

yi 1+

Θicos------------- yi xi Θitan–=

xfin yfin,( )xn

Θicos1

N

∏-------------------

yn

Θicos1

N

∏-------------------,=

237

Page 247: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Further, the value of Θi for i =1, 2, ... N is chosen such that tanΘi is 2-i. Table 1 shows the values of the angles for i = 0 to 9.

This process reduces the multiplication by the tanΘi to a simple shift operation. As the iteration increases, Θi becomes smaller and smaller. We terminate the iteration when the difference between Θi and the sum of Θi from 1 to N becomes very small for some value of N. The remaining angle by which the vector must be rotated after the completion of i iterations is indicated by the parameter zi+1 as defined by equation 10.

Zi+1 = Zi - Θi (10)

Z0 = Θ (11)

Θi is considered to be positive when the rotation required is counter-clockwise and negative otherwise.

To approximate an arbitrary angle using Θi of the form tan-1(2-i), Θi may be negative for some values of i. For example, to approximate 50, we choose Θi as 45, 26.5, -14 and, -7.1 in the first four iterations. (The actual sum of these angles is 50.4.) The sign (sgn) of zi indicates whether, in the next iteration, the

rotation should be counter-clockwise or clockwise. Because, tanΘi is +2-i when Θi is positive and -2-i otherwise, the iterative equations may be rewitten as:

δi = sgn (zi) (12)

xi+1 = xi - δi yi 2-i (13)

yi+1 = yi + δi xi 2-i (14)

zi+1 = z1 - δi tan-1 ( 2-i ) (15)

Table 1. Values of Θi = tan-1 (2-i)s

i Θi tanΘi

o 45 1

1 26.5 0.5

2 14 0.25

3 7.1 0.125

4 3.57 0.0625

5 1.78 0.03125

6 0.895 0.015625

7 0.4476 0.0078125

8 0.2238 0.00390625

9 0.1119 0.001953125

Θicos1

N

238

Page 248: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

The computation of

can be simplified. Because cosΘi = 1 for very small values of Θi, the equation can be computed for N = 6 (therefore K = 0.6073), and can be used for any other value of N > 6.

CORDIC Vector ModeIn this mode, an initial vector with the x,y coordinates of (xin,yin) is rotated such that its y coordinate becomes zero. The procedure used for rotation may be adopted for vector mode with the following modifications: rotation is carried out clockwise (so that the y coordinate can be made 0), and the total angle by which the vector has been rotated from the initial position after i rotation is indicated by the parameter Zi+1 and Z0 is defined as 0. See equations 16 through 19. To obtain equation 16, when yi is positive, the rotation is clockwise in the next iteration. When it is negative, it is counter-clockwise.

δ i = -sgn (yi) (16)

xi+1= xi - δ i yi 2-i (17)

yi+1 = yi + δi xi 2-i (18)

zi+1 = zi - δi tan-1 (2-i) (19)

As i becomes large, yi goes to 0 and xfin, the magnitude of the vector after N iterations and Zfin, the angle of the vector are obtained as:

(20)

(21)

CORDIC as a Universal DemodulatorIf the carrier frequency, amplitude, and phase of the received signal are fi , 2b(t), and (t) respectively, then the received signal r(t) is given by:

r(t) = 2b(t)sin(i t + (t) ) (22)

One approach to demodulating the signal is generating the in-phase I(t) signal and quadrature signal Q(t) given by:

I(t) = b(t{sin[2(fi – fo)t + (t)] (23)

Q(t) = b(t){cos[2(fi – fo)t + (t)] (24)

Θicos1

N

xfinxn

θ icosN

1

∏-------------------=

zfiny0x0-----⎝ ⎠

⎛ ⎞1–

tan=

239

Page 249: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

using a local oscillator of frequency fo with a known initial phase.

The in-phase and quadrature signals can be fed to the two inputs (xin,yin) of the CORDIC operated in rotation mode. The magnitude of the vector xfin gives the demodulated signal corresponding to amplitude modulation as:

(25)

Similarly, Zfin, the angle of the vector, gives the phase shift introduced into the carrier and is given by:

(26)

Differentiating equation 26, the instantaneous frequency of the carrier can be found.

Generating In-Phase and Quadrature Signals Using Quadrature Mixers Using the quadrature mixers shown in Figure 2, I (t) and Q (t) can be generated.

Figure 2. Generating In-Phase and Quadrature Signals

In this scheme, the input signal is divided into two in-phase paths and the local oscillator signals applied to the two mixers are 90-degree out of phase. The outputs of the two mixers are:

Vif1 = 2b(t)sin(2fit + (t)) cos(2fot) (27)

= b(t){sin[2(fi – fo)t + (t)] + sin[2(fi +fo)t + (t)]) )

vif2 = 2b(t) sin (2fit + (t)) sin 2fot) (28)

= b (t){cos [2(fi – fo)t + (t)]-cos[2(fi + fo)t + (t)]} (29)

The desired I and Q components are obtained using low-pass filters, which filter out the high-frequency signals represented by the fi+f0 terms in equations 27 and 29.

A special sampling scheme generates the I and Q components without using quadrature mixers. In this case, if the local oscillator frequency is f0, r(t) is sampled at a rate of fs = 4f0. If Vif1, Vif2, and r (t) are sampled at a rate of fs = 4f0, then:

t = nTs = n /4f0 (30)

xfin I2 t( ) Q2 t( )+ b t( )= =

ZfinQ t( )I t( )-----------⎝ ⎠

⎛ ⎞ 1–tan

2Π fi f0–( )t Θ t( )+[ ]sin2Π fi f0–( )t Θ t( )+[ ]cos

-----------------------------------------------------------⎝ ⎠⎛ ⎞

1–tan fi f0–( )t Θ t( )+= = =

cosω t0

sinω t0

r(t)

I(t)

Q(t)

240

Page 250: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

2tf0 = 2 (nTs) f0 = 2n /4 = n/2 (31)

If we substitute equation 31 in equations 1, 27, and 29, the output sequence at different sampling instants are as shown in Table 2. Table 2 shows the samples of the received signal, in-phase channel, and quadrature channel signals at different sampling instants for f0 fi.

From Table 1, we can verify that the in-phase and quadrature components can be generated by alternately passing r(t) to one of the two channels and inserting zeros alternately to each channel as shown in Figure 3. We obtain the desired I and Q components using low-pass filters that filter out the high-frequency signals represented by the fi + f0 terms in equation 29. Let us denote the value of r(t), I(t), Q(t) at t = nTs as r(n), I(n), and Q(n), respectively. Then, I(n) and Q(n) can be written as:

I(n) = r(0), 0, -r(2), 0, r(4), ...

Q(n) = 0, r(1), 0, -r(3), 0, r(5), ...

What happens if the local oscillator frequency f0 is the same as the input frequency fi? In this case, fs can be 4f0. Table 3 shows r(t), I(t), and Q(t) at various sampling instants for f0 = fi.

Therefore, we can generate the in-phase and quadrature components without using mixers, at the expense of a higher sampling rate.

Demodulator ImplementationTwo mutually complimentary clocks are fed to the selection lines of both multiplexers operated at four times the IF signal as defined by our special sampling theorem. I and Q are given as the x and y inputs to the CORDIC, which is operating in vector mode. The demodulated signal can be fed to the PC and displayed. See Figure 3.

Table 2. Signal Samples (Sample at t = tn = nTs)

For n = 0 1 2 3 4R(t) b(0)sin(Θ) b(t1 )sin(2Π fi t 1+ Θ) -b(t2 )sin(2Π fi t 2 + Θ) b(t3 )sin(Π2 fi t 3 + Θ) b(t4 )sin(2Π fi t 4 + Θ)

Vi f 1 b(0)sin(Θ) 0 -b(t2 )sin(2Π fi t 2 + Θ) 0 b(t4 )sin(2Π fi t 4 + Θ)

Vi f 2 0 b(t1 )sin(2Π fi t 1+ Θ) 0 -b(t3 )sin(2Π fi t 3+ Θ) 0

Table 3. r(t), I(t) and Q(t) ((Sample at t = tn = nTs))

For n = 0 1 2 3 4R(t) b(0)sin(Θ) b(t1 )cos(Θ) -b(t2 )sin(Θ) -b(t3 )cos(Θ) b(t4 )sin(Θ)

Vif1 b(0)sin(Θ) 0 - b(t2 )sin(Θ) 0 b(t4 )sin(Θ)

Vif2 0 b(t1 )cos(Θ) 0 -b(t3 )cos(Θ) 0

241

Page 251: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 3. Generating In-Phase and Quadrature Signals Using Special Sampling Scheme and CORDIC

Advantages of Using the Nios II ProcessorWe chose the Nios II processor for the following reasons:

■ Based on the Nios II processor, we can enhance system performance by adjusting the Avalon® switch fabric.

■ The customizable Altera Nios II processor has high performance and supports flexible product development at low cost.

■ The Nios II processor’s customizable instruction set can accommodate complicated arithmetic operations. Additionally, it can accelerate algorithm processing, which provides faster execution than implementing these operations in software.

■ A configurable design enhances system performance.

■ The μC/OS-II and RTOS with the Nios II IDE are very user friendly.

■ Implementing the processor, peripherals, memory, and I/O interface in a single FPGA reduces the total system cost.

■ We could implement the system rapidly.We could go from the original concept design to system implementation in a short time with the Nios II processor. Additionally, we could easily upgrade the hardware and software on site. This flexibility allows us to design products in-line with the latest specifications and equipped with new features.

■ The Nios II processor can be customized and reconfigured. For example, it supports three processor cores, peripherals, the Avalon switch fabric, custom instructions, and hardware acceleration. All of these functions can be implemented using commonly available Altera FPGAs.

■ Using IP optimized for the FPGA architecture, we can redesign standard functions easily, rapidly customize hardware peripherals, focus on design partitioning, and improve our design knowledge.

■ Integrated development kits, the Quartus® II software, SOPC Builder, ModelSim®-Altera software, and SignalTap® II embedded logic analyzer provide a complete set of test and debug tools for hardware design. With the Nios II IDE, it is possible to simplify software design and all software development tasks, such as program editing and debugging.

r(t)

I

Q

I1I0

I1I0

0

0

CORDIC B(t)

242

Page 252: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

Design ArchitectureThis section describes our design architecture.

Hardware DesignA signal is received from an antenna and is fed to the RF amplifier. Its output is mixed with the local oscillator signal in the mixer. All frequency components, except IF (455 KHz), are filtered out by the band-pass filter (BPF). The IF signal is digitized by the analog-to-digital converter (ADC) and is fed to the demodulator that is implemented on the FPGA. The demodulator reconstructs the message signal and feeds it to the digital-to-analog converter (DAC), which drives a loudspeaker. Figure 4 shows the SDR block diagram.

Figure 4. SDR Block Diagram

Software Flow ChartThe demodulator output is called by a Nios II custom instruction. The demodulator output is available in the Nios II processor after it has been plotted in MATLAB. Figure 5 shows the software flow chart.

RF Amplifier Mixer Band-Pass Filter A/D Converter

Local Oscillator

Demodulator D/A Converter

243

Page 253: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 5. Software Flow Chart

Design DescriptionThe SDR implementation involves two design phases:

■ Frequency translation

■ Message recovery

We implemented the first phase using the MATLAB software and the second phase in an FPGA using Altera tools. The amplitude modulated signal is generated and digitized in the Simulink (MATLAB) software before being stored in RAM. These data values are fed to the demodulator, which is called by the Nios II processer using a custom instruction. The RAM address is incremented for every call.

The custom instruction consists of two essential elements (see Figure 6):

■ Custom logic block—Hardware that performs the user-defined operation. The Nios II processor can include up to five user-defined custom logic blocks. These blocks become part of the Nios II microprocessor’s arithmetic logic unit (ALU).

■ Software macro—Allows the system designer to access the custom logic from the software code.

Start

Stop

Initialize the Memory Address to 0

If Address = Max

Provide the Data Samples to CORDIC

Read the CORDIC Output & Displayon the Screen

Increment the Address by 1

Yes

No

244

Page 254: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

Figure 6. Adding Custom Logic to the Nios II ALU

The demodulator is based on a special sampling theorem and the CORDIC algorithm. Modulated samples are split into in-phase and quadrature components with multipliers. These components are fed as x and y inputs of the rectangular-to-polar coordinate converter (cordic hardware). The rectangular-to-polar converter, RAM, and multiplexers are implemented using Verilog HDL. The Verilog HDL demodulator implementation block, along with M4K blocks that contain the sampled AM signal values, is downloaded into the FPGA.

In a software-defined radio, the amplitude modulation (AM), frequency modulation (FM), phase modulaton (PM), phase shift keying (PSK), and frequency shift keying (FSK) modulation types and the local oscillator frequency can be varied using software. The variable local oscillator frequency can be implemented using CORDIC and a phase accumulator; however, the CORDIC algorithm cannot be used to generate high-frequency signals. Instead, we use programmable logic (multiplexers) to generate the high-frequency signals. The stable signal frequencies, on the order of tens of MHz, can be generated efficiently using the Quartus II LogicLock™ feature and programmable logic.

Figure 7 shows the design flow.

Figure 7. Design Flow

Nios II Embedded Processor

+--

&

<<>>

Out

result

A

dataa

Nios IIALU

B

databCustom Logic

To FIFO, Memory, or Other Logic

Frequency Translation

Digitizing IF Signal

Demodulation on FPGA

Output Display

Receive the Signal

245

Page 255: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Experiments and ResultsTo test the design, we applied two amplitude modulated signals to the design (a carrier of 1 MHz is modulated with two message signals of 1 KHz and 2 KHz, respectively). Figures 8 and 9 show the two modulated signals.

Figure 8. Amplitude Modulated Waveforms (fc = 1 MHz , fm = 1 KHz)

246

Page 256: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

Figure 9. Amplitude Modulated Waveforms (fc = 1 MHz , fm = 2 KHz)

The design generated perfectly reconstructed (i.e., demodulated) message signals from the two modulated signals, as shown in Figure 10.

247

Page 257: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 10. 1 KHz and 2 KHz Demodulated Waveforms

Performance ParametersTable 4 shows the design’s resource usage.

Table 4. Design Resource Usage

Parameter ValueDevice EP2C35F672C6 (Cyclone II)

Tool Quartus II verson 5.1

Total logic elements (LEs) 3,154 of an available 33,216 (9%)

Total memory bits 181,760 of an available 483,840 (38%)

Total number of phase-locked loops (PLLs) One quarter (25%)

Total registers 1,860

248

Page 258: Nios II Embedded Processor Design Contest - Outstanding - Altera

SOPC Implementation of Software-Defined Radio

Design FeaturesOur design has the following features:

■ Implementing a demodulator for real-time applications involves mathematical operations instructed by the CORDIC algorithm. We efficiently implemented our design using SOPC concepts.

■ Because the Nios II processor can be reconfigured, we can enhance the design in the future to meet the requirements of real-time applications with very low development costs.

■ The Nios II processor enables optimum hardware/software development by executing more computationally intensive tasks in hardware and the remaining tasks in software.

■ The Nios II embedded processor delivers good price/performance and lowers technical development hurdles. We use the Nios II processor to implement the processor, peripherals, memory, and I/O interface on a single FPGA, which helped to reduce the total system cost. Additionally, Altera’s comprehensive development tools and optimized IP functions reduced our software development cost, letting us focus more attention on the design details of the software-defined radio.

Future DevelopmentWe plan to develop our design in the following areas in the future:

■ Add various modulation techniques, such as FM, amplitude shift keying (ASK), PSK, and FSK.

■ Add support for variable frequency transmission, which is required for military applications.

■ Enhance the design such that multiple software modules implementing different standards on the same system co-exist. This support will allow dynamic system reconfiguration by simply selecting the appropriate software module.

ConclusionWe thank Altera for having the contest and acknowledge their support when we had design problems. During the design we learned that the Nios II processor complies with the emerging trend of industrial technology software such as hardware design. Using the Nios II processor, we reduced development costs, material costs, and turnaround times, improving our competitiveness. The Nios II processor and its development platform provide flexibility and an enhanced, robust system. Using custom instructions with the Nios II processor is an added advantage for design customization. On the whole, using SOPC concepts allowed us to create a more flexible, dynamically reconfigurable, and computationally intensive implementation.

249

Page 259: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

ReferencesG. Seetharaman, B.Venkataramani, V. Amudha, Anurag Saundattikar, “System on chip implementation of 2D DWT using lifting scheme,” Procedures of the International Asia and South Pacific Conference on Embedded SOCs (ASPICES 2005), (July 5 - 8, 2005), Bangalore.

Walter Tuttlebee, “Software Defined Radio,” John Wiley & Sons Ltd., (2004).

Jeffrey H. Reed, “Software Radio,” Pearson Education Pte Ltd., India, (2002).

James Tsui, “Digital Techniques for Wide Band Trasmission,” Artech House Publishers, (1995).

Ray Andraka, “A survey of cordic algorithms for FPGA based computers,” International Symposium on Field Programmable Gate Arrays 1998 Proceedings, (1998): 191 - 200.

250

Page 260: Nios II Embedded Processor Design Contest - Outstanding - Altera

Industrial Applications

Industrial Applications

This section includes projects that can be used as industrial applications, including:

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer..................................... 253

Black Box for Robot Manipulation................................................................................................... 285

Digital Oscillograph .......................................................................................................................... 295

Nios II-Based Air-Jet Loom Control System.................................................................................... 325

Nios II-Based Audio-Controlled Digital Oscillograph ..................................................................... 337

251

Page 261: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

252

Page 262: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

First Prize

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Institution: Huazhong University of Science and Technology

Participants: Lian Zeng, Yong Li, and Hong-mei Zhu

Instructor: Jie Luo

Design IntroductionWith the rapid development of digital technology, people enjoy the benefits of updated digital products and systems. System design, debug, and test instruments have expanded from simulated digital oscillographs and large logic analyzers to portable multi-function instruments that enable network resource sharing.

Project BackgroundTwo types of logic analyzers are available in the market today. Agilent and Tektronix provide high-end desktop instruments that use Intel Pentium processors, run the Windows XP Professional operating system, and feature powerful functions. For example, the TLA7000 series supports 136 channels, and has a sampling frequency up to 8 GHz, clock speeds up to 800 MHz, and a memory depth of 16 Kbits. Agilent’s product series is similar. These desktop products are very expensive, for example, more than 100,000 Yuan. Most engineers will not choose such a high-performance, expensive logic analyzer.

The other available logic analyzer is a simple data collection card (using a USB or PCI interface) based on a PC. When the user runs data analysis software on the PC, the card analyzes logic. The Flyto Electronics Co., Ltd. Beijing L-800 product uses a USB interface, supports 24 channels to collect and input data, and is priced about 9,900 Yuan. Agilent’s latest 16900 series product is also based on a PC, runs on the Windows XP Professional multi-threaded, multi-processor architecture, and has 68 input channels. It sells at about 20,000 Yuan. These products satisfy most engineering requirements at a comparatively low price. However, these analyzers are not easy to carry and must be used with a high-performance PC.

Logic analyzers are widely used in industry, business, education, and even military fields. With so many applications, engineers require a portable logic analyzer that supports cooperation and sharing to meet their needs.

253

Page 263: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The Nios® II-based multi-functional logic analyzer uses FPGA hardware to collect and store data. It uses a Nios II processor for interaction, control, and communication, and also supports a VGA monitor and remote network sharing. This analyzer meets most engineers’ requirements, costs less than 2,000 Yuan, is portable, and supports remote sharing.

Design Objectives This paper discusses our work creating the logic analyzer. With the Altera® Development and Education (DE2) board, Nios II soft-core processor, and system-on-a-programmable-chip (SOPC) design concepts, we used a single FPGA to implement the logic analyzer. With the Nios II processor, this analyzer can run a dual-CPU core, enabling an off-line VGA monitor and a PC as the network monitor. Additionally, the analyzer uses its powerful processing and storage function to analyze varied protocols. It uses a network interface to transmit data, and to implement remote assistant measurement and multiple user collaborative analysis.

The analyzer supports parallel collections, and provides 32-channel signal analysis and storage in the FPGA. The sampling velocity is 100 megasamples per second (MSPS), the storage depth is 1 Kbit, and the data bandwidth is 33 megasamples. The Nios II CPU delivers the processed data to the VGA interface for the LCD display and also delivers it to the PC via the Ethernet interface. The design can save the waveform as a document to ease analysis and data comparisons. The analyzer supports serial protocol analysis, including I2C, serial protocol 2 (SP2), serial peripheral interface (SPI), and UART.

Functional DescriptionThis section describes the project functionality.

■ Implementation of the signals’ parallel collection, real-time analysis, and synchronous storage with the FPGA—Most domestic logic analyzers use an embedded processor to collect and analyze data. Software serial collection is so slow that the data analysis severely lags behind the data storage. In contrast, the FPGA hardware processing signal enables the synchronization of data collection, analysis, and storage, achieving high-speed collection, real-time analysis, and synchronous storage.

This design uses an Altera Cyclone® II EP2C35F672C6 device, with the overall clock frequency at about 150 MHz. To ensure that the FPGA runs smoothly, we used a 100-MHz clock to collect the data, analyze the collected data’s real-time triggering condition, and save the data into a FIFO buffer with the specified storage depth. If conditions permit, the trigger core notifies the FIFO buffer to read data according to the trigger point. The FIFO buffer then locks the data for the Nios II processor to read. The FPGA’s processing mechanism is far superior to the embedded processor’s and has the same performance.

■ Blending measurements of varied levels—Logic analyzers measure logic levels. Varied logic levels account for the differences when defining CMOS high and low levels. For example, the TTL level is usually supplied by 5-V power, with a logic high of 3.5 to 5.0 V and a logic low of less than 0.4 V. The CMOS logic high is about 80% of VDD, logic low is about 20% of VDD, and the midpoint is at 50% of VDD. The CMOS power supply voltage can be less than 5 V, e.g., 3.3 or 1.8 V. We use a distributed amplifier (DA) and a comparator to convert the level, using the DA to alter the comparator’s voltage reference and translating various levels into standard TTL.

■ Friendly user input—Local users can use a touch screen to input triggering conditions instead of using a keyboard and mouse. Using the Nios II processor to control the touch screen ensures the screen’s real-time response.

254

Page 264: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

■ VGA color monitor support—The DE2 development board provides VGA demo code that supports one-bit color (i.e., bottom color and top color). The board uses on-chip RAM as the graphic memory, covering 640 x 480 = 307,200 RAM bits, which is about 65% of the available RAM resources. The channel storage capacity is not sufficient, and one-bit color is too dull to be suitable for viewing.

Instead, we use SRAM as the graphic memory, with 512 x 8 bits and 8-bit color. To display a frame requires reading SRAM data, the Avalon® bus streaming mode, and direct memory access (DMA) to update the VGA data and color image on the monitor in real time.

■ Ethernet support—The local color VGA terminal supports an Ethernet interface. A remote PC can act as a terminal via the Ethernet connection. Therefore, the measurements and monitoring are not limited by space, which is very convenient for cooperative work.

■ Synchronous operation and updating multiple terminals—When multiple terminals work simultaneously, if the logic analyzer operates on one terminal, the other terminals should be synchronously updated as well. If the logic analyzer stops, the user can view the operation settings.

■ Simple serial protocol analysis—When using the serial protocol, a clock matches data; therefore, the logic analyzer can only show a waveform. Detailed data cannot be read. Our design analyzes four serial protocols—PS2, UART, I2C, and SPI. Users can pick any of them as the test signal for which to select channels and simple settings for the corresponding signals. Aside from the tested signals, the logic analyzer also displays the data waveform in the protocol. This operation helps users analyze the serial protocol.

■ Real-time voice counting—When analyzing the protocol, the system uses the DE2 board’s audio interface to report the protocol data in real time with sound. Meanwhile the system activates the sound card on the remote terminal to report the protocol voice. The protocol voice counting helps users get the testing result, making the design more cooperative and more universal.

255

Page 265: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Technical SpecificationsOur design has the following technical specifications:

Design ArchitectureThis section describes the project architecture.

Implementation TheoryBecause of the system’s performance parameters and functional requirements, we decided to use the DE2 development board and relevant technology to develop the system. The Nios II processor helps solve complex problems such as limited hardware processor peripheral resources, complex interface configuration, hardware design, and software programs. Using dual CPUs effectively controls and reasonably allocates access to external devices, and helps us meet our functional requirements via the bus protocol communication between the CPUs. Therefore, the system makes full use of the chips, which greatly improves the operational efficiency. Because the analyzer frequently accesses the external storage, we define the external devices so that storage access and data transfer are simple. Using the FPGA’s simultaneous operation, the logic analyzer’s sampling speed is greatly improved. Because the VGA terminal requires real-time data, we use DMA data transfers. Altera’s SOPC solution provides the advantage of multiple processors, FPGA logic control, and data processing.

Design OutlineWe used the following general steps when creating our design.

1. Analyze system demand according to the analyzer’s technical parameters and functional index.

2. Refer to the demand analysis to define the system modules and overall design structure and clarify key technology.

3. Consult the DE2 development board examples and documents to learn how to connect with external devices.

Technical Specifications

Specification ValueSampling parameter The highest frequency is 100 MHz (limited by the component’s speed)

Threshold voltage 0 to 5 V

Input voltage 0 to 5 V

Triggering conditions The four triggering conditions include debugging up edge, debugging down edge, high level, and low level

Input channel 32 routes

Triggering position Front, middle, and backward

Triggering level 12

Storage depth 1 Kbyte at most (limited by the size of the storage)

Displaying mode VGA and PC

Clock mode Internal clock and external clock

External clock velocity 100 Hz to 100 MHz

Internal clock velocity 100 MHz

Operating environment Windows 2000 and Windows XP

Connection mode Internet

256

Page 266: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

4. Perform the system hardware design. Draw diagrams of the external expansion circuit, PCB and FPGA hardware logic, and Nios II dual-CPU design.

5. Implement the system software, including the Nios II software and PC interface.

6. Perform system debugging and testing.

Project PlanThe following table shows the project schedule.

Hardware DesignThis section describes the hardware design.

Hardware ArchitectureAs shown in Figure 1, the setting information is sent to the kernel CPU from the PC via the network, or is sent to the FPGA logic from the touch screen via a human-machine interaction CPU. The external signal is sent to the FPGA logic section through the comparer, whose threshold is set by the kernel CPU. Then, the FPGA logic sends the signal to the kernel CPU for processing. Next, the signal is sent to the PC via the network interface, while the data is written to the public memory (this action notifies the human-machine interaction CPU to read the data). The human-machine interaction CPU writes the data into SRAM, and the SRAM sends the data to the VGA terminal through a FIFO cache controlled via DMA. During protocol testing, the kernel CPU sends the processed signal and opens the FPGA-driven audio driver to report the analyzed data.

Project Schedule

Time Phase Specific Tasks2006-4-14 to 2006-5-09 Demand Analysis Clarify system performance parameters and functional

requirements. Master FPGA/Nios II technology.

2006-5-10 to 2006-6-01 Overall Design Create the system’s overall design and module division.

2006-6-02 to 2006-7-30 Software and Hardware Design

Design the external expansion circuit. Draw the PCB, design FPGA logic section, and Nios II CPU and interface.

2006-8-01 to 2006-8-20 System Testing Create the design testing scheme, set up a testing plan, and perform the test according to the function and performance index.

2006-8-01 to 2006-8-25 Write Design Report Write the design report with reference to the design documents and system implementation.

257

Page 267: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. Hardware Architecture

Module DescriptionThe hardware has two parts: the logic circuit (including data collection, trigger analysis, data storage, external clock frequency measurement, and voice processing) and the Nios II processor (including the kernel CPU and human-machine interaction CPU, which controls the external devices).

Logic SectionThe logic circuit is described in the following sections.

Data Collection ModuleThere are various types of decision voltages for high- or low-level external logic signals. To make the system adaptive to various signals, the tested signals go through level conversion, which converts various levels into the TTL level to input to the FPGA. We use a DA+ comparer to implement level conversion. We used the MAX516 with the 8-bit DA+ comparer to make the conversion. The user sets up the decision voltage. After converting the data according to the following formula, the Nios II processor delivers the converted data to the DA.

N = VDAC/VREF x 256 (1-1)

The voltage converted by the DA’s output goes to the comparer’s comparing side to become the comparer’s voltage reference. If the external signal voltage is higher than the voltage reference, it is considered a logic high, which is output at a high level. If the voltage is lower than the voltage reference, it is considered a logic low, which is output at a low level. According to the sampling clock, the FPGA simultaneously collects the 32 route signals output by the comparer.

FPGA

Master CPU

DMA

FPGALogic

Compare

Compare

Compare

CompareVGA

PC

LCD

Human-MachineInteraction CPU

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

UserLogic

Flash

Internet

Bus

Headset

D/A

MutexOn-ChipMemory

FIFO

Bus

SDRAM

SRAMUserLogic

258

Page 268: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Trigger Analysis ModuleThe Nios II processor sends the triggering information to triggering core. Then the FPGA sends the collected signals into the triggering core to make a judgment and outputs a pulse to notify the storage device when it has satisfied all trigger conditions.

The triggering information includes a mask word and trigger conditions. The mask word is made up of 1s and 0s such that a triggering condition is 1 and the absence of a trigger condition is 0. The system exchanges the input signal and the mask word and compares the result. If they are the same, the trigger activates, otherwise the system waits.

There are four trigger conditions: rising edge triggering, falling edge triggering, high-level triggering, and low-level triggering. To determine which condition is occuring, the system needs at least two consecutive clock cycles of data. The trigger conditions on each level have two conditional distinguishing words, and the triggering core uses two comparisons to decide whether to trigger. The decision data of the rising edge triggering is 10, and that of the falling edge triggering is 01, the high level is 11, and low level is 00.

The triggering in this design is scaled into 12 levels. The system judges the trigger condition of the next level after the current trigger requirements are satisfied. For example, the triggering core sends out trigger signals when all conditions are satisfied. Due to uncertain intervals between the triggering on different levels, the trigger point for multi-level triggering is set backwards so that the system sees as many triggering processes as possible.

Data Storage ModuleCollected data is stored into the device’s FIFO buffer, which is a fixed size. If the amount of data in the FIFO buffer is more than the depth, the data entering the FIFO buffer first is read immediately so that the data stored in the FIFO buffer is the size of the storage depth. If the triggering core sends out a pulse satisfying the trigger condition, the FIFO buffer rewrites data as required by the position of the trigger point. Then, it stops storing data and notifies the kernel CPU that it is reading the data.

Measure External Clock FrequencyThis design uses equal precision frequency measurement, which enables a broad clock frequency scope from 100 MHz (high frequency) to 1 KHz (low frequency). If the time on the actual gate is integer multiples of the tested signal frequency, this method eliminates the ±1 word error produced when the tested signal is counted, and ensures accurate frequency. Figure 2 shows how this operation works. Figure 3 shows the timing simulation.

Figure 2. Equal Precision Frequency Measurement Operation

Preset Gating Signal (SWITCH)

Standard Frequency Signal (CLK)

Tested Signal (SIGIN)

Return-to-Zero Signal (CLEAR)

D Q

CLKEN

CLK

CLR

CLKEN

CLK

CLR

CNT1

CNT2

OUT1

OUT2

259

Page 269: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 3. Equal Precision Frequency Measurement Timing Simulation

After the gate starting signal is given (the rising edge of the SWITCH preset gating signal), the CNT1 and CNT2 counters begin after the rising edge of the tested signal SIGIN. When the preset gate shutting signal (falling edge of the SWITCH gating signal) is given, the counter stops counting after the rising edge of the SIGIN tested signal, completing the measurement. At this time, the CNT1 and CNT2 values (OUT1 and OUT2, respectively) can be read. We divide the SIGIN tested signal’s value, OUT2, by the standard CLK signal’s value, OUT1, and then multiply by the standard CLK signal’s frequency to obtain the SIGIN tested signal’s frequency.

Voice ModuleThe system delivers a pre-recorded voice file (.wav) to the audio digital-to-analog converter (DAC) for display, according to the protocol analysis data. The voice module implements the DAC configuration, audio data transfer, and the audio data cache. The audio DAC uses the WM8731 device, and is configured via the I2C bus. Due to fixed configuration data, the DAC configures the WM8731 device with a hardware logic simulation of the I2C timing.

The configured audio’s sampling rate is 8 K, 16 bits. The Nios II processor can transfer the audio .wav file faster than the sampling rate. Therefore, we added a data cache FIFO buffer to match the speed of the DAC and the processor.

CPUsOur system has a variety of complex processing. The VGA module displays a lot of data in real time, uses the Avalon bus’s streaming mode, and uses frequent DMA operations. Additionally, the collected data is complex and the network interface interruption operation adds to the system complexity. Therefore, we decided to implement the system with two CPUs: the kernel CPU and human-machine interaction CPU. The kernel CPU processes the collected data and network data, and the human-machine interaction CPU displays the VGA data and controls the touch screen.

Kernel CPUFigure 4 shows the kernel CPU SOPC Builder settings.

260

Page 270: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Figure 4. Kernel CPU SOPC Builder Settings

The kernel CPU is composed of the bottom data-exchanging interface, the communication interface between the network interface and PC, and other parts, as described in the following sections.

The bottom data exchanging interface is divided into the following sections:

■ Nios II interface for sending setting information—This interface to the user logic defines user_ram, and delivers all setting information to the triggering core and FIFO buffer. This interface has 7 address signals, 32 data signals, and one write signal, which can deliver 64-bit data.

■ Read interface for storing FIFO data—After saving the required data, the FIFO buffer produces the interrupt signal rd_enable, which notifies the kernel CPU to read the data. When the system finishes reading all data in the FIFO buffer, the FIFO buffer issues the interrupt signal RD_EMPTY_ENABLE, which notifies the kernel CPU to stop reading. The data read interface user_fifo connects the Nios II processor and the logic circuit. It has 32 data signals and one read signal, and delivers the data from the FIFO buffer to the Nios II processor.

■ Output interface of the sampling circuit decision voltage—The sampling circuit decision voltage’s output user interface DAC is defined by the user logic interface. The external circuit is a bus interface. Figure 5 shows the design.

The five address signals are the chip select (CS) of the 32-route DAC, and the 8-bit DAC uses 8 data signals. The write signal is a valid low level. Figure 6 shows the MAX516 timing.

If CS is low, the system locks and saves the data on the data signals when the write signal is on the falling edge. The scope of tAS is 25 to 50 ns, the scope of tWR is 50 to 120 ns, and the scope of tDS is 20 to 50 ns. The time required to establish the signal is 30 ns, to wait is 80 ns, and to remain is 20 ns.

261

Page 271: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 5. Voltage Module Interface and Timing

Figure 6. MAX516 Timing

■ External frequency read interface—All external frequency read interfaces are composed of a programmable I/O (PIO), in which frestart is the equal precision frequency measurement module start signal that initiates the module. freou is frequency data, which is valid only when freoec is low. Therefore, it can be read only when freoec is low.

■ Audio data interface—The audio data interface is a group of bus interfaces that defines wr_audio and delivers the required audio data to the voice module for playing. This interface consists of 16 data signals and one write signal.

The DE2 development board integrates the DM9000A Ethernet interface control chips, which connect to the RJ-45 external port and link to the LAN via a 100B-TX network cable. Integrating 16 Kbytes of

AddressA0, A1

WR

CS

DATA IN

D7 - D0

Address Valid

Data Valid

t AS t WR t AH

t CS t CH

t CS t CH

262

Page 272: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

SRAM in the DM9000A chips ensures a data reception and delivery cache, and decreases the packet loss rate. Designers can choose from a wide range of chip performance parameters, for example, transmission velocity of 10 or 100 Mbytes, a power supply of 3.3 or 5 V, an interface of 10 M-BaseT in UDP3, 4, 5 or 100 M-BaseTX in UDP5. Considering the network data transmission velocity and size, and system compatibility, our design uses 100 megabits per second (Mbps) transmission velocity, 3.3 V, and 100 M-BaseTX in UDP5.

Figure 7 shows the read timing. The CS# and CMD signals are the chip select and control signals. IOW is a read signal, SD is a system data signal, and IO16 is a 16-bit transmit control signal. The read signal is low after the chip select is effective. The data signal is effective after the T2 time period. At T3, IO16 is a valid low level.

Figure 7. DM9000A Read Timing

Figure 8 shows the write timing. The timing signal names are the same as in Figure 7, and the write timing is almost the same as the read timing.

Figure 8. DM9000A Write Timing

We use the jtag_uart debug interface to debug the program. The following data and configuration is shared by the CPUs: SDRAM, flash, EPCS controller, volatile and non-volatile storage data, and programs.

Human-Machine Interaction CPUFigure 9 shows the human-machine interaction CPU.

T1 T5

T2T6

T3 T4

T7 T8

CS#, CMD

ICR#

SD

IO16

T1 T5

T2T6

T3

T4

T7T8

CS#, CMD

IOW#

SD

IO16

263

Page 273: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 9. Human-Machine Interaction CPU SOPC Builder Settings

The human-machine interaction CPU consists of the touch screen input interface, VGA monitor output interface, and other parts. The touch screen controller uses the TI ADS7846 device to implement the following microcontroller signals: IRQ, DOUT, DIN, DCLK, CS, and BUSY. In Figure 9, IRQ, DOUT, and TOUCHOUT (DINL, DCLK, and CS) use the PIO interface to simulate the ADS7846 signal. See Figure 10 for the simulation sequence.

Figure 10. Touch Screen Sequence

When the touch screen changes its shape (i.e., the user clicks the touch screen), the ADS7846 device generates an interrupt signal through IRQ. After receiving the interrupt signal, the Nios II processor generates CS, sends the control word to the ADS7846 device, waits for the busy signal, and receives the touch screen coordinate signal, which finishes one read sequence.

264

Page 274: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

In practice, two coordinate signals are read to prevent false triggering. When the difference between the two signals is less than 20, the system assumes the trigger was correct. Otherwise, it considers it to be a false trigger.

The VGA monitor has 640 x 480 resolution and scans 60 frames per second. The pixel clock is 15.175 MHz, so the scan period of each point is about 40 ns. If every point is generated by the Nios II software scan, the 40-ns interval means that a Nios II CPU operating at less than 50 MHz can only execute two instructions.

According to VGA sequential scanning principles, the next scan point is predictable. So the system puts the pixel information to be sent in a line and sends it in an orderly fashion. Using the Avalon peripheral streaming mode, we implement the VGA controller with one Avalon stream. We used the DMA controller to build a DMA channel between the VGA controller with the streaming mode and SRAM. We use the hardware to read the pixel information automatically. Therefore, the Nios II CPU updates VGA images by operating the corresponding blocks in SRAM.

As shown previously in Figure 10, the dma_0, tri_state_bridge_0, sram, and user_logic_vga_controller_stream_0 signals form the VGA monitor. The SRAM connects to the Avalon bus with custom logic. The DMA read master connects to the tri-state bus (read data in SRAM); the DMA write master connects to the VGA custom logic.

As shown in Figure 11, the VGA control core with the Avalon streaming mode consists of three parts: a VGA timing generator, the FIFO buffer, and the Avalon streaming port. The VGA sequence consists of clock, horizontal and vertical synchronization (hs, vs), and color (R, G, B) signals. It complies with the VGA standard, i.e., the 640 x 480 pixels at 60 Hz. Specifically:

■ Clock frequency—25.175 Hz (pixel output frequency)

■ Line frequency—31,469 Hz

■ Field frequency—59.94 Hz (image refresh frequency per second)

The clock frequency is obtained by the 50-MHz phase-locked loops (PLLs). The line and field frequencies are obtained by frequency division of the VGA standard.

Figure 11. Avalon Streaming Mode

The FIFO buffer is mainly used as a buffer; read and write clocks are asynchronous. The Avalon streaming port generates bus interfaces and communicates with the FIFO buffer.

The jtag_uart debug interface is used to debug the program. Data and configuration shared by the dual CPUs are: SDRAM, flash, and the EPC controller, shared dual CPU cores, volatile and non-volatile memories, storage data, and programs.

Communication Between Dual CPUs■ Communication data—Users use the touch screen to enter trigger levels, trigger conditions,

storage depth, names of channels, etc., that are sent to the kernel CPU. The kernel CPU judges

FIFO

Avalon Stream Mode VGA Controller

Vga_clkHsVsRGB

ClkReset

EndofpacketReadyfordata

WriteChipselectAddress

Writedata[7:0]

Avalon S

treaming P

ort

VGASequence

Start

265

Page 275: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

whether they meet trigger conditions and transmits the waveform data of the qualified channels to the human-machine interaction CPU. The conditions and names are also sent back to the human-machine interaction CPU (a remote PC and the local VGA display must be synchronized).

■ Communication interface—When the user enter conditions via the human-machine interaction interface, the kernel CPU is notified. After setting conditions, the human-machine interaction CPU notifies the kernel CPU via a PIO interrupt. Likewise, when trigger conditions are met, the man-machine CPU needs a real-time response. The system displays the qualified waveform on the VGA terminal. The kernel CPU notifies the human-machine interaction CPU via a PIO interrupt. As shown in Figure 12, the interrupts are SET_COMPLETE_IRQ and TRIG_COMPLETE_IRQ.

■ Communication modes—The trigger level, trigger conditions, storage depth, and channel names require two-way communication. The data volume is not very large, so the chip’s mutex_ram can be used as shared memory to implement two-way communication. Both CPUs should read and write to the RAM, but not synchronously.

The waveform, bus, and protocol data only need to be transmitted from the kernel CPU to the human-machine interaction CPU. The data volume is large, so the system uses SDRAM as a shared memory. The kernel CPU writes data, and the man-machine CPU reads data, asynchronously.

Figure 12 shows hows the two CPUs communicate. CPU is the kernel CPU, CPU_IO is the man-machine CPU, SET_COMPLETE_IRQ and TRIG_COMPLETE_IRQ are the interrupt PIO, and onchip_mutex and sdram_0 are shared communication memory.

Figure 12. SOPC Builder Setting for Dual-Core Communication

The human-machine interaction CPU is specially designed for the peripheral. The touch screen input implements I/O control, while the VGA monitor is implemented via the DMA streaming mode, which is independent of the Nios II I/O subsystem. We implement the monitor by sharing memories with a lot of data communication with the kernel CPU, according to Altera-recommended design principles.

Software DesignThe software design has three parts: kernel CPU data processing, human-machine interaction CPU interface design, and the PC interface design. Figure 13 shows the software flow. The white boxes represent external interfaces, the gray boxes are the human-machine interaction CPU modules, the black boxes are the kernel CPU modules, and the light gray boxes are the FPGA logic.

266

Page 276: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Figure 13. System Software Flow

As shown in Figure 13, setting information is entered and stored in SRAM through the PC or touch screen. It enters the trigger core via the trigger core settings module. After entering the trigger core in real time, sampling data is sent to the FIFO buffer. If the sampling data meets the requirements, the data read module reads and stores the data in SDRAM. The setting information and sampling data in SDRAM sends the data to the PC via the network interface, after passing the data transformation module, and then sends it to the human-machine interaction CPU display module.

Kernel CPUThe kernel CPU software design consists of four modules: the data exchange module, protocol analyzing module, network communications module, display and synchronization refresh module, and inter-core communications module. See Figure 14.

Data Reading Module

Data Sampling Interface

Data Processing Module

Network Data Sending Module

Display ModuleDisplay

Network Data Accept Module

Triggering Core Module

Sampling Data

Sampling Data

Sampling Data

Sampling Data + Setting Information

Channel or Protocol Data + Setting Information

Channel or Protocol Data

+ Setting Information

SamplingData

Channel or Protocol Data

+ Setting Information

Channel or Protocol Data + Setting Information

Setting Information

Setting InformationSDRAM

On-ChipMemory FIFO

Setting Information

267

Page 277: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 14. Kernel CPU Modules

Data Exchange ModuleThe kernel CPU and data exchange modules are divided into a Nios II module that sends setting information, a FIFO data read module, an output module of the sampling circuit decision voltage, and an external frequency read module.

■ Nios II module that sends setting information—Two display terminals can set the logic analyzer. If the PC sends out the start signal first, it sends, via the network interface, the settings package to the kernel CPU, which stores all conditions in the shared memory. If the human-machine interaction CPU sends out the start signal, it stores setting conditions in the shared memory and sends an interrupt signal to the kernel CPU. The kernel CPU reads the setting conditions from the shared memory and writes setting conditions to the interface address for sending settings information and sends them to the trigger core and the FIFO buffer.

■ FIFO data read module—After receiving the data, the FIFO buffer tells the kernel CPU to read them using interrupts. By reading the FIFO read interface address, the kernel CPU receives the data in the 32-channel FIFO buffer and stores them. When all data is read, the FIFO buffer tells the kernel CPU to stop reading using inerrupts, and puts all data in channels.

■ Output module of the sampling circuit decision voltage—When the two display terminals set the logic analyzer, the kernel CPU receives 32 bits of data from channel 1 through channel 32. The kernel CPU first judges whether the datum is 0; if it is 0, the channel is not selected and requires no settings. The CPU changes all other values in the corresponding DAC input and writes to the MAX516 device by writing the output address of the sampling circuit decision voltage.

■ External frequency read module—The module is active only when the external clock is set. When data processing is finished, the system examines the PIO. The freoec signal is used to give the frequency measurement module enough time to finish the measurement. When freoec is low, read its value.

Protocol Analyzer ModuleDuring protocol analysis, the logic analyzer automatically sets conditions that suit each protocol analysis. If qualified signals appear, the kernel CPU reads the data to analyze the software after the trigger core is triggered.

I2C ProtocolFigure 15 shows the I2C protocol.

FIFO

DA + Comparator

PC

Protocol analyzing module

PS2 Protocol

Analyzing Module

I2C Protocol

Analyzing Module

UART Protocol

Analyzing Module

SPI Protocol

Analyzing Module

Kernel CPU

FPGALogic Setup

Module

FrequencyMeasurement

Module

Module forNios II Processor

to Send SetupInformation

FIFO DataRead Module

Sampling CircuitDecision VoltageOutput Module

ExternalFrequency

Read Module

SynchronousDisplayModule

NetworkCommunications

Module

Between-CoreCommunications

Module

Human-MachineInteraction

CPU

Data ExchangeModule

268

Page 278: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Figure 15. I2C Protocol

When SCL is high, the falling edge appears on SDA and data transmission begins. The 8-bit data sent to SDA is unchanged when SCL is high but changes at the low level. One response bit is followed after one byte is transmitted. Bytes sent in each transmission are not limited; several start signals are also allowed. The SCL signal generated after each transmission is high. A rising edge of the stop signal appears on SDA.

In the I2C protocol, the system exchanges the data and channel mask words. It leaves data on the SCL and SDA channels and sets all other bits to zero. Then, it checks the masked data start bits. When a start bit appears, it records the position of the datum and begins data detection. After detection, it records the detected data and its position. When the data is finished it completes detection. Finally, the system, examines end bits successively. If no end bit appears, it examines start bits, detects data, and records data and its position. The system repeats the process until all data is examined.

PS2 ProtocolFigure 16 shows the PS2 protocol.

Figure 16. PS2 Protocol

The maximum clock frequency is 33 kHz. All data are arranged in bytes. Each byte is one frame that contains 11 to 12 bits. The first bit is the start bit, which is always low. The second to ninth bits are 8 data bits; low levels are first and high levels follow. The tenth bit is the parity bit, which is odd. The eleventh bit is the stop bit, which is always high. The twelfth bit is the acknowledge bit, which only appears in the communication from the host to equipment. The data is locked on the falling edge of the clock signal.

In the PS2 protocol, the system exchanges the data and channel mask words. Data is left on the clock and data channels and all other bits are set to zero. The masked data examines the low level of the data channel in turn. When the data channel appears low, the system records the position and begins data detection. It records the detected data and checks whether the parity bit is correct. If the parity bit is correct and the end signal is detected, it records the detected datum and its position, and finishes detection. Finally, it detects the start bit and data, record data, and their positions. The system repeats the process until all data is examined.

DA

CL

MSB AcknowledgementSignal from Slave

AcknowledgementSignal from Receiver

Byte Complete,Interrupt within Slave

Clock Line Held Low WhileInterrupts Are Serviced

ACK ACK

P

Sr

SrorP

Stop orRepeated Start

Condition

1 2 3 - 8 998721

Start orRepeated Start

Condition

CLOCK

DATA

STA

RT

DA

TA0

DA

TA1

DA

TA2

DA

TA3

DA

TA4

DA

TA5

DA

TA6

DA

TA7

PA

RIT

Y

STO

P

269

Page 279: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

UART ProtocolFigure 17 shows the UART protocol.

Figure 17. UART Protocol

The baud rate is changeable and the number of data bits is not set. The low bit is first. The user can select the parity method and number of stop bits.

In the UART protocol, the system exchanges the data and channel mask words. Data is left on the transmit (TXd) or receive (RXd) channels and all other bits are set to zero. When the serial input signal has a level ranging from low to high, transmission begins. The system records the position of the datum. The baud rate can be set, so the data can be detected according to the relationship between the sampling frequency and set frequency and digits of data. The system verifies according to the data verification method. It confirms the position when the data ends according to the number of stop bits. It then records the detected data and positions when they end. The system repeats the process until all data is examined.

SPI ProtocolFigure 18 shows the SPI protocol.

Figure 18. SPI Protocol

Transmission begins when CS falls. During transmission, SCK generates 8 clock cycles as synchronous clocks. For the data transmission of 8 clocks between SDI and SDO, the high bit comes first and the low bit follows. Due to the difference of peripheral chips, some data is valid on the rising edge of SCK; others are valid on the falling edge. The maximum clock frequency can reach as high as 1.05 Mbps.

In the SPI protocol, the system exchanges the data and channel mask words. Data is left on the SCL, SDI, SDO, and CS channels and all other bits are set to zero. The masked data examines the low level of the CS channel. When a low appears on the CS channel, the system records the position of the datum and begins data detection. After 8 data are examined, it records them. When high appears on the CS channel, one byte transmission ends and the system records the position. The system repeats the process until all data is examined.

Network Communication ModuleThis section describes the network communication module.

Start Bit Data Bit Address Bit

Stop Bit

LSB MSB

0 1 2 3 4 5 6 7 Add

270

Page 280: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Communication Theory IntroductionIn the system, the interface transfers data between the DE2 development board and a PC. The DE2 development board provides an Ethernet RJ-45 interface, which connects the switch or hub within the LAN with twisted pair (TP) wire. In this case, all PCs (wired or wireless) connected on the LAN can achieve easy, fast, reliable communication with the DE2 development board.

Design Standard UsedTo be universal, our system uses the 10/100 Mbps Ethernet control transfer protocol. Fast Ethernet is the new standard set by the IEEE802.3 LAN committee. It is the extension of the 10 Mbps Ethernet standard, but it can transfer and receive data at 100 Mbps speed, with a transfer media of TP or fiber optic cable. Fast Ethernet is compatible with 10 Mbps Ethernet environments, and can be upgraded directly without wasting the existing hardware and software resources of a company,

Fast Ethernet supports the carrier sense multiple access/collision detection (CSMA/CD) Ethernet protocol. Its working principle is that it will detect whether the line is free when a site wants to transfer the data packet. If the line is free, it transfers immediately; if the line is busy, it detects whether the line is free again after a random time and sends the data packet until the line detected is free. If two or more sites transfer on the free fiber optic cable simultaneously, a conflict occurs. In this case, all conflicting sites stop transferring data and repeat the above procedure after a random time. This operation guarantees LAN communication reliability and stability.

Network DesignBased on the existing LAN in the office and lab, our system establishes a communication network with the DE2 development board as the central node and other display terminals as sub-nodes. All PC terminals connected on the LAN could implement the logic analyzing function after installing the logic analyzer PC interface software. Multiple labs or research teams within a LAN could share a logic analyzer, which saves research equipment costs and improves the logic analyzer efficiency. When a PC is analyzing logic, other terminals could also establish a communication connection with the DE2 board and observe the output result, i.e., the analyzed result can be displayed on different terminals. See Figure 19.

Figure 19. Network Structure

Network Structure

Legend

LegendSubtitle

Description

Switch

WirelessAccessPoint

Hub

Terminal

PDA

Laptop

LCD

New iMac

A/B Switch Box

Symbols Counting

Logic Analyzer

271

Page 281: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Protocol DesignThe protocol stack design has the following layers (see Figure 20):

■ Physical layer

■ Medium access control (MAC) layer. It transfers the MAC frame, which adopts the 802.3 Ethernet standard to implement reliable end-to-end communication.

■ Application layer. It encapsulates the application layer for data to be transferred and calls the service on the MAC layer to transfer the data.

Figure 20. Communication Protocol Stack

The 802.3 Ethernet frame structure begins with a 7-byte prefix code. The content of each byte is 10101010. The subsequent byte’s content is 10101011, indicating the start of the frame. The destination address and the source address follow. Although the standard permits 2- and 6-byte addresses, the 10 Mbps baseband network standard only allows 6-byte addresses. (If the top digit of the destination address is 0, it indicates a general address; a 1 indicates multicast address. If all digits of the destination address are 1, it indicates the broadcast address). Following is the 2-byte length field (whose value is 0 to 1,500) and the data section. If the frame’s data section is shorter than 46 bytes, a filling field is used to meet the shortest length required. The last field is the checksum, whose checking algorithm is a cyclic redundancy code (CRC). See Figure 21.

Figure 21. Frame Structure

The frame format design on the application layer is (see Figure 22):

■ Command field—This field is 1 byte long and counted from 0x00 to distinguish different data types.

■ Length field—The length field size is the length of the data section (in bytes), which makes it easy for the receiver to analyze the data section.

Application Layer

Protocol Stack

MAC Layer

Physical Layer

Prefix code SFD Source Length/Type Data Filling

46 – 1,500

FCS

Destination

Frame Check Sequence (FCS) Computing

272

Page 282: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

■ Data section—The maximum data section length is 1,496 bytes, according to the frame size of the Ethernet MAC frame.

■ Check field—The check algorithm uses the simple sum check method to guarantee the data transfer accuracy.

Figure 22. Application Layer Frame Format

Process DescriptionFigure 23 shows a single communication process.

Figure 23. Single Communication Process

Display and Synchronization Refresh ModuleThe system supports multiple display terminals, therefore, all display terminals should be synchronized when the logic analyzer operates. Users can run the system by themselves, so we added an arbitrator to the dominant CPU. When the PC section sends the setting information, the arbitrator closes the set completed interrupt in the human-machine interaction CPU, which ensures that all settings arrive at the FPGA logic correctly. When the human-machine interaction CPU issues the set completed interrupt signal, the dominant CPU reads the setting information in the public memory after closing the interface interrupt. When data processing finishes, the dominant CPU sends the setting information and data to PCs via the network. It then tells the human-machine interaction CPU to read the setting information and data with the process completed interrupt, which guarantees synchronization when the logic analyzer operates.

Inter-Kernel Communication ModuleIn the end, the dominant CPU returns the data to the two display terminals according to the origin of the setting packet. If the setting packet is sent by the PC, the dominant CPU will only send the data packet of the selected channel to the PC, and inform the human-machine interaction CPU to read back all setting information from the public memory and read data from SDRAM. If the setting information is sent by the human-machine interaction CPU, the dominant CPU sends all setting information to the PC as well as the selected data packet. Then, it informs the human-machine interaction CPU to read back all setting information from the public memory and read data from SDRAM.

Command Field Length

1 – 1,496

Checking Section

Data Section Checking Field

Network InterfaceInitialization

Get the Datato be Sent

Data Frame Format Encapsulationon Application Layer

Interface Receiving Interrupted,MAC Frame Decapsulation

Application Layer FrameFormat Decapsulation

Process theData Received

MAC Frame Encapsulation,Invoke Interface Drive to Send

273

Page 283: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Human-Machine Interaction CPUThe key function of the human-machine interaction CPU is real-time detection of the user touch screen input, accurate and in-time display on the monitor, and dominant CPU communication. Figure 24 shows the software design flow of the Nios II human-machine interaction CPU.

Figure 24. Software Flow of Human-Machine Interaction CPU

First, the CPU initializes the system and the interrupts, which includes touch screen, DMA, and trigger requirement interrupts, etc. Next, it waits for the user to input information using the touch screen. If there is touch screen input, the CPU transfers the corresponding display concurrent control word to the dominant CPU and waits for the trigger requirements to be met. When the trigger requirements are met, the trigger waveform is sent to the VGA monitor. Additional details are described below.

■ Real-time touch screen input—The touch screen input uses an interrupt trigger. The control word is sent to the touch screen controller after it receives the touch screen interrupt. Then, the user touch coordinate is read until the busy status ends. In this case, the CPU receives the user input in time to make a real-time response.

■ VGA display module—The low-level VGA driver is established with the Avalon bus streaming mode hardware. The frame refresh is implemented via a DMA interrupt, and the data refresh is implemented by writing to the corresponding area of the SRAM memory. The upper-level graphics display calls the graphic function directly to display all types of graphics and waveforms.

■ Multiple-core communication—After the user has set the trigger channel, trigger requirements, and depth via the touch screen, the human-machine interaction CPU tells the dominant CPU that the setting is complete. The information is transferred to the dominant CPU via the PIO trigger

SystemInitialization

InitializeVGA Monitor

Initialize theTouch Screen

Initialize theWaveform Display

Wait Interruption

TouchInterruption

WaveformInterruption

VGAInterruption

Refresh the DisplayAccording to the Location

of the Touch Point

Display the Waveformand Channel

Change DMARegister

274

Page 284: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

interrupt. The human-machine interaction CPU waits until the trigger requirements are met, at which point, it triggers a human-machine interaction CPU PIO interrupt. The human-machine interaction CPU reads the trigger data and sends the waveform to the VGA monitor.

PC InterfaceThis section describes the PC interface.

Platform Introduction and Design IdeaWe developed the PC interface with the Microsoft Visual C++ software version 6.0. This software is commonly used for developing desktop application software, and reduced our development time and improved efficiency. The Microsoft Foundation Class (MFC) framework supports fast GUI development. We could optimize the program interface and enhance the user experience with its strong operability and interaction performance.

The interface is analyzed, designed, and developed using object-oriented concepts. Based on the model control view (MCV) design mode, the whole system uses the Windows message mechanism, which separates data from the interface, ensuring real-time user operation and data reliability.

The PC interface communication drive uses the mature Winpcap software system to send and receive data. The free, public Winpcap software system is used for the direct network programming in Windows systems. It provides the following capabilities:

■ Captures the original data packet, whether the data packet is sent to the local PC or is a switching packet between other devices.

■ Filters the user-defined rules before the data packet is sent to the application.

■ Sends the original data packet to the network.

■ Generates network traffic statistics.

■ Combines with the design of the whole communication section to implement communication between the DE2 development board and PCs in the LAN.

Interface IntroductionFigure 25 shows the main PC interface.

275

Page 285: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 25. Main PC Interface

System Function TestThis section describes the functional tests we performed on our system.

Testing SchemeWe used a signal sent from a 51 single-chip microcomputer, Lingyang’s single-chip microcomputer, and a Philip’s ARM as the test object.

Logic Signal TestWe performed the following logic signal tests on our design.

■ Bus test—We used the 51 single-chip microcomputer’s address and data bus to measure address reads and writes. For the test, we drew one 51 testing board, enable 16 address signals, 8 data signals, 1 read signal, and 1 write signal, and connected some of them to the logic analyzer test pins. Then, we set the 16 address signals as 1 bus and the 8 data signals as 1 bus with separate read/write signals at the PC end. The storage depth is set to 512, and we use a single-level trigger, which requires a falling edge to write the signal. The trigger location is at the front. We chose a single trigger and, after reading the data, displayed the tested signal in bus mode on the PC.

■ Depth of location test—We used the same location as in the bus test and varied the storage depth (we used 32, 64, 128, 256, and 1,024). We repeated the test procedure and displayed the resulting waveform on the PC.

■ Location of trigger test—We set the storage depth to 512 and changed the trigger location from the middle to the back. We repeated the test procedure and displayed the resulting waveform on the PC.

■ Multi-level trigger test—We generated several phase shift signals using the 51 single-chip microcomputer’s I/O and P1 interfaces. We connected these signals to the logic analyzer test pin.

Menu barButtons

ChannelName

WaveformDisplay

WelcomeInterface

Skill Hints

Status Bar

ChannelWindow

System StatusOutput Window

WaveformDisplay Window

Level DisplayWindow

FrequencyDisplay

LevelDisplay

ClockDisplay

276

Page 286: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Then, we set 3 trigger levels on the LCD using the touch screen. The corresponding trigger requirements were:

● First level—signal1 rising edge

● Second level—signal2 rising edge

● Third level—signal3 rising edge

We used a 1-Kbyte storage depth and the external clock as the trigger clock. We selected a single trigger, and displayed the phase shift signal that meets the requirements set on the LCD after data is read.

■ External clock test—In this test, the 51 single-chip microcomputer, which is connected to the external clock pin of the logic analyzer, generates a clock signal. We chose the external clock as the trigger clock and displayed the waveform on the LCD.

Protocol TestWe used the following protocol tests:

■ I2C protocol analysis test—We used Lingyang’s single-chip microcomputer to generate a set of I2C signals, which we connected to the logic analyzer test pins, with the keyboard of an I2C interface. Next, we set the channel number for the SCK and SDA signals of the I2C connector on the PC. After running the test, we displayed the waveform and analyzed the protocol on the PC. To determine whether the test result was correct, we verified the results with the I2C interface chip ZLG7290.

■ PS2 protocol analysis test—We used a PS2 keyboard as the test object. We fetched the PS2 interface clock and data signals, and connected them to the logic analyzer test pin. We set the channel number for the PS2 clock and data signals using the LCD touch screen. After running the test, we displayed the waveform and analyzed the protocol on the LCD. We determined whether the test result was correct by checking the PS2 key code table.

■ UART analysis test—We used the serial port on the PC as the test object and sent a signal with the serial port tuning assistant. We fetched the serial port RXd signal, and connected it to the logic analyzer test pin. Next, we set the channel number for the RXd signal and the serial port type using the LCD touch screen. After running the test, we displayed the waveform and analyzed the protocol on the LCD. We determined whether the test result was correct by comparing the value sent by the serial port tuning assistant and the value generated by the logic analyzer.

■ SPI analysis test—We used an SPI signal, generated by the Philip’s single-chip microcomputer SPI interface, to illuminate the digital LED. We fetched the SC, SDI, SDO, and SCK SPI interface signals and connected them to the logic analyzer test pin. Next, we set the channel number for the SC, SDI, SDO, and SCK signals, and set the clock rate and signal lock-and-save mode on the PC. After testing, we displayed the waveform and analyzed the protocol on the PC. We determined whether the test result was correct by checking the segment code table of the 7-segment digital LED.

Synchronization Display TestFor this test, we connected the LCD and PC to the logic analyzer and performed the protocol tests. Both terminals displayed the results and waveforms.

Idiographic TestThis section descibes the idiographic tests. We use an oscillograph, a multi-meter, concentrators, and a PC to perform the tests.

277

Page 287: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Logic Signal TestOur testing results showed that our design functioned correctly, for all storage depths, triggering point positions, and sampling clocks. Figures 26 and 27 are the bus signal testing waveforms, which tested the 51 single-chip microcomputer’s address and data bus.

Figure 26. Bus Test Signal (PC Display)

Figure 27. Bus Test Signal (VGA Color Monitor)

Figures 28 and 29 are the waveforms for multi-level triggering signal testing. We used four-level triggering, conditioned by 1-1-0-1 serial of the CH 24 signal. The external clock was the sampling clock.

278

Page 288: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Figure 28. Shifted Test Signal (PC Display)

Figure 29. Shifted Test Signal (VGA Color Monitor)

Protocol TestThe test was successful. Figure 30 shows the I2C waveform, which was displayed on the PC screen. Figure 31 is the synchronously displayed waveform on the LCD. We pressed the 6 key on the I2C keyboard. The program lists the key codes according to I2C keyboard’s ZLG7290 chip information. The key code is 70h-1h-71h-33h-0h-ffh, which corresponds to 6.

Figure 30. I2C Protocol Test Signal (PC Display)

Figure 31. I2C Protocol Test Signal (VGA Color Monitor)

279

Page 289: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The test was successful. Figure 32 is the PS2 waveform displayed on the PC screen and Figure 33 is the waveform synchronously displayed on the LCD. We pressed the Enter key on the PS2 keyboard. 5Ah is displayed, which corresponds to Enter.

Figure 32. PS2 Protocol Test Signal (PC Display)

Figure 33. PS2 Protocol Test Signal (VGA Color Monitor)

The test was successful. Figure 34 is the serial waveform displayed on the PC screen and Figure 35 is the waveform synchronously displayed on the LCD. The serial tuning assistant delivered the data 1-2-3-4-5-6-7-8-9-0. The serial port actually transfers the data’s ASCII code. 31h-32h-33h-34h-35h-36h-37h-38h-39h-30h was displayed, which corresponds to the data sent.

Figure 34. Serial Protocol Test Signal (PC Display)

Figure 35. Serial Protocol Test Signal (VGA Color Monitor)

This test was successful. Figure 36 is the SPI waveform displayed on the PC screen and Figure 37 is the waveform synchronously displayed on the LCD. The Philips ARM device sends a one-way SDO signal and lights a numeral tube via the SPI interface. The SDI signal obtained the segmented code 3, which corresponds to 4Fh as shown.

280

Page 290: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Figure 36. SPI Protocol Test Signal (PC Display)

Figure 37. SPI Protocol Test Signal (VGA Color Monitor)

Testing Result and Functions AchievedWe drew the following conclusions from our testing:

■ This design implements the following logic analyzer functions:

● Variable storage depth

● Variable trigger point position

● Internal and external trial planting modes

● Multi-level triggering

● Bus measurement and display

■ The design has protocol analysis capability, and can analyze I2C, PS2, UART, and SPI signals. If the storage depth permits, the system can analyze all signals for these four protocols.

■ The design supports a remote display. The PC terminal correctly analyzes and displays data when connected to the logic analyzer and the concentrator.

■ Because of synchronous updates and an independent setup, the two display terminals do not interfere with each other.

281

Page 291: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design FeaturesOur design has the following features:

■ Low cost and high performance—The single chip logic analyzer is small, easy-to-carry, and cost effective, which satisfies the needs of most engineering technicians. Compared to the expensive logic analyzers currently on the market, this system provides excellent performance at a lower cost.

■ Portability—The single-chip FPGA and Nios II processor can implement data collection, analysis, storage, control, and transfer, which allows the logic analyzer to migrate from large a desktop to small handsets. Additionally, it provides portable terminals suitable for working outdoors.

■ Remote display and multi-user collaboration—Ethernet technology gives the logic analyzer remote control and display capability. Data transfer rates in high-speed LANs improves the system’s real-time performance. Using a network, many users can share the same device for cooperative tuning and analysis.

■ User input supported via touch screen—The system uses a touch screen, instead of a mouse and keyboard, to receive the user’s input. The portable terminal greatly simplifies the logic analyzer’s operation.

■ Real-time voice accounting—The system broadcasts the protocol data while performing its analysis, which helps users cooperate when running the machine and sharing the resources.

■ Interface variety—The VGA and network interface can both control the logic analyzer. Therefore, the system provides portable and remote terminals that can work for various situations.

■ Comparison and analysis of the logic analyzer—Traditionally, the logic analyzer can only analyze timing and state. This system, using the processing capability of the Nios II processor and the FPGA hardware’s flexibility, can analyze timing and state as well as varied logic.

■ System upgrades—Combining the Nios II processor and FPGA allows this system to improve hardware functionality and implement software updates.

■ Embedded processor technology—The system uses the FPGA to collect and analyze data and process it in parallel. The Nios II processor reads and writes to the external peripherals, which speeds processing and ensures real-time operation. Custom external peripherals, connected with SOPC Builder, increase the design’s flexibility.

ConclusionBy participating in Altera’s SOPC embedded processor competition, we obtained basic knowledge of FPGAs, SOPC concepts, and the Nios II processor, which changed the way we thought about FPGAs. SOPC concepts improve upon the FPGA functions (e.g., parallel operation, high speed, logic, simple timing, and complex operations) by integrating an external control unit (such as a single-chip microcomputer). The designer can easily add or remove external peripherals and interfaces from the Nios II processor, making a seamless interface between processor and hardware logic. We can abandon the traditional design concept that design alteration causes hardware modification, streamlining the design process. The Nios II processor decreases problems with hardware threading and ensures system stability.

SOPC concepts and the Nios II processor have totally overturned the traditional embedded system. Instead of passively adapting to the hardware processor, we can customize both hardware and software. The Nios II processor improves upon the embedded system’s hardware circuit with its simplicity, efficiency, and understandability. Collaborative development of software and hardware enables the

282

Page 292: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

synchronization of FPGA logic development and Nios II program development in a single FPGA, which greatly increases design efficiency.

The logic resources and external peripheral resources, as well as intellectual property (IP) cores provided with Altera’s latest DE2 development board is very helpful for function expansion. Using the development board, FPGA, Nios II processor’s collaborative development, and the dual CPUs, our system has better performance.

We were able to implement dual-CPUs and remote network control for the competition, but it is a pity we did not transplant the embedded operating system into the Nios II processor as well. Performing this task would have been difficult, because network communication would be implemented with the MAC protocol. However, implementing the embedded operating system in the Nios II processor would facilitate networking, documentation, and GUI design. We plan to work on this aspect of the design in the future.

SOPC design concepts, the Nios II processor development method, Quartus II development tool, SOPC Builder, and Nios II Integrated Development Environment (IDE) provide us with the system design. Specifically, we can embed a real-time operating system in the Nios II processor to strengthen theCPU’s function. We think that SOPC, a programmable system-on-chip, will have a rosy future and be more widely used in people’s lives.

AcknowledgmentsWe would like to express our gratitude to the Altera 2006 Undergraduate Electronic Design Competition SOPC organization committee in the Hubei province, and Altera (China). Without your efforts, we would not have the opportunity to learn new technology, expand our horizons, and inspire our sense of innovation. We also wish to thank Huazhong University of Science and Technology and its Electric Engineering and Electronic Innovation Center for the support, environment, and instruction they provide.

ReferencesPeng Chenglian. Challenge SOC-Nios Based SOPC Design and Practice. Beijing, Tsinghua University Press (2004).

Zeng Fanta. Survey of EDA Engineering. Tsinghua University Press (2002).

Chu Zhenyong. FPGA Design and Application. XiDian University Press (2002).

Wang Jianxiao, Wei Jianguo. SOPC Preliminary Design and Practice. XiDian University Press (2006).

Pan Song, Huang Jiye, Zeng Yu. Practical Course of SOPC Technology. Tsinghua University Press (2005).

Ren Aifeng, Chu Xiuqin, Chang Cun, Sun Xiaozi. FPGA Based Embedded System Design. XiDian University Press (2004).

Kairus J., Forsten J., Tommiska M., and Skytta J. Bridging the gap between future software and hardware engineers: a case study using the Nios softcore processor Signal Process. Labratory, Helsinki University of Technology, Espoo, 33rd Annual Finland Frontiers in Education, (2003).

Du Huimin. Verilog Based FPGA Designing Basis. XiDian University Press (2006).

Luthra M. Sumit Gupta Nikil Dutt Rajesh Gupta Nicolau A. Interface synthesis using memory mapping for an FPGA platform. USA Computer Design (2003).

Wang Jianxiao. SOPC Preliminary Design and Practice. XiDian University Press (2006).

283

Page 293: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Member of China Computer Federation Microcomputer Society. The 5th National Embedded System Academic Exchange Symposium in 2004. China Computer Federation Microcomputer Society (2006).

Ben Atitallah, A. Kadionik, P. GHozzi, and F. Nouel P. Masmoudi N. Marchegay Ph. Hardware platform design for real-time video applications. Tunisia Microelectronics, ICM 2004 Proceedings (2004).

Altera Nios II and other product documentation.

284

Page 294: Nios II Embedded Processor Design Contest - Outstanding - Altera

Black Box for Robot Manipulation

Second Prize

Black Box for Robot Manipulation

Institution: Hanyang University, Seoul National University, Yonsei University

Participants: Kim Hyong Jun, Ahn Ho Seok, Baek Young Min, Sa In Kyu

Design Introduction Today’s robot manipulators need ever more precise control capability. There are two important prerequisites for precise control: the number of fingers in the manipulator and accessable hardware/ software control. In the first case, the number of system I/O pins is determined by the number of fingers that must be controlled. In the second case, the developer should be able to easily design the system, taking into account the system requirements.

Conventional systems use several CPUs because it is difficult to control all motors in the system using a single CPU. Each CPU has control over a servo motor, and the CPUs communicate with each other to simultaneously control the manipulator. This methodology can have problems with unsynchronized movements or time delays. In contrast, FPGAs have many I/O pins, and the developer can design the system to fit in one FPGA. This FPGA can then control all of the motors by distributing the servo-motor dedicated processors. We chose to use an Altera Cyclone® II FPGA, which gave us easy, natural control over the manipulator. Our design implemented the Nios® II processor with custom instructions, which helped achieve faster processing capability and precise system control, providing a robust FPGA solution. Our ultimate goal was to develop the control module using this hardware/software strategy.

The robot manipulator has two arms and uses the Development and Education (DE2) board, which contains the Altera Cyclone II FPGA. The application used the μC/OS-II RTOS and custom instructions. Communication to and from this control module is performed via an external Ethernet link.

Function DescriptionIn a hardware/software co-designed system, the interaction of the software and hardware is critical. The hardware logic is implemented in an FPGA and tends to be fast and robust. The software logic is comparatively slow due to its sequential flow, instructions, and operand fetch stages that involve read/write memory access. Software, however, is significantly easier to develop and modify because it resides in memory and not in the dedicated gates of the switch fabric.

285

Page 295: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The black box for robot manipulation (BRM) operates via a standalone controller, which controls the DC and servo motors, and has communications ability via Ethernet and serial peripheral interface (SPI) connections. Our goal was to make a high-performance, usable system. The project involved two phases.

■ In phase one, we developed and operated the manipulator like a microcontroller unit (MCU) robot system. Hardware implemention included a DC motor controller and encoder with programmable I/O (PIO), a servo motor controller with a pulse width modulator (PWM), and a hand module controller using the SPI master peripheral, and an Ethernet peripheral with the dm9000 Ethernet device.

In the software phase, we used the μC/OS-II RTOS running on the Nios II processor. We created the design using SOPC Builder, and implemented a lightweight TCP/IP (LWIP) stack, and defined the Ethernet and SPI communication packet, and added controls for the DC motor, DC motor encoder, and servo motor at the RTOS task level.

■ In phase two, we planned to use PWM generator and pid calculation using the Nios II C-to-Hardware Acceleration Compiler (C2H Compiler) and custom instructions. An important issue is whether the project can be used for home and space automation.

Performance ParametersWe used the Cyclone II FPGA on the Altera DE2 board, and ran the μC/OS-II RTOS on the Nios II processor. With this setup, it took one second to send a command four times to all eight DC motors, and one servo motor. It took an additional one second for SPI communication with the hand module and external Ethernet device. If a command is sent more than four times, the BRM system issues a fault indication.

Design ArchitectureFigure 1 shows the BRM’s hardware block diagram.

286

Page 296: Nios II Embedded Processor Design Contest - Outstanding - Altera

Black Box for Robot Manipulation

Figure 1. Robot Manipulator Block Diagram

The external device, which is controlled by the BRM, has five components:

■ DC motor

■ DC motor encoder

■ Servo motor

■ Hand module with SPI communication

■ An Ethernet device (dm9000)

Three joints in the arm use a DC motor. One wrist joint uses a servo motor. The whole system has eight joints that use DC motors, and two that use servo motors. At the end of the arm is the hand module. The AVR MCU controls it with 4 DC motors and four servo motors via SPI slave communication. Figure 2 shows a computer-aided design (CAD) drawing of the robot arm.

Robot Arm

DC Motors: 4Servo Motor: 1

Total Actuators

DC Motors: 6Servo Motors: 22

Robot Arm Robot Arm

Robot Arm

DC Motors: 4Servo Motor: 1

RobotHand

RobotHand

Robot Hand

Ready to ReceiveSPI PacketUsing AVR

Robot Hand

Ready to ReceiveSPI PacketUsing AVR

Black Box for Robot Manipulation

PIO, PWM,Encoder, SPI

Signal

PIO, PWM,Encoder, SPI

Signal

Altera DE2 BoardUsing Cyclone II FPGA

& Circuit

Main Robot Control Scheduler(PC Application)

Ethernet

287

Page 297: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 2. Robot Arm CAD Drawing

Figure 3 shows the software system block diagram.

Figure 3. System Block Diagram

The software does not have a general device driver. To maintain consistency with the RTOS, we assigned the task to the I/O subsystem. Figure 4 shows the software flow chart.

DC MotorControl Task

PIO

DC MotorLeft: 4

Right: 4

Servo MotorControl Task

PWM Generator

Servo MotorLeft: 1

Right: 1

Other DeviceControl Task

SPI Master

Hand ModuleLeft: 1

Right: 1

LWIP for μC/OS-II

dm9000

External ControlUnit (Robot

Main Scheduler)

External PacketManager

μC/OS-II Application

μC/OS-II for Nios II Processor

Nios II Processor

288

Page 298: Nios II Embedded Processor Design Contest - Outstanding - Altera

Black Box for Robot Manipulation

Figure 4. Software Flow Chart

Design DescriptionWe used the Altera DE2 board, which has a Cyclone II device, to implement the design. The main clock frequency is 100 MHz, and we used a total of 39 external general-purpose I/O (GPIO) pins for the DC motor, DC motor encoder, servo motor, and the hand control of the SPI master. We modified the DE2_WEB source code, and did not change the name of the top-level design. The design has the following modules:

■ DC_MOTOR_CONTROL—We implemented this module using the PIO output-only peripheral. It has a 32-bit data register, which we needed because one DC motor requires two signals (positive and negative).

■ DC_MOTOR_ENCODER—We implemented this module using the PIO input-only peripheral. It catches the rising edge of the encoder pulse, and it has a 32-bit data register. The DC motor encoder requires two signals (for the forward and reverse motor states).

Initialize DC Motor(Using Limit Switch)

DC Motor Task

DC Motor Task Creation

Servo Motor Task

Servo Motor Task Creation

Hand Control Task

Hand Control Task Creation

Ethernet Packet Manager

Ethernet Task Creation

BRMManager Task

BRM ManagerTask Creation

out: Hand Command for SPI Packetin: Sensor Value for SPI Packet(Using SPI Master Peripheral)

out: Hand Sensor Value & BRM State (Ready, Processing, Stopped)in: BRM Command (#n Gesture)(Using DM9000 Peripheral)

out: 8 PWM Signalsin: Encoder Edge Counter(Using PIO Peripheral)

out: 2 PWM Signals(Using PWM Generator Peripheral)

Processing:1. After Initialization, Wait to Get EthernetCommand Packet from Main Control System.2. If Get Command Packet, Parse the Packet & Pick the Arm & Hand Command.3. Command DC & Servo & Make Arm Action.4. Command Hand & Make Hand Action.5. Maintain DC & Servo until Hand ModuleResponds (OK, Error).

289

Page 299: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ SERVO_MOTOR_CONTROL—This module uses the PIO output-only peripheral. It has a two-bit data register because one servo motor requires one PWM signal. The operating system (OS) task generates the PWM pulse.

■ HAND_CONTROL—The module uses the SPI master peripheral. The hand module acts in SPI slave mode. It has a clock of 11/128 MHz.

We built the Nios II processor using SOPC Builder, as shown in Figure 5.

Figure 5. Nios II implementation in SOPC Builder

Figure 6 shows the Quartus® II-generated project summary.

290

Page 300: Nios II Embedded Processor Design Contest - Outstanding - Altera

Black Box for Robot Manipulation

Figure 6. Nios II Implementation in the Quartus II Software

Figures 7 and 8 show the circuit and hardware that are required to connect the external devices, such as the DC motor, encoder, and SPI slave module.

Figure 7. Completed Circuit

291

Page 301: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 8. DC Motor Driver and Power Circuit

The DC motor driver circuit has four L298 driver devices, and the power circuit has uses the LM2576-3.3 device, which operates at 5 volts. Figure 9 shows the signal divider circuit and external DE2 header.

Figure 9. Signal Divider Circuit and DE2 External GPIO Header

Figure 10 shows our testbench system.

292

Page 302: Nios II Embedded Processor Design Contest - Outstanding - Altera

Black Box for Robot Manipulation

Figure 10. Combination with Our Testbench Robot System

Design FeaturesOur design has the following features:

■ Processor—Nios II processor

■ Operating System—μC/OS-II RTOS using LWIP

■ Interfaces—SPI and Ethernet communication

■ I/O Signal—Motor control and sensing (34 I/O pins)

We developed our robot manipulator for robust control. The software generates the signals required for elaborate control while the Altera Cyclone II FPGA solves time delay problems. With this combination, the system performs well, with all of the parts working together. Furthermore, this implementation solves the synchronization problems experienced by the conventional manipulation method, which uses many CPUs to control many motors. Our control module is easy to control externally, because we use I2C to control orders from external devices. Additionally, we implemented the robot’s operating system easily by using custom instructions with the Nios II processor.

ConclusionIt was very difficult to adapt the FPGA system to the robot embedded system. To design the robot system with an FPGA, we divided the project into two parts: developing the MCU-based model and developing the FPGA-based model. Changing from MCU-based development to FPGA-based development will have a huge impact on the industry design flow. Originally, we considered using the ARM microprocessor to control the robot. However, after using the FPGA-based system, we found that we prefered using the FPGA to using ARM or an MCU. So, the design worked better than we expected. The Nios II processor, Quartus II software, SOPC Builder, and Nios II Integrated Development Environment (IDE) made it easy for us to develop our design, showing the usefulness of these tools in the design process.

293

Page 303: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

294

Page 304: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Third Prize

Digital Oscillograph

Institution: Department of Microwave Engineering Air Force Radar Academy

Participants: Hui Wu, Zhi-Xiong Deng, and Li-Hua Guo

Instructor: Yao-Jun Chen

Design IntroductionThe digital storage oscillograph uses a microprocessor for control and data proccesing. It performs a variety of functions that simulation oscillographs cannot, such as leading triggering, combined triggering, burr capture, wave processing, hard-copy export, soft-disk recording, and long-term waveform storage. Typically, the digital storage oscillograph’s bandwidth exceeds 1 GHz and the performance is higher than simulation oscillographs in many aspects. With high performance and a large application scope, developing a good digital storage oscillograph is important.

We implemented a digital oscillograph design using the Nios® II processor and an FPGA (a combination of software and hardware), operating on the Development and Education (DE2) board. The system has three parts:

■ External expanded circuit—Converts from simulation signals to digital signals.

■ On-chip programmable part—Controls the signal sampling storage, measures the frequency, and generates a self-checking signal.

■ Embedded processor—Provides system and interface control.

Figure 1 shows the overall system structure. The input signal voltage is adjusted based on the collection scope of the condition circuit sampling. An analog-to-digital (A/D) device converts the signal from analog to digital. The system, running on the FPGA, stores the signal into the board’s dual-port RAM. The Nios II processor reads data from the RAM and sends it to the LCD to display the resulting waveform.

295

Page 305: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. System Structure

Function DescriptionOur design has the following functionality:

■ Implements high-speed data sampling. The sampling signals can be obtained by dividing the internal system’s frequency. The frequency can be changed by adjusting the frequency division coefficient, which changes the display’s horizontal sensitivity.

■ Supports programmable attenuation and amplification of the input signal, allowing the system to change the display’s vertical sensitivity.

■ Saves waveform data in SDRAM or flash for the system to show at a later time.

■ Implements hardware-level equal precision frequency measurement.

■ Uses the fast Fourier transform (FFT) algorithm with a time domain extraction method base 2 to analyze the tested signals’ frequency spectrum.

■ Supports communication between the DE2 board’s serial communication interface and a computer so that the data can be sent to the computer and displayed after it is processed.

■ Writes data to a secure digital (SD) card using the SD card interface on the DE2 board, which protects the data and allows it to be displayed in other regions.

■ Generates unformed or edited waveform signals using the direct digital synthesis (DDS) frequency synthesizer method. These signals can replace those generated by the signal source and checks whether the system is functioning if there is no signal source.

Performance ParametersOur design has the following performance:

■ Display—Dual-channel waveform display.

■ Bandwidth—About 500 kHz.

■ Maximum real-time sampling frequency—16 MHz.

Dual-PortRAM

CH1

CH2

A/D

A/D

SDCard Flash

SerialPort

LCD

Nios IIProcessor

SamplingStorageControl

TriggeringSignal Generation Operation

Panel

FPGA

ChannelSignal

Conditioning

296

Page 306: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

■ Storage depth—512 points.

■ Testing voltage scope—0 to +15 V.

■ Testing frequency scope—7 Hz to 500 KHz.

■ Vertical sensitivity—The minimum vertical sensitivity is up to 10 mV/div and the vertical sensitivity scope is 10 mV/div to 2.5 V/div.

■ Horizontal sensitivity—The minimum horizontal sensitivity is up to 1 μs/div and the horizontal sensitivity scope is 1 μs/div to 500 ms/div.

■ Available trigger methods—Edge and level triggering.

Design ArchitectureThis section describes the hardware and software design architecture.

Hardware DesignThe following sections provide a detailed description of the hardware design.

Conditioning CircuitThe conditioning circuit modulates the input signal to the scope, making it suitable for A/D conversion. The A/D reference is 2 V, and 8 grid lines on the oscilloscope equate to 2 V, or 0.25 V/div. Table 1 shows the amplification factor of the vertical scanning sensitivity per gear. The amplification factor refers to the percentage of the conditioning circuit’s output voltage and input voltage.

As shown in Figure 2, the conditioning circuit is structured from attenuation to amplification.

Table 1. Relationship between Vertical Sensitivity and Amplification Factor

Sensitivity or Amplification RelationshipVertical sensitivity mV/div 10 20 30 40 50 60

Amplification factor 25 12.5 8.33 6.25 5 4.17

Vertical sensitivity mV/div 80 100 150 200 300 400

Amplification factor 3.125 2.5 1.67 1.25 0.83 0.625

Vertical sensitivity mV/div 500 600 800 1,250 2,500

Amplification factor 0.5 0.42 0.31 0.2 0.1

297

Page 307: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 2. Conditioning Circuit Diagram

The attenuation part consists of the 8-bit digital-to-analog (D/A) converter (DAC) DAC0832, and the D/A output voltage is:

V0= VREF x DIN / 256

where VREF is the input voltage and DIN is the input from the Nios II processor. As DIN varies, the system changes the attenuation factor, which affects the amplification factor of the conditioning circuit and the vertical sensitivity.

The amplification part consists of the meter amplifier, AD620. The amplification circuit’s gain is:

G = 1 + 49.4 kΩ/Rg = 25.7

where Rg = 2 kΩ is the fixed feedback resistor’s value. The AD620 device’s gain bandwidth product is 12 MHz and the gain bandwidth is 466.9 kHz; therefore, the system’s testing signal frequency cannot exceed 466.9 kHz. If it exceeds this threshold, the AD620 device can still display the waveform and frequency, but it cannot display the correct amplitude value due to the signal amplitude attenuation after passing through the AD620 device. To solve this problem, we could substitute another high-speed component for the AD620 device.

Sampling CircuitWe used a 240 x 128 LCD as the external display. After it is divided into a grid using 10 horizontal and 8 vertical lines, the LCD has 16 x 16 pixel display sections.

The analog signal sampling circuit has an analog to digital (A/D) converter (ADC), the TLC5510 device. The maximum sampling frequency of this device is 20 MHz and it has a maximum real-time sampling frequency of 16 MHz. The TLC5510 is an 8-bit A/D chip with a conversion scope of 0x00 to 0xFF that corresponds to 128 points. We deal with the data from the TLC5510 device using a divide by 2, i.e., we pick the higher 7 bits from the TLC5510 device. Because each bit has 16 points, we can use the following formula to obtain the sampling frequency:

fs = 16/TCELL

TCELL demonstrates the time on the LCD’s horizontal grid, as it relates to the time gap (horizontal sensitivity). The horizontal sensitivity is designed as 1 μs/div to about 500 ms/div, or 18 gears

DAC0832

Data Line

Data Line

Output

1

2

3

4 5

6

7

8

2K

AD620

+12 V

104

104

104

+12 V

-12 V

2

3

LF3567

6

4

104

-12 V

12 V

104

+5 V

1 2 3 4 5 6 7 8 9 10

20 19 18 17 16 15 14 13 12 11

100p

1K

-12 V104

42

3

7

+12 V

6

LF356

+

- +

-

104

298

Page 308: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

altogether. According to the above formula, we can determine the sampling frequency. Table 2 shows a portion of the sampling frequency.

A CPU provides the TLC5510 device’s sampling signal, and the sampling frequency is obtained by the programmable frequency divider that divides the system’s reference frequency. The divider has preset values that change the frequency division times, which converts the sampling signal to different frequencies. Various sampling signal frequencies lead to various times represented by each horizontal grid line, which in turn varies the time gap (horizontal sensitivity).

Waveform Shaping CircuitTo generate the edge trigger’s rising edge while controlling the sampling signal storage, the system must generate a stable rectangular wave. For this operation, the system uses a hysteresis loop comparer provided by the LM311 device. Figure 3 shows the waveform shaping circuit diagram.

Figure 3. Shaping Circuit Diagram

The inverting input terminal inputs the signal. When it is larger than zero, the output is at a low level; when the input passes through the zero point from high to low, the output immediately bounces to a high level. When the input passes through the zero point from low to high, the output remains at a high level because the LM311 device’s inverting input terminal voltage is lower than that of the non-inverting input terminal. The output is not shifted to a low level until the inverting input terminal voltage is higher than the non-inverting input terminal. Thus, the output signal’s rising edge remains on as soon as the input signal passes through the zero point from high to low. Figure 4 shows this process. The system uses this signal for storage control and edge triggering.

Figure 4. Hysteresis Loop Comparer

Table 2. Relationship between Horizontal Sensitivity and Sampling Frequency

Sampling MeasurementTCELL 1 μs 2 μs 5 μs 10 μs 20 μs 50 μs 100 μs

Sampling fs/kHz 16,000 8,000 3,200 1,600 800 320 160

104

+

-

1K

3K+12 V

5.1K

3K

3.3 V

7

14

2

3

104+12 V

LM311

5 6

299

Page 309: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Storage CircuitThe sampling data is saved into dual-port RAM on the DE2 board. The address counter is synchronized by the sampling clock when the data is saved. The counter provides a write address for the dual-port RAM. When the address changes, the data is saved to the dual-port RAM in order. After 512 bytes of data are saved, the address counter returns to zero and stops counting. When the next trigger signal comes, it begins to count from the beginning and the data is saved from the beginning. With dual-port RAM, the system can read and write simultaneously. The Nios II processor generates the read address and has no connection to the address counter used by the writing address, which separates the read and write addresses and avoids errors.

Equal Precision Frequency MeasurementTraditionally, there are two methods for measurement:

■ Frequency measurement—This method records the tested signal’s change cycles (or pulse numbers) NX within the specified gate time TW. The frequency of the tested signal is fX = NX/TW.

■ Cycle measurement—Assuming that the standard signal’s frequency is fs, this method records the standard frequency’s cycles NS in a cycle TX of the tested signal. The frequency of the tested signal is fX = fS/NS.

The results of the two methods are different by ±1 word, and the testing accuracy is related to the counter’s record NX (number of pulses) or NS (number of frequency cycles). To ensure testing accuracy, the low-frequency signal usually uses cycle measurement and the high-frequency signal uses frequency measurement. In contrast, our design uses the equal precision frequency measurement to facilitate the tests.

The gate time of the equal precision frequency measurement is not a fixed value. Instead, it is integer multiples of the tested signal cycles that are synchronous to the tested signal. In this way, the measurement eliminates the ±1 word error that results from counting the tested signal and achieves the equal precision measurement for the whole testing frequency band. See Figure 5.

Figure 5. Equal Precision Frequency Measurement

During measurement, two counters work simultaneously to count the standard signal and the tested signal. When the gate starting signal is given (i.e., the rising edge of the preset gate), the counter does not count until the rising edge of the tested signal. When the preset gate closing signal (falling edge) arrives, the counter stops counting at the rising edge of the tested signal to complete the measurement. The actual gate time, T, is not strictly equal to the preset gate time T1; the difference is no more than a cycle of the tested signal. Provided in actual gate time T, the counter’s record of the tested signal is NX, the record of the standard signal is NS, and the standard signal’s frequency is fs. Thus, the tested signal’s frequency, fs, is fS = NX x fS/NS.

Preset Gate

Actual Gate

Reference Signal

Tested Signal

T

T1

Ns

Nx

300

Page 310: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

This formula shows that the NX error is zero, because the gate time is integer multiples of the tested signal’s cycle. That is, the comparative error of the frequency measurement has no connection with the frequency of the tested signal. Rather, it is only relevant to the gate time and standard signal’s frequency. The longer the gate time, the higher the standard frequency and the smaller the comparative error of the frequency measurement. Figure 6 shows the block diagram of the equal precision frequency measurement.

Figure 6. Equal Precision Frequency Measurement Hardware Block Diagram

The block diagram has four modules:

■ fen_pin_u divides the frequency of the reference clock inclock, and gets the square wave signal outclock with a 50% duty cycle (in which the b input stands for enable, low order, may not receive).

■ jishuqi counts the outclock coming out of fen_pin_u. If the record is greater than or equal to the preset value M[31..0], then jishuqi sends a signal to chufa while the counter stops counting and waits for the clr and en signals to begin the next counting cycle.

■ The chufa module’s clr signal receives the qout output generated by jishuqi to determine whether the chufa block’s output q is high or low, generating the gate signal. The gate signal is used as the gate signal of clk_cnt.

■ The clk_cnt block tests the frequency. The system delivers the clr signal to let the two 32-bit counters perform zero clearing. When the gate is high, the BZ_ENA and DC_ENA enables are 1s simultaneously (BZ_ENA controls DC_ENA to make DC_ENA consistent with it). Next, the block starts BZ_Counter and DC_Counter to make the system enter the count permission period. At this moment, BZ_Counter and DC_Counter independently count the tested signal and the standard frequency signal (inclock signal). After TC seconds of the gate time width, the Nios II system sets the preset gate signal low. The two 32-bit counters are not shut off by the D trigger until the following tested signal’s rising edge. At this time, the error caused by changing the gate time width TC is, at most, a clock cycle of the reference clock inclock signal. Because inclock

FrequencyDivision Part

ReferenceFrequency

Counter

Gate Signal

FrequencyMeasurement

Part

TriggeringPart

TestedFrequency

301

Page 311: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

is sent by the highly stable 50-MHz crystal oscillator, the absolute measurement error at any moment is only 1/50000000 seconds, which is also the main error of the system.

Assuming that within a preset gate time TC, the count value of the tested signal is NX, the value of the standard frequency signal is NB, and the frequency of the tested signal is XCLK. According to the equal precision frequency measurement, the frequency of the tested signal is:

XCLK = NB x bclk/NX

Figure 7 shows the timing simulation.

Figure 7. Equal Precision Frequency Measurement Timing Simulation

Equal Time Effectiveness MeasurementWhen the frequency of the tested signal is less than 200 kHz, the TLC5510 ADC sampling frequency can be up to 3.2 MHz at most. At this frequency, it does not severely interfere with the other components. However, if the sampling frequency is too high (more than 3.2 MHz), it causes very severe interference with the components and affects the waveform data’s collection and display. Therefore, the system’s testing frequency cannot be more than 200 kHz. To increase the system’s tested frequency, we use the equal time effectiveness measurement.

The equal time effectiveness measurement measures once every N cycles, and the time delay is TX/M, where TX is the cycle of the tested signal and M is the points collected in a cycle. Thus, when the tested signal’s frequency is very high, the data can still be collected correctly. Figure 8 shows the schematic.

302

Page 312: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 8. Equal Time Effectiveness Measurement Schematic

As shown in Figure 8, the input signal is first converted into a square wave signal. Later, a pulse signal is generated every N cycles, and these signals move backward by a small variable, which generates a low-frequency signal that is sent to the TLC5510 device as the sampling signal. The low frequency of this signal avoids the interference caused by a high sampling frequency. Figure 9 shows the hardware implementation.

At theInterval of N

Cycles Input Signal

Set Interval Cycle

Remain at N*M HighLevel after Falling

Remain at N*M LowLevel after Rising

Move N*M

303

Page 313: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 9. Equal Time Effectiveness Measurement Block Diagram

The block diagram has five modules: di, cont, maic, maic_1, and mm. d is the tested signal, enable is the enabling signal, and n[3..0] is the cycle interval, which can be 4 or a number larger than 4 (which will be explained later in this paper). When enable is low (efficient), the counter inside the di module begins to count. When it counts to N, the output signal q outputs a pulse as wide as two cycles of the input signal d. The signal q is given to the cont module. In the cont module, set[7..0] are the points to collect, clk is the reference clock, and m[7..0] is the collection offset m. m[7..0] can be calculated as m[7..0] = TX x 50000000/set[7..0].

TX (the tested signal cycle) can be obtained as follows:

At the cont block’s rising edge of d, the system initiates the internal counters count and count1. The counters begin counting, with count performing an add 1 operation, and count1 adding m. That is, when a pulse arrives at d, count adds 1 and count1 adds m until count reaches the collected points set[7..0]. After counting, count and count1 are set to zero so they are ready for the next counting cycle. We make the count1 value the output and send q[7..0] to maic and maic_1. maic postpones the rising edge of the di output q. The cont output, q[7..0], references the pulse time with the falling edge remaining unchanged.

When a rising edge arrives at the maic input, ds, it initiates the internal counter and its output, q, is at low level. When the count reaches q[7..0], the output is high and ds is high in the next cycle. The next work cycle begins after the rising edge of the next ds. maic_1 operates like maic, except it postpones the falling edge of ds until the q[7..0] reference clock cycle. The rising edge remains unchanged and the operation is the same as maic, except the counter starts on the falling edge of ds. mm integrates the maic and maic_1 signals. The maic output becomes the clock signal for mm and the maic_1 output becomes the clear signal for mm. When clr is low, the output is zero; at the clk rising edge, if clr is high, the output is clr. According to the previously shown formula for m[7..0], the final pulses of the di_1 signal’s output q signal are very narrow. To be detected by the system and while ensuring the hat the output signal’s duty cycle is 50%, the q signal’s pulse is widened by occupying two cycles of the tested signal. Therefore, N is a number equal to or bigger than 4.

Figure 10 shows the simulation waveform. qz moves away from out bit by bit, and each move has the same size as the previous move.

304

Page 314: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 10. Equal Time Effectiveness Measurement Simulation Waveform

Edge and Level TriggeringIf there is no trigger signal, the TLC5510 device samples the input signal. Because it can only sample 512 points at a time, it samples the signal again after 512 points. The TLC5510 sampling frequency is not an integral multiple of the input signal, which changes the sampling start position and makes the waveform display unstable. To avoid this problem, we need to synchronize the process by generating a trigger signal. Before the input signal reaches a set value, the trigger circuit generates a trigger signal, which tells the TLC5510 device to begin sampling. After sampling 512 points, TLC5510 stops sampling and waits for the next trigger signal.

The design has two trigger modes:

■ Edge triggering—The system uses a rectangular wave from the shaping circuit as the trigger signal. The positive edge is effective and stable when exceeding the zero point position. Therefore, it makes an effective trigger signal but cannot be controlled flexibly.

■ Level triggering—With level triggering, the Nios II processor sets a level. If the input signal is higher than the set level, the system is triggered to store data. Because we can set the trigger level using stsrem, this process is more flexible than edge triggering.

Figure 11 shows the hardware diagram for the two triggering modes.

Figure 11. Level and Edge Triggering

305

Page 315: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

This module has the following input signals.

■ SEL—Selects the trigger mode.

■ start—Used for level triggering.

■ EDG_START—Used for edge triggering mode.

■ datain[7..0]—Provides the tested signal data obtained by sampling.

■ m[7..0]—Provides the self-defined triggering level.

■ clk—Clock.

When datain[7..0] is greater than or equal to m[7..0], the q output of the bijiaoqi block is high, otherwise it is low. Using a complementer, SEL is changed into the channel-1 signal, which is opposite the original signal. The module performs AND operations with the chufa q output signal above and below, and performs an OR on the result to perform selection.

The ad_zhuan counter stores the result. The Nios II system reads the result and displays it on the LCD according to the address. AND3 generates a clock that serves as the ad_zhuan module’s system clock. When the OR2 output from the ad_zhuan block’s b signal is low, m is low and the address[8..0] output is 0. When b is high and the addr counter reaches a set value, addr does not change and m is high. In the reverse process, m performs an AND operation with the clk signal and the OR2 output. The result is low, which stops ad_zhuan operation. If the counter does not reach the set value, m is 0 and the addr interval counter automatically adds a 1 when it receives a clock cycle.

Figure 12 shows the level and edge triggering emulation.

Figure 12. Level and Edge Triggering Emulation

The datain signal may exceed the defined 98 value because datain[1..0] in bijiaoqi is evaluated as 0 in two bits to avoid sampling data jittering and waveform reversion.

Control PanelThe system sets multiple keys. When the user presses a key, the Nios II processor performs the requested operation, such as changing the time gap (horizontal sensitivity), changing the voltage gap (vertical resolution), data storage, or playback. Because the keys on the DE2 board cannot meet our system’s control requirements, we designed our system to use an additional 15 keys. See Figure 13 and Table 3.

306

Page 316: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 13. Control Keys

Table 3. Digital Oscillograph Control Keys

Key DescriptionMeasure Displays the measurement menu.

Cursor Displays the cursor.

Trigger Displays the triggering mode menu.

Save/load Displays the save and load menu.

FFT Displays the frequency spectrum graphics menu.

Shift Allows the user to select multiple functions.

fs+/move-up Improves the sampling frequency or moves the displayed waveform up.

fs-/move-down Lowers the sampling frequency or moves the displayed waveform down.

V+/move-up Improves the vertical sensitivity or moves the displayed waveform left.

V-/move-down Lowers vertical sensitivity or moves the displayed waveform right.

Run/stop Run or stops, which determines whether to upgrade the control data.

Auto Runs the system automatically.

Self check Generates a self-check signal with DDS.

Equivalent time-effectiveness Improves the system’s frequency scope operation.

Measure Cursor Trigger Save/Load

FFT

Shift 1 2 3

Fs+/MoveUp

Fs-/MoveDown

V+/MoveUp

V-/MoveDown

EquivalentTime

Effectiveness

SelfCheck Auto Run/Stop

307

Page 317: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The 1, 2, and 3 keys have a variety of functions, as shown in Table 4.

Software DesignThis section describes the software design.

SOPC Resource AllocationThe Nios II processor provides excellent performance and made it easy for us to meet our design goals. Figure 14 shows our project in SOPC Builder.

Table 4. 1, 2, and 3 Control Key Functions

Mode FunctionMeasure When Shift = 0, the 1, 2, and 3 keys provide channel selection (CH1 or CH2).

When Shift = 1, the 1, 2, and 3 keys select the parameter type (f, T, etc.).

Cursor 1: Selects voltage or time increments. 2: When Shift = 0, the cursor moves up 1, when shift = 1, the cursor moves up 2.3: When Shift = 0, the cursor moves down 1, when shift = 1, the cursor moves down 2.

Trigger 1: Signal source selection (CH1, CH2, or CH1 and CH2).2: Selects continous or single triggering.3: Selects edge or level triggering.

When the 3 key is set for level triggering and Shift = 0, the 1 and 2 keys are the plus-minus for the triggering level.

Save/recall 1: Memory selection (flash01 through flash10 or SD card01 through SD card10).2: Data storage.3: Data playback.

FFT Pressing the Shift key provides channel selection (CH1 or CH2).1, 2: Unused.3: Confirm or cancel.

Self-check Pressing the Shift key selects the preset frequency (10 Hz, 20 Hz, 50 Hz, etc.).1: Selects the preset phase (-180, -135, -90, etc.).2: Selects between sine wave, square wave, triangular wave, and waveform editing.3: Confirm or cancel.

308

Page 318: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 14. SOPC Builder

The CPU is a Nios II /f processor. The design has 4 Mbytes flash memory and 8 Mbytes SDRAM, which are connected to the Nios II processor via the Avalon bridge. We can run and debug the Nios II processor using a timer, JTAG_UART, and other modules. We added programmable I/O (PIO) peripherals (lcd_db, lcd_rd, lcd_wr, lcd_cs, and lcd_cd), which drive the external emulation LCD and control the display. A 160-MHz phase-locked loop (PLL) provides the equivalent time-effectiveness measurement’s norm frequency.

System Design FlowFigure 15 shows the system’s software flow.

309

Page 319: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 15. Software Flow

Interrupt SystemThe entire system uses interrupts. Figures 16 and 17 show the system flow for the serial port and key interrupts.

Start

System Initialization

Wait for Triggering

Signal Collection

Read Data

Output Waveform

Start FrequencyMeasurement

Refresh Display

End?

Y

N

310

Page 320: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 16. Serial Interrupt Flow

Figure 17. Key Interrupt Flow

Enter

Clear Interrupt Signal

Open Send Interrupt& Close Receive Interrupt

Send Data

Close Send Interrupt& Open Receive Interrupt

Has SendingFinished?

Y

N

Back

Enter

Key Value

Key Value Jumping

Auto Measurement

Play/Stop

Vertical Sensitivity Increasing

Vertical Sensitivity Decreasing

Scanning Speed Increasing

Scanning Speed Decreasing

1

2

3

Function Selection Key

Display Spectrum Analysis Menu

Display Storage Playback Selection Menu

Display Triggering Selection Menu

Display Cursor Menu

Display the Measurement Menu

Self Check

RST

Equivalent TIme Effectiveness Measurement

311

Page 321: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Spectrum AnalysisThe design’s spectrum analysis uses the common time domain extraction, base 2 FFT. If the length of X(n) is n, divide X(n) into X1 and X2 according to the parity of n as shown in the following equations:

X1(r) = X(2r) for r = 0, 1,…..N/2 - 1

X2(r) = X(2r+1) for r = 0, 1,…..N/2 - 1

The sequence has 2 parts:

■ Calculate the discrete Fourier transform (DFT) and add the result together.

■ Obtain the DFT of the original sequence.

With this principle, divide the sequence by parity until it is the N/2 2-point DFT operation. This operation reduces the computing quantity. Additionally, due to the twiddle factor symmetry (see the following equations) during FFT programming, we use a circulation method to simplify the design.

X(k) = X1(k) + WNK X2(k) for k = 0, 1,…,N/2

X(k + N/2) = X1(k) -- WNK X2(k) for k = 0, 1,…,N/2

Because the time domain extraction method requires input data in reverse order, we have to reverse the data order before the FFT operation. To reverse it, we process the input data using a binary system and reverse the numbers in binary and then convert them into decimal to get the required input order. Figure 18 shows the FFT algorithm.

Figure 18. FFT Algorithm

Start

Reverse Order

Send to x(n),M

L=1, M

J=0, B-1

B=2^(L-1)

P=2^(M -L )* J

k=J,N-1,2^L

Output

End

X(k) = X(k) + X(k + B) * (WN) ^ PX(k + B) = X(k) - X(k + B) * (WN) ^ P

312

Page 322: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Additional FunctionsThis section describes some additional functions our project contains, including serial port communication, DDS self-detection signals, and data storage, playback, and display.

Serial Port CommunicationThe DE2 board has an integrated RS-232 asynchronous serial communications port, which the design uses to establish the communication protocol, display data on the computer monitor, and obtain data. Our design only uses one computer. Therefore, establishing the serial port communication is simple: we just need to define the data delivery format and order according to the protocol. Our design uses a communication bit rate of 115,200 bits/second, no parity, 8 data bits, and 1 stop bit.

The sender sends data when the receiver sends it a signal. The sender only sends the data once, first using channel 1 and then channel 2. The channel data is in the following order: waveform data, signal frequency of the corresponding channel, cycle, peak-to-peak value, virtual value, and mean value. After channel data is sent, the sender sends a 0 as the final character. The data of the two channels cannot exceed 1,024 bytes.

Figure 19 shows the data send order. There are 8 data types (frequency, peak-to-peak value, virtual value, and mean value), each of which is 32 bits and occupies 32 bytes. The 0s occupy 8 bytes, so all of the data equals 40 bytes.

Figure 19. Data Send Order

0

471 472Waveform data

Frequency

Cycle

Peak-to-peak value

Virtual value

Mean value

Maximum value

Minimum value

511 Sampling points

512 0

983 984Waveform data

Frequency

Cycle

Peak-to-peak value

Virtual value

Mean value

Maximum value

1023 Minimum value

Sampling points

0

313

Page 323: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

We wrote a Visual Basic (VB) program that displays the received data on the computer display. The program sends requests, receives communications data, and displays and adjusts the waveform.

DDS-Generated Self-Detection SignalTo generate the self-detection signal, the system uses a signal generator that generates a sine wave, square wave, or triangular wave, and edits the waveform. The waveform’s frequency, amplitude, and phase are adjustable. When building our design, we had two options for creating the self-detection signal:

■ Solution 1—Use two special DDS devices (AD9850) to implement the signal generator. The AD9850 device can generate sine signals and alter the frequency and phase as long as the device is connected to an elaborate outside clock source. The AD9850 interface is easy to control, enabling the input of control data such as the frequency and phase from 8-bit parallel or serial ports. The output frequency is up to 125 MHz, and the circuit is easy and stable. However, this method only has 5-bit digital phase modulation, which varies the output signal phase between 180°, 90°, 45°, 22.5°, and 11.25° (or by a random combination of these values), which cannot achieve the accuracy of 0.1° that our design requires.

■ Solution 2—Use an FPGA to implement the digital phase shift signal generator. The DDS adds the phase increment as required by the phase, takes the accumulation phase value as the address code to read the waveform data stored in the memory, and goes through D/A conversion and filtering to get the desired waveform. A cycle signal is divided into 512 shares and written to ROM as 512-point data. We can vary the phase and address increments using input to the Nios II processor to control the output frequency and phase. We can also generate other cycle signals using this method. Figure 20 shows a block diagram of this design.

Figure 20. Digital Phase Shift Signal Generation

Using an FPGA to implement the digital phase shift signal source has the following advantages:

■ The system can flexibly control the frequency and phase.

■ Phase difference generation is very accurate and controllable.

Because of these advantages, we chose solution 2. Figure 21 shows the block diagram of this implementation for the case in which only one waveform is generated.

PhaseAccumulator

FrequencyControl

Words M'

CLK

Add[8..0]

Phase Shift Value

SineTable

D/A

Add SineTable

Add[8..0]

Reference

D/APhase Shift Signal

314

Page 324: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 21. Signal Generator Hardware Block Diagram

As shown in Figure 21, yuan_1 has three input signals:

■ dds_f[31..0]—Presets the frequency, Din.

■ dds_p[31..0]—Presets the phase.

■ clk—Reference clock.

When the system starts, it first writes data into the two 512-bit dual-port RAM (RAM1 and RAM2), which saves storage space and allows the waveform to be modified at any time to obtain the desired waveform. Next, the system presets the frequency and phase. At every clock cycle, the preset number adds to itself. The results are placed in the internal counter count, and the higher 9 bits of count are the RAM1 read address. The system adds the phase preset number dds_p[31..0] to the higher 9 bits of count and uses the result as the RAM2 read address, which offsets the waveform phase.

count has 32 bits. It automatically overflows when it is full because the RAM’s data capacity is 512 bits. This design ensures that the data and address match. When count is full, it outputs a cycle of the required waveform signals. The signal is digital. The system can convert the signal from analog to digital using the ADI7123 device on the board and then output it through the VGA port.

The relationship between the output signal frequency and the preset frequency, Din, is:

fout = Din x fclk/(2N)

Changing the Din preset frequency changes the output signal frequency.

Figure 22 shows the hardware simulation that directly displays the waveform data.

Figure 22. Digital Frequency and Phase Modulation Function Simulation

315

Page 325: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Using the Nios II processor, the system generates varied waveform data and saves it to the dual-port RAM. The system can modify the generated waveform by setting keys that edit the DDS waveform parameter on the control panel. For example, by pressing different keys, the user can change the generated signal’s waveform, frequency, peak-to-peak value, phase, and the harmonic component in the frequency spectrum. This action allows the user to display the waveform, measure parameters, and analyze the frequency spectrum.

Data Storage and PlaybackThe waveform can be stored in flash or SDRAM while it is displayed. Then, the user can use the playback function to re-display the data on the LCD.

Data Display in Other RegionsThe portable SD card allows the data to be displayed in other regions. The user can save the data into the SD card using the control panel. When reading and writing to the SD card, the key problem is timing. The program must adhere to strict read/write timing to read and write data to/from the SD card. Figures 23 and 24 show the read/write timing schematics.

Figure 23. Read Timing

Figure 24. Write Timing

Table 5 describes the terms used in the figures.

Table 5. Timing Codes

Code DescriptionS Start bit (= 0)

T Transmitter bit (Host = 1; Card = 0)

P One-cycle pull-up (= 1)

E End bit (= 1)

Z High impedance state (-> = 1)

D Data bits

X Don’t care data bits (from SD card)

* Repetition

CRC Cyclic redundancy code bits (7 bits)

Card active

Host active

316

Page 326: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

System TestWe used a variety of tests to verify the system’s operation to determine whether:

■ The waveform is accurately displayed in LCD.

■ There are glitches.

■ There are errors in the frequency test and the degree of the error(s).

■ The communication between the system and computer is successful.

To achieve the testing objectives, we used the following devices:

■ TDS 210 mixed-signal oscilloscope

■ YB 1605 function signal generator

■ Computer

■ Standard serial line

■ SD card (1 Mbyte)

■ Positive and negative 12-V power source

Waveform TestFor this test, we connected the system, let the signal source emit a signal, input the signal to the system, connected the oscilloscope with the signal source, and looked to see if the waveform displayed on the oscilloscope was as the same as that displayed on the system LCD. We used sine wave, triangle wave, and square wave signals as test input.

Our test result showed that the waveforms on the oscilloscope and LCD are almost the same and the waveform display functions correctly.

Frequency TestFor this test, the input signal’s waveform and voltage remain unchanged while the frequency changes from 10 Hz to 500 kHz. We checked the frequency column on the LCD to see the result. We used two different signal routes to test the two channel frequencies. To avoid the impact of an unstable signal source on the signal, the design uses the TDS210 oscilloscope’s value as the standard.

Table 6 shows the channel 1 testing result.

Table 6. Channel 1 Frequency Test Result

Signal Frequency (Hz)

8 100 1.100 k 9.99 k 99.96 k 300.27 k 500.25 k

Testing value (Hz) 7 99 1.099 k 9.979 k 99.94 k 300.2 k 495.9 k

TDS210 measurement 7.072 99.11 1.100 k 9.980 k 100.0 k 300.8 k 501.3 k

Error 1.02% 0.11% 0.09% 0.01% 0.06% 0.2% 1.08%

317

Page 327: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Table 7 shows the channel 2 testing result.

Voltage TestIn this test, we left the input signal frequency unchanged while changing its peak-to-peak value. We recorded the test value obtained in the system and calculated the error. Like the frequency test, we use two signal routes to test the two channels while taking the TDS210’s value as the standard. The frequencies of the two input signal routes are 10 KHz.

Table 8 shows the channel 1 testing result.

Table 9 shows the channel 2 testing result.

Serial Communication TestIn this test, we connected the asynchronous serial communication interfaces of the system and computer with a serial cable. Then we started the program and watched the waveform and data displayed on the computer screen.

According to the test, the design correctly displays the waveform while changing the system’s input signal. If the receiver (computer) sends an inquiry, the waveform displayed on the computer screen is also changed. The display correctly displays data such as the testing signal’s frequency.

DDS Self-DetectionAccording to this test, the self-detection waveform is generated correctly.

Table 7. Channel 2 Frequency Test Result

Signal Frequency (Hz)

8 100 1.100 k 9.99 k 99.95 k 300.24 k 500.22 k

Testing value 7 99 1.098 k 9.975 k 99.96 300.2 k 500.1 k

TDS210 Measurement 7.062 99.21 1.099 k 9.960 k 100.5 k 299.9 k 499.2 k

Error 0.88% 0.21% 0.09% 0.15% 0.54% 0.1% 0.18%

Table 8. Channel 1 Voltage Test Result

Signal Peak-to-Peak Value

0.2 V 1.4 V 2.5 V 5.0 V 7.0 V 10.0 V 15.0 V

Test value 0.221 V 1.38 V 2.54 V 4.98 V 7.23 V 10.5 V 16.0 V

TDS210 measurement 0.215 V 1.33 V 2.48 V 4.96 V 6.84 V 10.1 V 15.8 V

Error 2.71% 3.76% 2.42% 0.40% 5.70% 3.96% 1.27%

Table 9. Channel 2 Voltage Test Result

Signal Peak-to-Peak Value

0.2 V 1.4 V 2.5 V 5.0 V 7.0 V 10.0 V 15.0 V

Testing value 0.211 V 1.34 V 2.46 V 4.87 V 7.0 V 10.2 V 15.8 V

TDS210 measurement 0.218 V 1.32 V 2.44 V 4.96 V 6.96 V 10.0 V 15.8 V

Error 3.21% 1.51% 0.82% 1.81% 0.57% 2% 0

318

Page 328: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Test SummaryOur tests proved that the system can perform all functions specified by the system design. The equal precision frequency measurement and equal time-effectiveness sampling methods improve testing accuracy and efficiency. Additionally, the system has little delay and error, and can meet general demands.

ConclusionThe four-month project increased our understanding of FPGAs, from the theory to hands-on experience. During the project, we were impressed by the powerful Quartus® II software; specifically, the many commonly used intellectual property (IP) cores that are available in SOPC Builder. We could modify them to fit our design needs, which made the design process easy and convenient. Additionally, the system permitted us to add our own custom IP cores, allowing us to meet customer demands while making the design flexible. The Quartus II software provides all of the functions that are required for the design process from beginning to end. The graphics and user interface helped us use the software more easily. The Quartus II software contains the Nios II Integrated Development Environment (IDE), so we could use that graphic interface for software programming, compiling, and design modification. We could even use the interface for system downloading and operation.

By participating in this competition, we obtained new knowledge, applied our existing knowledge, and mastered the skills required to create systematic, orderly designs. The competition also helped us develop our overall viewpoint. For example, when discovering a design fault, we learned not to just modify the code, but to consider the overall situation.

We also learned the importance of teamwork. As the saying goes, “let the expert deal with his specialty.” Each member of the design team should work on the area that he/she is best at, ensuring that the design is completed in time and that problems are found and solved as soon as possible.

Finally, we want to thank our instructors and all the people who supported us during the design process.

ReferencesCollection of Work of the Fifth National Undergraduate Electronic Design Contest Winners (2001). XiDian University Press, 2006.

Hongwei, Li and Yuan Sihua. Quartus II-based FPGA/CPLD Design. Electronic Industry Press, 2006.

Yumei, Ding and Gao Xiquan. Digital Signal Processing (2nd edition). XiDian University Press, 2004.

Hubei Undergraduate Electronic Design Contest. 2006.

Wang Dong. Visual Basic Program Design (2nd edition). Tsinghua University Press, 2002.

Haoqiang, Tan, Zhang Jiwen, and Tang Yongyan. A discourse of C Programming. Higher Education Press, 2003.

Xie Jiakui. Electronic Circuit (Linearity). Higher Education Press, 2003.

Shujun, Guo, Wang Yuhua, and Ge Renqiu. Principle and Application of Embedded Processor-Nios II System Design and C Programming. Tsinghua University Press, 2004.

SD Group. SD Memory Card Specifications. 2000.

Cheng, Wang, Wu Jihua, and Fan Lizhen. Altera FPGA/CPLD Design. 2005.

319

Page 329: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Screen ShotsFigures 25 through 31 provide screen shots of our system.

Figure 25. Waveform Data Storage and Invoking Interface

Figure 26. Trigger Selection Interface

320

Page 330: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 27. Square Wave Spectrum Analysis

Figure 28. Waveform Editing

321

Page 331: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 29. Voltage Measurement Between Any Two Points

Figure 30. Self-Measured Signal Frequency Phase Setting

322

Page 332: Nios II Embedded Processor Design Contest - Outstanding - Altera

Digital Oscillograph

Figure 31. Computer Screen Display for Serial Port Communications

Appendix: CodeRefer to the PDF of this paper on the Altera web site at http://www.altera.com for code that was created for this project.

323

Page 333: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

324

Page 334: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

Third Prize

Nios II-Based Air-Jet Loom Control System

Institution: Donghua University

Participants: Yu-Bin Lue, Hong Chen, and Bin Zhou

Instructor: Ge-Jin Cui

Design IntroductionWidely used in the textile industry, the air-jet loom is one of the fastest, shuttleless looms today. The loom has significant control system requirements. With multiple tasks (for example, the ZA205 loom’s control board needs 61 I/O ports) and demanding response time requirements, the control system’s performance usually determines the air-jet loom’s capability. The loom usually needs to complete a series of movements in a few milliseconds, such as opening, weft insertion, beating up, warp let-off, fabric take-up, weft cutting, weft storage, etc.

The advanced air-jet looms overseas use 32-bit embedded systems, while China still uses 8-bit monolithic processors and digital integrated circuits. Because other control systems have a high price and maintenance costs, a reliable substitute with excellent price/performance is of significance to China in loom production and maintenance cost reduction compared to imported looms. The air-jet loom does not have a customized 32-bit microcontroller (MCU), so the MCU application inevitably wastes resources.

With the rapid development of electronic technology, system-on-a-programmable-chip (SOPC) performance has improved. As a leader in the industry, Altera has developed the Nios II processers. The latest Nios® II processor provides performance up to 200 DMIPS. The user can select the core type to balance performance and size. Additionally, designers can use multiple Nios II CPUs, custom instruction sets, and hardware acceleration, improving performance. The μC/OS-II multi-tasking, RTOS provides flexible, efficient applications.

In industrial sites, FPGAs integration capability and rich I/O resources improve the system’s anti-interference capability and satisfy I/O requirements. By studying the ZA205 air-jet loom’s control system, we were able to implement a Nios II-based control board.

325

Page 335: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Because of time and other conditions, the system is restricted to a simulated platform. We give priority to the warp tension control to implement constant tension control without overshoot, because this is a key factor affecting the cloth quality. In the system, the signals output by the photoelectric coder use the remaining FPGA resources to build a small Nios II system that generates Gray code, with the relay’s control signal indicated by a LED.

Function DescriptionThe air-jet loom computer control system is a typical direct digital control (DDC) system. It has four layers: collection/execution, I/O channel, main control loom, and peripheral devices. The control board belongs to the main control loom and precisely runs all organization and fittings. This section describes the ZA205 air-jet loom’s control system and the control board functions.

Warp Let-off ControlTo ensure continuity of the production process, the loom must be constantly supplied with tensioned warp yarns. The knitting must be drawn away from the opening while the system winds the knitting onto the roller. The system has the following controls:

■ Tension control—The servomotor controls the warp yarn tension and slack according to the signal collected by the tension sensor. This process ensures that the warp yarn’s tension is controlled properly during opening, warp loosening, and warp let-off.

■ Manual warp yarn loosening and tightening—When the machine is not operating, the user can manually tighten or loosen the warp by controlling the servomotor’s forward and reverse rotator.

■ Fabric take-up control—This function pulls the finished cloth away from the opening in time as required by the weft density so that the position of the opening does not vary with the new weft-yarn, ensuring successful production.

Crank Angle CollectionThe loom’s crank angle is an indicator of the loom’s operation state. It is collected by the 256-line absolute photoelectric coder in the signal format of binary Gray code. According to this signal, the control board indicates the loom’s operational state and send commands accordingly.

Picking Control According to the Crank AngleThe air-jet loom is a shuttleless loom that inserts the weft with the airflow’s frictional traction force on the weft yarn. Most air-jet looms today use main and auxiliary nozzles to relay air jetting and weft insertion. Therefore, weft insertion control mainly focuses on controlling the valve solenoid of the main and auxiliary nozzles.

■ Main nozzle control—This control draws the weft yarn from the weft feeder using a high-speed airflow and completes the first phase weft insertion.

■ Auxiliary nozzle control—The auxiliary nozzles pull out the yarns introduced by the main nozzle, relaying weft insertion.

■ Weft stop pin control—After weft insertion, this process sends the weft stop pin signal to the FDP system (weft feeder) to make it ready for the next round of picking.

■ Nozzles cutting control—Uses the electronic knives to cut the weft yarns.

Main Motor Loop ControlThe main motor drives the loom by a pulley and uses a gear to drive the other mechanisms such as the cam harness mechanism, crank beating mechanism, and electric take-up mechanism. The air-jet loom’s

326

Page 336: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

operational states are divided into quick action and slow action; the slow action can be further divided into forward and backward. This section describes control circuit functions of the main motor.

■ Star-delta start control—Starting the main motor too strongly jeopardizes the electric network and motor, so the system adopts a star-delta press reduction start mode. The system connects in a delta when starting the motor and then switches the relay when it receives the control signal from the I/O port of the control system. As the system comes up to speed, the motor connects in a star.

■ Slow forward and reverse control—When the loom needs to run slowly, the AC switches from 50 Hz to 5 Hz, so that the loom can accurately stay at a certain angle when faults such as broken warp and weft occurs.

■ Braking control—This function uses different braking methods when it receives different braking signals, such as:

● Emergency braking—Stops the loom using the main motor while shutting off all power.

● Safe standstill or scram—Loosens the clutch and then brakes suddenly to slow down the motor. After a while, it brakes slowly to stop the motor. Finally, it stops the motor at a certain angle by slow action.

Network Remote Monitoring FunctionTo facilitate monitoring, the loom’s control system transfers its operation state and parameters into the server through the network. The following table shows the control board’s I/O.

Performance ParametersThe control parameters are described below.

■ Tension control density—< 5%

■ Touch screen operation response time—0.1 second

■ Speed of the simulated main motor—3,000 RPM

Control Board I/O

Control Board I/O Number of Routes

Direction

Crank angle 8 Input

State collection 11 Input

Button signal 10 Input

Main nozzles control 2 Output

Auxiliary nozzles control 12 Output

Weft stop pin signal 2 Output

Yarn shield pin signal 2 Output

Cutting nozzles control 2 Output

Main motor control 8 Output

Alarm light signal 4 Output

327

Page 337: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ System power supply—±5 V, 10 V

■ IPV4 data frame send frequency—10 Hz

Design ArchitectureFigure 1 shows the control structure block diagram.

Figure 1. Overall Block Diagram

Figure 2 shows the warp let-off motor control block diagram.

Touch ScreenConsole

Main Motor Control

Server

Automatic Stop

Open and shut o ffall the air valverelay on time

Main Motor Photoelectric Coder

Warp Yarn Detector

Warp Stop Detector

Tension Sensor

Warp StopControl

TensionControl

Air Valve Relay

Weft StopControl

SynchronousControl

Button

Alarmlight Weft Stop

Control

Warp StopControl

SynchronousControl

Tension Control

Button

M

M

Emergency Braking Braking

Starting

Slow Forward Slow Reverse Pick Finding

ParameterSettings State

Display

Start/Stop

Fast Forward, SlowForward, Reverse

Weft Insertion Control

Warp Yarn Tension Control

Tighten the Warp Yarn

Loosen the Warp Yarn

Tighten theWarp Yarn Loosen the Warp Yarn

Warp Let-Off Motor

328

Page 338: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

Figure 2. Warp Let-off Motor Control Block Diagram

Figure 3 shows the hardware block diagram.

Figure 3. Hardware Block Diagram

Figure 4 shows the main software flow.

ManualInstruction

PresetTension Value

ControllerServomotor

DrivingModule

Warp Let-OffMotor Warp Yarn

TensionSensor

GP 320 x 240Touch Screen Server

8-MbyteSDRAM

4-MbyteFlash Memory EPCS16

UART DM9000ASDRAM

ControllerCommon Flash

InterfaceEPCS

Controller

CPU0Timer

CPU2Timer

SDRAMController PLL

Avalon Tri-StateBridge

Relay PIO

CPU1Timer

On-ChipRAM

State PIO Button PIO Angle PIO

Clutch PIO Step MotorPIO

TensilitySensor PIO LED PIO Motor PIO

Weft PickingControl Relay

PhotoelectricCoder

Main MotorDrive Module

ElectronicClutch

ServomotorDriver

Warp YarnTension Sensor

ModuleLEDs Touch Button State Sensor

AvalonSwitchFabric

Nios IIProcessor0

Main

Nios IIProcessor1

I/O

Nios IIProcessor1

Gray

329

Page 339: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 4. Main Software Flow

Design MethodologyThe design provides multi-tasking with multiple Nios II processors and shares data between the processors. It adds a variety of peripherals to the bus. See Figure 5.

Figure 5. SOPC Builder

Initialization

Inquiry TouchScreen Register

Is SomethingEntered?

N

Y

Encode the Instructions& Send Them to the

Corresponding Tasks

Touch Screen Operation

Initialization Initialization

Read the Data fromthe Tension Sensor

Within thePreset Scope?

Calculate the TensionTuning with the Digital

Control AlgorithmKnitting Winding

with Specified Speed

Deliver Warp Let OffMotor Control Instruction

Read the Picking Signal

Valid?

Weft PickingOperation

Reach Weft?

Normal

Abnormal

Modify Weft PickingParameter, Fault

Settlement

Warp Let Off Control Weft Picking Control

N

Y

N

Y

330

Page 340: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

The system’s functions are warp yarn tension control, touch screen communication, network monitoring, main motor and nozzle control, etc. It uses three CPUs, which are described as follows:

■ CPU0—Warp yarn tension reading, servo motor control, and nozzle control.

■ CPU1—Touch screen and network communication.

■ CPU2—Main motor movement simulation and Gray code generation.

The following sections describe these functions.

Warp Yarn Tension ControlThe following sections describe the warp yarn tension control function.

Data CollectionThe warp yarn tension is indicated by the sensor of its pulse signal duty cycle. After the signal is reshaped by a Schmitt trigger, the system filters it to obtain the tiny DC level proportional to the tension. Then it is enlarged by a multi-stage amplifier to obtain the 0 to 3 V valid signal (after adding the DC offset). The system performs analog-to-digital (A/D) conversion of the valid signal. Because of limited I/O resources, the A/D converter selects the serial output TLC0832 device. Because the A/D converter can only work in slave mode, the system uses Verilog HDL to write the driving interface module. After A/D conversion, the sampled digital signals are sent into the shared memory for the CPUs to read.

Servomotor ControlThe system compares the tension sensor reading with a preset value and then adjusts the warp let-off motor by the digital process ID (PID). The warp let-off motor is REDCOM’s 57ZW-10W brushless DC servomotor that has three Hall output routes. Figure 6 shows the schematic diagram.

Figure 6. Warp Let-off Motor Schematic

The schematic uses Sanken’s SLA6023 chip, which is used specifically for brushless DC motors. The chip is composed of three Darlington doubled pipes, as shown in Figure 7.

331

Page 341: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 7. Sanken SLA6023 Chip

Figure 8 shows the motor-driven connection lines.

Figure 8. Motor-Driven Connection Lines

To drive the motor, the system determines the wheel rotor’s position by reading the three-route Hall sensor. It controls the current direction of the A, B, and C stator winding by turning on/off the Darlington pipes according to the drive timing.

The timer interruption mode uses pulse width modulation (PWM) and initializes according to the timer core’s register map. The following code describes the initialization process:

//Clear TOIOWR(TIMER_1_BASE,0,0x0);//Set START,CONT,ITO?start the timer and make cycle count, and //generate system interruption on the crossover pointIOWR(TIMER_1_BASE,1,0x07); //Registration interruption responsive funtion alt_irq_register(TIMER_1_IRQ,edge_capture_ptr,handle_timer1_interrupts);

The following code sets the count cycle in the handle_timer1_interrupts interrupt response function:

IOWR(TIMER_1_BASE,2,periodl);IOWR(TIMER_1_BASE,3,periodh);

Where periodl is the starting value’s low count address and periodh is the high address.

InverterPermanent - MagnetSynchronous Motor

ia

ib

A

CB

ic

DC

+

-

E

332

Page 342: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

During program operation, the system dynamically alters the values in the two registers according to the PID-calculated output, implementing the corresponding PWM duty cycle control.

Touch Screen CommunicationsThe touch screen is Digital’s 37W2 6-inch blue LCD for industrial graphics, featuring 320 x 240 resolution and 16 x 12 touch keys on each screen. It also has a graphic interface development environment, which is very convenient for the user. The screen has an RS-232 communications interface, which operates slowly. So that it does not affect the overall speed, the system usually sets independent threading for the serial port reads/writes in the PC’s WIN32 program.

In this design, a designated CPU communicates with the screen. It polls the touch screen’s registers, reads the corresponding data when there are operations, and sends operation instructions (such as starting and stopping) into the data sharing area, so that the other CPUs can perform as instructed by the instructions they receive. The user can also use the screen to change the parameters and password. The system writes data to the flash chip to initiate the touch screen when the power is on, which saves the parameters in the event of a power failure. Additionally, if the system becomes abnormal, the screen sends an instruction to control the alarm interface.

Figure 9 shows the touch screen interface.

Figure 9. Touch Screen Interface

Network MonitoringThe current loom is an independent entity, requiring the workers to walk back and forth to watch the state of each loom. We tried a few methods to solve this problem, and decided to add a control system with a network monitoring function. Even without an operating system, the system can generate a network data frame, packaging the data in IPV4 frame format. Then, it delivers the data to a server via the network interface.

We used Visual C++ 6.0 to develop the server side monitoring software, which displays the warp yarn tension’s preset value and actual value in dynamic curves, and monitors the cooresponding loom according to its number. Figure 10 shows a section of the software, in which the green curve is the tension’s preset value and the red curve is the actual value (the curve data sampling frequency is 10 Hz).

Start Start

Normal InchTurning

ManualTightening

ManualLoosening

Reverse InchTurning

Rotating Speed Display Alarm Light

W eft Yarn Control

AlarmInformation

ParameterSettings

NozzlesTrial Jetting

Rotating Speed: 1,000 rpm

Warp Let-Off Control

Warp Yarn Tension:Preset Value: 100Actual Value: 100

Primary Nozzle Angle: 30°Reaching Weft Angle: 60°

Main Motor Control

333

Page 343: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 10. Server Monitoring Software

Main Motor and Picking Nozzle ControlThe loom is driven by the main motor. Its working state is shown by the photoelectric coder on the crankshaft that is connected to the motor. To reduce the peak current pulse caused by the coding, the output codes adopt the most reliable coding method, which is binary Gray code.

Our system is a simulated platform without actual photoelectric coding components. Therefore, it uses an independent CPU to simulate the main motor movements, generating 8-bit Gray codes to simulate the photoelectric coder output. Its primary task is to generate Gray codes and put them into the data sharing area according to the main motor’s operational state after obtaining the touch screen operation instruction. It has four states: starting, stopping, normal inch turning, and reverse inch turning. The main CPU turns on/off the nozzles according to the Gray code value and the loom’s state (simulated by the seven-segment LED).

In the main motor simulation of the above tasks, the CalcGray() function we used for Gray code calculation has many cycles, is independent, and is used frequently. Therefore, we used the Altera® C2H compiler and generated the SOPC Builder component accelerator_Gray_Generator_CalcGray. SOPC Builder automatically generates the function interface for the new component, which is very convenient for the user.

Design FeaturesOur design has the following features:

■ Achieves data sharing communication between the three Nios II processors and uses a large CPU instruction cache.

■ Generates and delivers the IPV4 frame format network information packet.

■ Uses the C2H compiler to perform hardware acceleration for frequently used functions.

334

Page 344: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Air-Jet Loom Control System

■ Uses many other peripherals, which are connected to the Avalon® bus.

■ The traditional multiple monolithic computers architecture are affected by electronic interference in industrial sites, and are unable to maintain a stable system. The Nios II processor integrates most of the system on a FPGA, increasing the system’s stability and enhancing its anti-interference capability.

■ This system involves the communication between modules. The configurability of the Nios II processor let us add a UART, SDRAM, the LCD interface, and a custom interface, avoiding the resources wasted by MCU applications, accelerating research and devlopment, and lowering costs.

ConclusionThe Altera Quartus® II software and Nios II Integrated Development Environment (IDE) are revolutionary in system development. They have played a leading role in software-like-hardware design and simplified the design process.

During the competition, we spent a lot of time figuring out how to initiate the multiple CPUs simultaneously. We experimented with many methods, including modifying SDRAM clock deviation, bringing a phase-locked loop (PLL) from the inside to outside of the project, modifying CPU types, and modifying the code storage position in flash. Despite the strong support we received from Altera via the MySupport site, some problems remained unresolved until the end of the competition. In the end, we modified our plan according to the advice we received, canceling one CPU that was to be used for servomotor control and only using three. Meanwhile, we integrated the tasks into the main CPU. Although this solution increases stability at system start-up, other problems occur, such as too many tasks in the main CPU and non-specific labor division. Because of these reasons, we were not able to spend enough time on software development and peripheral tuning to complete all of the functions we planned.

335

Page 345: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

336

Page 346: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Third Prize

Nios II-Based Audio-Controlled Digital Oscillograph

Institution: Xian Jiao Tong University

Participants: Wan Liang, Zhang Weile, and Wang Wei

Instructor: Penghui Zhang

Design IntroductionThe oscillograph is a common instrument that plays a key role in many experiments. Because of its design structure, the oscillograph cannot work in some environments. For example, it may not work when a waveform must go through projector for a demonstration or if the waveform needs to be adjusted while the user is performing other operations.

With these issues in mind, we designed an audio-controlled digital oscillograph that uses an audio recognition algorithm to control the oscillograph; specifically, it controls the waveform display size attenuation and amplification as well as the scanning speed. The oscillograph display uses a standard VGA interface that can display waveforms on a PC monitor, projector, or other standard output device. Additionally, the system leverages the advantage of FPGAs for digital signal processing (DSP) and hence, for computing the period, valid value, and average value for the input signal and fast Fourier transform (FFT) algorithm-based frequency spectrum analysis.

The whole system is designed with Nios® II-based system-on-a-programmable-chip (SOPC) technology. The design has a front-end acquisition circuit on the Altera Development and Education (DE2) development board. After the acquisition circuit collects analog signals, it outputs the signals to the FPGA for processing. The corresponding output is generated according to the control information.

We use SOPC technology for the design and we integrate the Nios II processor into the FPGA. Our solution implements a system-on-chip (SOC) solution, which reduces the system size and overall power consumption. Additionally, we can change the system model and easily make upgrades because the main structures of the system are all implemented within the FPGA. This implementation makes the design universal and prolongs the system life. The system requires high-speed data processing such as a fast FFT function and real-time hardware implementation of the audio recognition algorithm, which we could not solve using a traditional microcontroller (MCU). With an FPGA solution, we could customize Avalon® bus-based peripherals via custom instructions and implement them with a high-

337

Page 347: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

speed FPGA and corresponding development environment, significantly improving the system’s DSP capability.

Function DescriptionThe whole system is composed of the front-end acquisition circuit and DE2 development board. The front-end acquisition circuit collects analog signals, and then processes and displays them using the FPGA on the board. The system contains control modules for the Nios II processor, audio recognition, data computing, FFT, and display. Figure 1 shows the overall system structure.

Figure 1. Overall System Structure

Front-End Acquisition CircuitThe data acquisition circuit is composed of attenuation control and the amplified circuit with a fixed multiple. Figure 2 shows the circuit design. The first-level operational amplifier is the voltage follower that provides high-input impedance. The following output connects to a fixed-multiple amplifier and then connects with a resistor ladder that varies the attenuation signal. The next switch selects the signal with different attenuation. The selected signal is sent to the analog-to-digital converter (ADC) for quantitative output after amplification and attenuation by the same fixed multiple.

Control Audio Audio Codec

Data Acquisiton

Audio Recognition

FFT Module

Data Computing

VGA Control

LCD Display Output

VGA Display Output

LCD Control

Nios II Processor

338

Page 348: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 2. Front-End Acquisition Circuit

Audio Control ModuleThe audio control function implements an audio recognition algorithm and collects external analog audio signals through a microphone. It transforms the audio signal using the WM8731 audio encoding/decoding device on the DE2 development board, and sends the collected digital signal to the audio recognition module for recognition. If it is recognized as a control command, the system automatically implements the corresponding operations; otherwise, the system does not respond.

FFT ModuleThe FFT module changes the input signal in real time, and can synchronously display the spectrum map on the output terminal. The general FFT algorithm has a butterfly operation structure; the input is ordered in code-reversed sequence and the output is ordered in natural sequence. The operation structure is clear and simple. However, in this structure, each butterfly output data is still saved in the data storage unit of the original input. The input data addressing in the same location on different levels is not fixed; therefore, it is hard to implement cycle control. Additionally, it is hard to operate the function in parallel when using an FPGA.

In this design, we use a fixed geometric structure FFT operation. The operational structure is quite similar to the general butterfly structure, but it has some differences in the data and operation sequence. After modification, there is no change in the sequence of input and output data in each level; in this way, the geometric structure in each level can be fixed. With this structure, the design can perform cycle addressing, facilitate programming with the FPGA, and implement an internal parallel FFT hardware structure that reduces the FFT operation speed dramatically. The “Design Architecture” section describes the fixed geometrical structure FFT in more detail.

Data Computing ModuleThe general digital oscillograph can compute the signal’s real-time period, average value, and peak-to-peak value. The FPGA provides speed advantages for DSP applications, making real-time computing of these parameters easy.

VGA Display ModuleCathode ray tube (CRT) monitors have three colors: RGB. The color is created by shooting the CRT-generated focusing electrode (Eb) on the screen. The scan begins from the top left of the screen to the

339

Page 349: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

right and from the top to the bottom. When a line finishes scanning, the Eb goes back to the starting point of the next line on the left of the screen. The CRT masks the Eb during this period and prepares for the next scan. Therefore, a general VGA monitor has 5 signals:

■ R, G, and B—The 3 basic color signals (10 digits).

■ HS—Horizontal line synchronized signal.

■ VS—Vertical field synchronized signal.

The monitor must strictly abide by the VGA industry standard, i.e., 640 x 480 x 60 Hz. The standard’s corresponding frequency is:

■ Clock frequency—25.175 MHz

■ Line frequency—31,469 Hz

■ Field frequency—59.94 Hz

The key to implement VGA display is to provide the clock signals, which drive the monitor’s line and field scanning.

The DE2 development board contains the ADV7123 control chip, which can be used for VGA display control. The chip implements the time sequence of all standard signals required for VGA display, masks the bottom-level structure, and enables the user to implement the VGA display easily. Refer to the “Design Architecture” section for more details.

LCD Display ModuleThe VGA display module displays waveforms and the graphical user interface (GUI). It also displays simple system information that is not convenient to display on the oscillograph, such as initialization and system status settings. We display this information on the DE2 board’s LCD monitor. The Nios II hardware abstraction layer (HAL) provides a convenient interface for using the LCD: the designer can use it by simply calling the interface function.

Performance ParametersFigure 3 shows the resource usage for the VGA oscillograph module.

Figure 3. VGA Oscillograph Module Resource Usage

340

Page 350: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 4 shows the resource usage for the audio recognition module.

Figure 4. Audio Recognition Module Resource Usage

The design has the following system parameters:

■ CPU parameter—The system uses a 50-MHz Nios II clock frequency.

■ Sampling circuit—The sampling frequency of the front-end analog signal is 25 MHz. The bandwidth is designed as 1 MHz, and the gear is designed as x100, x50, x20, x10, x5, x2, x1, x1/2, x1/5, and x1/10, or 10 gears in all.

■ VGA display module—We use the VGA industry standard with a data refresh rate of 4 Hz.

■ FFT operation module—We use a fixed geometric structure algorithm and pipelined output. One butterfly operation is completed every 8 clock cycles, and the 512-point FFT requires about 370 μs.

■ Audio recognition module—We use the Durbin algorithm with a recognition rate of more than 90%.

■ Input signal parameters:

● Analog signal input extent—±5 V

● Bandwidth—300 KHz

● Analog signal sampling frequency—25 MHz

Design ArchitectureThis section describes the architecture of our design.

Sampling Circuit DesignTo process and display the analog signal, the system must first convert the signal from analog to digital using the front-end sampling circuit. The system design displays general waveforms with 1-MHz bandwidth and an amplitude of less than ±2 V, such as sine waves, square waves, and sawtooth waves. According to the Nyquist sampling theorem, the sampling frequency must be greater than 2 MHz to convert the 1-MHz signal without distortion. For non-sine waves, the sampling frequency is 30 MHz. In this case, a signal within the bandwidth could be created with harmonic waves more than 30 times, resulting in a better waveform.

341

Page 351: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figures 5 and 6 shows the sampling circuit schematic diagrams.

Figure 5. Sampling Circuit

Figure 6. Sampling Circuit Design Layout

Forestage Protection, Amplification, and Shift CircuitHigh-speed A/D conversion has strict requirements for the level and quantification ranges. Therefore, we need to do some processing on the signal before A/D conversion, including limited amplitude protection, variable attenuation control, amplification, and level conversion. Figure 7 shows the forestage processing circuit.

Figure 7. Sampling Forestage Amplification and Shift Circuit

The SMA connector inputs the analog signal and provides limited amplitude protection for positive and negative power supplies through a ground TVS pipe and two Schottky diodes. A 1-MΩ ground resistor provides high-input impedance. The operational amplifier is the OPA603 device. The device supports

342

Page 352: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

a 100-MHz bandwidth gain to ten times the gain in the ±9 V input amplitude, which generates the oscillation’s large input amplitude. The OPA603 signal output is connected to the ground with the resistor ladder to obtain different attenuation signal outputs by shift tapping. The system uses three attenuations in this level: x1, x1/10, and x1/100.

The shift function is implemented by selecting different attenuation signals using the broadband simulation switch, DG641. The bandwidth of the DG641 device is 500 MHz and the 5-MHz crosstalk is at -85 dB, which can implement on-off control of the 4-channel PN power supply. The signals output by the resistor ladder access the three-input gateway. To select different signals, we connect the appropriate output terminals.

We connect the selected output signal to the level-1 amplifier for a fixed number of times (10) to perform broadband operational amplification using the OPA680 device. The gain bandwidth is 400 MHz and the power supply is ±5 V, which we can use to amplify the ±4 V signal.

Using the level-1 attenuation, we can obtain four other attenuation types: x1, x1/2, x1/5, and x1/10 combined with the previous x1, x1/10, and x1/100. We can choose from ten attenuations: x1, x1/2, x1/5, x1/10, x1/20, x1/50, x1/100, x1/200, x1/500, and x1/1,000.

The level-2 attenuation output is amplified 10 times. The amplification of this level implements the signal amplitude adjusting and direct current (DC) level transfer, which changes the signal into the level appropriate for A/D operation. With the above-mentioned attenuation, it can implement 10 shifts with the range of ±5 V, ±2 V, ±1 V, ±500 mV, ±200 mV, ±100 mV, ±50 mV, ±20 mV, ±10 mV, and ±5 mV.

AD Conversion CircuitThe ADC uses the BB ADS902 device with operating parameters of 10 bits, 30 MHz, and a quantization amplitude of 1 V, which requires an external reference voltage norm. Figure 8 shows the A/D conversion circuit.

Figure 8. ADS902 Conversion Circuit

The ADS902 device has several power supply and grounding pins. To lower the interference, it performs multi-point filtering for each power supply. At the same time, it filters the reference voltage norm and the common mode voltage output terminal. The ADC analog input terminal and the data output terminal match the resistance.

343

Page 353: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Voltage Normalizing CircuitWe implement the OPA680 level norm conversion and the AD operation reference evaluation norm with the REF1004-2.5 and OPA2237. The REF1004-2.5 device provides the 2.5-V voltage norm; the OPA2237 device provides a single power supply and dual-operational amplification. As shown in Figure 9, the 2.5-V voltage output by REF1004 is divided and receives the 2-V voltage norm, which gets two reference voltages, 2.25-V and 3.25-V, with the 1-V pressure difference for the ADS902 transfer. The 2.25-V voltage norm also provides OPA680 for level raising. Figure 9 shows the REF1004 and OPA2237 norm circuit.

Figure 9. Circuit Provided by Voltage Norm

During debugging, the input impedance of the broadband operational amplifier is low, so the two input ends’ current deviation in the shift operational amplifier is high. Therefore, the zero-deviation is correspondingly high. Additionally, because the signal-to-noise ratio (SNR) is less than 5 mV and is produced by a general signal source, setting 5 mV and 10 mV is meaningless. So, the level 2 attenuation is changed into level 1 and the shift selections are x1/10, x1 and x10, that is, to remove the level 1 attenuation and amplification. The changed circuit is shown in Figure 10.

344

Page 354: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 10. Changed Circuit

Audio Control Design and ImplementationAudio recognition means that machines understand human speech, i.e., they recognize the content of speech accurately and implement the requests according to the information. Figure 11 depicts a typical isolated word recognition system.

Figure 11. Isolated Word Recognition System

An audio recognition system has two stages: training and recognition. Both stages require pre-processing of the original input speech and extraction of features.

■ The pre-processing module processes the original audio signal input, removes unimportant information and background noise, makes audio signal endpoint detection (i.e., determines the start and end positions within the valid range), and processes speech framing and pre-emphasis.

Preprocessor

Extractionof Instantaneous

ParametersReference

Mode

Training

SpeechInput

Supra-SegmentalFeatures

ModeRecognition

AlternativeWords

PostProcessor

WordRecognition

345

Page 355: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ The feature extraction module calculates the acoustic parameters of speech and the features to extract the key feature parameters, reducing dimensions, and facilitating post processing. The feature parameters frequently used in audio recognition systems include: amplitude, energy, zero cross, linear predictive coefficients (LPC), LPC coefficients (LPCC), line spectrum pair (LSP), short-time spectrum, mel-frequency cepstrum coefficients (MFCC) to reflect physiological characteristics of the human ear, etc. The selection and extraction of features are critical to system construction.

In the training stage, the user inputs training speech several times. The system obtains feature vector parameters (sequence) through the pre-processing and feature extraction, and builds a reference mode library (which can be reference template or model) for training speech through the feature modeling module.

In the recognition stage, a similarity measure comparison is made between the feature vectors of the input speech and the modes in the reference mode library. The mode category with the highest similarity is output as the intermediate alternative result.

The post-processing module processes the alternative recognition result and gets the final recognition result through more knowledge constraints.

Pre-ProcessingPre-processing mainly includes pre-emphasis, framing, and forming processes. We apply pre-emphasis to the original audio signals input to emphasize the high-frequency components of the speech and increase the high-frequency resolution of the speech. We usually use a filter with transfer function of H(z) = 1 - αZ-1 for filtering, where α is the pre-emphasis coefficient (0.9 < α < 1.0), which is usually 0.95, 0.97, or 0.98. Suppose the speech sampling value at moment n is x(n), the result of pre-emphasis is y(n) = x(n) - αx(n-1) (0.9 , α , 1.0).

The speech has short time and stability and the short time feature can be improved through a framing operation to facilitate modes. The frame length is usually 30 ms and the frame shift is 10 ms.

We multiply the frame signals with a Hamming window to reduce the signal discontinuity at the beginning and end of each frame. The function of Hamming window is:

w(n) = 0.54 - 0.46 cos(2πn/N-1) for (1 ≤ n ≤ N -1 )

where N is the sampling number of the current speech frame.

Expression and Extraction of FeaturesLinear forecasting and analysis technology is the most widely adopted technology for extracting feature parameters. Most successful systems use the LPCC extracted by linear forecasting technology as feature vectors of the system. In our system, we calculate the speech frame’s self-correlation value, obtain the linear forecasting coefficients of this speech frame using the Durbin algorithm, and obtain the LPCC as the feature parameter vectors.

Audio RecognitionDuring audio recognition, the duration for pronouncing a word or each part of a word changes when it is spoken, even though the user tries to keep them unchanged during training or recognition. Therefore, using a feature vector sequence does not provide the best similarity comparison result. The project provides time-alignment for the feature parameter sequence mode using dynamic time warping (DTW), which efficiently solves the problem.

The starting point of a word corresponds to that of the path. The distance between the starting point and ending point of the best path is the distance between the speech to be recognized and the template speech. The recognition result is the template word with a minimum distance from the speech to be recognized. The method has a high computational load, but it is technically easy with a high recognition

346

Page 356: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

rate. In point matching, the distortion measure for short-time spectral or cepstrum parameter recognition systems can adopt a Euclidean distance while the distortion measure for a recognition system adopting the LPC parameter may use a log likelihood ratio distance. The nearest neighbor criterion is usually adopted for these decisions.

In the oscillograph’s audio control module, the speech signal is sampled at 8,000 Hz and 16 bits; the frame length is 20 ms, i.e., 160 sampling points, and the frame shift is 10 ms. The speech feature parameter uses a 12-order LPCC. Both the audio signal pre-processing and feature parameter extraction use FPGA hardware, thereby implementing parallel operation with the CPU and reducing the CPU load to a certain extent. The peripheral processing circuit gets the speech frame feature parameter vector once every 10 ms and interrupts the CPU. The CPU makes real-time endpoint detection and matches the template with the DTW algorithm once a complete speech feature sequence is obtained, thus obtaining the best recognition result.

Hardware Design and ImplementationThe audio recognition design’s front-end acquisition module uses the WM8731 audio encoding/decoding chip on DE2 development board. The chip is connected directly to the analog audio interface on the development board, and is controlled through I2C bus. The AD/DA converters sampling rate and depth can be controlled easily by controlling the reading/writing of the register. The speech feature parameter adopts LPCC, which can be obtained as shown in Figure 12.

Figure 12. Feature Extraction Flow Chart

We implemented the parameter extraction module in the FPGA using Verilog HDL. The module’s functionality includes sampling from the audio chip through the I2C bus, pre-emphasis of audio signals, framing, windowing, extraction of self-correlation functions of speech frame, transform of the self-correlation value into LPC using the Durbin algorithm, and conversion of LPC into LPCC, which is used frequently in audio recognition.

Sampling

Pre-Emphasis

DurbinAlgorithm

LPCC

Self-CorrelationModule

Framing, Windowing

347

Page 357: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 13 shows the parameter extraction module’s top layer, which uses the pipeline operation mode. The internal modules are initiated by a rising signal edge. After operation, the module provides a rising edge indicating the status change that drives the next-level pipeline. Each module has an internal cache to ensure continuous data transmission.

Figure 13. Speech Feature Parameter Extraction Hardware Structure

As the value varies greatly during the whole operation, fixed-point operation reduces the result’s precision. Therefore, our design uses floating-point operation. The addition and subtraction of each floating point number uses 18 clock cycles; multiplication requires 14 clock cycles. Division requires more time, 34 clock cycles.

LPC Analysis and Durbin Recursive Algorithm LPC analysis estimates the value to be input by linear combination of the past values of output signals in a certain time domain discrete linear system. That is, if the estimated value of the audio signals at time n is:

The prediction error is:

According to the minimum mean square error principle (LMS algorithm), the predictor’s optimal prediction coefficient ai can be calculated using the following equation:

where:

These equations are called LPC canonical equations. R(l) is the self-correlation function, which is the basis of LPC analysis.

s′ n( ) ais n i–( )

i 1=

P

∑–=

e n( ) s n( ) s′ n( )– ais n i–( )

i 0=

P

∑= =

aiR k i–( )

i 0=

P

∑ R k( )– k, 1 2 …P, ,= =

R l( ) s n 1+( )s n( ) l,

n 0=

N 1– l–

∑ 0 1 …P, ,= =

348

Page 358: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Durbin Recursive AlgorithmThe Durbin algorithm starts from zero-order prediction when p = 0, E0

n = Rn(0), a0 = 1. The complete recursive process is shown below:

En0 = Rn(0)

aii = ki

aij = aj

i-1 - kiai-ji-1, 1 ≤ j , i - 1

Eni = (1 - ki

2)Eni-1

if i < p, goto (1)

aj - ajp, 1 ≤ j ≤ p

Self-Correlation Calculation ModuleFigure 14 shows the self-correlation calculation module block diagram.

Figure 14. Self-Correlation Calculation Module

Self-correlation functions require many multiplier-accumulator (MAC) operations. Therefore, calculating these functions are a key factor that affects system speed. To improve system speed, we store the windowed audio signal in two RAMs and use a parallel load to obtain two sets of data for multiplication. The result is sent to the accumulator for accumulation. Then the multiplier calculates the next data set. The multiplier and accumulator perform parallel operations throughout the whole process, ensuring pipeline continuity and maximum system speed.

FFT Design ImplementationWe use the radix-2 decimation-in-time FFT algorithm in our design. The butterfly operation is the basic operational unit of the FFT algorithm. Figure 15 shows the butterfly operation symbols, in which Wn

K is the twiddle factor. We set the length of the sequence x(n) as N, and N = 2M (where M is an integer), so the FFT operation flow diagram of the sequence consists of M levels of butterfly operations.

ki Rn i( ) aji 1– Rn i 1–( )

j 1=

i 1–

∑– Eni 1–⁄=

Σ

..

RAM

RAM

R(I)

R(0)

r(I)

349

Page 359: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 15. Butterfly Operation Blocks

The traditional flow diagram features reversed input and natural output. The algorithm has several advantages, for example, for operations of the same level, the two input data of each butterfly are only useful to their own operation. Additionally, the I/O data nodes of each butterfly are on the same line; therefore, the output data after one butterfly operation can be immediately stored in the storage unit of the former input data, i.e., in-place operation. This function saves storage units in the project implementation. The disadvantage of the algorithm is that the changeable geometrical structure of each algorithm cannot be cyclically controlled in the program. Therefore, it is not easy to expand.

Another FFT algorithm flow diagram features natural input and reversed output. See Figure 16 (use N = 8 as an example). The advantages of this algorithm are that the fixed geometric structure of each operation facilitates the cyclical program control. We only need to change the twiddle factor for operations that involve different levels. This algorithm is also more scalable: for an FFT with different points, we only need to change the corresponding address generator. Therefore, this FFT structure is more suitable for processing in an FPGA.

The disadvantage of this method is that the butterfly I/O data nodes are not on the same line. Therefore, data obtained through each butterfly operation cannot be stored in the storage unit of the original data. To implement the same N-point complex FFT with 16-bit data width, we need a 4N x16 bit storage space, which is twice the size required by the traditional in-place operation.

We decided to use the second structure (i.e., natural input and reversed output) in our design. See Figure 16.

Figure 16. Natural Input, Reversed Output 8-Point FFT Structure

FFT Block ModulesThe RAM module consists of two dual-port RAM blocks, RAM1 and RAM2, which store the input operation result’s real and idle component or that of different levels. The input data is first stored in RAM1 and the result of the first-level operation is stored in RAM2. The RAM2 data is extracted and placed in RAM1 after the second-level operation. We repeat the process for M = log2N times (where N is the number of FFT points). When N is different, the butterfly operation levels may also be different. Therefore, the result of the last-level operation may be stored in either RAM block. The input data is stored in RAM1 first, so the operation result may be overlaid by the input data before it is output. For this

x (k)1

x (k)2

x (k) + w x (k)1kN 2

x (k) - w x (k)1k

N 2w k

N

RAM1 RAM2 RAM1 RAM2

X(0)

X(1)

X(2)

X(3)

X(4)

X(5)

X(6)

X(7)

Y(0)

Y(1)

Y(2)

Y(3)

Y(4)

Y(5)

Y(6)

Y(7)

350

Page 360: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

reason, the result of the last-level operation is stored in RAM1 when M is an even number. Before writing new data, we store the result in RAM2 and the final result is output through RAM2. Because the design’s input data is regular (512 points), we read from RAM2 after receiving the FFT operation end signal.

In the twiddle factor module,we generated the twiddle factors as real and idle components before the MATLAB software. Then we converted them into 16-bit fixed points and stored the result in hexadecimal format. We used the Quartus® II MegaWizard® Plug-In Manager to generate two N/2 x16-bit ROMs, which are initialized by the corresponding Hexadecimal (Intel-format) Files of the twiddle factor real and idle components, respectively. In this way, the FPGA contains the twiddle factor values. For butterfly operations of different levels, the address generation unit generates corresponding addresses for ROMs to read the factor values. For the natural input and reversed output structure, the twiddle factor addressing is important. As can be seen from the FFT structure diagrams, we can infer the twiddle factor address rule at each level. For the N-point FFT (N = 2k, k is the level), there are:

WN0,WN

1,WN2, ...WN

N/2-1

or N/2 twiddle factors. Arranging them them as an array with N/2 elements:

w(N/2) = {WN0,WN

2,WN4, ...WN

7}(N = 16)

Thus, the order of the twiddle factor at level i is obtained by repeating w(0), w(1), w(2), ...w(2i) for 2k-i-1 times.

For the N-point complex FFT processing, the RAM and twiddle factor two modules require total memory resources of (2 x N + N/2) x 2 x 16 bits (the data are in plural form, so each point needs two storage units).

The butterfly operation unit consists of four multipliers, one adder, and one subtracter. The Cyclone® II FPGA contains rich multiplier resources, which can implement one butterfly operation in one clock cycle. This performance is better than a single-multiplier DSP in which each butterfly operation requires four clock cycles. Theoretical analysis concludes that to finish the FFT with the same number of points, the clock cycles required by the FPGA implementation are one-fourth of those required for the DSP implementation.

For the N-point FFT, the operation can be divided into N = log2N levels. Operation between two levels works serially. Operation at each level has N/2 butterfly operations, which work serially with each other. So M x N/2 butterfly operations should proceed successively, with each operation deploying the same butterfly operation unit. Because the operation is serial, only one butterfly operation unit block is needed, even for an FFT with more points. Therefore, we do not require additional DSP resources in the FPGA. The design uses the 50-MHz system clock as the operating clock. To obtain stable data, we added latches before and after operations at each level. One butterfly is finished in 8 clock cycles and an N-point FFT needs M x N/2 x 8 clock cycles. A 512-point FFT needs roughly 370 μs.

The address generation unit is mainly used for generating RAM data and storing read/write-enable signals and read/write addresses of the twiddle factors ROMs. For example, the final operation result is reversely stored in the RAM. To enable the natural output of the result, we use a few simple statements to realize the bit-reverse addresses in the address generation module. See the following code (take N = 32 as an example):

process (clock)begin

if (clock' event and clock = '1') thenif (done = '1') then

count < = count + 1;end if;address < = count (0) & count (1) & count (2) & count(3) &count (4)

end if;end process;

351

Page 361: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Because we used the natural input, reversed output structure for the RAM block that provides data to the butterfly operation blocks, the address generation logic only requires the natural generation addresses - n (0 - (N/2 - 1)) and corresponding n + N/2 to address the RAM and send it to the butterfly blocks in mid-operation. The address generation logic only needs the natural generation addresses - n (0 - (N - 1)) for the RAM block that receives the butterfly blocks output data. For the twiddle factor table addressing, the butterfly operation at each level reads one section naturally according to its needs because the twiddle factors are stored in reverse.

FPGA Hardware ImplementationFPGAs can be flexibly programmed using hardware description languages (HDLs). Because we can simulate the FPGA operation, the hardware design is as flexible and convenient as the software design, which shortened our research and development phase. We used the JTAG interface to configure the device in system, enhancing the system’s flexibility. We used the Altera Quartus II software to emulate the hardware, perform logic analysis, and compare the output result to the MATLAB emulation result. See Figure 17 for the system architecture.

Figure 17. FFT Hardware Implementation

We wrote the design using VHDL. The 512-point FFT requires a 9-level butterfly operation; each level includes 256 butterfly operation blocks. The operations input data bit width is 16 bits. Each butterfly operation block needs four multiplications and then four additions (2 each for the real and idle component). Because the twiddle factor is 16 bits, we select the higher 16 bits of the 32-bit data after multiplication and then perform addition. The real and idle components of each butterfly operation block need two additions, so data can overflow. However, we only need the relative size of the energy value of each signal spectrum harmonic, so we extend the 16-bit data into 18 bits and calculate. Additionally, we set two sign bits for each level. If data overflows in one or more butterfly operations (real or idle component) at the same level and if the overflow happens at the 16th bit, we set sign bit 1. If the overflow happens at the 16th bit and it leads to the overflow of the 17th bit, we set sign bit 2. We store the operation result in RAM as 18-bit data. Therefore, when we next read the RAM data for calculation or output, the selected data is the second higher 16 bits if the sign bit is 1. If sign bit 2 is set, the selected data is the higher 16 bits; otherwise, the data is less than 16 bits.

The direct digital synthesizer (DDS) acts as the front-end signal generation module. Figures 18 through 21 shows spectrograms sampled by the DDS-generated square wave signals (frequency is 48.828125 kHz) at different frequencies).

Input Data RAM

ROM

Output Data

Array ArrayArray

Array

352

Page 362: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 18. 50-kHz Sampling Frequency

353

Page 363: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 19. 50-kHz Sampling Frequency Spectrogram

Figure 20. 100-kHz Sampling Frequency

354

Page 364: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 21. 100-kHz Sampling Frequency Spectrogram

VGA Display ImplementationThe VGA display module is composed of two display control modules. One controls the time domain waveform that displays the two signal channels and the channel parameters, including signal sources, frequency, cycle, average value, peak-to-peak value, and root mean square (RMS). The other controls the frequency domain waveform that displays either of the two signal channels obtained by a 512-point FFT. We use a 4-Hz frequency to update the storage display RAM as it relates to the current time domain waveform or frequency domain spectrum. See Figure 22.

Figure 22. VGA Display Control Module

Time Domain Waveform Display Control ModuleThe time domain waveform display screen has the following five general display areas:

■ Top—The top area is the title panel of the digital oscillograph, which displays Audio Control Digital Oscillograph—Xi’an Jiaotong University in blue characters.

■ Left—The design uses the left area to display the 0-level symbol of the two channel signals. A blue arrow points at the 0 level of the current channel.

Signal Time DomainWaveform Display Logic

Signal Frequency DomainWaveform Display Logic

DataSelectionTerminal

GraphicMemory

RAM

VGA TimeSequence& Chroma

DataDeliveryModule

355

Page 365: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ Waveform (or middle)—The middle area is the signal waveform display area, measuring 500 pixels (horizontally) by 440 pixels (vertically). It can simultaneously display the time domain waveform signal of the two channels or that of either channel. We update the middle area’s memory storage unit using the display signal value that is converted from the original signal in the column. The signal waveform is displayed using vectors, i.e., the pixels that correspond to the current column signal’s amplitude value track the pixels that correspond to the next column signal’s amplitude value. The dynamic scope of the original signal data is 0 to 1,023, and the scope of the display signal data is 0 to 439. We use the fr control signal synchronous with the graphic RAM refresh frequency to update the RAM display signal. When fr rises, it continuously collects 500 data bits on the A/D port at the given frequency, fs, and adjusts the dynamic scope every time new data is collected. The module then saves the data into the display signal RAM.

■ Right—The right area displays the signal attributes and values, and has six big grids and one small grid. Each big grid is further divided into two small ones.

● First grid—The first small grid displays messages and the second small grid displays the signal channel of the current attribute values.

● Second grid—The first small grid displays the frequency and the second small grid displays the frequency value.

● Third grid—The first small grid displays the cycle and the second small grid displays the cycle value.

● Fourth grid—The first small grid displays the average value and the second small grid gives the current value.

● Fifth grid—The first small grid displays the peak-to-peak value and the second small grid gives the current value.

● Sixth grid—The first small grid displays the RMS and the second small grid gives the value.

● The last grid displays the audio control command’s recognition result. It displays OK if the current command is recognized; otherwise it displays AGAIN.

■ Bottom—We do not use the bottom area. The white point refreshes when update signals occur.

On the screen, the blue Chinese characters and code are placed against a white background. In the signal area, the red signal waveform of the two channels appears on the light or dark green squared grid. In the value display area on the right, dark green grids distinguish the values with different attributes. See Figure 23 for the control module functions of each area.

356

Page 366: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II-Based Audio-Controlled Digital Oscillograph

Figure 23. VGA Time Domain Waveform Display Control Module

The state machine of the time domain waveform screen controls the five parts of the time domain waveform display screen. Every time the fr signal rises, the state machine initiates each module to update the relative RAM; a complete memory refresh updates the screen.

Frequency Domain Waveform Display Control ModuleThe frequency domain waveform display screen is composed of two areas that correspond to the two control modules:

■ Spectrum—The spectrum area displays the channel signal frequency spectrum diagram that corresponds to the current message source. The rising edge of the fr control signal starts the FFT calculation module, which continuously collects 512 signal values on the A/D port at the given frequency fs. Then it performs the FFT calculation. The spectrum area control module obtains a signal when the calculation completes. The block updates current memory according to the 512-point frequency domain energy value obtained by the FFT.

■ Bottom—The bottom area displays the title panel of frequency domain waveform screen in blue characters, e.g., FFT(512).

The frequency domain waveform display also has a control state machine. At the frequency domain waveform screen state, the module updates the bottom area. Otherwise, it only updates the spectrum area. Figure 24 shows the control module.

Figure 24. VGA Frequency Domain Display Control Module

Route A Data

fS

fS

f r

f r

f r

CycleBufferingSampling

RAM

CycleBufferingSampling

RAM

Channel 1 0-Level Adjustment

Channel 2 0-Level Adjustment

RefreshLeft Screen

Area

CalculationModule

ConvertSignal

Data into Display

Data

ConvertSignal

Data into Display

Data

Route B Data

RefreshScreen

Signal Area

RefreshScreen

Calculation &Display Area

(Right)

RefreshScreen

Bottom Area

Time DomainWaveform

Display ControlState Machine

Refresh Screen TitleArea (Top)

Channel 1

Channel 2

Data Collection/FFT Calculation

Module

Refresh ScreenSignal Area

Refresh ScreenTitle Area

State Machineof Spectrum

Display Controlf2

357

Page 367: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Design MethodologyOur design methodology consisted of the following aspects:

■ SOPC design—The SOPC design is divided into hardware and software. The hardware design includes the basic Nios II SOPC environment and customized peripheral circuits using the Avalon bus. The design uses hardware to implement the main system calculation units, including audio recognition attribute extraction, the FFT calculation module, and the VGA display control module. Software design includes the system control and audio recognition algorithm.

■ Structure design—The system runs on the DE2 development board; the board resources implement the majority of the design. Additionally, the system has a front-end acquisition circuit board connected with other parts of the system via a development board expansion slot.

Design FeaturesThe audio control implementation and real-time FFT spectrum display are the two highlights of our system.We used the Nios II processor to design and develop the audio control digital oscillograph. Except for the front-end acquisition circuit and some control devices, the system design is implemented in a single FPGA. Compared with a hard-core design, the soft-core system is compact, uses fewer circuit connections between devices, lowers power consumption, and improves system stability.

Because the system has many real-time calculations, CPU-based software calculation cannot meet our speed requirements. Therefore, most of our data processing is implemented using hardware design. With the FPGA’s data processing capability, we achieved real-time processing of high-speed, complex data.

ConclusionAfter tuning, our system can display a waveform like a standard oscillograph for a given frequency scope. However, the design has some shortcomings. Due to improper design of the front-end acquisition circuit, the system’s signal input bandwidth is 300 KHz instead of 1 MHz.

As mentioned previously, the amplitude has 10 gears. However, the low-input impedance of the broadband operation during tuning causes a large deflection current on the two input ports when shifting,which creates a larger zero drift. Additionally, the signal-to-noise ratio (SNR) of lower than 5 mV arising from ordinary signal sources is so poor that it is not necessary to re-configure the full-range gears of 5 mV and 10 mV. Therefore, we changed the second-level attenuation to the first level when tuning, and choose x1/10, x1 and x10 in the shift selection, i.e., we removed one-level attenuation and one-level amplification. Due to the limited memory resources of the EP2C35F672C6 device, we could not add our audio recognition module into the system, which affected our design.

This competition gave us a deeper understanding of the Nios II processor, particularly new design concepts and development features. The SOPC design method enables us to customize peripherals flexibly. We could easily set up the interface between peripherals and the CPU without the development limitations of a monolithic processor.

Finally, and most importantly, we thank Altera Corporation for offering us this valuable learning and hands-on opportunity.

358

Page 368: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

Appendix: Nios II Embedded Processor Family

Today’s embedded design engineers face a tough challenge: finding a processor with the perfect mix of features, cost, performance, while managing processor availability and obsolescence issues. Altera’s Nios® II processors deliver the perfect fit every time with fully customizable features and performance, low product and implementation costs, ease of use and adaptability, and obsolescence-proof design.

The Nios II family of 32-bit RISC embedded processors delivers up to 300 MIPS of performance and can consume as little as US $0.23 of FPGA logic. Because the processors are soft-core and flexible, you can choose from an unlimited combination of system configurations to meet your performance, feature, and cost targets. Designing with Nios II processors helps you send products to market faster, extend your product's life cycle, and avoid processor obsolescence.

Increase Productivity With Powerful Development ToolsWith today’s short design cycles, anything that can increase productivity has the potential to pay big dividends, win market share, and establish a competitive advantage. Altera’s powerful development tools allow you to rapidly prototype, develop, and deploy embedded systems, reducing time-to-market while extending time-in-market.

Altera’s comprehensive hardware and software tools help you create powerful Nios II processor systems in minutes. Figure 1 shows the complete Nios II embedded processor design flow. From concept (at top), through hardware and software implementation, to debug, Altera offers all the tools you need to get your product to market fast.

359

Page 369: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 1. Nios II Embedded Processor Development Flow

Hardware Development ToolsAltera provides a complete set of tools for your hardware design, including the SOPC Builder system development tool, Quartus® II design software, ModelSim®-Altera software, and SignalTap® II embedded logic analyzer.

Hardware design for creating Nios II processor-based systems uses the SOPC Builder system development tool to specify, configure, and generate systems. Launching from within the Quartus II design software, SOPC Builder provides an intuitive wizard-driven graphical user interface (GUI) so you can create, configure, and generate system-on-a-programmable-chip (SOPC) designs. It minimizes the time spent integrating components into a coherent system. Figure 2 shows a view of the intuitive SOPC Builder GUI.

SOPC Builder

Targets

Quartus II Nios II IDE

Hardware Software

Automatic Software

Generation

System LibraryHeader FileApplicationTemplate

Define System

ProcessorsPeripheralsMemory

Interfaces

RTL Simulation

Instruction SetSimulator

Target Hardware

RTL SystemDescription

SystemTest Bench

JTAG Debugger

FPGA Configuration

SoftwareDevelopment

EditCompileDebug

Generate FPGAConfiguration

SynthesizePlace & Route

CompileDownload

360

Page 370: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

Figure 2. SOPC Builder GUI

Use the SOPC Builder system design tool to choose from a menu of peripherals, communication interfaces, memory, and I/O to suit your application needs. This tool is tightly integrated with the Nios II Embedded Design Suite and automatically generates a complete custom board support package, including all required software drivers.

Quartus II Design SoftwareAltera’s Quartus II design software technology leadership gives you unmatched levels of performance and ease-of-use. Using Quartus II software, you can easily design, optimize, and verify your Nios II designs in an Altera device.

When you are ready to simulate your design, SOPC Builder generates both VHDL and Verilog HDL simulation models. You can easily simulate Nios II processor-based systems using an automatically generated simulation environment created by SOPC Builder and the Nios IIIntegrated Development Environment (IDE). A full Quartus II software subscription includes ModelSim-Altera software that can also be used to simulate your Nios II designs.

SignalTap II Embedded Logic Analyzer The ultimate testbench for engineers who want to see the active processes within their design is running at speed under real-world system conditions. The challenge is in accessing nodes buried within the FPGA architecture. The SignalTap II embedded logic analyzer eliminates this challenge by providing access to nearly any node within your FPGA design through a standard Joint Test Action Group (JTAG) port to view design nodes in system and at system speeds.

Software Development ToolsThe Nios II Embedded Design Suite (EDS) is a collection of tools, utilities, libraries, and drivers used to develop embedded software for the Nios II processor. The Nios II EDS includes:

■ Integrated development environment

■ JTAG debugger

■ Instruction set simulator

■ Flash programmer

361

Page 371: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

■ Design examples (in the form of software templates)

■ Software libraries and embedded software components

● Hardware abstraction layer (HAL)

● μC/OS-II real-time operating system 1

● TCP/IP protocol stack: NicheStack TCP/IP Network Stack-Nios II Edition 1

● Newlib ANSI-C standard library

● Simple file system

● Other Altera command-line tools and utilities

■ Embedded software acceleration tool: Nios II C-to-Hardware Acceleration (C2H) Compiler 1

(1) Fully functional evaluation version is included with the Nios II EDS and can be licensed separately.

Nios II Integrated Development EnvironmentBased on the open, extensible Eclipse IDE project and the Eclipse C/C++ Development Tools project, the Nios II IDE is the primary software development tool for the Nios II family of embedded processors. You can complete all software development tasks within the Nios II IDE, including editing, compiling, downloading, debugging, and flash programming. The Nios II IDE, shown in Figure 3, provides a consistent development platform that works for all Nios II processor systems. With a PC, an Altera device, and a JTAG download cable, you have everything you need to develop and debug Nios II processor-based systems.

Figure 3. Nios II IDE

JTAG DebuggerThe Nios II architecture supports a JTAG debug module that provides on-chip emulation features to control the processor remotely from a host PC. The Nios II IDE communicates with the JTAG debug module on one or more Nios II processors so you can:

■ Download programs to memory

■ Start and stop program execution

362

Page 372: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

■ Set breakpoints and watchpoints

■ Analyze registers and memory

■ Collect real-time execution trace data

The debug module connects to the JTAG circuitry built into all Altera devices and connects to the host PC via a download cable (Figure 4). Additionally, debug support for the Nios II processor is available from several industry-standard providers.

Figure 4. Nios II JTAG Debug Module

Instruction Set SimulatorThe Nios II instruction set simulator (ISS) allows you to begin developing programs before the target hardware platform is ready. The IDE allows you to run programs on the ISS as easily as running them on a real hardware target.

Flash ProgrammerMany designs that use Nios II processors also incorporate flash memory on the board. Any common flash memory interface (CFI)-compliant flash device connected to the FPGA can be programmed using the Nios II IDE flash programmer. The Nios II IDE flash programmer can also program any Altera serial configuration device connected to the FPGA, as shown in Figure 5. The Nios II IDE flash programmer is pre-configured to work with all of the boards available with the Nios II development kits, and can be easily ported to any custom hardware.

Figure 5. Transmitting Flash Content to the Flash Device

Software TemplatesIn addition to a project set-up wizard, the Nios II IDE provides software code examples, in the form of project templates, to help you bring up working systems as quickly as possible.

System Interconnect Fabric

JTAG DebugModule

Nios II Processor

Altera FPGA

Avalon Port

JTAGUART

Avalon Port Avalon Port

On-ChipMemory

DebugData

CharacterStream

Debugger

JTAG TerminalJTAG

Hub

Host PC

AlteraDownload

Cable

Built-In

JTAG

Controller

Altera FPGA

Target Board

Altera Download Cable

Flash Programmer

Design

CFI Flash Device

Host PC

Flash Content

Flash Content

363

Page 373: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Each template is a collection of software files and project settings. You can add your own source code to the project by placing the code in the project directory or importing the files into the project.

Figure 6 shows some of the available software project templates.

Figure 6. Software Project Templates

System SoftwareThe Nios II IDE lets you customize systems quickly using system software. With system software (also referred to as “software components”), you have an easy way to painlessly configure your system for specific target hardware.

■ Hardware Abstraction Layer—The hardware abstraction layer (HAL) system library is a lightweight run-time environment that provides a simple device driver interface for programs to communicate with underlying hardware. As SOPC Builder and the Nios II IDE are tightly integrated, the HAL system library can be automatically generated to serve as a board support package for Nios II processor-based designs.

■ μC/OS-II—A complete, portable, ROM-able, preemptive real-time kernel, μC/OS-II from Micrium ships with all Nios II development kits and includes full source code, reference manual, and free developers’ license. When you are ready to migrate your design to your board, you can purchase a shippers' license. A shippers’ license entitles you to a license for three developers to create an unlimited number of designs for one year on μC/OS-II and a perpetual license to support designs created during the subscription period (to fix bugs and make minor modifications).

■ TCP/IP Stack—NicheStack TCP/IP Network Stack, Nios II Edition is a software suite of networking protocols designed from the ground up to provide an optimal solution for designing network-connected embedded devices with the Nios II processor (refer to Table x). The stack has a small footprint, is portable, and delivers high performance without compromising compliance to request for comment (RFC) standards. NicheStack supports a wide variety of physical interfaces, and can be configured as a standard client machine, an IP router, or a multi-homed server. The suite also contains a comprehensive device networking package, FTP, Telnet, IGMPv1, and DNS and DHCP client components. The NicheStack TCP/IP Network Stack, Nios II Edition is distributed by Altera as full ANSI C source code and includes an evaluation license. If you wish to design with this software suite, you must purchase a license from Altera. The Nios II Edition NicheStack TCP/IP Network Stack has the following highlights:

● Zero data copy for ultra fast performance

● Standard sockets interface

● Raw socket support

● Non-blocking versions of all functions

● Versatile MSS and window options

364

Page 374: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

● Connections limited only by memory availability

● Optional optimized assembly language checksum routines

● “Predictive” header processing for speed

● Nagle algorithm (slow start)

● VJ smoothed round trip timing

● Delayed ACKs

● BSD style "keepalive" option

● Complete debugging and optimization module

Nios II C-to-Hardware Acceleration (C2H) CompilerAltera also offers the Nios II C-to-Hardware Acceleration Compiler (C2H Compiler), a productivity tool that gives embedded developers push-button acceleration of performance-critical C-language software algorithms. These algorithms are automatically converted into hardware accelerators in the FPGA that act as coprocessors with a latency-aware, pipelined connection to the processor's memory map. With this tool, designers have an easy way to boost performance using a known programming language and familiar tools, improving productivity and speeding time-to-market.

The C2H Compiler is tightly integrated into the Nios II development environment (Figure 7), leveraging Altera's proven SOPC Builder tool and the system interconnect fabric to automate the conversion of ANSI C source code to hardware (register transfer language), integrate the resulting hardware accelerator into the system's memory map, and schedule memory transactions with latency-aware pipelining. It enables developers to quickly prototype functions in software running on the processor, then easily convert the software into a hardware-accelerated implementation.

Figure 7. C2H Compiler

Protect Your Software Investment from Processor ObsolescenceComponent obsolescence impacts virtually every industry, especially those with products that have long lifecycles such as automotive, industrial, military, aerospace, and medical. The biggest investment in

365

Page 375: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

any embedded system is the application software; changes to the system hardware can threaten years of investment in software development.

All silicon is eventually discontinued, and FPGAs are not immune to that. However, by designing with a soft processor your software investment is protected in many ways. Nios II processor designers have a perpetual license to create and deploy Nios II processor-based designs in Altera FPGAs, so even if the underlying FPGA device changes, the investment in application software is preserved.

■ No need to port application software—Because the processor is soft, it is implemented as a hardware project that can easily be migrated to another Altera FPGA device while maintaining the same application code.

■ No need to requalify a new processor—Because the processor hasn't changed, you do not need to spend the time and effort requalifying a new device.

■ Minimal lost opportunity cost—By keeping the same design tools, design flow, hardware design, and application software, your lost opportunity cost is minimized.

■ Board redesign—In the worst-case scenario where your current FPGA is not available, you may need to redesign your board. However, design risk is significantly reduced because the entire hardware design is intact and only the device and pin connections change.

■ Phasing in new product—Phasing in a new product is always a challenge. However, if you design your system with an Altera FPGA, upgrading your system simply involves a flash update that can be performed remotely, as opposed to a hardware replacement. This minimizes customer down time and field service costs.

Scale System PerformanceAltera’s Nios II processors let you take full advantage of the inherent parallelism of FPGAs to achieve high levels of system performance. Multiple processors can execute code simultaneously while hardware accelerators can offload compute-intensive algorithms at the same time. Upgrade your embedded system's performance at any stage of the product life cycle without the need to redesign the board or develop hand-optimized assembly code.

Three Processor CoresChoose from three code-compatible, 32-bit, soft-core processors: one optimized for maximum system performance, one optimized for minimum logic usage, and one that strikes a balance between the two. These cores can easily be configured with features such as multipliers, user-specified cache memories, custom instructions, hardware debug logic, and more to adapt to your specific performance needs. Table 1 shows the features of each Nios II processor family member; Tables 2 and 3 provide additional performance details.

Table 1. Nios II Processor Family Members (Part 1 of 2)

Feature Nios II /f (Fast) Nios II /s (Standard) Nios II /e (Economy)

Feature Nios II/f (Fast) Nios II/s (Standard) Nios II/e (Economy)

Description Optimized for maximum performance

Faster than the fastest and smaller than the smallest first-generation Nios CPU

Optimized for minimum logic usage

Pipeline 6 Stage 5 Stage 1 Stage

Multiplier 1 Cycle (1) 3 Cycle (1) Emulated in software

Branch Prediction Dynamic Static None

366

Page 376: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

Note to Table 1:(1) Using DSP Blocks in Stratix® or Stratix II FPGAs.

Note to Table 2:(1) These results were generated using seed swapping and synthesis/fitting settings in the Quartus II software.

Note to Table 3:(1) These results were generated using seed swapping and synthesis/fitting settings in the Quartus II software.

High-Bandwidth System InterconnectAltera’s SOPC Builder system design software lets you generate high-throughput systems that take advantage of the inherent parallelism of FPGAs. The system interconnect fabric is fully switched, in that dedicated connections between master and slave components allow multiple simultaneous transactions without the arbitration stalls found in traditional bus architectures.

Instruction Cache Configurable Configurable None

Data Cache Configurable None None

Custom Instructions Up to 256 Up to 256 Up to 256

Table 2. Maximum Clock Frequency (fMAX) for Nios II Processor System (Note 1)

Device Family Device Used Nios II /f Nios II /s Nios II /eStratix III EP3SL70F484C2 266 200 322

HardCopy® Stratix II HC230F1020C 202 202 321

Stratix II EP2S60F1020C3 222 171 285

HardCopy Stratix EP1S80F1020C5_HC 147 131 176

Stratix EP1S80F1020C5 148 128 172

Cyclone III EP3C40F324C6 163 136 190

Cyclone II EP2C20F484C6 142 111 193

Cyclone EP1C20F400C6 134 121 173

Table 3. DMIPS for Nios II Processor System (Dhrystone Benchmark v2.1) (Note 1)

Device Family Device Used Nios II /f Nios II /s Nios II /eStratix III EP3SL70F484C2 300 128 50

HardCopy® Stratix II HC230F1020C 228 129 49

Stratix II EP2S60F1020C3 251 110 44

HardCopy Stratix EP1S80F1020C5_HC 166 84 27

Stratix EP1S80F1020C5 168 82 27

Cyclone III EP3C40F324C6 165 68 17

Cyclone II EP2C20F484C6 144 55 18

Cyclone EP1C20F400C6 130 53 17

Table 1. Nios II Processor Family Members (Part 2 of 2)

Feature Nios II /f (Fast) Nios II /s (Standard) Nios II /e (Economy)

367

Page 377: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

In traditional bus architectures (Figure 8), a single arbiter controls communication between the bus masters and slaves. Each bus master requests control of the bus, and the arbiter then grants bus access to a single master. If multiple masters attempt to access the bus at once, the arbiter allocates bus resources to a master based on a fixed set of arbitration rules. This can lead to a bandwidth bottleneck as only one master can access the system bus and its resources at a time.

Figure 8. Traditional Bus Architecture

The system interconnect fabric's simultaneous multi-master architecture increases your system's bandwidth by eliminating this bottleneck (Figure 9). Using Altera’s system interconnect fabric, each bus master gets its own dedicated interconnect, meaning that bus masters only contend for shared slaves, not for the bus itself. Each time a component is added or the peripheral access priorities change, SOPC Builder generates a newly optimized system interconnect fabric with a minimum of FPGA resource use.

Figure 9. System Interconnect Fabric Architecture

The system interconnect fabric supports a wide range of system architectures, including single- and multiple-master systems, and allows seamless data transfers between peripherals with performance-optimized data paths. Your design's off-chip processors and peripherals are equally well supported by the system interconnect fabric.

Custom InstructionsCustom instructions allow developers using Nios II processors to increase system performance by extending the CPU instruction set to accelerate time-critical software. Using custom instructions, you can optimize system performance in a way not possible with traditional off-the-shelf processors.

MasterMasters

SlavesUART PIO Program

MemoryData

Memory

Master

Arbiter

System Bus

Bottleneck

Slaves

Masters

ProgramMemory

I/OCustom

Accelerator Peripheral

I/OProgramMemory

DataMemory

Master Master Master

DataMemory

System Interconnect Fabric

Arbiter Arbiter

368

Page 378: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

The Nios II family of processors supports up to 256 custom instructions to accelerate logic or mathematically complex algorithms normally handled in software. For example, a block of logic that performs a cyclic redundancy code calculation on a 64-Kbyte buffer operates 27 times faster as a custom instruction than when performed by software (Figure 10). Nios II processors support fixed and variable cycle operations, include a wizard for importing user logic as a custom instruction, and automatically create software macros for use in developers’ code.

Figure 10. Nios II Custom Instructions

Hardware AcceleratorsAutomatically accelerate your software by converting C language subroutines into hardware accelerators that boost performance without increasing clock frequency and power consumption. Simply “right-click to accelerate” performance-critical functions using the Nios II C-to-Hardware (C2H) Acceleration Compiler and eliminate the time and expense of manually generating Verilog HDL or VHDL accelerators. See Figure 11.

Figure 11. Nios II Hardware Accelerator

Multiprocessor SystemsUse multiple processors to scale your system’s performance or to partition software applications into smaller, simpler tasks that are easier to write, debug, and maintain. The Nios II Embedded Design Suite (EDS) and tools from industry-leading embedded software providers support developing and debugging multiprocessor applications. Nios II processors, combined with high-density devices such as Stratix FPGAs and HardCopy structured ASICs, are ideal platforms for creating high-performance multiprocessor systems.

Nios II Embedded Processor

+--

&

<<>>

Out

result

A

dataa

Nios IIALU

B

databCustom Logic

ProgramMemory

ProgramMemory

Processor Accelerator

DataMemory

System InterconnectFabric

Arbiter Arbiter

DM

A

DM

A

369

Page 379: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Configurable Caches and Tightly Coupled MemoryAdjust the size of the processor instruction or data cache to meet the performance needs of your application. For fast access to frequently used routines, add up to four tightly coupled memories that provide cache-like access without the penalty of cache misses.

Customize Your Processor Rather than being limited to a pre-fab processor, with Nios II processors, you choose the exact peripherals, memory, and interface features you need-customizing the processor to your specifications. In addition, you can easily integrate your own proprietary functions to give your design a unique competitive advantage.

Nios II development kits include a library of commonly used peripherals and interfaces. See Table 4. For a complete list of SOPC Builder-ready intellectual property (IP) and peripherals, visit www.altera.com/SOPCBuilderReady.

Using the interface-to-user-logic wizard in the SOPC Builder software, you can also create your own custom peripherals and integrate them into Nios II processor systems. With SOPC Builder and Altera FPGAs, you can assemble embedded processor configurations not available in off-the-shelf processors, letting you create exactly what you need, every time.

Adapt to Changing RequirementsAdd hardware features at any time in the product lifecycle without the need to redesign your board. You can boost performance and add features, even late in the development stage, to help you adapt to changing requirements and reduce development time, insulating your product from the competition. To

Table 4. Nios II Peripherals

Peripheral DescriptionJTAG UART Communicates serial character streams between a host PC and an SOPC Builder

system using the JTAG circuitry built into Altera FPGAs

CompactFlash Interface Provides mass storage support

Interface to User Logic Connects on-chip user logic or off-chip devices to an SOPC Builder-generated system

UART Provides common serial interface with variable baud rate, parity, stop and data bits, and optional flow control signals

Interval Timer Provides a 32-bit timer; can be also be used as periodic pulse generator or system watchdog timer

Parallel I/O (PIO) Provides 1- to 32-bit parallel I/O (input, output, and edge-capture) ports

Serial Peripheral Interface (SPI)

Implements an industry-standard serial peripheral interface with either master or slave protocol

DMA Controller Performs bulk data transfers by offloading memory tasks from the CPU

SDRAM Controller Provides a simple Avalon® interface to off-chip SDRAM and supports 8-, 16-, 32-, and 64-bit data

Memory Interfaces Includes:● On-chip ROM and RAM ● SDRAM, SSRAM, SRAM, and flash ● Altera serial configuration devices

Ethernet Port Includes:● 10/100 Mbps SMSC LAN91C111single-chip Ethernet controller interface ● Software support provided by the lightweight IP TCP/IP stack ● Included with the Nios II development kits

370

Page 380: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Family

deploy bug fixes, feature enhancements, and service upgrades for systems in the field, on site or remotely, simply download a new firmware image on the target system.

Reduce Your Total System CostYour total investment in an embedded application is more than the cost of the embedded processor. By implementing your embedded system in an FPGA, you can reduce system cost several ways:

■ System Integration Reduces BOM Cost—Reduce chip count by implementing functions currently handled by discrete components. Integrating functions into a single FPGA reduces board size, cost, density, and power consumption.

■ Royalty-Free Perpetual Use License—Purchase once, use perpetually, and pay no royalties for Nios II processors implemented in Altera FPGAs and HardCopy structured ASICs.

■ Low Cost Embedded FPGA—Combine the Nios II /e core with a low-cost Cyclone® series FPGA and consume as little as US $.23 of logic, which leaves plenty of room to add your own custom design or integrate functions performed by external devices. By using a soft-core processor, you can always target the latest, lowest cost device with minimal design risk.

■ High-Volume Migration Path to ASIC—Once your FPGA design is finalized and moves into high-volume production, you can quickly migrate it to an Altera HardCopy II structured ASIC solution, dramatically reducing device cost. Altera also offers an ASIC license for the Nios II processor, peripherals, and interconnect logic for designs that need to migrate to cell-based ASIC designs.

■ Exact-Fit Peripheral Feature Set—Avoid paying for unnecessary peripherals, features, or power consumption. Create a custom-tailored system that combines one or more CPUs with the exact set of peripherals, memory, and I/O interfaces you need, and a power consumption you can live with.

■ Custom Coprocessor—Avoid migrating to an expensive discrete processor for incremental performance enhancements. Instead, offload your computing to a low-cost coprocessor inside your FPGA. Using a combination of one or more Nios II processors, custom instructions, and hardware accelerators, your FPGA co-processor can provide significant performance boost to your existing processor.

■ Low-Cost Development Tools—You can download full featured evaluation versions of the Nios II processor, Nios II Embedded Design Suite, and SOPC Builder system design tool today for free enabling you to go from concept to complete system running in the lab in a matter of hours.

■ Driver Development Costs—Choose from a library of available drivers for your embedded needs and drastically reduce the need for custom driver development and accelerate time-to-market.

■ Reduce the Cost of Hardware Upgrades and Maintenance—The ability to update firmware over Ethernet is a common feature in today's embedded systems. In classic embedded systems (comprised of a discrete microprocessor and the devices it controlled), the word “firmware” applied to the update of the software image that the microprocessor was running. Add FPGAs to the embedded mix and the remote update possibilities increase. This is especially true of systems that contain a Nios II soft processor, because you can upgrade both the Nios II processor (as part of the FPGA image) and the software that it runs in one remote configuration session.

Speed Development with Nios II Development KitsAltera and its partners offer development kits that give you everything you need to start designing the perfect processor for your system today: from documentation to download cables, from boards to design software. One example kit is shown in Figure 12. To find out more, visit Altera's development kits web site at www.altera.com/devkits.

371

Page 381: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Figure 12. Nios II Development Kit, Cyclone Edition

Accelerate Your Learning CurveGet started immediately by downloading the complete set of Nios II design tools and intellectual property (IP) cores or accelerate your design efforts with one of our low-cost development kits. Reuse one of the many pre-built embedded reference designs as a template for your own application and quickly come up to speed on how to use Altera tools through several in-depth tutorials, online training modules, or instructor-led training classes.

There are several ways to learn more about Altera's embedded solutions, all of which begin by navigating to the Altera embedded web site (www.altera.com/embedded) where you can:

■ View online demonstrations

■ Read in-depth technical documentation

■ Download an evaluation version of the Nios II processors and Nios II IDE

■ Check out the latest Nios II development kits

■ Register to attend on-line or instructor-led training

When you are ready for the next step, simply order a development kit or contact a sales office. Visit www.altera.com today for details.

You can also visit the Nios design forum site (www.niosforum.org), where Nios and Nios II users around the world share ideas, design examples, and other information.

372

Page 382: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Design Contest Winners

Appendix: Nios II Embedded Processor Design Contest Winners

The following table lists the 2006 contest winners.

Nios II Design Contest Winners

Design University/College Award LocationNios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Huazhong University of Science and Technology

First Prize China

Set-Top Box Capable of Real-Time Video Processing

Xidian University Second Prize China

Digital Watermark-Based Trademark Checker

Information Science Institute of Beijing JiaoTong University

Second Prize China

Automatic Scoring System Huazhong University of Science and Technology

Third Prize China

Digital Oscillograph Air Force Radar College of PLA Third Prize China

Nios II-Based Air-Jet Loom Control System Information College of Donghua University

Third Prize China

Nios II-Based Audio-Controlled Digital Oscillograph

Xi'an JiaoTong Universiy Third Prize China

Nios II-Based Multiple-Core Intelligent Traffic On-Vehicle Terminal

Xi'an College of Posts and Telecommunications

Third Prize China

Network Data Security System Design with High Security Insurance

I-Shou University First Prize Taiwan

Nios II Embedded Electronic Photo Album St. John's University Second Prize Taiwan

Omnidirectional Mobile Home Care Robot National Chung-Hsing University Third Prize Taiwan

Unattended Wireless Search Robot Kwangwoon University First Prize Korea

Black Box for Robot Manipulation Hanyang Univ, Seoul National Univ, Yonsei University

Second Prize Korea

Driver Assistance Tool Hanyang University Third Prize Korea

SOPC Implementation of Software-Defined Radio

National Institute of Technology, Trichy First Prize India

SOPC-Based Speech-to-Text Conversion National Institute of Technology, Trichy Second Prize India

373

Page 383: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

The following pages show photos of some of the award presentations and design contest winners.

Multimedia Decoder Using the Nios II Processor

Indian Institute of Science Third Prize India

Artificially Intelligent Autonomous Aircraft Navigation System Using a Distance Transform on an FPGA

Monash University Star Australia Australia

Nios II Embedded Processor Design Contest Award Presentation by Bob Xu at SOPC World Chengdu

Prize Won: First Prize, ChinaUniversity: Huazhong University of Science and TechnologyProject: Nios II Processor-Based Remote Portable Multi-Function Logic Analyzer

Prize Won: Second Prize, ChinaUniversity: Information Science Institute of Beijing JiaoTong UniversityProject: Digital Watermark-Based Trademark Checker

Prize Won: Third Prize, ChinaUniversity: Huazhong University of Science and TechnologyProject: Automatic Scoring System

Nios II Design Contest Winners

Design University/College Award Location

374

Page 384: Nios II Embedded Processor Design Contest - Outstanding - Altera

Appendix: Nios II Embedded Processor Design Contest Winners

Prize Won: First Prize, KoreaUniversity: Kwangwoon University Project: Unattended Wireless Search Robot

Prize Won: Second Prize, KoreaUniversity: Hanyang University, Seoul National University, and Yonsei UniversityProject: Black Box for Robot Manipulation

Nios II Embedded Processor Design Contest Award Presentation by May Yang of Galaxy and Jennifer Lo of Altera at SOPC World Taipei

Prize Won: First Prize, TaiwanUniversity: I-Shou UniversityProject: Network Data Security System Design with High Security Insurance

375

Page 385: Nios II Embedded Processor Design Contest - Outstanding - Altera

Nios II Embedded Processor Design Contest—Outstanding Designs 2006

Prize Won: Second Prize, TaiwanUniversity: St. John's UniversityProject: Nios II Embedded Electronic Photo Album

Prize Won: Third Prize, TaiwanUniversity: National Chung-Hsing University Kwangwoon University Project: Omnidirectional Mobile Home Care Robot

Nios II Embedded Processor Design Contest Award Presentation by Razak Mohammedali at SOPC World India

Prize Won: First Prize, IndiaUniversity: National Institute of Technology, TrichyProject: SOPC Implementation of Software-Defined Radio

376

Page 386: Nios II Embedded Processor Design Contest - Outstanding - Altera