Hardware/Software Co-Verification with RTOS Application Code
By Michael Bradley, Mentor Graphics and Kainian Xie, Hyperchip Inc
Introduction
Software programmers have a few tools and methodologies for developing and debugging embedded software. A standalone instruction set simulator (ISS) can be used to run compiled code locally on a host workstation or PC. Device drivers and other routines that interact with the hardware must be stubbed out, or the hardware must be emulated within a debugger macro language. Two disadvantages of this approach are the limitations of the macro language and the accuracy of the macro implementation. An evaluation board that contains the target CPU is often used, and has the advantage of real-time performance. Its disadvantage is that its hardware resources are general purpose and bear little or no resemblance to the final product. An FPGA prototype can be created to mimic the hardware to be deployed, but this is a complex undertaking, especially for designs that consume multiple FPGAs.
One solution for accurate hardware/software verification is to use the ISS of the target CPU and "connect" it to the hardware simulator being used by the hardware design group. One obvious disadvantage is that software execution is limited to the speed of the hardware simulator. The Seamless® Co-Verification package from Mentor Graphics increases the speed of the ISS-to-hardware-simulator "connection" by allowing most of the ISS instruction cycles to run decoupled from the hardware simulator. This patented technology, termed "optimizations", has been used to generate successful System on Chip (SoC) tape-outs, as well as CPU-based board designs.
Another tool available to the programmer is an RTOS simulator. An RTOS simulator does not emulate the instruction set of a CPU; instead it models the resources of the RTOS itself. This allows the programmer to develop and debug task level operations such as pending and posting to a mutex, rescheduling of tasks, mailbox operations, etc. The RTOS simulator is a higher level of abstraction than an ISS. It is CPU independent and does not require (or allow) assembly code.
It is possible to "connect" an RTOS simulator to the hardware simulator through Seamless. At this level of abstraction, it is possible to observe the threads of execution and how they interact with the hardware. The effect is the appearance that thousands of software cycles have run in conjunction with the hardware in essentially zero time. In other words, the RTOS can be initialized, the application tasks started, and the software made ready to interact with the hardware before the hardware simulator has advanced. Once in this state, the hardware is initialized by the RTOS application, and hardware interaction begins. The software can now perform system-level transactions with the hardware. This test environment is not concerned with CPU instructions; it is used to exercise high-level operations in hardware and software. Its performance is bounded by the amount of hardware simulator time needed to perform a given software or testbench request.
The Line Card Design
Hyperchip has developed optical communication line cards. The cards plug into a Hyperchip proprietary switch fabric. The switch fabric is structured to be highly parallel, which eliminates serial bottlenecks. The entire system is targeted at the core routing of optical networks, at a total speed of 1 petabit per second (a petabit is a thousand terabits, or 10 to the 15th power bits).
The forwarding and traffic management engine and support functions are implemented in several FPGAs. A CPU is connected to the datapath hardware via a PCI bus interface. The CPU runs the VxWorks® real-time operating system (RTOS) from Wind River. Wind River also provides VxSim® as an RTOS simulator for VxWorks.
In the deployed line card, VxWorks runs on the core of the CPU. VxWorks' memory space is the SDRAM local to the CPU. The PCI block within the CPU acts as a bridge and allows the core to communicate with the datapath hardware. The datapath hardware is able to communicate with the RTOS by depositing traffic information in SDRAM and sending a PCI interrupt to the CPU.

Figure 1 shows the major blocks in the line card. For the hardware/software verification environment, the hardware and software processes must communicate through some interface logic in the hardware simulator. This hardware/software interface is typically the pins of a CPU core or chip. However, in the line card design, we are able to obtain a higher level of abstraction by interfacing at the PCI bus. To accomplish this, Seamless provides a PCI 2.1 compliant transactor model. The transactor model allows I/O reads and writes from VxSim to be performed in hardware. The PCI transactor also provides an interrupt facility from the hardware to VxSim.
In the simulation environment, the CPU and SDRAM are abstracted. VxSim will replace VxWorks running on the processor. VxSim is a simulated version of VxWorks, and runs on the workstation CPU. The workstation memory will replace the SDRAM. The Seamless PCI transactor model acts as the PCI bridge located in the CPU. Seamless implements the requested bus transactions from VxSim in the PCI transactor model. The PCI transactor model is instantiated in the VHDL design.
VxSim is integrated with Seamless via the HCE™ mode. Host Code Execution (HCE) is a special mode of Seamless that is activated when an ISS is not present. HCE mode allows the user to execute C code that references an HCE library and is compiled for the workstation. The HCE library interfaces to a Bus Interface Model (BIM) in the hardware simulator. In other words, the HCE library allows the user's C code to interact with the hardware simulator. The HCE library has four major functions:
- Advance time in the hardware simulator
- Initiate PCI bus master transactions
- Create a callback to accept and/or present data when the transactor is accessed as a target
- Create a callback to process PCI interrupts

The line card software is designed so that the higher levels of software are independent of the underlying hardware platform or simulation environment. The line card code can be ported to different platforms by altering the hardware abstraction layer. Several abstraction layer versions were created in order to support different environments: the CPU evaluation board, the Seamless/VxSim environment, and the final hardware.
The deployed hardware system will boot from FLASH, which will copy VxWorks to SDRAM, where VxWorks is initialized and started. In the Seamless/VxSim environment, the booting operation is not needed; execution begins in VxSim and the user's startup routine. The startup routine calls hardware initialization routines and starts the user's tasks. These tasks run various tests on the line card. A typical startup sequence is:
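As an illustration of the abstraction layer idea, the following is a minimal sketch in C, not the Hyperchip implementation. It assumes a table of function pointers as the layer boundary; the names hal_ops_t, hal_fake, and reg_set_bits are hypothetical, and the "fake" backend simply uses an array in place of device registers. A Seamless/VxSim backend would presumably forward the same two operations to the PCI transactor instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical abstraction layer: upper-level code calls only through
   this table and never touches a platform directly. */
typedef struct {
    const char *name;
    uint32_t (*read32)(uint32_t addr);
    void     (*write32)(uint32_t addr, uint32_t value);
} hal_ops_t;

/* Stand-in backend for illustration: an array plays the device registers. */
#define FAKE_REG_COUNT 16u
static uint32_t fake_regs[FAKE_REG_COUNT];

static uint32_t fake_read32(uint32_t addr)
{
    assert(addr / 4u < FAKE_REG_COUNT);   /* byte address, 32-bit registers */
    return fake_regs[addr / 4u];
}

static void fake_write32(uint32_t addr, uint32_t value)
{
    assert(addr / 4u < FAKE_REG_COUNT);
    fake_regs[addr / 4u] = value;
}

const hal_ops_t hal_fake = { "fake", fake_read32, fake_write32 };

/* Platform-independent code: read-modify-write through whichever
   abstraction layer the caller selected. Returns the value written. */
uint32_t reg_set_bits(const hal_ops_t *hal, uint32_t addr, uint32_t mask)
{
    uint32_t value = hal->read32(addr) | mask;
    hal->write32(addr, value);
    return value;
}
```

With this shape, porting to another environment means supplying another hal_ops_t instance; reg_set_bits() and everything above it stay unchanged.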
- Initialize Seamless PCI transactor
- Search for PCI targets on the PCI bus. Configure targets as needed
- Register PCI targets as IO devices in the VxWorks IO sub-system
- Start tasks to run tests
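The startup sequence above can be sketched as follows. This is an assumption-laden outline rather than the actual startup routine: every function name here is hypothetical, the stubs only count what happened, and the bus scan reads from a fixed table instead of issuing configuration cycles through the transactor. The one real convention used is that a PCI vendor ID of 0xFFFF indicates an empty slot.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS     4
#define PCI_NO_DEVICE 0xFFFFu  /* vendor ID read back from an empty slot */

/* Stand-in for configuration reads (hypothetical): two populated slots. */
static const uint16_t fake_vendor_id[MAX_SLOTS] = {
    0x1234u, 0xFFFFu, 0x5678u, 0xFFFFu
};

static int transactor_ready   = 0;
static int registered_devices = 0;
static int tests_started      = 0;

static void transactor_init(void) { transactor_ready = 1; }

static uint16_t cfg_read_vendor(int slot)
{
    assert(slot >= 0 && slot < MAX_SLOTS);
    return fake_vendor_id[slot];
}

/* On VxWorks, this step would go through the I/O system (for example
   iosDevAdd()); here it only counts registrations. */
static void register_io_device(int slot) { (void)slot; registered_devices++; }

static void start_test_tasks(void) { tests_started = 1; }

/* Returns the number of PCI targets found and registered. */
int line_card_startup(void)
{
    int slot;
    int found = 0;

    transactor_init();                        /* initialize PCI transactor */
    for (slot = 0; slot < MAX_SLOTS; slot++) {
        if (cfg_read_vendor(slot) == PCI_NO_DEVICE)
            continue;                         /* nothing in this slot */
        register_io_device(slot);             /* register as an IO device */
        found++;
    }
    start_test_tasks();                       /* start tasks to run tests */
    return found;
}
```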
Synchronization of VxSim and the Hardware Simulator
At first glance, one may assume that synchronizing the VxSim process running on the host with the hardware simulator is a complex issue. In reality, this synchronization is not much different from typical synchronization issues between hardware and software. Hardware and software typically run asynchronously to each other. Methods such as polling and interrupts allow the hardware and software state machines to "sync up" and exchange information. Similarly, when the user's application code implements polling or interrupt methods, these synchronize the hardware simulator and VxSim via the Seamless kernel.
If the user does not have polling or interrupt-driven software, or if additional control over synchronization is needed, the HCE library provides a further facility. The HCE function hce_AdvanceHardware() tells the hardware simulator to advance simulation time. Since the line card software is interrupt driven, it was not necessary to precisely control the hardware advance time. It is more convenient to let the hardware advance function run periodically as a VxSim background task. Accordingly, the hce_AdvanceHardware() function is put in its own task and run at VxSim's highest task priority. This task also suspends itself so that the other tasks may run.
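To make the polling case concrete, here is a small sketch of why a polling loop keeps the two sides in step: each bus read costs hardware simulation time, so the simulator advances while the software waits. The register read is a stub, and the names stub_pci_read_status and poll_until_done, the per-read cost of 10 clocks, and the completion time of 35 clocks are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_DONE     0x1u
#define CLOCKS_PER_READ 10u  /* illustrative cost of one bus read */
#define DONE_AT_CLOCK   35u  /* illustrative time at which hardware finishes */

static uint32_t sim_clock = 0;  /* stands in for hardware simulator time */

/* Stub status read: every read advances simulated hardware time, and the
   DONE bit appears once enough time has passed. */
static uint32_t stub_pci_read_status(void)
{
    sim_clock += CLOCKS_PER_READ;
    return (sim_clock >= DONE_AT_CLOCK) ? STATUS_DONE : 0u;
}

/* Polls until DONE is seen. Returns the number of reads used,
   or -1 if max_polls reads were not enough. */
int poll_until_done(unsigned max_polls)
{
    unsigned n;

    assert(max_polls > 0);
    for (n = 1; n <= max_polls; n++) {
        if (stub_pci_read_status() & STATUS_DONE)
            return (int)n;
    }
    return -1;
}
```

No explicit time-advance call appears in the loop; the reads themselves move the hardware forward until the condition the software is waiting for becomes true.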

The effect is that the hardware will advance by 100 PCI clocks, and then VxSim will run for 100 ticks. The global variable hw_ready_to_go is used to delay the start of the hardware simulator until VxSim has completed its initialization. (In some of the tests, a VxSim task accepts user input so that the user can select the tests to run.) The hw_ready_to_go variable is set at the end of the software and hardware initialization tasks, and when a test is ready to run. The hce_AdvanceHardware() function does not need to be called during hardware initialization because Seamless will automatically advance hardware simulation time for PCI bus transactions initiated by the RTOS.
Since the RTOS is running on the host, we cannot install interrupt service routines as we normally would in the deployed system. Instead, Seamless provides an HCE callback routine that is called whenever a PCI interrupt occurs. An argument passed to the callback indicates the cause of the interrupt. The interrupt types include all possible PCI interrupt types, as well as an additional type which indicates that the PCI transactor model has been accessed as a target (slave). Some skeleton code for the interrupt callback is shown in Figure 4.
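The hardware advance task described above might look roughly like the sketch below. Several things are assumptions: the article does not give the signature of hce_AdvanceHardware(), so it is stubbed here as taking a clock count; the self-suspension is modeled with a stub of the VxWorks taskDelay() call; and the loop is bounded by an iterations argument so the sketch terminates on a host, whereas a real background task would loop forever.

```c
#include <assert.h>

#define HW_ADVANCE_CLOCKS 100  /* PCI clocks per hardware advance */
#define SW_RUN_TICKS      100  /* ticks given to the other tasks */

volatile int hw_ready_to_go = 0;  /* set once initialization is complete */

static long hw_clocks_advanced = 0;
static long sw_ticks_elapsed   = 0;

/* Stub: the real HCE signature is not given in the article (assumption). */
static void hce_AdvanceHardware(int clocks) { hw_clocks_advanced += clocks; }

/* Stub standing in for the VxWorks taskDelay(ticks) call. */
static int taskDelay(int ticks) { sw_ticks_elapsed += ticks; return 0; }

/* Highest-priority task body: advance the hardware if allowed, then
   delay so that the lower-priority tasks get a chance to run. */
void hw_advance_task(int iterations)
{
    int i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++) {
        if (hw_ready_to_go)
            hce_AdvanceHardware(HW_ADVANCE_CLOCKS);
        taskDelay(SW_RUN_TICKS);
    }
}
```

Until hw_ready_to_go is set, the task only yields, so hardware simulation time does not advance while the software is still initializing.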

In this example, there are software tasks polling global variables such as isrFlagA and isrFlagB. The tasks periodically sample these global variables and execute appropriate interrupt code if they are set. Alternatively, a mutex could be used to activate the interrupt handler, or code placed here directly, as in the case of INTERRUPT_TARGET.
If the interrupt is of type INTERRUPT_TARGET, this indicates that the PCI transactor is being accessed as a target. In the case of the line card, the datapath hardware would only access the PCI bus to transfer packets of data into or out of memory. First an HCE function is called to determine the type of transaction (read or write), then an additional HCE function is used to receive or send data for the appropriate address.
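A callback of the kind described above could be sketched as follows. Only isrFlagA, isrFlagB, and INTERRUPT_TARGET come from the text; the other enumerators, the pci_interrupt_callback name, and the pending_access structure are hypothetical. In particular, pending_access stands in for the two HCE calls that report the transaction type and move the data, whose names and signatures are not given in the article.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    INTERRUPT_A,       /* hypothetical name for one PCI interrupt type */
    INTERRUPT_B,       /* hypothetical name for another */
    INTERRUPT_TARGET   /* transactor accessed as a target (from the text) */
} interrupt_type_t;

/* Flags polled by software tasks, as described in the text. */
volatile int isrFlagA = 0;
volatile int isrFlagB = 0;

/* Host memory standing in for the packet buffer in SDRAM. */
#define PACKET_WORDS 8u
static uint32_t packet_mem[PACKET_WORDS];

/* Hypothetical stand-in for what the HCE query functions would report. */
typedef struct {
    int      is_write;  /* nonzero: hardware writes into host memory */
    uint32_t addr;      /* byte address of the access */
    uint32_t data;      /* data written, or data returned on a read */
} target_access_t;

static target_access_t pending_access;

void pci_interrupt_callback(interrupt_type_t type)
{
    uint32_t index;

    switch (type) {
    case INTERRUPT_A:
        isrFlagA = 1;              /* a polling task will handle it */
        break;
    case INTERRUPT_B:
        isrFlagB = 1;
        break;
    case INTERRUPT_TARGET:
        index = pending_access.addr / 4u;
        assert(index < PACKET_WORDS);
        if (pending_access.is_write)
            packet_mem[index] = pending_access.data;   /* accept data */
        else
            pending_access.data = packet_mem[index];   /* present data */
        break;
    default:
        break;
    }
}
```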
Software / Hardware Testing
The loop back test is an exciting software application to run with hardware/software Co-Verification. The software application code generates traffic such as IPv4, IPv6, MPLS, etc. and injects these packets into the inbound network processor via the PCI bus and the support FPGA. The packets go through the datapath hardware to a physical loop back in the switch fabric testbench. The packets then return through the datapath hardware and are deposited in a buffer in the RTOS. When the buffer is full, an interrupt is generated by the support FPGA. The interrupt is handled, and the packets are delivered by the driver to the RTOS task.

In this environment at Hyperchip, the software developer initiates the testing by creating software applications within VxSim. Tasks are written to configure hardware registers and then read the status registers to ensure the hardware is in the proper state. Next, the tasks inject data packets into the driver. The software developer can then analyze the ModelSim® waveforms to confirm that the packet was actually injected into the hardware simulation environment. After this, the hardware engineer can trace the packets through the datapath hardware. If all goes well and the packet loops back and re-enters software via PCI, the software developer verifies that the returned packet data is correct.
During the loop back test, many bugs were caught easily because both the software algorithm and the hardware implementation were visible. Breakpoints were used on both sides to stop the simulation so that the state of the system could be analyzed.
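The final check in that flow, comparing the returned packet with the one injected, can be sketched as below. This is only an outline under stated assumptions: driver_inject and loopback_check are hypothetical names, and the stub copies the packet straight into the receive buffer, whereas in co-verification the packet would travel through the driver, the PCI transactor, and the simulated datapath. The stub_corrupt flag exists only to show that a corrupted return is detected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PKT 64u

static uint8_t rx_buf[MAX_PKT];  /* buffer where looped-back data lands */
static size_t  rx_len = 0;
static int     stub_corrupt = 0; /* set to 1 to simulate a datapath bug */

/* Stub datapath (hypothetical): loops the packet straight back. */
static int driver_inject(const uint8_t *pkt, size_t len)
{
    if (len == 0 || len > MAX_PKT)
        return -1;
    memcpy(rx_buf, pkt, len);
    if (stub_corrupt)
        rx_buf[0] ^= 0xFFu;      /* flip bits in the first byte */
    rx_len = len;
    return 0;
}

/* Returns 0 if the packet came back intact, 1 on a mismatch,
   and -1 if the packet could not be injected. */
int loopback_check(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    rx_len = 0;
    if (driver_inject(pkt, len) != 0)
        return -1;
    if (rx_len != len)
        return 1;
    return (memcmp(rx_buf, pkt, len) == 0) ? 0 : 1;
}
```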
Conclusion
The simulation environment of Seamless with VxSim, ModelSim, and the PCI transactor model proved to be a very effective environment for performing system-level simulations. It was possible to exercise the hardware in the same manner as it will be exercised in a deployed system. This would not be possible with typical testbench tools. Application-level software was debugged against a virtual prototype of the hardware. This saved valuable time, as a physical prototype was not required. The software was developed ahead of schedule, before physical samples were available.
Several hardware design issues were revealed during Co-Verification. Since the hardware was described at the RTL level, changes were easy to make.
It would not have been possible to execute such a large amount of software in a typical hardware simulation environment. Seamless allowed the software to run on the host workstation and to periodically synchronize with the hardware simulator. For the loop back test, the execution time of the Seamless simulation was essentially the same as the time needed for the packets to traverse the datapath in the hardware simulator. Throughout the entire process, it was possible to control and observe results in both the hardware and software environments. This provided an insight into the line card operation that is not possible with other tools.
VxWorks® and VxSim® are registered trademarks of Wind River Systems, Inc. ModelSim® and Seamless® are registered trademarks of Mentor Graphics Corporation. HCE™ is a trademark of Mentor Graphics Corporation.
