Using HW-SW Co-Verification to Enhance ASIC Verification
by Anders Ulander, Ericsson Utvecklings AB and Staffan Berg, Mentor Graphics (Scandinavia) AB
Abstract
This article describes the introduction of HW/SW co-verification in an ASIC design flow. The experiences accounted for in the article are based on the use of Seamless CVE from Mentor Graphics in a large telecom ASIC design project that was undertaken at Ericsson UAB in 1998.
The article explains why HW/SW co-verification was chosen to enhance the verification flow, describes the technology that was used and how the methodology was introduced in the project, and summarizes some of the experiences from the project.
Introduction: The Project
During the last year Ericsson UAB has been working on a design based on a PowerPC 603e microprocessor. The design also included an interface ASIC containing general interfaces such as HDLC, UART and GPIO. The interface circuit also contained a complex memory controller. The size of the ASIC was roughly 400K Gates.
It was clear at an early stage that the verification of the ASIC would require extensive simulations involving the processor. Since the project also had a very aggressive time schedule to meet, it was decided early on that HW-SW co-verification should be explored as one way to enhance and speed up the verification process.
At an early stage co-operation with the software unit in Ericsson UAB was established. At this point, primarily the firmware writers were involved. The need for a common environment to ease communication between HW and SW designers became clear, and we saw the possibility to use Seamless CVE from Mentor Graphics as the foundation for this environment. After a training course at Mentor Graphics, where both HW and SW designers were trained on the tool and methodology, the co-operation could start.
HW-SW co-simulation with Seamless CVE
Seamless CVE is a co-simulation backplane that connects a traditional hardware simulator, such as a VHDL or Verilog simulator, with an instruction set simulator (ISM) for the SW portion of a design. At Ericsson UAB we used both ModelSim from Mentor Graphics and Cadence Verilog-XL as HW simulators, and XRAY from Microtec Research as the ISM.
The basic idea is simple: All hardware is modeled as usual in the HDL simulator, using standard VHDL or Verilog. The processor that is instantiated in the HW design is just a black box, a so-called Bus Interface Model. The actual behaviour of the processor is modeled in the Instruction Set Simulator. Whenever the processor needs to execute an instruction or access memory, the instruction set simulator sends a request through the backplane to the HDL model. The Bus Interface Model responds by performing the correct bus cycles, and the memory access is performed through the hardware and the correct data or instruction is returned to the ISM.
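The request path described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Seamless CVE API: the class names, the four-instruction toy instruction set, and the "one bus transaction per request" accounting are all hypothetical and chosen purely for illustration.

```python
class BusInterfaceModel:
    """Hypothetical stand-in for the processor black box in the HDL design."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> contents, stands in for HW memory
        self.bus_cycles = 0       # number of bus transactions driven on the HW side

    def read(self, address):
        self.bus_cycles += 1      # simplification: one transaction per request
        return self.memory.get(address, 0)

    def write(self, address, value):
        self.bus_cycles += 1
        self.memory[address] = value


class InstructionSetModel:
    """Hypothetical ISM: executes a toy instruction set, one request per access."""

    def __init__(self, bus):
        self.bus = bus            # every fetch, load and store goes through the bus model
        self.pc = 0
        self.acc = 0
        self.halted = False

    def step(self):
        opcode, operand = self.bus.read(self.pc)   # instruction fetch through the HW
        self.pc += 1
        if opcode == "LOAD":
            self.acc = self.bus.read(operand)
        elif opcode == "ADD":
            self.acc += self.bus.read(operand)
        elif opcode == "STORE":
            self.bus.write(operand, self.acc)
        elif opcode == "HALT":
            self.halted = True


# Program at addresses 0..3, data at 0x100 and 0x104, result at 0x108.
memory = {
    0: ("LOAD", 0x100), 1: ("ADD", 0x104), 2: ("STORE", 0x108), 3: ("HALT", None),
    0x100: 5, 0x104: 7,
}
bim = BusInterfaceModel(memory)
ism = InstructionSetModel(bim)
while not ism.halted:
    ism.step()
```

In this toy run every one of the four fetches, two loads and one store costs a bus transaction on the hardware side, which is exactly the coupling that the optimizations discussed later are meant to relax.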
There are several advantages with this architecture:
- Since the processor is modeled at the Instruction level, rather than Gate or RTL level, the simulation performance of the processor model is several orders of magnitude faster than with traditional HDL-based models.
- The Instruction Set Simulator includes a High-level SW debugger, that allows the designer to debug his SW as C or Assembly source code, rather than just viewing events in a waveform display. Breakpoints can be set in the source code, variables and pointers can be examined, just as in any modern debug environment for SW design.
- The HDL Simulator gives full access to all signals in the hardware. Signals can be monitored and changed, breakpoints can be triggered on events, and all events are simulated with full timing accuracy.
This type of co-simulation is not entirely new, and similar approaches existed before Seamless CVE was introduced in 1997. There are, however, certain drawbacks with this type of architecture that become evident as soon as you try to use it on real designs. The main problem has to do with simulator performance and memory accesses. Although the ISM has the potential to execute software at a speed of several hundred thousand instructions per second, every instruction has to be fetched from the hardware. This means that for each instruction or memory access, the ISM has to stop and wait for the HW simulator to execute the proper sequence of events on the buses. Since the hardware simulator works at a much lower level of abstraction, it has to deal with possibly thousands of events per instruction, which results in an overall performance of less than 10 instructions per second. At this rate, debugging anything more than a few lines of code becomes virtually impossible. Running real applications or real-time operating systems would require days or weeks of simulation time.
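To get a feeling for the numbers, a back-of-the-envelope calculation helps. The workload size of one million instructions and the figure of 200,000 instructions per second for the stand-alone ISM are assumptions picked for illustration; only the "10 instructions per second" bound comes from the text above.

```python
def simulation_time_seconds(instruction_count, instructions_per_second):
    """Wall-clock time needed to execute a workload at a given simulation rate."""
    return instruction_count / instructions_per_second


workload = 1_000_000            # assumed size of a small SW test, in instructions

# Every instruction fetched through the HW simulator (upper bound from the text).
coupled = simulation_time_seconds(workload, 10)

# ISM running on its own (assumed rate of 200,000 instructions per second).
ism_only = simulation_time_seconds(workload, 200_000)

coupled_hours = coupled / 3600
slowdown = coupled / ism_only
```

Under these assumptions the fully coupled run needs more than a day of wall-clock time for a workload that the ISM alone would finish in seconds, which is why a realistic application or operating system boot quickly becomes a matter of days or weeks.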
The way this is handled in Seamless CVE is through a patented optimization technology. Instead of modeling every memory access in the HW simulator, Seamless CVE stores the memory contents centrally in the so-called Memory Server. The designer can then choose which particular memory accesses to simulate in the HW simulator. Normally you start verifying your design without any optimizations. This means that the simulation will run very slowly, but every instruction fetch and memory access can be monitored in the HW simulator. Once the instruction fetches are verified, the user can enable Instruction Fetch optimizations, which allow the ISM to fetch instructions directly from the Memory Server. The next step is to selectively switch on Memory Access optimizations for those portions of the memory map that have already been verified, which further improves the simulation throughput. The third and final optimization that is available in Seamless CVE is called Warp or Time optimization. In this mode the ISM is allowed to run at full speed for shorter or longer periods of time, re-synchronizing with the HW only when an access is made to an unoptimized region in memory, or when an interrupt occurs.
By using these three optimizations in an intelligent way, it is possible in some cases to improve simulation speed by several orders of magnitude. Furthermore, the optimizations can be switched on or off dynamically during the simulation, so it is possible to alternate between full-speed optimized simulation and the slower non-optimized mode, with full access to the HW, during the same simulation run. These optimizations are really the enabling factor that allows the designer to run realistic amounts of SW on the HW model.
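The routing decision behind the first two optimizations can be sketched as follows. This is a simplified illustration, not the actual Seamless CVE implementation or configuration interface: the class `OptimizingBackplane`, its attributes, and the assumption that instruction fetch optimization is a single global switch are all hypothetical.

```python
class OptimizingBackplane:
    """Hypothetical model of a memory server with switchable optimizations."""

    def __init__(self, memory_server):
        self.memory_server = memory_server   # central store: address -> value
        self.ifetch_optimized = False        # instruction fetch optimization switch
        self.optimized_regions = []          # list of (start, end) data regions
        self.hw_accesses = 0                 # accesses simulated as HW bus cycles
        self.fast_accesses = 0               # accesses served directly from the store

    def optimize_region(self, start, end):
        """Mark the half-open address range [start, end) as already verified."""
        self.optimized_regions.append((start, end))

    def _is_optimized(self, address, kind):
        if kind == "fetch":
            return self.ifetch_optimized
        return any(start <= address < end for start, end in self.optimized_regions)

    def read(self, address, kind):
        """Return the stored value; only the accounting differs between paths."""
        if self._is_optimized(address, kind):
            self.fast_accesses += 1          # bypasses the HW simulator
        else:
            self.hw_accesses += 1            # would run full bus cycles in the HDL model
        return self.memory_server.get(address, 0)


backplane = OptimizingBackplane({0x0: 0x60000000, 0x1000: 42, 0x8000: 7})

# Phase 1: no optimizations, everything is visible in the HW simulator.
backplane.read(0x0, "fetch")
backplane.read(0x1000, "data")
phase1 = (backplane.hw_accesses, backplane.fast_accesses)

# Phase 2: switch optimizations on during the same run.
backplane.ifetch_optimized = True
backplane.optimize_region(0x1000, 0x2000)
backplane.read(0x0, "fetch")       # served from the memory server
backplane.read(0x1000, "data")     # inside an optimized region
backplane.read(0x8000, "data")     # still goes through the HW simulator
phase2 = (backplane.hw_accesses, backplane.fast_accesses)
```

Because both paths read the same central store, switching an optimization on or off in mid-run changes only where the access is simulated, not the data that the software sees.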
Co-Simulation at Work - Some Experiences
Verification strategy
There are several strategies to choose from when verifying a large, HDL-based ASIC design. The most common approach today is to create a testbench in the same language as the model of the ASIC, i.e. VHDL or Verilog. These testbenches can become very complex, and it is not uncommon for the testbench to consist of larger amounts of HDL code than the design itself.
The other approach, that we took for this project, is to create SW-based test cases. In this type of environment, less effort is spent on the HDL testbench. Instead, some kind of full-functional or behavioural model of the processor is used to execute real SW and drive the simulation.
Until now, it has only been possible to run small pieces of code with this approach, but the optimizations in Seamless CVE enabled us to run significantly larger amounts of SW. This made it possible to create testcases more or less entirely in SW, and the HDL testbench was reduced to providing stimulus from the surrounding environment. The resulting HDL testbench became less complex and more generic, and the verification was controlled from the SW. This also meant that at the same time as the HW was verified, the interface between HW and SW was also tested. During the verification work, one SW designer was working full-time with the HW design team creating testcases based on real applications.
Optimizations: The Enabling Factor for HW-SW Co-Verification
Since our ASIC contains several communication modules (HDLC, UART, GPIO, Device Bus, etc.) and, hence, quite a lot of DMA functionality, the memory will be accessed both from SW and directly from HW during execution. This makes the optimizations even more interesting, since the SW can gain access to memory directly and in that manner improve simulation performance significantly.
During most of the simulations the Instruction Fetch optimizations were used to improve simulation performance. Some testcases involved creation of large data structures in RAM, which were then accessed, modified and written back by the HW. Since the data structures were created by SW, the memory access optimizations could be used extensively here to avoid excessive simulation runs just to set up the design. This resulted in significant speedups of the simulation.
The third type of optimization, called Warp or Time, was also used a lot to improve speed further. The caveat with Warp optimization is to watch out for timing loops in the SW. Since the SW simulation is running at full speed during Warp mode, any dependencies on HW timing in the SW need special attention. During the initialization of our ASIC there was one such case, where the memory controller had to wait a specific number of clock cycles before accessing the DRAM. The technique we used in Seamless CVE to manage this was to use breakpoints to temporarily disable Warp mode during the timing loops, to ensure that the HW and SW stayed synchronized.
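The breakpoint technique can be pictured with a deliberately simplified model. Everything here is an assumption made for illustration: the class `WarpScheduler`, the program-counter values, the rule that the HW clock advances by exactly one cycle per instruction when synchronized, and the rule that it does not advance at all in Warp mode. The real tool behaviour is more involved.

```python
class WarpScheduler:
    """Hypothetical model: breakpoints toggle Warp mode around a timing loop."""

    def __init__(self, warp_off_at=(), warp_on_at=()):
        self.warp = True                      # start in Warp mode
        self.hw_cycles = 0                    # HW clock cycles seen by the hardware
        self.warp_off_at = set(warp_off_at)   # breakpoints that disable Warp
        self.warp_on_at = set(warp_on_at)     # breakpoints that re-enable Warp

    def execute(self, pc):
        if pc in self.warp_off_at:
            self.warp = False
        elif pc in self.warp_on_at:
            self.warp = True
        if not self.warp:
            self.hw_cycles += 1               # assumed: one HW cycle per instruction


def build_trace(loop_iterations):
    """Program counter trace: setup code, a delay loop, then more code."""
    trace = list(range(0, 10))                # setup code at pc 0..9
    trace.append(10)                          # entry of the delay loop
    for _ in range(loop_iterations):
        trace.extend([11, 12])                # loop body: decrement and branch
    trace.extend(range(13, 20))               # code after the loop
    return trace


trace = build_trace(loop_iterations=100)

# Without breakpoints the delay loop never lets the hardware clock advance.
always_warp = WarpScheduler()
for pc in trace:
    always_warp.execute(pc)

# With breakpoints at loop entry (pc 10) and loop exit (pc 13).
guarded = WarpScheduler(warp_off_at=[10], warp_on_at=[13])
for pc in trace:
    guarded.execute(pc)
```

In the guarded run the hardware sees clock cycles for the loop entry and every loop instruction, so a wait that is expressed as a loop count in SW still corresponds to elapsed time on the HW side.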
Managing Complex Memories
One specific problem we ran into had to do with the modeling of complex SDRAMs. The memory controller in our ASIC allowed us to reprogram it to set up different mappings of the RAS and CAS signals for SDRAMs. This meant that the actual bit ordering on the address bus could change between the processor and memories. Although the current version of Seamless (2.2) allowed us to change the bit order, we soon found that this alone was not sufficient to handle our particular addressing scheme. At first this seemed to be a real showstopper, since the RAS and CAS mapping was rather fundamental to the memory controller. After several contacts with the Seamless engineering team in the U.S., however, we got an update to the tool after a relatively short time. This update allowed us to specify more elaborate memory configurations using a special configuration file.
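To show why a reprogrammable RAS/CAS mapping complicates memory modeling, the following sketch splits a processor address into row and column fields under two different mappings. The bit positions and the two mappings are invented for illustration; they are not the mappings of the ASIC described here, and the function names are hypothetical rather than part of any Seamless configuration format.

```python
def extract_bits(address, bit_positions):
    """Gather the listed address bits (least significant first) into one integer."""
    value = 0
    for i, bit in enumerate(bit_positions):
        value |= ((address >> bit) & 1) << i
    return value


def map_address(address, row_bits, col_bits):
    """Return the (row, column) pair that the SDRAM sees for a processor address."""
    return extract_bits(address, row_bits), extract_bits(address, col_bits)


# Assumed mapping A: column from address bits 2..9, row from bits 10..21.
MAPPING_A = {"row_bits": list(range(10, 22)), "col_bits": list(range(2, 10))}

# Assumed mapping B: row from address bits 2..13, column from bits 14..21.
MAPPING_B = {"row_bits": list(range(2, 14)), "col_bits": list(range(14, 22))}

# Build an address from a known row and column under mapping A.
address = (0x123 << 10) | (0x5A << 2)

row_a, col_a = map_address(address, **MAPPING_A)
row_b, col_b = map_address(address, **MAPPING_B)
```

The same processor address lands in a different row and column once the controller is reprogrammed, so a memory model that assumes one fixed ordering would return data from the wrong cell.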
Design Problems
One of the more serious problems we found manifested itself as incorrect memory behaviour. To model the SDRAMs we had used special Seamless memory models generated by the Denali Memmaker tool supplied with Seamless CVE. By using debug capabilities in the memory models it was soon found that the memory controller actually stored incorrect data into the memories. After some investigation the reason was found to be where nobody had expected -- in our version management system. It turned out that although this problem had already been fixed, the version management system was configured to use an older, incorrect version of one block in the design.
Without using HW-SW Co-verification, this type of error would have been extremely hard to find and debug.
Conclusions and Future Directions
As this project is now completed, we have gained some experiences that we will continue to build upon in future projects. Some of the advantages we found using this methodology are:
- Improved communication between HW and SW design teams early in the project. This has probably been the greatest benefit. Previously, only the System Designers had insight into both the HW and SW side of things. Early in the projects the HW and SW teams were often "out-of-sync", thus making it difficult to communicate between teams. Seamless has provided us with a natural platform for the communication, and at the same time it gives the HW designers some insight into how the SW is interacting with the HW, as well as giving the SW designers a possibility to understand how the real HW will behave.
- The optimizations that make it possible to run large amounts of SW and thereby create realistic testcases.
- Fast and efficient support from the tool vendor when we had problems.
We also found some areas where we think there is room for future improvements:
- Seamless CVE is a rather complex environment with two different simulators and many different windows. Invoking and setting up a simulation involved quite a bit of "button-pushing" each time.
- There is currently no checkpoint/restart capability in the tool. Each time you make a change you need to go back to time 0 and reload the design. This is not as bad as it may seem at first, due to the optimization capabilities.
- Seamless requires the use of specific models for the processor and memories. These models are supplied with the tool, together with a memory model generator.
Future projects
On the basis of our experiences we have now started planning for our next project, where we will also use HW/SW co-verification. The next project presents us with some new challenges:
- Rather than using a standard microprocessor part, the next design will be based on an embedded core on a chip. This of course makes HW/SW co-verification even more appealing.
- In the next project we will also try to run a Real-time operating system on our simulated design. This will enable us to do even more verification early on, but it will no doubt also introduce new challenges that have to be solved.
- One possibility that is currently under consideration is to use a combination of ASIC emulation and HW/SW co-verification. Seamless already supports this, but the methodology is as yet unproven inside Ericsson.
