
Dot Hill Systems Corp.

Here’s the thing about RAID storage: anyone can do it. Which is to say, it’s easy enough to take an updated Windows PC or server and a handful of spare hard drives, and then set up a software RAID that delivers on the basic goals of reliability, performance and capacity. At least more or less. DIY RAID can quickly get ragged when the unusual starts to happen. Think power failures, severed PCI Express (PCIe) links or even stray alpha particles flipping bits in an SRAM. These outlying events, which aren’t actually that uncommon as big data continues to propel system complexity and demand for cloud storage explodes, are the main reason storage vendor Dot Hill exists. The company’s promise is that customers won’t lose data even when bizarre exception cases happen. What follows is the story of how the company kept that promise when designing and verifying a new RAID accelerator, a project that yielded several firsts for the team, including first-pass success on its first-ever effort to use an ASIC architecture after years of FPGA projects.


Codelink, Questa® Advanced Simulator

“We’d try to get the FPGAs into the lab as quick as we could and just find the problems there. With an ASIC, the goal is to find problems in simulation before you build the chip. This required a lot more thorough verification than the company has ever done before.”

Mike Peters, Dot Hill design engineer

The Problem: Starting with an FPGA prototype and a long history with FPGAs, Dot Hill sought to design and verify a new 30-million-gate ASIC-based RAID accelerator using advanced OVM-based verification.

The Solution: First, Dot Hill engineers tested individual ASIC modules and then moved to the full chip testbench. Next, the team used OVM to abstract verification tasks so the same tests could be run on the prototype FPGA and final ASIC designs. Finally, they used Questa Codelink to debug and test software that would eventually run on the two ARM processors and to perform software-driven verification of certain hardware components, including an internal ROM that contained boot code.
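The idea of abstracting verification tasks so that one test runs on more than one target can be sketched in a few lines. The example below is only an illustration in Python, not Dot Hill's OVM code (which would be written in SystemVerilog); the names `Target`, `FpgaPrototypeModel`, `AsicModel` and `register_walk_test` are hypothetical, and both targets are simple in-memory stand-ins rather than real hardware models.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical transaction-level interface that every test talks to."""

    @abstractmethod
    def write(self, addr, value):
        ...

    @abstractmethod
    def read(self, addr):
        ...


class FpgaPrototypeModel(Target):
    """Stand-in for the FPGA prototype: a dict plays the register file."""

    def __init__(self):
        self._regs = {}

    def write(self, addr, value):
        self._regs[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        return self._regs.get(addr, 0)


class AsicModel(Target):
    """Stand-in for the ASIC: different storage, same interface."""

    def __init__(self, words=1024):
        self._mem = [0] * words

    def write(self, addr, value):
        self._mem[addr // 4] = value & 0xFFFFFFFF

    def read(self, addr):
        return self._mem[addr // 4]


def register_walk_test(target):
    """One test body, reused unchanged on every target."""
    for i in range(8):
        target.write(4 * i, 1 << i)
    return all(target.read(4 * i) == 1 << i for i in range(8))


# Run the same test against both stand-in targets.
results = {
    type(t).__name__: register_walk_test(t)
    for t in (FpgaPrototypeModel(), AsicModel())
}
```

The test function never learns which target it is driving, which is the property that lets a test suite carry over from a prototype to the final design.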

The Results: Details of the various tools aside, ultimately the key question is whether the new approach worked and how things were different compared to earlier FPGA-based accelerators built and verified by the team. Consider that within two hours from the time the Dot Hill engineers got the chip on the board and back in their lab, they had it up and running. Indeed, once a couple of minor board problems were fixed, the team was able to access its processors. A short time afterwards, they had functional DRAM, and probably within a week they were running RAID cycles. Not bad on a chip this large.

Trading lab debug for simulation, and lots of it: Consider the issues tossed at the Questa Advanced Simulator. For starters, there’s the sheer volume of traffic pulsing through the device and all the concurrency that traffic brings. Each of the four PCIe ports can route to any of the other PCIe ports or to DDR. Writing to two or three registers in a RAID engine can launch literally thousands of XOR operations requiring complex calculations. And all the while the two ARM processors are running and executing code. Even something as seemingly simple as routing data across the chip is fraught. All the ASIC components are connected via a high-speed, point-to-point switch fabric, which itself supports multicasting. That is, a single write can be delivered to multiple destinations, an indeterminacy that makes for devilishly difficult simulations. Heavy internal and external traffic traveling at high speeds is directly tied to the second big challenge: how to test all the myriad options, configurations and use cases. The Dot Hill team concluded that handling all this with directed tests covering every permutation and configuration just wasn’t possible. So they ran lots and lots of random tests over a long period of time, on as many servers as they could, to get as much coverage as possible.
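To make the XOR operations and the random-testing strategy concrete, here is a minimal sketch in Python. It assumes textbook single-parity RAID (the article does not say which RAID levels or algorithms the accelerator implements), and the function names and parameters are hypothetical. Each seeded test builds a random stripe, computes parity, "loses" one block and checks that it can be rebuilt.

```python
import random
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (single-parity RAID style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing block by XOR-ing parity with the survivors."""
    return xor_parity(list(surviving_blocks) + [parity])


def random_stripe_test(seed, max_drives=8, max_block=64):
    """One randomized test: random geometry, random data, random failure."""
    rng = random.Random(seed)  # seeded so any failure can be replayed
    n_drives = rng.randint(2, max_drives)
    block_len = rng.randint(1, max_block)
    data = [
        bytes(rng.randrange(256) for _ in range(block_len))
        for _ in range(n_drives)
    ]
    parity = xor_parity(data)
    lost = rng.randrange(n_drives)
    survivors = data[:lost] + data[lost + 1:]
    return rebuild(survivors, parity) == data[lost]


# Sweep many seeds instead of enumerating every configuration by hand.
failures = [seed for seed in range(500) if not random_stripe_test(seed)]
```

Seeding each run is what keeps random testing debuggable: a failing seed reproduces the exact stripe geometry and data that exposed the problem.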

“AVM, OVM, UVM — each is better than what we had before. Each one is making us more productive.”

Ty Sell, Dot Hill Verification Engineer

 