I spent most of this week working with a customer debugging a bus matrix. Yeah, I’m a manager, but debugging HDL code is a whole lot more fun than finishing up those performance reviews I need to get to. Anyway, the design is a three-processor design – the first processor comes out of reset and runs for several thousand instructions. Then we see a problem. For no apparent reason the processor branches back to the reset vector, and it does this over and over again. The customer and I were trying to figure out what was causing this.
The software running on the processor was fully validated with the old bus interconnect – so we doubted it was a software bug. But the new bus matrix was fully validated as a standalone block of HDL. I know the guys who did the verification, and they are very thorough. They did run every possible bus cycle from every master port, and exercised every slave port. There are 6 master ports and 12 slave ports on the matrix. It has 2 active channels, with full connectivity – that is, any of the 6 masters can reach any of the 12 slaves across either of the two channels. Both channels can transfer data concurrently.
After a bunch of debugging we found that an opcode being fetched was getting lost between the boot ROM and the processor. A burst transaction on the instruction port of the processor was getting a good opcode in the first word, but the second word was bad – the 3rd and 4th words were OK. The second word in the burst was picking up the data value from the second channel, in this case a 0 – which on this processor decodes as a no-op. This no-op was causing the problem we were seeing.
Some work with the author of the bus matrix ultimately exposed the RTL coding problem, and we got things up and running. It turned out that to expose the problem, 4 of the 6 masters on the matrix needed to be idle. One of the remaining 2 masters needed to drive a burst read and the other needed to drive a non-sequential word read cycle. The second master needed to start its transaction at the same time as the address phase of the second beat of the burst. Also, the master driving the single cycle needed to have a higher priority than the master driving the burst cycle – and the priorities of the masters change dynamically as the bus matrix processes transactions. One final requirement for the bug to manifest is that the two masters must be accessing different slaves, and the slaves must have the same response time, i.e. both must return after the same number of wait states. If the data phases of the bus cycles were not aligned, the problem did not occur.
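For my own notes I restated the trigger conditions as a predicate. This is purely illustrative – every type and field name below is invented, not the real bus matrix RTL – but it captures the conjunction of conditions that had to line up:

```python
from dataclasses import dataclass

# Hypothetical snapshot of one master port's activity. All names here
# are made up for illustration; they are not the actual bus signals.
@dataclass
class Xfer:
    idle: bool = True
    kind: str = ""              # e.g. "burst_read", "nonseq_word_read"
    slave: str = ""             # which slave the transaction targets
    priority: int = 0           # arbitration priority (changes dynamically)
    wait_states: int = 0        # slave response time
    addr_phase_cycle: int = 0   # cycle the address phase starts on
    beat2_addr_cycle: int = 0   # for bursts: address phase of beat 2

def bug_exposed(masters):
    """True when the 6 master ports match the scenario described above."""
    idle = [m for m in masters if m.idle]
    active = [m for m in masters if not m.idle]
    if len(idle) != 4 or len(active) != 2:       # 4 of 6 masters idle
        return False
    burst = next((m for m in active if m.kind == "burst_read"), None)
    single = next((m for m in active if m.kind == "nonseq_word_read"), None)
    if burst is None or single is None:          # one burst, one single read
        return False
    return (
        single.addr_phase_cycle == burst.beat2_addr_cycle  # cycles collide
        and single.priority > burst.priority               # single wins arbitration
        and single.slave != burst.slave                    # different slaves...
        and single.wait_states == burst.wait_states        # ...same response time
    )
```

Written out this way, it is obvious why directed stand-alone testing never tripped over it: every clause has to be true on the same cycle.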
This seemed to me a pretty unlikely combination of events. So I did a little math: if we had driven an even distribution of random bus cycles into the 6 master ports on the bus matrix, what would have been the likelihood of hitting the exact combination that exposed this bug?
I came out with a probability of 1 in 10 to the 24th power. Those are about the same odds as winning the lottery – 3 times in a row. But the strange thing is that during one of our debugging sessions we forced the good opcode onto the processor’s instruction bus – to get past this point and see what else would happen with the software. We hit the same problem after another 100 or so instructions – at a different place in the program. Patching that problem, we saw it again after another 200 instructions. How could such an unlikely combination of events happen so frequently in such a short time?
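The arithmetic behind a number like that is just multiplying the probabilities of independent conditions. The figures below are invented stand-ins, not the numbers from our actual analysis – the point is only how quickly a handful of modestly unlikely, independent conditions multiply down to lottery-grade odds:

```python
from math import prod

# Illustrative stand-in probabilities for each trigger condition under
# uniformly random stimulus. These are NOT the real figures from the
# analysis; they simply show how independent conditions compound.
conditions = {
    "4 of 6 masters idle":              1e-4,
    "burst read on one active master":  1e-3,
    "non-seq word read on the other":   1e-3,
    "address phases collide":           1e-4,
    "single-cycle master wins arb":     1e-2,
    "different slaves, same waits":     1e-3,
    "data phases aligned":              1e-5,
}

p = prod(conditions.values())
print(f"combined probability ≈ {p:.0e}")  # ≈ 1e-24 with these stand-in numbers
```

Each factor on its own is the kind of thing random stimulus hits routinely; the conjunction is what random stimulus essentially never hits.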
Recall that we need 4 of the 6 masters to be idle for the problem to be exposed. The design is based on Harvard architecture CPUs, so each processor brings an instruction port and a data port to the party. In this design one processor boots up, and then brings up the others later. So there is a long period of time where two master ports are active, but the other 4 are idle. But exposing the bug also required a particular alignment of bus cycles on the 2 active buses. While we may think of the activity on the 2 active bus ports as random, they are controlled by the same IP block (the processor) and are therefore coordinated.
It turned out that the instruction pattern laid down by the compiler (with no optimizations enabled) for a “for” loop elicited the exact conditions that exposed the problem – but only if the for loop’s initial instruction sat on a burst boundary (an address divisible by 16). A quick scan through the executable image for the design uncovered literally hundreds of the problematic instruction sequences. Despite appearing impossibly unlikely, the problem we uncovered would be very common when executing code compiled with our compiler. Interestingly (or perhaps more scarily), this combination of instructions was never seen in the hand-written assembly code used in most of the tests.
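That “quick scan” is straightforward to sketch: search the image for the problematic byte sequence and count only the hits that land on a burst boundary. The opcode bytes below are placeholders, not the real for-loop prologue:

```python
# Sketch of the executable-image scan. BAD_SEQUENCE is a hypothetical
# stand-in for the problematic compiler-generated instruction pair.
BAD_SEQUENCE = bytes.fromhex("e3a00000e3500009")  # invented opcodes

def find_hazards(image: bytes, base_addr: int = 0) -> list[int]:
    """Return addresses where BAD_SEQUENCE starts on a burst boundary."""
    hits = []
    start = 0
    while (i := image.find(BAD_SEQUENCE, start)) != -1:
        addr = base_addr + i
        if addr % 16 == 0:        # only dangerous on a burst boundary
            hits.append(addr)
        start = i + 1
    return hits
```

A scan like this over the real image, with the real sequence and load address, is what turned up the hundreds of hazard sites.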
Now, as the developer of the bus matrix, how could you ever be expected to know what combination of bus cycles would be driven into your design? I’m not sure. It does help to have the larger view of what’s going on in the design. In this example, if the bus matrix verification team knew that only two buses would be active during part of system execution, it might be practical to exhaustively test the activity of only 2 active masters. But the coordination of the bus cycles from the processor across the instruction and data buses is only known to the folks who designed the processor – and they’re not telling. So either learn to be psychic, or drive an actual processor as part of your verification.
I have an ongoing debate with a colleague about verification of an HDL module. If the HDL module will interface (either directly or indirectly) to a processor, I argue that you should run some code on the processor that accesses that module. This will expose combinations of stimulus that you won’t find using any other method. My colleague argues that it’s too much trouble and won’t expose any problems that couldn’t be found using traditional verification methods. But the state space you’re searching is large, and processor activity may not fill that state space the way you expect. Processors can take a very narrow and well-worn path, or they may wander broadly through that space. Shouldn’t you test that path – at least once – before committing your HDL?
I’m not advocating throwing away all your verification tools and only driving activity from a processor. But I am saying you might want to include a processor running a program as part of your verification strategy for those parts of the design that interact with the processor.
One final thought: every embedded system project I have ever been involved with has had hardware problems exposed when software is first run. Perhaps your experience is different, but I think it is pretty rare for a system to come together and not have software expose some kind of problem in the hardware.
There are a bunch of products and languages on the market to help hardware designers find problems in the HDL that they are writing, and some of these languages and products have been very successful. It seems odd to me that, with a source of design stimulus that will almost always expose real bugs (not just unusual corner cases), only a small percentage of designers take advantage of software as a part of their verification.