
A Roll of the Dice

Russ Klein

Posted Oct 19, 2009

Constrained random seems to be all the rage recently. Last week I was visiting one of our field offices, discussing with some of the applications engineers how to use a processor to drive stimulus into a design. One chimed in and said that this was all very interesting, but it could not be used to drive constrained random stimulus.

<Cue sound of a record scratching>

What??

“You can’t have a processor generate random traffic like you can with SystemVerilog,” he asserted (no pun intended).

The thing that got me is that this was about the 20th person who works for Mentor to tell me this.  So, this week’s Mythbusting blog entry is devoted to generating random stimulus from a processor.

Now let me start by saying that it is my belief that when hunting bugs you are far better off with a rifle than a shotgun.  Testing (or verification) of computer languages is a pretty well defined science – these things should not be left to chance.  The folks who work for me know that they will be chastised severely if they go about testing with a “spray and pray” approach.  So, I never thought much of the constrained random approach to HDL verification.

However, after reading a paper by Richard McGee, Paul Furlong, and Fabian Delguste called “VHDL to SystemVerilog: Constrained Random Verification of a USB 2.0 Host Controller Sub-System,” I started to understand the approach better and will grudgingly admit that there may be some merit to it.  And I can see the appeal from a developer’s point of view.  The paper was presented at the Synopsys Users Group meeting in Europe in 2006 and is available here.  While I admit there is some merit here, I do think that systematic approaches to the problem are superior and worth a look.  Information about Mentor’s offering in this area can be found here.

Now, on to the programming side of things.  Contrary to widely held belief, the standard C runtime library includes the function rand().  So, any C program that you want to run on a processor in simulation should have access to the rand() function – an adequate random number generator.  I say “should” because embedded compiler vendors do remove functions from the standard C library when they retarget it for embedded use.  Oftentimes functions like printf() and fopen() are dropped to conserve space, since they are not likely to be used.  I know that rand() is present in the C runtime library for ARM’s RealView development suite.

If your C runtime library does not contain rand(), here is a simple one you can use:

int rand(void)
{
    static long a = 0x1234;          /* seed value */

    a = (a * 32719 + 3) % 32749;     /* linear congruential step */
    return (int)a;
}

This is an implementation of “Gerhard’s generator” and I stole it from this website.  It does a good job of efficiently generating small pseudo-random numbers – its values stay below 32749, so you get roughly 15 random bits per call.

The C runtime library function rand() returns a pseudo-random integer between 0 and RAND_MAX (RAND_MAX is guaranteed to be at least 32767, and is often 2^31 - 1).  To get a random number in a smaller range, you need to do a bit of math.  The simplest way I know of is the modulus operator: p = rand() % N sets p to a random number between 0 and N-1.  So if you want a number from 1 to 10, you could use the expression 1 + rand() % 10.  You can use the following macro to make the code look a bit cleaner:

#define RAND(x) (rand()%(x))

Now “RAND(10)” placed anywhere in your code will get you a random number between 0 and 9.  (The modulus trick introduces a slight bias toward smaller values when x does not evenly divide RAND_MAX + 1, but for stimulus generation that is rarely worth worrying about.)

Of course you will need your simulations to be repeatable – this can be done with a random number seed.  C provides a function srand(), which takes an unsigned integer seed as an argument.  By calling this at the start of a run of a program, you get exactly the same sequence of random numbers every time you run the program.
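For example, a minimal sketch (the seed value here is arbitrary; if you are using the hand-rolled rand() above, you would instead add a matching srand() that simply assigns to its seed variable):

#include <stdlib.h>

int main(void)
{
    srand(0x1234);   /* arbitrary fixed seed: identical sequence every run */
    /* ... drive stimulus using rand() and RAND() ... */
    return 0;
}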

Now let’s get a little fancier.  Say you want a particular distribution of numbers.  For example, you want a 1/3 probability of the value 46, a 25% probability of 128, and, for the remaining numbers, a normal distribution from 50 to 100 with a standard deviation of 25.  Here is a function that would deliver approximately this:

int func(void)
{
    int i = 0;
    unsigned n;

    if (0 == RAND(3)) return 46;       /* 1/3 of the time: 46 */
    if (0 == RAND(4)) return 128;      /* 1/4 of the remaining cases: 128 */
    n = rand() & 0xFFFF;               /* 16 random bits */
    while (n < lookup_table[i++]);     /* walk the cumulative table */
    return i + 46;
}

where lookup_table is initialized with an appropriate cumulative distribution, which can be generated pretty easily in Excel (or computed in C at the start of the run, as sketched after the figure).  If you run this for a while, it produces this distribution:

[Figure: distribution of N]
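If you would rather compute the table in C than export it from Excel, here is a minimal sketch.  The function name, table size, and mean/sigma parameters are my own invention; the only requirement from the code above is that the table hold a decreasing complementary cumulative distribution scaled to 16 bits, so the rand() & 0xFFFF comparison works:

#include <math.h>

#define TABLE_SIZE 64                  /* covers values 46 through 109 here */
unsigned short lookup_table[TABLE_SIZE];

/* Fill lookup_table so entry i is roughly 65535 * P(value > 46 + i).
 * The table decreases with i, matching the while (n < lookup_table[i++])
 * scan in func() above. */
void init_lookup_table(double mean, double sigma)
{
    static double pdf[TABLE_SIZE];
    double total = 0.0, tail;
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        double x = ((46 + i) - mean) / sigma;
        pdf[i] = exp(-0.5 * x * x);    /* unnormalized Gaussian density */
        total += pdf[i];
    }
    tail = total;
    for (i = 0; i < TABLE_SIZE; i++) {
        lookup_table[i] = (unsigned short)(65535.0 * tail / total);
        tail -= pdf[i];
    }
}

Calling init_lookup_table(75.0, 25.0) at the start of the run approximates the normal part of the distribution in the example above.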

You can even generate Poisson, binomial, uniform, or geometric distributions.  With a little programming you can even create something as obscure as Fisher’s non-central hypergeometric distribution.  (I knew that advanced probability and statistics class would pay off in the real world.)  Wikipedia has a good discussion of probability distributions and their applications.  If you have forgotten everything from MATH-251 in college, this website has a good refresher tutorial – worth brushing up on if you’re going to be doing a lot of constrained random programming.
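As one example, here is Knuth’s well-known multiply-uniforms method for a Poisson distribution.  This sketch assumes the standard library rand() and RAND_MAX; with the hand-rolled generator above you would divide by 32748.0 instead:

#include <math.h>
#include <stdlib.h>

/* Knuth's method: multiply uniform draws together until the product
 * falls below exp(-lambda).  Fine for small lambda; the cost grows
 * linearly with lambda. */
int rand_poisson(double lambda)
{
    double limit = exp(-lambda);
    double product = (double)rand() / RAND_MAX;
    int count = 0;

    while (product > limit) {
        product *= (double)rand() / RAND_MAX;
        count++;
    }
    return count;
}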

OK, so now let’s apply this to the classic constrained random example.  We’ll make a routine to create some random Ethernet packets.  Here’s the SystemVerilog example (taken from Wikipedia here):

class eth_frame;
    rand bit [47:0] dest;
    rand bit [47:0] src;
    rand bit [15:0] type;
    rand byte       payload[];
    bit [31:0]      fcs;
    rand bit [31:0] fcs_corrupt;
    constraint basic {
        payload.size inside {[46:1500]};
    }
    constraint good_fr {
        fcs_corrupt == 0;
    }
endclass

So in C we would define the structure as follows:

struct eth_frame {                 /* tagged so we can pass a struct eth_frame * below */
    unsigned char  dest[6];
    unsigned char  src[6];
    unsigned short type;
    unsigned char  payload[1500];
    unsigned long  fcs;
};

Then we can define a random function to populate the structure:

void random_eth_packet(struct eth_frame *ep)
{
    int i;

    for (i = 0; i < 6; i++) {
        ep->dest[i] = RAND(256);       /* random destination MAC address */
        ep->src[i]  = RAND(256);       /* random source MAC address */
    }
    ep->type = eth_size();             /* length/type field doubles as payload size */
    for (i = 0; i < ep->type; i++) {
        ep->payload[i] = RAND(256);    /* random payload bytes */
    }
    ep->fcs = cyc_check(ep);           /* compute a good frame checksum */
}

int eth_size(void)
{
    int i = 0;
    unsigned n;

    if (0 == RAND(3)) return 46;       /* 1/3 of the time: minimum size */
    if (0 == RAND(4)) return 1500;     /* 1/4 of the rest: maximum size */
    n = rand() & 0xFFFF;               /* 16 random bits */
    while (n < lookup_table[i++]);     /* walk the cumulative table */
    return i + 46;                     /* sizes start at the 46-byte minimum */
}

This is equivalent to the random method on the SystemVerilog class.
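To use it, a test program running on the processor might look something like this.  Here send_frame() is a hypothetical stand-in for whatever actually hands the frame to the design, such as writing it into the MAC’s buffer:

void send_frame(struct eth_frame *ep);   /* hypothetical: pushes frame at the DUT */

#define NUM_FRAMES 1000

void drive_stimulus(void)
{
    struct eth_frame frame;
    int i;

    srand(42);                           /* fixed seed for a repeatable run */
    for (i = 0; i < NUM_FRAMES; i++) {
        random_eth_packet(&frame);
        send_frame(&frame);
    }
}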

Now I will admit that the application of constraints is far more elegant in SystemVerilog than it can be in C.  Using C++ one can probably construct something which approaches what SystemVerilog has implemented.  But C++ can be a high overhead language; its use on a simulated processor should be limited to Jedi-level programmers who really understand the performance implications of any particular construct…or those who have some type of simulated processor acceleration. (There’s an app for that, which I have, but that’s another post.)

For C one would need to use either #defines to apply constraints, or plumb the logic into a particular random function.  As an example, if we wanted random_eth_packet to generate a bad checksum for 1% of the packets, we could add a parameter “bad_packet” to the function signature, and then change the assignment to ep->fcs as follows:

if (bad_packet) {
    ep->fcs = RAND(0x10000);           /* random, almost certainly wrong, checksum */
} else {
    ep->fcs = cyc_check(ep);           /* correct checksum */
}

We would then call the function as:

random_eth_packet(ep, 0==RAND(100));

To change the percentage of bad packets from the compilation command line, we can replace the “0==RAND(100)” with “RAND(100) < PCT_BAD_CRC”, where PCT_BAD_CRC is a macro you set at compilation time with -DPCT_BAD_CRC=20.  With PCT_BAD_CRC set to 0, we will get no bad CRCs; with 100, we will get all bad CRCs (except in the very rare case where RAND(0x10000) happens to produce the correct CRC).
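Concretely, that might look like this (a sketch; the default of 1 here is my choice, matching the 1% example above):

#ifndef PCT_BAD_CRC
#define PCT_BAD_CRC 1                  /* default: about 1% bad checksums */
#endif

/* roughly PCT_BAD_CRC percent of the frames get a corrupted checksum */
random_eth_packet(ep, RAND(100) < PCT_BAD_CRC);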

Now as the constraints become more sophisticated, C runs out of gas when compared to SystemVerilog.  You can apply a Boolean expression to a set of random values and retry until it holds (as below), but there is nothing like the real constraint solver that you get with SystemVerilog, and not enough CPU power on a simulated processor to implement one.

do {
    a = RAND(32);
    b = RAND(32);
    c = RAND(32);
} while (!((a < b) && (b < c)));       /* retry until a < b < c holds */

A simple do-while loop works, but you should make sure the probability of success is at least 1 in 4 or so (here the odds of drawing a < b < c are a bit under 1 in 6) or you could burn a lot of simulation time spinning.  When the odds are poor, it is often cheaper to construct a satisfying assignment directly, as in the sketch below.
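For this particular constraint, sorting three draws guarantees the ordering, so only the rare tie needs a retry (a sketch; t is a temporary for the swaps):

int a, b, c, t;

do {
    a = RAND(32);
    b = RAND(32);
    c = RAND(32);
    if (a > b) { t = a; a = b; b = t; }   /* three compare-swaps sort a,b,c */
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
} while (a == b || b == c);               /* ties violate the strict ordering */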

Of course, random stimulus requires that we set up cover groups and bins to appropriately record the coverage.  With Questa or ModelSim we have a way to export software variables in the code on the processor for inclusion in the UCDB (Unified Coverage DataBase).  Then they can be included in coverage analysis.  I’m not sure how you’d do this with VCS or NCSim – if anyone has some ideas please post a comment.

So now for the sixty-four kilo-buck question: if we can create random Ethernet packets either from a processor or from a SystemVerilog task and an AMBA transactor, why would we pick one over the other?  The overriding criterion for this decision should be where you will get the highest return on your coding efforts.  Let me give you a couple of reasons why you might consider writing this type of stimulus in C running on a processor, and a couple of reasons why you might not.

The first has to do with verification IP reuse.  While your job may not be affected by it, verification of a design extends well beyond the RTL coding phase.  Yes, the RTL coding phase is a significant and expensive step, but there may be FPGA prototypes or emulators in your design’s future.  You’ll need to drive stimulus (and check responses) in those environments, too.  Undoubtedly, there is a physical prototype, and at some point someone is going to need to think about verification of manufactured systems (though this often comes under the heading of “test”).  Also, there are diagnostics which will need to run on released systems.  Verification IP written in SystemVerilog doesn’t fit, or doesn’t fit well, in these environments.  Verification IP written to run on the processor can be leveraged across all of these environments.

The second reason you might want to run stimulus from the processor is that it will provide a better grade of, well – for lack of a better term, randomness.  A SystemVerilog task randomly driving stimulus on an AXI transactor has a large state space to run in.  There are a lot of different transactions which can be driven, and if we consider the possible permutations of sequences of 3 or 4 transactions, we are way beyond what can be run in simulation—which means that you’re only going to hit a fraction of the state space.  The actual processor (and corresponding software) cannot and will not drive anywhere near the universe of possible permutations.  In fact, it will tend to repeat the same patterns over and over again – the same patterns that will occur in the real system.

So, do you want your random walk through the stimulus to spend most of its simulation cycles exercising conditions that will exactly match the activity of a specific processor in the final system?  Or, do you want to spend your time simulating the broad spectrum of possible activity that might possibly occur?  Your actual answer will depend on your verification goals. If you are trying to validate that a particular block of IP will work with a specific processor, you should drive the stimulus from that processor.  Conversely, if you are trying to validate that a block of IP will work with any valid bus master, then you will not want to use the processor.

Another reason you might not want to put your stimulus generation on the processor is if you have no debug environment for code running on the processor in simulation.  Your productivity writing and debugging complex code without access to a modern graphical debugger will be horrendous.  Would you write SystemVerilog if you had no debug environment for it?  Probably not.  In this case you are better off writing the HDL verification in a language you can be productive in, and let someone else in your organization write and debug it again for the downstream environments.

So, you can generate constrained random stimulus while running code on a processor.  In some cases this can be a better way to create stimulus, as it can be reused in downstream applications and provides a more realistic set of stimulus than you’ll get otherwise.  This myth is totally busted; and we didn’t even have to use a crash test dummy.
