C Synthesis is actually quite easy, comparatively

For those of you ASIC and FPGA hardware designers who have been dismissing C++ as a hardware description language (HDL), this blog is for you.  I’ve been thinking back to some of my first hardware design projects to prove to myself that it is really much easier to learn C++ for hardware design than it was to figure out VHDL or Verilog for the first time.

Maybe it is all the gray hair showing my decade and a half of experience in hardware design, but I don’t think so.  Coming out of college in the early ’90s, things were in flux in hardware design.  Many designers were still using schematic entry, while some were using special languages like ABEL and PALASM for CPLDs and early FPGAs.  Cutting-edge ASIC designers moved to VHDL or Verilog depending on their geographic location or end application requirements.  VHDL gave users more control of the hardware description by being a stricter language, while Verilog allowed more flexibility and a higher level of abstraction, and was often compared to C.  I didn’t pick sides in this language battle and just learned both.

Doing stuff with schematic entry was really painful for someone like me who has trouble drawing stick figures, much less hooking up complex circuits from low-level primitives.  I tended to think of things top down in terms of functionality.  My interests growing up may have influenced this approach.  As a kid, I didn’t play around with circuit boards, except for learning how to solder one summer in my dad’s physics lab.  I grew up wanting to write games for my TRS-80 Color Computer, so I learned BASIC.  Later, I learned Turbo Pascal while getting my BSEE at UCSD, plus a little Modula from the Math department.  Somewhere down the line, I had to learn Tcl and Perl for scripting.  C/C++ followed naturally from all these other things.  I didn’t use much C/C++ in college, but I did later when doing embedded stuff at Lattice, as well as some console applications.

When I started my first projects for PALs, GALs and CPLDs, I struggled mightily with the tight constraints of thinking in low-level primitives and library macros.  As I mentioned, I didn’t grow up thinking in pictures, so I quickly moved to ABEL.  It was like programming, except for this darn tendency of hardware to run in parallel.  By the mid-90s, VHDL synthesis had broken into CPLDs and FPGAs.  It allowed a much higher level of abstraction, but it was really tough to figure out what you were going to get from synthesis tools.  A ton of stuff in the language wasn’t supported for synthesis, or had different effects in different tools.  You really had to experiment, follow examples or comb through the RTL schematic viewers to figure out what the hardware was going to do.  Simulation tools were a rarity for most of us, plus there was all that extra work of writing a test bench and getting another entire tool flow to work.

The growth of ASIC designs has been slowed by the problems with verification.  Mistakes can kill startups or careers.  The growth of FPGAs has been limited by the need to learn VHDL or Verilog.  People in other disciplines, such as physicists or algorithm researchers, often find the programming too painful to be worth the access to a fast, programmable hardware platform, so they just live with slow, general-purpose processors running software.  On the other hand, it is really easy to hack together C code that works at the software level with MSVC++ or GNU gcc.  The early knocks on high-level synthesis (HLS) tools were that they couldn’t match hand-coded RTL results or couldn’t synthesize the entire chip.  John Cooley has recently found that many engineers are interested in how other engineers are using HLS in real designs.  Recent announcements show how HLS tools are maturing in a move towards full-chip synthesis while maintaining the flexibility of coding in C.

While learning a new language has its challenges, C is not a new language for most of us.  Applying it to hardware requires learning some new tricks.  The gap between full C++ and what can be synthesized is much smaller than the gap between the full VHDL or Verilog languages and the register transfer level (RTL) subset supported by synthesis.  Basically, memory requirements need to be statically determinable (no malloc or new statements allowed), and modeling concurrency in a sequential language requires some extra care.  Still, if you can get the basic functionality working in C++, then HLS is a step of refining the C code to build more efficient hardware.  In my experience, debugging the functionality in a C debugger is so much easier than in VHDL or Verilog simulation, although the HDL simulators are really useful to check that the hardware is working the way you expect.
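
To make the "statically determinable memory" restriction concrete, here is a minimal sketch of the coding style it implies.  It is not taken from my design or from any Catapult example; the function name moving_sum and the window size TAPS are made up for illustration.  The point is only that every array size and loop bound is a compile-time constant, with no malloc or new anywhere.

```cpp
#include <cassert>   // only needed for the quick checks mentioned below
#include <cstddef>

// Hypothetical window size, chosen for illustration only.
const std::size_t TAPS = 4;

// Hypothetical 4-tap moving sum. Storage is a fixed-size static array,
// so the memory requirement is known at compile time (no malloc/new).
int moving_sum(int sample)
{
    // 'static' keeps the window between calls, much like registers would.
    static int window[TAPS] = {0, 0, 0, 0};

    // Shift the window by one position; the loop bound is a constant.
    for (std::size_t i = TAPS - 1; i > 0; --i) {
        window[i] = window[i - 1];
    }
    window[0] = sample;

    // Add up the fixed number of taps.
    int sum = 0;
    for (std::size_t i = 0; i < TAPS; ++i) {
        sum += window[i];
    }
    return sum;
}
```

Feeding it 1, 2, 3, 4 and then 5 returns 1, 3, 6, 10 and 14, since the oldest sample drops out of the window on the fifth call.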

To continue on the topic from my previous blog posts, I wanted to spend some more time on my experiment with serial communication.  Getting the transmitter to send a static character string had proven to be pretty easy last time, but implementing the receive and transmit functions together had to be tougher.  Didn’t it?  It was a bit, but not nearly as bad as I had imagined it would be.  I still don’t have a fully functioning UART, but it does work as a terminal now.  My goal was to echo back characters typed in HyperTerminal from my Altera NIOS II development system used in my previous blog examples (pictured below) at 115,200 baud.

[Image: Altera NIOS II board]

I kept things pretty simple.  For the receive side, I waited for the start bit (receiving a 0 on RXD).  Then, I’d delay for half a bit period to get to the middle of the start bit.  Next, I’d read each of the eight data bits, one bit period apart, and pack them into a byte.  Once the stop bit was read, I’d transmit the character.  I left off any error checking on RXD as an exercise for anyone with more time on their hands than me.
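
The receive steps above can be sketched as a small state machine that is called once per clock tick.  This is not my actual get_byte() code, just one possible way to write it.  The names rx_tick, RX_BIT_TICKS and drive_frame are hypothetical, the 434-tick bit period assumes a 50 MHz clock at 115,200 baud, and I assume the usual 8N1 framing with the least significant bit first.  As in my design, the stop bit is waited for but not checked.

```cpp
#include <cassert>   // only needed for the quick checks mentioned below

const int RX_BIT_TICKS = 434;              // assumed: 50 MHz / 115,200 baud
const int RX_HALF_BIT  = RX_BIT_TICKS / 2;

enum RxState { RX_IDLE, RX_START, RX_DATA, RX_STOP };

// Call once per clock tick with the current RXD level.
// Returns true on the tick when a complete byte has been written to *byte.
bool rx_tick(bool rxd, unsigned char *byte)
{
    static RxState state = RX_IDLE;
    static int ticks = 0;
    static int bit_index = 0;
    static unsigned char shift = 0;

    switch (state) {
    case RX_IDLE:                          // wait for the start bit (a 0)
        if (!rxd) { state = RX_START; ticks = 0; }
        break;
    case RX_START:                         // move to the middle of the start bit
        if (++ticks == RX_HALF_BIT) {
            state = RX_DATA; ticks = 0; bit_index = 0; shift = 0;
        }
        break;
    case RX_DATA:                          // sample mid-bit, LSB first
        if (++ticks == RX_BIT_TICKS) {
            ticks = 0;
            if (rxd) shift |= (unsigned char)(1u << bit_index);
            if (++bit_index == 8) state = RX_STOP;
        }
        break;
    case RX_STOP:                          // wait out the stop bit (not checked)
        if (++ticks == RX_BIT_TICKS) {
            state = RX_IDLE;
            *byte = shift;
            return true;
        }
        break;
    }
    return false;
}

// Software-only helper: drives one 8N1 frame for 'c' into rx_tick() and
// returns the decoded byte, or -1 if no byte was reported.
int drive_frame(unsigned char c)
{
    int result = -1;
    unsigned char got = 0;
    for (int bit = 0; bit < 10; ++bit) {
        bool level;
        if (bit == 0)      level = false;                     // start bit
        else if (bit == 9) level = true;                      // stop bit
        else               level = ((c >> (bit - 1)) & 1) != 0;
        for (int t = 0; t < RX_BIT_TICKS; ++t) {
            if (rx_tick(level, &got)) result = got;
        }
    }
    return result;
}
```

Calling drive_frame('H') should hand back 'H', which is a quick way to check the sampling points in a C debugger before going anywhere near synthesis.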

My top-level design is just:

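// Prototypes inferred from the calls below; the function bodies are not shown in this post.
void get_byte(bool *rxd, unsigned char *rcv_byte, bool *byte_rcvd);
void send_byte(bool *txd, unsigned char rcv_byte, bool *byte_rcvd);

#pragma hls_design top
void uart(bool *txd, bool *rxd)
{
    // 8-bit storage for the incoming character; a copy is sent to transmit once complete.
    static unsigned char rcv_byte;
    // Flag shared between receive and transmit.
    static bool byte_rcvd = false;

    get_byte(rxd, &rcv_byte, &byte_rcvd);
    send_byte(txd, rcv_byte, &byte_rcvd);
}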

The static declaration designates that the values will be stored across calls to the function, resulting in registers.  Initially, I had a conditional call to send_byte() at the top level based on byte_rcvd, but I found pushing the conditional execution inside the send_byte() function call gave me smaller area and more flexibility to build hardware where the receive and transmit functions can run sequentially or in parallel, depending on constraints in the synthesis tool.

Notice that the byte_rcvd flag is passed by reference (as a pointer) to both functions, so only the single top-level shared variable is created: get_byte() only writes it and send_byte() only reads it.  On the other hand, rcv_byte is passed by reference to get_byte() while it is passed by value to send_byte(), so send_byte() creates a local copy that it shifts out a bit at a time.
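
Here is a rough sketch of how the transmit side could use those two kinds of arguments.  Again, this is not my actual send_byte() code.  The names send_byte_sketch, TX_BIT_TICKS and transmit_and_sample are hypothetical, and the sketch assumes the flag is high for just one call when a new byte is ready.  The flag is only read through a const pointer, while the data byte arrives by value and is copied into a local frame register that gets shifted out.

```cpp
#include <cassert>   // only needed for the quick checks mentioned below

const int TX_BIT_TICKS = 434;   // assumed: 50 MHz / 115,200 baud

// Call once per clock tick. 'data' is passed by value, so the copy taken when
// the flag is seen is unaffected by later changes to the caller's byte.
// Assumption: *byte_rcvd is true for a single call when a byte is ready.
void send_byte_sketch(bool *txd, unsigned char data, const bool *byte_rcvd)
{
    static bool busy = false;
    static unsigned short frame = 0;   // bit 0 = start, bits 1-8 = data, bit 9 = stop
    static int ticks = 0;
    static int bits_left = 0;

    if (!busy) {
        *txd = true;                   // the idle line is high
        if (*byte_rcvd) {
            frame = (unsigned short)((1u << 9) | ((unsigned)data << 1));
            bits_left = 10;
            ticks = 0;
            busy = true;
        }
        return;
    }

    *txd = (frame & 1u) != 0;          // drive the current bit
    if (++ticks == TX_BIT_TICKS) {     // one full bit period has elapsed
        ticks = 0;
        frame >>= 1;                   // shift the local copy, LSB first
        if (--bits_left == 0) busy = false;
    }
}

// Software-only helper: sends 'c', samples TXD mid-bit, and returns the byte
// seen on the line, or -1 if the start or stop bit was wrong.
int transmit_and_sample(unsigned char c)
{
    bool txd = true;
    bool flag = true;
    send_byte_sketch(&txd, c, &flag);  // the tick that accepts the byte
    flag = false;

    int value = 0;
    bool ok = true;
    for (int bit = 0; bit < 10; ++bit) {
        for (int t = 0; t < TX_BIT_TICKS; ++t) {
            // Pass 0 as data from here on: the local copy must not change.
            send_byte_sketch(&txd, 0, &flag);
            if (t == TX_BIT_TICKS / 2) {
                if (bit == 0)      ok = ok && !txd;
                else if (bit == 9) ok = ok && txd;
                else if (txd)      value |= 1 << (bit - 1);
            }
        }
    }
    return ok ? value : -1;
}
```

The helper deliberately changes the data argument to 0 after the first call, so getting the original character back shows that the shifted value really is a private copy.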

I reused my constant transmit function from my previous design as my test bench to send down “Hello World” to the top-level function I planned to synthesize.  I verified that the test bench was sending down the bits of each character and then getting them back, using some well-placed printf statements and running things in the MSVC++ debugger (picture below).

[Image: bits_in_debugger]

Setting the top-level pipeline to an initiation interval of 1 (II=1) in Catapult’s constraint editor means the design is evaluated every clock cycle, which gives me the proper baud delay count I calculated for 115,200 baud from a 50 MHz clock (434).  After generating the RTL, I use Catapult’s SCVerify flow to check that the timing looks right.  I launch Precision RTL synthesis in batch from Catapult, and then run Altera’s Quartus II to generate the FPGA programming file.
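
For anyone who wants to check the 434 figure, the arithmetic is just the clock frequency divided by the baud rate, rounded to the nearest whole tick.  The helper names below (ticks_per_bit and baud_error_percent) are made up for this sketch.

```cpp
#include <cassert>   // only needed for the quick checks mentioned below

// Clock ticks per bit, rounded to the nearest integer.
unsigned long ticks_per_bit(unsigned long clock_hz, unsigned long baud)
{
    return (clock_hz + baud / 2) / baud;
}

// Percentage difference between the baud rate the rounded divisor
// actually produces and the requested baud rate.
double baud_error_percent(unsigned long clock_hz, unsigned long baud)
{
    const double actual =
        (double)clock_hz / (double)ticks_per_bit(clock_hz, baud);
    return (actual - (double)baud) / (double)baud * 100.0;
}
```

With a 50,000,000 Hz clock and 115,200 baud, ticks_per_bit() returns 434, and the rounding error works out to well under 0.01%, so the integer divisor is not a concern here.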

After programming the FPGA, I launch HyperTerminal and start typing away.  Imagine that, the characters echo back to the screen as fast as I can type (picture below).

[Image: easy]

For more stuff on Catapult C Synthesis, go to our product page to grab the datasheet or view more videos.

If you made it this far, take a few seconds to drop me a comment.

Thanks,
Dan

About Dan Gardner

Dan Gardner is a technical marketing engineer at Mentor Graphics on the high-level synthesis team. He is an experienced RTL hardware design engineer with both ASIC and FPGA experience.


3 Comments on this Post

Commented on 4:19 AM, Nov 8, 2009
By SHARAD SINHA

Hello Dan, This is a good post. However, it would have been better if you had talked a little bit about "silicon utilization" in FPGA when synthesis using C is done. I am currently pursuing a PhD (at NTU, Singapore) in C to hardware translation, but more specifically in the fields of rapid area-time estimation and efficient mapping of algorithms to FPGA. We all know that effective silicon utilization is lowest in FPGA compared to ASICs and Structured ASICs.

Commented on 8:47 PM, Nov 9, 2009
By Dan Gardner

Hi Sharad, Thanks for taking time to comment. I didn't include the area utilized because I didn't have a fully functioning UART, so it didn't really cross my mind. For this simple test, I didn't use much of the FPGA resources, but here's the summary from Quartus:

+-----------------------------+
Quartus II Version: 8.0 Build 231 07/10/2008 SP 1 SJ Full Version
Family: Stratix II
Device: EP2S60F672C3ES
Logic utilization: < 1 %
Combinational ALUTs: 152 / 48,352 ( < 1 % )
Dedicated logic registers: 75 / 48,352 ( < 1 % )
Total registers: 75
Total pins: 6 / 493 ( 1 % )
Total virtual pins: 0
Total block memory bits: 0 / 2,544,192 ( 0 % )
DSP block 9-bit elements: 0 / 288 ( 0 % )
Total PLLs: 0 / 6 ( 0 % )
Total DLLs: 0 / 2 ( 0 % )
+-----------------------------+

I'm not sure I agree with or fully understand your comments about FPGA vs. ASIC. With Catapult, we can support either FPGA or ASIC and meet area and performance to match hand-coded RTL in substantially less time. You might find Catapult useful for your research. Check out our higher education program, http://www.mentor.com/company/higher_ed/.

Commented on 5:20 PM, Nov 27, 2009
By SHARAD SINHA

Hi Dan, Thanks for the information on Mentor's Higher Education Program. What I meant by silicon utilization being lowest was that a given design can be implemented in an ASIC in less silicon area than it would consume in an FPGA. This comparison assumes that the given design can be accommodated in any of the existing FPGAs. I agree that Catapult can support both FPGA and ASIC.
