Do you really need that CPU in your microcontroller? Here’s a way to free up your CPU using a combination of programmable logic devices and datapaths. Mark Ainsworth of Cypress Semiconductor explains how.
In most microcontroller architectures, a "smart" CPU is surrounded by a set of relatively "dumb" peripherals. The peripherals have limited functions; usually they just convert data from one form to another. For example, an I2C peripheral basically converts data between serial and parallel formats, while an ADC converts analog signals to digital values. The CPU has to perform all of the work to process the data and actually do something useful with it. This, plus close management of the peripherals, can result in a lot of complexity in the CPU's firmware and may require a fast and powerful CPU to execute that firmware within real-time timing constraints. This in turn can lead to more obscure bugs and thus to more complex and expensive debugging equipment, and so on.
But what if the peripherals were complex enough, flexible enough, and ultimately "smart" enough to effectively relieve the CPU of many of its tasks? A complex design could then be restructured as a group of simple designs distributed among the CPU and the peripherals. The CPU would ultimately have fewer tasks and perhaps fewer interrupts to handle, in turn making bugs easier to find and fix. The overall design would become more robust, and portions of the design more easily reused.
Finally, a CPU with less to do may be run at a slower speed to save power, or that available bandwidth could be used for the additional tasks that the marketing department dreams up for the next-generation product. However, the peripherals would still need to be designed in a cost-effective manner or the overall microcontroller might become too expensive.
This article shows how a set of smart, flexible, low-cost, custom digital peripherals can be designed into a microcontroller and configured to help implement a robust distributed system design.
Smart logic options—PLD or datapath?
There are two general ways to construct a smart configurable peripheral. The first is to use a programmable logic device (PLD). As shown in Figure 1, a PLD has a sum-of-products logic gate array driving a number of macrocells. The "T" and "C" notations indicate that each product term can generate either a true or complement (inverted) output, so that both positive and negative logic can be supported.
Figure 1 shows a simple example of a PLD. PLDs can have hundreds of macrocells with up to 16 product terms driving each macrocell. The AND and OR gates within the product terms can be interconnected to form highly flexible custom logic functions. The macrocells are typically clocked, and their outputs can be fed back into the product term array, which allows state machines to be created.
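To make the sum-of-products idea concrete, here is a minimal Python sketch of a tiny PLD with two clocked macrocells whose outputs feed back into the product term array. Everything in it (the function names, the `en` enable input, and the choice of a 2-bit counter as the state machine) is an assumption made for illustration; it is not taken from Figure 1 or from any vendor tool.

```python
# Hypothetical model of a tiny PLD: product terms (ANDs of true/complement
# inputs) are ORed into clocked macrocells whose outputs feed back as inputs.

def product_term(signals, literals):
    """AND together the selected literals.

    literals maps a signal name to 'T' (use the true value) or
    'C' (use the complemented value); unlisted signals are ignored.
    """
    return all(signals[name] if pol == "T" else not signals[name]
               for name, pol in literals.items())


def macrocell_next(signals, terms):
    """OR the product terms that drive one macrocell."""
    return any(product_term(signals, t) for t in terms)


# Sum-of-products equations for a 2-bit up counter with an enable input:
#   q0' = (q0 AND NOT en) OR (NOT q0 AND en)
#   q1' = (q1 AND NOT en) OR (q1 AND NOT q0) OR (NOT q1 AND q0 AND en)
COUNTER_TERMS = {
    "q0": [{"q0": "T", "en": "C"}, {"q0": "C", "en": "T"}],
    "q1": [{"q1": "T", "en": "C"}, {"q1": "T", "q0": "C"},
           {"q1": "C", "q0": "T", "en": "T"}],
}


def clock_edge(state, inputs, terms=COUNTER_TERMS):
    """Evaluate every macrocell from the current signals, then latch."""
    signals = {**state, **inputs}
    return {cell: macrocell_next(signals, cell_terms)
            for cell, cell_terms in terms.items()}


def run(cycles, en=True):
    """Clock the PLD and return the counter value after each edge."""
    state = {"q0": False, "q1": False}
    history = []
    for _ in range(cycles):
        state = clock_edge(state, {"en": en})
        history.append(int(state["q1"]) * 2 + int(state["q0"]))
    return history
```

Calling `run(5)` steps the counter through 1, 2, 3, 0, 1. Note that even this 2-bit counter already needs five product terms, which hints at why wide counters and adders get expensive in pure PLD logic.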
Large-scale PLDs can be used to form complex logic functions, even full CPUs, and thus can certainly be used to make smart digital peripherals. However, a lot of gates may be needed to implement even simple logic functions like counters or adders, and it can become expensive to scale up a PLD-based solution for more complex functions. At some point, it makes more sense to just use an actual CPU.
A very simple form of such a CPU is a datapath based on an arithmetic logic unit (ALU), also known as a nano-processor. A datapath implements just a few common functions but does so more efficiently than an implementation using PLDs. Figure 2 shows a simple datapath with an ALU. A typical ALU can do a variety of operations, usually on 8-bit operands: count up (increment), count down (decrement), add, subtract, logical AND, logical OR, logical XOR, shift left, and shift right. There are two 8-bit accumulators that can act as either input data registers or storage for ALU output. A single operation takes place on the edge of an input clock signal. A function select register is used to control:
* What operation takes place.
* The source register(s) for that operation.
* The destination register for the output.
Depending on the specific design of the datapath, it is possible to do a series of complex operations, as shown in Table 1.
The function select block can actually be a small SRAM, preloaded with the desired function select bits, and the SRAM's address lines can be used to select which operation is to be done. Finally, multiple datapaths can be chained together with carry and shift signals so that operations can be done on multibyte operands.
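The following Python sketch models the simple datapath just described: two 8-bit accumulators, an ALU, and a small function-select store indexed by an address. The tuple layout of each control-store entry, the register names `A0`/`A1`, and the helper `chained_add` are assumptions chosen for illustration; a real datapath encodes these choices as configuration bits.

```python
# Hypothetical model of the simple datapath: two 8-bit accumulators, an ALU,
# and a small function-select store addressed by external address lines.

MASK8 = 0xFF

ALU_OPS = {
    "inc": lambda a, b: a + 1,
    "dec": lambda a, b: a - 1,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: a << 1,
    "shr": lambda a, b: a >> 1,
}


class Datapath:
    def __init__(self, control_store):
        # Each entry: (operation, source A, source B, destination register).
        self.control_store = control_store
        self.regs = {"A0": 0, "A1": 0}

    def clock(self, address):
        """One clock edge: run the operation selected by the address lines."""
        op, src_a, src_b, dest = self.control_store[address]
        result = ALU_OPS[op](self.regs[src_a], self.regs[src_b])
        self.regs[dest] = result & MASK8   # results wrap to 8 bits
        return self.regs[dest]


def chained_add(a_bytes, b_bytes):
    """Add two multibyte operands (least significant byte first) by
    rippling the carry from one 8-bit datapath to the next."""
    carry, out = 0, []
    for a, b in zip(a_bytes, b_bytes):
        total = a + b + carry
        out.append(total & MASK8)
        carry = total >> 8
    return out, carry


# Example control store: three preloaded operations.
EXAMPLE_STORE = [
    ("add", "A0", "A1", "A0"),   # address 0: A0 = A0 + A1
    ("shl", "A0", "A0", "A0"),   # address 1: A0 = A0 << 1
    ("dec", "A1", "A1", "A1"),   # address 2: A1 = A1 - 1
]
```

Driving the address lines with 0, 1, 2 on successive clocks runs a short "program" without any CPU involvement, and `chained_add([0xFF, 0x01], [0x01, 0x00])` shows how a carry signal lets two 8-bit slices add 0x01FF and 0x0001.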
Since a datapath does only a few specific functions, it's possible to optimize its design so that it is inexpensive to build. However, a datapath is not nearly as flexible as a PLD for implementing complex logic. So which method is better for creating smart, flexible, low-cost digital peripherals—PLDs or datapaths? The answer is that separately neither one works well but together they can work very well. Let's take a look at a practical example of how this is done.
Universal digital blocks
An example of a system utilizing both PLD and datapath components is Cypress Semiconductor's PSoC 3 and PSoC 5 ICs. Each system contains up to 24 general-purpose digital logic subsystems called Universal Digital Blocks (UDBs) constructed as shown in Figure 3. A UDB contains two PLDs of the type shown in Figure 1. It also contains a datapath as well as status and control registers. There are two chaining paths, one for the PLDs and one for the datapath. Finally, a routing channel exists to connect signals between each of the UDB's sub-blocks as well as between UDBs. Configuration of the PLD, datapath, and routing is done by writing to UDB configuration registers (not shown).
The UDB's PLD design was described in Figure 1. As shown in Figure 4, the UDB datapath is similar to the basic datapath concept shown in Figure 2 but is more sophisticated—it has more registers and more features:
* The 8-bit ALU can do all seven basic functions—increment, decrement, add, subtract, AND, OR, XOR—and has separate shift and bit-mask blocks for post-processing the ALU result. (An eighth ALU function, pass, just passes a value through the ALU to the shifter and bit-mask blocks.) The shift block can do shift left, shift right, nibble-swap, and pass. The mask block does a bitwise AND with the contents of a separate mask register (not shown).
* Operations can be done using two accumulators (A0, A1) and two data registers (D0, D1). Two FIFO registers (F0, F1) are available for passing data between the datapath and the CPU. The FIFOs are up to 4 bytes deep. This structure enables simple multitasking; at different times separate operations can be done on subsets of the registers. So for example A0, D0, and F0 could be used for one task and A1, D1, and F1 could be used for a different task.
* A broad set of status conditions—compare, zero detect, all ones detect, and overflow detect—can be applied to the accumulators and data registers and routed elsewhere in the device.
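The post-processing chain in the list above (ALU, then shifter, then bit mask) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names, the string opcodes, and the reading of "compare" as equal/less-than are hypothetical, and overflow detection is left out for brevity.

```python
# Hypothetical model of the UDB datapath result chain: ALU -> shift -> mask.

MASK8 = 0xFF


def alu(func, a, b):
    """The seven basic functions plus pass, on 8-bit operands."""
    ops = {
        "pass": a,
        "inc": a + 1, "dec": a - 1,
        "add": a + b, "sub": a - b,
        "and": a & b, "or": a | b, "xor": a ^ b,
    }
    return ops[func] & MASK8


def shifter(func, value):
    """Shift left, shift right, nibble-swap, or pass."""
    if func == "shl":
        return (value << 1) & MASK8
    if func == "shr":
        return value >> 1
    if func == "swap":
        return ((value << 4) | (value >> 4)) & MASK8
    return value  # pass


def datapath_op(alu_func, shift_func, a, b, mask=MASK8):
    """Full chain: ALU result, post-shifted, then ANDed with the mask."""
    return shifter(shift_func, alu(alu_func, a, b)) & mask


def status(a, d):
    """A few status conditions comparing an accumulator with a data register."""
    return {
        "equal": a == d,
        "less_than": a < d,
        "zero": a == 0,
        "all_ones": a == MASK8,
    }
```

For example, `datapath_op("pass", "swap", 0xA5, 0)` swaps the nibbles to give 0x5A, and `datapath_op("inc", "pass", 0x0F, 0, mask=0x0F)` shows the mask turning the 8-bit ALU into a 4-bit one that wraps to zero.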
Although the UDBs have a lot of features in both the PLD and datapath subsystems, what makes them especially useful is the extensive digital routing that is also offered. Signals can be routed among the PLDs and datapaths throughout the entire set of UDBs, and elsewhere in the device, to form a complex fabric called the Digital System Interconnect (DSI).
In a basic example, we can use one UDB datapath to create an 8-bit counter with reload capability. To do this we connect one status condition back to a control store SRAM address line, as shown in Figure 5.
In this design, A0 is the counter register and D0 is the reload register. We need two functions, one to decrement the counter and one to reload the counter from the reload register D0; these functions are preloaded in the control store SRAM.
The logic is as follows. When A0 is not zero, the condition output will be low and the decrement operation at address 0 will be executed. When A0 is zero, the condition output will be high and the reload operation at address 1 will be executed.
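The feedback loop described above can be simulated with a short Python sketch. The class name `DownCounter`, the choice to start with A0 preloaded from D0, and the string labels in the control store are assumptions for illustration; on the real device the CPU would write the initial register values once during initialization.

```python
# Hypothetical simulation of the reloading down counter: the "A0 is zero"
# status output drives the control store address line.

CONTROL_STORE = {
    0: "decrement",  # A0 = A0 - 1
    1: "reload",     # A0 = D0
}


class DownCounter:
    def __init__(self, period):
        self.d0 = period & 0xFF   # reload register
        self.a0 = self.d0         # assumed initial state: counter preloaded

    @property
    def condition(self):
        """Status output: high (1) when A0 is zero, low (0) otherwise."""
        return int(self.a0 == 0)

    def clock(self):
        """One rising clock edge; returns the new value of A0."""
        address = self.condition  # condition output selects the operation
        if CONTROL_STORE[address] == "decrement":
            self.a0 = (self.a0 - 1) & 0xFF
        else:
            self.a0 = self.d0
        return self.a0
```

With `DownCounter(3)`, eight clock edges produce 2, 1, 0, 3, 2, 1, 0, 3, so in this model the condition output goes high once every D0 + 1 clocks.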
All operations take place on the rising edge of the clock input, allowing the number of clock edges to be counted. The clock input can be routed from a variety of sources. The condition output can be routed throughout the DSI, including to DMA and interrupt request inputs. Using datapath chaining and the mask block, the size of this counter can be any number of bits, and is not limited to a multiple of eight bits.
The counter shown in Figure 5 is a down counter. It can easily be converted to an up counter by using a different condition output (A0 == D0) and different functions in the control store SRAM: A0 = A0 + 1, and A0 = A0 XOR A0. Exclusive-or'ing any value with itself yields a zero result.
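As a quick check of the up-counter variant, the sketch below applies the two functions named above under the A0 == D0 condition. The function names and the starting value of zero are assumptions for illustration.

```python
# Hypothetical simulation of the up-counter variant.

def up_counter_step(a0, d0):
    """One clock edge of the up counter; returns the new A0."""
    if a0 == d0:             # condition high: run the function at address 1
        return a0 ^ a0       # XOR with itself clears the register
    return (a0 + 1) & 0xFF   # condition low: run the function at address 0


def run_up_counter(d0, cycles, a0=0):
    """Clock the counter and collect A0 after each edge."""
    values = []
    for _ in range(cycles):
        a0 = up_counter_step(a0, d0)
        values.append(a0)
    return values
```

Here `run_up_counter(3, 6)` yields 1, 2, 3, 0, 1, 2: the counter climbs to the value in D0 and is cleared on the following clock edge.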
This simple design can be expanded, with the use of PLDs, to create a more complex application. To illustrate this, consider a traffic-light controller. A traffic-light controller cycles through three states—green, yellow, and red—so a state machine is required. Each state lasts for a certain amount of time before changing to the next state, so a counter is also required. For simplicity, assume that the "green" time is the same as the "red" time but that the "yellow" time is different.
Only one datapath is needed (assuming an 8-bit count value) to implement this timing structure, and three of the datapath registers are used. A0 is the count register, D0 contains the counter reload value for the "green" and "red" states, and D1 contains the counter reload value for the "yellow" state. The block diagram is shown in Figure 6.
The operations to be saved in the Control Store RAM are:
A0 = A0 - 1 // count
A0 = D0 // reload "green" or "red" count value
A0 = D1 // reload "yellow" value
The state machine is implemented in the PLD. The datapath condition output is fed back to the PLD to indicate that it's time to change state. The PLD also has logic that, based on the current state and the signal fed back from the datapath, controls which datapath operation to perform and which traffic light to activate.
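One way to picture the split between PLD and datapath is the Python sketch below. It is a behavioral model only: the class name `TrafficLight`, the rule that the PLD loads the new state's reload value on the same clock edge as the state change, and the starting state of green are all assumptions, not details taken from Figure 6.

```python
# Hypothetical model of the traffic-light controller: a PLD state machine
# selecting one of three datapath operations on every clock edge.

NEXT_STATE = {"green": "yellow", "yellow": "red", "red": "green"}


class TrafficLight:
    def __init__(self, long_ticks, yellow_ticks):
        self.d0 = long_ticks & 0xFF    # reload value for green and red
        self.d1 = yellow_ticks & 0xFF  # reload value for yellow
        self.state = "green"           # assumed power-up state
        self.a0 = self.d0              # count register

    def pld(self):
        """PLD logic: from the current state and the datapath condition
        (A0 is zero), pick the next state and the datapath operation."""
        if self.a0 != 0:
            return self.state, "count"
        nxt = NEXT_STATE[self.state]
        return nxt, ("reload_d1" if nxt == "yellow" else "reload_d0")

    def clock(self):
        """One clock edge: update the state and run the chosen operation."""
        self.state, op = self.pld()
        if op == "count":
            self.a0 = (self.a0 - 1) & 0xFF   # A0 = A0 - 1
        elif op == "reload_d0":
            self.a0 = self.d0                # A0 = D0
        else:
            self.a0 = self.d1                # A0 = D1
        return self.state
```

With `TrafficLight(2, 1)`, successive clocks show green, green, yellow, yellow, red, red, red, green. After the first cycle each light stays on for its reload value plus one clock, and the CPU is needed only to write D0 and D1 at startup.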
Beyond the basics
A traffic-light controller is a simple application of a type that is commonly programmed using a CPU. However, we have seen that, except for initialization code, this function can be completely dissociated from the CPU and in fact has been encapsulated as a smart configurable peripheral. The functionality can be easily expanded to support additional requirements such as turn signals, pedestrian WALK signals, vehicle detect sensors, and transit/emergency transponders.
What's a CPU to do?
By using an efficient combination of PLDs and datapaths, you can create smart, flexible, low-cost peripherals that take the load off the CPU. However, if so much functionality can be offloaded to peripherals, what's left for the CPU to do? In many cases, not much—in some cases after system initialization, the CPU can be turned off! However, a more realistic solution is to use the CPU to do what CPUs do best, such as:
* Complex calculations.
* String and text processing.
* Database management.
* Communications management.
* System management.
For example, in our traffic-light application, the CPU could be used to:
* Detect when a vehicle goes through a red light,
* Use the camera to photograph the license plate,
* Extract the license plate's text from the photo,
* Look up the owner in the state database, and
* Send a ticket to the owner.
By offloading tasks to smart support peripherals, the CPU is freed up to do other, maybe more lucrative tasks.
About the author:
Mark Ainsworth is an applications engineer principal at Cypress Semiconductor. He has a BS in computer engineering from Syracuse University and an MSEE from the University of Washington.