Product How-To: Exploring Multicore Power Management with Modeling and Simulation

by Darryl Koivisto, TechOnline India - March 30, 2011

How to use Mirabilis Design's VisualSim to model multicore designs and obtain accurate power results and task/power efficiency metrics.

Power management has often used the speed/power ratio (MHz per watt) as a measure of performance/power efficiency. If one processor had a higher MHz-per-watt rating than another, it would presumably be more power-miserly. Within a processor family this may be true, since the ratings reflect the underlying process technology and chip architecture.

However, newer microprocessors vary in the number of cores, the number of instructions per clock cycle, and the number of hardware accelerators, all of which change the performance delivered at a given clock rate.

This means the speed/power ratio is meaningful only if the microprocessors have similar architectures in terms of number of cores, instructions per cycle, and degree of hardware acceleration.

If one wants to compare a single-core processor from one vendor with a dual-core processor from another vendor, then MHz per watt may be a misleading measure of performance/power efficiency.
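A small numeric illustration makes the point concrete. The Python sketch below uses two hypothetical processors (the core counts, IPC values, clock rates, and power figures are invented for illustration) and compares MHz per watt with an idealized instructions-per-second-per-watt figure that assumes the workload keeps every core busy. Under that assumption the two metrics rank the processors in opposite order.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor description, for illustration only."""
    name: str
    cores: int
    ipc: float        # instructions per cycle, per core
    clock_mhz: float
    power_w: float

    def mhz_per_watt(self) -> float:
        # The simple speed/power ratio: ignores cores and IPC entirely.
        return self.clock_mhz / self.power_w

    def mips_per_watt(self) -> float:
        # Idealized throughput per watt: assumes perfect parallelism across cores.
        return self.cores * self.ipc * self.clock_mhz / self.power_w


a = Processor("single-core", cores=1, ipc=1.0, clock_mhz=1000.0, power_w=1.0)
b = Processor("dual-core", cores=2, ipc=2.0, clock_mhz=600.0, power_w=1.0)

for p in (a, b):
    print(f"{p.name}: {p.mhz_per_watt():.0f} MHz/W, {p.mips_per_watt():.0f} MIPS/W")
```

Even this throughput figure is an idealization (it ignores communication, bus, and cache effects), which is why modeling the actual tasks is still needed.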

Mirabilis Design’s VisualSim software simulates SoCs, boards, software, processors, and networked distributed systems, using models that are developed quickly from pre-built, parameterized modeling libraries in a graphical environment. These models have power information embedded in them and can be used to estimate different power metrics.

Why Modeling and Simulation

Modeling and simulation using VisualSim can capture both the power and the delay of executing software tasks, which gives better accuracy than a simple ratio metric.

Modeling and simulation can improve on the simple speed/power ratio comparison by taking into account valuable hardware information for a variety of configurations:

* Number and speed of processor cores
* Number and speed of busses
* Speed of caches
* Instructions per cycle
* Hardware accelerators

VisualSim provides extensive libraries for modeling hardware and software systems. These modeling components have power states and power analysis built in. Combined with the application templates, this makes power modeling quick and gives a wide degree of coverage. The libraries fully support TLM (transaction-level modeling) abstraction levels.

In addition, power management modeling can alter power states and power levels as a simulation executes. More importantly, users can modify the central power algorithm to further optimize power use.

On the software side, modeling and simulation can characterize software that has not been fully developed early in the design cycle by estimating the number of cycles for key tasks. If the software exists, then a profile of executing tasks becomes the input.

Modeling with VisualSim provides a degree of flexibility that is not available in more accurate modeling environments or with cycle-accurate models, because the models are built out of basic blocks and the user can modify the internal details of each component. Components are also built graphically and can easily be handed off to others.

The components are connected to describe a proposed system, which is then simulated under different operating conditions such as traffic, user activity, and operating environment. The model automatically generates a variety of power metrics, including instantaneous, average, and peak power, as well as battery discharge.

In addition, the modeler can control the power levels, power states and battery charge through the use of RegEx functions available in the system.

Task/Power Efficiency Metric

Modeling and simulation allow any power management scheme to be compared with a baseline system that has all devices in the D0 (full-on) state and all microprocessors (uPs) in the C0 (active) state.

In other words, the baseline system has few or no power management features. A simple power efficiency metric is the ratio of the power consumed by the baseline system to the power consumed by the power-optimized system:

$$\text{Power Efficiency} = \frac{\text{Power\_Baseline\_System}}{\text{Power\_Optimized\_System}}$$

The higher the ratio, the better the system's power management. The issue with this simple metric is that it does not directly account for how long the power-optimized system takes to complete a set of software tasks. A task efficiency metric, based on the time to complete a set of tasks, addresses this:

$$\text{Task Efficiency} = \frac{\text{Task\_Time\_Baseline}}{\text{Task\_Time\_System}}$$


Task efficiency is reduced in multicore designs by, for example, the communication time between cores executing in parallel. Typically, the power efficiency will be greater than one, while the task efficiency will be less than one. The overall Task/Power Efficiency metric combines the two individual metrics:

$$\text{Task/Power Efficiency} = \text{Task Efficiency} \times \text{Power Efficiency}$$


The task/power efficiency metric now takes into account both power efficiency of the system and the time it takes to execute a set of application tasks.
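Because all three metrics are simple ratios, they can be computed directly from the results of two simulation runs. The Python sketch below is a minimal illustration; the function names and the sample power and task-time values are hypothetical, not outputs of the VisualSim model described in this article.

```python
def power_efficiency(power_baseline: float, power_optimized: float) -> float:
    """Baseline power / optimized power; above 1 means the optimized system uses less power."""
    return power_baseline / power_optimized


def task_efficiency(task_time_baseline: float, task_time_system: float) -> float:
    """Baseline task time / system task time; below 1 means the system is slower."""
    return task_time_baseline / task_time_system


def task_power_efficiency(task_eff: float, power_eff: float) -> float:
    """Product of the two metrics, so task time and power are weighted equally."""
    return task_eff * power_eff


# Hypothetical values: 100 mW vs 80 mW average power, 10 us vs 12.5 us task time.
p_eff = power_efficiency(100.0, 80.0)   # 1.25
t_eff = task_efficiency(10.0, 12.5)     # 0.8
print(p_eff, t_eff, task_power_efficiency(t_eff, p_eff))
```

In this made-up case the power savings exactly offset the slowdown, giving a combined metric of about 1.0; a value above 1.0 would favor the power-optimized system.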

Application Software

Application software often differs greatly from the software used to calculate performance/power (MHz/W) ratios. As a result, a microprocessor selected on the basis of its rated performance/power ratio may show a different ratio when your application is running.

In addition, if application software has not been optimized for power efficiency, then modeling and simulation provides another means to examine execution of key loops or methods to improve overall power efficiency.

Often 20% of the loops or methods determine the overall performance of an application. VisualSim models allow one to compare the individual software tasks of an application to see which ones deserve more attention, improving the task/power efficiency metric.

If a model can list the top five tasks that use the most power, one can examine these tasks in more detail to see if power can be optimized when they are executing.

In addition, a power management model can provide end-to-end delays for each executing task. There is a high correlation between task execution time and power consumed for a given clock rate.
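The link between execution time and power can be made explicit with a simple energy estimate: at a fixed clock rate, a task's execution time is its cycle count divided by the clock frequency, and its energy is average power times that time. The Python sketch below ranks tasks by this estimate and returns the top five. The task names, cycle counts, power values, and the 1 GHz clock are hypothetical, and the constant-power-per-task model is a simplifying assumption.

```python
# Hypothetical per-task profile: (task name, cycles executed, average power in mW).
TASK_PROFILE = [
    ("decode", 1200, 90.0),
    ("crc", 1000, 100.0),
    ("copy", 800, 120.0),
    ("fft", 2000, 80.0),
    ("parse", 500, 110.0),
    ("filter", 1500, 70.0),
]
CLOCK_HZ = 1.0e9  # assumed 1 GHz clock


def task_energy_mws(cycles, power_mw, clock_hz=CLOCK_HZ):
    """Energy in mW·s: average power (mW) times execution time (s = cycles / clock rate)."""
    return power_mw * cycles / clock_hz


def top_energy_tasks(profile, n=5, clock_hz=CLOCK_HZ):
    """Return the names of the n tasks with the highest estimated energy, largest first."""
    ranked = sorted(
        profile,
        key=lambda task: task_energy_mws(task[1], task[2], clock_hz),
        reverse=True,
    )
    return [name for name, _cycles, _power in ranked[:n]]


print(top_energy_tasks(TASK_PROFILE))
```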

Power Management Model in VisualSim

A power management model in VisualSim (Figure 1, below) was constructed to compare an existing single-core design with a dual-core design executing a set of five tasks. The VisualSim model was constructed in approximately 4 to 6 hours.

Figure 1. Power Manager Model

First, the user assembles the behavior of the tasks by selecting the number of task execution blocks for both the single-core and dual-core configurations. Next, the user defines the four top-level parameters and their default values.

The hardware resources (core, cache, and bus) are defined as schedulers, which can later be substituted with detailed transaction implementations. The power states and power levels are defined inside the Power_Manager block; there is one Power_Manager for the single-core design and one for the dual-core design.

Next, two task traffic generators are dragged and dropped into the model, and the time distribution between tasks is set. No programming is required within the VisualSim environment.

In addition, the following blocks are dragged and dropped into the block diagram editor: eight processing blocks that define the details of the tasks; seven task mapping blocks that map each software task onto the processor core, bus, and cache scheduler blocks; and three plotting and text display blocks.

The data structures represent transactions; each one resembles a small spreadsheet, with a column of names and a column of values, that is passed from one block to the next.
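A transaction of this kind maps naturally onto an ordered name-to-value table. The Python sketch below illustrates the idea with a plain dictionary; the field names (`Task_Name`, `Processor_Cycles`, and so on) and the `processor_block` function are hypothetical and are not VisualSim's actual transaction fields or block API.

```python
# Hypothetical field names; a real VisualSim transaction may use different ones.
def make_transaction(task_name, processor_cycles, bus_cycles, cache_cycles):
    """Create a transaction: an ordered two-column table of field names and values."""
    return {
        "Task_Name": task_name,
        "Processor_Cycles": processor_cycles,
        "Bus_Cycles": bus_cycles,
        "Cache_Cycles": cache_cycles,
    }


def processor_block(txn, clock_hz):
    """Hypothetical processing block: reads a field, adds a result field, passes the table on."""
    out = dict(txn)  # copy, so the upstream block's transaction is left unchanged
    out["Processor_Time_ns"] = txn["Processor_Cycles"] * 1e9 / clock_hz
    return out


def print_table(txn):
    """Print the transaction as its two columns: name and value."""
    for name, value in txn.items():
        print(f"{name:<18} {value}")


txn = processor_block(make_transaction("Task_1", 1000, 200, 150), clock_hz=1e9)
print_table(txn)
```

Copying the table in each block keeps blocks independent of one another, which mirrors the idea of a transaction flowing from one block to the next.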

Here are some of the Power Manager Model assumptions:

==> The model consists of a processor, bus, and cache for both the single-core and dual-core variants.

==> Tasks are described statistically by an empirical distribution of cycles for the processor, bus, and cache. For example, a task might execute 800 cycles (25% of the time), 1000 cycles (50% of the time), or 1200 cycles (25% of the time).

==> The dual-core model takes 8 cycles to distribute each task to a core, whereas the single-core model executes the tasks sequentially without any distribution cycles.

==> The processor, bus, and cache operate at the same clock rate, a reasonable assumption.

==> The single core (1 GHz) is clocked twice as fast as the dual-core design (500 MHz).
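The assumptions above can be turned into a rough back-of-envelope latency model. The Python sketch below samples task cycle counts from the example empirical distribution and compares single-core and dual-core latency. Two modeling choices are assumptions rather than details from the article: each task is split evenly across both cores (the parallelizable case described under Battery Power), and the 8 distribution cycles are charged at the 500 MHz dual-core clock. Because it ignores bus and cache contention, this idealized sketch gives a ratio near 0.98 and is not expected to reproduce the model's reported task efficiency.

```python
import random

# Example empirical distribution from the assumptions: (cycles, probability).
CYCLE_DIST = [(800, 0.25), (1000, 0.50), (1200, 0.25)]

SINGLE_CORE_HZ = 1.0e9   # 1 GHz single core
DUAL_CORE_HZ = 0.5e9     # 500 MHz per core in the dual-core design
DISTRIBUTION_CYCLES = 8  # assumed to be charged at the dual-core clock rate


def sample_cycles(rng):
    """Draw one task's cycle count from the empirical distribution."""
    values, weights = zip(*CYCLE_DIST)
    return rng.choices(values, weights=weights, k=1)[0]


def single_core_latency_ns(cycles):
    """Sequential execution, no distribution overhead."""
    return cycles * 1e9 / SINGLE_CORE_HZ


def dual_core_latency_ns(cycles):
    """Assumption: the task parallelizes perfectly, so each core runs half of the cycles."""
    per_core_cycles = cycles / 2
    return (DISTRIBUTION_CYCLES + per_core_cycles) * 1e9 / DUAL_CORE_HZ


rng = random.Random(42)  # fixed seed so runs are repeatable
samples = [sample_cycles(rng) for _ in range(10_000)]
single = sum(single_core_latency_ns(c) for c in samples)
dual = sum(dual_core_latency_ns(c) for c in samples)
print(f"task efficiency (single/dual) = {single / dual:.3f}")
```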

Power Management Analysis

The Power Manager Model generates average power, battery power, task latency, and task/power metrics. It also ranks the tasks by execution time.

Figure 2. Average Power

Average Power. The average power of the dual core (Figure 2 above) is lower than that of the single core, since its cores run at one-half the single-core clock rate.

The dual-core average power is more than one-half the single-core power level, since two cores are running. The earliest points on the plot are effectively instantaneous values, because few samples have yet been averaged; after 200 µs the average settles close to the average power for the design.
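The settling behavior comes from computing a cumulative average: the first point is a single instantaneous sample, and later points average over more and more of the trace. The Python sketch below shows this with a hypothetical power trace that alternates between a busy sample and an idle sample; the values are invented, and a cumulative mean is assumed to be how the plotted average is formed.

```python
def running_average(samples_mw):
    """Cumulative mean after each sample: early values follow individual samples,
    later values settle toward the long-run average power."""
    averages = []
    total = 0.0
    for count, power in enumerate(samples_mw, start=1):
        total += power
        averages.append(total / count)
    return averages


# Hypothetical trace: power alternates between a busy (120 mW) and an idle (40 mW) sample.
trace_mw = [120.0, 40.0] * 50
avg_mw = running_average(trace_mw)
print(avg_mw[0], avg_mw[2], avg_mw[-1])
```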

Figure 3. Battery Power

Battery Power. The battery consumption of the dual core (Figure 3 above) is better than that of the single core after 2.0 ms. The higher the curve, the less energy has been consumed, since the plot represents the battery discharge.

The Y axis represents milliwatt-seconds, or stored energy. The battery plot mirrors the average power plot in Figure 2. Again, the dual-core battery consumption is more than one-half that of the single-core processor running at twice the clock rate, since two cores each run half of each task at one-half the single-core clock rate. This assumes the tasks can be parallelized.
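Since power in milliwatts multiplied by time in seconds gives energy in milliwatt-seconds, a battery curve like this one can be approximated by subtracting power times the step length at each time step. The Python sketch below assumes constant power within each step; the 10 mW·s capacity, 1 µs step, and the two constant power levels are hypothetical values chosen for illustration.

```python
def battery_curve_mws(capacity_mws, power_mw, dt_s):
    """Remaining battery energy (mW·s) after each time step.

    Assumes power is constant within each step, so the energy used per step is
    power (mW) times step length (s).
    """
    remaining = capacity_mws
    curve = []
    for p in power_mw:
        remaining -= p * dt_s
        curve.append(remaining)
    return curve


# Hypothetical example: 1000 steps of 1 microsecond each, 10 mW·s battery.
DT_S = 1e-6
single_core = battery_curve_mws(10.0, [100.0] * 1000, DT_S)
dual_core = battery_curve_mws(10.0, [78.0] * 1000, DT_S)

# The higher curve has consumed less energy.
print(single_core[-1], dual_core[-1])
```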

Task Latency. As Figure 4 below illustrates, the single core (red), running at twice the clock rate of the dual core, has lower latencies than the dual-core (blue) processor for all five tasks.

Figure 4. Task Latency

Combining the five tasks shown here gives a Task Efficiency Metric of 0.915, meaning the ratio of the total single-core task time to the total dual-core task time is less than one.


Task/Power Latency Efficiency Metrics

The 1.279 Power Efficiency Metric (Figure 5, below) means the single core uses 1.279 times as much power as the dual-core configuration. The 0.915 Task Efficiency Metric means the single-core latency is 0.915 times that of the dual-core configuration.

Figure 5. Task/power latency efficiency effects


Typically, the Power Efficiency Metric is greater than 1.0 and the Task Efficiency Metric is less than 1.0. The Task/Power Efficiency Metric multiplies the two, so power and task time are weighted equally.

The resulting Task/Power Efficiency Metric is 1.171, meaning the dual core is 1.171 times better than the single core when power and task delay are considered together.
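As an arithmetic check, multiplying the two rounded metrics gives

$$\text{Task/Power Efficiency} = 0.915 \times 1.279 = 1.170285 \approx 1.170$$

The small difference from the reported 1.171 is presumably because the model multiplies the unrounded values before displaying three decimal places.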

Figure 6. Task order ranking

Figure 6 above compares the cumulative execution time of each task on the single-core and dual-core configurations. Most of the dual-core tasks are slower than their single-core counterparts. The model results allow the tasks to be ordered from slowest to fastest:

* Task_4, Task_1, Task_2, Task_3, Task_5
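A ranking like this can be produced by summing each task's execution times across all of its completions and sorting the totals. The Python sketch below shows the idea on a few hypothetical completion records; the task names and times are invented and do not correspond to the Task_1 to Task_5 results above.

```python
from collections import defaultdict


def rank_slowest_first(records):
    """Sum execution time per task, then order task names from slowest to fastest."""
    totals = defaultdict(float)
    for task, time_us in records:
        totals[task] += time_us
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical completion records: (task name, execution time in microseconds).
records = [
    ("task_a", 3.0), ("task_b", 2.0), ("task_a", 2.5),
    ("task_c", 1.0), ("task_b", 1.5),
]
print(rank_slowest_first(records))
```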


Conclusions

The methodology presented goes beyond MHz/W values: it takes into account the actual power characteristics of individual subsystems in terms of their power states, and it also models the tasks executed on processors/cores, busses, caches, and SDRAMs. Results of the modeling effort have been compared with test results from actual hardware and show very good accuracy.

VisualSim can provide accurate results, and the task/power efficiency metrics can be applied to designs running at different clock speeds with completely different power algorithms. Finally, this task/power methodology can scale to N processor cores and much larger designs.

VisualSim libraries of standard hardware and software components, flow charts defining the behavior, traffic models, and pre-built analysis probes ensure that system design is no longer time-consuming, difficult to perform, or prone to questionable results. The reduction in system modeling time, together with the availability of standard component models, provides a single environment for designers to explore both hardware and software architectures.


To learn more about VisualSim, visit the Mirabilis demonstration page where there are models embedded in the HTML pages. You can modify parameters and execute from within your web browser without downloading custom software.

About the author:

Darryl Koivisto is the CTO at Mirabilis Design and has over 25 years of experience as an architect, program manager, and computer modeling expert. Koivisto has honed his approach to quality by following rigorous practices from his first day at work. He keeps a large brown book in which he meticulously notes every technique he has adopted over the years. Prior to Mirabilis Design, he worked at Cadence Design Systems, Ford Aerospace, Signetics, and Amdahl. Koivisto has a DBA from Golden Gate University, an MS from Santa Clara University, and a BS from California Polytechnic University, San Luis Obispo.
