    Efficient interfacing with external memory in high-end video
    TechOnline India

    High-definition multimedia devices such as DTVs, set-top boxes, video players and even mobile phones make up one of the fastest-growing segments of the consumer electronics market. The main drivers behind this growth are consumer demands for high-resolution digital video content, greater color depth, and higher refresh rates.

    Typically, a high-performance chip for such devices contains a video decoder supporting multiple video standards, a picture processing engine, a 3-D graphics engine and a display controller. All these blocks must process huge amounts of data, and store and retrieve that data internally and/or externally depending on the system requirements. Figure 1 shows one example of a system diagram with only the external memory connectivity.

    Combining multiple high-performance functions at a consumer price point demands powerful system architectures with appropriate trade-offs. Even though advanced process nodes like 65nm or 45nm can reduce the silicon size of such an IC, its extremely high external memory bandwidth consumption creates a bottleneck for the whole system. The peak aggregate system bandwidth of these computing engines can approach 4-5 GBps, and a high-performance external memory system is necessary to sustain high-definition workloads.

    The bandwidth required by the different processing engines can vary dramatically depending on the image content and the processing algorithms used. A careful analysis of each engine's bandwidth requirement, access pattern and latency requirement is crucial for selecting the external memory, architecting the DDR controller and deciding on the system arbitration mechanism.

    This article analyzes different aspects of the DDR controller and the DDR/DDR2 memory module, along with the operational trade-offs involved.



    I. Peak bandwidth analysis

    The first and most important parameter for selecting an appropriate external memory controller and memory bus architecture for a complex, bandwidth-hungry SoC is the peak bandwidth requirement of the whole system. This requirement directly influences both the performance and the cost of the system.

    The peak bandwidth calculation for the video SoC shown in Figure 1 can be very complex, because the functional modules differ in latency tolerance, buffering capability, access pattern, peak-to-average bandwidth ratio and so on. The window for the peak bandwidth calculation should be selected carefully: over this window every module must receive its required peak bandwidth, while the peak-to-average ratio of the system bandwidth stays moderate.

    In this section, the window selection is analyzed for a simplified SoC comprising an H.264 decoder capable of decoding 1080p @30fps at 200MHz, a display controller requiring 4:2:0 input, a system DMA that stores the encoded stream, and a system CPU. The bandwidth characteristics of the individual engines are different and complex in nature.

    For example, for 1080p @30fps decoding the H.264 decoding module can have an average bandwidth of the order of 400-450 MBps, while its peak bandwidth over the processing time of 3-4 macroblocks can be close to 700-800 MBps. The occurrence of consecutive peaks is statistical, and the decoder's latency tolerance depends on the characteristics of the stream, the display picture buffering, the granularity of the external memory access switchover and so on. The bandwidth requirement of the display controller, by contrast, is uniform, and the display controller cannot tolerate any latency beyond its buffering capability. The access pattern of the input stream loader depends entirely on the buffering limit of the system.

    Assuming the display controller has a ping-pong line buffer, a 1080p @30fps display expects 1920 bytes of data every 20.4us. Because this requirement is mandatory, the 20.4us time slot itself can be taken as the peak bandwidth window. The combined peak of the other clients over this 20.4us slot then decides the peak of the system. As the H.264 decoder needs about 4us to process a single macroblock at 1080p @30fps, its peak requirement needs to be characterized over 20.4us, i.e. a 5-macroblock time slot. A typical peak bandwidth over 5 macroblocks may be 700MBps.
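
The timing figures above can be cross-checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes 16x16 macroblocks with the 1080-line picture coded as 68 macroblock rows, which is the usual H.264 arrangement but is not stated in the text.

```python
# Timing sketch for the peak-bandwidth window (helper names are hypothetical).
FRAME_RATE = 30                                      # frames per second, from the text
MB_PER_FRAME = (1920 // 16) * ((1080 + 15) // 16)    # assumed 120 x 68 = 8160 macroblocks


def macroblock_time_us(frame_rate=FRAME_RATE, mb_per_frame=MB_PER_FRAME):
    """Average time budget per macroblock, in microseconds."""
    return 1e6 / (frame_rate * mb_per_frame)


def window_in_macroblocks(window_us):
    """Number of macroblock slots that fit in a window of window_us."""
    return window_us / macroblock_time_us()


def display_bandwidth_mbps(bytes_per_window, window_us):
    """Uniform display bandwidth; bytes per microsecond equals MBps (10^6 bytes/s)."""
    return bytes_per_window / window_us


if __name__ == "__main__":
    print(f"macroblock time : {macroblock_time_us():.2f} us")
    print(f"20.4us window   : {window_in_macroblocks(20.4):.2f} macroblocks")
    print(f"display traffic : {display_bandwidth_mbps(1920, 20.4):.1f} MBps")
```

Under these assumptions the per-macroblock budget comes out slightly above 4us, so a 20.4us window corresponds to roughly 5 macroblock slots, and the display's share of the window is on the order of 94 MBps.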

    Let us further assume that the decoder can average out its processing time over 5 macroblocks as long as the data required for those 5 macroblocks is served within a maximum delay of 2us, i.e. the decoder's latency tolerance is 2us. The effective peak bandwidth then reduces to 700*20.4/(20.4+2) = ~637MBps.

    The window in turn depends on the buffering available in the different functional blocks. For example, if the display controller can buffer 2 lines instead of a single line, it expects 3840 bytes of data every 40.8us, and hence the peak selection window can be widened to 40.8us, i.e. a 10-macroblock decoding slot.

    When the window becomes wider, the decoder's peak requirement reduces (because the decoder averages over more macroblocks) and at the same time its latency tolerance increases. As a typical example, the peak bandwidth requirement over a 10-macroblock decoding time can be 600 MBps and the latency tolerance can be 5us. The peak bandwidth requirement of the decoder in the new window is then 600*40.8/(40.8+5) = ~535MBps. Similarly, if the decoder gets an extra display jitter buffer, the peak bandwidth reduces further, as the decoder now has 66.6ms to decode 2 frames.
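
The two worked examples follow the same rule: the data needed in one window may be delivered over the window plus the latency tolerance. A minimal sketch of that calculation, using the figures quoted in the text (the function name is hypothetical):

```python
def effective_peak_mbps(peak_mbps, window_us, latency_tolerance_us):
    """Peak bandwidth after spreading one window's data over window + tolerance."""
    if window_us <= 0 or latency_tolerance_us < 0:
        raise ValueError("window must be positive and tolerance non-negative")
    return peak_mbps * window_us / (window_us + latency_tolerance_us)


# Single-line display buffer: 5-macroblock window, 2us latency tolerance.
single_line = effective_peak_mbps(700, 20.4, 2)

# Two-line display buffer: 10-macroblock window, 5us latency tolerance.
two_line = effective_peak_mbps(600, 40.8, 5)

if __name__ == "__main__":
    print(f"single-line window: {single_line:.1f} MBps")
    print(f"two-line window   : {two_line:.1f} MBps")
```

Widening the window from 20.4us to 40.8us lowers the decoder's effective peak by roughly 100 MBps in this example, at the cost of one extra line of display buffering.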

    The buffering of the encoded input data decides whether the input stream loader's bandwidth requirement influences the peak bandwidth requirement at all. If the buffering is sufficient, this bandwidth need not be included in the calculation, as the loader can work in cycle-stealing mode.

    Once the window is selected, the net data to be accessed is known, so the next parameter to characterize is the efficiency of the memory controller for the different clients. The data access patterns of the display controller and the bit-stream loader are regular, and hence their efficiency can easily be characterized in simulation or even theoretically. The decoder's accesses, however, involve frequent context switching and page and bank changeovers caused by the two-dimensional data access of motion compensation (MC), so a careful analysis is required.
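
One way to see why per-client efficiency matters is to convert each client's net bandwidth into the raw bus bandwidth the DDR interface must provide. The sketch below assumes raw bandwidth equals net bandwidth divided by efficiency; the function name and every efficiency value are hypothetical placeholders, not measurements from the article.

```python
def required_raw_bandwidth_mbps(clients):
    """Sum net_mbps / efficiency over (net_mbps, efficiency) pairs.

    efficiency is the fraction of bus cycles that carry useful data, in (0, 1].
    """
    total = 0.0
    for net_mbps, efficiency in clients:
        if not 0 < efficiency <= 1:
            raise ValueError("efficiency must be in (0, 1]")
        total += net_mbps / efficiency
    return total


if __name__ == "__main__":
    # Placeholder efficiencies: a regular-access client is assumed to use the
    # bus better than a decoder doing 2-D motion-compensation fetches.
    clients = [
        (637.0, 0.60),   # decoder net peak from the text, assumed 60% efficiency
        (94.0, 0.90),    # display traffic, assumed 90% efficiency
    ]
    print(f"raw DDR bandwidth needed: {required_raw_bandwidth_mbps(clients):.0f} MBps")
```

With real efficiency figures from simulation, the same sum indicates how much headroom a candidate DDR configuration leaves within the chosen window.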

    The following sections describe a few important techniques, specific to the video decoder, for increasing the efficiency of the memory controller and the data bus.
