Mark Ireton, Ph.D., and Michael Kardonik, Freescale Semiconductor, Inc.
Semiconductor companies continue to shrink the minimum feature size of their processors, pack an exponentially increasing number of transistors onto a single die, and increase clock speeds. As a result, the industry has reached a turning point where the power dissipation of the device has become a limiting factor for processor speed. In the race to continually improve performance, we are in the midst of an epochal transition: single-core processor architectures are no longer feasible for high-end solutions, and multicore solutions are becoming the norm. This is evidenced by the success of the dual-core x86 microprocessors from both AMD and Intel, as well as the introduction of multicore DSP solutions from Freescale, TI, and Analog Devices.
While providing a solution that enables continuing performance increases, the switch to multicore architectures creates new challenges for the application programmer, such as:
- How do you communicate between processes on different cores?
- How do you ensure that shared resources are initialized before use?
- How do you share a peripheral equally between all cores?
And most importantly:
- How do you achieve all of this without making the software development process significantly more complex than in a single-core environment?
This article describes the use of a multicore real-time operating system (RTOS) that enables programmers to develop most of their code as though they were targeting a single-core device. The concepts described are general, but the particular examples are based on the SmartDSP OS, a lightweight multicore RTOS optimized for use on Freescale DSPs based on StarCore technology.
Multicore RTOS
In selecting a multicore RTOS, the developer must decide early on between symmetric multi-processing (SMP) and asymmetric multi-processing (AMP). In a symmetric multi-processing environment, a single instance of the RTOS runs across all of the cores, dispatching threads dynamically at run time to available cores and balancing the load among them. In SMP the programmer has no a priori knowledge of which core a thread will execute on, so the RTOS must support inter-thread communication regardless of whether interacting threads are allocated to the same core or to different cores. In AMP, on the other hand, each core runs its own independently executing instance of the RTOS. Threads are statically allocated to particular cores at design time: the programmer determines on which core or cores a given thread will execute and defines appropriate mechanisms for inter-thread communication.
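To make AMP's static allocation concrete, here is a minimal sketch of how a design-time thread-to-core mapping might be expressed in C: a constant table assigns each thread to a core, and each core's RTOS instance picks out only its own entries at boot. The names (`thread_desc_t`, `threads_for_core`, and the task functions) are hypothetical and are not taken from the SmartDSP OS API.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 4   /* hypothetical four-core device */

typedef void (*thread_entry_t)(void);

/* One entry per thread instance; the core is fixed at design time. */
typedef struct {
    const char    *name;
    unsigned       core_id;
    thread_entry_t entry;
} thread_desc_t;

/* Placeholder task bodies (hypothetical application threads). */
static void rx_task(void)  { /* receive samples */ }
static void fft_task(void) { /* transform a block of samples */ }
static void tx_task(void)  { /* send results */ }

/* Design-time allocation: the same code may run on more than one core. */
static const thread_desc_t thread_table[] = {
    { "rx",  0, rx_task  },
    { "fft", 1, fft_task },
    { "fft", 2, fft_task },
    { "tx",  3, tx_task  },
};

/* Called by each core's own RTOS instance at boot: collects the threads
 * assigned to this core (up to max) and returns how many were found. */
static size_t threads_for_core(unsigned core_id,
                               const thread_desc_t **out, size_t max)
{
    size_t n = 0;
    assert(core_id < NUM_CORES);
    for (size_t i = 0; i < sizeof thread_table / sizeof thread_table[0]; i++) {
        if (thread_table[i].core_id == core_id && n < max)
            out[n++] = &thread_table[i];
    }
    return n;
}
```

Because the table is constant, the assignment is settled before the system runs; this is the determinism that AMP gains in exchange for giving up dynamic load balancing.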
In general, AMP RTOSs are more lightweight than SMP RTOSs. Because threads are allocated statically at design time, AMP RTOSs are also more deterministic in their operation. For the types of hard real-time applications performed on DSPs, an AMP RTOS will generally be the preferred choice, and AMP is therefore the subject of the remainder of this article. Note, however, that the most recent generations of DSPs include data caches and MMUs, which together make it practical to run an SMP OS such as Linux.
Hardware mechanisms for inter-core communication
An AMP system requires that a thread executing on one core be able to communicate with a thread running on another core. Cores interact through inter-processor communication (IPC), which is implemented with a combination of shared memory and interrupts and is made possible by dedicated hardware support. To use IPC effectively, the engineer should be familiar with these mechanisms, but an effective multicore RTOS should expose them in a way that feels natural during application development. The goal is an IPC paradigm, tailored to the multicore programming environment, that is as simple and convenient to use as the communication facilities of a single-core environment. Such a mechanism is described later in this article.
Shared memory is a region of memory that is accessible by all of the cores in the system. It provides a common reservoir for data that must be shared across cores, including the storage for RTOS constructs such as shared queues, messages, and semaphores. An application developer should use shared memory only when the threads sharing data are on different cores: because shared memory is usually multi-ported, it is typically limited in size, and contention for access to it may introduce execution stalls.
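A shared queue is one of the RTOS constructs that lives in shared memory. The following is a minimal sketch of a lock-free single-producer, single-consumer ring buffer carrying 32-bit messages from one core to another, assuming coherent shared memory and a compiler that supports C11 `<stdatomic.h>`; a production RTOS would supply its own primitives. The type and function names, and the placement of `core0_to_core1` in shared memory, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Small, fixed capacity: shared memory is a scarce resource. */
#define QUEUE_SLOTS 8u
static_assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1u)) == 0,
              "QUEUE_SLOTS must be a power of two");

typedef struct {
    uint32_t       slot[QUEUE_SLOTS];
    _Atomic size_t head;  /* advanced only by the consumer core */
    _Atomic size_t tail;  /* advanced only by the producer core */
} shared_queue_t;

/* Hypothetical queue from core 0 to core 1, placed in shared memory. */
static shared_queue_t core0_to_core1;

/* Producer side: returns false if the queue is full. */
static bool sq_push(shared_queue_t *q, uint32_t value)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_SLOTS)
        return false;
    q->slot[tail & (QUEUE_SLOTS - 1u)] = value;
    /* Publish the slot contents before the new tail becomes visible. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the queue is empty. */
static bool sq_pop(shared_queue_t *q, uint32_t *value)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *value = q->slot[head & (QUEUE_SLOTS - 1u)];
    /* Release the slot back to the producer after reading it. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Each index is written by only one core, so no lock is required; the cores can still stall on simultaneous accesses to the shared memory itself, which is one reason to reserve such queues for genuinely cross-core traffic.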
Interrupts provide an active mechanism for telling a core that it needs to do something. These interrupts may come from peripherals or may be generated internally by one of the other cores; we refer to internally generated interrupts as "virtual" interrupts. Preferably, each interrupt is forwarded to every core, and each core responds if the interrupt is enabled for that core. When an interrupt is enabled on more than one core, the developer must take special care to implement an appropriate policy for clearing it. For instance, the interrupt could be cleared as soon as the first core responds, or only after all responding cores have finished servicing it.
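The two clearing policies can be contrasted in a small model of one interrupt line enabled on several cores. This is a sketch under stated assumptions: the pending bit and the per-core completion mask are modeled as C11 atomic variables in shared memory rather than real interrupt-controller registers, and `shared_irq_t` and the `irq_*` functions are hypothetical. Under the first policy, a compare-and-exchange guarantees that exactly one core claims and clears the interrupt; under the second, each core clears its own bit in a completion mask, and the core that clears the last bit also clears the interrupt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line shared by several cores. */
typedef struct {
    atomic_bool asserted;     /* models the interrupt pending bit */
    atomic_uint outstanding;  /* bit n set: core n has not finished */
    unsigned    enabled_mask; /* cores on which the interrupt is enabled */
} shared_irq_t;

/* Raised by a peripheral or by another core (a "virtual" interrupt). */
static void irq_raise(shared_irq_t *irq)
{
    atomic_store(&irq->outstanding, irq->enabled_mask);
    atomic_store(&irq->asserted, true);
}

/* Policy 1: the first core to respond claims and clears the interrupt.
 * Returns true only on the one core that wins the exchange. */
static bool irq_claim_first(shared_irq_t *irq)
{
    bool expected = true;
    return atomic_compare_exchange_strong(&irq->asserted, &expected, false);
}

/* Policy 2: each enabled core reports completion; the last one clears.
 * Returns true on the core that actually cleared the interrupt. */
static bool irq_done_all(shared_irq_t *irq, unsigned core_id)
{
    unsigned bit = 1u << core_id;
    assert(irq->enabled_mask & bit);
    unsigned before = atomic_fetch_and(&irq->outstanding, ~bit);
    if (before == bit) {  /* this core was the last one outstanding */
        atomic_store(&irq->asserted, false);
        return true;
    }
    return false;
}
```

The first policy suits work that any single core can do alone; the second suits cases where every enabled core must act on the event before the source may be acknowledged.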