A real-time HPC approach for optimizing Intel multi-core architectures (Part 1 of 3)
By Dr. Aljosa Vrancic and Jeff Meisel, National Instruments

Industrial Control Designline
(06/22/2009 3:27 PM EDT)

Editor's note: In this three-part series, Dr. Aljosa Vrancic and Jeff Meisel present findings that demonstrate how a novel approach built on Intel hardware and software technology enables real-time high-performance computing (HPC) on multi-core processors, solving engineering problems that could not be solved only five years ago.
- Part 1 is a review of real-time concepts that are important for understanding this domain of engineering problems, and a comparison of traditional HPC with real-time HPC.
- Part 2 outlines software architecture approaches for utilizing multi-core processors, along with cache optimizations.
- Part 3 will consider industry examples that employ this particular methodology.
Introduction to Real-Time Concepts
Because the tasks that require acceleration are so computationally intensive, a typical HPC problem traditionally could not be solved on a normal desktop computer, let alone an embedded system. However, disruptive technologies such as multi-core processors now allow more and more HPC applications to run on off-the-shelf hardware.
Real-time HPC comes into the picture when the number crunching must happen in a deterministic, low-latency environment. Many HPC applications run offline simulations thousands of times and then report the results. This is not a real-time operation because no timing constraint specifies how quickly the results must be returned; the results simply need to be calculated as fast as possible.
Traditionally, these applications have been developed using a message-passing interface (such as MPI, for example through the MPICH implementation) to divide tasks across the different nodes in the system. A typical distributed computing scenario looks like the one shown in Figure 1, with one head node that acts as a master and distributes processing to the slave nodes in the system.

Figure 1: Example configuration in a traditional
HPC system
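
The head-node pattern in Figure 1 can be sketched in a few lines. This is only an illustration of the distribute-and-gather flow, not an MPI program: threads stand in for the nodes, `queue.Queue` objects stand in for the network messages, and the names `head_node` and `worker` are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Worker 'node': receive a chunk, compute a partial result, send it back."""
    while True:
        chunk = tasks.get()
        if chunk is None:  # sentinel message: no more work for this worker
            break
        results.put(sum(x * x for x in chunk))


def head_node(data, num_workers=4):
    """Head 'node': split the data, send chunks out, gather partial results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Ceiling division so that every element lands in exactly one chunk.
    chunk_size = max(1, -(-len(data) // num_workers))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk in chunks:
        tasks.put(chunk)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker

    # Gather exactly one partial result per chunk that was sent.
    total = sum(results.get() for _ in chunks)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(head_node(list(range(1000))))
```

Note that nothing in this flow bounds how long a worker takes to reply. The head node simply waits until every partial result has arrived, which is acceptable for offline computation but gives no timing guarantee.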
By default, this configuration is not real-time friendly because of the latencies associated with networking technologies (like Ethernet). In addition, the synchronization implied by the message-passing protocol is not necessarily predictable at millisecond granularity. Such a configuration could potentially be made real-time by replacing the communication layer with a real-time hardware and software layer (such as reflective memory), and by adding manual synchronization to prioritize tasks and ensure they complete within a bounded timeframe. Generally speaking, though, the standard HPC approach was not designed for real-time systems and presents serious challenges when real-time control is needed.
An Embedded, Real-Time HPC Approach with Multi-Core Processors
The approach outlined in this article is based on a real-time software
stack, as
described in Table 1, and off-the-shelf multi-core processors.

Real-time applications have algorithms that need to be accelerated, but they often involve the control of real-world physical systems, so the traditional HPC approach is not applicable. In a real-time scenario, the result of an operation must be returned in a predictable amount of time. The challenge is that, until recently, it has been very hard to solve an HPC problem while at the same time closing a control loop in under 1 millisecond. Furthermore, a more embedded approach may be needed when physical size and power constraints limit the design of the system.
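
To make the distinction between "as fast as possible" and "in a predictable amount of time" concrete, here is a minimal sketch that judges a loop by its worst case rather than its average. The function name `summarize_loop_timing` and the sample timings are hypothetical, chosen only for illustration; they are not measurements from the article.

```python
def summarize_loop_timing(durations_s, deadline_s):
    """Summarize measured per-iteration times against a fixed deadline."""
    if not durations_s:
        raise ValueError("need at least one measured iteration")
    worst = max(durations_s)
    best = min(durations_s)
    return {
        "mean_s": sum(durations_s) / len(durations_s),
        "worst_case_s": worst,
        "jitter_s": worst - best,  # spread between slowest and fastest iteration
        "missed_deadlines": sum(1 for d in durations_s if d > deadline_s),
    }


if __name__ == "__main__":
    # Hypothetical timings (seconds) for five iterations of a 1 ms control loop.
    timings = [0.0004, 0.0005, 0.0004, 0.0030, 0.0005]
    for name, value in summarize_loop_timing(timings, deadline_s=0.001).items():
        print(name, value)
```

In this made-up data set the mean iteration time is below the 1 ms deadline, yet one iteration overruns it. A throughput-oriented HPC job would consider that run a success, while a real-time control loop would not.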
Now consider a multi-core architecture, where today you can find up to
16
processing cores.
From a latency perspective, the cores in an off-the-shelf multi-core architecture communicate with one another at speeds determined by the system bus rather than over Ethernet, so round-trip times are much more tightly bounded. Consider the simplified diagram of a quad-core system shown in Figure 2.

Figure 2: Example configuration in a multi-core system. Source: Adapted from Tian and Shih, "Software Techniques for Shared-Cache Multi-Core Systems," Intel Software Network.
In addition, multi-core processors can utilize symmetric multiprocessing (SMP) operating systems. SMP is a technology that general-purpose operating systems like Microsoft Windows, Linux, and Apple Mac OS have used for years to automatically load-balance tasks across available CPU resources. Real-time operating systems now offer SMP support as well. This means that a developer can specify timing and priorities for tasks that run across many cores at one time, and the OS handles the thread interactions. This is a tremendous simplification compared with message passing and manual synchronization, and it can all be done in real time.
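
As a rough illustration of the shared-memory programming model that SMP enables, the sketch below has several threads work on disjoint slices of one shared buffer, with no messages exchanged and the OS deciding where each thread runs. The function name `scale_in_place` is hypothetical. Two caveats apply: CPython's global interpreter lock prevents Python bytecode from running truly in parallel, and the standard `threading` module does not expose real-time priorities, so this shows the structure only and not the timing behavior of a real-time OS.

```python
import os
import threading


def scale_in_place(samples, gain, num_threads=None):
    """Scale a shared list in place, one disjoint slice per thread."""
    num_threads = num_threads or os.cpu_count() or 1
    n = len(samples)
    chunk = max(1, -(-n // num_threads))  # ceiling division

    def work(start, stop):
        # Each thread writes only to its own index range, so no lock is needed.
        for i in range(start, stop):
            samples[i] *= gain

    threads = [threading.Thread(target=work, args=(s, min(s + chunk, n)))
               for s in range(0, n, chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all slices are complete once every thread has joined
    return samples


if __name__ == "__main__":
    print(scale_in_place([1, 2, 3, 4, 5, 6, 7], gain=2, num_threads=3))
```

Compared with a head node that sends chunks and gathers replies, the threads here read and write the same memory directly, and the only explicit synchronization is the final join.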