The many interesting presentations at the First Annual Multicore
Expo, held recently in Santa Clara, Calif., made it apparent that the
trend toward multicore deployment in the embedded space is going
strong and is likely to keep growing.
The presentation topics ranged from software development for
multicore to introductions of new multicore processors to tools and
where the industry is heading. Heady stuff. However, one of the most
obvious themes was concern about the current state of tools for
multicore development.
It’s clear that hardware guys have it good relative to multicore
software designers, something that may not have been apparent 20 years
ago. With the advent of multicore, if hardware designers are tasked
with doubling the raw performance of a processor, there is a very real
chance that they will take existing IP, tweak it, replicate it and
provide some interconnect logic and call it a day. If they need to make
it four times as fast, they do the above three times.
This is not meant to trivialize hardware design. The point is
that one of the reasons for going multicore is the ability to use
simpler, lower-frequency, lower-power cores. The increase in
processing power comes from the number of cores, not from their
frequency or from superscalar pipelines and similar techniques that
make each core faster. To borrow from Adam Smith, author of “The
Wealth of Nations,” division of labor and specialization of labor are
the keys to multicore processing.
A recent study suggested that engineers are only capable of
acquiring two to three skills at a time. If that is true, then embedded
multicore designers need to make two of those skills the ability to
understand concurrency (potentially massive concurrency) and the
ability to debug their software in a multicore environment. To return
to the earlier point, chip designers and software developers alike
need better tools to design and debug real-world, real-time
embedded applications.
One manufacturer presenting at the expo has successfully deployed
its 300+ core processor to the embedded market. A software developer
who wants to use this processor has two options: use the
manufacturer’s compiler and debugger, or use no tools at all.
Unfortunately, this seems to be the current state of the industry --
everyone has their own way of doing things. There is no standard
approach to capturing concurrency in a design. There is no standard for
debugging a multicore target, or even a standard for connecting to a
multicore target.
Mentor Graphics is working with hardware, software, and firmware
vendors, including participants of the Multicore Association, to
establish industry-wide standards to provide an easy way to mix and
match operating systems, have them share resources, and communicate
with each other. The company is working to establish debugging and
connection standards as well as inter-core communication mechanisms.
Creating multicore-aware debuggers
Debugging embedded targets has come a long way in the last 20 years. No
longer do embedded developers have to rely on “printf debugging,”
which really doesn’t belong in the embedded world anyway (first, you
have to decide where the printf output is going to be directed, and it
can only work after the hardware and drivers have been debugged).
Today, developers enjoy robust debugging suites that use
hardware-assisted connections (e.g., JTAG) to download applications to
and control the target. Good commercial packages not only let the
developer start and stop the processor, but also provide intuitive ways
to monitor registers, memory, and stacks. One of the hardest parts of
embedded development is understanding the behavior of the system, and
debuggers are the tools that provide visibility into the inner
workings of an application.
However, debugging a multicore target throws a whole new wrench into
the works. How does one control each core: with one debugger connecting
to all the cores, or with a separate debugger for each core? How is
the data for multiple cores best organized to make sense to a developer,
and does that change between a two-core system and a 300-core system?
How many cores can an engineer observe at one time and still understand
what’s going on in each?
These are the questions that the industry needs to answer before the
power of multicore designs can really start to meet its potential.
Is JTAG the answer to multicore debug?
One of the main differences between desktop and embedded debugging is
that embedded targets are external to the desktop system and have to be
connected to the debug console or integrated development environment
(IDE) in some fashion. Strangely enough, the device that does this is
called a “connection device” or “connection” for short.
Connections range from simple two-wire links to complex and
considerably more expensive Joint Test Action Group (JTAG) devices,
which can contain huge amounts of random access memory (RAM), or even
hard drives, which they use to queue up data. The host uses the
connection to communicate with the target device. For simple two-wire
connections, the interaction between the host-based IDE and the target
is limited.
JTAG-based connection devices allow “on-chip debugging.” They let
the IDE interact with the target and provide services such as
remotely starting, stopping or suspending program execution (setting a
breakpoint) and viewing memory and register contents as well as I/O and
peripheral devices. The IDE combines sequences of these functions so
that one can set breakpoints or step through code.
So what makes JTAG so special? Back in the old days, printed
circuit boards were tested on what is called a “bed of nails.”
Basically, when the board was created, it also had test points (solder
pads) placed at strategic places on the bottom of the board. After a
board was populated with chips, one of the final manufacturing steps
was to put the board on the bed of nails to be tested. The bed of nails
has spikes that stick up to make contact with the test points.
However, as technology evolved, and more and more of the board
functionality was moved into microprocessors and ASICs, the
accessibility of test points became a problem. In the mid to late
1980s, several companies banded together to form the Joint Test Action
Group. The group’s results were accepted by the IEEE in 1990, and the
IEEE 1149.1 standard, known as the Standard Test Access Port and
Boundary-Scan Architecture, was born. The name JTAG (pronounced “J”
“Tag”) was kept since it is easier to say than “STAPBSA.” The boundary
scan method enables in-circuit testing and eliminates the need for
bed-of-nails testing.
Making the right JTAG connections
The use of the industry standard JTAG scanning interface, initially
developed for boundary scan testing of complex devices and boards over
a low pin-count interface, has also become a standard method for
accessing and debugging processor cores. This is because it requires a
small number of pins, and has already been widely adopted for its
original purpose.
Using JTAG for processor debugging required adding a debug support
unit, or “debug logic,” to the CPU core design and adding an
additional JTAG scan path to access that logic. A brief overview of a
common JTAG TAP (Test Access Port) with its multiple JTAG scan paths is
shown in Figure 1 below.
Separate scan register paths are provided for boundary scan, reading
the device ID code, initiating built-in self-test functions and
obtaining their results, and accessing the debug support unit. The TAP
Instruction Register (TAPIR) is used to select the desired path. During
normal operation, the TAP is left in the Bypass state so that the
other functions are disabled.
Figure 1. A single JTAG Test Access Port
Multi-core (multi-TAP) Configuration
The most cost-effective configuration (lowest pin count) for multicore
devices is to string the JTAG TAPs within each core along a single
daisy chain, as shown in Figure 2 below.
In this way, the instruction registers for each TAP are concatenated
into one long instruction register. So a specific core at a known
position in the scan chain can be set to select the debug support unit
registers and all other cores can be set in bypass mode, thereby
allowing one core to be individually addressed by one debugger control
packet.
Figure 2. Multiple cores on a single JTAG scan chain
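The concatenated instruction register described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the instruction register widths and the opcode that
selects the debug support unit are device-specific, so the values
below are hypothetical. The all-ones BYPASS instruction and the
one-bit bypass data register do come from IEEE 1149.1.

```python
from dataclasses import dataclass


@dataclass
class Tap:
    name: str
    ir_len: int        # instruction register width in bits (device-specific)
    debug_opcode: int  # hypothetical opcode selecting the debug support unit


def bypass(ir_len):
    # IEEE 1149.1 defines BYPASS as the all-ones instruction.
    return (1 << ir_len) - 1


def build_ir_scan(chain, target_index):
    """Return the bits to shift into TDI, first bit first.

    chain[0] is the TAP nearest TDI; chain[-1] is the TAP nearest TDO.
    """
    bits = []
    # Bits shifted in first travel farthest, so emit the TAP nearest TDO first.
    for index in range(len(chain) - 1, -1, -1):
        tap = chain[index]
        opcode = tap.debug_opcode if index == target_index else bypass(tap.ir_len)
        # Within one instruction register, the least significant bit goes first.
        bits.extend((opcode >> i) & 1 for i in range(tap.ir_len))
    return bits


def dr_scan_length(chain, target_dr_len):
    # Every bypassed TAP adds its one-bit bypass register to the data scan.
    return target_dr_len + (len(chain) - 1)


# Hypothetical three-core chain; widths and opcodes are made up for illustration.
chain = [
    Tap("core0", 4, 0b1010),
    Tap("core1", 4, 0b1010),
    Tap("core2", 5, 0b00110),
]
ir_bits = build_ir_scan(chain, target_index=1)  # address core1 only
```

With core1 selected, the scan is 13 bits long: five ones to bypass
core2, the four bits of the debug opcode for core1, then four ones to
bypass core0. Each additional core lengthens every scan, which is one
reason long chains get slower to work with.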
Extending this concept, multiple debuggers can each be assigned to
individual cores and can send debug service control packets to their
assigned core without impacting (or creating awareness of) the other
cores (ignoring shared memory considerations for the moment), since
Ethernet debug service packets are queued and executed in the order of
arrival.
Synchronous Stopping and JTAG Skid
Individual commands issued to a CPU core over JTAG require hundreds of
JTAG operations. To the human viewer, these appear to execute very
quickly (the JTAG scan chain may typically be doing serial scans at
10 MHz to 40 MHz), but this is actually a very slow process in
comparison to a CPU core running at, say, 400 MHz to 1.2 GHz.
Since JTAG debug operations and processors running at hundreds of
MHz are inherently asynchronous functions, without hardware support on
the chip, it is not possible to stop one processor at a breakpoint, and
have that event cause another core to stop precisely at that location
using only JTAG operations. The time lapse between issuing a JTAG
command and the processor responding thousands of CPU cycles later is
commonly known as “skid.”
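A rough feel for the size of the skid comes from simple arithmetic.
The sketch below uses the clock rates quoted above; the number of bits
scanned to deliver a stop request is an assumption chosen purely for
illustration, and the estimate ignores TAP state-machine overhead and
host latency, so real skid would be larger.

```python
def skid_cycles(scan_bits, tck_hz, cpu_hz):
    """CPU cycles that elapse while scan_bits are shifted, one bit per TCK."""
    return scan_bits * cpu_hz // tck_hz


# Assumed: a stop request that needs 100 scanned bits in total.
slow_probe = skid_cycles(100, tck_hz=10_000_000, cpu_hz=400_000_000)
fast_probe = skid_cycles(100, tck_hz=40_000_000, cpu_hz=1_200_000_000)
```

Under these assumptions the core runs 4,000 cycles during the scan at
10 MHz and 400 MHz, and 3,000 cycles at 40 MHz and 1.2 GHz. A faster
scan clock does not help much when the core clock rises with it.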
From a debug experience standpoint, this means that you
are debugging the cores completely independently; there is no real
interaction between them. Connecting to multiple cores
simultaneously therefore doesn't mean much, because even then
you cannot act on both cores at the same time. This is a limitation of
JTAG, and also a consequence of the fact that there is no formalized
hardware interconnect standard for multicore debugging.
To address this problem, the Mentor Graphics EDGE debugger has a
built-in ability to define "synchronization groups." That is,
designers can define a group of threads that are to be stopped when a
given thread hits a breakpoint. This is backed up by a capability that
the back end transport provides to the debug engine that says "I can
stop this set of cores synchronously."
If this capability is not there, then the debug engine does its best
to emulate it by turning around and stopping the other
cores when one core hits the breakpoint. Obviously, there will be
thousands of instructions of skid, but without hardware standards, this
is better than nothing.
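The fallback logic just described can be sketched as follows. This is
not the EDGE implementation or its API; the Transport class, its
method names and the capability flag are hypothetical, and the sketch
works on cores rather than threads to stay short.

```python
class Transport:
    """Hypothetical back-end transport; records the stop requests it issues."""

    def __init__(self, supports_sync_stop):
        self.supports_sync_stop = supports_sync_stop
        self.log = []

    def stop_together(self, cores):
        # One request that the hardware applies to all listed cores at once.
        self.log.append(("sync_stop", tuple(sorted(cores))))

    def stop_one(self, core):
        # An ordinary per-core stop; each one incurs its own skid.
        self.log.append(("stop", core))


def on_breakpoint(transport, group, hit_core):
    """Stop every other core in the group after hit_core reaches a breakpoint."""
    others = [core for core in group if core != hit_core]
    if transport.supports_sync_stop:
        transport.stop_together(others)
    else:
        # Emulation: turn around and stop the remaining cores one by one.
        for core in sorted(others):
            transport.stop_one(core)
    return transport.log


emulated = on_breakpoint(Transport(False), {0, 1, 2}, hit_core=1)
synchronous = on_breakpoint(Transport(True), {0, 1, 2}, hit_core=1)
```

The debug engine sees the same synchronization group either way; only
the transport's advertised capability decides whether the stop is a
single synchronous request or a series of individual stops with skid.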
Can Nexus extend to multicore debug?
As mentioned earlier, JTAG is a communication mechanism used to control
an embedded processor. It does not directly have anything to do with
debugging. On the cores themselves there must be debug logic that
controls the core.
The “Nexus 5001 Forum” is an industry group that has advanced a new
standard (IEEE-ISTO 5001) that defines just such a debug logic
block to support embedded development. It contains some compelling
features, such as the ability to read and write memory on the core
while the core continues to run.
While this is cool, it doesn’t directly have anything to do with
multicore debugging, except for the fact that it does define a
high-speed auxiliary communications mechanism that can be shared by
multiple cores for transmission of real-time trace data, among other
things. Unfortunately, the adoption of Nexus has been very slow, and it
does not have nearly the installed base that other technologies have.
Also, it does not appear to have much traction outside of the
automotive industry. Perhaps it will gain momentum in the future with
the growth of multicore.
What It All Means
From the silicon vendor’s perspective, it is pretty clear what the
vendors would get out of having industry-wide standards for connecting
to and debugging an embedded target. Silicon vendors spend large
amounts of time and money trying to create an “ecosystem” that is
beneficial to their product.
As a result, they spend enormous amounts of time putting RTOS, tool
and connection support together so that developers can use their
product when it hits the street. They may have to pay non-recurring
engineering (NRE) fees to tool vendors that are reluctant to support
their proprietary hardware. That time and money would be better
plowed back into either their shareholders’ wallets or research and
development.
The ones that appear less likely to benefit are the tool and
connection vendors. Why would they be likely
to benefit from having all of their competitors considered for every
target? One reason is that successful tool vendors have distinctive
competencies that their customers value.
Also, the tool vendor knows that their profits would increase if
they had a wider audience of targets that their tools could be used on.
Furthermore, they spend huge amounts of time and money “porting” their
product to different silicon platforms. That time and money would be
better utilized by focusing on the value that they can bring to the end
customer rather than chasing a moving target.
What does all this talk about standards mean from the developer’s
perspective? To start, it means freedom of choice. It means
that they can choose from a plethora of differently priced and
differently featured tools and connections. It means that their tools
can be used across targets. It means that one connection device will
connect to ARM-based multicore products as well as to MIPS, MicroBlaze
and Intel-based multicore targets.
It means eliminating the requirement to purchase new tools because
the debugger being used does not work on the new target. It means that
the developer can spend time gaining field expertise rather than
learning how to use a new tool.
The role of Eclipse in multicore debug
For multicore developers, the current status of debuggers is good,
while that of connections is bad. The Eclipse Foundation and
various sub-projects are making headway into the embedded space.
Eclipse provides a “debug platform” which debugger vendors can
implement to debug any arbitrary system. The result is a common look
and feel regardless of whether a designer is debugging Java, a Perl
script, or an embedded C/C++ application.
From the ground up, Eclipse was designed to be able to debug
multiple applications simultaneously, and has a number of features that
help facilitate this. In Eclipse, all views in a frame typically
reflect the currently selected context.
So if a designer has a thread in application “Foo” selected as the
current context, the variables view, expressions view, and registers
views all update to reflect “Foo.” If the designer then selects a
thread in application “Bar,” these windows update to reflect “Bar.”
Combine this with the fact that the designer can open multiple frame
instances, and you have the beginning of a nice multicore development
environment (a good reason to request a nice dual monitor system).
Eclipse has other nice features for multi-context debugging as well,
such as “Working Sets” of breakpoints (e.g., set the breakpoint in
file theDriver.c in Foo, but not in Bar). The Device Software
Development Platform (DSDP) project is driving the creation of a
flexible debug hierarchy, which will be a better fit for supporting
debugging in a typical embedded multicore scenario: for example,
connection device -> core(s) -> process(es) -> thread(s).
In addition, the DSDP project is creating a common infrastructure
for connecting to remote targets, and then using services on them
(e.g., debugging, profiling, exploring target file systems, opening a
shell).
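The connection device, core, process and thread hierarchy mentioned
above can be pictured as a simple tree. The sketch below is not the
Eclipse or DSDP API; the Node class and every name in it are
hypothetical and serve only to show how such a hierarchy gives each
thread an unambiguous address.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str   # "connection", "core", "process" or "thread"
    name: str
    children: list = field(default_factory=list)

    def add(self, kind, name):
        child = Node(kind, name)
        self.children.append(child)
        return child


def paths(node, prefix=""):
    """Yield a connection/core/process/thread style path for every leaf."""
    here = f"{prefix}/{node.name}" if prefix else node.name
    if not node.children:
        yield here
    for child in node.children:
        yield from paths(child, here)


# Hypothetical target: one connection device, two cores, applications Foo and Bar.
probe = Node("connection", "jtag0")
core0 = probe.add("core", "core0")
core1 = probe.add("core", "core1")
foo = core0.add("process", "Foo")
foo.add("thread", "t1")
foo.add("thread", "t2")
core1.add("process", "Bar").add("thread", "t1")

leaf_paths = list(paths(probe))
```

Here two threads share the name t1, yet "jtag0/core0/Foo/t1" and
"jtag0/core1/Bar/t1" remain distinct, which is the kind of context a
debugger view needs in order to know whose registers and variables to
show.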
More and more tool vendors are migrating their offerings to Eclipse,
creating a very interesting new ecosystem. The result for the tools
users is that it will be possible for them to increasingly focus on
building an efficient development process on top of the tools, instead
of spending so much time and energy on the tools themselves.
The status of connection and debug hardware on the board is not as
positive at the moment as the state of debugger development. The
upside is that the Multicore Association is working toward
addressing this exact deficiency in connection and hardware
standards.
Conclusion
The Multicore Association is in its infancy. It is recommended that all
interested parties, including software vendors, hardware vendors and
developers, invest some time, energy and money in it, as it is the
only entity out there trying to bring together all the
players on the multicore scene.
Hopefully, the Multicore Association and its debug working groups
will get some traction and put some stuff out there quickly to help
gain a following not to be ignored.
To read Part 1 in this series, go to Adam Smith's answer to multicore
design.
To read Part 2 in this series, go to Dealing with hardware and OS
issues.
Todd Brian is product marketing
manager for Nucleus kernels products, Lyle Pittroff is product
marketing manager for EDGE Connections products, Aaron Spear is Debug
Tools architect, and Jeff Womble is product marketing manager for EDGE
Tools products at Mentor Graphics.