The Technology Behind Crusoe Processors, Alexander Klaiber. Transmeta Technical Report, January 2000.
In January of 2000, Transmeta Corporation introduced the Crusoe
processors, an x86-compatible family of solutions that combines strong
performance with remarkably low power consumption. As might be expected, a
new technology for designing and implementing microprocessors underlies
the development of these products. As might not be expected, the new
technology is fundamentally software-based: the power savings come from
replacing large numbers of transistors with software.
The Crusoe processor solutions consist of a hardware engine logically
surrounded by a software layer. The engine is a very long instruction word
(VLIW) CPU capable of executing up to four operations in each clock cycle.
The VLIW's native instruction set bears no resemblance to the x86
instruction set; it has been designed purely for fast low-power
implementation using conventional CMOS fabrication. The surrounding
software layer gives x86 programs the impression that they are running on
x86 hardware. The software layer is called Code Morphing software because
it dynamically morphs x86 instructions into VLIW instructions. The Code
Morphing software includes a number of advanced features to achieve good
system-level performance. Code Morphing support facilities are also built
into the underlying CPUs. In other words, the Transmeta designers have
judiciously rendered some functions in hardware and some in software,
according to the product design goals and constraints. Different goals and
constraints in future products may result in different hardware-software partitioning.
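To make "up to four operations in each clock cycle" concrete, here is a minimal sketch of how a translator might pack translated operations into 4-wide VLIW bundles. It is not Transmeta's scheduler. The `Op` type, `pack_bundles`, and the register names are hypothetical. The sketch makes three simplifying assumptions: every operation takes one cycle, each register is written at most once (so only true data dependencies matter), and functional-unit restrictions (ALU vs. memory vs. branch slots) are ignored.

```python
from dataclasses import dataclass

BUNDLE_WIDTH = 4  # the VLIW engine issues up to four operations per cycle


@dataclass(frozen=True)
class Op:
    dest: str               # register written (assumed written only once)
    srcs: tuple[str, ...]   # registers read
    text: str               # human-readable form, for printing


def pack_bundles(ops: list[Op], width: int = BUNDLE_WIDTH) -> list[list[Op]]:
    """Greedy in-order list scheduling into fixed-width bundles.

    Each op goes into the earliest bundle that follows every bundle producing
    one of its sources and that still has a free slot.
    Assumes one-cycle latency and no WAR/WAW hazards.
    """
    ready_at: dict[str, int] = {}   # register -> first bundle that may read it
    bundles: list[list[Op]] = []
    for op in ops:
        # Live-in registers (never written here) are available from bundle 0.
        slot = max((ready_at.get(r, 0) for r in op.srcs), default=0)
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1                 # bundle full: try the next cycle
        while slot >= len(bundles):
            bundles.append([])
        bundles[slot].append(op)
        ready_at[op.dest] = slot + 1  # result usable one bundle later
    return bundles


ops = [
    Op("t1", ("eax", "ebx"), "add t1, eax, ebx"),
    Op("t2", ("ecx",), "ld  t2, [ecx]"),
    Op("t3", ("edx", "esi"), "sub t3, edx, esi"),
    Op("t4", ("t1", "t2"), "add t4, t1, t2"),
    Op("t5", ("t4",), "shl t5, t4, 2"),
]
bundles = pack_bundles(ops)
for cycle, bundle in enumerate(bundles):
    print(cycle, " ; ".join(op.text for op in bundle))
```

The three independent operations share the first bundle, while the dependent chain `t4`, `t5` serializes. A real translator would try to fill the empty slots with operations from elsewhere in the translated region.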
Transmeta's Code Morphing technology changes the entire approach to
designing microprocessors. By demonstrating that practical microprocessors
can be implemented as hardware-software hybrids, Transmeta has
dramatically expanded the design space that microprocessor designers can
explore for optimum solutions. Microprocessor development teams may now
enlist software experts and expertise, working largely in parallel with
hardware engineers to bring products to market faster. Upgrades to the
software portion of a microprocessor can be rolled out independently from
the chip. Finally, decoupling the hardware design from the system and
application software that use it frees hardware designers to evolve and
eventually replace their designs without perturbing legacy software.
The basic goal of Crusoe is to let system designers make various
tradeoffs between HW and SW to meet different design goals for
different markets (e.g. reduced power consumption vs. performance),
while still running all legacy x86 code efficiently without
modification or recompilation. Separating the HW architecture from
the ISA leaves designers much freer to change the HW, which makes
their job much easier. This division isn't new -- e.g. the
Pentium II also dynamically translates, in hardware, from the x86 ISA
to internal RISC-like micro-ops (although there the motivation is
simply to improve performance while still supporting legacy code). The
contribution of Crusoe lies in the techniques Transmeta uses to let
designers simplify the HW while still maintaining acceptable
performance, and in the code morpher, which allows the translation
apparatus to be modified in SW.
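The code morpher's basic loop can be sketched as a translation cache with four steps:

- interpret x86 code while it is cold;
- translate a block once it proves hot;
- reuse the cached translation afterwards;
- discard translations when the x86 code they came from is written, which models the write-protection of translated code.

This is a toy model under stated assumptions, not Transmeta's implementation. `CodeMorpher`, `HOT_THRESHOLD`, the 4 KB page size, and the `translate` callback are all hypothetical, and each block is assumed to lie within a single page.

```python
from collections import defaultdict
from typing import Callable

PAGE_SIZE = 4096     # assumed page granularity for write protection
HOT_THRESHOLD = 50   # hypothetical: interpretations before a block is translated


class CodeMorpher:
    """Toy translation cache: interpret cold blocks, translate hot ones,
    and invalidate translations whose source page is written."""

    def __init__(self, translate: Callable[[int], str]):
        self.translate = translate                # hypothetical x86 -> VLIW translator
        self.exec_counts: dict[int, int] = defaultdict(int)
        self.cache: dict[int, str] = {}           # x86 block address -> translation
        self.protected: dict[int, set[int]] = defaultdict(set)  # page -> translated blocks

    def execute_block(self, addr: int) -> str:
        if addr in self.cache:
            return "native"                       # reuse the cached translation
        self.exec_counts[addr] += 1
        if self.exec_counts[addr] < HOT_THRESHOLD:
            return "interpreted"                  # still cold: not worth translating yet
        self.cache[addr] = self.translate(addr)   # hot: translate once, then cache
        self.protected[addr // PAGE_SIZE].add(addr)  # write-protect the source page
        return "translated"

    def guest_store(self, addr: int) -> None:
        """A store from x86 code: if it hits a protected page, the code may be
        self-modifying, so drop every translation made from that page."""
        for block in self.protected.pop(addr // PAGE_SIZE, set()):
            del self.cache[block]
            self.exec_counts[block] = 0


morpher = CodeMorpher(translate=lambda a: f"vliw@{a:#x}")
trace = [morpher.execute_block(0x1000) for _ in range(60)]
print(trace.count("interpreted"), trace.count("translated"), trace.count("native"))
morpher.guest_store(0x1010)              # same page as the block: invalidate
print(morpher.execute_block(0x1000))     # back to the interpreter
```

The threshold is where the one-time cost of translation is traded against the repeated cost of interpretation.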
The novel HW assistance is what enables efficient emulation of precise
exceptions and self-modifying code; it also enables more aggressive
instruction scheduling for better optimization. Three mechanisms
support these capabilities: transaction-based execution of the
translated instructions using shadow registers and a memory store
buffer; alias HW that lets loads and stores be reordered even when the
translator cannot prove they are independent, with conflicts detected
at run time; and the ability to write-protect translated x86 code to
detect dynamic code modification.
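The interaction of shadow registers, the store buffer, and the alias HW can be sketched as a small transactional model:

- a translation updates working registers and buffers its stores;
- at the end it commits, copying the registers into the shadow set and releasing the stores to memory;
- on an alias violation (or any exception) it rolls back to the last committed x86 state, and the code is re-executed more conservatively.

Everything here (`Transaction`, `AliasFault`, `run_translation`, the register choices) is hypothetical and greatly simplified. In particular, every store is checked against every protected load, whereas a real design would check only loads that were actually moved above that store.

```python
class AliasFault(Exception):
    """A store overlapped a load that was speculatively moved above it."""


class Transaction:
    """Toy model of commit/rollback with shadow registers and a store buffer."""

    def __init__(self, regs: dict[str, int], memory: dict[int, int]):
        self.memory = memory                  # committed memory state
        self.shadow = dict(regs)              # committed (shadow) registers
        self.working = dict(regs)             # registers the translation updates
        self.store_buffer: list[tuple[int, int]] = []  # stores held until commit
        self.protected: set[int] = set()      # addresses read by reordered loads

    def load(self, addr: int, protect: bool = False) -> int:
        if protect:
            self.protected.add(addr)          # remember it for the alias check
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v                      # forward this translation's own store
        return self.memory.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        if addr in self.protected:
            raise AliasFault(hex(addr))       # the reordering was unsafe
        self.store_buffer.append((addr, value))

    def commit(self) -> None:
        for a, v in self.store_buffer:
            self.memory[a] = v                # stores become visible only now
        self.shadow = dict(self.working)
        self.store_buffer.clear()
        self.protected.clear()

    def rollback(self) -> None:
        self.working = dict(self.shadow)      # back to the last committed x86 state
        self.store_buffer.clear()             # buffered stores are simply discarded
        self.protected.clear()


def run_translation(tx: Transaction, optimized, conservative) -> str:
    try:
        optimized(tx)
        tx.commit()
        return "committed"
    except AliasFault:
        tx.rollback()
        conservative(tx)                      # e.g. re-execute in original x86 order
        tx.commit()
        return "rolled back"


# x86 order: store 7 to [esi]; then eax = [edi] + 1.
def hoisted(tx: Transaction) -> None:        # load speculatively moved above the store
    tx.working["eax"] = tx.load(tx.working["edi"], protect=True) + 1
    tx.store(tx.working["esi"], 7)


def in_order(tx: Transaction) -> None:
    tx.store(tx.working["esi"], 7)
    tx.working["eax"] = tx.load(tx.working["edi"]) + 1


tx1 = Transaction({"eax": 0, "esi": 0x10, "edi": 0x20}, {0x10: 5, 0x20: 5})
status1 = run_translation(tx1, hoisted, in_order)   # no alias: speculation is safe
print(status1, tx1.shadow["eax"])
tx2 = Transaction({"eax": 0, "esi": 0x10, "edi": 0x10}, {0x10: 5})
status2 = run_translation(tx2, hoisted, in_order)   # alias: roll back and redo
print(status2, tx2.shadow["eax"])
```

In the aliasing case, the hoisted load would have read the stale value 5. The rollback discards that result, and the in-order re-execution produces the x86-correct 8. Restoring committed state is also what makes exceptions inside a translation look precise.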
Another novel capability is LongRun power management, which can
adjust the clock frequency and voltage on the fly.
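Why scaling voltage along with frequency pays off follows from the usual first-order model of CMOS switching power, P = C * V^2 * f. Lower clock speeds permit lower supply voltages, so power falls faster than performance. This sketch pairs that formula with a trivial governor that picks the slowest operating point meeting the current demand. The operating-point table, `choose_point`, and the demand value are hypothetical; nothing here describes how LongRun actually makes its decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    mhz: int
    volts: float


# Hypothetical frequency/voltage pairs; a real part ships its own table.
POINTS = [
    OperatingPoint(300, 1.20),
    OperatingPoint(400, 1.35),
    OperatingPoint(533, 1.50),
    OperatingPoint(667, 1.60),
]


def dynamic_power(p: OperatingPoint, capacitance: float = 1.0) -> float:
    """First-order CMOS switching power P = C * V^2 * f (arbitrary units)."""
    return capacitance * p.volts ** 2 * p.mhz


def choose_point(demand_mhz: float, points: list[OperatingPoint] = POINTS) -> OperatingPoint:
    """Slowest (hence lowest-voltage) point that still meets the demand;
    fall back to the fastest point if nothing is fast enough."""
    for p in sorted(points, key=lambda q: q.mhz):
        if p.mhz >= demand_mhz:
            return p
    return max(points, key=lambda q: q.mhz)


slow, fast = POINTS[0], POINTS[-1]
print(choose_point(350))
print(f"speed {slow.mhz / fast.mhz:.0%} of max at "
      f"{dynamic_power(slow) / dynamic_power(fast):.0%} of max power")
```

Energy for a fixed amount of work is power times run time, and run time scales as 1/f. In this model, which ignores leakage, energy per cycle therefore scales with V^2, so running slower at a lower voltage can save energy rather than merely stretching the work out.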
This paper reads like PR, not a scientific report. This is only to be
expected since the paper is a company tech report, not a conference or
journal paper -- but it's not as useful as it could be.
The paper reports practically nothing about Crusoe's performance,
other than a rather defensive passage about some benchmarks; that
passage makes it clear that Crusoe may not perform well unless the
translation cost is amortized over many executions of the translated
code, i.e. over a long-running workload.
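The amortization point can be made concrete with a simple cost model. Suppose translating a block costs T cycles once, interpreting it costs I cycles per execution, and the translated code costs E cycles per execution. Translation pays off once T + n*E <= n*I, i.e. after n >= T / (I - E) executions. The cycle counts in this sketch are made up purely for illustration; the paper reports no such numbers.

```python
import math


def break_even_executions(translate_cost: float, interp_cost: float,
                          native_cost: float) -> int:
    """Smallest n with translate_cost + n*native_cost <= n*interp_cost."""
    if interp_cost <= native_cost:
        raise ValueError("translation never pays off unless translated code is faster")
    return math.ceil(translate_cost / (interp_cost - native_cost))


def cycles_per_execution(n: int, translate_cost: float, native_cost: float) -> float:
    """Average cost per execution when one translation is reused n times."""
    return (translate_cost + n * native_cost) / n


# Hypothetical cycle counts, for illustration only.
T, I, E = 10_000, 60, 4
print(break_even_executions(T, I, E))
for n in (10, 1_000, 100_000):
    print(n, cycles_per_execution(n, T, E))
```

A benchmark that runs much of its code only a few times stays near the expensive end of this curve. That is why short runs can make a translating processor look slow even when its translated code is fast.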
The paper makes the claim that different performance/power
consumption/complexity tradeoffs could be made by varying the
functionality implemented in HW vs. SW. It would be nice to have some
idea of what the relationship between the HW/SW line and
performance/power consumption/etc. really is -- i.e. is the tradeoff
such that we can make fine-grained adjustments by moving a few things
from HW to SW or vice versa, or is there a "cliff" that we fall off of
at some point.
One of the themes of this class is to find the right level of
abstraction for the virtual interface. Crusoe puts it at the ISA
level, mainly for reasons of compatibility with existing x86 code (a
smart decision from a business stand-point). There are two directions
we could take from here -- horizontally, it would be interesting to
see how easily the code morpher could be adapted to translate other
ISAs; what factors make an ISA easy/hard to deal with (e.g. RISC
v. CISC)? Vertically, we could take the interface level further down
towards the HW (imagine an XML-like app that defined its own ISA based
on some primitives supplied by the interface) or further up towards
the SW (rather like the Java Virtual Machine). Of course, without a
better idea of the lessons learned from Crusoe (i.e. a more honest
evaluation than the one in this paper, including problems), it's
difficult to extrapolate and learn from the work done here. The paper
may best serve as a source of raw ideas and a starting point for
answering these questions.