Design
In superscalar designs, the number of execution units is invisible to the instruction set. Each instruction encodes only one operation. For most superscalar designs, the instruction width is 32 bits or fewer. VLIW is a type of MIMD.
In contrast, one VLIW instruction encodes multiple operations; specifically, one instruction encodes at least one operation for each execution unit of the device. For example, if a VLIW device has five execution units, then a VLIW instruction for that device would have five operation fields, each field specifying what operation should be done on that corresponding execution unit. To accommodate these operation fields, VLIW instructions are usually at least 64 bits wide, and on some architectures are much wider.
For example, the following is an instruction for the SHARC. In one cycle, it does a floating-point multiply, a floating-point add, and two autoincrement loads. All of this fits into a single 48-bit instruction.
-
- f12=f0*f4, f8=f8+f12, f0=dm(i0,m3), f4=pm(i8,m9);
Since the earliest days of computer architecture, some CPUs have added several additional arithmetic logic units (ALUs) to run in parallel. Superscalar CPUs use hardware to decide which operations can run in parallel. VLIW CPUs use software (the compiler) to decide which operations can run in parallel. Because the complexity of instruction scheduling is pushed off onto the compiler, the hardware's complexity can be substantially reduced.
A similar problem occurs when the result of a parallelisable instruction is used as input for a branch. Most modern CPUs "guess" which branch will be taken even before the calculation is complete, so that they can load up the instructions for the branch, or (in some architectures) even start to compute them speculatively. If the CPU guesses wrong, all of these instructions and their context need to be "flushed" and the correct ones loaded, which is time-consuming.
This has led to increasingly complex instruction-dispatch logic that attempts to guess correctly, and the simplicity of the original RISC designs has been eroded. VLIW lacks this logic, and therefore lacks its power consumption, possible design defects and other negative features.
In a VLIW, the compiler uses heuristics or profile information to guess the direction of a branch. This allows it to move and preschedule operations speculatively before the branch is taken, favoring the most likely path it expects through the branch. If the branch goes the unexpected way, the compiler has already generated compensatory code to discard speculative results to preserve program semantics.
The acronym VLIW may also refer to Very Long Instruction Word, a CPU instruction set designed to load (or copy) a literal value count of inline machine code to the on-chip RAM for higher speed CPU decoding.
Vector processor (SIMD) cores can be combined with VLIW architecture as the Fujitsu FR-V, further increasing throughput and speed.
Read more about this topic: Very Long Instruction Word
Famous quotes containing the word design:
“You can make as good a design out of an American turkey as a Japanese out of his native stork.”
—For the State of Illinois, U.S. public relief program (1935-1943)
“If I knew for a certainty that a man was coming to my house with the conscious design of doing me good, I should run for my life ... for fear that I should get some of his good done to me,some of its virus mingled with my blood.”
—Henry David Thoreau (18171862)
“I always consider the settlement of America with reverence and wonder, as the opening of a grand scene and design in providence, for the illumination of the ignorant and the emancipation of the slavish part of mankind all over the earth.”
—John Adams (17351826)