User:CTho/Porting across processes

From Wikipedia, the free encyclopedia

PLEASE ask if there's any terminology or background information I should elaborate on.

Every so often, the question comes up, "If $company1 is having trouble with their ##nm process, why can't they quickly switch to $company2's manufacturing process?" or, "Why does it take so long to move from ##nm to (##*0.7)nm?". Hopefully this writeup will provide some insight into why it's harder than it sounds. I'm going to make generic statements that apply to 90/65/45nm, but not all statements will be true for all 3 of the technology nodes.

Major differences

My writeup on bulk vs SOI covers (or will cover) issues related to designs intended for bulk processes vs designs intended for SOI processes.

Differences in the actual transistors

Each manufacturing process' transistors are different. The design space is enormous - not only are there many variables that can be tuned, there are also many different steps that can be used or skipped. I'm only going to write about a few conceptually simple differences and not even try to cover all of the possibilities (as it happens, these simple differences do cover the effects of many more complex differences, so this should still be informative).

Leakage vs. Performance

One of the most straightforward differences is the "threshold voltage" ("Vt" from here on; ideally the t would be a subscript). In theory, the Vt is the voltage you have to put on the gate of a transistor before it turns on (in practice, it's much harder to define, but we'll ignore that here). Transistors aren't really ideal on-off switches though, and if you increase the voltage past the Vt, you'll get more current (i.e. it can drive a bigger load, or switch the same load faster). It's nice to use transistors with low Vts, because they make circuits really fast. However, there's a catch: transistors with lower threshold voltages don't turn off as well (they leak more), and the difference is significant. A gate made from low-Vt ("LVT") transistors might be 30-50% faster than one made from high-Vt ("HVT") transistors, but leak 10 to 100 times as much. Since leakage power is a significant portion of total power nowadays, you can't just use LVT transistors everywhere.

Modern processes generally offer multiple Vts*, so circuits that take longer to evaluate can be built with the fast transistors, but the rest of the design can use slower transistors to save power. However, there isn't a fixed set of threshold voltages across the industry--each process might use a different Vt for the LVT transistors. When porting a design across manufacturing processes, you need to revisit the Vt used in each gate: when moving to a slower process, more gates may need to move to lower threshold voltages; when moving to a faster process, more gates should be switched to higher threshold voltages (to avoid wasting power).

*Or multiple gate lengths, which can be used in a similar way to trade off leakage and performance
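As a rough illustration of the trade-off, here is a toy Python sketch. Everything in it is an assumption made up for the example: the 1.4x speedup and 30x leakage are simply values picked from inside the ranges quoted above, the gate delays are hypothetical, and the greedy "swap the slowest gate first" strategy is just one simple way to do it, not how any real tool works. It swaps gates on a single path from HVT to LVT until the path meets its timing budget, and reports how much leakage that costs.

```python
# Toy model: one timing path made of gates that all start as HVT.
# All numbers are illustrative assumptions, not data from a real process.
LVT_SPEEDUP = 1.4    # assumed: LVT delay = HVT delay / 1.4
LVT_LEAKAGE = 30.0   # assumed: an LVT gate leaks 30x as much as an HVT gate


def assign_vt(hvt_delays_ps, cycle_time_ps):
    """Swap the slowest gates to LVT, one at a time, until the path meets timing.

    Returns (list of 'HVT'/'LVT' per gate, path delay in ps, relative leakage),
    where a single HVT gate has leakage 1.0. If the budget cannot be met even
    with every gate swapped, the all-LVT result is returned.
    """
    flavors = ["HVT"] * len(hvt_delays_ps)
    delays = list(hvt_delays_ps)
    # Visit gates from slowest to fastest: swapping a slow gate saves the most time.
    order = sorted(range(len(delays)), key=lambda i: hvt_delays_ps[i], reverse=True)
    for i in order:
        if sum(delays) <= cycle_time_ps:
            break
        flavors[i] = "LVT"
        delays[i] = hvt_delays_ps[i] / LVT_SPEEDUP
    leakage = sum(LVT_LEAKAGE if f == "LVT" else 1.0 for f in flavors)
    return flavors, sum(delays), leakage


path = [50.0, 30.0, 20.0, 40.0]        # hypothetical HVT gate delays in ps
for budget in (140.0, 125.0, 110.0):   # a tighter budget stands in for a slower process
    flavors, delay, leak = assign_vt(path, budget)
    print(budget, flavors, round(delay, 1), leak)
```

In this toy model, tightening the budget forces more gates to LVT and the leakage total climbs quickly, which is the effect described above when porting to a slower process.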

Beta ratio

Another difference is in the relative strength of nmos and pmos transistors (nfets and pfets). The ratio of the strengths is referred to as the "beta ratio" (it's lazy terminology if you look at where β really shows up in the equations...). Silicon inherently lets electrons move more easily than "holes", which makes nfets roughly 2x faster than pfets. I'm not going to go into the physics, mostly because I don't have a good enough grasp to explain it simply :).

Different manufacturers have different tricks to speed up their transistors (stress/strain, crystal orientation, etc) and some of these tricks work better for one type of transistor than the other type. As a result, while one manufacturer may have nfets that are 2x as strong as the pfets, another may have a ratio of 1.5:1 or 2.5:1. This is important, since it affects the sizes of the transistors for optimal delays. I can't remember the math off the top of my head, but for a process with a 2:1 ratio, for optimal delay, pfets should be 1.4x the size of nfets*. When porting from a process with one ratio to a process with a different ratio, you really want to change all the transistor sizes to get maximum benefit.

*If you used pfets 2x the size of nfets, you would get equal rise and fall times. However, that isn't actually the fastest option. It turns out that if you shrink the pfet a bit, even though the rising delay gets worse, each gate sees less load capacitance, so the net result is higher performance. I can elaborate further, but that'll require pictures. Let me know if you want the details.
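One way to see where a number like 1.4x can come from is a first-order model. This is an assumed textbook-style simplification, not necessarily the derivation the figure above was based on: take an inverter driving a copy of itself, give the nfet width 1 and the pfet width r, let the load capacitance be proportional to (1 + r), the pull-down resistance be 1, and the pull-up resistance be beta / r, where beta is the nfet:pfet strength ratio. The average of the rise and fall delays is then proportional to (1 + r)(1 + beta / r), which is smallest at r = sqrt(beta). The Python sketch below (function names are made up for the example) checks that numerically.

```python
import math


def avg_delay(r, beta):
    """Relative average of rise and fall delay for an inverter driving a copy of itself.

    Assumed first-order model: nfet width 1, pfet width r, load capacitance
    proportional to (1 + r), pull-down resistance 1, pull-up resistance beta / r.
    """
    load = 1.0 + r
    fall = load * 1.0
    rise = load * beta / r
    return (rise + fall) / 2.0


def best_pfet_ratio(beta, steps=4000):
    """Scan pfet/nfet width ratios from 0.5 to 4.5 and return the fastest one."""
    candidates = [0.5 + 4.0 * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda r: avg_delay(r, beta))


# Compare the scanned optimum with sqrt(beta) for a few assumed beta ratios.
for beta in (1.5, 2.0, 2.5):
    print(beta, round(best_pfet_ratio(beta), 2), round(math.sqrt(beta), 2))
```

Under this model a 2:1 process wants pfets about 1.41x the nfet size, while 1.5:1 and 2.5:1 processes want noticeably different ratios, which is why the sizes need revisiting after a port.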

Other differences

If the diffusion capacitance is higher or lower relative to the gate capacitances, different circuit structures may become optimal. For example, on SOI processes, which offer extremely low diffusion cap, pass-gate logic becomes much more appealing.

DFM

Different manufacturers have different "design rules". Design rules specify what shapes can / can't be reliably manufactured. Notice how the shapes in this photograph are blobs rather than nice rectangles. It's difficult to produce patterns accurately at small scales, so manufacturers have to put limits on things like how closely two shapes can be drawn. Each manufacturer will have different limits.

There are other issues: it's difficult to manufacture very tall and narrow structures - they're being etched as trenches from the top down, but the plasma ions don't go perfectly straight down and you tend to get trapezoidal shapes rather than rectangles (picture). If one manufacturer can construct taller/narrower shapes than another, that affects design decisions.

Real design rule manuals are hundreds of pages, and they're different for each process.

Layout Density

The net result of the design rule differences is that a layout that's legal on one process is likely to be illegal on another process. You can do layout very conservatively so that it works across multiple processes, but that results in bloated cells and wastes a lot of die area (money). It's better to go for denser layout to produce a smaller, cheaper (and possibly faster--due to shorter wire routes) design. Even when staying with the same manufacturer and going to a new process node (e.g. 90nm -> 65nm) the rules change, so different pieces shrink by different amounts, and you have to redo a lot of work to make the new chip closer to optimal.

Metal stack

The actual transistors aren't the only things that change across processes: the metal wires change too. Differences in the manufacture of the wires will result in different attributes--different minimum wire widths, different rules regarding current capacity to avoid electromigration, and different resistances and capacitances.

Minimum wire width

If one manufacturer allows narrower wires than another, this actually has a significant effect. While it used to be that transistors were big and most of the time the area of a block was determined by transistor area, nowadays wires can limit the area of a block (if you need to send 100 signals from A to B, you need room for the 100 wires). If you're switching to a process that requires wider wires (relative to the sizes of the transistors), you'll have areas that don't have enough wire tracks available to route all of their signals, and you'll have to bloat the area of that part of the design.
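The track-counting argument can be put into numbers with a small Python sketch. The channel width, layer count, and wire pitches below are hypothetical, and "pitch" here means wire width plus the spacing to the next wire, which is an assumption about how the limit would be expressed.

```python
import math


def tracks_available(channel_width_nm, wire_pitch_nm, routing_layers):
    """Number of parallel wires that fit across a channel on the given layers."""
    return (channel_width_nm // wire_pitch_nm) * routing_layers


def width_needed(signals, wire_pitch_nm, routing_layers):
    """Minimum channel width (nm) needed to route `signals` parallel wires."""
    tracks_per_layer = math.ceil(signals / routing_layers)
    return tracks_per_layer * wire_pitch_nm


# Hypothetical numbers: a 10 um wide channel with 2 layers routed in this direction.
for pitch in (200, 280):   # assumed wire pitches in nm for two different processes
    print(pitch, tracks_available(10_000, pitch, 2), width_needed(100, pitch, 2))
```

With these made-up numbers, the 100 signals that exactly fit at a 200nm pitch no longer fit at a 280nm pitch, and the channel has to grow from 10 um to 14 um to route them.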

Current capacity rules

The current-carrying capacity of wires is significant because it affects the power supply grid used on the chip. Power grids have to supply some maximum amount of current to the transistors without suffering electromigration, which puts a lower limit on how narrow the wires can be / how many wires can be used (if you have to supply a given amount of current, you can do it with a few really wide wires or a lot of narrow wires; using many narrow wires does have some benefits). There are trade-offs involved, because the power grid steals metal tracks that could have been used to route signals. If you switch to a process that can't carry as much current with the same wire sizes, you'll have to increase the amount of metal dedicated to supplying power--and sacrifice signal routing resources (which might end up costing area).
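A back-of-the-envelope Python sketch of that trade-off follows. It assumes the electromigration limit is given as a maximum current per micron of wire width; the limits, currents, and widths are hypothetical values chosen only to show the arithmetic.

```python
import math


def power_wires_needed(total_current_ma, wire_width_um, em_limit_ma_per_um):
    """Wires of the given width needed to carry the current within the EM limit."""
    per_wire_ma = wire_width_um * em_limit_ma_per_um
    return math.ceil(total_current_ma / per_wire_ma)


TOTAL_CURRENT_MA = 500.0   # hypothetical current drawn by one block
WIRE_WIDTH_UM = 1.0        # hypothetical power-wire width

for limit in (2.0, 1.5):   # assumed EM limits (mA per um of width) on two processes
    n = power_wires_needed(TOTAL_CURRENT_MA, WIRE_WIDTH_UM, limit)
    print(limit, n, n * WIRE_WIDTH_UM)   # limit, wire count, total metal width in um
```

Dropping the assumed limit from 2.0 to 1.5 mA/um raises the wire count from 250 to 334 in this example, and every extra power wire is a track that signals can no longer use.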

Resistance and capacitance

Different materials have different resistances and capacitances. Low-k materials reduce capacitance, which lets wires carry signals faster. Each manufacturer is going to have different low-k options (even a single manufacturer may have different choices - for example, some options cost more than others but may offer better performance).

Narrow wires may be strongly affected by "edge effects" (that's what they're called, right?) that increase resistance; they depend on the materials at the interface between the aluminum/copper that makes up the wire and the material that surrounds it.

Differences in wire resistance/capacitance are important because they affect how bad long wires are. If long wires aren't too slow, signals that go long distances can make the trip with only a few "repeaters" along the way (repeaters just buffer a signal along). If wires are slower, signals need to be buffered more frequently, and can't go as far in a cycle. Things like this can actually affect the high-level architecture of a chip (e.g. number of pipeline stages, if your chip is like the Pentium 4 and has to dedicate whole pipeline stages to just routing signals around). Relative to gates, wires are generally slowing down (or not speeding up as much) with each new process generation, so even simple process shrinks require rework as a result.
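To make the repeater argument concrete, here is a minimal Python sketch using the common Elmore-style approximation for a driver feeding a distributed RC wire (the 0.69 and 0.38 coefficients are the usual textbook ones). The wire parameters, driver resistance, and gate capacitance are hypothetical, and the model ignores each repeater's own internal delay, so treat the absolute numbers as illustrative only.

```python
def repeated_wire_delay(length_mm, segments, r_ohm_per_mm, c_f_per_mm,
                        driver_ohm, gate_cap_f):
    """Elmore-style delay (seconds) of a wire split into equal repeated segments.

    Each segment is a repeater (modelled as a plain resistance) driving a
    distributed RC wire that ends at the next repeater's input capacitance.
    """
    seg = length_mm / segments
    wire_r = r_ohm_per_mm * seg
    wire_c = c_f_per_mm * seg
    per_segment = (0.69 * driver_ohm * (wire_c + gate_cap_f)
                   + 0.38 * wire_r * wire_c
                   + 0.69 * wire_r * gate_cap_f)
    return segments * per_segment


def best_segment_count(length_mm, r_ohm_per_mm, c_f_per_mm,
                       driver_ohm=1000.0, gate_cap_f=10e-15, max_segments=200):
    """Try 1..max_segments segments and return the count with the lowest delay."""
    return min(range(1, max_segments + 1),
               key=lambda k: repeated_wire_delay(length_mm, k, r_ohm_per_mm,
                                                 c_f_per_mm, driver_ohm, gate_cap_f))


# Hypothetical 10 mm wire at 200 fF/mm, on two assumed metal stacks.
for r in (1000.0, 2000.0):   # wire resistance in ohm/mm
    k = best_segment_count(10.0, r, 200e-15)
    print(r, k, repeated_wire_delay(10.0, k, r, 200e-15, 1000.0, 10e-15))
```

In this model the unrepeated wire delay grows with the square of the length, and doubling the wire resistance pushes the best repeater count up by roughly 1.4x, so a design tuned for one metal stack ends up with the wrong repeater spacing on another.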

Wire resistance also affects the power grid--as current flows from the pins through the power grid to the transistors, the resistance results in a voltage drop. The end result is that the transistors see a lower voltage than what's applied at the power pins, so it's important to make sure the power grid's resistance isn't too high. If the new process has higher-resistance wires, you'll need to use up more metal resources for the power grid (otherwise the chip will be slower than a more optimal design would be).
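The voltage drop itself is just Ohm's law. A minimal Python sketch, with a hypothetical supply voltage, current, and lumped grid resistance (a real grid is a distributed network, so this is only the simplest possible model):

```python
def supply_at_transistor(vdd_pin_v, current_a, grid_resistance_ohm):
    """Voltage the transistors see after the IR drop across the power grid."""
    return vdd_pin_v - current_a * grid_resistance_ohm


def max_grid_resistance(vdd_pin_v, current_a, max_drop_fraction):
    """Largest lumped grid resistance that keeps the drop within the given fraction."""
    return vdd_pin_v * max_drop_fraction / current_a


# Hypothetical chip: 1.2 V at the pins, 50 A total, 1 milliohm effective grid resistance.
print(supply_at_transistor(1.2, 50.0, 0.001))
# Resistance budget if no more than 5% of the supply may be lost in the grid.
print(max_grid_resistance(1.2, 50.0, 0.05))
```

If the new process's wires are more resistive, the same grid geometry blows through that budget, and the only fix is more (or wider) power wires.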

Clock distribution networks tend to be very sensitive to the properties of the metal stack, and to ensure both high-speed and low-power operation the design would have to be reworked when moving to a new process.

Dealing with the differences

Most transistors on a chip* are parts of "standard cells" (think of them like rubber-stamp patterns that you can plop down wherever you want, and they can be easily swapped for each other...), so if you just fix the transistors in the standard cells (e.g. adjust the beta ratios), you'll take care of a lot of the transistors. However, you'll run into a problem: some of the new transistor sizes won't fit well into the space allocated for the cells on the old manufacturing process. You can bloat all of the standard cells so their relative sizes stay the same and you don't have to redo all of your gate placements, but that wastes a huge amount of space. In the real world, since cost is so important, and cost is strongly affected by die area, you'll want to recoup that wasted space, so you'll have to redo your placement (if some gates shrink or grow relative to other gates, your placement won't be optimal any more). When some signals have to travel longer or shorter distances than they used to, you really want to go back and make sure the gates that were used are still optimal.

*Actually, most are probably in the cache nowadays, but most of those transistors are still part of a repeated cell, the "bit cell", so they effectively need to be fixed once.

There are a lot of fancy circuits in parts of the CPU (caches, register files, extremely-performance-critical logic) that use techniques that are very dependent on specific ratios in the manufacturing process (e.g. transistor leakage vs. current when the transistors are on, and the beta ratio). These circuits have to be redone when moving to a new process (sometimes for functionality if the process is different enough; sometimes just for performance reasons).

Other fancy circuits (PLLs, I/O pads, etc) are also highly sensitive, but they're pretty much voodoo to me so I can't say much about them.

Other notes

  • It takes a lot longer to manufacture a design (to get first silicon after the design is complete) than many people realize. It takes something like a month.
  • It takes a lot longer to qualify a design than many people realize. It takes a significant chunk of a year. There's a lot of analysis that has to be done that takes time (for example, verifying reliability of the design--making sure the chips will still work in 5, 10, 20 years--can't be done overnight), and a lot of tests have to be run to catch bugs (with a wide variety of hardware and software configurations). Even "small" changes can introduce subtle bugs, and hardware manufacturers are extremely conservative because you can't patch hardware as easily as software (even though there are often hooks that can be used to disable features if they end up being buggy, they may come with significant performance penalties).

Even if porting the design took 1 day, it would still take most of a year before the result was ready to sell; does it still sound like it's worth switching processes rather than fixing your own?

Why GPUs can do it

It all comes down to how aggressive the design is. CPUs eke out every last bit of performance, and are highly tuned for a specific manufacturing process. Nearly all of the design is done by hand, which takes a long time but produces slightly better results. The gates in GPUs are almost entirely synthesized automatically by software; this allows for much faster spins of a design, but results in a less-optimal design. They're less aggressive, but as a result they're easier to port. This is probably both a cause and result of shorter design cycles (adding new features quickly helps with marketing - games taking advantage of new DirectX / OpenGL features ship pretty quickly, whereas the x86 instruction set is relatively unchanging in comparison; the design methods used just happen to be conducive to shorter product cycles anyway).