Making Tokens, Pt. 3: Photolithography
Part three of the Making Tokens series. Part 1 refined sand into 9N polysilicon. Part 2 pulled that into a single-crystal wafer. Now we take that $75 bare wafer and walk it through the most complex manufacturing process on the planet.
What "fabrication" actually is
When people say "fab," they mean the multi-hundred-step process of taking a bare silicon wafer and building trillions of transistors and metal interconnects on its surface, layer by layer, by repeatedly:
- Depositing a thin film (silicon dioxide, silicon nitride, copper, cobalt, hafnium oxide, etc.)
- Coating the film with a photoresist
- Exposing the photoresist through a patterned mask
- Developing to remove the exposed (or unexposed) photoresist
- Etching the underlying film where the resist is gone
- Stripping the remaining photoresist
- Cleaning everything
- Repeat
Each cycle through that loop is a single "layer." A modern logic chip like an H100 requires somewhere between 70 and 90 such mask layers. Each layer involves roughly ten to twenty sub-steps once you count bakes, metrology, and inspection on top of the loop above. So the total number of process steps for a single wafer through a leading-edge fab is somewhere on the order of 800-1500 individual steps.
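Working backwards from those totals is a quick sanity check on the per-layer figure. A minimal sketch, assuming every layer runs the same number of sub-steps (real flows vary a lot from layer to layer); `total_steps` and `implied_substeps` are illustrative helpers, not part of any fab software:

```python
# Back-of-envelope step count for the deposit/coat/expose/develop/etch/strip/clean loop.
# Assumption: every mask layer runs the same number of sub-steps (real flows vary).

def total_steps(mask_layers: int, substeps_per_layer: int) -> int:
    """Total process steps if every mask layer runs the same number of sub-steps."""
    return mask_layers * substeps_per_layer

def implied_substeps(total: int, mask_layers: int) -> float:
    """Average sub-steps per layer implied by a total step count."""
    return total / mask_layers

# 70-90 layers at ten to twenty sub-steps each brackets the ~800-1,500 total.
print(total_steps(70, 10), total_steps(90, 20))  # 700 1800

# Working backwards: 800-1,500 steps over 70-90 layers means ~9-21 sub-steps per layer.
print(round(implied_substeps(800, 90), 1), round(implied_substeps(1500, 70), 1))  # 8.9 21.4
```

Five sub-steps per layer would land well short of 800 steps even at 90 layers, which is why the per-layer count has to be closer to a dozen or more.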
The wafer takes roughly three to four months to traverse the whole flow.
An aside, because I have to
I spent a couple of years in undergrad at Northwestern running a chemical vapor deposition rig in a chemistry lab. The job was growing graphene (a single-atom-thick sheet of carbon) on various substrates: silicon wafers with a thin oxide, copper foil, nickel foil. The chemistry is, on paper, embarrassingly simple. You pipe argon into a quartz tube as the carrier gas, hydrogen to clean and reduce the substrate surface, and methane as the carbon feedstock. You ramp the substrate to around 1000 °C. The methane cracks on the hot surface, leaves carbon atoms behind, and (if you have done everything right) those atoms organize themselves into a continuous sheet of graphene one atom thick. You cool it, characterize it under Raman and SEM, and write down what happened.
It almost never worked the same way twice.
The same recipe on the same substrate at the same temperature with the same flow rates would give you beautiful continuous graphene one Tuesday, scattered patches of garbage on Wednesday, and bilayer junk on Thursday. We spent months chasing variables: how thick the native oxide on the silicon was, exactly how clean we got the copper, how fast we ramped temperatures, whether the leak rate on the quartz tube had drifted since the last service, whether someone had opened the door to the fume hood at the wrong moment and let in a little extra humidity. The yield was somewhere around one in five runs producing something we could write about.
Now imagine the fab version of this problem. A leading-edge fab runs hundreds of deposition steps on a single wafer (CVD, ALD, PVD, epitaxy), each one needing thickness uniformity better than an angstrom across 300 mm, only a handful of added particles per wafer, doping concentrations within tens of parts per million of the recipe, with no drift across months of operation, twenty-four hours a day, on hundreds of wafers a day. They hit 60-80% yield on chips with 80 billion transistors on them.
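The reason per-step control matters so much is compounding. As a rough sketch, assume (hypothetically, and far too simply) that each of N steps independently leaves a given die intact with probability p; end-to-end survival is then p to the power N:

```python
# Toy compounding model: survival after N independent steps is p ** N.
# Independence is a simplifying assumption; real defects cluster spatially.

def compound_yield(per_step: float, steps: int) -> float:
    """Probability a die survives every one of `steps` independent steps."""
    return per_step ** steps

def required_per_step(target: float, steps: int) -> float:
    """Per-step survival probability needed to hit `target` after `steps` steps."""
    return target ** (1.0 / steps)

print(f"{compound_yield(0.999, 1000):.3f}")    # 0.368
print(f"{required_per_step(0.70, 1000):.5f}")  # 0.99964
```

Under this toy model, a step that is "99.9% fine" still loses almost two-thirds of dies over 1,000 steps, and holding 70% needs each step to be about 99.964% clean. It is intuition, not a real yield model.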
The respect I have for what TSMC and ASML and Applied Materials actually accomplish has its roots in the fact that I once tried, badly, to do a single layer of it for a couple of years.
The lithography step is the bottleneck
Of all those steps, the one that gets the headlines is photolithography: the step where you actually print the pattern. The reason it matters is geometric. The smallest feature your lithography can resolve sets how tightly you can pack gates and wires (the minimum pitch), which sets the density and performance of the whole chip.
For a long time, the industry used deep ultraviolet (DUV) light at 193 nm, generated by an argon-fluoride excimer laser. With clever tricks (immersion lithography, multi-patterning, off-axis illumination), 193 nm lithography pushed all the way down to the 7 nm node. But by the time you're trying to print features smaller than the wavelength of the light by an order of magnitude, you're doing the optical equivalent of carving a watch movement with a chainsaw.
Starting around the 7 nm node, and on every leading-edge process from 5 nm on, the industry moved to extreme ultraviolet (EUV) lithography at 13.5 nm. The light is generated by hitting droplets of molten tin with a high-power CO₂ laser 50,000 times per second. Each strike vaporizes the tin into a plasma that emits a flash of 13.5 nm light, which is then collected and focused by a set of multi-layer molybdenum-silicon mirrors (you can't use lenses: EUV gets absorbed by everything, including air, so the optical path runs in near-vacuum), reflected off the reticle (the mask, itself a reflective multilayer mirror rather than a transparent plate), and projected onto the photoresist on the wafer.
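The usual way to see why wavelength matters is the Rayleigh-style resolution estimate, half-pitch ≈ k1·λ/NA, where NA is the numerical aperture of the projection optics and k1 is a process-dependent factor. The NA values below are the standard ones for ArF immersion (1.35), current EUV (0.33), and high-NA EUV (0.55); the k1 values are assumptions, with 0.25 as the theoretical single-exposure floor and 0.4 as an illustrative "practical" value:

```python
# Rayleigh-style resolution estimate: half-pitch ~= k1 * wavelength / NA.
# k1 values are assumptions: 0.25 is the single-exposure floor, 0.4 is illustrative.

def half_pitch_nm(wavelength_nm: float, na: float, k1: float) -> float:
    """Smallest printable half-pitch in nm for one exposure."""
    return k1 * wavelength_nm / na

systems = {
    "ArF immersion (193 nm, NA 1.35)": (193.0, 1.35),
    "EUV (13.5 nm, NA 0.33)": (13.5, 0.33),
    "High-NA EUV (13.5 nm, NA 0.55)": (13.5, 0.55),
}

for name, (wl, na) in systems.items():
    floor = half_pitch_nm(wl, na, 0.25)
    practical = half_pitch_nm(wl, na, 0.4)
    print(f"{name}: {floor:.1f} nm floor, ~{practical:.1f} nm at k1=0.4")
```

Even at the theoretical floor, a single 193 nm immersion exposure bottoms out at roughly a 36 nm half-pitch, which is why 7 nm-class pitches on DUV needed multi-patterning, while EUV's ~10-16 nm single-exposure range gets there in one pass.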
A single EUV lithography machine is the most complex industrial product ever produced. ASML is the only company in the world that makes them. The current generation (the TWINSCAN NXE:3600D) costs around $200 million per machine. The next generation (high-NA EUV, the EXE:5000) is around $370 million. A leading-edge fab needs 20-30 of these machines to be productive.
Numbers that don't fit anywhere else
A few quantitative facts about a modern leading-edge fab that never fail to land in conversation:
Water. A large advanced fab can consume up to around 10 million gallons of water per day, much of it purified into ultra-pure water. TSMC's total water consumption across all its Taiwan fabs is around 150,000 tons per day (~40 million gallons). This is one reason the recurring Taiwan droughts cause global panic in the chip industry, and one reason water supply and on-site reclamation figure heavily in the siting and design of TSMC's new fabs (Arizona, Japan, Germany). "Ultra-pure water" here means water purified to >18 megohm-cm resistivity, with total organic carbon below a few parts per billion.
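A quick unit check on the tons-to-gallons conversion. This sketch assumes metric tons, that 1 metric ton of water is about 1 m³ (density ~1,000 kg/m³), and US gallons:

```python
# Unit check: metric tons of water per day -> US gallons per day.
# Assumes 1 metric ton of water ~= 1 m^3, and 1 m^3 ~= 264.172 US gallons.
US_GALLONS_PER_M3 = 264.172

def tonnes_to_gallons(tonnes: float) -> float:
    """Convert metric tons of water to US gallons."""
    return tonnes * US_GALLONS_PER_M3

tsmc_taiwan = tonnes_to_gallons(150_000)
print(f"{tsmc_taiwan / 1e6:.1f} million gallons/day")  # 39.6 million gallons/day

# The reverse direction: a 10-million-gallon-per-day fab in cubic meters.
print(f"{10e6 / US_GALLONS_PER_M3:,.0f} m^3/day")
```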
Power. A leading-edge fab draws 100-300 MW of continuous power, depending on size and tooling. The power load is dominated by the air handlers (the cleanroom is overpressurized and filtered constantly), the cooling, the lithography tools, and the wet etch chemistry. A single EUV scanner alone draws around 1.5 MW during exposure.
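How big a slice of that is the scanners? A quick sketch using only this article's own figures (20-30 EUV tools per fab, ~1.5 MW each, 100-300 MW per fab), pessimistically assuming every scanner draws its exposure power at the same time:

```python
# Rough EUV-fleet share of total fab power, from the article's figures.
# Worst-case assumption: all scanners draw exposure power simultaneously.
EUV_MW_EACH = 1.5

def euv_fleet_mw(n_scanners: int) -> float:
    """Combined draw of the EUV fleet in MW."""
    return n_scanners * EUV_MW_EACH

for n in (20, 30):
    fleet = euv_fleet_mw(n)
    for fab_mw in (100, 300):
        share = 100 * fleet / fab_mw
        print(f"{n} scanners: {fleet:.0f} MW = {share:.0f}% of a {fab_mw} MW fab")
```

On these numbers the EUV fleet is a large single load, somewhere around 10-45% of the total, but not the majority; air handling and cooling carry much of the rest.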
Cleanliness. The lithography area inside a fab is a Class 1 cleanroom (the old US FED-STD-209E scale, roughly ISO Class 3): no more than 1 particle of 0.5 microns or larger per cubic foot of air, which works out to about 35 per cubic meter. For reference, a hospital operating theater is Class 10,000 or worse. The air in the cleanroom is changed completely several hundred times per hour, passing through HEPA filters in the ceiling and being exhausted through perforated floors.
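The two classification systems can be reconciled with the ISO 14644-1 class-limit formula, C = 10^N × (0.1/D)^2.08 particles per m³ of size ≥ D µm, plus a cubic-feet-to-cubic-meters conversion for the legacy US classes (which count particles ≥ 0.5 µm per ft³). A minimal sketch:

```python
# ISO 14644-1 class limit: C = 10**N * (0.1 / D)**2.08 particles per m^3 of size >= D um.
# Legacy FED-STD-209E "Class X" = X particles >= 0.5 um per cubic FOOT.
M3_PER_FT3 = 0.0283168

def iso_limit_per_m3(iso_class: int, particle_um: float) -> float:
    """Maximum particles per m^3 at or above `particle_um` for an ISO class."""
    return 10 ** iso_class * (0.1 / particle_um) ** 2.08

def fed_class_per_m3(fed_class: float) -> float:
    """Particles >= 0.5 um per m^3 for a FED-STD-209E class."""
    return fed_class / M3_PER_FT3

print(f"Class 1: {fed_class_per_m3(1):.1f}/m^3 vs ISO 3: {iso_limit_per_m3(3, 0.5):.1f}/m^3")
print(f"Class 10,000: {fed_class_per_m3(10_000):,.0f}/m^3 vs ISO 7: {iso_limit_per_m3(7, 0.5):,.0f}/m^3")
```

The pairs line up within about a percent, which is why Class 1 is usually quoted as ISO 3 and the operating-theater Class 10,000 as ISO 7.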
Chemicals. A modern fab uses on the order of hundreds of distinct chemistries: photoresists, developers, primers, BARC (bottom anti-reflective coating), post-CMP (chemical-mechanical planarization) cleaners, ammonium fluoride, sulfuric acid, hydrogen peroxide, isopropyl alcohol, tetramethylammonium hydroxide, and a long tail of ultra-high-purity gases (silane, ammonia, nitrogen trifluoride, hydrogen, argon). The supply chain to deliver these chemistries at semiconductor purity is itself a large industry.
Capital. A new leading-edge fab costs $20-30 billion to build. TSMC's Arizona Fab 21 was announced at around $40 billion for its first two phases, and the commitment has grown since. Depreciation on that capital, spread over the wafers the fab produces, is one of the largest single components of the per-wafer cost.
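To see why depreciation weighs so heavily, here is a minimal sketch with hypothetical inputs: a $25 billion fab, all of it depreciated straight-line over five years (buildings actually depreciate more slowly than tools), running 50,000 wafer starts a month. None of these are a specific fab's numbers:

```python
# Straight-line depreciation per wafer start. All inputs below are hypothetical.

def depreciation_per_wafer(capex_usd: float, years: float, wafers_per_month: float) -> float:
    """Capex spread evenly over `years`, divided across every wafer started."""
    annual = capex_usd / years
    return annual / (wafers_per_month * 12)

per_wafer = depreciation_per_wafer(25e9, 5, 50_000)
print(f"${per_wafer:,.0f} per wafer")  # $8,333 per wafer
```

Under those assumptions depreciation alone is about $8,300 per wafer, a large fraction of the $15,000-25,000 per-wafer processing cost in the cost breakdown later in this piece. Doubling wafer output halves it, which is one reason fabs run around the clock.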
Yield and per-die economics
The other dimension that matters is yield. Not every die on a wafer works. Defects creep in at every layer (random particles, lithography misalignment, doping variation, etch non-uniformity), and a chip the size of an H100 (~814 mm²) is large enough that the probability of at least one fatal defect per die is non-trivial.
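For a 300 mm wafer, geometry alone caps an ~814 mm² die at roughly 60-65 candidate dies per wafer, because a die that large wastes a lot of area around the wafer's circular edge. The usable count is some fraction of that, depending on the chip, the maturity of the process, and how many imperfect dies can be salvaged by disabling a few streaming multiprocessors (a full GH100 die has 144 SMs; the H100 SXM ships with 132 enabled). Early in a node's life cycle, yields can be much worse; near end-of-life, they can be higher.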
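Both halves of that (how many dies fit, and how many are defect-free) have standard back-of-envelope forms. This sketch uses the common dies-per-wafer approximation (ignoring scribe lines and edge exclusion) and the simple Poisson yield model, Y = exp(−A·D0). The defect densities D0 are hypothetical values chosen for illustration, not published figures for any node:

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Wafer area / die area, minus partial dies lost around the circular edge.
    Ignores scribe lines and edge exclusion, so real counts are a bit lower."""
    r = wafer_diameter_mm / 2
    dies = (math.pi * r**2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))
    return math.floor(dies)

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero fatal defects, assuming randomly scattered defects."""
    return math.exp(-(die_area_mm2 / 100) * defects_per_cm2)

gross = gross_dies_per_wafer(300, 814)
print(gross)  # 63
for d0 in (0.05, 0.1):  # hypothetical defect densities, defects per cm^2
    y = poisson_yield(814, d0)
    print(f"D0={d0}/cm^2: {y:.0%} defect-free, ~{gross * y:.0f} perfect dies")
```

At these assumed defect densities only about 28-42 dies per wafer come out perfect. Salvage, shipping dies with a few SMs fused off, is how the usable count climbs above the perfect-die count.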
The cost economics work out roughly like this:
- Bare wafer: ~$75-100
- Fab processing (depreciation, labor, materials, EUV machine time): ~$15,000-25,000 per wafer
- Yielded dies per wafer: roughly 30-50 (an estimate; the real figure isn't public)
- Per-die cost: ~$300-850 for the silicon
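The per-die figure is just the processed-wafer cost divided by the number of good dies. A small sensitivity table makes it obvious how much the answer swings with yield; the good-die counts below are a spread chosen for illustration, not measured values:

```python
def cost_per_good_die(wafer_cost_usd: float, good_dies: int) -> float:
    """Silicon cost of one working die; packaging and memory are not included."""
    return wafer_cost_usd / good_dies

wafer_costs = (15_000, 25_000)   # processed-wafer cost range from the list above
good_die_counts = (30, 50, 70)   # illustrative spread of yields, for sensitivity only

for dies in good_die_counts:
    row = ", ".join(f"${cost_per_good_die(c, dies):,.0f}" for c in wafer_costs)
    print(f"{dies} good dies: {row}")
```

Going from 70 to 30 good dies more than doubles the silicon cost per chip, which is why yield learning on a new node is worth so much money.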
That's just the die. Add packaging (CoWoS, TSMC's 2.5D interposer packaging used for HBM-equipped chips), the HBM memory stacks (five active stacks, 80 GB in total, on an H100 SXM), then NVIDIA's gross margin on top, and you arrive at the $25,000-40,000 sticker price for an H100.
For perspective: a single H100 sells for several hundred times the price of the entire bare wafer it came from (a wafer that yields dozens of dies), after three or more months of fab processing and the most complex industrial operation humans have ever organized.
The geography of all of this
The fabs that can actually run leading-edge processes at scale are concentrated. The honest list:
- TSMC (Taiwan, primarily Fab 18 in Tainan; Fab 21 in Arizona ramping; Japan and Germany builds in progress, though those target mature rather than leading-edge nodes)
- Samsung Foundry (South Korea, primarily Pyeongtaek; Taylor, Texas fab in progress)
- Intel (US, Israel, Ireland; trying to compete on the foundry side with limited success so far)
The number of facilities in the world that can actually produce the chip in your AI accelerator at the leading edge is a single-digit number of buildings. The cumulative capital invested in those buildings is into the hundreds of billions of dollars. The geopolitical implications of all of this are well understood and not the topic of this piece, but they're a major reason every Western government is currently writing semiconductor industrial policy.
What's next
At the end of this stage, we have a finished AI accelerator die: an 800 mm² piece of silicon containing ~80 billion transistors, organized into over a hundred streaming multiprocessors (thousands of CUDA cores), HBM memory controllers good for a few terabytes per second of memory bandwidth, and the supporting NVLink and PCIe fabric. That die gets diced from the wafer, packaged with HBM memory stacks, soldered onto an SXM module (or a PCIe card), and shipped to a data center.
In Part 4 we put 100,000 of these chips into a single building, hook them up to a 150 MW power feed, and train a model.