Pentium II Microcode: collected tweets part 1
Disclaimer
Since February 2021 I have worked for Intel Corp. This work was all completed and published before I joined them, and was based on my own research and public information alone.
The reason for publishing these tweet collections is the decline in popularity of Twitter due to recent events. I have not added any new information here that was not in the original tweets.
Context/Background info
This is a collection of tweets spanning most of the initial part of my Pentium II microcode research. For more context, see the talk I gave at hardwear.io 2020:
Hardwear.io 2020 - Under the hood of a CPU video (Warning, bad audio quality)
Introduction
This is the setup: I'm using an old OLIMEX ARM-JTAG-USB probe and OpenOCD's JTAG commands. pic.twitter.com/Pb1tZSdOS2
— Peter Bosch (@peterbjornx) March 8, 2020
Guess what I've been working on while having to stay at home due to a cold? pic.twitter.com/WQZCTX9tfw
— Peter Bosch (@peterbjornx) April 2, 2020
This quickly turned into a full-fledged reverse engineering (RE) tool.
PLA decoder works, now to make sense of this microcode :) There clearly is some structure to it, but I'm hoping the input to the PLAs becomes more obvious once I have all of them decoded, by listing the inputs that seem to set the output valid bit. pic.twitter.com/inB7VMYfmi
— Peter Bosch (@peterbjornx) April 8, 2020
Finally "decoded" the Pentium II full decoder PLAs (that's roughly 100 Kbit of AND and OR planes). Writing the tools was by far the largest part of it: some 9 days to write the interactive image recognition tool and about 2 days to read the die shots using it.
— Peter Bosch (@peterbjornx) April 13, 2020
I'm still having some trouble figuring out the input format for these PLAs. They take a 24-bit word which should contain 2 opcode bytes, parts of the ModRM byte and mode info. I only have images of a die etched all the way down to the substrate, so I can't see what the logic is doing
— Peter Bosch (@peterbjornx) April 13, 2020
, only where transistors were (there is a depth profile left by the trench oxide). Luckily the mask programming was done on the trench oxide/implant layers :)
— Peter Bosch (@peterbjornx) April 13, 2020
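The full decoder PLAs are two-level AND/OR structures: each AND-plane row tests a subset of the input bits, and each OR-plane row sets the outputs for the AND rows that fire. As a rough illustration of what a PLA simulator evaluates once both planes have been captured, here is a minimal Python sketch. It assumes each AND row can be described by a care mask plus a required value, and each OR row by a mask of output bits. `ProductTerm`, `eval_pla` and the toy planes are hypothetical and are not the data format of the PLAdecode tool; the 24-bit input layout is deliberately not modelled, since it was still unknown at this point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTerm:
    """One AND-plane row: fires when the input bits selected by `care`
    equal the corresponding bits of `value`."""
    care: int   # 1 = a transistor connects this input bit to the row
    value: int  # required level of each connected input bit

    def matches(self, word: int) -> bool:
        return (word & self.care) == (self.value & self.care)


def eval_pla(word: int, and_plane: list[ProductTerm], or_plane: list[int]) -> int:
    """Evaluate a two-level PLA: OR together the output masks of every
    AND-plane row that fires for `word`."""
    if len(and_plane) != len(or_plane):
        raise ValueError("each AND row needs exactly one OR-plane mask")
    out = 0
    for term, outputs in zip(and_plane, or_plane):
        if term.matches(word):
            out |= outputs
    return out


# Toy example with made-up terms (not real Pentium II PLA contents).
and_plane = [ProductTerm(care=0xFF, value=0x01), ProductTerm(care=0x0F, value=0x01)]
or_plane = [0b01, 0b10]
print(bin(eval_pla(0x01, and_plane, or_plane)))  # 0b11: both rows fire
print(bin(eval_pla(0x11, and_plane, or_plane)))  # 0b10: only the second row fires
```

Listing which inputs set a given output (the "valid bit" approach from the earlier tweet) then amounts to sweeping `word` and recording where a chosen output bit goes high.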
The tool I've written for this is far from done and I'm not really proud of what the code looks like at the moment, but should anyone want to take a look at it, it's on GitHub: https://t.co/Cmu3dshkqo
— Peter Bosch (@peterbjornx) April 13, 2020
Need more photos
I also had to build a modified microscope to allow me to gather the images I needed.
Trying to get a bit more of a clue about the Pentium II microcode ROM column layout using my DIY metallographic microscope setup. Deprocessing was done using manual lapping and aluminum etch, with sulfuric acid used to strip the polymer coat from the die. pic.twitter.com/PCygx9MLDY
— Peter Bosch (@peterbjornx) June 14, 2020
Stitching die shots can be a bit of a chore, but I'm quite surprised at how well my deprocessing technique worked: the Al interconnect was etched using Al etch mix (phosphoric acid + acetic acid + nitric acid), the polymer coat with sulfuric acid, and the oxide with a Q-tip and metal polish paste. pic.twitter.com/uHqdRGxUAc
— Peter Bosch (@peterbjornx) June 15, 2020
PLA breakthrough
I think I figured out the input to the D4 decoder PLAs I decoded from PII "Dixon" die shots! The trick was to invert every other bit of the x86 opcode. The output is not entirely accurate as I'm still missing some info on the OR plane, but it should reflect which bits are set. 1/7 pic.twitter.com/R7GTvLL4RA
— Peter Bosch (@peterbjornx) June 16, 2020
The output is that of four PLAs translating x86 to P6 cuops, which I captured using my PLAdecode tool (https://t.co/Cmu3dshkqo). The output here seems to indicate at least some structure in the micro-op fields. When an instruction yields fewer than 4 uops, some of the PLAs seem to (2/7)
— Peter Bosch (@peterbjornx) June 16, 2020
produce garbage output. This should still have a consistent "valid/invalid" bit, but as said before, I have not fully understood the OR plane outputs yet (each bit cell seems to output to two wires, not one, almost as if there are 144 bits in a uop, not 72) (3/7)
— Peter Bosch (@peterbjornx) June 16, 2020
These double output bits are combined back into the 72-bit vector in the row circuitry before leading back into the rest of the core. I am currently stitching die shots from an earlier PII (Klamath) to try to work out how this 144->72 bit mapping works, and how (4/7)
— Peter Bosch (@peterbjornx) June 16, 2020
the bit order in these PLAs corresponds to that in the MSROM, which I have been able to decode fully from the Dixon die shots taken for me by SIC66/Martijn Boer. My own work is on Klamath rather than Dixon because of Klamath's larger feature size and its wire-bond, not (5/7)
— Peter Bosch (@peterbjornx) June 16, 2020
flip-chip package. My main issue at the moment is my inability to consistently delayer ICs: for my Klamath die shots I was able to mostly get rid of the top metal layer without damaging the lower layers too much, but going any further starts degrading the sample. (6/7)
— Peter Bosch (@peterbjornx) June 16, 2020
Fortunately, my observations and assumptions from the active-layer die shots used for ROM/PLA capture have largely matched what I later observed in the metal die shots, so I'm quite confident about the AND plane and MSROM capture
— Peter Bosch (@peterbjornx) June 16, 2020
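The trick in the first tweet of this thread, inverting every other bit of the x86 opcode before presenting it to the PLA model, comes down to a single XOR with an alternating mask. A small sketch follows; the function name is hypothetical, and because the thread does not say whether the odd or the even bits are the inverted ones, that choice is left as a parameter.

```python
def invert_alternate_bits(opcode: int, width: int = 8, first_bit: int = 1) -> int:
    """Flip every other bit of `opcode`, starting at bit `first_bit`.

    first_bit=1 flips bits 1, 3, 5, 7 (XOR with 0xAA for a byte);
    first_bit=0 flips bits 0, 2, 4, 6 (XOR with 0x55). Which set the
    real PLA input expects is an assumption to check against the captures.
    """
    mask = 0
    for bit in range(first_bit, width, 2):
        mask |= 1 << bit
    return opcode ^ mask


print(hex(invert_alternate_bits(0x00)))               # 0xaa
print(hex(invert_alternate_bits(0x00, first_bit=0)))  # 0x55
print(hex(invert_alternate_bits(0x89)))               # 0x23
```

Since XOR is its own inverse, the same function maps a PLA input pattern back to the opcode it corresponds to.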
Deprocessing and imaging
Managed to do some more delayering using KOH and polishing. The trick was to use hot, saturated KOH solution, with floor repair "wax" to mount the die and protect the exposed silicon. The result is not very even, but I managed to remove 3 metal layers and the dielectric separating them! pic.twitter.com/hG8C3sEuI3
— Peter Bosch (@peterbjornx) June 19, 2020
Etching the die: 3 runs of 10, 20 and 20 minutes at 70, 85 and 85 °C. A drop large enough to cover the die was pipetted onto it and rinsed off with demineralised water after each run. pic.twitter.com/9rcWwfTyWP
— Peter Bosch (@peterbjornx) June 19, 2020
And then manually polishing it to improve surface quality and remove debris. pic.twitter.com/sJYDX71LS4
— Peter Bosch (@peterbjornx) June 19, 2020
The picture was taken using 5x digital zoom in the live view of my Canon EOS 550D mounted on the microscope; the objective is an Olympus 40 0.65 0.17, type 457498. According to a manufacturing analysis document on this chip (PII Klamath), the M1 pitch is 500 nm.
— Peter Bosch (@peterbjornx) June 19, 2020
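The objective marking 40 0.65 0.17 denotes 40x magnification, a numerical aperture (NA) of 0.65 and correction for a 0.17 mm cover glass. A quick rule-of-thumb check of how that compares with the 500 nm M1 pitch uses the Rayleigh criterion, r = 0.61 λ / NA. This is a back-of-the-envelope sketch: the illumination wavelength is an assumption, and the Rayleigh formula describes two point sources, so it only roughly predicts how well dense, periodic metal lines will be resolved.

```python
def rayleigh_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Rayleigh criterion: minimum resolvable separation of two point sources."""
    return 0.61 * wavelength_nm / numerical_aperture


# Olympus 40x objective from the tweet: NA 0.65. The 550 nm (green) wavelength
# is an assumption; the tweet does not state the illumination used.
r = rayleigh_limit_nm(550.0, 0.65)
print(f"{r:.0f} nm")  # 516 nm, about the same as the 500 nm M1 pitch
```

With the limit sitting right at the metal pitch, the contrast tricks described in the following tweets (thin film interference, aperture stop) and, later, a higher-NA immersion objective make a visible difference.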
Managed to polish down to the contact layer for the simple decoder OR plane, contacts are really clear in this view! pic.twitter.com/Amhr0NR0Hn
— Peter Bosch (@peterbjornx) June 19, 2020
Thin film interference is really helpful when working on a chip with features around the diffraction limit of an optical microscope: polishing to just the right thickness, where one layer gives a normal reflection but the layers below only reflect some wavelengths, adds contrast. pic.twitter.com/eP9LS66dAL
— Peter Bosch (@peterbjornx) June 20, 2020
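The thickness dependence in the tweet above can be estimated with the textbook thin-film condition. For a transparent film whose refractive index lies between that of air and of the layer below (for example oxide over silicon), both reflections pick up the same half-wave phase shift, so at normal incidence the reflection is reinforced when 2 n d = m λ. The sketch below lists the visible wavelengths reinforced for a given thickness. It is illustrative only: n = 1.46 is an approximate value for silicon dioxide, dispersion and oblique illumination are ignored, and over metal layers the phase condition differs.

```python
def reflected_peaks_nm(thickness_nm: float, n_film: float = 1.46,
                       lo_nm: float = 400.0, hi_nm: float = 700.0) -> list[float]:
    """Visible wavelengths reinforced by a transparent film at normal incidence.

    Assumes the film's refractive index lies between that of air and of the
    layer underneath, so both reflections get the same half-wave phase shift
    and constructive interference occurs when 2 * n * d = m * wavelength.
    """
    optical_path = 2.0 * n_film * thickness_nm
    peaks = []
    m = 1
    while optical_path / m >= lo_nm:
        wavelength = optical_path / m
        if wavelength <= hi_nm:
            peaks.append(round(wavelength, 1))
        m += 1
    return peaks


# n = 1.46 approximates silicon dioxide; the thicknesses are illustrative.
for d in (300, 500, 700):
    print(d, reflected_peaks_nm(d))
# 300 [438.0]
# 500 [486.7]
# 700 [681.3, 511.0, 408.8]
```

Small changes in remaining oxide thickness shift which colours are reinforced, which is why polishing to the right depth changes the contrast between layers so noticeably.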
Circuit level RE
Slowly working out the output circuitry on the PII decoder PLA OR planes. The main question here is: why does the OR plane have twice as many readout lines as the expected number of outputs for the PLA? pic.twitter.com/WvSuRN1Hrp
— Peter Bosch (@peterbjornx) June 20, 2020
Mostly got the first stage of it worked out. I can't tell which transistors are P or N, which is a bit of a pain, but it seems to be a pretty simple circuit so far. pic.twitter.com/rh0v22DPfz
— Peter Bosch (@peterbjornx) June 20, 2020
I don't have metal 3 and 4 captured yet, but because of the way the 4 inputs are tiled together (specifically Q1-3, Q6-8) I could deduce the connectivity between the power rail and the i0,i3 net vias on m3. I'll post my schematic guess next pic.twitter.com/Jk61gabCn9
— Peter Bosch (@peterbjornx) June 20, 2020
Quick verification of what I have worked out up to now. The next bit will hopefully be the part that combines the row signals. pic.twitter.com/n74Ekylwya
— Peter Bosch (@peterbjornx) June 20, 2020
Mapping out the gates is proving to be difficult using my microscope, which is no surprise considering their width is about 0.5λ to 1λ. For a sense of scale, the white dots seen here are the contacts, which are 500 nm across. Source of dimensions: https://t.co/N2KBvfd2pa pic.twitter.com/HuGJ9EFw1B
— Peter Bosch (@peterbjornx) June 21, 2020
Hmm, seems like that decoder is pretty much just a NAND gate combining both row wires from both sides of the array. It was a lot more complicated to reverse than you might expect because the latch/precharge circuits are very tightly packed and come in multiple layouts. pic.twitter.com/TqJzj1SMK5
— Peter Bosch (@peterbjornx) June 21, 2020
For comparison, a pair of 4 input NAND gates + its 8 precharge/latch circuits (which is about the repeating unit size in the layout) looks like this when all drawn up as one schematic: pic.twitter.com/SxCm0Gw357
— Peter Bosch (@peterbjornx) June 22, 2020
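The tweets above resolve the doubled OR-plane readout as NAND gates (4-input in the repeating unit) fed by precharge/latch circuits from both sides of the array. A behavioural sketch, assuming the readout wires are precharged high and pulled low by a set bit cell, and that the wires can be grouped into 72-bit banks: under those assumptions a NAND of the wires reads as a logical OR of the underlying bits. The banking and the active-low polarity are assumptions not confirmed by the posted schematics, and `nand_combine` is a hypothetical name.

```python
WIDTH = 72  # outputs per PLA row, from the earlier thread


def nand_combine(active_low_banks: list[int], width: int = WIDTH) -> int:
    """Combine several banks of precharged, active-low readout wires.

    Bit i of the result is NAND(bank0[i], bank1[i], ...): it reads 1 when
    any input wire for that bit was discharged (pulled low) by the array.
    """
    mask = (1 << width) - 1
    all_high = mask
    for bank in active_low_banks:
        all_high &= bank & mask
    return ~all_high & mask


idle = (1 << WIDTH) - 1                      # every wire still precharged
banks = [idle ^ 0b10, idle ^ 0b01, idle, idle]
print(bin(nand_combine(banks)))              # 0b11
```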
Microscope upgrades
Got a new Olympus E series microscope, which allowed much better shots:
Seems to be working pretty well :) pic.twitter.com/xvAmLeDjUx
— Peter Bosch (@peterbjornx) June 25, 2020
Obfuscation
Just tracing out some of the buses running from the MSROM to the decoder. Why did they have to mess up the bit order on it this badly... (the bus going off to the east is mostly uop0 . uop1 . uop2, the bus going to the south is roughly the MSROM bit order) pic.twitter.com/yscuydtmZ8
— Peter Bosch (@peterbjornx) June 28, 2020
Microscope setup is becoming more and more streamlined, next step is adding a PSU module instead of the 4 benchtop PSUs. Got a bunch more pictures of the microcode ROM row and column circuitry and how it ties into the buses. pic.twitter.com/uXVijzy9lF
— Peter Bosch (@peterbjornx) July 2, 2020
Managed to take another metal layer off the Pentium II die, by first polishing, then using aluminum etch to remove the exposed metal layer, and after that another polish to smooth the underlying dielectric. Should polish further but want to capture more images before that. pic.twitter.com/s3nbmw8oOJ
— Peter Bosch (@peterbjornx) July 3, 2020
Tracing out the D4 PLA bits onto the microcode bus is proving to be a lot of work. It does, however, seem like the mapping from PLA output row to output logic block (and bus wire) is not shuffled. The reason for verifying it anyway is the illogical wiring of the ucode bus itself pic.twitter.com/aqGJ0SPLn4
— Peter Bosch (@peterbjornx) July 10, 2020
Still working on the Pentium II microcode ROM, now on the mask ROM readout circuitry, and they have definitely not made it easy to figure out where each column goes: not only do they shuffle the bitlines around majorly, they also have 3 different column circuits! (1/3) pic.twitter.com/GHbT9Cy2Vd
— Peter Bosch (@peterbjornx) July 27, 2020
In these images you see the circuit types A,B, and C (with yellow marking bitline inputs, green marking outputs), followed by an overview of how the bitlines coming in from the top wire into them and an overview of how even the outputs get scrambled again. This is not all...(2/3)
— Peter Bosch (@peterbjornx) July 27, 2020
The output bus then goes around a corner and gets scrambled once more... The bus leading off to the top right here is roughly cuop1,2,3 concatenated, and I'm trying to figure out how these map onto the rom columns. I wonder why it is routed this way... (3/3) pic.twitter.com/cjMKH32HUX
— Peter Bosch (@peterbjornx) July 27, 2020
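Each scrambling stage described in this thread (bitlines into the column circuits, the column outputs, the bus turning a corner) is a fixed permutation of wires, so once a stage has been traced, undoing it is mechanical. A minimal sketch with hypothetical helper names: the convention is that output bit i takes input bit perm[i], several traced stages can be composed into one, and the example permutations are made up rather than taken from the die.

```python
def check_permutation(perm: list[int]) -> None:
    if sorted(perm) != list(range(len(perm))):
        raise ValueError("not a permutation of 0..n-1")


def apply_permutation(bits: int, perm: list[int]) -> int:
    """Output bit i takes the value of input bit perm[i]."""
    check_permutation(perm)
    out = 0
    for i, src in enumerate(perm):
        if (bits >> src) & 1:
            out |= 1 << i
    return out


def invert_permutation(perm: list[int]) -> list[int]:
    """Permutation that undoes `perm` under apply_permutation."""
    check_permutation(perm)
    inv = [0] * len(perm)
    for dst, src in enumerate(perm):
        inv[src] = dst
    return inv


def compose(first: list[int], second: list[int]) -> list[int]:
    """Single permutation equal to applying `first`, then `second`."""
    return [first[src] for src in second]


# Made-up 4-wire example of two scrambling stages.
stage1 = [2, 0, 3, 1]
stage2 = [3, 2, 1, 0]
total = compose(stage1, stage2)
word = 0b0110
scrambled = apply_permutation(word, total)
assert scrambled == apply_permutation(apply_permutation(word, stage1), stage2)
print(bin(apply_permutation(scrambled, invert_permutation(total))))  # 0b110
```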
Some random shots
Not the most useful photo, but here's some patch RAM (metal 1, active, metal 2, metal 3) on the Pentium II (Klamath) pic.twitter.com/3THAbTtzW1
— Peter Bosch (@peterbjornx) July 7, 2020
Bright spots in each image are contacts/vias to the layer above. They become clearer with the aperture stop nearly closed, but that makes planar detail harder to see (it highlights depth structure).
— Peter Bosch (@peterbjornx) July 7, 2020
Silicon can be beautiful too (Pentium II klamath simple decoder standard cell logic) pic.twitter.com/yzNedXoLWC
— Peter Bosch (@peterbjornx) July 11, 2020
More circuit RE
I took a break from tracing out the high level connectivity on the Pentium II microcode ROM to reverse engineer one of the column circuits. It turned out to be a simple transmission gate latch. Here's the schematics and some of the photos I used. pic.twitter.com/1VjIR4rz30
— Peter Bosch (@peterbjornx) July 31, 2020
Microscope Upgrades!
Bought a 100x immersion objective (with an old microscope, but for now fitted it onto my modified one) and the difference in resolving power is just insane! Roughly the same field in images 1 and 2, but image 2 uses the 40x dry objective. The 100x allows me to resolve gates clearly! pic.twitter.com/uvPqwYT7YQ
— Peter Bosch (@peterbjornx) August 2, 2020
Some SRAM and ROM at varying levels of polishing (1,2 with gates+contacts and 3 only diff wells.) pic.twitter.com/XWJLpmC4ew
— Peter Bosch (@peterbjornx) August 2, 2020
Some logic, polished down to the dielectric above metal 1. Shifting the focus slightly really brings out the m1-m2 vias, and the contrast on the metal itself is now good enough to probably allow automated processing. pic.twitter.com/uN5p79DImw
— Peter Bosch (@peterbjornx) August 3, 2020
Obfuscation hell continues
If anyone was wondering why the Pentium II project is taking as long as it is, one of the main reasons is developing the tooling on the go. This meant not having access to proper scans when starting to trace connectivity, meaning I had to hand trace a lot of it in Inkscape. (1/3) pic.twitter.com/lYmrbmTkUj
— Peter Bosch (@peterbjornx) September 6, 2020
For example, this is about the same region of the chip seen through my microscope as it was on June 20th and August 3rd. This is looking at logic, but imagine a similar difference in clarity for interconnect since the start of the project back in March. (2/3) pic.twitter.com/3E1WhuwVPB
— Peter Bosch (@peterbjornx) September 6, 2020
For the earlier part, where I used someone else's photos of the base silicon for the ROM/PLAs to dump the raw data, I had to write my own tooling due to their limited resolution and the particular cell type: https://t.co/BOreDQGgrV. (3/3)
— Peter Bosch (@peterbjornx) September 6, 2020
Automating that tracing
Ended up writing a Python script that parsed the Inkscape drawing I made of the Pentium II interconnect and automatically inferred vias and determined connectivity from geometry. Next up is adding logic cell support and netlist output. Currently it just adds "net" attributes to the SVG. pic.twitter.com/0moVqBsXXB
— Peter Bosch (@peterbjornx) September 6, 2020
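Determining connectivity from drawn geometry is essentially a union-find over overlapping shapes. The sketch below is not the script from the tweet: it assumes the shapes have already been pulled out of the SVG as axis-aligned rectangles tagged with a layer, and it takes the simpler route of explicit via rectangles, since how the original script inferred vias is not described. The layer names in `VIA_LAYERS` and all helper names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical layer names: each via layer joins the two metal layers listed.
VIA_LAYERS = {"v1": ("m1", "m2"), "v2": ("m2", "m3")}


@dataclass(frozen=True)
class Rect:
    layer: str
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Rect") -> bool:
        # Strict inequalities: shapes that merely touch are not connected.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def connected(a: Rect, b: Rect) -> bool:
    """Overlapping shapes connect on the same layer, or through a via."""
    if not a.overlaps(b):
        return False
    if a.layer == b.layer:
        return True
    if a.layer in VIA_LAYERS:
        return b.layer in VIA_LAYERS[a.layer]
    if b.layer in VIA_LAYERS:
        return a.layer in VIA_LAYERS[b.layer]
    return False


def assign_nets(shapes: list[Rect]) -> list[int]:
    """Return a net number for every shape (union-find over overlaps)."""
    parent = list(range(len(shapes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(shapes)):
        for j in range(i + 1, len(shapes)):
            if connected(shapes[i], shapes[j]):
                parent[find(i)] = find(j)

    net_ids: dict[int, int] = {}
    return [net_ids.setdefault(find(i), len(net_ids)) for i in range(len(shapes))]


shapes = [
    Rect("m1", 0, 0, 10, 1),   # horizontal metal 1 wire
    Rect("m2", 9, 0, 1, 10),   # vertical metal 2 wire crossing its end
    Rect("v1", 9, 0, 1, 1),    # via joining them
    Rect("m1", 20, 0, 5, 1),   # unrelated wire
]
print(assign_nets(shapes))  # [0, 0, 0, 1]
```

Wires on different metal layers that merely cross stay on separate nets unless a via overlaps both. The pairwise loop is quadratic, which is fine for a sketch; a real tool would bucket shapes spatially first.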
Most errors the tool made seem to be caused by Inkscape sometimes deciding to "transform" a rectangle rather than actually editing its x, y, width and height attributes. I need to write a preprocessor that normalizes the SVG.
— Peter Bosch (@peterbjornx) September 6, 2020
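A preprocessor of the kind mentioned above could bake simple transforms back into the rectangle attributes. This sketch uses Python's standard `xml.etree.ElementTree`. It only handles a transform on the `<rect>` itself (not on parent groups), supports `translate`, `scale` and axis-aligned `matrix(a,0,0,d,e,f)`, and refuses rotations and skews, which would not leave a rectangle. The function names are hypothetical, and ElementTree will rename Inkscape's extra namespace prefixes (such as `inkscape:` and `sodipodi:`) to generic ones on output, so a real tool would want to register those as well.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def parse_axis_aligned(transform: str) -> tuple[float, float, float, float]:
    """Fold an SVG transform list into (sx, sy, tx, ty).

    Only translate, scale and matrix(a,0,0,d,e,f) are supported; rotations
    and skews raise ValueError because the result would not be a rect.
    """
    sx, sy, tx, ty = 1.0, 1.0, 0.0, 0.0
    for name, args in re.findall(r"([a-zA-Z]+)\s*\(([^)]*)\)", transform):
        v = [float(a) for a in re.split(r"[\s,]+", args.strip()) if a]
        if name == "translate":
            nsx, nsy, ntx, nty = 1.0, 1.0, v[0], (v[1] if len(v) > 1 else 0.0)
        elif name == "scale":
            nsx, nsy, ntx, nty = v[0], (v[1] if len(v) > 1 else v[0]), 0.0, 0.0
        elif name == "matrix" and v[1] == 0 and v[2] == 0:
            nsx, nsy, ntx, nty = v[0], v[3], v[4], v[5]
        else:
            raise ValueError(f"unsupported transform: {name}({args})")
        # SVG applies the rightmost transform first, so compose M = M * N.
        tx, ty = sx * ntx + tx, sy * nty + ty
        sx, sy = sx * nsx, sy * nsy
    return sx, sy, tx, ty


def normalize_rects(svg_text: str) -> str:
    """Bake each <rect>'s own transform into x, y, width and height."""
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    for rect in root.iter(f"{{{SVG_NS}}}rect"):
        transform = rect.attrib.pop("transform", None)
        if not transform:
            continue
        sx, sy, tx, ty = parse_axis_aligned(transform)
        x, y = float(rect.get("x", 0)), float(rect.get("y", 0))
        w, h = float(rect.get("width", 0)), float(rect.get("height", 0))
        # sorted() keeps width/height positive if a scale factor is negative.
        x0, x1 = sorted((sx * x + tx, sx * (x + w) + tx))
        y0, y1 = sorted((sy * y + ty, sy * (y + h) + ty))
        for key, val in (("x", x0), ("y", y0), ("width", x1 - x0), ("height", y1 - y0)):
            rect.set(key, format(val, ".10g"))
    return ET.tostring(root, encoding="unicode")


example = ('<svg xmlns="http://www.w3.org/2000/svg">'
           '<rect x="1" y="2" width="3" height="4" transform="translate(10,20) scale(2)"/>'
           '</svg>')
print(normalize_rects(example))
```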
Here's an example of the inferred vias: (orange circles on wires are vias, orange circles on boxes are ports of those cells) pic.twitter.com/Af2208U8sY
— Peter Bosch (@peterbjornx) September 7, 2020
Mostly figured out the obfuscation
All but 3 columns of the microcode ROM right half array mapped to uop bits (as ordered in the XLAT PLA, no clue if that's logical order yet). The left half should be easier as the column circuitry there is much less chaotic in layout. pic.twitter.com/icVdQqVR3Z
— Peter Bosch (@peterbjornx) September 8, 2020
The MSROM has four separate arrays, as can be seen even in medium-res die shots of any P6 CPU. The centre and right arrays are very tricky because their readout circuits are laid out quite randomly. The left array has each of its readouts directly below the associated column. pic.twitter.com/O0hgB4av1J
— Peter Bosch (@peterbjornx) September 8, 2020
Slight mistake in this image: the left-mid ROM has only 6 columns, like the patch SRAM for that array
— Peter Bosch (@peterbjornx) September 8, 2020
Those with a keen eye might have noticed the arrays seem to have more columns than shown. This is because they physically have 4 rows folded into one by interleaving, which is resolved at the block level: every block has these 16-to-4 demuxes. pic.twitter.com/kK8mn7uBxB
— Peter Bosch (@peterbjornx) September 8, 2020
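To make the folding concrete: if four logical rows share one physical row, each block of 16 bitlines has to be reduced to 4 outputs by selecting one column out of every group of four. A minimal sketch, assuming plain bit interleaving where physical column c belongs to logical row c % 4. That ordering is purely an assumption for illustration (the real column order within a block is exactly the part that had to be traced), and `read_logical_row` is a hypothetical name.

```python
WAYS = 4  # logical rows folded into each physical row


def read_logical_row(physical_bits: list[int], select: int, ways: int = WAYS) -> list[int]:
    """Pick one of `ways` logical rows out of a folded physical row.

    Assumes simple bit interleaving: physical column c belongs to logical
    row c % ways, so logical bit k of row `select` is column k * ways + select.
    """
    if not 0 <= select < ways:
        raise ValueError("select must be in range(ways)")
    if len(physical_bits) % ways:
        raise ValueError("row length must be a multiple of ways")
    return physical_bits[select::ways]


# A 16-column block reduced to 4 outputs, like the per-block 16-to-4 selectors.
block = [0, 1, 1, 0,  1, 0, 0, 1,  0, 0, 1, 1,  1, 1, 0, 0]
print(read_logical_row(block, 2))  # [1, 0, 1, 0]
```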
Useful find
Found an interesting thing on the web: Pentium Pro XLAT PLA listings for all non-complex ops. Apparently this was in the optimization guide back in the day. The Pentium II and newer only list the number of uops per macro-op. pic.twitter.com/a5dU6ShG1X
— Peter Bosch (@peterbjornx) September 8, 2020
Damn, this seems to be correct even for some of the Pentium II PLA output (hex strings are from my PLA sim based on the die shot capture of the AND/OR planes). Not sure about the register fields I've marked, but the IMM value is clearly seen in the third-to-last and second-to-last hex digits of the PLA output. pic.twitter.com/X7iGR2jCim
— Peter Bosch (@peterbjornx) September 8, 2020
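Reading the immediate out of the simulator output as described above is a simple field extraction: the third-to-last and second-to-last hex digits together form one byte, which is bits 4 through 11 of the output word. A small sketch with hypothetical helper names; the example string is made up, and treating the field as exactly 8 bits wide only reflects what is visible in the tweet, not a known uop encoding.

```python
def imm_from_pla_hex(pla_hex: str) -> int:
    """Return the byte formed by the third-to-last and second-to-last hex
    digits of a PLA output string (where the tweet spots the immediate)."""
    digits = pla_hex.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) < 3:
        raise ValueError("need at least three hex digits")
    return int(digits[-3:-1], 16)


def imm_from_pla_word(word: int) -> int:
    """Same field taken from the integer value: bits 4 through 11."""
    return (word >> 4) & 0xFF


example = "1234abc5"  # made-up output string, not a real PLA dump
print(hex(imm_from_pla_hex(example)))            # 0xbc
print(hex(imm_from_pla_word(int(example, 16))))  # 0xbc
```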
Having that "pseudo" uop listing from the prerelease Intel Optimization Manual made it a lot easier to figure out the uop encoding. Getting closer to a working Pentium Pro/II ucode disassembler! First image shows disasm output on the lines A,B,C,D next to documented pseudo ucode pic.twitter.com/colD0YgvEb
— Peter Bosch (@peterbjornx) September 9, 2020
Their inclusion of immediate operand values was especially useful. Also note the correspondence between arithmetic uopcodes and x86 arithmetic opcodes (low nybble).
— Peter Bosch (@peterbjornx) September 9, 2020
I suspect f0 is the flowmarker, and f1 is the alias control word, but need to do more research on that
— Peter Bosch (@peterbjornx) September 9, 2020
Finally starting to grok the machine/micro code
Still a long way to go before this is understandable assembly, but this part of the MSROM definitely looks like it's pushing something to the stack (STRA = store address, STRD = store data). This was disassembled from a partial mask ROM dump of the Pentium II microcode ROM. pic.twitter.com/vS3xIchAJL
— Peter Bosch (@peterbjornx) September 12, 2020
I was wondering why my XLAT PLA simulator was not giving sensible results for multibyte encodings... seems like I was somewhat sleepy while writing this... pic.twitter.com/HxkjA7QbKt
— Peter Bosch (@peterbjornx) September 12, 2020
This image shows the right array column readout circuits as boxes, and wiring as lines, the wires leading off to the top are the actual mask ROM bitlines. The bus is on the lower half of the image. The left array does not have such a complicated layout. pic.twitter.com/PiWRofcHfW
— Peter Bosch (@peterbjornx) September 8, 2020
ROM capture
Now that I've figured out enough of the P6 ucode encoding to start disassembling it (I don't have all ops and register names yet, but most fields in the word are known), I must complete the capture of the ROM, which is a tedious process as I want to manually confirm every bit. This is 1/12 of it. pic.twitter.com/Bch3wcV9R3
— Peter Bosch (@peterbjornx) September 18, 2020
Each pair (left and right half of the ROM) of these contains 64 visible rows, with 2 wordlines each, so 128 row addresses. Each of these rows then has 4 interleaved logical rows in it, yielding 512 addresses per pair. Each of these addresses contains a triplet of uops, meaning every pair holds 1536 uops.
— Peter Bosch (@peterbjornx) September 18, 2020
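The capacity arithmetic in the tweet above, written out, together with one possible way to split a per-pair address into its physical coordinates. The counts come straight from the tweet; the order of the address bits in `split_address` (interleave index lowest, then wordline, then visible row) is a hypothetical choice for illustration, and a later tweet notes that the row address direction was initially backwards, so the real mapping needs to be checked against the die.

```python
VISIBLE_ROWS = 64      # per left/right pair of arrays
WORDLINES_PER_ROW = 2
INTERLEAVE = 4         # logical rows folded into one physical row
UOPS_PER_ADDR = 3      # each address holds a triplet of uops

ROW_ADDRS = VISIBLE_ROWS * WORDLINES_PER_ROW      # 128
ADDRS_PER_PAIR = ROW_ADDRS * INTERLEAVE           # 512
UOPS_PER_PAIR = ADDRS_PER_PAIR * UOPS_PER_ADDR    # 1536


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a per-pair address into (visible row, wordline, interleave).

    The bit assignment (interleave in the low 2 bits, then the wordline,
    then the visible row) is a hypothetical choice for illustration only.
    """
    if not 0 <= addr < ADDRS_PER_PAIR:
        raise ValueError("address out of range for one pair")
    interleave = addr % INTERLEAVE
    wordline = (addr // INTERLEAVE) % WORDLINES_PER_ROW
    row = addr // (INTERLEAVE * WORDLINES_PER_ROW)
    return row, wordline, interleave


print(ROW_ADDRS, ADDRS_PER_PAIR, UOPS_PER_PAIR)  # 128 512 1536
print(split_address(511))                         # (63, 1, 3)
```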
At the moment I have 1 pair complete and most of a second pair, meaning I am about 1/3 of the way through. Each pair takes me about 6-8 hrs of work... I don't see any more efficient way, as I want to be absolutely sure I know which bits are broken and I do not trust image...
— Peter Bosch (@peterbjornx) September 18, 2020
... processing enough to check this. My algo seems to be pretty good with only about 10-20 bit errors for a block like this, but because I am using this to understand an undocumented ISA I want to know the quality of my input data.
— Peter Bosch (@peterbjornx) September 18, 2020
Code finds :)
Think I found the register init for the Pentium II. On reset all registers are cleared, except EDX, which is set to the CPUID family/model/stepping. Although my logic RE work is on a Klamath, my ROM die shots are from a Dixon. The Dixon CPUID is Fam6_Model6, which seems to match what this does. pic.twitter.com/Lh7XPOvrQz
— Peter Bosch (@peterbjornx) September 23, 2020
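For reference, family, model and stepping pack into a single word using the same layout that CPUID leaf 1 returns in EAX, and Intel's documentation of the processor state after reset describes EDX as holding this signature on P6-family parts. A small sketch for checking a decoded microcode constant against it; the helper names are hypothetical, and the stepping value is a placeholder because the tweet does not give one.

```python
def cpuid_signature(family: int, model: int, stepping: int, proc_type: int = 0) -> int:
    """Pack a P6-era processor signature as CPUID leaf 1 returns it in EAX.

    Layout: stepping in bits 3:0, model in 7:4, family in 11:8, processor
    type in 13:12. Extended model/family fields are left at zero, as they
    are for these early family 6 parts.
    """
    for name, value, limit in (("family", family, 16), ("model", model, 16),
                               ("stepping", stepping, 16), ("proc_type", proc_type, 4)):
        if not 0 <= value < limit:
            raise ValueError(f"{name} out of range")
    return (proc_type << 12) | (family << 8) | (model << 4) | stepping


def decode_signature(sig: int) -> tuple[int, int, int]:
    """Return (family, model, stepping) from a signature."""
    return (sig >> 8) & 0xF, (sig >> 4) & 0xF, sig & 0xF


# Family 6, model 6 (Dixon); the stepping value 0 here is only a placeholder.
edx_at_reset = cpuid_signature(6, 6, 0)
print(hex(edx_at_reset))               # 0x660
print(decode_signature(edx_at_reset))  # (6, 6, 0)
```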
Fixing bugs
Hmm, I think I had the direction of the row addresses backwards (3 uops per logical row, 4 logical rows interleaved into one physical row; notice the addresses). Also wondering if anyone can guess what this ucode assist does? pic.twitter.com/meQZkvPIQI
— Peter Bosch (@peterbjornx) September 27, 2020
This is right around the time I gave the hardwear.io talk, which can be found here:
Hardwear.io 2020 - Under the hood of a CPU video (Warning, bad audio quality)