Pentium II Microcode: collected tweets part 2
Disclaimer
Since February 2021 I have worked for Intel Corp. This work was all completed and published before I joined them, and was based on my own research and public information alone.
The reason for publishing these tweet collections is the decline in popularity of Twitter due to recent events. I have not added any new information here that was not in the original tweets.
The road so far…
Some errata
Some things I forgot to mention in my talk: there are about 4K * 3 micro-ops in the ROM, and microinstructions can take part of their opcode from the macroinstruction, e.g. ADD, SUB, etc. all share one PLA entry.
— Peter Bosch (@peterbjornx) October 2, 2020
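One way to picture "part of the opcode comes from the macroinstruction" is the x86 ALU group: in the one-byte opcodes 0x00-0x3F, bits 5:3 select ADD/OR/ADC/SBB/AND/SUB/XOR/CMP and bits 2:0 select the operand form. The sketch below is a hypothetical model, not the P6's actual PLA. It assumes one shared entry per operand form, with the micro-op substituting the ALU operation from bits 5:3; the names `pla_entry` and `alu_op_from_opcode` are made up for illustration.

```python
# Hypothetical model: one microcode entry point serves a whole family of x86
# ALU instructions, and the micro-op takes its ALU operation from opcode bits.
# The x86 encoding is real; how the P6 actually wires this up is an assumption.

# For one-byte opcodes 0x00-0x3F, bits 5:3 select the ALU operation.
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def pla_entry(opcode: int) -> str:
    """Return a hypothetical shared entry-point name for an ALU-group opcode."""
    # Slots xx6/xx7 (and xxE/xxF) are PUSH/POP, prefixes, DAA etc.; not modelled.
    if opcode < 0x40 and (opcode & 0x07) < 0x06:
        # Every operation shares the entry; bits 2:0 pick the operand form.
        return f"ALU_FORM_{opcode & 0x07}"
    raise ValueError(f"opcode {opcode:#04x} not modelled")


def alu_op_from_opcode(opcode: int) -> str:
    """Extract the ALU operation the shared micro-op would substitute in."""
    return ALU_OPS[(opcode >> 3) & 0x07]
```

For example, `pla_entry(0x01)` (ADD r/m, reg) and `pla_entry(0x29)` (SUB r/m, reg) return the same entry, while `alu_op_from_opcode` tells them apart.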
SMM is always useful
Hmm. https://t.co/F9hKwq4T7L is quite helpful for mapping out the P6 CRBUS addresses. (Manually annotated microcode for readability.) I suspect the OP_31X ops are conditional microjumps. pic.twitter.com/hyU8lC4CnF
— Peter Bosch (@peterbjornx) October 4, 2020
CPUID!
I think I've got control flow figured out. Here's CPUID from a late Pentium II (Dixon), which apparently already had the PSN feature. pic.twitter.com/8nZv1v3t76
— Peter Bosch (@peterbjornx) October 8, 2020
In response to a question:
Might do that later on; this is still very much a work in progress. One thing that really surprised me was that the branches are almost all absolute direct, and to fit the UROM address in, they use the LSrc2 bits as the high-order bits of the branch target.
— Peter Bosch (@peterbjornx) October 9, 2020
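A minimal sketch of how an absolute micro-ROM branch target could be assembled with LSrc2 supplying the high-order bits. All field widths here are hypothetical placeholders (14 address bits is merely enough for roughly 4K * 3 micro-ops), and the assumption that the low bits come from an immediate field is mine, not from the thread.

```python
# Hypothetical field widths, chosen only for illustration.
UADDR_BITS = 14                      # 2**14 = 16384 covers ~4K x 3 micro-ops
LSRC2_BITS = 6                       # assumed width of the LSrc2 field
LOW_BITS = UADDR_BITS - LSRC2_BITS   # assumed to come from an immediate field


def branch_target(lsrc2: int, imm: int) -> int:
    """Assemble an absolute micro-ROM target with LSrc2 as the high-order bits."""
    if not 0 <= lsrc2 < (1 << LSRC2_BITS):
        raise ValueError("lsrc2 out of range")
    if not 0 <= imm < (1 << LOW_BITS):
        raise ValueError("immediate out of range")
    return (lsrc2 << LOW_BITS) | imm


def split_target(uaddr: int) -> tuple:
    """Inverse of branch_target: recover (lsrc2, imm) from a target address."""
    if not 0 <= uaddr < (1 << UADDR_BITS):
        raise ValueError("address out of range")
    return uaddr >> LOW_BITS, uaddr & ((1 << LOW_BITS) - 1)
```

A disassembler would use something like `split_target` in reverse: read both fields from the micro-op and call `branch_target` to print the absolute destination.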
Another interesting feature is that the CRBUS addresses seem to match MSR numbers (there are far more CRs than MSRs, though). I don't have the rdmsr/wrmsr ucode yet, but I would expect them to also implement some MSRs as ucode routines.
— Peter Bosch (@peterbjornx) October 9, 2020
(Limited) 64-bit uops! On a Pentium II?!?
Not entirely sure what exactly is going on here yet, but this is a very compelling bit of evidence that the P6 supports 64-bit loads/stores. (The TMP registers are shared between float and int and are 86+ bits long.) pic.twitter.com/dDIcguru2A
— Peter Bosch (@peterbjornx) October 9, 2020
Let the games begin
Found the (Pentium II) microcode patch MSR code, including the crypto function! Now to see if I can either deduce the key or find a weakness...
— Peter Bosch (@peterbjornx) October 9, 2020
Hmm, I think I've found myself some P II microcode patch keys. (Only for a single update so far; I need to crack a few more to get the key derivation constants.) It took a bit of computation but was not too bad. (Not a true brute force: I found a way to reduce the complexity from 64 bits to 36 bits.) pic.twitter.com/LX6TAJqlVc
— Peter Bosch (@peterbjornx) October 11, 2020
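For scale, going from a 64-bit to a 36-bit search shrinks the number of candidates by the factor below (plain arithmetic only, not a statement about the attack's actual running time):

```latex
\frac{2^{64}}{2^{36}} = 2^{28} = 268\,435\,456,
\qquad
2^{36} = 68\,719\,476\,736 \approx 6.9 \times 10^{10}
```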
I can already reliably get the key and IV from the ciphertext. However, what I want to be able to do is derive them the proper way, and to do that I need to determine seventeen 32-bit values. Every update gives me two, but at least one seems to be tied to the stepping or even the processor flags.
— Peter Bosch (@peterbjornx) October 15, 2020
Even without actually knowing the plaintext, knowing the decrypted format allowed me to pin several locations to a very small range of plaintext values, and the algorithm is weak enough that knowing two words of plaintext at arbitrary positions breaks it.
— Peter Bosch (@peterbjornx) October 15, 2020
Next up is creating custom Pentium II patches to run arbitrary microcode. Just like @_markel___'s work showed for the Atom, the Pentium II exposes the MSROM store on the CRBUS so hacked microcode will allow dumping the whole ROM from software. https://t.co/4VridDVO5a
— Peter Bosch (@peterbjornx) October 14, 2020
There is a weird check on the CRBUS part, where it will redirect writes to a different CRBUS register (possibly the sink register 0x1FF) if FPROM[crypto state (dependent on data)] does not match a dword read from the file. However, it should be trivial to deduce the FPROM content.
— Peter Bosch (@peterbjornx) October 14, 2020
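The check described above can be modelled as a simple select: if the FPROM word picked by the crypto state matches the dword from the file, the write goes to the intended register, otherwise to the sink. This is a sketch under assumptions: how the crypto state indexes the FPROM is unknown, so the modulo indexing is hypothetical, and 0x1FF is only the thread's guess for the sink.

```python
from typing import Sequence

SINK_REGISTER = 0x1FF  # the thread's guess for where mismatched writes end up


def crbus_write_target(addr: int, crypto_state: int, file_dword: int,
                       fprom: Sequence[int]) -> int:
    """Return the CRBUS register a patch write would actually land in."""
    # Hypothetical: the real mapping from crypto state to FPROM index is unknown.
    index = crypto_state % len(fprom)
    if fprom[index] == file_dword:
        return addr          # check passed: write reaches the intended register
    return SINK_REGISTER     # check failed: write is redirected to the sink
```

Because a mismatch only changes where the write lands, a patch author who can observe that outcome can probe FPROM values one at a time, which fits the remark that the contents should be easy to deduce.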
Getting close
Step 1 towards making custom patches: I can now re-encrypt Pentium II ucode patches using almost whatever key seed I like. Here's a patch being re-encrypted using the key seed 0x42C0DE04. It increments the seed until it finds a value that does not hit gaps in my key deriv table. pic.twitter.com/nMBJt5cVUP
— Peter Bosch (@peterbjornx) October 16, 2020
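The seed search described above ("increments the seed until it finds a value that does not hit gaps in my key deriv table") can be sketched as a linear scan. The `derive_indices` function is a hypothetical stand-in: the real key derivation is not described here, so it just maps a seed to a few slots of a 17-entry table, matching the count of derivation values mentioned in the thread.

```python
from typing import Callable, Iterable, List, Set

LUT_SIZE = 17  # the thread mentions 17 key-derivation values


def derive_indices(seed: int) -> List[int]:
    """Hypothetical stand-in for which table slots a key seed would use."""
    return [(seed >> shift) % LUT_SIZE for shift in (0, 8, 16, 24)]


def find_usable_seed(start: int, known: Set[int],
                     derive: Callable[[int], Iterable[int]] = derive_indices,
                     limit: int = 1 << 20) -> int:
    """Increment the seed until every table slot it needs is already known."""
    seed = start & 0xFFFFFFFF
    for _ in range(limit):
        if all(i in known for i in derive(seed)):
            return seed
        seed = (seed + 1) & 0xFFFFFFFF  # wrap like a 32-bit register
    raise RuntimeError("no usable seed found within the search limit")
```

With a table missing, say, slots 3 and 7, `find_usable_seed(0x42C0DE04, set(range(17)) - {3, 7})` returns the first seed at or after 0x42C0DE04 that avoids both gaps.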
Do note that I chose not to change the header, which is not even provided to the CPU when updating, so it really doesn't make sense to update it.
— Peter Bosch (@peterbjornx) October 16, 2020
Loading…
The Deschutes stepping 2 processor in this old PC (thanks @noopwafel) accepts my modified patch files: I didn't change the actual MSRAM data yet, but used the control register op block to change an MSR and set the patch revision to 0x42. This is a newly made patch file! pic.twitter.com/mqFNPjGwSC
— Peter Bosch (@peterbjornx) October 20, 2020
Borrowed some ideas from the "Chernobyl" virus to get into ring0 to do the actual rdmsr/wrmsr in the Win98 environment. Surprising to have StackOverflow point me at a famous virus's source on GitHub for "getting to ring0 on windows 98"...
— Peter Bosch (@peterbjornx) October 20, 2020
Here's the cursed win98 user mode microcode loader source: https://t.co/a6UMfa3hui
— Peter Bosch (@peterbjornx) October 20, 2020
Developed in MSVC 6 on the target :) Tried doing it under a DOS extender first but it would not accept any updates for some reason. pic.twitter.com/wasz8HMqJr
— Peter Bosch (@peterbjornx) October 20, 2020
Proof of concept!
Finally managed to create a patch that will hook CPUID and change the brandstring. Had to first create a patch that would dump the ROM because I did not have the CPU that I actually extracted the mask ROM for. pic.twitter.com/gfWeZZgW6R
— Peter Bosch (@peterbjornx) October 22, 2020
The patch mechanism works using a set of registers that allow trapping control flow to certain MS addresses and redirecting it to new (MSRAM) addresses. To use this, I need to know the address of the target code, this is why I needed a ROM dump.
— Peter Bosch (@peterbjornx) October 22, 2020
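A toy model of the match/redirect mechanism described above: each slot traps one microcode ROM address and sends the sequencer to an MSRAM address instead. The slot count, the class name `PatchUnit` and the example addresses are hypothetical; the point is only that a hook needs the exact ROM address of the code being patched, which is why a ROM dump was required first.

```python
from typing import Dict, Tuple


class PatchUnit:
    """Toy model of match/redirect register pairs (slot count is hypothetical)."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.redirects: Dict[int, int] = {}

    def set_match(self, rom_addr: int, ram_addr: int) -> None:
        """Trap fetches of rom_addr and redirect them to ram_addr in MSRAM."""
        if rom_addr not in self.redirects and len(self.redirects) >= self.slots:
            raise RuntimeError("no free match slots")
        self.redirects[rom_addr] = ram_addr

    def next_fetch(self, uaddr: int) -> Tuple[str, int]:
        """Return which store the sequencer would fetch from, and where."""
        if uaddr in self.redirects:
            return ("MSRAM", self.redirects[uaddr])
        return ("MSROM", uaddr)
```

For example, after `pu = PatchUnit(slots=2)` and `pu.set_match(0x1234, 0x10)` (both addresses made up), a fetch of 0x1234 goes to MSRAM 0x10 while every other address still comes from the ROM.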
Bootstrapping this was somewhat easy because all the Intel patches I have hook wrmsr. What I did was use the SYSENTER MSRs to communicate with my ucode (Win98 does not use them) and write a routine that would read the MSROM and store it to main memory.
— Peter Bosch (@peterbjornx) October 22, 2020
This means I now have a perfect copy of this CPU's microcode ROM. I might create a better dumper and port it to different steppings/cores. To port it to the Pentium III or newer, I first need a mask ROM to determine the patch format/crypto. ucode for the P III should be very similar to the P II.
— Peter Bosch (@peterbjornx) October 22, 2020
And some real ownage
CPUID to get to ring0: I wrote the Pentium II microcode equivalent of RWEverything (https://t.co/MQ5nil0wwp). Pretty fun how easily you can create evil ring 3 (sub)instructions this way... pic.twitter.com/zjUM9vS6z8
— Peter Bosch (@peterbjornx) October 25, 2020
How it all works
Soo, the P6 patch crypto is pretty funky. I'm no cryptography expert, but this seems like an interesting cross between a CFB and CBC mode. The cipher function itself is based around a Galois LFSR which is used not as a PRNG but as a block cipher... (1/2) pic.twitter.com/kPC8XsiTuq
— Peter Bosch (@peterbjornx) October 29, 2020
The actual cipher seeds an LFSR with the plaintext and clocks it 37 times, and then takes the contents of the LFSR and XORs them with the plaintext. The key is used as the taps for the LFSR. If anyone recognizes this mode and/or cipher, I'd love to know what they're called (2/2) pic.twitter.com/Wj05sjotXV
— Peter Bosch (@peterbjornx) October 29, 2020
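A minimal sketch of the core function as described: seed an LFSR with the input word, clock it 37 times using the key as the tap mask, then XOR the register with the input. Assumptions not stated in the thread: a 32-bit register, the right-shifting Galois form, and no chaining (the CFB/CBC-like mode around it is omitted).

```python
MASK32 = 0xFFFFFFFF
ROUNDS = 37  # clock count from the thread


def galois_step(state: int, taps: int) -> int:
    """One clock of a right-shifting 32-bit Galois LFSR with the given tap mask."""
    out = state & 1
    state >>= 1
    if out:
        state ^= taps  # feedback: XOR the taps in when a 1 is shifted out
    return state & MASK32


def lfsr_cipher_word(word: int, key_taps: int) -> int:
    """Seed the LFSR with the word, clock it ROUNDS times, XOR with the word."""
    state = word & MASK32
    for _ in range(ROUNDS):
        state = galois_step(state, key_taps)
    return (state ^ word) & MASK32
```

Note that in this sketch, for a fixed key, every step is linear over GF(2), so `lfsr_cipher_word(a ^ b, k) == lfsr_cipher_word(a, k) ^ lfsr_cipher_word(b, k)`. Structure like that is the kind of thing that makes a cipher of this shape weak against known plaintext.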
They also sample the feedback state and feed it through a LUT to generate a sort of rudimentary integrity check value, which was useful because the same LUT was a part of key derivation. Cracking a single key/IV combo allowed me to get the 17 LUT entries from the check words.
— Peter Bosch (@peterbjornx) October 29, 2020
Some Ghidra fun
Implementing P6 microcode in Ghidra has run into a few issues, as Ghidra does not support word lengths > 8 bytes or tokens longer than 64 bits. Working around this is annoying and means addresses are 10 times what they should be, but it does seem to be working so far. pic.twitter.com/dxdZwV1LE0
— Peter Bosch (@peterbjornx) October 29, 2020
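If the workaround gives each micro-op a 10-byte slot in Ghidra's address space (an assumption on my part that matches the "10 times" remark), converting between Ghidra addresses and real micro-op addresses is a simple scale. The constant and function names below are hypothetical helpers, not part of any Ghidra API.

```python
SLOT_BYTES = 10  # hypothetical: one micro-op per 10-byte slot in Ghidra


def ghidra_to_uaddr(addr: int) -> int:
    """Convert a Ghidra byte address back to a micro-op address."""
    if addr % SLOT_BYTES:
        raise ValueError(f"{addr:#x} is not slot-aligned")
    return addr // SLOT_BYTES


def uaddr_to_ghidra(uaddr: int) -> int:
    """Convert a micro-op address to the scaled Ghidra byte address."""
    return uaddr * SLOT_BYTES
```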
Starting to get the decompiler working in Ghidra, though I haven't yet found a way to mark register parameters as inout or global. (This was originally assembly code, and some "static" register variables are shared between a few functions.) pic.twitter.com/yGx4cmG9MY
— Peter Bosch (@peterbjornx) October 29, 2020
Sources?
Published the source code for my Pentium II patch encryption/decryption tool, sans the 32-bit base keys (these should be easy enough to find, but I did not feel comfortable adding them). https://t.co/rfwzCOAbWV
— Peter Bosch (@peterbjornx) November 3, 2020
To go from the .hex files this produces to something you could start playing around with you will also need to descramble them using https://t.co/XY7WUMmhv6
— Peter Bosch (@peterbjornx) November 3, 2020
Yep. If anyone wants to mess around with microcode, a lot of the tools are on github. Some assembly required but you should be able to get it to work https://t.co/rfwzCOAbWV https://t.co/XY7WUMmhv6 https://t.co/vxioJD2jHC
— Peter Bosch (@peterbjornx) February 12, 2021