
Threadripper L3 CPUID Strangeness

From dankwiki
Revision as of 09:38, 5 February 2021

dankblog! 2021-02-05, 0356 EDT, at the danktower

I was updating copyrights upon my libtorque, a project from 2010's CSE 6230 with Professor Richard Vuduc. libtorque was tremendous fun to work on, and resulted in me distilling many thoughts that I'd been kicking around for a few years, but I consider it a research project and not an industrial-strength library. I don't touch it terribly often, though I do check for compiler warnings every few gcc releases.

On my AMD 3970X Threadripper, the archdetect program included with libtorque failed out. I traced this down to the function decode_amd_l23cache() in my x86 hardware discovery code. Intel and AMD caches were at one time defined by a disordered set of mappings from integers to complete cache descriptions, each integer meaning a completely different set of cache parameters, leading to code like:

 { .descriptor = 0xe2,
   .linesize = 64,
   .totalsize = 2 * 1024 * 1024,
   .associativity = 16,
   .level = 3,
   .memtype = MEMTYPE_UNIFIED,
 },
 { .descriptor = 0xe3,
   .linesize = 64,
   .totalsize = 4 * 1024 * 1024,
   .associativity = 16,
   .level = 3,
   .memtype = MEMTYPE_UNIFIED,
 },

and to a great deal of misery and frustration, and dreams of becoming a stripper because programming sucks, and most importantly to failing every time a new microarchitecture employed a new cache size and thus got a new CPUID descriptor.
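For the uninitiated, here's roughly how such a table gets consumed. This is a minimal sketch with hypothetical names (struct cache_desc, lookup_descriptor()), not libtorque's actual code: the CPU hands you a byte, you search for it, and any byte the table has never heard of falls straight off the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; libtorque's real definitions may differ. */
enum memtype { MEMTYPE_UNIFIED /* other members omitted */ };

struct cache_desc {
	uint8_t descriptor;      /* one-byte descriptor reported by the CPU */
	unsigned linesize;       /* bytes per line */
	size_t totalsize;        /* total bytes */
	unsigned associativity;  /* ways */
	unsigned level;          /* 1, 2, 3... */
	enum memtype memtype;
};

/* The two entries from the fragment above; a real table needs one
 * entry per descriptor any vendor ever shipped. */
static const struct cache_desc descriptors[] = {
	{ .descriptor = 0xe2, .linesize = 64, .totalsize = 2 * 1024 * 1024,
	  .associativity = 16, .level = 3, .memtype = MEMTYPE_UNIFIED, },
	{ .descriptor = 0xe3, .linesize = 64, .totalsize = 4 * 1024 * 1024,
	  .associativity = 16, .level = 3, .memtype = MEMTYPE_UNIFIED, },
};

/* Linear search over known descriptors. Returns NULL for anything not in
 * the table, which is what happens when a new part ships a new cache size. */
const struct cache_desc *lookup_descriptor(uint8_t d) {
	for (size_t i = 0; i < sizeof(descriptors) / sizeof(descriptors[0]); ++i) {
		if (descriptors[i].descriptor == d) {
			return &descriptors[i];
		}
	}
	return NULL;
}
```

Looking up 0xe3 hands back the 4MB entry; looking up 0xe4, which isn't in this two-entry table, returns NULL, and that NULL is the brittleness in a nutshell. Linear search is fine here, since such tables are small and presumably consulted once during discovery.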

Thankfully, both Intel and AMD unfucked themselves late in the aughts and introduced more sensibly structured CPUID results. AMD provides leaf 0x80000006, "L2/L3 Cache and TLB Identification", whose EDX register returns "L3 Cache Identifiers", structured thusly:

 Bits   Description
 31:18  L3Size: L3 cache size. Specifies that the L3 cache size is within the following range:
          (L3Size[31:18] * 512KB) ≤ L3 cache size < ((L3Size[31:18] + 1) * 512KB)
 17:16  Reserved.
 15:12  L3Assoc: L3 cache associativity:
          0h  L2/L3 cache or TLB is disabled.
          1h  Direct mapped.          2h  2-way associative.
          4h  4-way associative.      6h  8-way associative.
          8h  16-way associative.     Ah  32-way associative.
          Bh  48-way associative.     Ch  64-way associative.
          Dh  96-way associative.     Eh  128-way associative.
          Fh  Fully associative.
          All other encodings are reserved.
 11:8   L3LinesPerTag: L3 cache lines per tag.
 7:0    L3LineSize: L3 cache line size in bytes.
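A minimal sketch of decoding the EDX layout above in C. The struct and function names are hypothetical (not libtorque's), and every encoding the table calls reserved deliberately decodes to 0 ways rather than a guess, so a caller can tell "I don't know" apart from a real answer.

```c
#include <stdint.h>

#define ASSOC_FULLY 0xFFFFu /* hypothetical sentinel for Fh, "fully associative" */

/* Maps the 4-bit L3Assoc encoding to a way count. 0h (disabled) and
 * every reserved encoding map to 0. */
static unsigned amd_assoc_ways(unsigned enc) {
	static const unsigned ways[16] = {
		[0x1] = 1,  [0x2] = 2,   [0x4] = 4,  [0x6] = 8,
		[0x8] = 16, [0xA] = 32,  [0xB] = 48, [0xC] = 64,
		[0xD] = 96, [0xE] = 128, [0xF] = ASSOC_FULLY,
	};
	return ways[enc & 0xF];
}

/* Hypothetical decoded form of CPUID 0x80000006:EDX. */
struct amd_l3_info {
	uint64_t min_size;      /* bytes; true size is in [min_size, min_size + 512KB) */
	unsigned assoc_enc;     /* raw bits 15:12 */
	unsigned ways;          /* decoded from assoc_enc; 0 if disabled/reserved */
	unsigned lines_per_tag; /* bits 11:8 */
	unsigned line_size;     /* bits 7:0, in bytes */
};

struct amd_l3_info decode_l3_edx(uint32_t edx) {
	struct amd_l3_info info;
	info.min_size = (uint64_t)(edx >> 18) * 512 * 1024; /* bits 31:18, 512KB units */
	info.assoc_enc = (edx >> 12) & 0xF;
	info.ways = amd_assoc_ways(info.assoc_enc);
	info.lines_per_tag = (edx >> 8) & 0xF;
	info.line_size = edx & 0xFF;
	return info;
}
```

Keeping the raw assoc_enc next to the decoded ways means a reserved encoding can still be reported verbatim when one turns up.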

Sounds good, and I've used this for over a decade to size up my MLCs and LLCs, being a faithful acolyte of pebbling engineering.

As you can see, my 3970X ate shit and died with the last words "mask: 9 size: 134217728 lsize: 64 assoc: 0 lines: 2097152", indicating a failure to discover associativity from 0x80000006:EDX. And indeed, what the fuck, 0x04009140 in EDX. Now, unless the New Math has changed things up, one extracts bits 15:12 by shifting right 12 and masking against 0xF, yielding... 9. Scrotumtightening shitfucker! 9! I assumed I'd fucked up the CPUID state machine somehow, and consulted cpuid -r:

 0x80000006 0x00: eax=0x48006400 ebx=0x68006400 ecx=0x02006140 edx=0x04009140

Mother of God, I thought. I've forgotten how to mask bits, or perhaps how to count to 12, or maybe even how to spell edx. I went to check the decoded cpuid output...

   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x40 (64)
      lines per tag         = 0x1 (1)
      associativity         = 0x9 (9)
      size (in 512KB units) = 0x100 (256)
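Before blaming AMD, it's worth checking the New Math mechanically. Here's a minimal sketch with a hypothetical helper, cpuid_bits(), that extracts an inclusive bit range, meant to be applied to the ECX and EDX values from the cpuid -r dump. The ECX half assumes AMD's L2 layout for the same leaf (L2Size in KB at bits 31:16, L2Assoc at 15:12 using the same encodings); the optional reader uses __get_cpuid() from GCC/Clang's <cpuid.h> and only builds on x86.

```c
#include <stdint.h>

/* Hypothetical helper: returns bits hi:lo (inclusive) of v,
 * e.g. cpuid_bits(edx, 15, 12) for L3Assoc. Requires hi >= lo. */
uint32_t cpuid_bits(uint32_t v, unsigned hi, unsigned lo) {
	unsigned width = hi - lo + 1;
	/* avoid the undefined 1u << 32 when extracting all 32 bits */
	uint32_t mask = width >= 32 ? 0xFFFFFFFFu : (1u << width) - 1;
	return (v >> lo) & mask;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Reads leaf 0x80000006 on the running CPU via <cpuid.h>;
 * returns 0 if the extended leaf isn't supported. */
int read_leaf_80000006(uint32_t *ecx, uint32_t *edx) {
	unsigned int eax, ebx, c, d;
	if (!__get_cpuid(0x80000006u, &eax, &ebx, &c, &d)) {
		return 0;
	}
	*ecx = c;
	*edx = d;
	return 1;
}
#endif
```

Applied to the dump, cpuid_bits(0x02006140, 15, 12) gives 6h, a perfectly sane 8-way L2 of cpuid_bits(0x02006140, 31, 16) = 512KB, while cpuid_bits(0x04009140, 15, 12) still gives 9, with L3Size at bits 31:18 coming out to 256 (128MB). The shift is fine; the 9 is real.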

Dogs fucked the Pope; no fault of mine! Now, hardware architects can do some very strange things, and indeed I once saw a Broadwell-era Intel i7 with a TLB that was 6-way associative, and the 96KB L2 of the Alpha 21164 was famously ménage à trois-associative, as befitted the California culture of the mid-90s. Tupac was still alive (doing 187s), big Pete Wilson was governing (proposing 187s), and the homeless were conveniently hidden behind piles of AOL install media (each man, woman, and child on earth had approximately 187 AOL CDs). I once mailed Yale Patt about the 3-way associativity, and he responded with twelve pages of baseball stats descending into a challenge of pistols at dawn. "ps I'll predict your branches you brain-dead ass-eyed Atlanta son of a bitch. pps Do you know where I can find any more female GRAs? Mine have all left." But I digress.

So... what's goin' on here? I've gotta get back to profitable work for the moment, but watch this space for the inevitable solution. Hack on!