
CPUID: Difference between revisions

From dankwiki
Loading the cpuid module creates the nodes <tt>/dev/cpu/*/cpuid</tt>. Good commands for breaking down cpuid data include <tt>cpuid</tt> and Dave Jones's <tt>x86info</tt>. A "cpuid level" field can be found in <tt>/proc/cpuinfo</tt> (checked as of 2.6.25-rc3) for each processor in the system. There are Intel and AMD documents about using cpuid effectively, although they're given short shrift at [http://www.rcollins.org/ddj/Sep96/Sep96.html this page].
==Using [[libtorque]] for system discovery==
A very complete, open source CPUID implementation is available in [[libtorque]] (see [http://github.com/dankamongmen/libtorque/blob/master/src/libtorque/hardware/x86cpuid.c src/libtorque/hardware/x86cpuid.c]). It's up-to-date as of 2009-11 for both Intel and AMD processors, doesn't require root privileges, supports all major affinity APIs, handles SMT, multicore, multiple physical packages, and [[NUMA]], and discovers all caches and TLBs. Furthermore, it performs dynamic topology enumeration.
==CPUID and [[SMP on x86|multiple processors]]==
Some CPUID attributes require multiple CPUID calls to retrieve (often CPUID is first executed with EAX set to 0 to determine the highest request supported, then executed again with specific requests). Behavior is undefined if CPUID information from one processor is used to form requests for another (especially in a heterogeneous system). Furthermore, CPUID must be performed on each processor used -- results for one processor have no general ramifications for others (again, especially in a heterogeneous system). Current AMD and Intel CPUID specifications explicitly cover only homogeneous systems. To perform system-wide detection, either
* the process must start in a [[cpuset]] containing all processors, and the process must bind itself to each CPU in turn
** (see <tt>sched_setaffinity()</tt> on [[Linux APIs|Linux]] and <tt>cpuset_setaffinity()</tt> on [[FreeBSD APIs|FreeBSD]]), or
* the operating system must provide a world-readable interface (i.e., the process never calls CPUID directly)
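As a rough sketch of the first option, the loop below binds the calling process to each CPU in its allowed set in turn, using Python's <tt>os.sched_setaffinity()</tt> (Linux only). The <tt>probe</tt> callback is a hypothetical placeholder for whatever actually executes CPUID on the current processor; Python cannot issue the instruction itself, so only the binding pattern is shown.

```python
import os

def visit_each_cpu(probe):
    """Pin the caller to each allowed CPU in turn and collect probe(cpu) results.

    `probe` is a hypothetical callback; a real one would execute CPUID
    (for example through a C extension) while pinned to `cpu`.
    """
    original = os.sched_getaffinity(0)  # CPUs the caller may currently run on
    results = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})  # run only on this CPU from now on
            results[cpu] = probe(cpu)
    finally:
        os.sched_setaffinity(0, original)  # always restore the starting mask
    return results

if __name__ == "__main__":
    # Demonstration probe: report the affinity mask seen while pinned.
    for cpu, mask in visit_each_cpu(lambda cpu: os.sched_getaffinity(0)).items():
        print(cpu, sorted(mask))
```

Restoring the original mask in the <tt>finally</tt> clause keeps a failed probe from leaving the process stuck on one processor.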
On [[Linux APIs|Linux]], the <tt>/dev/cpu/*/cpuid</tt> device (see <tt>cpuid(4)</tt>) provides an obsolete interface to CPUID. On modern systems, it is chmod'd 0440, and its use is discouraged. Instead, since [http://lkml.indiana.edu/hypermail/linux/kernel/0801.3/2419.html at least 2.6.24] a [[sysfs]] interface has been supported; see <tt>/sys/devices/system/cpu/*/</tt>:<pre>[dumbledore](0) $ for i in /sys/devices/system/cpu/cpu0/cache/*/* /sys/devices/system/cpu/cpu0/topology/* ; do echo -n "`basename $i`: " && cat $i ; done
coherency_line_size: 64
level: 1
number_of_sets: 64
physical_line_partition: 1
shared_cpu_map: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
size: 32K
type: Data
ways_of_associativity: 8
coherency_line_size: 64
level: 1
number_of_sets: 128
physical_line_partition: 1
shared_cpu_map: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
size: 32K
type: Instruction
ways_of_associativity: 4
coherency_line_size: 64
level: 2
number_of_sets: 512
physical_line_partition: 1
shared_cpu_map: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
size: 256K
type: Unified
ways_of_associativity: 8
coherency_line_size: 64
level: 3
number_of_sets: 8192
physical_line_partition: 1
shared_cpu_map: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00005555
size: 8192K
type: Unified
ways_of_associativity: 16
core_id: 0
core_siblings: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00005555
physical_package_id: 1
thread_siblings: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
[dumbledore](0) $ </pre>
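The masks and cache geometry printed above can be decoded mechanically. The following is a minimal sketch, not part of any kernel tooling: <tt>parse_cpumask</tt> and <tt>cache_size_bytes</tt> are hypothetical helper names, and the mask is assumed to be comma-separated 32-bit hexadecimal words with the most significant word first, as in the output above.

```python
def parse_cpumask(text):
    """Return the sorted CPU numbers whose bits are set in a sysfs-style hex mask."""
    # Dropping the commas leaves one big-endian hexadecimal number.
    value = int(text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

def cache_size_bytes(number_of_sets, ways, line_size, partitions=1):
    """Total cache size implied by the geometry fields shown in sysfs."""
    return number_of_sets * ways * line_size * partitions

if __name__ == "__main__":
    print(parse_cpumask("00000000,00000101"))      # thread_siblings, shortened
    print(parse_cpumask("00000000,00005555"))      # core_siblings, shortened
    print(cache_size_bytes(64, 8, 64) // 1024)     # L1 data cache, in KiB
```

For the L1 data cache above, 64 sets * 8 ways * 64-byte lines gives 32768 bytes, matching the reported 32K. The <tt>thread_siblings</tt> mask decodes to CPUs 0 and 8, and the <tt>core_siblings</tt> mask to the eight even-numbered CPUs 0 through 14.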
==CPUID and [[SMP on x86|multiple threads]]==
* The package must support SMT. Check via bit 28 (mask 0x10000000) of the EDX register after CPUID function 0x00000001.
* The processor must support SMT. Check via bits 23:16 of the EBX register after CPUID function 0x00000001. The value must be > 1.
* The number of threads sharing a given cache can be determined via CPUID function 0x00000004, checking EAX bits 25:14 for each cache (the field holds the count minus one).
* The number of threads at a given level of topology can be determined via CPUID function 0x0000000B, checking EBX bits 15:0 for each level; ECX bits 15:8 give the level type (1 for SMT, 2 for core).
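The first three checks reduce to shifts and masks on the returned register values. The sketch below assumes the registers have already been captured as integers by some other means; the function names and the sample values in the usage section are illustrative only. It also relies on the documented encoding of the function 0x00000004 field, which stores the thread count minus one.

```python
def package_supports_smt(edx_fn1):
    """Bit 28 of EDX after CPUID function 0x00000001."""
    return bool((edx_fn1 >> 28) & 1)

def logical_processor_count(ebx_fn1):
    """Bits 23:16 of EBX after CPUID function 0x00000001."""
    return (ebx_fn1 >> 16) & 0xFF

def threads_sharing_cache(eax_fn4):
    """Bits 25:14 of EAX after CPUID function 0x00000004, plus one."""
    return ((eax_fn4 >> 14) & 0xFFF) + 1

if __name__ == "__main__":
    # Illustrative register values, not taken from any particular machine.
    edx, ebx, eax = 0x10000000, 0x00100800, 0x1C004121
    if package_supports_smt(edx) and logical_processor_count(ebx) > 1:
        print("SMT-capable,", threads_sharing_cache(eax), "threads share this cache")
```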


==See Also==
* [http://www.intel.com/Assets/PDF/appnote/241618.pdf Intel Application Note 485], "Intel Processor Identification and the CPUID Instruction" (2009-08)
* [http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25481.pdf AMD Publication 25481], "AMD CPUID Specification" (2008-04)
[[Category: x86]]

Latest revision as of 03:49, 4 August 2011

Availability of cpuid

It's ironic that Intel claims that "any other approach may produce unpredictable results," since its algorithm is prone to failures that yield unpredictable results (as I'll demonstrate in this article). For more information on CPUID, see the text box "Pentium Detection," by Robert Moote (which accompanied the article "Processor-Detection Schemes," by Richard C. Leinecker, DDJ, June 1993).

The Intel algorithm relies on a series of PUSHF/POPF instructions to set and clear various FLAGs bits. Each generation of processor has a slightly different behavior which may be detected by this approach. This algorithm makes no attempt to detect the 80186/88 series of processors. In this regard, the algorithm is incomplete.

The 8086/88 is distinguished from the 80286 by attempting to clear bits 12-15 of the FLAGs register. The 8086/88 will always set these bits, regardless of what values are popped into them (see Listing One). The 286 treats these bits differently. In real mode, these bits are always cleared by the 286; in protected mode, they are used for IOPL (I/O Privilege Level) and NT (Nested Task). To continue the detection code, you need to set bits 12-15 in the FLAGs register, and see if they are cleared by the processor. If they are, then a 286 has been detected (see Listing Two).

If you get to this point in the algorithm, you know you have at least a 386. Therefore, it is safe to use 32-bit instructions, like PUSHFD. This will be necessary in detecting the difference between a 386 and 486. These processors are distinguished from each other by attempting to set the AC flag in the EFLAGS register. This flag was introduced in the 486. The 386 never sets this bit, and always clears it when it is set by POPFD. Therefore, to detect the difference between these processor generations, the algorithm attempts to set this bit and see if it is latched or cleared by the processor (see Listing Three).

At this point in the algorithm, you're almost home. To detect the difference between the 486 and the Pentium, you attempt to set another new EFLAGS bit (bit 21) called the "ID flag." This flag has only one purpose: to indicate the presence of the CPUID instruction. This bit was first introduced on the Pentium, but later retrofitted into the 486. If the CPUID instruction exists on either processor, it may be executed to return the processor-identification information. 486s without the CPUID instruction will not be able to toggle this bit. Therefore, it is safe to execute a sequence of instructions on either processor that detects the processor's ability to toggle this bit.

Once the algorithm gets to this point, you can execute the CPUID instruction to obtain the processor identification. This instruction can be run in any processor mode, at any privilege level. On the Pentium and 486, the CPUID instruction has two levels:

  • Level 0 returns a vendor ID string in EBX:EDX:ECX, which says "GenuineIntel" when printed as ASCII text.
  • Level 1 returns the processor identification signature - the same signature that appears in the EDX register after a processor RESET (see Listing Five).
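
The two levels can be illustrated by decoding their register values after the fact. This is a minimal sketch that assumes the registers were captured elsewhere and passed in as integers; the function names are hypothetical. The signature layout used here (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, type in bits 13:12 of EAX) is the basic Pentium/486-era layout and ignores the extended family and model fields added on later processors.

```python
import struct

def vendor_string(ebx, edx, ecx):
    """Join the level 0 registers, in EBX:EDX:ECX order, into ASCII text."""
    # Each register holds four characters, lowest byte first.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def decode_signature(eax):
    """Split the level 1 signature into its basic fields."""
    return {
        "stepping": eax & 0xF,
        "model": (eax >> 4) & 0xF,
        "family": (eax >> 8) & 0xF,
        "type": (eax >> 12) & 0x3,
    }

if __name__ == "__main__":
    print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
    print(decode_signature(0x0543))  # illustrative signature value
```

With the register values shown, the vendor string decodes to "GenuineIntel", and the illustrative signature 0x0543 decodes to family 5, model 4, stepping 3.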

The complete Intel algorithm is available in AP-485, or via anonymous FTP at ftp://ftp.intel.com/pub/IAL/tools_utils_demos/cpuid3.zip.

See Also