


How Many Registers Do Modern Cpus Have

ENOSUCHBLOG

Programming, philosophy, pedaling.


How many registers does an x86-64 CPU have?

Nov 30, 2020 Tags: programming, x86

x86 is back in the general programmer discourse, in part thanks to Apple's M1 and Rosetta 2. As such, I figured I'd do yet another x86-64 post.

Just like the last one, I'm going to cover a facet of the x86-64 ISA that sets it apart as unusually complex among modern ISAs: the number and diversity of registers available.

Like instruction counting, register counting on x86-64 is subject to debates over methodology. In particular, for this blog post, I'm going to lay the following ground rules:

  • I will count sub-registers (e.g., EAX for RAX) as distinct registers. My justification: they have different instruction encodings, and both Intel and AMD optimize/pessimize particular sub-register use patterns in their microcode.

  • I will count registers that are present on x86-64 CPUs, but that can't be used in long mode.

  • I won't count registers that are only present on older x86 CPUs, like the 80386 and 80486 test registers.

  • I won't count microarchitectural implementation details, like shadow registers.

  • I will count registers that aren't directly addressable, like MSRs that can only be accessed through RDMSR. However, I won't (or will try not to) double-count registers that have multiple access mechanisms (like RDMSR and RDTSC).

  • I won't count model-specific registers that fall into these categories:

    • MSRs that are only present on niche x86 vendors (Cyrix, Via)
    • MSRs that aren't widely available on recent-ish x86-64 CPUs
      • Errata: I accidentally included AVX-512 in some of the original counts below, not realizing that it hadn't been released on any AMD CPUs. The post has been updated.
    • MSRs that are completely undocumented (both officially and unofficially)

In addition to the rules above, I'm going to apply the following considerations and methodology for grouping registers together:

  • Many sources, both official and unofficial, use "model-specific register" as an umbrella term for any non-core or non-feature-set register supplied by an x86-64 CPU. Whenever possible, I'll try to avoid this in favor of more specific categories.

  • Both Intel and AMD provide synonyms for registers (e.g., CR8 as the "task priority register," or TPR). Whenever possible, I'll try to use the more generic/category-conforming name (like CR8 in the example above).

  • In general, the individual cores of a multicore processor have independent register states. Whenever this isn't the case, I'll make an effort to document it.


General-purpose registers

The general-purpose registers (or GPRs) are the primary registers in the x86-64 register model. As their name implies, they are the only registers that are general purpose: each has a set of conventional uses[1], but programmers are generally free to ignore those conventions and use them as they please[2].

Because x86-64 evolved from a 32-bit ISA which in turn evolved from a 16-bit ISA, each GPR has a set of subregisters that hold the lower 8, 16, and 32 bits of the full 64-bit register.

As a table:

64-bit  32-bit  16-bit  8-bit (low)
RAX     EAX     AX      AL
RBX     EBX     BX      BL
RCX     ECX     CX      CL
RDX     EDX     DX      DL
RSI     ESI     SI      SIL
RDI     EDI     DI      DIL
RBP     EBP     BP      BPL
RSP     ESP     SP      SPL
R8      R8D     R8W     R8B
R9      R9D     R9W     R9B
R10     R10D    R10W    R10B
R11     R11D    R11W    R11B
R12     R12D    R12W    R12B
R13     R13D    R13W    R13B
R14     R14D    R14W    R14B
R15     R15D    R15W    R15B

Some of the 16-bit subregisters are also special: the original 8086 allowed the high byte of AX, BX, CX, and DX to be accessed independently, so x86-64 preserves this for some encodings:

16-bit  8-bit (high)
AX      AH
BX      BH
CX      CH
DX      DH

So that's 16 full-width GPRs, fanning out to another 52 subregisters.
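To make the overlap concrete, here's a minimal Python sketch that treats a 64-bit GPR as a plain integer and slices out its subregisters with masks and shifts. The `subregisters` helper and the constant names are hypothetical, and the sketch only models the bit layout, not real register behavior:

```python
def subregisters(rax: int) -> dict[str, int]:
    """Slice the legacy views out of a 64-bit RAX value (bit layout only)."""
    rax &= (1 << 64) - 1              # clamp to 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,     # bits 0-31
        "AX": rax & 0xFFFF,           # bits 0-15
        "AH": (rax >> 8) & 0xFF,      # bits 8-15: the 8086-era high byte
        "AL": rax & 0xFF,             # bits 0-7
    }


# 16 full-width GPRs, three lower views each, plus AH/BH/CH/DH.
FULL_WIDTH = 16
SUBREGISTERS = FULL_WIDTH * 3 + 4

views = subregisters(0x1122_3344_5566_7788)
print({name: hex(value) for name, value in views.items()})
print(FULL_WIDTH + SUBREGISTERS)
```

One wrinkle the sketch deliberately leaves out: in 64-bit mode, writing a 32-bit subregister like EAX zero-extends the result into all of RAX, while writes to the 8- and 16-bit subregisters leave the remaining upper bits untouched.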

Registers in this group: 68.

Running total: 68.

Special registers

This is sort of an artificial category: like every ISA, x86-64 has a few "special" registers that keep things moving along. In particular:

  • The instruction pointer, or RIP.

    x86-64 has 32- and 16-bit variants of RIP (EIP and IP), but I'm not going to count them as separate registers: they have identical encodings and can't be used in the same CPU mode[3].

  • The status register, or RFLAGS.

    Just like RIP, RFLAGS has 32- and 16-bit counterparts (EFLAGS and FLAGS). Unlike RIP, these counterparts can be partially mixed: PUSHF and PUSHFQ are both valid in long mode, and LAHF/SAHF can operate on the bits of FLAGS on some x86-64 CPUs outside of compatibility mode[4]. So I'm going to go ahead and count them.
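Because EFLAGS and FLAGS are just the low 32 and 16 bits of RFLAGS, that kind of mixing is easy to model. Here's a minimal Python sketch, assuming the SF:ZF:0:AF:0:PF:1:CF layout that LAHF copies into AH; the helper names are hypothetical:

```python
# Arithmetic flags LAHF copies from FLAGS into AH:
# CF (bit 0), PF (bit 2), AF (bit 4), ZF (bit 6), SF (bit 7).
LAHF_MASK = (1 << 0) | (1 << 2) | (1 << 4) | (1 << 6) | (1 << 7)  # 0xD5
RESERVED_ONE = 1 << 1  # bit 1 of FLAGS always reads as 1


def flags_views(rflags: int) -> tuple[int, int, int]:
    """Return (RFLAGS, EFLAGS, FLAGS): each is a low-bit view of the one before."""
    return rflags, rflags & 0xFFFF_FFFF, rflags & 0xFFFF


def lahf(rflags: int) -> int:
    """Model the byte LAHF writes into AH (bit layout only)."""
    return (rflags & LAHF_MASK) | RESERVED_ONE


print(hex(lahf(0x246)))  # ZF and PF set, plus the always-one bit
```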

Registers in this group: 4.

Running total: 72.

Segment registers

x86-64 has a total of 6 segment registers: CS, SS, DS, ES, FS, and GS. Their operation varies with the CPU's mode:

  • In all modes except for long mode, each segment register holds a selector, which indexes into either the GDT or LDT. That yields a segment descriptor which, among other things, supplies the base address and extent of the segment.

  • In long mode, all but FS and GS are treated as having a base address of zero and a 64-bit extent, effectively producing a flat address space. FS and GS are retained as special cases, but no longer use the segment descriptor tables: instead, they access base addresses that are stored in the FSBASE and GSBASE model-specific registers[5]. More on those later.
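Here's a minimal Python sketch of long-mode linear address formation under those rules. The function and its `fs_base`/`gs_base` parameters are hypothetical stand-ins for the values held in FSBASE and GSBASE, and canonical-address checks are omitted:

```python
MASK64 = (1 << 64) - 1
FLAT_SEGMENTS = {"CS", "SS", "DS", "ES"}


def linear_address(segment: str, offset: int,
                   fs_base: int = 0, gs_base: int = 0) -> int:
    """Compute a long-mode linear address (canonical-form checks omitted)."""
    if segment in FLAT_SEGMENTS:
        base = 0                      # treated as base zero, no limit check
    elif segment == "FS":
        base = fs_base                # stand-in for IA32_FS_BASE
    elif segment == "GS":
        base = gs_base                # stand-in for IA32_GS_BASE
    else:
        raise ValueError(f"unknown segment register: {segment}")
    return (base + offset) & MASK64   # address arithmetic wraps at 64 bits


print(hex(linear_address("FS", 0x28, fs_base=0x7F00_0000_0000)))
```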

Registers in this group: 6.

Running total: 78.

SIMD and FP registers

The x86 family has gone through several generations of SIMD and floating-point instruction groups, each of which has introduced, extended, or re-contextualized various registers:

  • x87
  • MMX
  • SSE (SSE2, SSE3, SSE4.1, SSE4.2, …)
  • AVX (AVX2, AVX512)

Let's do them in rough order.

x87

Originally a discrete coprocessor with its own instruction set and register file, the x87 instructions have been regularly baked into x86 cores themselves since the 80486.

Because of its coprocessor history, x87 defines both normal registers[6] (akin to GPRs) and a variety of special registers needed to control the FPU state:

  • ST0 through ST7: 8 80-bit floating-point registers
  • FPSW, FPCW, FPTW[7]: Status, control, and tag-word registers
  • "Data operand pointer": I don't know what this one does, but the Intel SDM specifies it[8]
  • Instruction pointer: the x87 state machine apparently holds its own copy of the current x87 instruction
  • Last instruction opcode: this is apparently distinct from the x87 opcode, and has its own register
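The "stack" in the x87 register stack is relative: ST(i) names a physical register R0 through R7 by its offset from TOP, a 3-bit field in bits 11 to 13 of the status word, while the tag word keeps 2 bits of state per physical register. A minimal Python sketch, assuming the full 16-bit tag word layout (the form FNSTENV stores; FXSAVE keeps an abridged 8-bit version); the helper names are hypothetical:

```python
# Tag encodings for each 2-bit field of the full x87 tag word (FPTW).
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def top(fpsw: int) -> int:
    """TOP: the 3-bit stack-top pointer in bits 11-13 of the status word."""
    return (fpsw >> 11) & 0b111


def physical_index(fpsw: int, i: int) -> int:
    """Map stack-relative ST(i) to its physical register number R0-R7."""
    return (top(fpsw) + i) % 8


def tag_of_st(fpsw: int, ftw: int, i: int) -> str:
    """Look up the tag of ST(i); the tag word is indexed by physical register."""
    phys = physical_index(fpsw, i)
    return TAG_NAMES[(ftw >> (2 * phys)) & 0b11]


fpsw = 5 << 11                       # TOP = 5
print(physical_index(fpsw, 3))       # ST(3) wraps around to R0
print(tag_of_st(fpsw, 0xFFFC, 3))    # in this tag word only R0 is tagged valid
```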

Registers in this group: 14.

Running total: 92.

MMX

MMX was Intel's first attempt at consumer SIMD in their x86 chips, released back in 1997.

For design reasons that are a complete mystery to me, the MMX registers are actually sub-registers of the x87 STn registers: each 64-bit MMn occupies the mantissa component of its corresponding STn. Consequently, x86 (and x86-64) CPUs cannot execute MMX and x87 instructions at the same time.
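The aliasing can be sketched in Python by modeling an 80-bit x87 register as a 64-bit significand plus a 16-bit sign/exponent field. The `X87Register` class is hypothetical; the detail that an MMX write sets bits 64 to 79 to all ones follows Intel's description of MMX/x87 interaction:

```python
class X87Register:
    """Hypothetical model of one 80-bit STn register shared with MMn."""

    def __init__(self) -> None:
        self.significand = 0      # bits 0-63: this is where MMn lives
        self.sign_exponent = 0    # bits 64-79

    def write_mmx(self, value: int) -> None:
        """Model an MMX write to the aliased register."""
        self.significand = value & ((1 << 64) - 1)
        self.sign_exponent = 0xFFFF   # MMX writes set bits 64-79 to all ones

    def read_mmx(self) -> int:
        return self.significand

    def as_80_bits(self) -> int:
        """The same storage as the x87 side sees it."""
        return (self.sign_exponent << 64) | self.significand


reg = X87Register()
reg.write_mmx(0x0102_0304_0506_0708)
print(hex(reg.as_80_bits()))
```

With the exponent forced to all ones, the register no longer reads as an ordinary floating-point number from the x87 side, which is part of why MMX code is expected to execute EMMS (marking every x87 register empty) before x87 code runs again.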

Edit: This department incorrectly included MXCSR, which was actually introduced with SSE. Thanks to /u/Skorezore for pointing out the mistake.

Registers in this group: 8.

Running total: 100.

SSE and AVX

For simplicity's sake, I'm going to wrap SSE and AVX into a single section: they use the same sub-register pattern as the GPRs and x87/MMX do, so they fit well into a single table:

AVX-512 (512-bit)  AVX (256-bit)  SSE (128-bit)
ZMM0               YMM0           XMM0
ZMM1               YMM1           XMM1
ZMM2               YMM2           XMM2
ZMM3               YMM3           XMM3
ZMM4               YMM4           XMM4
ZMM5               YMM5           XMM5
ZMM6               YMM6           XMM6
ZMM7               YMM7           XMM7
ZMM8               YMM8           XMM8
ZMM9               YMM9           XMM9
ZMM10              YMM10          XMM10
ZMM11              YMM11          XMM11
ZMM12              YMM12          XMM12
ZMM13              YMM13          XMM13
ZMM14              YMM14          XMM14
ZMM15              YMM15          XMM15
ZMM16              YMM16          XMM16
ZMM17              YMM17          XMM17
ZMM18              YMM18          XMM18
ZMM19              YMM19          XMM19
ZMM20              YMM20          XMM20
ZMM21              YMM21          XMM21
ZMM22              YMM22          XMM22
ZMM23              YMM23          XMM23
ZMM24              YMM24          XMM24
ZMM25              YMM25          XMM25
ZMM26              YMM26          XMM26
ZMM27              YMM27          XMM27
ZMM28              YMM28          XMM28
ZMM29              YMM29          XMM29
ZMM30              YMM30          XMM30
ZMM31              YMM31          XMM31

In other words: the lower half of each ZMMn is YMMn, and the lower half of each YMMn is XMMn. There's no direct register access for just the upper half of YMMn, nor does ZMMn have direct 256- or 128-bit access for the chunks of its upper half.
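A minimal Python sketch of that nesting, assuming one 64-byte buffer per ZMM register with YMM and XMM exposed as views over its low 32 and 16 bytes (the `VectorRegister` class is hypothetical):

```python
class VectorRegister:
    """Hypothetical model: ZMMn storage with YMMn/XMMn as low-byte views."""

    def __init__(self) -> None:
        self.zmm = bytearray(64)            # 512 bits

    @property
    def ymm(self) -> memoryview:
        return memoryview(self.zmm)[:32]    # low 256 bits, shares storage

    @property
    def xmm(self) -> memoryview:
        return memoryview(self.zmm)[:16]    # low 128 bits, shares storage


reg = VectorRegister()
reg.xmm[0] = 0xAB                           # a write through the XMM view...
print(hex(reg.zmm[0]), hex(reg.ymm[0]))     # ...is visible through the others
```

The sketch only models the overlap, not write semantics: for example, VEX-encoded 128-bit instructions zero the bits above the XMM destination, while legacy SSE instructions leave them alone.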

SSE also defines a new status register, MXCSR, that contains flags roughly parallel to the arithmetic flags in RFLAGS (along with floating-point flags in the x87 status word). SSE also introduces a load/store instruction pair for manipulating it (LDMXCSR and STMXCSR).
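MXCSR can be decoded with a few shifts and masks. A minimal Python sketch, assuming the standard layout (six sticky exception flags in bits 0 to 5, DAZ in bit 6, six exception masks in bits 7 to 12, rounding control in bits 13 and 14, and FTZ in bit 15); `decode_mxcsr` is a hypothetical helper:

```python
EXCEPTIONS = ["IE", "DE", "ZE", "OE", "UE", "PE"]   # invalid, denormal, zero-div, overflow, underflow, precision
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_mxcsr(mxcsr: int) -> dict:
    return {
        # Bits 0-5: sticky flags, set when an exception condition occurs.
        "flags": [n for i, n in enumerate(EXCEPTIONS) if (mxcsr >> i) & 1],
        "daz": bool((mxcsr >> 6) & 1),               # denormals-are-zero
        # Bits 7-12: a set bit masks (suppresses) the matching exception.
        "masked": [n for i, n in enumerate(EXCEPTIONS) if (mxcsr >> (7 + i)) & 1],
        "rounding": ROUNDING[(mxcsr >> 13) & 0b11],  # bits 13-14
        "ftz": bool((mxcsr >> 15) & 1),              # flush-to-zero
    }


print(decode_mxcsr(0x1F80))
```

Decoding 0x1F80, the MXCSR value after reset, shows every exception masked, no flags raised, and round-to-nearest.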

AVX-512 also introduces eight opmask registers, k0 through k7. k0 is a special case that behaves somewhat like the "zero" register on some RISC ISAs: the opmask instructions can still read and write it, but it can't be used as a write mask, because that encoding means "no masking," which behaves like a mask of all ones.
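The opmask registers are easiest to understand through merge-masking: each mask bit decides whether a lane takes the new result or keeps the destination's old value, and selecting k0 as the mask means no masking, equivalent to a mask of all ones. A minimal Python sketch, with `masked_merge` as a hypothetical helper (zeroing-masking, which writes zeros instead of keeping old values, is not modeled):

```python
from typing import Optional


def masked_merge(dest: list[int], result: list[int], mask: Optional[int]) -> list[int]:
    """Model merge-masking over equal-length lane lists.

    mask=None stands in for selecting k0, i.e. no masking at all.
    """
    if mask is None:
        return list(result)
    merged = []
    for lane, (old, new) in enumerate(zip(dest, result)):
        # Bit `lane` of the mask picks the new result; otherwise keep the old value.
        merged.append(new if (mask >> lane) & 1 else old)
    return merged


print(masked_merge([0, 0, 0, 0], [1, 2, 3, 4], 0b0101))
```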

Errata: The table above includes AVX-512, which isn't available on any AMD CPUs as of 2020. I've updated the counts below to only include SSE- and AVX-introduced registers.

Registers in this group: 33.

Running total: 133.

Bounds registers

Intel added these with MPX, which was intended to offer hardware-accelerated bounds checking. Nobody uses it, since it doesn't work very well. But x86 is eternal and slow to fix mistakes, so we'll probably have these registers taking up space for at least a while longer:

  • BND0 through BND3: Individual 128-bit registers, each containing a pair of addresses for a bound.
  • BNDCFGS: Bound configuration, kernel mode.
  • BNDCFGU: Bound configuration, user mode.
  • BNDSTATUS: Bound status, after a #BR is raised.
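Each BNDn pairs a lower bound with an upper bound, and Intel stores the upper bound in one's-complement form so that an all-zeros register (the initial state) permits every address. A minimal Python sketch of that check, where `BoundRegister` and its methods are hypothetical and only approximate what BNDMK, BNDCL, and BNDCU do:

```python
MASK64 = (1 << 64) - 1


class BoundRegister:
    """Hypothetical model of one BNDn register."""

    def __init__(self, lower: int = 0, upper_complement: int = 0) -> None:
        self.lower = lower
        self.upper_complement = upper_complement  # upper bound, one's complement

    @classmethod
    def make(cls, base: int, size: int) -> "BoundRegister":
        """Roughly what BNDMK does: bound [base, base + size - 1]."""
        upper = (base + size - 1) & MASK64
        return cls(base, ~upper & MASK64)

    def in_bounds(self, addr: int) -> bool:
        """False wherever a BNDCL or BNDCU check would raise #BR."""
        upper = ~self.upper_complement & MASK64
        return self.lower <= addr <= upper


buf = BoundRegister.make(0x1000, 0x100)
print(buf.in_bounds(0x10FF), buf.in_bounds(0x1100))
```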

Registers in this group: 7.

Running total: 140.

Debug registers

These are what they sound like: registers that aid and abet software debuggers, like GDB.

There are six debug registers of two types:

  • DR0 through DR3 contain linear addresses, each of which is associated with a breakpoint condition.

  • DR6 and DR7 are the debug status and control registers. DR6's lower bits indicate which debug conditions were encountered (upon entering the debug exception handler), while DR7 controls which breakpoint addresses are enabled and their breakpoint conditions (e.g., when a particular address is written to).
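DR7's layout is regular enough to build by hand: breakpoint n has a local enable bit at 2n and a global one at 2n+1, plus a 2-bit R/W condition at bit 16+4n and a 2-bit length at bit 18+4n. A minimal Python sketch of setting one breakpoint, where `dr7_enable` is a hypothetical helper and the encodings follow the Intel SDM's DR7 description:

```python
# R/W condition encodings ("io" is only valid when CR4.DE is set).
RW = {"execute": 0b00, "write": 0b01, "io": 0b10, "read_write": 0b11}
# LEN encodings in bytes (8-byte length assumes 64-bit mode; execute needs 1).
LEN = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_enable(dr7: int, n: int, condition: str, length: int, local: bool = True) -> int:
    """Return DR7 with breakpoint n (0-3) enabled for the given condition."""
    if not 0 <= n <= 3:
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    enable_bit = 2 * n + (0 if local else 1)      # Ln / Gn bits
    field_shift = 16 + 4 * n                      # R/Wn at 16+4n, LENn at 18+4n
    dr7 &= ~(0b1111 << field_shift)               # clear any old R/W and LEN
    dr7 |= 1 << enable_bit
    dr7 |= (RW[condition] | (LEN[length] << 2)) << field_shift
    return dr7


print(hex(dr7_enable(0, 0, "write", 4)))
```

For example, `dr7_enable(0, 0, "write", 4)` yields 0xD0001: a local 4-byte write watchpoint on the address in DR0.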

What about DR4 and DR5? For reasons that are unclear to me, they don't (and have never) existed[9]. They do have encodings, but are treated as DR6 and DR7, respectively, or produce a #UD exception when CR4.DE[bit 3] = 1.

Registers in this group: 6.

Running total: 146.

Control registers

x86-64 defines a set of control registers that can be used to manage and inspect the state of the CPU.

There are 16 "main" control registers, all of which can be accessed with a MOV variant:

Name  Purpose
CR0   Basic CPU operation flags
CR1   Reserved
CR2   Page-fault linear address
CR3   Virtual addressing state
CR4   Protected mode operation flags
CR5   Reserved
CR6   Reserved
CR7   Reserved
CR8   Task priority register (TPR)
CR9   Reserved
CR10  Reserved
CR11  Reserved
CR12  Reserved
CR13  Reserved
CR14  Reserved
CR15  Reserved

All reserved control registers result in a #UD when accessed, which makes me inclined to not count them in this post.

In addition to the "main" CRn control registers, there are also the "extended" control registers, introduced with the XSAVE feature set. As of writing, XCR0 is the only specified extended control register.

The extended control registers use XGETBV and XSETBV instead of a MOV variant.
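XGETBV with ECX = 0 returns XCR0 in EDX:EAX, and each bit enables a piece of processor state for XSAVE. A minimal Python sketch that names the low bits, assuming the bit assignments from the Intel SDM's XSAVE chapter (`decode_xcr0` is hypothetical):

```python
XCR0_BITS = {
    0: "x87",
    1: "SSE (XMM registers, MXCSR)",
    2: "AVX (upper halves of YMM registers)",
    3: "MPX BNDREGS",
    4: "MPX BNDCSR",
    5: "AVX-512 opmask (k0-k7)",
    6: "AVX-512 ZMM_Hi256",
    7: "AVX-512 Hi16_ZMM",
}


def decode_xcr0(xcr0: int) -> list[str]:
    """List the state components enabled in an XCR0 value, lowest bit first."""
    return [name for bit, name in XCR0_BITS.items() if (xcr0 >> bit) & 1]


print(decode_xcr0(0x7))
```

A value of 0x7, for instance, means x87, SSE, and AVX state are enabled.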

Registers in this group: 6.

Running total: 152.

"System table pointer registers"

That's what the Intel SDM calls these[8]: these registers hold sizes and pointers to various protected mode tables.

As best I can tell, there are 4 of them:

  • GDTR: Holds the size and base address of the GDT
  • LDTR: Holds the size and base address of the LDT
  • IDTR: Holds the size and base address of the IDT
  • TR: Holds the TSS selector and base address for the TSS

The GDTR, LDTR, and IDTR each seem to be 80 bits in 64-bit modes: 16 lower bits for the size of the register's table, and then the upper 64 bits for the table's starting address.

TR is also 80 bits: 16 bits for the selector (which behaves identically to a segment selector), and then another 64 for the base address of the TSS[10].
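SGDT and SIDT store the GDTR and IDTR to memory as a 10-byte image in 64-bit mode: a 16-bit limit (the table's size in bytes, minus one) followed by the 64-bit base address. A minimal Python sketch of splitting such an image with the standard `struct` module; `parse_descriptor_table_register` is hypothetical:

```python
import struct


def parse_descriptor_table_register(image: bytes) -> tuple[int, int]:
    """Split a 10-byte SGDT/SIDT image into (limit, base), little-endian."""
    if len(image) != 10:
        raise ValueError("expected 10 bytes (16-bit limit + 64-bit base)")
    limit, base = struct.unpack("<HQ", image)   # '<' means no padding between fields
    return limit, base


image = bytes([0xFF, 0x00, 0x00, 0x10, 0, 0, 0, 0, 0, 0])
print(parse_descriptor_table_register(image))
```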

Registers in this group: 4.

Running count: 156.

Memory-type-range registers

These are an interesting example: unlike all of the other registers I've covered so far, these are not unique to a particular CPU in a multicore chip; instead, they're shared across all cores[11].

The number of MTRRs seems to vary by CPU model, and they have been largely superseded by entries in the page attribute table, which is programmed with an MSR[12].

Registers in this group: varies by model.

Running count: >156.

Model specific registers

Model-specific registers are where things get fun.

Like extended control registers, they're accessed indirectly (by identifier, passed in ECX) through a pair of instructions: RDMSR and WRMSR. MSRs themselves are 64 bits but originated during the 32-bit era, so RDMSR and WRMSR read from and write to two 32-bit registers: EDX (the high half) and EAX (the low half).

By way of example: here's the setup and RDMSR invocation for accessing the IA32_MTRRCAP MSR, which includes (among other things) the actual number of MTRRs available on the system:

                      
    MOV ECX, 0xFE  ; 0xFE = IA32_MTRRCAP
    RDMSR
    ; The bits of IA32_MTRRCAP are now in EDX:EAX
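Once RDMSR returns, the two halves need to be stitched back together before the fields can be read. A minimal Python sketch, assuming the IA32_MTRRCAP layout from the Intel SDM (VCNT, the variable-range MTRR count, in bits 0 to 7; FIX in bit 8; WC in bit 10); both helpers are hypothetical:

```python
def edx_eax_to_u64(edx: int, eax: int) -> int:
    """Recombine the two 32-bit halves that RDMSR returns."""
    return ((edx & 0xFFFF_FFFF) << 32) | (eax & 0xFFFF_FFFF)


def decode_mtrrcap(value: int) -> dict:
    return {
        "variable_range_count": value & 0xFF,               # VCNT, bits 0-7
        "fixed_range_supported": bool((value >> 8) & 1),     # FIX, bit 8
        "write_combining_supported": bool((value >> 10) & 1),  # WC, bit 10
    }


print(decode_mtrrcap(edx_eax_to_u64(0, 0x508)))
```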

RDMSR and WRMSR are privileged instructions, so normal ring-3 code can't access MSRs directly[13]. The one (?) exception that I know of is the timestamp counter (TSC), which is stored in the IA32_TSC MSR but can be read from non-privileged contexts with RDTSC and RDTSCP.

Two other interesting (but still privileged[14]) cases are FSBASE and GSBASE, which are stored as IA32_FS_BASE and IA32_GS_BASE, respectively. As mentioned in the segment register section, these store the FS and GS segment bases on x86-64 CPUs. This makes them targets of relatively frequent use (by MSR standards), so they have their own dedicated R/W opcodes:

  • RDFSBASE and RDGSBASE for reading
  • WRFSBASE and WRGSBASE for writing

But back to the meat of things: how many MSRs are there?

Using the standards laid out at the beginning of this post, we're interested in counting what Intel calls "architectural" MSRs. From the SDM[15]:

Many MSRs have carried over from one generation of IA-32 processors to the next and to Intel 64 processors. A subset of MSRs and associated bit fields, which do not change on future processor generations, are now considered architectural MSRs. For historical reasons (beginning with the Pentium 4 processor), these "architectural MSRs" were given the prefix "IA32_".

According to the subsequent table[16], the highest architectural MSR is 6097/17D1H, or IA32_HW_FEEDBACK_CONFIG. So, the naïve answer is over 6000.

However, there are significant gaps in the documented MSR ranges: Intel's documentation jumps straight from 3506/DB2H (IA32_THREAD_STALL) to 6096/17D0H (IA32_HW_FEEDBACK_PTR). On top of the empty ranges, there are also ranges that are explicitly marked as reserved, either generally or explicitly for later expansion of a particular MSR family.

To count the actual number of MSRs, I did a bit of pipeline ugliness:

  • Extract just table 2-2 from Volume 4 of the SDM:

                                  
        $ pdfjam 335592-sdm-vol-4.pdf 19-67 -o 2-2.pdf
  • Use pdftotext to convert it to plain text and manually trim the next table from the last page:

                                  
        $ pdftotext 2-2.pdf table.txt
        # edit table.txt by hand
  • Split the plain text table into a sequence of words, filter by IA32_, remove cruft, and do a standard sort-unique-count:

                                  
        $ tr -s '[:space:]' '\n' < table.txt \
          | grep 'IA32_' \
          | tr -d '.' \
          | sed 's/\[.*$//' \
          | sort | uniq | wc -l
        404

    (Output preserved for posterity here.)

That pipeline left a bit of cruft towards the end thanks to quoted variants, so I count the actual number at 400 architectural MSRs. That's a lot more reasonable than over 6000!

Registers in this group: 400

Running count: >556.
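The per-group counts above can be tallied as a sanity check. A minimal Python sketch; the dictionary simply restates this post's numbers, with MTRRs left out because their count varies by model (hence the ">" in the running count):

```python
GROUP_COUNTS = {
    "general-purpose": 68,
    "special": 4,
    "segment": 6,
    "x87": 14,
    "MMX": 8,
    "SSE and AVX": 33,
    "bounds": 7,
    "debug": 6,
    "control": 6,
    "system table pointer": 4,
    "architectural MSRs": 400,
}

# MTRRs are intentionally absent: their number varies by CPU model.
print(sum(GROUP_COUNTS.values()))
```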

Other bits and wrapup

The footnotes at the bottom of this post cover most of my notes, but I also wanted to dump some other resources that I found useful while discovering registers:

  • sandpile.org has a nice visualization of many of the architectural MSRs, including field breakdowns.

  • Vol. 3A § 8.7.1 ("State of the Logical Processors") of the Intel SDM has a useful list of almost all of the registers that are either unique to or shared between x86-64 cores.

  • The OSDev Wiki has a collection of helpful pages on various x86-64 registers, including a great page on the behavior of the segment base MSRs.

All told, I think that there are roughly 557 registers on the average (relatively recent) x86-64 CPU core. With that being said, I have some edge cases that I'm not sure about:

  • Modern Intel CPUs use integrated APICs as part of their SMT implementation. These APICs have their own register banks which can be memory-mapped for reading and potential modification by an x86 core. I didn't count them because (1) they're memory-mapped, and thus behave more like mapped registers from an arbitrary piece of hardware than CPU registers, and (2) I'm not sure whether AMD uses the same mechanism/implementation.

  • The Intel SDM implies that Last Branch Records are stored in discrete, non-MSR registers. AMD's developer manual, on the other hand, specifies a range of MSRs. As such, I didn't try to count these separately.

  • Both Intel and AMD have their own (and incompatible) virtualization extensions, as well as their own enclave/hardened execution extensions. My intuition is that each introduces some additional registers (or maybe just MSRs), but their vendor-specificity made me inclined not to look too deeply.

Information on these (and any other) registers would be deeply appreciated.



Discussions: Reddit


Source: https://blog.yossarian.net/2020/11/30/How-many-registers-does-an-x86-64-cpu-have

