Commit c74a85d

Author: Mykola Hohsadze
Add documentation on current behavior for X86 (#212)
* Add documentation for X86 OS support
* Update X86 documentation
* Remove outdated cache info comment
* Update x86 documentation according to comments
* Update Internal structures documentation
1 parent 4590768 commit c74a85d

2 files changed, +70 -2 lines changed

include/cpuinfo_x86.h

Lines changed: 0 additions & 1 deletion
@@ -124,7 +124,6 @@ X86Info GetX86Info(void);

 // Returns cache hierarchy information.
 // Can call cpuid multiple times.
-// Only works on Intel CPU at the moment.
 CacheInfo GetX86CacheInfo(void);

 typedef enum {
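
To illustrate the kind of decoding `GetX86CacheInfo` relies on, here is a minimal sketch that turns the registers of one CPUID leaf 0x00000004 subleaf into a cache size. The bit layout follows the Intel SDM description of that leaf. `RegsSketch`, `CacheLevelSketch`, `Bits` and `DecodeCacheSubleaf` are hypothetical names made up for this example; they are not part of cpu_features, and the library's own `ParseCacheInfo` may be organized differently. The function takes register values as arguments rather than executing `cpuid`, so it compiles on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical holder for the registers returned by one CPUID call.
typedef struct {
  uint32_t eax, ebx, ecx, edx;
} RegsSketch;

// Hypothetical summary of one cache level (not the library's struct).
typedef struct {
  int level;       // 1, 2, 3, ...
  int cache_type;  // 0 = none, 1 = data, 2 = instruction, 3 = unified
  uint32_t line_size;
  uint32_t partitions;
  uint32_t ways;
  uint32_t sets;
  uint64_t size_bytes;
} CacheLevelSketch;

// Extracts reg[msb:lsb], both bounds inclusive.
static uint32_t Bits(uint32_t reg, int msb, int lsb) {
  assert(lsb >= 0 && msb >= lsb && msb < 32);
  const int width = msb - lsb + 1;
  const uint32_t mask = width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (reg >> lsb) & mask;
}

// Decodes one subleaf. Returns 0 when the subleaf reports "no more caches"
// (type field is 0), 1 otherwise.
static int DecodeCacheSubleaf(RegsSketch r, CacheLevelSketch* out) {
  assert(out != NULL);
  out->cache_type = (int)Bits(r.eax, 4, 0);
  if (out->cache_type == 0) return 0;
  out->level = (int)Bits(r.eax, 7, 5);
  // Each of the following fields is stored as "value minus one".
  out->line_size = Bits(r.ebx, 11, 0) + 1;
  out->partitions = Bits(r.ebx, 21, 12) + 1;
  out->ways = Bits(r.ebx, 31, 22) + 1;
  out->sets = r.ecx + 1;
  out->size_bytes = (uint64_t)out->ways * out->partitions * out->line_size *
                    out->sets;
  return 1;
}
```

A caller would loop over subleaves 0, 1, 2, ... and stop at the first one for which the function returns 0. AMD's leaf 0x8000001D uses the same field layout, which is consistent with the same parser being reused for both vendors.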

src/impl_x86__base_implementation.inl

Lines changed: 70 additions & 1 deletion
@@ -13,6 +13,76 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

+// A note on x86 SIMD instructions availability
+// -----------------------------------------------------------------------------
+// A number of conditions need to be met for an application to use SIMD
+// instructions:
+// 1. The CPU itself must support the instruction.
+//    - we use `CPUID` to check whether the feature is supported.
+// 2. The OS must save and restore the associated SIMD registers across
+//    context switches. We check that:
+//    - the CPU reports supporting hardware context switching instructions via
+//      CPUID.1:ECX.XSAVE[bit 26]
+//    - the OS reports supporting hardware context switching instructions via
+//      CPUID.1:ECX.OSXSAVE[bit 27]
+//    - the CPU extended control register 0 (XCR0) is set to save and restore the
+//      needed SIMD registers
+//
+// Note that if `XSAVE`/`OSXSAVE` are missing, we delegate the detection to the
+// OS via the `DetectFeaturesFromOs` function or via microarchitecture
+// heuristics.
+//
+// Encoding
+// -----------------------------------------------------------------------------
+// X86Info contains fields such as `vendor` and `brand_string` that are ASCII
+// encoded strings. `vendor` holds 13 characters and `brand_string` holds 49,
+// including the null terminator. We use CPUID.0:E[B,D,C]X to get `vendor` and
+// CPUID.8000_000[4:2]:E[D,C,B,A]X to get `brand_string`.
+//
+// Microarchitecture
+// -----------------------------------------------------------------------------
+// The `GetX86Microarchitecture` function first checks the vendor via
+// `IsVendorByX86Info`, then uses `CPUID(family, model)` to identify the
+// vendor's microarchitecture. When `family` and `model` are the same for
+// several microarchitectures, we check the stepping or, in the worst case,
+// rely on parsing `brand_string` (see `HasSecondFMA` for an example). For
+// details on identification by `brand_string`, see:
+// https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake
+// https://www.intel.com/content/www/us/en/processors/processor-numbers.html
+
+// CacheInfo X86
+// -----------------------------------------------------------------------------
+// We use the CacheInfo struct to store information about cache levels. The
+// maximum number of levels is hardcoded but can be increased if needed. We have
+// full support of cache identification for the following processors:
+// • Intel:
+//    ◦ modern processors:
+//        we use `ParseCacheInfo` function with `leaf_id` 0x00000004.
+//    ◦ old processors:
+//        we parse descriptors via `GetCacheLevelInfo`, see Application Note
+//        485: Intel Processor Identification and CPUID Instruction.
+// • AMD:
+//    ◦ modern processors:
+//        we use `ParseCacheInfo` function with `leaf_id` 0x8000001D.
+//    ◦ old processors:
+//        we parse cache info using Fn8000_0005_E[A,B,C,D]X and
+//        Fn8000_0006_E[A,B,C,D]X. See AMD CPUID Specification:
+//        https://www.amd.com/system/files/TechDocs/25481.pdf.
+// • Hygon:
+//    we reuse AMD cache detection implementation.
+// • Zhaoxin:
+//    we reuse Intel cache detection implementation.
+//
+// Internal structures
+// -----------------------------------------------------------------------------
+// We use internal structures such as `Leaves` and `OsPreserves` to cache the
+// results of CPUID queries and the register support reported by the OS, since
+// the latency of the CPUID instruction is around 100 cycles, see
+// https://www.agner.org/optimize/instruction_tables.pdf. Hence, `GetX86Info`,
+// `GetX86CacheInfo` and `FillX86BrandString` use the `ReadLeaves` function to
+// read the leaves once and hold their values, avoiding redundant calls on the
+// same leaf.
+
 #include <stdbool.h>
 #include <string.h>
1888

@@ -121,7 +191,6 @@ static Leaves ReadLeaves(void) {

 ////////////////////////////////////////////////////////////////////////////////
 // OS support
-// TODO: Add documentation
 ////////////////////////////////////////////////////////////////////////////////

 #define MASK_XMM 0x2
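
The conditions listed in the SIMD availability note can be sketched as a pure function over the CPUID.1:ECX value and the XCR0 value. This is a simplified illustration for AVX only, not the code from this file: `IsBitSet`, `SketchAvxUsable` and the `SKETCH_` masks are hypothetical names, the YMM mask value and the AVX feature bit (CPUID.1:ECX bit 28) are taken from the Intel SDM rather than from this diff, and the sketch simply returns `false` where the real implementation falls back to OS detection or microarchitecture heuristics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MASK_XMM 0x2u  // XCR0 bit 1: SSE state, same value as MASK_XMM
#define SKETCH_MASK_YMM 0x4u  // XCR0 bit 2: AVX state (assumption, per SDM)

static bool IsBitSet(uint32_t reg, int bit) {
  assert(bit >= 0 && bit < 32);
  return ((reg >> bit) & 1u) != 0;
}

// Returns true when AVX is supported by the CPU and its registers are
// preserved by the OS across context switches.
static bool SketchAvxUsable(uint32_t cpuid1_ecx, uint32_t xcr0) {
  const bool cpu_has_avx = IsBitSet(cpuid1_ecx, 28);    // condition 1
  const bool cpu_has_xsave = IsBitSet(cpuid1_ecx, 26);  // CPUID.1:ECX.XSAVE
  const bool os_uses_xsave = IsBitSet(cpuid1_ecx, 27);  // CPUID.1:ECX.OSXSAVE
  if (!cpu_has_avx) return false;
  // Without XSAVE/OSXSAVE, XCR0 cannot be read. The real code delegates to
  // the OS here; this sketch conservatively reports "not usable".
  if (!cpu_has_xsave || !os_uses_xsave) return false;
  // AVX needs both the XMM and YMM state components enabled in XCR0.
  const uint32_t needed = SKETCH_MASK_XMM | SKETCH_MASK_YMM;
  return (xcr0 & needed) == needed;
}
```

Keeping the check free of inline assembly makes it easy to unit test with recorded register values; the caller is responsible for obtaining `cpuid1_ecx` and `xcr0` from the hardware.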
