|
13 | 13 | // See the License for the specific language governing permissions and
|
14 | 14 | // limitations under the License.
|
15 | 15 |
|
| 16 | +// A note on x86 SIMD instruction availability |
| 17 | +// ----------------------------------------------------------------------------- |
| 18 | +// A number of conditions need to be met for an application to use SIMD |
| 19 | +// instructions: |
| 20 | +// 1. The CPU itself must support the instruction. |
| 21 | +// - we use `CPUID` to check whether the feature is supported. |
| 22 | +// 2. The OS must save and restore the associated SIMD registers across context |
| 23 | +// switches. We check that: |
| 24 | +// - the CPU supports the XSAVE/XRSTOR state-management instructions via |
| 25 | +// CPUID.1:ECX.XSAVE[bit 26] |
| 26 | +// - the OS has enabled those instructions, as reported via |
| 27 | +// CPUID.1:ECX.OSXSAVE[bit 27] |
| 28 | +// - the extended control register 0 (XCR0) is set to save and restore the |
| 29 | +// needed SIMD registers |
| 30 | +// |
| 31 | +// Note that if `XSAVE`/`OSXSAVE` are missing, we fall back to OS-based |
| 32 | +// detection via the `DetectFeaturesFromOs` function or to microarchitecture |
| 33 | +// heuristics. |
| 34 | +// |
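Review note: the OS-support conditions listed above can be sketched as a pure function over the raw register values, which keeps the logic testable on any host. This is a minimal sketch, not the code in this commit. `CanReadXcr0`, `HasMask` and `OsPreservesAvx` are hypothetical names, and `MASK_YMM` is assumed from the architectural XCR0 layout (bit 1 for SSE state, bit 2 for AVX state); only `MASK_XMM` appears in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

// CPUID.1:ECX bit positions named in the comment above.
#define CPUID1_ECX_XSAVE (1u << 26)
#define CPUID1_ECX_OSXSAVE (1u << 27)

// XCR0 state-component bits. MASK_XMM matches the value defined later in this
// file; MASK_YMM is an assumption taken from the architectural XCR0 layout.
#define MASK_XMM 0x2u
#define MASK_YMM 0x4u

// Hypothetical helper: XCR0 is only meaningful when both the CPU (XSAVE) and
// the OS (OSXSAVE) report support.
static bool CanReadXcr0(uint32_t cpuid1_ecx) {
  const uint32_t needed = CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE;
  return (cpuid1_ecx & needed) == needed;
}

// Hypothetical helper: true when every bit of `mask` is set in `xcr0`.
static bool HasMask(uint32_t xcr0, uint32_t mask) {
  return (xcr0 & mask) == mask;
}

// AVX is usable only if the OS saves both the XMM and the YMM state.
static bool OsPreservesAvx(uint32_t cpuid1_ecx, uint32_t xcr0) {
  return CanReadXcr0(cpuid1_ecx) && HasMask(xcr0, MASK_XMM | MASK_YMM);
}
```

In a real build the two inputs would come from `CPUID` leaf 1 and `XGETBV` with ECX = 0; passing them in as plain integers is a design choice made here so the check can be exercised without x86 hardware.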
| 35 | +// Encoding |
| 36 | +// ----------------------------------------------------------------------------- |
| 37 | +// X86Info contains fields such as `vendor` and `brand_string` that are |
| 38 | +// ASCII-encoded strings. `vendor` holds 13 characters and `brand_string` 49 |
| 39 | +// (including the null terminator). We use CPUID.0:E[B,D,C]X to get `vendor` and |
| 40 | +// CPUID.8000_000[2:4]:E[A,B,C,D]X to get `brand_string`. |
| 41 | +// |
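Review note: to make the encoding concrete, here is a small sketch of how a 13-byte vendor string could be assembled from register values. It relies on the architectural fact that the vendor identification string is returned by CPUID leaf 0 in EBX, EDX, ECX order, four little-endian ASCII bytes per register. `DecodeVendor` is a hypothetical name and not necessarily how the library's own code is structured.

```c
#include <stdint.h>

// Hypothetical helper: unpacks three 32-bit registers into 12 ASCII bytes
// plus a null terminator, matching the 13-character `vendor` field.
static void DecodeVendor(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char vendor[13]) {
  const uint32_t regs[3] = {ebx, edx, ecx};  // order matters: EBX, EDX, ECX
  for (int i = 0; i < 3; ++i) {
    for (int b = 0; b < 4; ++b) {
      // The lowest byte of each register holds the first character.
      vendor[4 * i + b] = (char)((regs[i] >> (8 * b)) & 0xFFu);
    }
  }
  vendor[12] = '\0';
}
```

The same unpacking, applied to EAX, EBX, ECX, EDX of leaves 0x80000002 through 0x80000004, yields the 48 characters of `brand_string` plus its terminator.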
| 42 | +// Microarchitecture |
| 43 | +// ----------------------------------------------------------------------------- |
| 44 | +// The `GetX86Microarchitecture` function first checks the vendor via |
| 45 | +// `IsVendorByX86Info`. We then use `CPUID(family, model)` to identify the |
| 46 | +// vendor's microarchitecture. When `family` and `model` are the same for |
| 47 | +// several microarchitectures, we check the stepping or, in the worst case, |
| 48 | +// parse `brand_string` (see HasSecondFMA for an example). Details of |
| 49 | +// identification by `brand_string` can be found in the following references: |
| 50 | +// https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake |
| 51 | +// https://www.intel.com/content/www/us/en/processors/processor-numbers.html |
| 52 | + |
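Review note: the `family`, `model` and stepping values mentioned above are packed into CPUID.1:EAX. The sketch below decodes them following the rule documented in the Intel SDM (extended family is added only when the base family is 0xF; extended model is used only for base family 0x6 or 0xF). `Signature` and `DecodeSignature` are hypothetical names, and the library itself may combine the fields differently.

```c
#include <stdint.h>

// Hypothetical container for the decoded processor signature.
typedef struct {
  uint32_t family;
  uint32_t model;
  uint32_t stepping;
} Signature;

// Decodes CPUID.1:EAX into display family, display model and stepping.
static Signature DecodeSignature(uint32_t eax) {
  const uint32_t base_family = (eax >> 8) & 0xFu;   // bits 11:8
  const uint32_t base_model = (eax >> 4) & 0xFu;    // bits 7:4
  const uint32_t ext_family = (eax >> 20) & 0xFFu;  // bits 27:20
  const uint32_t ext_model = (eax >> 16) & 0xFu;    // bits 19:16
  Signature s;
  s.stepping = eax & 0xFu;                          // bits 3:0
  s.family = (base_family == 0xFu) ? base_family + ext_family : base_family;
  s.model = (base_family == 0x6u || base_family == 0xFu)
                ? (ext_model << 4) + base_model
                : base_model;
  return s;
}
```

For instance, an EAX value of 0x00050657 decodes to family 6, model 0x55, stepping 7, which shows why a stepping check is needed when several parts share one `(family, model)` pair.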
| 53 | +// CacheInfo X86 |
| 54 | +// ----------------------------------------------------------------------------- |
| 55 | +// We use the CacheInfo struct to store information about cache levels. The |
| 56 | +// maximum number of levels is hardcoded but can be increased if needed. We |
| 57 | +// fully support cache identification for the following processors: |
| 58 | +// • Intel: |
| 59 | +// ◦ modern processors: |
| 60 | +// we use `ParseCacheInfo` function with `leaf_id` 0x00000004. |
| 61 | +// ◦ old processors: |
| 62 | +// we parse descriptors via `GetCacheLevelInfo`, see Application Note |
| 63 | +// 485: Intel Processor Identification and CPUID Instruction. |
| 64 | +// • AMD: |
| 65 | +// ◦ modern processors: |
| 66 | +// we use `ParseCacheInfo` function with `leaf_id` 0x8000001D. |
| 67 | +// ◦ old processors: |
| 68 | +// we parse cache info using Fn8000_0005_E[A,B,C,D]X and |
| 69 | +// Fn8000_0006_E[A,B,C,D]X. See AMD CPUID Specification: |
| 70 | +// https://www.amd.com/system/files/TechDocs/25481.pdf. |
| 71 | +// • Hygon: |
| 72 | +// we reuse AMD cache detection implementation. |
| 73 | +// • Zhaoxin: |
| 74 | +// we reuse Intel cache detection implementation. |
| 75 | +// |
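Review note: for the "modern processors" path, each sub-leaf of CPUID leaf 0x00000004 (and of AMD's 0x8000001D, which uses the same register layout) describes one cache. The sketch below decodes a single sub-leaf from raw register values. `CacheLevelSketch` and `DecodeDeterministicCache` are hypothetical names and do not reflect the actual `ParseCacheInfo` signature.

```c
#include <stdint.h>

// Hypothetical container for one decoded cache descriptor.
typedef struct {
  int type;   // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
  int level;  // 1 for L1, 2 for L2, ...
  uint32_t ways;
  uint32_t partitions;
  uint32_t line_size;
  uint32_t sets;
  uint32_t size_bytes;
} CacheLevelSketch;

// Decodes EAX/EBX/ECX of one deterministic-cache-parameters sub-leaf.
// Each geometry field is stored as "value minus one".
static CacheLevelSketch DecodeDeterministicCache(uint32_t eax, uint32_t ebx,
                                                 uint32_t ecx) {
  CacheLevelSketch c;
  c.type = (int)(eax & 0x1Fu);                 // EAX bits 4:0
  c.level = (int)((eax >> 5) & 0x7u);          // EAX bits 7:5
  c.line_size = (ebx & 0xFFFu) + 1u;           // EBX bits 11:0
  c.partitions = ((ebx >> 12) & 0x3FFu) + 1u;  // EBX bits 21:12
  c.ways = ((ebx >> 22) & 0x3FFu) + 1u;        // EBX bits 31:22
  c.sets = ecx + 1u;                           // ECX bits 31:0
  c.size_bytes = c.ways * c.partitions * c.line_size * c.sets;
  return c;
}
```

As a worked case, EAX = 0x121, EBX = 0x01C0003F, ECX = 63 describes a level-1 data cache with 8 ways, 1 partition, 64-byte lines and 64 sets, so 8 * 1 * 64 * 64 = 32768 bytes. A caller would iterate sub-leaves until `type` is 0.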
| 76 | +// Internal structures |
| 77 | +// ----------------------------------------------------------------------------- |
| 78 | +// We use internal structures such as `Leaves` and `OsPreserves` to cache |
| 79 | +// CPUID results and register-support information, since the latency of the |
| 80 | +// CPUID instruction is around 100 cycles, see |
| 81 | +// https://www.agner.org/optimize/instruction_tables.pdf. Hence, `GetX86Info`, |
| 82 | +// `GetCacheInfo` and `FillX86BrandString` use the `ReadLeaves` function to |
| 83 | +// read the leaves once and hold these values, avoiding redundant calls on |
| 84 | +// the same leaf. |
| 85 | + |
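Review note: the read-once pattern described above can be illustrated with a small sketch. Everything here is hypothetical: `LeafSketch`, `LeavesSketch`, `ReadLeavesSketch` and the two consumer functions are stand-ins, and `FakeCpuid` replaces the real instruction with a call counter so the example runs on any host. The actual `Leaves` struct in this file may hold different leaves.

```c
#include <stdint.h>

typedef struct {
  uint32_t eax, ebx, ecx, edx;
} LeafSketch;

// Stand-in for the CPUID instruction; it counts how often it is invoked.
// Leaf 0 reports a maximum supported leaf of 7 in EAX.
static int g_fake_cpuid_calls = 0;
static LeafSketch FakeCpuid(uint32_t leaf_id) {
  ++g_fake_cpuid_calls;
  const LeafSketch leaf = {leaf_id == 0u ? 7u : 0u, leaf_id, 0u, 0u};
  return leaf;
}

// All leaves needed by later consumers, read exactly once.
typedef struct {
  uint32_t max_cpuid_leaf;
  LeafSketch leaf_0;
  LeafSketch leaf_1;
  LeafSketch leaf_7;
} LeavesSketch;

// Returns an all-zero leaf when `leaf_id` exceeds the maximum supported leaf.
static LeafSketch SafeLeaf(uint32_t max_leaf, uint32_t leaf_id) {
  const LeafSketch empty = {0u, 0u, 0u, 0u};
  if (leaf_id > max_leaf) return empty;
  return FakeCpuid(leaf_id);
}

static LeavesSketch ReadLeavesSketch(void) {
  LeavesSketch leaves;
  leaves.leaf_0 = FakeCpuid(0u);
  leaves.max_cpuid_leaf = leaves.leaf_0.eax;
  leaves.leaf_1 = SafeLeaf(leaves.max_cpuid_leaf, 1u);
  leaves.leaf_7 = SafeLeaf(leaves.max_cpuid_leaf, 7u);
  return leaves;
}

// Consumers take the cached struct instead of re-issuing CPUID.
static uint32_t ConsumerA(const LeavesSketch* l) { return l->leaf_1.ebx; }
static uint32_t ConsumerB(const LeavesSketch* l) { return l->leaf_7.ebx; }
```

Calling `ReadLeavesSketch` once and passing the result to both consumers issues three fake CPUID calls in total, regardless of how many consumers read from the struct afterwards.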
16 | 86 | #include <stdbool.h>
|
17 | 87 | #include <string.h>
|
18 | 88 |
|
@@ -121,7 +191,6 @@ static Leaves ReadLeaves(void) {
|
121 | 191 |
|
122 | 192 | ////////////////////////////////////////////////////////////////////////////////
|
123 | 193 | // OS support
|
124 |
| -// TODO: Add documentation |
125 | 194 | ////////////////////////////////////////////////////////////////////////////////
|
126 | 195 |
|
127 | 196 | #define MASK_XMM 0x2
|
|