Support for 19h family #39

Open
tiburcillo opened this issue Nov 28, 2020 · 119 comments

Comments

@tiburcillo

Would it be a lot of work to add support for the Zen3 family?
I love zenpower on my 2700x, would be cool if my 5600x would also be supported.

Thanks,
t

@gardotd426

I was also wondering about this.

Currently it doesn't work with the 5000 series, which isn't that surprising, but I was still hoping it would, since k10temp is just flat-out useless with Ryzen 5000 as well.

It worked perfectly with the 3800X on this same motherboard (X570 Taichi), which has the Nuvoton SuperIO chip.

@JaffoS1

JaffoS1 commented Dec 7, 2020

It would be absolutely great if this worked with the 5000 series!

@abucodonosor

k10temp supports Zen3 from kernel >=5.10.

@gardotd426

gardotd426 commented Dec 21, 2020

> k10temp supports Zen3 from kernel >=5.10.

Yeah. It gives Tdie and Tctl temps. That's literally it.

zenpower gives detailed voltage and power draw readings. None of that is available in k10temp.

This is literally all you get for CPU in k10temp on 5.10 for Zen 3:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +32.8°C
Tdie:         +32.8°C

Pretty lackluster. There's a reason we're asking for zenpower support. I made my above comment while running 5.10, so I was already well aware of how well it "works" with 5.10.

@abucodonosor

Oh, so no Vcore, Isoc, etc. for ZEN3 in 5.10?

Well, I can try to add that support, but I'm not really familiar with that code. I can look at what 5.10 did, add the IDs, and then change the logic in zenpower_probe(). However, I cannot guarantee that it is accurate or will work.

Give me a few minutes to figure that out :)

@abucodonosor

@gardotd426

Are you willing to test this patch?

https://crazy.dev.frugalware.org/ZEN3-test.patch

@gardotd426

Yep. Tested it, no dice.

From skimming zenpower.c, it seems there's a lot of other areas where support would need to be added, just adding those few lines wouldn't seem to be enough (granted, my knowledge of how zenpower works is limited so this might not be the case).

But yeah, I get the exact same output as before.

zenpower-pci-00c3
Adapter: PCI adapter
Tdie:         +73.5°C  (high = +95.0°C)
Tctl:         +73.5°C

And that much worked without the patch, too (meaning that replacing k10temp w/zenpower gave me the same info just named as zenpower instead of k10temp).

@abucodonosor

No, there is not much else. It just means the PLANE address is wrong for ZEN3, or the model IDs, or both, and that includes the kernel itself. Someone with ZEN3 HW should report this to lkml, I guess.

There is no support whatsoever for fam 19h in zenpower before the patch, which means it got the defaults, and it seems to get the defaults even now with fam 19h added.

Btw, are you sure you rmmod-ed zenpower before loading the patched one?

@abucodonosor

@gardotd426

I think I missed something. In my patch, change data->zen3 = true; to data->zen2 = true; just to test something. The address and calculation look the same on both zen2 and zen3, so it should not really matter.

@gardotd426

I'm sure I loaded the right zenpower because I didn't even have it installed before this patch, I'd uninstalled it because it was useless, and was using k10temp. I rmmod-ed k10temp and loaded zenpower after installing. I'll try editing the patch and running again.

@gardotd426

Same result, unfortunately. If I knew exactly what was missing I'd bug the guys @ lkml

@abucodonosor

@gardotd426

k10temp should have Vcore etc. I'll try to find out the right offsets for ZEN3 myself, because I think there is something missing even in mainline.

Unfortunately, I don't have a ZEN3 box yet, prices for a 5950x are way too insane right now :)

@gardotd426

Hahah yeah trust me I get it, I was going for the 5900X but you can't buy one anywhere (and I refuse to encourage scalpers), and the only way I could even get the 5800X @ MSRP was through a Newegg combo deal (they aren't selling them individually hardly at all) w/ a 500GB Samsung 980 Pro even though all three of my NVME slots are already taken up with 1GB NVMEs, so I just sold the 980 Pro on ebay for like 10 bucks less than I paid for it.

I still might get a 5900X later for the cores, but a 5800X is perfectly fine and in gaming it's pretty much the same as the 5900X and it definitely doesn't bottleneck my RTX 3090.

If you need help or testing or anything like that I'm happy to do it

@abucodonosor

@gardotd426

Out of curiosity, what does the kernel report on the CPU?

Something like this should tell:

dmesg | grep CPU0: | grep smpboot

@hattedsquirrel

Output for 5900X:
[ 0.111779] smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)

@gardotd426

[ 0.109997] smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)
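
The family, model, and stepping fields in the smpboot lines above can be pulled out with a small script. This is only a minimal Python sketch: the function name parse_smpboot is an illustrative assumption and is not part of zenpower, k10temp, or the kernel.

```python
import re

# Matches the "(family: 0x19, model: 0x21, stepping: 0x0)" part of a smpboot line.
_SMPBOOT_RE = re.compile(
    r"smpboot: CPU0: (?P<name>.+?) \(family: (?P<family>0x[0-9a-fA-F]+), "
    r"model: (?P<model>0x[0-9a-fA-F]+), stepping: (?P<stepping>0x[0-9a-fA-F]+)\)"
)


def parse_smpboot(line):
    """Return name/family/model/stepping from a smpboot dmesg line, or None."""
    m = _SMPBOOT_RE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "family": int(m.group("family"), 16),
        "model": int(m.group("model"), 16),
        "stepping": int(m.group("stepping"), 16),
    }


# The 5900X line posted in this thread.
line = ("[ 0.111779] smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor "
        "(family: 0x19, model: 0x21, stepping: 0x0)")
info = parse_smpboot(line)
print(info)
```

For both CPUs reported here, the model is 0x21 and the stepping is 0x0.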

@abucodonosor

I think I see the bug :)

@gardotd426

?

@abucodonosor

@gardotd426

give me a moment to create some theoretical patch just to see if it starts working.

@gardotd426

Alrighty

@abucodonosor

?

Someone committed with the stepping IDs :) But the data wants the model.

@aqxa1

aqxa1 commented Dec 21, 2020

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

EDIT:

SVI2_Core:     1.55 V
SVI2_SoC:      1.48 V
Tdie:         +44.6°C  (high = +95.0°C)
Tctl:         +44.6°C
Tccd1:        +39.8°C
Tccd2:        +38.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   17.56 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   15.87 A

@gardotd426

> Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

What did you do?

@abucodonosor

> Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.
>
> EDIT:
>
> SVI2_Core:     1.55 V
> SVI2_SoC:      1.48 V
> Tdie:         +44.6°C  (high = +95.0°C)
> Tctl:         +44.6°C
> Tccd1:        +39.8°C
> Tccd2:        +38.0°C
> SVI2_P_Core:   0.00 W
> SVI2_P_SoC:   17.56 W
> SVI2_C_Core:   0.00 A
> SVI2_C_SoC:   15.87 A

Yes, it is broken in the kernel the same way.

I wondered why it pulls the default code at all; that is because the switch(...) data is wrong.

@hattedsquirrel

@abucodonosor
With your new patch, it now does something:

# sensors zenpower-*
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:    925.00 mV
Tdie:         +30.4°C  (high = +95.0°C)
Tctl:         +30.4°C
Tccd1:        +27.5°C
Tccd2:        +29.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:  543.90 mW
SVI2_C_Core:   0.00 A
SVI2_C_SoC:  882.00 mA
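
Note that `sensors` switches between units (V and mV, W and mW, A and mA), so readings from different posts are easier to compare after normalising them. A minimal Python sketch, assuming the output layout shown above; parse_sensors is a hypothetical helper name, not part of lm-sensors or zenpower.

```python
import re

# "<label>:  <number><optional space><unit>"; milli-prefixed units are listed first
# so that "mV" is not read as "V".
_LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>°C|mV|mW|mA|V|W|A)"
)


def parse_sensors(text):
    """Map each sensor label to (value, unit), converting mV/mW/mA to V/W/A."""
    readings = {}
    for raw in text.splitlines():
        m = _LINE_RE.match(raw.strip())
        if m is None:
            continue  # chip name, "Adapter:" line, or blank line
        value = float(m.group("value"))
        unit = m.group("unit")
        if unit in ("mV", "mW", "mA"):
            value /= 1000.0
            unit = unit[1:]
        readings[m.group("label")] = (value, unit)
    return readings


SAMPLE = """zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:    925.00 mV
Tdie:         +30.4°C  (high = +95.0°C)
SVI2_P_SoC:  543.90 mW
SVI2_C_SoC:  882.00 mA
"""

readings = parse_sensors(SAMPLE)
for label, (value, unit) in readings.items():
    print(f"{label}: {value} {unit}")
```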

@gardotd426

gardotd426 commented Dec 21, 2020 via email

@abucodonosor

@gardotd426

Yes, and the fix is simple for the kernel, this:


diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index a250481b5a97..0b4e61bf90f7 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -541,7 +541,7 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                data->is_zen = true;
 
                switch (boot_cpu_data.x86_model) {
-               case 0x0 ... 0x1:       /* Zen3 */
+               case 0x21:      /* Zen3 */
                        data->show_current = true;
                        data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
                        data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;

Someone may try and confirm k10temp working too :)
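
To see why the original `case` never matched on these CPUs, the two case labels from the diff can be mirrored in a few lines of Python. This is only an illustration of the comparison, using the family/model/stepping values reported earlier in the thread; the function names are made up for this sketch and the real logic lives in the C switch statement.

```python
# Values the kernel reported for the 5800X and 5900X in this thread.
FAMILY, MODEL, STEPPING = 0x19, 0x21, 0x0


def in_original_case(value):
    # Mirrors the removed line `case 0x0 ... 0x1:` (an inclusive GCC case range).
    return 0x0 <= value <= 0x1


def in_patched_case(value):
    # Mirrors the added line `case 0x21:`.
    return value == 0x21


print("original case matches model:   ", in_original_case(MODEL))
print("original case matches stepping:", in_original_case(STEPPING))
print("patched case matches model:    ", in_patched_case(MODEL))
```

The switch is evaluated on x86_model, so with model 0x21 the original range is skipped and the defaults are used. The stepping value 0x0 would have fallen inside that range, which is consistent with the remark above that the stepping IDs were committed where the model was wanted.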

@abucodonosor

So some offset is (maybe) still wrong. It may come from the ZEN2 code; I need to check, but I can't see what it is right now.

SVI2_P_Core: 0.00 W
SVI2_C_Core: 0.00 A

Does this do something under load?

@aqxa1

aqxa1 commented Dec 21, 2020

k10temp working:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:         1.55 V
Vsoc:        975.00 mV
Tctl:         +53.2°C
Tdie:         +53.2°C
Tccd1:        +44.8°C
Tccd2:        +40.5°C
Icore:         0.00 A
Isoc:          4.96 A

Looks to be a bit less data than Zenpower, though.

@IanSteveC

Thanks, I got it working. I had to recompile zenmonitor with a patched source also.

To what extent is the SOC telemetry not trustworthy? I see some discussion about this above. It seems my reading for vSOC is a bit off. I've got it set to 0.95 in the BIOS, but zenmonitor (and hence zenpower) reports nearly 1.2V. Is this normal? This is on a 5950X.

@hattedsquirrel

If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case. But there is guesswork involved with the register addresses, so there is still room for things to go wrong. Did you set your voltages in the BIOS to fixed values, via the offset mode, or one of the other automated modes? Also, XMP can overwrite the V_SoC and I've seen 1.2V in conjunction with XMP.

While the voltage readings worked well on my 5900X, the power is pretty far off and also very temperature dependent. If you want something to compare your values to, you could give ryzen_monitor a try. That tool is also based on guesswork but uses the SMU as its data source, so don't take those values as guaranteed either. Also, it has no lm-sensors integration. But at least for me the SMU values correlated very well with physical measurements. Maybe it can help you estimate how accurate the zenpower readings are on your system.

@IanSteveC

Yes I have ram set to XMP, and yes I have the SOC set to a static value of 0.95. Not using an auto mode for that.

But XMP profiles don’t have SOC voltage in them. Just VDIMM (which it also sets to about 1.45v), so if XMP is somehow interfering with the SOC voltage, this sounds like a bug in the BIOS/AGESA.

@hattedsquirrel

Can you double-check with different software or a multimeter whether your V_SoC really is 0.95V? 1.20V seems to be the default voltage on many mainboards for memory speeds above 3200MHz. If the 1.2V somehow got set in hardware, it would explain the zenpower reading.

@IanSteveC

i rebooted into windows to check with HWinfo, and it reports the same, so I guess it's reading the right value.

I also noticed that in the BIOS HW monitor section, it lists two SOC values, one around 1.2 that i see from software "CPU VDDCR_SOC", and another that matches my BIOS setting under the label "PREM_VDDCR_SOC"

@fr33-man

fr33-man commented Mar 2, 2021

@IanSteveC As the memory controller is integrated into the CPU package, increase in memory frequency causes an increase in NB/SoC voltage. It's not a bug.

> If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case.

As you can see by this message @hattedsquirrel still didn't read Stilt's post on power deviation as he knows more than somebody who tested at least 20 different motherboards #fact. I guess with all the bias in his article, he doesn't mind having some in his motherboard 🤣 Here's one paragraph from Stilt's post:

In short: Some motherboard manufacturers intentionally declare an incorrect (too small) motherboard specific reference value in AGESA. Since AM4 Ryzen CPUs rely on telemetry sourced from the motherboard VRM to determine their power consumption, declaring an incorrect reference value will affect the power consumption seen by the CPU. For instance, if the motherboard manufacturer would declare 50% of the correct value, the CPU would think it consumes half the power than it actually does. In this case, the CPU would allow itself to consume twice the power of its set power limits, even when at stock. It allows the CPU to clock higher due to the effectively lifted power limits however, it also makes the CPU to run hotter and potentially negatively affects its life-span, same ways as overclocking does. The difference compared to overclocking or using AMD PBO, is that this is done completely clandestine and that in the past, there has been no way for most of the end-users to detect it, or react to it.

However, your CPU shouldn't get damaged, since Ryzen has some foolproofing built into it. On a 2700X and ASRock X570 with a deviation of 60%, the frequency just drops as the temperature gets into the 80s.
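
The arithmetic in the quoted paragraph can be written out as a rough sketch. It assumes a purely linear relationship between the declared reference value and the power the CPU believes it draws, which is a simplification of the quote rather than a description of the actual firmware; the function names and the 100 W limit are hypothetical.

```python
def reported_power(actual_watts, declared_fraction):
    """Power the CPU believes it draws under the simplified linear model.

    declared_fraction is the declared reference value divided by the correct
    one (0.5 for the "50% of the correct value" case in the quote).
    """
    return actual_watts * declared_fraction


def actual_power_at_limit(limit_watts, declared_fraction):
    """Real draw at which the reported power reaches the configured limit."""
    if declared_fraction <= 0:
        raise ValueError("declared_fraction must be positive")
    return limit_watts / declared_fraction


limit = 100.0  # hypothetical power limit in watts
print(actual_power_at_limit(limit, 0.5))  # 200.0
print(reported_power(200.0, 0.5))         # 100.0
```

With half the correct reference value declared, a CPU sitting at its reported limit is really drawing twice that limit, which matches the example in the quote.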

@IanSteveC

just FYI, im not using PBO/PBO2, and am doing manual settings.

I have removed the XMP setting (and manually set clocks/voltages/timings to what XMP does), and it has not made a difference to measured SOC values, so i guess it has nothing to do with XMP specifically.

i feel like if the BIOS exposes an option for me to set a static value for SoC voltage, and it does not honor that setting but instead does whatever it wants, that qualifies as a bug. how else are you to tweak the OC stability/settings without the ability to accurately set SoC voltage?

@KeithMyers

From my following of the posts on the OCN Ryzen and memory overclocking threads, you can't trust the AGESA to do the things it says it is doing.

And you can't trust that setting XMP values for the memory does what it says it is doing. The memory controller will autocorrect out of alignment memory timings and not display the changes in the BIOS for example.

@hattedsquirrel

@fr33-man Where is this post from? Is that the one from the HWiNFO forum? AFAIK this deviation applies to the reported current and therefore reported power. The reported voltage should be ok. At least that was my understanding. Also we are talking about the SoC here, not the core voltage/current/power.

@IanSteveC My BIOS offers two places to set the SoC voltage. One right on the front page and another one deep down in the menu structure after a disclaimer and next to tons of very specific settings. Maybe you have two places to adjust it, too.

@KeithMyers

The duplication of parameter settings in the BIOS is common. It's because the 32MB BIOS' have two separate compartmented sections, one for Matisse and one for Vermeer. So you often have a setting visible on the main pages and also buried deep in the overclocking sections.

@fr33-man

fr33-man commented Mar 2, 2021

@IanSteveC I don't know then. A quick search shows that CPU VDDCR_SOC is the voltage going to the pins. IIRC, with my offset, the sensor would show 1.4V while SVI2 would show 1.325V. Might be something similar going on with yours.
@hattedsquirrel Wouldn't the apparent lower current influence the voltage?

@KeithMyers

From what I can suss from the posts at OCN for the two Ryzen memory OC threads I read every day, it seems that AMD has pushed up the SoC voltage to a nominal 1.2V to be able to support the Ryzen 5000 CPUs with their 4000MHz memory capabilities.

If you were to drop your 3900X back in, I'm positive the SoC voltage would drop back down to the standard 1.09V level without you making a single change in the BIOS.

@KeithMyers

This post has a lot of great information regarding max voltages for Vermeer from the expert on the subject.
https://www.overclock.net/threads/official-amd-ryzen-ddr4-24-7-memory-stability-thread.1628751/post-28749850

@hattedsquirrel

@fr33-man No. Under-reporting the current to the SoC does not cause over-reporting of the SoC voltage. Sure, a board manufacturer could, in theory, implement both independently, but that would defeat the primary reason for under-reporting the current.
Anyway, concerning @IanSteveC's original question I think we can say: Yes, a reading of 1.20V is not unusual. And since he saw the same number in Windows and the BIOS, we can conclude that the value is read correctly from the register.
If the question then becomes whether that reported value matches the actual voltage, I see two ways to find out: 1) Do a physical measurement. That's the only way to know for sure. 2) If the first option is not possible, one could still ask the SMU. It offers several measurements from within the CPU package, even covering some on-die post-regulators. One would have to trust the SMU firmware in that case, but so far I haven't heard any rumours that board manufacturers have tinkered with that.

@IanSteveC

> I also noticed that in the BIOS HW monitor section, it lists two SOC values, one around 1.2 that i see from software "CPU VDDCR_SOC", and another that matches my BIOS setting under the label "PREM_VDDCR_SOC"

The SoC actually gets 9 voltage rails from the socket. The most important ones are VDDCR_SOC (supplied by the big SOC regulator), VDDIO_MEM and VDDP. VDDP per default is around 0.9V. It is quite possible that PREM_VDDCR_SOC actually refers to this power rail, but I'm not sure about that nomenclature.

@KeithMyers Yes, that matches my experiments and the overclocking reports I read. The Ryzen 5000 CPUs are rated for 3200MHz memory speed according to the official AMD page. In my experience the SoC voltage defaults to 1.0V for 3200MHz and below. Above that it defaults to 1.2V. That's why I thought it might have to do with the memory speed.

@KeithMyers

I have my memory pushed to 3600MHz 1:1:1 and the SoC voltage has stayed the same as at stock 3200MHz: 1.09V.

@Lissanro

Lissanro commented Apr 4, 2021

I have applied https://crazy.dev.frugalware.org/ZEN3-test4.patch and modified zenmonitor as suggested here: ocerman/zenmonitor#36 - but I'm not getting voltage and temperature values, just Package Power, Core Effective Frequency, Core Power and Core Frequency. I have a Ryzen 9 5950X, and I see somebody with a 5900X here ocerman/zenmonitor#36 has Core Voltage and Temperature values. I tried rebooting. I'm running zenmonitor with root privileges. Any ideas what could be wrong?

@hattedsquirrel

The values you are missing are read from the zenpower module. What is the output of sensors zenpower-* ?

@Lissanro

Lissanro commented Apr 4, 2021

sensors zenpower-*
Specified sensor(s) not found!

But zenpower module is loaded:

lsmod | grep zenpower
zenpower               16384  0

Is there an additional step to get sensors working besides compiling and loading the module?

@hattedsquirrel

> Is there an additional step to get sensors working besides compiling and loading the module?

No, usually not.
If you have ZEN3-test4.patch applied and insmod the freshly built module it should give you a line in dmesg like "using ZEN2 calculation formula". It should also create a structure in /sys/class/hwmon/hwmon[num]/. On my system it looks like this:

root@ryzen:~# ls /sys/class/hwmon/hwmon3/
curr1_input  device     name          power2_label  temp2_input  temp4_label
curr1_label  in1_input  power         subsystem     temp2_label  uevent
curr2_input  in1_label  power1_input  temp1_input   temp3_input
curr2_label  in2_input  power1_label  temp1_label   temp3_label
debug_data   in2_label  power2_input  temp1_max     temp4_input
root@ryzen:~# cat /sys/class/hwmon/hwmon3/name
zenpower
root@ryzen:~# cat /sys/class/hwmon/hwmon3/power2_label
SVI2_P_SoC
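
The same check can be scripted instead of guessing the hwmon number. A minimal Python sketch, assuming the /sys/class/hwmon layout shown above (a `name` file plus `*_label` files per device); find_hwmon and read_labels are hypothetical helper names, and the root path is a parameter so the code can be tried against a copy of the directory.

```python
from pathlib import Path


def find_hwmon(driver_name, root="/sys/class/hwmon"):
    """Return the hwmon directories whose `name` file equals driver_name."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    matches = []
    for entry in sorted(root_path.iterdir()):
        try:
            if (entry / "name").read_text().strip() == driver_name:
                matches.append(entry)
        except OSError:
            continue  # no readable name file in this entry
    return matches


def read_labels(hwmon_dir):
    """Map each label's text to its file name, e.g. 'SVI2_P_SoC' -> 'power2_label'."""
    labels = {}
    for label_file in sorted(Path(hwmon_dir).glob("*_label")):
        try:
            labels[label_file.read_text().strip()] = label_file.name
        except OSError:
            continue
    return labels


if __name__ == "__main__":
    found = find_hwmon("zenpower")
    if not found:
        print("no zenpower hwmon device found")
    for device in found:
        print(device, read_labels(device))
```

An empty result while `lsmod` still lists zenpower means the module is loaded but did not register a hwmon device.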

@Lissanro

Lissanro commented Apr 4, 2021

> sudo rmmod zenpower
> lsmod | grep zenpower # Returns nothing which means zenpower module is removed
> make clean && make # Rebuild the module from scratch
> sudo insmod zenpower.ko # Nothing appears in dmesg after running this command but the module is inserted successfully
> lsmod | grep zenpower
zenpower               16384  0
> ls /sys/class/hwmon/ # No zenpower hwmon present; hwmon0 is from k10temp
hwmon0
> cat /sys/class/hwmon/hwmon0/name
k10temp

So in my case it does not create the structure in there, and there is nothing in dmesg when I insmod the module (or when I modprobe it after installing).

I'm using Ubuntu Studio 20.10 with its default kernel (5.8.0-48-lowlatency). I'm not sure how to debug this... The module is definitely loaded and it is of the right version:

> ls /sys/module/zenpower/ 
coresize  drivers  holders  initsize  initstate  notes  refcnt  sections  srcversion  taint  uevent  version
> cat /sys/module/zenpower/version
0.1.12-ZEN3-test4

Just to be sure I rebooted again, nothing changed. I do not have any experience with debugging Linux modules. I would be grateful if somebody could give a suggestion what to do next.

@berniyh

berniyh commented Apr 4, 2021

Make sure that you don't have the k10temp module loaded. To me it sounds like that is your problem.
If unsure, just unload it with modprobe -r k10temp before loading zenpower.

And for the future (if you want to use zenpower), put it into the blacklist.
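
Whether both drivers are loaded at the same time can be checked from /proc/modules, where the first field of each line is the module name. A minimal Python sketch of that check; the helper names are illustrative assumptions, and the script only reports the situation, it does not unload or blacklist anything.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def k10temp_conflict(modules):
    """True when k10temp and zenpower are loaded at the same time."""
    return "k10temp" in modules and "zenpower" in modules


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            mods = loaded_modules(f.read())
    except OSError:
        mods = set()  # not on Linux, or /proc is unavailable
    if k10temp_conflict(mods):
        print("k10temp and zenpower are both loaded; unload k10temp first")
    else:
        print("no k10temp/zenpower conflict detected")
```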

@Lissanro

Lissanro commented Apr 4, 2021

Thank you, now everything is working as expected. Not sure how I managed to miss this step.

@JacobBrownAustin

Is the patch still required for the 5600X? I'm running Linux 5.11.8, and was only seeing 2 temperatures, which was disappointing. After searching around, I've found that Zen 3 is supposed to work with zenpower, so I installed that, but now I just see Tdie and Tctl, which both have the same value, and I'm not even sure if I trust them yet. Is this issue still not fixed in the "master" branch? Is it fixed in another branch, or do I need to apply this ZEN3-test4.patch for it to work? What's holding up getting this fix merged into a branch? Does it need more testing or code review? How can I help? I'd like to have valid temperature and voltage readings. (And fan speed too, but I guess I need to solve that in another place...)

@Lissanro

Lissanro commented Apr 25, 2021

To make this work properly, I have applied the following patch to zenpower (I do not know why it still has not been committed to master):

https://crazy.dev.frugalware.org/ZEN3-test4.patch

I also had to apply the patch from ocerman/zenmonitor#36 (comment) to zenmonitor to add support for the Zen 3 family. I also applied https://github.com/ocerman/zenmonitor/pull/32.patch (fix order of tDie and tCtl to reflect order of zenpower driver), but for me tDie and tCtl are always the same anyway, so it did not make a noticeable difference in my case. I do not know if they are supposed to be different; as far as I can tell there are no alternatives to zenpower, so I do not have other ways to check these values (Linux is my only OS).

Unloading and blacklisting the k10temp module is also an important step; zenpower will not work properly while it is loaded.

Ideally, there should be precompiled debs with an installation script, but no one has found the time to create them yet.

For now I can share only a deb file for zenmonitor (patched for Zen 3 support and compiled in Ubuntu 20.10): http://dragon.studio/2021/04/zenmonitor_2021-04-04-1_all.deb
Creating a deb for zenpower is more complicated.

@stanojr

stanojr commented May 9, 2021

Using the latest ZEN3-test4.patch that Lissanro linked, on a 5950X: works like a charm on Ubuntu 21.04. Maybe ocerman would like to see a pull request?

@ghost

ghost commented Jun 19, 2021

zenpower3 and zenmonitor3 are actively maintained forks of this project that have Zen 3 support.

@hartmark

@Ta180m , thanks for keeping this nice tool alive!

SineMah added a commit to SineMah/zenpower that referenced this issue Oct 31, 2021
@KeithMyers

Shame that the zenpower3 fork is closed and read-only now.
It has no support for kernel 6.0+, where it no longer compiles or runs.

@hartmark

hartmark commented Oct 7, 2022

The AUR package has been retargeted to this repo:
https://git.exozy.me/a/zenpower3

There aren't many more commits over there, but Arch hasn't switched to 6.0 yet, so hopefully it will get fixed if there are any issues.

You could preemptively open a ticket if 6.0 is broken.
