Skip to content

Inconsistent handling of PTP rate adjustment values #92926

@robhancocksed

Description

@robhancocksed

Describe the bug

In the current Zephyr code base, there is currently inconsistent handling of the rate parameter passed into the PTP clock API in the ptp_clock_rate_adjust function. It appears this is how we ended up in the current state of affairs:

  • Originally, drivers all appear to have treated this rate as a relative adjustment (i.e. compared to the last rate set), not an absolute adjustment (i.e. relative to the hardware baseline increment rate). For example, if the rate starts at 1.0, setting a rate of 0.5 followed by a rate of 2.0 would result in the rate being set to 1.0 again. This was not clear from the how the API was documented however.
  • Existing PTP clock drivers in the Zephyr tree also treated this as a relative adjustment (usually requiring the driver to track the last absolute rate value and multiply it by the newly applied rate before changing the rate increment in hardware). Examples: eth_e1000, eth_stm32_hal, ptp_clock_nxp_enet.
  • The gPTP implementation was using the API accordingly, i.e. the neighbor rate ratio was passed directly into ptp_clock_rate_adjust, so it was assuming the rate adjustment applied would be relative.
  • In Support gPTP sync with PI control and fix i.MXRT1060 support #89234, @yangbolu1991 changed the gPTP implementation to add in a PI servo to adjust the clock rate based on the time offset value, which then resulted in this value now being expected to be an absolute adjustment. The ptp_clock_nxp_enet driver was updated accordingly, but not any of the other PTP clock drivers.
  • The PTP implementation was also updated to treat this as a relative adjustment in net: ptp: support frequency adjustment with PI control algorithm #89517

The result is that any PTP clock drivers that have not been updated to treat the rate adjustment as absolute rather than relative are now broken. We ran into this with an out-of-tree driver when updating the Zephyr code base, however it appears that a number of other in-tree drivers like eth_e1000, eth_stm32_hal etc. would also be impacted. The symptom we observed from this being incorrect in our driver was that the gPTP timing control became unstable, with the rate adjustment starting to oscillate increasingly rather than converging on a stable value.

The changes @yangbolu1991 was making seem to be worthwhile in terms of increasing timing control stability, so I would suggest the following to resolve this situation and avoid further issues:

  • Update all other in-tree drivers to treat PTP clock rate adjustment as relative to the hardware baseline rate, not the previously set rate
  • Update PTP clock API documentation to make this expectation clear

Regression

  • This is a regression.

Steps to reproduce

We reproduced this with an out-of-tree driver and I don't have access to the hardware for some of the other drivers that appear to be affected, this was determined via code inspection.

Relevant log output

Impact

Major – Severely degrades functionality; workaround is difficult or unavailable.

Environment

No response

Additional Context

No response

Metadata

Metadata

Assignees

Labels

RegressionSomething, which was working, does not anymorearea: PTPIEEE 1588 PTP ProtocolbugThe issue is a bug, or the PR is fixing a bugpriority: lowLow impact/importance bug

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions