
🐛 Raises meaningful exception when IPv6 URL is malformed #1512


Open
wants to merge 1 commit into
base: master
from

Conversation

MaelPic

@MaelPic MaelPic commented May 22, 2025

What do these changes do?

For IPv6 URLs, if the brackets are in reversed order (closing bracket before opening bracket), the raised exception is now ValueError("Invalid IPv6 URL"), which is more meaningful than the previously reported one (IndexError: string index out of range).

Are there changes in behavior for the user?

If url contains brackets in opposite order:
Previous behavior: exception IndexError is raised
New behavior: exception ValueError is raised
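
A minimal sketch of why the exception type matters to callers, assuming user code treats `ValueError` as the "invalid input" signal. `parse_old` and `parse_new` are hypothetical stand-ins for the behavior before and after this PR, not the library's API:

```python
from typing import Callable


def is_valid_url(parse: Callable[[str], object], raw: str) -> bool:
    """Return True if `parse` accepts `raw`; ValueError means invalid input."""
    try:
        parse(raw)
    except ValueError:
        return False
    return True


def parse_old(raw: str) -> str:
    # Hypothetical stand-in for the previous behavior: an internal
    # IndexError leaks out when the brackets are reversed.
    raise IndexError("string index out of range")


def parse_new(raw: str) -> str:
    # Hypothetical stand-in for the behavior after this PR.
    raise ValueError("Invalid IPv6 URL")


if __name__ == "__main__":
    bad = "http://]1dec:0:0:0::1[/"
    print(is_valid_url(parse_new, bad))  # handled by the except clause
    try:
        is_valid_url(parse_old, bad)
    except IndexError as exc:
        print(f"not handled by `except ValueError`: {exc!r}")
```

With the old behavior, a caller that only catches `ValueError` would let the `IndexError` propagate.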

Related issue number

Fixes #1485

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes


codspeed-hq bot commented May 22, 2025

CodSpeed Performance Report

Merging #1512 will not alter performance

Comparing MaelPic:bugfix-ipv6-parsing (63ba8c5) with master (07d0ba3)

Summary

✅ 101 untouched benchmarks


codecov bot commented May 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.37%. Comparing base (07d0ba3) to head (63ba8c5).

❌ Your project check has failed because the head coverage (98.06%) is below the target coverage (100.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1512   +/-   ##
=======================================
  Coverage   99.37%   99.37%           
=======================================
  Files          30       30           
  Lines        6068     6079   +11     
  Branches      265      265           
=======================================
+ Hits         6030     6041   +11     
  Misses         35       35           
  Partials        3        3           
Flag Coverage Δ
CI-GHA 99.37% <100.00%> (+<0.01%) ⬆️
MyPy 98.06% <100.00%> (+<0.01%) ⬆️
OS-Linux 99.44% <100.00%> (+<0.01%) ⬆️
OS-Windows 99.48% <100.00%> (+<0.01%) ⬆️
OS-macOS 99.33% <100.00%> (+<0.01%) ⬆️
Py-3.10.11 99.12% <100.00%> (+<0.01%) ⬆️
Py-3.10.17 99.35% <100.00%> (+<0.01%) ⬆️
Py-3.11.12 99.35% <100.00%> (+<0.01%) ⬆️
Py-3.11.9 99.12% <100.00%> (+<0.01%) ⬆️
Py-3.12.10 99.35% <100.00%> (+<0.01%) ⬆️
Py-3.13.3 99.12% <100.00%> (-0.23%) ⬇️
Py-3.13.3t 99.75% <100.00%> (+<0.01%) ⬆️
Py-3.9.13 99.08% <100.00%> (+<0.01%) ⬆️
Py-3.9.22 99.31% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.16 99.33% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.19 99.36% <100.00%> (+<0.01%) ⬆️
VM-macos-latest 99.33% <100.00%> (+<0.01%) ⬆️
VM-ubuntu-latest 99.44% <100.00%> (+<0.01%) ⬆️
VM-windows-latest 99.48% <100.00%> (+<0.01%) ⬆️
pytest 99.44% <100.00%> (+<0.01%) ⬆️


@webknjaz webknjaz added the bug label May 22, 2025
@MaelPic MaelPic force-pushed the bugfix-ipv6-parsing branch from e3af387 to 5001d56 Compare May 22, 2025 21:49
@MaelPic MaelPic changed the title 🐛 more meaningful exception if reversed brackets in ipv6 URL 🐛 Raises meaningful exception when IPv6 URL is malformed May 22, 2025
    with pytest.raises(ValueError, match="An IPv4 address cannot be in brackets"):
        URL("http://[127.0.0.1]/")
@pytest.mark.parametrize(
    ("url"),
Member

Suggested change
- ("url"),
+ "url",

@webknjaz
Member

note: we agreed with Mael that he'll add a change note later on, while traveling

@MaelPic MaelPic force-pushed the bugfix-ipv6-parsing branch from 5001d56 to f89d97b Compare May 23, 2025 00:47
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label May 23, 2025
Member

@webknjaz webknjaz left a comment


@MaelPic thanks for your contribution during the PyCon US Sprints last week!

Would you be able to make the following cosmetic changes? Or do you need us to pick up the PR and complete it instead?

It fixes the issue where exception `IndexError` was raised in the
following conditions: empty IPv6 URL, brackets in reverse order.

-- by :user:` MaelPic`.
Member

Suggested change
- -- by :user:` MaelPic`.
+ -- by :user:`MaelPic`

Comment on lines 4 to 5
It fixes the issue where exception `IndexError` was raised in the
following conditions: empty IPv6 URL, brackets in reverse order.
Member

Suggested change
- It fixes the issue where exception `IndexError` was raised in the
- following conditions: empty IPv6 URL, brackets in reverse order.
+ This fixes the issue where exception :exc:`IndexError` was
+ leaking from the internal code because of not being handled and
+ transformed into a user-facing error. The problem was happening
+ under the following conditions: empty IPv6 URL, brackets in
+ reverse order.

Comment on lines 1 to 2
Aligned the type of exception raised for improper IPv6 URL values,
which should be `ValueError`.
Member

Suggested change
- Aligned the type of exception raised for improper IPv6 URL values,
- which should be `ValueError`.
+ Started raising a :exc:`ValueError` exception for corrupted
+ IPv6 URL values.

"url",
[
"http://[]/", # Empty IPv6 URL
"http://[1]/", # No semicolon
Member

(you probably meant “no colon” in there)

Comment on lines 343 to 348
    [
        "http://[]/",  # Empty IPv6 URL
        "http://[1]/",  # No semicolon
        "http://[127.0.0.1]/",  # IPv4 inside brackets
        "http://]1dec:0:0:0::1[/",  # Brackets in reversed order
    ],
Member

Instead of code comments, it's best to set explicit param ids. They show up in the test report, make it easier to reason about the testing semantics, and can be used to select and run a single test param with pytest:

Suggested change
-     [
-         "http://[]/",  # Empty IPv6 URL
-         "http://[1]/",  # No semicolon
-         "http://[127.0.0.1]/",  # IPv4 inside brackets
-         "http://]1dec:0:0:0::1[/",  # Brackets in reversed order
-     ],
+     (
+         "http://[]/",
+         "http://[1]/",
+         "http://[127.0.0.1]/",
+         "http://]1dec:0:0:0::1[/",
+     ),
+     ids=(
+         "empty-IPv6-like-URL",
+         "no-colons-in-IPv6",
+         "IPv4-inside-brackets",
+         "brackets-in-reversed-order",
+     ),
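
For illustration, here is roughly how the suggestion reads as a complete test. This is a sketch, not the PR's actual test: `check_bracketed_host` is a hypothetical stand-in that validates with the standard-library `ipaddress` module instead of the library's own parser, and the test name is made up.

```python
import ipaddress

import pytest


def check_bracketed_host(url: str) -> None:
    """Hypothetical stand-in validator: raise ValueError for a bad bracketed host."""
    # Text between the first "[" and the next "]"; empty for "[]" and "]...[".
    host = url.partition("[")[2].partition("]")[0]
    # AddressValueError is a ValueError subclass.
    ipaddress.IPv6Address(host)


@pytest.mark.parametrize(
    "url",
    (
        "http://[]/",
        "http://[1]/",
        "http://[127.0.0.1]/",
        "http://]1dec:0:0:0::1[/",
    ),
    ids=(
        "empty-IPv6-like-URL",
        "no-colons-in-IPv6",
        "IPv4-inside-brackets",
        "brackets-in-reversed-order",
    ),
)
def test_malformed_bracketed_host(url: str) -> None:
    with pytest.raises(ValueError):
        check_bracketed_host(url)
```

With ids in place, a single case can be selected by name, for example `pytest -k brackets-in-reversed-order`, and the report shows `test_malformed_bracketed_host[brackets-in-reversed-order]` rather than the raw URL.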

Author

I agree, it is better

@webknjaz webknjaz requested review from bdraco and Dreamsorcerer May 28, 2025 16:49
yarl/_parse.py Outdated
@@ -69,11 +69,11 @@ def split_url(url: str) -> SplitURLType:
  # Valid bracketed hosts are defined in
  # https://www.rfc-editor.org/rfc/rfc3986#page-49
  # https://url.spec.whatwg.org/
- if bracketed_host[0] == "v":
+ if bracketed_host.startswith("v"):
Member

Suggested change
- if bracketed_host.startswith("v"):
+ if bracketed_host and bracketed_host[0] == "v":

This is faster since there is no function call
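
A rough way to check this locally. The sketch below first confirms that both forms agree (including on the empty string), then times them with `timeit`. The helper names are made up for illustration, and absolute timings depend on the interpreter and machine, so none are quoted here.

```python
import timeit


def is_ipvfuture_method(bracketed_host: str) -> bool:
    # Method call; safe on an empty string.
    return bracketed_host.startswith("v")


def is_ipvfuture_index(bracketed_host: str) -> bool:
    # The truthiness guard short-circuits before indexing an empty string.
    return bool(bracketed_host) and bracketed_host[0] == "v"


SAMPLES = ("", "v1.fe80::1", "::1", "1dec:0:0:0::1")

# Both forms must give the same answer for every sample.
assert all(is_ipvfuture_method(h) == is_ipvfuture_index(h) for h in SAMPLES)

if __name__ == "__main__":
    for label, stmt in (
        ("startswith", 'host.startswith("v")'),
        ("guard + index", 'host and host[0] == "v"'),
    ):
        elapsed = timeit.timeit(stmt, setup='host = "::1"', number=200_000)
        print(f"{label:>14}: {elapsed:.4f}s")
```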

…ersed

Problem: If brackets are reversed or the content inside the brackets is empty,
`bracketed_host` will be an empty string. As a consequence, accessing the
first element of `bracketed_host` raises an `IndexError`.
Solution: Use `startswith`, because it does not raise on an empty string;
the code then falls through to the following check, which verifies the presence of `:`.
Also, the raised message was made more generic but still keeps its meaning.
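
The failure mode described in the commit message can be reproduced in isolation. The sketch below assumes the bracketed host is extracted with `str.partition` (the real `split_url` may differ in detail), omits the IPvFuture checks, and takes the error message from the PR description; the function names are hypothetical.

```python
def bracketed_host_of(netloc: str) -> str:
    """Text between the first '[' and the next ']' (assumed extraction)."""
    return netloc.partition("[")[2].partition("]")[0]


def validate_bracketed_host(bracketed_host: str) -> None:
    # startswith() is safe on "", unlike bracketed_host[0].
    if bracketed_host.startswith("v"):
        return  # IPvFuture validation omitted in this sketch
    if ":" not in bracketed_host:
        # An empty host has no colon, so it ends up here as well.
        raise ValueError("Invalid IPv6 URL")


if __name__ == "__main__":
    for netloc in ("[]", "]1dec:0:0:0::1["):
        host = bracketed_host_of(netloc)
        try:
            host[0]  # what the old check effectively did
        except IndexError as exc:
            print(f"{netloc!r}: old check -> {exc!r}")
        try:
            validate_bracketed_host(host)
        except ValueError as exc:
            print(f"{netloc!r}: new check -> {exc!r}")
```

Both an empty `[]` and reversed brackets yield an empty `bracketed_host`, which is why the two cases are fixed by the same change.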
@MaelPic MaelPic force-pushed the bugfix-ipv6-parsing branch from f89d97b to 63ba8c5 Compare May 29, 2025 14:52
Labels
bot:chronographer:provided There is a change note present in this PR bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unhandled exception (IndexError) in URL parsing
4 participants