Replies: 4 comments 6 replies
-
I think it depends on whose view you take into account. I am for adding as much information as possible in the stack trace, because you never know when more stack trace is useful. If a user reports a "half-hearted" stack trace I am usually not able to help as a maintainer. Sure, if the information is duplicated - and you know how to de-duplicate it reliably - feel free to open a PR for that; it does look like it should be possible in this case. But I am personally not losing sleep over it.

I usually err on the side of having more information about errors than needed rather than risking missing some information - especially in cases like this, where you might expect "ANY" error coming from ANY environment. Hiding anything here makes it IMHO far too easy to lose crucial information that might help maintainers debug problems. Sure, in this case most of the stack trace is probably repetitive most of the time, but those errors serve a different purpose - expect the unexpected. And thanks to the stack traces you will sometimes be able to diagnose much more than you think. For example, I cannot count how many times I discovered that someone's process actually used a different version of the software than they thought - simply because I noticed that it's impossible to get a stack trace with that particular line number in the platform's stack trace.

I'd personally rather focus (and I have for quite some time) on making the "known" error messages more actionable and explanatory to the users when we "know" what the issue is. But I think getting rid of the full stack trace from the logs (other than de-duplication) will make it far harder for our users to get help, which for me trumps the conciseness of the logs. I am not sure if this is an "issue"/feature. Maybe others would be interested in chiming in, so I am turning it into a discussion, but if you have any concrete PR proposals to improve the logging of particular cases (without losing the "observability" of the issues when they appear) - you are most welcome @hterik.
-
The way I see it, you have 4 layers of users here:

1. Airflow core developers
2. Operator/provider authors
3. DAG authors
4. End users who trigger the DAGs (e.g. with params)
A DAG can fail for reasons caused by any of these. I want to focus mainly on (4) here: if a user provides faulty input, the DAG might fail. How can a DAG author provide a friendly error message to the user? Imagine something like this - is this possible to do today without the user ever seeing the full internal stack trace?

```python
from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


@dag()
def my_dag():
    @task
    def my_task():
        context = get_current_context()
        if context["params"]["some_param"] < 0:
            # InvalidUserInput and do_stuff are the DAG author's own code.
            raise InvalidUserInput("some_param is not allowed to be negative")
        do_stuff(context["params"]["some_param"])

    my_task()


my_dag()
```

Note though that I don't want to remove the stack trace entirely here. For the DAG author (3), who will be the first line of support for the user (4), the stack trace within the DAG file still makes perfect sense, especially for errors further down the stack in `do_stuff`. I think this is also what you mean with improving 'known' error messages. The question is: how can authors in (2) and (3) above categorize errors as known? Reducing the log noise of such errors would itself be a great improvement; having some fancy UI for it would be even better.
-
Very good assessment. And I think all of that is happening already and needs no "fixing". I think both (2) and (3) should simply "catch the known errors" and print the friendly message. This is already happening in a number of places where (2) and (3) thought about it deliberately and produced those messages. Example from one of the Google Cloud hooks:
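(The actual hook snippet isn't reproduced above; the sketch below only illustrates the pattern it refers to - the exception type, method names and messages are assumptions, not the real provider code.)

```python
from airflow.exceptions import AirflowException
from google.api_core.exceptions import NotFound  # a typical "known" client error


def fetch_instance(client, instance_id: str):
    """Hypothetical helper showing the catch-and-translate pattern."""
    try:
        return client.get_instance(instance_id=instance_id)
    except NotFound as exc:
        # Translate the low-level, unfriendly exception into an actionable message.
        raise AirflowException(
            f"Instance {instance_id!r} was not found. "
            "Check the instance id and the project/region configuration."
        ) from exc
```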
This is an absolutely standard way of handling a "known" exception in Python. Everyone does it when they know the unfriendly exception should be turned into a friendly message.
|
-
@hterik I feel your pain. What I did was set
|
-
Description
Given a simple dag like this:
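(The DAG snippet itself isn't shown above; based on the description it was presumably something minimal like this, with a BashOperator whose command is just `false` - the dag/task ids here are made up.)

```python
import pendulum

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="failing_example",
    start_date=pendulum.datetime(2023, 1, 1),
    schedule=None,
) as dag:
    # `false` exits with a non-zero return code, so this task always fails.
    BashOperator(task_id="fail", bash_command="false")
```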
The error log from the failing bash command, `false`, becomes incredibly big, with lots of information that is irrelevant to the writer of the DAG; and if that's not enough, the error is printed twice. Often operators can be more advanced - especially PythonOperator and taskflow tasks, which frequently create equally deep stack traces themselves. Those are usually what you are interested in, but together it all accumulates into several pages of error.
I'm usually the guy who thinks a stack trace is the best error message, and I like the context it provides, but here I think it goes a bit too far.
Expected log output should only contain:
Can `TaskInstance._execute_task` be made to catch the operator's exception and rethrow it as something else, which can be ignored at the end by `standard_task_runner`? Or is this something that should simply be solved by logging configuration?
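(A rough sketch of the idea, not actual Airflow code - the wrapper exception and the filtering logic below are invented purely to illustrate the proposal.)

```python
import logging

log = logging.getLogger("airflow.task")


class TaskCodeError(Exception):
    """Hypothetical marker meaning 'the task's own traceback was already logged'."""


def execute_task(run_operator):
    # Conceptually what TaskInstance._execute_task could do: log the
    # user-facing traceback once, then rethrow a marker exception.
    try:
        run_operator()
    except Exception as exc:
        log.exception("Task failed: %s", exc)
        raise TaskCodeError(str(exc)) from exc


def run_task(run_operator):
    # Conceptually what the task runner could do: recognise the marker and
    # skip printing the (already logged) stack trace a second time.
    try:
        execute_task(run_operator)
    except TaskCodeError as exc:
        log.error("Task failed: %s", exc)
        return 1
    return 0
```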
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?