Open
Description
What happened?
Issue described:
Environment details:
- Python 3.11.9 (Canonical managed Kubeflow Jupyter environment)
- Kubeflow SDK versions
- kubeflow: 0.0.1rc0
- kubeflow-katib: 0.18.0
- kubeflow-training: 1.9.0
TLDR:
- I tried to include the param
"activation": katib.search.categorical(["relu","gelu", "elu"])
in a parameters argument for KatibClient.tune(). - this returned the error:
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"validator.experiment.katib.kubeflow.org\" denied the request: feasibleSpace must be specified in spec.parameters[1]: {activation categorical { [] }}","code":400}
For reference, the relevant block of code:
def objective(parameters):
# This is rather long, hence omitted.
print f"val_binary_accuracy={result}"
# Import the SDK
import kubeflow.katib as katib
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1ParameterSpec
# Workaround for failed parameter spec parsing ... katib.search.categorical(["relu","gelu", "elu"]),
foo = V1beta1ParameterSpec(
parameter_type="categorical",
feasible_space=V1beta1FeasibleSpace(distribution = ["relu","gelu", "elu"], list = ["relu","gelu", "elu"]),
)
print(type(foo))
print(foo)
parameters = {
"embedding_n": katib.search.int(min=10, max=18, step=1),
# Workaround that worked
"activation": foo,
# Failing code that should have worked
# "activation": katib.search.categorical(["relu","gelu", "elu"]),
"predecessor_level_connection_affinity_factor_first": katib.search.double(min=0.1, max=50, step=0.1),
"predecessor_level_connection_affinity_factor_main": katib.search.double(min=0.1, max=50, step=0.1),
"max_consecutive_lateral_connections": katib.search.int(min=1, max=50, step=1),
"p_lateral_connection": katib.search.double(min=0.1, max=50, step=0.1),
"num_lateral_connection_tries_per_unit": katib.search.int(min=1, max=50, step=1),
"learning_rate": katib.search.double(min=10 ** -5, max=0.3, step=10 ** -5),
"epochs": katib.search.int(min=1, max=25, step=1),
"batch_size": katib.search.int(min=1, max=35, step=1),
"dropout": katib.search.double(min=0.05, max=0.95, step=0.05),
"maximum_units_per_level": katib.search.int(min=5, max=10, step=1),
"maximum_neurons_per_unit": katib.search.int(min=1, max=9, step=1),
"temperature": katib.search.int(min=10**4, max=10**6, step=1000),
}
katib_client = katib.KatibClient(namespace=NAMESPACE)
algorithm_config = {
"algorithm_name": "multivariate-tpe",
"algorithm_settings": [
{"name": "n_startup_trials", "value": "7"},
{"name": "n_ei_candidates", "value": "24"},
{"name": "random_state", "value": "42"}
]
}
katib_client.tune(
name=JOB_NAME,
objective=objective,
parameters=parameters,
objective_metric_name="val_binary_accuracy",
algorithm_name = "multivariate-tpe",
algorithm_settings= algorithm_config,
max_trial_count=25,
parallel_trial_count=2,
max_failed_trial_count=15,
resources_per_trial={"cpu": "7", "memory": "21Gi"}
)
I did a little troubleshooting:
- I did some investigating by browsing the SDK code: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/search.py
- I found that the function search.categorical() has a parameter named
list
. - I suspect that this could be the first potential problem, as this is a naming collision with the default Python keyword
list
in the global scope. - Digging deeper, however, I found that there is a problem in line 65 on https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/search.py. search.categorical() passes the argument for the parameter
list
to V1beta1FeasibleSpace without specifying which arg it pertains to. - When I manually reproduce what search.categorical() is doing internally as seen below and print this out, you see the issue becomes obvious ...
- The property
distribution
is successfully set to the list which search.categorical() was configured with, but the propertylist
is not being getting set, as you see belowV1beta1ParameterSpec.feasible_space.list
is set toNone
.
I do this:
foo = V1beta1ParameterSpec(
parameter_type="categorical",
feasible_space=V1beta1FeasibleSpace(["relu","gelu", "elu"]),
)
print(type(foo))
print(foo)
This prints out:
<class 'kubeflow.katib.models.v1beta1_parameter_spec.V1beta1ParameterSpec'>
{'feasible_space': {'distribution': ['relu', 'gelu', 'elu'],
'list': None, # <------------------------------------------------------------------------------------<<<< # I think this is the problem
'max': None,
'min': None,
'step': None},
'name': None,
'parameter_type': 'categorical'}
If I repeat this, but I explicitly reference the parameters*: distribution
and list
when instantiating the V1beta1FeasibleSpace which is used to instantiate V1beta1ParameterSpec, the properties ..distribution
and ..list
are both properly set:
my_list = ["relu","gelu", "elu"]
foo = V1beta1ParameterSpec(
parameter_type="categorical",
feasible_space=V1beta1FeasibleSpace(distribution = my_list,
list = my_list),
)
print(type(foo))
print(foo)
Now this looks correct:
<class 'kubeflow.katib.models.v1beta1_parameter_spec.V1beta1ParameterSpec'>
{'feasible_space': {'distribution': ['relu', 'gelu', 'elu'],
'list': ['relu', 'gelu', 'elu'],
'max': None,
'min': None,
'step': None},
'name': None,
'parameter_type': 'categorical'}
When I replaced the call search.categorical(['relu', 'gelu', 'elu']])
with the correctly, manually created V1beta1ParameterSpec ('activations': foo
), in the parameters and submit the job using CatibClient.tune(), the API accepted and created the job without the error.
Suggested revisions:
- In line 65 on https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/search.py: Reference the parameters
distribution
andlist
explicitly OR in V1beta1FeasibleSpace and V1beta1ParameterSpec, make sure that init is setting both self.distribution and self.list. - Rename the parameter 'list', as this is a naming collision with a Python keyword referencing the function list(), which may be causing an empty list to be instantiated and assigned when V1beta1FeasibleSpace.list() sets
self._list = list
, which the empty list may be getting cast to a null type in searialization, explaining theNone
value.
What did you expect to happen?
- I would have expected the parameter
"activation": katib.search.categorical(["relu","gelu", "elu"])
to have properly instantiated a V1beta1ParameterSpec object with bothlist
anddistribution
set to["relu","gelu", "elu"]
. - I would have expected the submission as such to have been accepted and Katib to have run the training task.
Environment
Kubernetes version:
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.14", GitCommit:"f678fbc2f5af87ec6552b7ac6ebbe26258b0d120", GitTreeState:"clean", BuildDate:"2024-05-14T10:53:32Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.10", GitCommit:"f0c1ea863533246b6d3fda3e6addb7c13c8a6359", GitTreeState:"clean", BuildDate:"2024-10-23T15:29:12Z", GoVersion:"go1.22.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.29) exceeds the supported minor version skew of +/-1
Katib controller version:
# Managed instance. I don't have access to this. I am on the Azure deployment of Charmed Kubeflow 1.9
Katib Python SDK version:
Version: 0.18.0
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍