Allow virtual lazy tensors as targets in classification and regression #386
This PR adds an experimental feature that allows converting a `torch::dataset` to an `mlr3::Task`. Essentially, the `torch::dataset` is converted to a `data.table` consisting only of `lazy_tensor` columns (including the target column).

In order to make this compatible with the `mlr3` API (measures etc.), it is necessary to provide a converter for the target column that converts from the `torch_tensor` to the associated R type. When accessing the data from the task, the `lazy_tensor` columns for which a converter exists are `materialize()`d and the converter is applied, making it seem like this is just a standard `numeric()`
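As a rough illustration of what such a converter amounts to (a hypothetical sketch, not the PR's API — the name `target_converter` is made up): it is just a function from the materialized target tensor to an R vector. With the `torch` package, `as.numeric()` on a one-dimensional tensor returns a plain numeric vector, and the same function works on an ordinary vector, which keeps the sketch runnable without `torch`:

```r
# Hypothetical sketch, not the PR's API: a target converter maps the
# materialized target tensor to the R type that mlr3 measures expect.
# For a regression target, plain coercion is enough; with the torch
# package, as.numeric() on a 1-d torch_tensor yields an R numeric vector.
target_converter <- function(tensor) {
  as.numeric(tensor)
}

# The task then exposes the result as a standard numeric():
target_converter(c(1.5, 2.5))
```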
However, `LearnerTorch` avoids this conversion and can load the target tensors directly (as defined by the `tensor_dataset` above) during training. Because the individual batches can only be loaded as a whole, some data access is more expensive.
E.g., `task$truth(1:10)` needs to load all 10 batches even though we are only interested in the target. For this reason, some operations are disallowed, such as target transformations or adding new rows to the task.
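The cost described above can be mimicked in plain R (a self-contained simulation with made-up names, not mlr3torch code): when features and target are bundled in one batch, reading only the targets of ten rows still loads ten full batches.

```r
# Self-contained simulation (made-up names, no torch/mlr3 involved):
# the loader returns features and target together, mirroring the idea
# that a batch can only be loaded as a whole.
batches_loaded <- 0L
load_batch <- function(i) {
  batches_loaded <<- batches_loaded + 1L          # one full batch load
  list(x = matrix(0, nrow = 2, ncol = 2), y = i)  # features + target bundled
}

# Reading only the targets of rows 1:10 (batch size 1) still triggers
# 10 full batch loads:
truth <- vapply(1:10, function(i) load_batch(i)$y, numeric(1))
batches_loaded  # 10
```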
Furthermore, converted columns are cached. On the second access to `head()`, the counter of the `dataset` is not incremented and hence `$.getbatch()` was not called; the result was instead loaded from the cache.

Created on 2025-04-17 with reprex v2.1.1
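The caching behaviour can likewise be sketched with a plain-R memoising wrapper (a stand-in with made-up names, not the package's actual cache): the call counter only advances on a cache miss, so a repeated access leaves it unchanged.

```r
# Plain-R stand-in for the column cache (made-up names, not the
# package's implementation). `calls` plays the role of the dataset's
# counter; it advances only when the value is not yet cached.
calls <- 0L
cache <- new.env(parent = emptyenv())
get_converted <- function(i) {
  key <- as.character(i)
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache))   # cache hit: $.getbatch() not called
  }
  calls <<- calls + 1L                # cache miss: simulates a batch load
  assign(key, i * 2, envir = cache)   # stand-in for the converted value
  get(key, envir = cache)
}

get_converted(1)  # first access: loads and caches
get_converted(1)  # second access: served from the cache
calls  # 1
```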
Internally, this works via the `DataBackendLazyTensors` (TODO: describe this)