In classes like `KendallTauCorrelation`, series of datetime type are converted to integers, and this conversion is expensive.
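A minimal sketch of what a cheaper, fully vectorised conversion could look like (the helper name and the dtype handling are assumptions, not the current implementation):

```python
import numpy as np
import pandas as pd

def datetime_to_int(sr: pd.Series) -> pd.Series:
    """Hypothetical helper: vectorised datetime64[ns] -> int64 (nanoseconds since epoch)."""
    if pd.api.types.is_datetime64_dtype(sr):
        # Casting the underlying numpy buffer is a single vectorised operation,
        # unlike converting each element through Python objects.
        return pd.Series(sr.to_numpy().astype(np.int64), index=sr.index, name=sr.name)
    return sr
```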
Conversion to categorical variables in `KendallTauCorrelation` is repeated for `sr_a` and `sr_b`, which could be abstracted away.
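Something like the following hypothetical helper would let both call sites share one conversion:

```python
import pandas as pd

def to_category_codes(sr: pd.Series) -> pd.Series:
    """Hypothetical shared helper: one place for the categorical conversion."""
    return sr.astype("category").cat.codes

# Applied once per series instead of duplicating the conversion inline:
# sr_a, sr_b = to_category_codes(sr_a), to_category_codes(sr_b)
```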
In `Mean._compute_metric`, subtracting and then adding an array of zeros is redundant. It could be simplified to `np.nanmean(sr.values)`.
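For reference, the simplified computation reduces to a single call (toy data, not the class's actual signature):

```python
import numpy as np
import pandas as pd

sr = pd.Series([1.0, 2.0, np.nan, 4.0])
# np.nanmean already ignores NaNs, so no zero-array arithmetic is needed.
mean = np.nanmean(sr.values)  # 2.333...
```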
The `StandardDeviation` class performs a full sort before trimming outliers, which may not be necessary, especially when a significant number of values are removed.
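One possible direction, assuming the goal is to drop a fixed fraction from each tail (whether this matches the class's current trimming semantics is an assumption):

```python
import numpy as np
import pandas as pd

def trimmed_std(sr: pd.Series, trim: float = 0.05) -> float:
    """Sketch: pick the cut-offs with quantiles and mask,
    rather than sorting the whole series before trimming."""
    values = sr.dropna().to_numpy()
    lo, hi = np.quantile(values, [trim, 1.0 - trim])
    kept = values[(values >= lo) & (values <= hi)]
    return float(np.std(kept))
```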
In `EarthMoversDistance`, the operations we perform on pandas series could be done more efficiently. In particular, a `dict` is used to count unique values, but pandas has built-in functions for that (e.g. `value_counts`), so we are likely losing efficiency there as well.
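For the counting step, something along these lines could replace the dict (the helper name and how the aligned histograms are consumed afterwards are assumptions):

```python
import pandas as pd

def aligned_counts(sr_a: pd.Series, sr_b: pd.Series):
    """Sketch: count values with value_counts and align both histograms
    on the union of observed categories, instead of building Python dicts."""
    counts_a = sr_a.value_counts()
    counts_b = sr_b.value_counts()
    space = counts_a.index.union(counts_b.index)
    return (
        counts_a.reindex(space, fill_value=0).to_numpy(),
        counts_b.reindex(space, fill_value=0).to_numpy(),
    )
```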
We can use `np.nan_to_num` in `EarthMoversDistanceBinned` to handle NaNs more explicitly.
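Roughly like this (the histogram variables below are placeholders, not the class's attributes):

```python
import numpy as np

# Replace NaNs in the binned histograms explicitly before computing the
# distance, instead of special-casing them downstream.
hist_a = np.nan_to_num(np.array([0.2, np.nan, 0.5]))
hist_b = np.nan_to_num(np.array([0.1, 0.4, np.nan]))
```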
In several metrics, there are checks for empty series or other special conditions that return a default value. These checks should be moved to the top of the method so we return early, before any expensive work (not critical).
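The guards would simply move to the top of the method, roughly like this (the signature and the default return value are assumptions and vary per metric):

```python
from typing import Optional

import pandas as pd

def _compute_metric(self, sr_a: pd.Series, sr_b: pd.Series) -> Optional[float]:
    # Bail out before any dtype conversion or other expensive work.
    if sr_a.empty or sr_b.empty:
        return None
    ...  # actual metric computation follows
```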
The `check_column_types` method is quite similar across the different classes -- we might want to abstract that (not critical)
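One option is a single implementation on a shared base class, parameterised by the dtype kinds each metric accepts (the class and attribute names here are assumptions, not the existing hierarchy):

```python
import pandas as pd

class TwoColumnMetric:
    """Sketch of a shared base: one check_column_types implementation,
    configured per subclass instead of copy-pasted."""

    # numpy kind codes: "i"/"u"/"f" = signed/unsigned int and float.
    allowed_kinds: tuple = ("i", "u", "f")

    @classmethod
    def check_column_types(cls, sr_a: pd.Series, sr_b: pd.Series) -> bool:
        return all(sr.dtype.kind in cls.allowed_kinds for sr in (sr_a, sr_b))

class KendallTauCorrelation(TwoColumnMetric):
    # Additionally accepts datetime columns ("M"), which get converted to integers.
    allowed_kinds = ("i", "u", "f", "M")
```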
Overall, it seems like there could be less code duplication (not critical)
### Tasks
- [ ] Refactor datetime conversion in `KendallTauCorrelation`
- [ ] Implement a utility function for categorical conversion in `KendallTauCorrelation`
- [ ] Update `Mean._compute_metric` to use `np.nanmean(sr.values)`
- [ ] Update `StandardDeviation` to decide if sorting is necessary
- [ ] Use pandas ops in `EarthMoversDistance`
- [ ] Use `np.nan_to_num` in `EarthMoversDistanceBinned` to handle NaNs
- [ ] Move up the checks for empty series / special conditions
- [ ] Abstract away the similar `check_column_types` methods across all classes