⚡️ Speed up method WithFixedSizeCache.add_model by 50% in PR #1373 (feat/pass-countinference-to-serverless-getweights)
Here's an optimized rewrite of `WithFixedSizeCache.add_model`, addressing the profiling hot spots along with some general efficiency improvements.
**Optimization Summary:**
1. **Avoid Redundant Method Calls:**
- Minimize repeated lookups and calculations.
- Cache computed results in local variables within function scope where possible.
2. **Lazy Imports:**
- Move the `gc` and optional `torch` imports into the eviction path, since that is the only place they are used (see the sketch after this list).
3. **Deque Optimizations:**
- In `WithFixedSizeCache.add_model`, the repeated `self._key_queue.remove(queue_id)` could be avoided by maintaining a companion set for fast membership checks, but that is unnecessary here: the call only happens when the key is known to be present, and the branch is rarely taken. The code is still simplified for clarity.
4. **Reduced Logging:**
- Skip logging in the hot add path unless DEBUG is enabled; logging was a major time sink during profiling.
5. **Batch Removals:**
- Accumulate the models to remove and make a single `gc.collect()` call afterwards, instead of one per eviction (also shown in the sketch below).
6. **Data Structures:**
- Choices are left unchanged; `deque` is still the best fit for the explicit ordering needed here.
7. **General Logic:**
- Use local variables for attributes that are looked up multiple times (minor, but it helps in the hot path).
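
To make the lazy-import and batched-collection points concrete, here is a minimal sketch. The helper name `_unload_models` and its signature are assumptions for illustration, not the PR's actual code:

```python
def _unload_models(models: dict, keys_to_evict: list) -> None:
    """Drop a batch of models, then reclaim memory once at the end."""
    # Lazy imports: gc (and optionally torch) are only needed on the
    # rare eviction path, never on the common add path.
    import gc
    try:
        import torch
    except ImportError:
        torch = None

    for key in keys_to_evict:
        models.pop(key, None)  # defensive: tolerate already-removed keys

    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached GPU blocks after eviction
    gc.collect()  # one collection for the whole batch, not one per model
```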
---
**Key Runtime Optimizations:**
- Call `gc.collect()` once after all removals in a batch, not after every single model eviction.
- Reduce logging in hot code paths; it was responsible for noticeable time in the profile.
- Use local variables for class attributes that are accessed repeatedly.
- Inline `_resolve_queue_id` directly for this use case.
- Handle queue/model state that falls out of sync defensively, without raising unnecessarily.
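
Putting these together, the hot path might look like the sketch below. `WithFixedSizeCache`, `add_model`, and `_key_queue` come from the PR; `_max_size`, `_models`, and the reuse of the hypothetical `_unload_models` helper above are assumptions, and `queue_id` is taken directly as a parameter to stand in for the inlined `_resolve_queue_id` result:

```python
import logging
from collections import deque

logger = logging.getLogger(__name__)


class WithFixedSizeCache:
    def __init__(self, max_size: int = 8):
        self._max_size = max_size
        self._key_queue: deque = deque()  # explicit LRU order, oldest on the left
        self._models: dict = {}

    def add_model(self, queue_id: str, model) -> None:
        key_queue = self._key_queue  # local binding: one attribute lookup
        if queue_id in self._models:
            # Promote to most-recently-used. Defensive: if the deque and
            # dict ever fall out of sync, do not raise.
            try:
                key_queue.remove(queue_id)
            except ValueError:
                pass
            key_queue.append(queue_id)
            self._models[queue_id] = model
            return

        # Accumulate evictions, then collect once (see helper above).
        evicted = []
        while len(key_queue) >= self._max_size:
            evicted.append(key_queue.popleft())
        if evicted:
            _unload_models(self._models, evicted)

        key_queue.append(queue_id)
        self._models[queue_id] = model
        if logger.isEnabledFor(logging.DEBUG):  # keep logging out of the hot path
            logger.debug("Cached model under %s", queue_id)
```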
**Performance Note:**
If you profile again after these changes, most of the time will be spent in actual model loading and removal; this code will no longer be a noticeable bottleneck in the workflow. If the LRU cache grows much larger, consider further data-structure optimizations, such as a dict for constant-time eviction and presence checks, but for N ~ 8 this is not needed.
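
As a hypothetical illustration of that dict-based alternative (the class name and internals below are invented for the example), Python's `OrderedDict` gives O(1) membership checks, promotion, and eviction, versus the deque's O(n) `remove()`:

```python
from collections import OrderedDict


class DictBackedModelCache:
    """Sketch of a dict-backed LRU for much larger caches; at N ~ 8 the
    deque-based version above is already fast enough."""

    def __init__(self, max_size: int = 128):
        self._max_size = max_size
        self._models: OrderedDict = OrderedDict()

    def add_model(self, queue_id: str, model) -> None:
        if queue_id in self._models:
            self._models.move_to_end(queue_id)  # O(1) promotion to MRU
            self._models[queue_id] = model
            return
        if len(self._models) >= self._max_size:
            self._models.popitem(last=False)  # O(1) eviction of the LRU entry
        self._models[queue_id] = model
```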