Background blur video processor #682

Open · wants to merge 20 commits into main
Conversation

pblazej commented Apr 23, 2025

Overview

This is a revisited and slightly optimized version of the Vision-based background blur.

While the basics are fairly easy to implement, the devil is in the details:

  • segmenting each frame synchronously is not an option, even on macOS
    • since ongoing requests can't be cancelled, it's a simple recipe for getting the whole pipeline stuck and causing more frame drops
    • the same issue is visible in Apple's own example
    • there are 2 (?) basic approaches here:
      • handling the mask asynchronously, so it's "slightly outdated" (see the sketch after this list)
      • delaying the whole pipeline - I'm afraid that in the worst case there would be no easy way to recover, so I prioritized the main "sync" stream
  • using CoreImage's Metal-backed pipeline really helps here
  • I tried to reuse as many existing building blocks as possible
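
A minimal sketch of the asynchronous approach (names and structure are illustrative, not the actual implementation in this PR): segmentation runs on its own queue, incoming frames never wait for it, and the renderer blends with whatever mask finished last.

import Foundation
import Vision
import CoreVideo

// Keeps Vision segmentation off the capture path: every frame is rendered with
// the most recent mask, which may be a frame or two old ("slightly outdated").
final class AsyncPersonSegmenter {
    private let queue = DispatchQueue(label: "person-segmentation", qos: .userInitiated)
    private let inFlight = DispatchSemaphore(value: 1)
    private let maskLock = NSLock()
    private var _latestMask: CVPixelBuffer?

    // Mask produced by the last completed request, if any.
    var latestMask: CVPixelBuffer? {
        maskLock.lock(); defer { maskLock.unlock() }
        return _latestMask
    }

    private let request: VNGeneratePersonSegmentationRequest = {
        let request = VNGeneratePersonSegmentationRequest()
        request.qualityLevel = .balanced
        return request
    }()

    // Called for every incoming frame; returns immediately and never blocks the pipeline.
    func submit(_ pixelBuffer: CVPixelBuffer) {
        // If a request is still running, skip segmentation for this frame instead of queueing it.
        guard inFlight.wait(timeout: .now()) == .success else { return }
        queue.async { [self] in
            defer { inFlight.signal() }
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
            guard (try? handler.perform([request])) != nil,
                  let mask = request.results?.first?.pixelBuffer else { return }
            maskLock.lock(); _latestMask = mask; maskLock.unlock()
        }
    }
}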

Example PR to test: livekit-examples/swift-example#72

Rendering

CoreImage 🟢

  • requires a YUV → RGBA conversion, which adds to overall throughput (cloud as well)
  • down-scaling + limiting the blur size should get us to 4K (on most devices, at least)
  • looks naive, but the key is a Metal-backed CIContext + the right choice of parameters (sketched below)
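
A rough sketch of that CoreImage path (parameter values and names are illustrative; the processor in this PR differs in detail): downscale, blur with a small constant radius, upscale, then blend using the person mask, all rendered through a Metal-backed CIContext.

import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo
import Metal

// Created once and reused - the Metal-backed CIContext is what makes this path fast enough.
let ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)

// Blurs the background of `frame` using a person `mask` (person = white) and renders into `output`.
func blurBackground(of frame: CVPixelBuffer, mask: CVPixelBuffer, into output: CVPixelBuffer) {
    let image = CIImage(cvPixelBuffer: frame)

    // The Vision mask is smaller than the frame, so stretch it to the frame's extent.
    var maskImage = CIImage(cvPixelBuffer: mask)
    maskImage = maskImage.transformed(by: CGAffineTransform(
        scaleX: image.extent.width / maskImage.extent.width,
        y: image.extent.height / maskImage.extent.height))

    // Downscale before blurring: a small constant radius on fewer pixels keeps the cost bounded.
    let downscale = CGAffineTransform(scaleX: 0.5, y: 0.5)
    let blur = CIFilter.gaussianBlur()
    blur.inputImage = image.transformed(by: downscale).clampedToExtent()
    blur.radius = 3

    guard let blurred = blur.outputImage?
        .transformed(by: downscale.inverted())
        .cropped(to: image.extent) else { return }

    // The person stays sharp (mask = white); the background comes from the blurred image.
    let blend = CIFilter.blendWithMask()
    blend.inputImage = image
    blend.backgroundImage = blurred
    blend.maskImage = maskImage
    guard let result = blend.outputImage else { return }

    ciContext.render(result, to: output)
}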

Metal (added for comparison) 🟠

  • can work with YUV (or any format) directly
  • but it's slow for large kernel sizes (anything above 8 will drop frames, especially on iOS)
  • "proper" solution would require:
    • down-scaling (essentially level 1 mipmap)
    • smarter Gaussian blur decoupled into H and V pass
    • thread group optimization
      • size
      • caching
    • half-precision maybe?
    • etc., and there's no guarantee we'd end up faster than the CI/MPS-optimized implementation (see the MPS sketch below)
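
For reference, encoding the MPS-optimized blur mentioned above takes only a few lines; this is a comparison sketch (texture creation and YUV handling omitted), not code from this PR.

import Metal
import MetalPerformanceShaders

// Applies Apple's optimized Gaussian blur to an RGBA texture; MPS already does the
// separable H/V passes and threadgroup tuning internally.
func mpsBlur(source: MTLTexture, destination: MTLTexture,
             device: MTLDevice, commandQueue: MTLCommandQueue, sigma: Float) {
    guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }
    let blur = MPSImageGaussianBlur(device: device, sigma: sigma)
    blur.edgeMode = .clamp
    blur.encode(commandBuffer: commandBuffer, sourceTexture: source, destinationTexture: destination)
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}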

For now, my estimate is that a proper shader would be a significant investment, without any guarantee that it ends up truly optimal.


ilo-nanpa bot commented Apr 23, 2025

it seems like you haven't added any nanpa changeset files to this PR.

if this pull request includes changes to code, make sure to add a changeset by writing a file to .nanpa/<unique-name>.kdl:

minor type="added" "Introduce frobnication algorithm"

refer to the manpage for more information.


pblazej commented Apr 23, 2025

"Frame processing hasn't completed yet, skipping frame" happens on iOS @ 4K; I'll focus on that now. One option is to downscale the mask; however, segmentation time does not scale linearly with the mask size.


pblazej commented Apr 23, 2025

Looks like the CoreImage part is the bottleneck; I'll try to use MPS instead 🤷

Contributor Author:

This is just the basic gist, without mipmaps - I tried them, but it looks like the mipmap generation itself is slower than a simple CI downscale...

// MARK: Parameters

private let downscaleFactor: CGFloat = 2 // Downscale before blurring, upscale before blending
private let blurRadius: Float = 3 // Keep the kernel size small O(n^2)
Contributor Author:

I made the counter-intuitive choice to keep it constant (no need to expose it) and to drive the blur amount with the downscale itself (the bigger the blur, the more downscaling), as in getDownscaleTransform.

Quadratic blur * quadratically fewer pixels = constant, controllable load
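
A sketch of the idea behind getDownscaleTransform (the signature and numbers here are illustrative, not the actual implementation): the requested blur intensity only changes the scale factor, while blurRadius stays fixed.

import CoreGraphics

// Illustrative only: intensity 0...1 is mapped to a downscale factor of 1x...4x,
// so a stronger blur means fewer pixels to blur and the per-frame cost stays roughly flat.
func getDownscaleTransform(intensity: CGFloat) -> CGAffineTransform {
    let factor = 1 + 3 * min(max(intensity, 0), 1)
    return CGAffineTransform(scaleX: 1 / factor, y: 1 / factor)
}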

Contributor:

I like the idea; I just want to point out that in my testing on the JS side, overly downscaling the background led to noticeably worse blurring - it didn't have the same visual "quality" anymore. Maybe CI is smarter about it, though, and this isn't an issue here?


pblazej commented Apr 25, 2025

To the testers: look for "Frame processing hasn't completed yet, skipping frame"...

pblazej marked this pull request as ready for review on April 25, 2025, 10:57
}

public extension CIImage {
func croppedAndScaled(to rect: CGRect, highQuality: Bool = true) -> CIImage {
Contributor Author:

Unfortunately, we need to recreate the crop information, as the CVPixelBuffer is still the source (uncropped) one...


private let segmentationRequest = {
let segmentationRequest = VNGeneratePersonSegmentationRequest()
segmentationRequest.qualityLevel = .balanced
Contributor:

maybe we'd want to expose the segmentation quality level as a user setting?

Contributor Author:

.accurate won't work in most scenarios (including macOS), but I'm happy to add a highQuality: Bool that switches between .fast and .balanced.

Contributor:

sounds good to me! yeah, accurate doesn't sound like it should be used for video in the first place
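
A small sketch of what that flag could look like (the highQuality name is the proposal from this thread, not a final API; .accurate is deliberately not offered):

import Vision

extension VNGeneratePersonSegmentationRequest {
    // Maps the proposed flag onto the two quality levels that are viable for video.
    convenience init(highQuality: Bool) {
        self.init()
        qualityLevel = highQuality ? .balanced : .fast
    }
}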

blurFilter.radius = blurRadius

guard let blurredImage = blurFilter.outputImage else { return frame }
let upscaledBlurredImage = blurredImage.transformed(by: downscaleTransform.inverted(), highQualityDownsample: false)
Contributor:

Do we even need to manually upscale it? I would have hoped the texture simply gets mapped onto the blendFilter at the correct dimensions.

Contributor Author:

[Screenshot 2025-04-25 at 1:28:29 PM]

Contributor:

:D thanks for that!


pblazej commented Apr 25, 2025

@lukasIO looks like it balances itself nicely (the cost of scaling vs sigma vs total):

This is iPhone 14, Release:
[image]

Now we're totally safe even >30 FPS.

// Skip segmentation every N frames for slower devices
private var frameCount = 0
#if os(macOS)
private let segmentationFrameInterval = 1
Contributor Author:

This could be calculated using ProcessInfo.processInfo.isLowPowerModeEnabled
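
A hedged sketch of that idea (numbers are illustrative and this is not part of the PR): derive the interval from Low Power Mode instead of hard-coding it per platform.

import Foundation

// Run segmentation on every frame on macOS; on iOS, skip more frames when Low Power Mode is on.
var segmentationFrameInterval: Int {
    #if os(macOS)
    return 1
    #else
    return ProcessInfo.processInfo.isLowPowerModeEnabled ? 3 : 2
    #endif
}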


pblazej commented Apr 28, 2025

The CI errors are due to an old CocoaPods issue; I'll resolve that after finalizing the comments 👍
