Add database backend for ActiveJob delayed job handling and use it #2193

Closed
david-a-wheeler opened this issue Dec 10, 2024 · 10 comments · Fixed by #2257

Comments

@david-a-wheeler
Collaborator

david-a-wheeler commented Dec 10, 2024

We need to add a database backend for delayed job handling and then use it. I believe this will address some caching problems as well as improve scaling. This description goes over the issues.

First: Every once in a great while a cached image is not updated in a timely way, e.g., #2186 and #2072. When we update a value, we do send out a cache invalidation for that badge image. I've been trying to track down the problem, and I don't think there's a race condition inside the application, so I believe there's a race outside it. I think what's happening is that packets sent from our application to our CDN (Fastly) are being sent in parallel, and in some circumstances the first one sent is the second one received (from the point of view of "will be acted on"). Basically, someone requests an image and then we send a cache invalidation. The CDN receives the cache invalidation first and then the old image, which it now considers the good image. There's a possibility of races within the CDN, too. I don't control the entire Internet, and I certainly don't control packet ordering on it :-).
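
A minimal sketch of the suspected race, as a toy model in plain Ruby. It is not Fastly's real behavior or API: `ToyCdnCache`, `final_cached_badge`, and the `badge/1` key are hypothetical names, and the only thing it shows is that the final cache state depends on the order in which the CDN acts on the two messages.

```ruby
# Toy model of a CDN edge cache; an assumption for illustration only.
class ToyCdnCache
  def initialize
    @store = {}
  end

  # A purge (cache invalidation) drops whatever is cached for the key.
  def purge(key)
    @store.delete(key)
  end

  # A fill caches the response the origin generated for the key.
  def fill(key, body)
    @store[key] = body
  end

  def cached(key)
    @store[key]
  end
end

# Apply messages in the order given (the order in which the CDN acts on
# them) and return what ends up cached for the badge.
def final_cached_badge(messages)
  cache = ToyCdnCache.new
  messages.each do |type, key, body|
    case type
    when :fill  then cache.fill(key, body)
    when :purge then cache.purge(key)
    end
  end
  cache.cached("badge/1")
end

old_image = [:fill, "badge/1", "old badge"]
purge     = [:purge, "badge/1"]

# Acted on in the order sent: old image first, then the purge.
in_order  = final_cached_badge([old_image, purge])
# The purge overtakes the old image on the way to the CDN.
reordered = final_cached_badge([purge, old_image])

puts "in order:  #{in_order.inspect}"
puts "reordered: #{reordered.inspect}"
```

In the reordered case the purge finds nothing useful to remove, and the stale image is then cached as if it were current.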

Our framework (Rails) has a mechanism for supporting delayed jobs called ActiveJob. It has many methods for enqueuing. See ActiveJob basics. We use ActiveJob somewhat, but currently its configuration (its default) is RAM-based, which means that any jobs scheduled in it (as currently configured) get lost on a reboot. This has made me hesitant to use it more, e.g., as suggested in #1199.
ActiveJob supports lots of backends, but if we want it to be backed by a database, we must pick a backend. It's possible to switch backends, but it's a pain (you need to "flip" it), so it's better to pick well the first time.
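
To make the durability concern concrete, here is a minimal plain-Ruby sketch. It is not ActiveJob and not how any real backend stores jobs: `MemoryQueue` and `FileQueue` are hypothetical names, and a JSON file stands in for a database table only to keep the example self-contained.

```ruby
require "json"
require "tmpdir"

# Jobs live only in this process's memory, like a RAM-based backend.
class MemoryQueue
  def initialize
    @jobs = []
  end

  def enqueue(job)
    @jobs << job
  end

  def pending
    @jobs.dup
  end
end

# Jobs are written to durable storage on every enqueue. The JSON file is a
# stand-in for a database table (an assumption of this sketch).
class FileQueue
  def initialize(path)
    @path = path
  end

  def enqueue(job)
    File.write(@path, JSON.generate(pending << job))
  end

  def pending
    File.exist?(@path) ? JSON.parse(File.read(@path)) : []
  end
end

path = File.join(Dir.mktmpdir, "jobs.json")

memory = MemoryQueue.new
memory.enqueue("purge badge 1")
durable = FileQueue.new(path)
durable.enqueue("purge badge 1")

# Simulate a reboot: all in-process state is gone, so build fresh objects.
memory_after_reboot  = MemoryQueue.new
durable_after_reboot = FileQueue.new(path)

puts "memory queue after reboot:  #{memory_after_reboot.pending.inspect}"
puts "durable queue after reboot: #{durable_after_reboot.pending.inspect}"
```

Only the queue that wrote its jobs outside the process still has work to do after the simulated restart.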

ActiveJob has a few built-in adapters for backends. However, that's not a complete list. In fact, they no longer accept new adapters, because backends can provide their own adapters. Another backend that looks promising is GoodJob.

Once we add a database-based backend for jobs, we can do more delayed email deliveries as suggested in #1199.

I will follow this up with some initial analysis of the backend options I've identified.

@david-a-wheeler
Collaborator Author

I'm currently leaning towards GoodJob. It has an MIT license (it's OSS), and it's clearly active (many commits in the last week).

The developer (Ben Sheldon) has a blog post arguing for GoodJob, and the GoodJob README has a comparison table. Both come from the GoodJob project, so of course they argue for it, but the arguments are very compelling:

  • Solid Queue: Uses polling, which adds latency. Not the end of the world, but that's an unnecessary computational burden.
  • Que: While it uses Postgres, it requires structure.sql. Normal Rails apps, including ours, use schema.rb, which is more portable. There are pros and cons to switching to structure.sql, but since we try to maximize portability, this would be a bigger change that would be mostly a con.
  • Delayed Job: This was my first thought, since it's relatively simple & directly supported by ActiveJob. However, Delayed Job is single-threaded (ridiculous since our app is multi-threaded) and uses polling (like Solid Queue).
  • Sidekiq, Sidekiq Pro: These depend on Redis. We don't currently use Redis. Adding another database just for job handling seems excessive. We already use Postgres, so it'd be simpler to use it for all our database tasks.

In contrast, GoodJob works directly with Postgres, doesn't poll, and supports multi-threading. The adapter for GoodJob isn't built into ActiveJob, but that doesn't really matter. All that matters is that we can configure things so that ActiveJob calls invoke GoodJob (which it's designed to do).

I continue to look for other ActiveJob backends and comparisons between them. It's possible to migrate ActiveJob backends, but let's try to make a reasonably good choice the first time :-). So this post isn't necessarily the "final answer", just a "summary of what I've learned so far".

@david-a-wheeler
Collaborator Author

I've looked at BackBurner, one of the backends directly supported by ActiveJob. For persistence it depends on beanstalkd, which is available on Ubuntu. However, when I looked at the [beanstalkd FAQ](https://github.com/beanstalkd/beanstalkd/wiki/FAQ) I found that its persistence works by writing to log files. We can't really do that; our filesystem writes aren't persistent. We have to use a backend database. So that won't work for us.

@david-a-wheeler
Collaborator Author

Sneakers has been essentially replaced with Kicks, but both depend on RabbitMQ. That can be really useful, but it's not a good match for what we're doing, because we want to minimize components. So I'm sure it's great for some uses, but it doesn't really match our use case.

@david-a-wheeler
Collaborator Author

I looked around for testimonials.

One mentioned the use of GoodJob, so that's promising for GoodJob.

It appears that the new ActiveJob default backend will be Solid Queue. Their arguments make sense for their case: it is portable across databases.

The top contenders seem to be Solid Queue and GoodJob. Solid Queue polls. GoodJob is Postgres-only. Evidence so far suggests both would work very well for our use case.

We do have a small amount of code tied to Postgres (we use case-insensitive text and Postgres' text search mechanisms). Still, we try to limit that.

@david-a-wheeler
Collaborator Author

I looked more at Solid Queue. It will be the Rails 8 default and is officially part of Rails; both are strong arguments for it. We try to minimize oddities. It's quite active (most recent commit 13 hours ago!). MIT license.

There are some configuration variations. I'll have to drill into those to make sure at least one of its variations will actually work for us (though I don't anticipate problems).

We have 2 strong candidates so far: Solid Queue and GoodJob. The bad news: we have to do analysis to make a selection. The good news: we have some awesome options.

@david-a-wheeler
Collaborator Author

david-a-wheeler commented Dec 10, 2024

Here's a discussion about Sidekiq. Clearly, if your organization is all-in on Sidekiq, use Sidekiq. If you've committed to it, it might even make sense to call directly to it (instead of calling through a portability shim like ActiveJob).

However, that's not our circumstance. We aren't all-in on Sidekiq, and making it easier to switch (since we have not made such a commitment) makes sense for our circumstance.

Although ActiveJob can easily call on Sidekiq to perform jobs, it appears that Sidekiq does not provide some queue data to the ActiveJob stack. That's not a crisis, but clearly if you choose Sidekiq, there are incentives to call it directly (eliminating the value of the ActiveJob shim).

@david-a-wheeler
Collaborator Author

david-a-wheeler commented Dec 10, 2024

I've been searching for a list of ActiveJob backends and comparisons between them.

This ActiveJob intro mentions "Resque". The Ruby on Rails guide to ActiveJob basics mentions "Delayed Job and Resque". The discussion on Delayed Job is above; we now should look at Resque. That's the only backend I've identified so far that I haven't considered.

Resque is a "Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later." It would look promising if we were already using Redis, but we aren't. I don't think we need to add another major component just to do a little delayed job handling. So I don't think this is a good choice for our circumstance.

Note: We aren't doing that much with jobs. I expect a small job to be created on each edit to a project, along with each email on new sign-ups or password resets. These are quite short tasks, and not an overwhelming number either. We just need to make sure they aren't dropped when the system is halted.

@david-a-wheeler
Collaborator Author

Regarding race conditions: Cache invalidations are much smaller than images (they carry less data), so they take less time overall to send and less time to process. Thus, if there are 2 TCP/IP streams in parallel and we send the old image and then the cache invalidation, there's an increased chance that the cache invalidation will be received & processed first, even though it was sent after the image. We definitely need to send another cache invalidation later to counter the race condition. There's no perfect delay time, because there's simply no mechanism to ensure that one packet sequence arrives before another once it gets on the Internet.
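
A rough worked example of that size argument, assuming a deliberately simple model in which arrival time is send time plus a fixed latency plus payload size divided by throughput. Every number and name here (`arrival_time`, the byte counts, the 30-second follow-up delay) is made up for illustration; real networks add jitter and processing time that this ignores.

```ruby
# Estimated time at which a payload is fully received on one stream:
# send time + fixed one-way latency + time to push the bytes through.
def arrival_time(sent_at:, bytes:, latency:, bytes_per_second:)
  sent_at + latency + bytes.to_f / bytes_per_second
end

# All values below are hypothetical.
LATENCY          = 0.05      # seconds, assumed equal for both streams
BYTES_PER_SECOND = 100_000.0 # assumed throughput per stream
IMAGE_BYTES      = 2_000     # the (old) badge image
PURGE_BYTES      = 200       # the cache invalidation request
FOLLOW_UP_DELAY  = 30.0      # seconds to wait before a second purge

link = { latency: LATENCY, bytes_per_second: BYTES_PER_SECOND }

# The image is sent first; the purge is sent 5 ms later on another stream.
image_arrives = arrival_time(sent_at: 0.0, bytes: IMAGE_BYTES, **link)
purge_arrives = arrival_time(sent_at: 0.005, bytes: PURGE_BYTES, **link)

# A second purge sent after a delay arrives well after the image.
follow_up_arrives =
  arrival_time(sent_at: 0.005 + FOLLOW_UP_DELAY, bytes: PURGE_BYTES, **link)

puts format("image arrives at      %.3f s", image_arrives)
puts format("purge arrives at      %.3f s", purge_arrives)
puts format("follow-up arrives at  %.3f s", follow_up_arrives)
puts "purge overtakes image: #{purge_arrives < image_arrives}"
```

Under these assumptions the small purge overtakes the larger image even though it was sent later, while the delayed follow-up purge lands after the image. The delay only makes the right ordering very likely; nothing guarantees it.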

@david-a-wheeler
Collaborator Author

I've looked at every reasonable option I could find. Our top contenders for an ActiveJob database backend are:

  • Solid Queue
  • GoodJob

Both are OSS (MIT license), actively maintained, and can use PostgreSQL as their store.

After reviewing both, I've decided to start integrating Solid Queue.

GoodJob doesn't use polling, which is a nice minor performance advantage for it. However, the jobs we're doing are trivial cache invalidation and email sending tasks, and not that many of them, so the performance advantage is expected to be generally invisible.

Solid Queue has its own advantages:

  • It supports multiple RDBMSs. We do depend on two PostgreSQL-specific capabilities (citext for case-insensitive finding and text searches), but we do our best to limit non-portable constructs. The fewer we use, the easier it will be to migrate if we choose to do that.
  • This will be the Rails 8 default and is officially part of Rails. We try to minimize oddities, as that tends to reduce maintenance efforts and debugging issues.

That said, GoodJob is a very worthy alternative. If things go wrong with the Solid Queue integration, that's the backup plan.
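
For reference, selecting the backend is a one-line ActiveJob setting. The fragment below is a sketch, not this project's actual configuration: it assumes the `solid_queue` gem is in the Gemfile and its tables already exist, and the file location is only the conventional one.

```ruby
# config/environments/production.rb (fragment; placement is an assumption)
Rails.application.configure do
  # Store ActiveJob jobs in the database through Solid Queue.
  config.active_job.queue_adapter = :solid_queue

  # Backup plan: GoodJob (would need the good_job gem instead).
  # config.active_job.queue_adapter = :good_job
end
```

Because application code enqueues through ActiveJob, switching between the two should mostly be a matter of changing this setting plus the backend's own setup, though jobs already queued in the old backend would still need handling.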

If anyone knows of an issue, please reply.

@andrewfader
Collaborator

Resque is old. That's what we used before Sidekiq, and most people moved to Sidekiq. You can rule out both, as you articulated.

@david-a-wheeler david-a-wheeler linked a pull request Dec 27, 2024 that will close this issue