The Restate model depends on a long-running external orchestrator to do the "pushing". However, that comes with downsides--you have to operate the orchestrator and its data store in production (where it's a single point of failure), and you have to rearchitect your application around it.
DBOS implements a simpler library-based architecture where each of your processes independently "pulls" from a database queue. To make this work in a serverless setting, we recommend using a cron job to periodically launch serverless workers that run for as long as there are workflows in the queue. If a worker times out, the next one will automatically resume its workflows from their last completed step. This GitHub discussion has more details: https://github.com/dbos-inc/dbos-transact-ts/issues/1115
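Here's a minimal sketch of that worker pattern in Java. The env var name and the dbos.workflow_status table/status values are assumptions modeled on the DBOS system schema (verify them against your version), and the actual DBOS launch/shutdown calls are elided since they depend on your setup:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Cron-launched worker: stays up while there is queued work, then exits so
    // the serverless platform can scale to zero. Starting your DBOS app here
    // triggers recovery: workflows left behind by a timed-out worker resume
    // from their last completed step.
    public final class QueueDrainWorker {
        public static void main(String[] args) throws Exception {
            String sysDbUrl = System.getenv("DBOS_SYSTEM_DB_URL"); // assumed env var name
            // ... launch your DBOS-enabled application here ...
            while (pendingWorkflows(sysDbUrl) > 0) {
                Thread.sleep(5_000); // poll interval; tune to your billing granularity
            }
            // ... shut down cleanly; the next cron tick launches a fresh worker ...
        }

        // Counts workflows that still need a worker. Table and status names are
        // assumptions modeled on the DBOS system schema.
        private static long pendingWorkflows(String url) throws Exception {
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet r = s.executeQuery(
                     "SELECT count(*) FROM dbos.workflow_status " +
                     "WHERE status IN ('PENDING', 'ENQUEUED')")) {
                r.next();
                return r.getLong(1);
            }
        }
    }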
1. Yes, currently versioning is either automatically derived from a bytecode hash or manually set. The intended upgrade mechanism is blue-green deployments where old workflows drain on old code versions while new workflows start on new code versions (you can also manually fork workflows to new code versions if they're compatible). Docs: https://docs.dbos.dev/java/tutorials/workflow-tutorial#workf...
That said, we're working on improvements here--we want to make it easier to upgrade your workflows without blue-green deployments, to simplify operating really long-running (weeks to months) workflows.
2. Could I ask what the intended use case is? DBOS workflow creation is idempotent and workflows always recover from the last completed step, so a workflow that goes "database transaction" -> "enqueue child workflow" offers the same atomicity guarantees as a workflow that does "enqueue child workflow as part of database transaction", but the former is (as you say) much simpler to implement.
Here's an example of a common long-running workflow: SaaS trials. Upon trial start, create a workflow to send the customer onboarding messages, possibly inspecting the account state to influence what is sent, and finally close the account if it hasn't converted. This will usually take 14-30 days, but could run for months if manually extended (as many organisations are very slow to move).
I think an "escape hatch" would be useful here. On version mismatch, invoke a special function that is given access to the workflow state and let it update the state to align it with the most recent code version.
> transactional enqueuing
A workflow that goes "database transaction" -> "enqueue child workflow" is not safe if the connection with DBOS is lost before the second step completes. This would make DBOS unreliable. Doing it the other way round can work, provided each workflow checks that the "object" to which it's connected exists; that would deal with the situation where a workflow is created but the transaction rolls back.
If both the app and DBOS work off the same database connection, you can offer exactly-once guarantees. Otherwise, the guarantees will depend on who commits first.
Personally, I would prefer all work to be done from the same connection so that I can have the benefit of transactions. To me, that's the main draw of DBOS :)
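Roughly what I have in mind, as a plain-JDBC sketch (the workflow_queue table here is illustrative, not DBOS's actual schema):

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // The "same connection" idea, transactional-outbox style: the business
    // write and the workflow enqueue commit or roll back together, so workflow
    // creation is exactly-once with respect to the business data.
    class TrialStarter {
        void startTrial(Connection conn, String accountId) throws Exception {
            conn.setAutoCommit(false);
            try (PreparedStatement acct = conn.prepareStatement(
                     "INSERT INTO accounts (id, status) VALUES (?, 'TRIAL')");
                 PreparedStatement wf = conn.prepareStatement(
                     "INSERT INTO workflow_queue (workflow_name, args) VALUES ('onboarding', ?)")) {
                acct.setString(1, accountId);
                acct.executeUpdate();
                wf.setString(1, accountId);
                wf.executeUpdate();
                conn.commit(); // both rows become visible atomically, or neither does
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }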
Yes, something like that is needed; we're working on building a good interface for it.
> transactional enqueueing
But it is safe as long as it's done inside a DBOS workflow. If the connection is lost (process crashes, etc.) after the transaction but before the child is enqueued, the workflow will recover from the last completed step (the transaction) and then proceed to enqueue the child. That's part of the power of workflows--they provide atomicity across transactions.
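Concretely, the pattern looks something like this, assuming @Workflow/@Step annotations along the lines of the DBOS Java docs (imports omitted; treat the exact API as an assumption and check the workflow tutorial linked upthread):

    public class OrderFlow {

        @Workflow
        public void handleOrder(String orderId) {
            recordOrder(orderId);        // step 1: the database transaction
            // If the process crashes here, recovery replays the workflow:
            // step 1's recorded completion is reused rather than re-executed,
            // and execution resumes at the line below.
            enqueueFulfillment(orderId); // step 2: enqueue the child workflow
        }

        @Step
        void recordOrder(String orderId) {
            // write the order inside an ordinary database transaction
        }

        @Step
        void enqueueFulfillment(String orderId) {
            // enqueue the child; creation is idempotent, so a replay after a
            // crash between steps won't produce a duplicate child workflow
        }
    }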
> > transactional enqueueing
> But it is safe as long as it's done inside a DBOS workflow.
Yes, but I was talking about the point at which a new workflow is created. If my transaction completes but DBOS disappears before the necessary workflows are created, I'll have a problem.
Taking my trial example, the onboarding workflow won't have been created, so the account may continue to run indefinitely, free of charge to the user.
Looking at the trial example, the way I would solve it is: when the user clicks "start trial", that click starts a small synchronous workflow that first creates a database entry, then enqueues the onboarding workflow, then returns. DBOS workflows are fast enough to use interactively for critical tasks like this.
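Sketch of that flow, again with @Workflow/@Step-style annotations as assumed above:

    public class TrialService {

        @Workflow
        public void startTrial(String accountId) {
            createTrialRecord(accountId); // step 1: the synchronous database write
            enqueueOnboarding(accountId); // step 2: kick off the weeks-long onboarding workflow
            // Returns to the caller in milliseconds. If the process crashes
            // between the two steps, recovery finishes the enqueue, so a trial
            // can never exist without its onboarding workflow.
        }

        @Step
        void createTrialRecord(String accountId) { /* insert the trial row */ }

        @Step
        void enqueueOnboarding(String accountId) { /* enqueue the long-running workflow */ }
    }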
The main difference is that this is a library you can install and use in any application anywhere, while Durable Functions is (as I understand it) primarily for orchestrating serverless functions in Azure.
(disclaimer: I work at Microsoft, but am not directly involved with Durable Functions)
Being a library is a pretty interesting feature! Correct, Durable Functions allows you to write task-parallel orchestrations of 'activities' (which are stateless functions), and these orchestrations are fully persistent and resilient, like DBOS executions. It also has the concept of 'Entities', which are named objects (of a type you define) that "live forever" and serialize all method invocations, which are the only way to change their private state. These are also persistent. The Netherite paper [1], section 2, describes this model well.
So, there seems to be a pretty close correspondence between DBOS steps and DF activities, and between DBOS workflows and DF orchestrations. I don't know what the correspondence to DF entities is in the DBOS model.
Yes, I agree the correspondence is close; the primary difference is in form factor, not in underlying guarantees (but form factor matters! Building this as a library was technically tricky, but it unlocks a lot of use cases). Reading the Orleans and early Durable Functions papers in grad school (and many of your papers) was definitely helpful in our journey.
Haha yes, one thing you can use this for is "long waits" or "long sleeps", where a program waits hours or days or weeks for a notification (potentially through server restarts, etc.), then wakes up as soon as the notification arrives or a timeout is reached. More info in the docs: https://docs.dbos.dev/java/tutorials/workflow-communication
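For example, something like the following, assuming a recv(topic, timeout) notification API along the lines of the docs linked above (the exact Java signature is an assumption, so it's stubbed out here):

    import java.time.Duration;

    // Sketch of a durable "long wait": the wait survives restarts--on recovery
    // the workflow resumes waiting rather than starting over--and a sleeping
    // workflow consumes no compute.
    public class TrialOnboarding {

        @Workflow
        public void onboard(String accountId) {
            sendWelcomeEmail(accountId);
            // Durably block for up to 14 days for a "conversion" notification.
            Object converted = recv("conversion", Duration.ofDays(14));
            if (converted == null) {
                closeAccount(accountId); // timed out: trial expired without converting
            }
        }

        @Step
        void sendWelcomeEmail(String accountId) { /* send the email */ }

        @Step
        void closeAccount(String accountId) { /* close the unconverted account */ }

        // Placeholder for the DBOS notification-receive API; see the docs link above.
        Object recv(String topic, Duration timeout) { return null; }
    }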
A fully declarative approach is the way to go for both, IMHO. I'm more familiar with Kotlin than Java, but its coroutines framework and structured concurrency are well aligned with something like this. Essentially, what you are doing is a kind of stateful structured concurrency, where state is preserved in a DB and is resilient against subtask failures, nodes dying, etc.
This reminds me of a product called Restate. I talked to some people at that company a while ago. Their solution is built in Rust, I think, but they have clients for all sorts of platforms, including Kotlin. Cool company, and distributed workflow engines and agentic/long-running workflows feel like a good match.
There are lots of other solutions in this space. I believe an ex-Red Hat person is working on rebooting a workflow engine called Kogito, based on something that originally lived under their umbrella.
There's a long history of very enterprisey business process management stuff here. Lots of potential for overengineered solutions.
I once got sucked into a Spring Batch-centric project, and it was hopelessly overengineered for the requirements. Gave me a proper headache. Nothing was simple. Everything was littered with magic annotations causing all sorts of weird side effects. That's why I prefer declarative approaches with simple functions, which is what Kotlin's syntax enables relative to Java. You can do the same technically in Java, but it quickly becomes an unreadable mess of function chaining.
There's no technical reason why this couldn't be done with another database, and we may add support for more in the future (DBOS Python already supports SQLite), but we're not working on it right now.
Thanks for the great feedback! Yeah, for isolation we recommend each service have its own system database and communicate via clients (so service A starts a workflow in service B by creating a client and calling "client.enqueue").
How could we make this experience better while keeping DBOS a simple library? One improvement that comes to mind is to add an "application name" field to the workflows table so that multiple applications could share a system database. Then one application could directly enqueue a workflow to another application by specifying its name, and workflow observability tooling would work cross-application.
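For concreteness, a sketch of both the current recommendation and the proposal (class and method names are modeled on the DBOS client docs--treat the exact Java API and the env var name as assumptions, and the application-name variant doesn't exist yet):

    // Cross-service enqueue as described above: service A holds a client
    // pointed at service B's system database.
    DBOSClient client = DBOSClient.create(System.getenv("SERVICE_B_SYSTEM_DB_URL"));
    client.enqueue("onboarding-queue", "OnboardingWorkflow", accountId);

    // With the proposed "application name" field, a cross-application enqueue
    // on a shared system database might look like this (hypothetical):
    client.enqueue("service-b", "onboarding-queue", "OnboardingWorkflow", accountId);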
Yeah, I think that's a step in the right direction, but ultimately, as long as the workflow executor needs to know "who runs this step", there will always be some friction compared to other systems like Temporal.