
Cadence looks like an OSS version of Amazon Simple Workflow (SWF) service. The author used to work on SWF at AWS afaik.

I'm a heavy SWF user at work for managing complex data pipelines. SWF requires a significant conceptual and tooling investment up front, but it pays off if you use it a lot.

As for other comments mentioning Airflow: the programming model is quite different, since Airflow, as far as I understand, forces you to provide a DAG of tasks upfront. SWF (and Cadence?) doesn't. It coordinates the work of Deciders and Activity Workers and only acts as a source of truth for the state of the workflow (plus it hands each task to exactly one of the many long-polling workers). As a result you don't declare anything upfront and can have deciders make dynamic decisions along the way, which is really nice when you want very dynamic logic for your workflows (e.g. dynamic partitioning of tasks, decisions depending on external factors, etc.).
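To make the "no upfront DAG" point concrete, here is a minimal sketch in plain Python. It does not use the SWF or Cadence APIs; `decide`, `run_activity`, `run` and the event dictionaries are all hypothetical names for illustration. The decider only looks at the history recorded so far and picks the next tasks, so the fan-out width is chosen at run time from an earlier result.

```python
def decide(history):
    """Hypothetical decider: read the history, return the next tasks to schedule."""
    completed = [e for e in history if e["type"] == "ActivityCompleted"]
    if not completed:
        # First decision: the only thing known is that the input must be listed.
        return [("list_files", None)]
    files = completed[0]["result"]
    if len(completed) == 1 and files:
        # Dynamic partitioning: one task per file found by the first activity.
        return [("process_file", name) for name in files]
    return [("complete_workflow", None)]


def run_activity(name, arg):
    """Toy activity worker with hard-coded behavior (an assumption for the demo)."""
    if name == "list_files":
        return ["a.csv", "b.csv", "c.csv"]
    if name == "process_file":
        return arg.upper()
    raise ValueError(f"unknown activity: {name}")


def run():
    """Toy stand-in for the service: it stores history and asks the decider what is next."""
    history = []
    while True:
        decisions = decide(history)
        if decisions[0][0] == "complete_workflow":
            history.append({"type": "WorkflowCompleted"})
            return history
        for name, arg in decisions:
            history.append(
                {"type": "ActivityCompleted", "name": name, "result": run_activity(name, arg)}
            )
```

Nothing in `run` knows how many `process_file` tasks there will be. That number only exists after `list_files` has completed, which is the part a static DAG cannot express directly.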

I'd love to have Maxim's insights about how Cadence compares to SWF, and what the reasons for and challenges of migrating from SWF to Cadence would be for SWF users (aside from SWF having been basically stale for 4+ years and riddled with arbitrary limits).



Cadence vs SWF

Cadence was conceived and is still led by the original tech leads of SWF.

SWF has had no new features added in the last 5 years. Cadence is open source and under active development.

Cadence was initially based on the SWF public API. Cadence uses Thrift and TChannel for communication, while SWF uses the AWS version of REST. Currently the API is not compatible with SWF, as Cadence added a large number of new features and deprecated a few problematic ones. We are planning to migrate to gRPC later this year.

Cadence can potentially run on any database that supports single shard multi-row transactions as a backend. Currently it supports Cassandra and MySQL.

SWF has pretty tight throttling limits. Cadence scales very well, with production use cases that require hundreds of millions of open workflows and thousands of events per second.

SWF has pretty tight limits on individual payloads and number of events. For example, the maximum activity input size is 32k. Cadence currently has a 256k limit. The SWF history size limit is 10k events, while the Cadence limit is 200k. All other limits are also higher.

Cadence has no limit on the activity and workflow execution duration.

Through archival, Cadence supports unlimited retention after a workflow closes.

SWF has Java and Ruby client libraries. Cadence has Java and Go client libraries.

The SWF Java library is fully asynchronous and relies on both code generation (through an annotation processor) and AspectJ. It is hard to set up, doesn't play well with IDEs and has a very steep learning curve. The Cadence Java library (as well as the Go one) allows writing workflows as synchronous programs, which greatly simplifies the programming model. It is also just a library, without any need for code generation, AspectJ or similar intrusive technologies.

Cadence client side libraries have much better unit testing support. For example the Java library utilizes an in-memory implementation of the Cadence service.

Cadence features that SWF doesn't have:

Workflow stickiness. SWF replays the whole workflow history on every decision, which means that a workflow's resource usage is O(n*n) in the number of events in the history. Cadence caches workflows on a worker and delivers only new events to them. The whole history is replayed only when a worker goes down or the workflow falls out of the cache. So Cadence workflow resource usage is O(n) in the number of events in the history. For large workflows it makes a huge difference. It also leads to higher per-workflow scale. For example, it is not recommended to have workflows that execute over a hundred activities in SWF. Cadence routinely executes workflows that have over a thousand activities or child workflows.
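The quadratic vs linear difference can be checked with a small counting model. This is only a sketch under simplifying assumptions: a decision is made after every fixed batch of new events, the cache never evicts, and `events_processed` is a made-up helper, not something from either service.

```python
def events_processed(total_events, events_per_decision, sticky):
    """Count how many history events a worker handles over a workflow's lifetime.

    sticky=False: every decision replays the whole history so far.
    sticky=True:  every decision only receives the events added since the last one.
    """
    processed = 0
    seen = 0
    while seen < total_events:
        new = min(events_per_decision, total_events - seen)
        seen += new
        processed += new if sticky else seen
    return processed


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        full = events_processed(n, 10, sticky=False)
        cached = events_processed(n, 10, sticky=True)
        print(f"{n:>7} events: full replay {full:>13,}  sticky {cached:>8,}")
```

With 1,000 events and a decision every 10 events, the full-replay model handles 50,500 events against 1,000 for the sticky one. Growing the history 10x grows the first number roughly 100x and the second only 10x.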

Query workflow execution. It allows synchronously getting any information out of a workflow. An example of a built-in query is a stack trace of a running workflow.

Cross-region (in AWS terminology) replication. SWF in each region is fully independent, and if the regional SWF is down, all workflows in the region are stuck. Cadence supports asynchronous replication across regions. So even in the event of a complete loss of a region, the workflows continue execution without interruption.

Server side retry is an ability to retry an activity or a workflow according to an exponential retry policy without growing the history size.
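For readers unfamiliar with exponential retry policies, here is a rough sketch of how the wait between attempts is typically derived. The parameter names (`initial_interval`, `backoff_coefficient`, `maximum_interval`, `maximum_attempts`) are illustrative assumptions, not a statement of the actual Cadence policy fields, and the sketch says nothing about how the server avoids growing the history.

```python
def retry_intervals(initial_interval, backoff_coefficient, maximum_interval, maximum_attempts):
    """Return the delay (in seconds) before each retry.

    maximum_attempts counts the first try, so there are maximum_attempts - 1 retries.
    Each delay is the previous one times backoff_coefficient, capped at maximum_interval.
    """
    intervals = []
    delay = initial_interval
    for _ in range(maximum_attempts - 1):
        intervals.append(min(delay, maximum_interval))
        delay *= backoff_coefficient
    return intervals


if __name__ == "__main__":
    # 1s, then doubling, capped at 10s, six attempts in total.
    print(retry_intervals(1, 2, 10, 6))
```

With these inputs the delays are 1, 2, 4, 8 and 10 seconds. The last one would have been 16 seconds without the cap.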

Reset is an ability to restart a workflow from any point of its execution by creating a new run and copying a part of the history. For example the reset is used to automatically roll back workflows to the point before a bad deployment that was rolled back.
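Conceptually, a reset looks something like the following toy sketch. The event shape and the `reset` helper are hypothetical, and the real feature certainly has to handle much more (pending activities, timers, signals) than a list slice does.

```python
def reset(history, reset_event_id):
    """Start a new run from a prefix of an existing history.

    Events 1..reset_event_id are copied into the new run; everything after that
    point is left behind in the old run, which itself is not modified.
    """
    if not 1 <= reset_event_id <= len(history):
        raise ValueError(f"reset point {reset_event_id} is outside the history")
    return [dict(event) for event in history[:reset_event_id]]


if __name__ == "__main__":
    old_run = [
        {"id": 1, "type": "WorkflowStarted"},
        {"id": 2, "type": "ActivityCompleted"},
        {"id": 3, "type": "DecisionCompleted"},
        {"id": 4, "type": "ActivityFailed"},  # e.g. produced by a bad deployment
    ]
    new_run = reset(old_run, 3)
    print([e["type"] for e in new_run])
```

The new run then continues from event 3 with the current workflow code, while the old run remains available for inspection.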

Cron is an ability to schedule a periodic workflow execution by passing a cron string to the start method.

Local activity is a short activity that is executed in the context of a decision. It uses 6x fewer DB operations than a normal activity execution.

Long poll on history allows efficiently watching for new history events and is also used for efficiently waiting for a workflow completion.

Cadence uses Elasticsearch for visibility. Soon it is going to support complex searches across multiple customer-defined columns, which is far superior to the tag-based search SWF supports.

If a decider constantly fails during a decision, SWF records a few events on every failure, eventually growing the history beyond the limit and terminating the workflow. Cadence supports a transient decision feature that doesn't grow the history on such failures. It allows workflows to continue without a problem after the fix to the workflow code is deployed.

Cadence provides a command line interface.

Cadence Web is open sourced and is much nicer than the SWF console.

Cadence supports local development through unit testing as well as through a local Docker container that contains the full implementation of the Cadence service and the UI.

Cadence doesn’t yet have activity and workflow type registration. The advantage is that changes to activity or workflow scheduling options do not require version bumps that affect clients.


Wow thanks for the completeness of the answer.

I found myself nodding at all the conceptual limits and features you added, I'm sold, gonna try Cadence asap :)

The tooling around SWF (web console and ability to get insights about tasks, failures, etc.) is definitely a big one from an operational perspective. The SWF console is indeed absolutely terrible (with basic bugs not fixed for years, like broken pagination), so we ended up developing our own here at Botify, along with a Python-based client lib that mimics most of the RubyFlow principles. I'm curious whether all this can be integrated with Cadence, will have a look. I can keep you informed if you feel it's valuable for the Cadence project.


Besides the UI, Cadence provides a CLI that supports most of the API features.

The core API is almost the same, so porting an existing Python client should not be a very large task.


Is there a design/architecture doc for Cadence? I wanted to learn about the design goals, non-goals, alternatives considered, and trade-offs made when building a system like Cadence.


We don't have a public design document. This presentation contains some details about the internal architecture: https://www.youtube.com/watch?v=5M5eiNBUf4Q


Thanks, that was really interesting! Why switch from Thrift / TChannel to gRPC?


TChannel has very limited language support and is essentially deprecated. gRPC is supported by the majority of mainstream languages and is under active development.


Everything here looks awesome! Except gRPC... I thought gRPC was great until I had to use it in anger. JSON or CBOR for me!

https://reasonablypolymorphic.com/blog/protos-are-wrong/


gRPC is not exposed to the Cadence users directly. They program against the client side library that completely hides the communication mechanism. And you are free to choose any object serialization mechanism. Currently JSON is the default wire encoding.



