Worked for a bigtech well known name, large and extremely important project, literally the core of a service serving an enormous number of users, and feature flags were mandatory, no exceptions.
I can't imagine working without feature flags. Being able to enable new features in particular deployment rings (canary, dogfood, various production rings or regions), or per users / user groups, enabling gradually (percentage) and so on, is invaluable. I really can't overstate this.
Heck, we went as far as using feature flags for risky bugfixes even.
We had also internal tools to easily work with and track feature flags. A downside is that although normally you'd want to remove old feature flags that become obsolete, this hasn't been done very often.
What I suggested and we started doing was to tag the feature flags with the name of the author and the date at which they were added, and the same for the config updates, and usually ticket number and title for both cases. This did help with tracking obsolescence, but obviously there was still a need to plan and do the actual work. Automating this process further was out of the question, due to the high risks involved.
> A downside is that although normally you'd want to remove old feature flags that become obsolete, this hasn't been done very often.
I figure this should be somewhat automatable since the relevant bits of code have references to the exact flag's identifier. Was it not done b/c nobody was incentivized in any way to spend any kind of time on cleanup?
edit: OK that might be a bit naive on the organizational side. Force everyone to give every flag an expiry date for review or something maybe?
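To make the "somewhat automatable" part concrete, here is a minimal sketch of the first step only: finding where each flag identifier is still referenced. The function name, the in-memory `sources` mapping, and the flag names are all made up for illustration; a real tool would walk the repository and would still need a human to decide what to delete.

```python
import re


def find_flag_references(sources, flag_names):
    """Map each flag name to the (filename, line number) pairs that mention it."""
    hits = {name: [] for name in flag_names}
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in flag_names:
                # Word boundaries keep "new_checkout" from matching "new_checkout_v2".
                if re.search(rf"\b{re.escape(name)}\b", line):
                    hits[name].append((filename, lineno))
    return hits


# Hypothetical source files, inlined so the example is self-contained.
sources = {
    "cart.py": 'if flag_enabled("new_checkout"):\n    new_flow()\nelse:\n    old_flow()\n',
    "search.py": 'if flag_enabled("new_checkout_v2"):\n    pass\n',
}
hits = find_flag_references(sources, ["new_checkout", "retired_flag"])
print(hits)
```

A flag with an empty list (like `retired_flag` here) is referenced nowhere, so its config entry is a candidate for deletion. The flags that still have references are the ones that need actual cleanup work.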
A typical pattern I saw was a team adds a new flag, then gradually rolls it out. Once done the removal is added to the backlog. “High priority” bug fixes and feature work gradually move the ticket down in the system. Two months later a reorg happens, the team no longer exists, and the work is lost.
In the case of my team, although we recognize we should clean it up, we’re usually prioritizing pushing out more features. That tendency leads to a lot of old feature flags.
In a startup, being able to get features out is more important than cleaning up old flags. If we don’t achieve product fit before we run out of runway, then the whole thing can get shut down. If we don’t achieve customer validation, same thing.
Usually, at some point, someone will suggest a rewrite. That pretty much never goes well.
That is the one problem we’ve found with feature flags. It’s very easy to forget to gut the “old” parts when they are turned off. In many cases I’ve seen two or three year-old feature flags whose stale end still remains because the developer and / or team that did the branch never cleaned up. It isn’t usually malicious or lazy… it’s just how things would pan out.
Then we’d be nervous removing the old stuff cause who knows why it was left there and who knows if they’d want it back on again…
I have found it can be good to 'remove' the flag at the same time you create it, but just don't merge the removal until later. I wrote up this idea in a blog post a while ago, if anyone finds it interesting: https://launchdarkly.com/blog/how-to-use-feature-flags-witho...
I've seen this used, but as PRs get added, these 'cleanup' PRs move to the bottom and are usually ignored by other team members.
To me, it's about having enough time to do this in a sprint, and that means it really needs to be a post-launch Jira ticket. Which I've seen done maybe once.
yeah, this is definitely a risk. I agree that the PR needs to be tracked as a post-launch task.
The advantage to this approach is that you do the hard part of removing the flag (ie, thinking through all the parts of the code that need to be cleaned up) while everything is fresh in your mind. Otherwise, you end up spending more time regaining all of the context, and are more likely to leave some vestigial dead code because you aren't sure it isn't needed any more (this is probably less of a risk with languages that lend themselves well to static analysis that can identify dead code, but these tools are never perfect).
Starting new at a startup-with-traction several years ago, I had reason to check out the feature flag configs for my first feature. It was hilarious.
- There were hundreds of them, some of them going back to the garage days, multiple years old. Some would turn on/off major functionality core to the product. For example, this was an ecommerce product - one toggle was "show/hide the buy button". (I think that one is staying in.)
- This was all hand-rolled, pre-LaunchDarkly stuff, but at least they were all in one place. I diffed Production, Staging, and a one-off UAT environment - the toggles were MASSIVELY different across each.
That's not a problem with feature flags per se, it's a problem with lazy implementations of feature flags. Flags should be associated with an expiry date, and company comms tooling should be consistently yelling in some public channel when expired flags still exist in the codebase.
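A rough sketch of the expiry-date idea, assuming a hypothetical registry that stores an owner and an expiry date per flag. The registry layout, names, and message wording are invented here. Posting to a chat channel is left out, so the messages are just printed.

```python
from datetime import date

# Hypothetical registry: flag name -> (owner, expiry date).
FLAGS = {
    "new_checkout": ("alice", date(2021, 3, 1)),
    "fast_search": ("bob", date(2021, 12, 1)),
}


def expired_flag_messages(flags, today):
    """Build one reminder per flag whose expiry date has passed, most overdue first."""
    overdue = [
        (name, owner, (today - expires).days)
        for name, (owner, expires) in flags.items()
        if expires < today
    ]
    overdue.sort(key=lambda item: item[2], reverse=True)
    return [
        f"Flag '{name}' (owner: {owner}) expired {days} days ago; remove it or extend it."
        for name, owner, days in overdue
    ]


messages = expired_flag_messages(FLAGS, today=date(2021, 7, 1))
for message in messages:
    print(message)  # a real setup would post this to a public channel instead
```

Run daily from CI or a cron job, something like this keeps the reminder visible even after the original ticket has sunk to the bottom of the backlog.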
> That's not a problem with feature flags per se, it's a problem with lazy implementations of feature flags.
Oh absolutely. Feature flags are great, but you definitely need discipline to make sure you clean things up. The longer the unused code rots, the harder it is to remove it.
Same. Feature flags need a yearly audit, but who has bandwidth for that? I'd file it under tech debt / tech health, or a great project for entry level or new hire.
> A downside is that although normally you'd want to remove old feature flags that become obsolete, this hasn't been done very often.
We use feature flags a ton and this is something that burns us a little bit too.
The upsides are worth it though. Having that surgical precision in production is essential when you have a complex product that touches many lines of business.
Put simply it disconnects merge and deploy from launch. This is very useful when your changes rely on other teams or third parties having gone live. Having a feature flag (we call them toggles) lets you get it into production turned off, without having to coordinate deploys with other systems.
The downsides are it is extra development overhead and adds tech debt (you should go back in and remove the toggles after a successful launch). Generally we try to devise a solution that can be deployed safely without a feature flag but they are often required.
Not anymore. We do still have the ability. Some teams have found that it introduced an overhead (moving part, point of failure) and long lived staleness into certain parts of the codebase, which is actually reflective of our teams' priorities. Feature flags bring an overhead that teams should be aware of, and they should acknowledge the management required before introducing them into a tech stack.
A different question - how do teams that don't use feature flags accomplish the things feature flags enable? Namely:
1) Validating you can handle production-scale
2) Ensuring integrations/environment-related issues don't happen when you deploy
3) Alpha/Beta groups of users
4) Quick reversions when something does not work as expected
Similar to other commenters I can't imagine not using feature flags. Some of these might have work-arounds like an artificial load tester, but nothing beats true production traffic & patterns.
I like the idea of feature flags and implement that pattern into my code from time to time, however I've never worked on a team that uses them. This is how they do it instead:
> 1) Validating you can handle production-scale
I've never seen anyone actually do this. Lots of people expect AWS to handle the heavy lifting here.
> 2) Ensuring integrations/environment-related issues don't happen when you deploy
Deployment to the test or staging env.
> 3) Alpha/Beta groups of users
Most often some sort of user permission system. Sometimes they run two nodes and have both run different versions of the software.
> 4) Quick reversions when something does not work as expected
A deployment. How long that takes depends on how the software is architected, but 5 - 10 minutes isn't uncommon.
This is mad. So much overhead for little to nothing.
Even if you don't use feature flags, you should always be merging complete code. Master should be your release, and you should be deploying often. The smaller your release, the quicker you catch issues, and the easier it is to understand where those issues lie.
Extensively. Nearly every single change to the codebase is flagged. Even switching copy is behind a flag sometimes if the copy is across multiple places. I find it excessive and irritating; it feels like cargo cult programming to me.
It's risk mitigation. Feature flagging everything is a pointless waste of time until suddenly you have to roll something back in production and it's not feature flagged. The value of feature flags increases with the cost of a mistake making it to production.
For iOS builds particularly, that might have a 1-2 day review delay before you can get a new build approved, they're invaluable.
Depends on how long the time is between commit and live-in-production, and on how severe the impact is. And also on the number of developers committing changes to the same codebase/app.
If it takes you >20 minutes from when you commit to when it's live in production, and an incident arises due to the change where data loss/corruption increases in "blast radius" more by having the breakage in production longer, then a feature flag might be a great way to give you an immediate "kill it" switch.
One possible alternative that could give an immediate "kill it" switch is to deploy the last-known-good build, but that only works IF other non-revertable changes haven't shipped since your change (like non-reversible schema migrations), and also IF your time-to-deploy-existing-build is sufficiently fast (it's pretty rare that you actually have instant deploys in live production apps).
If you're in the situation where you have multiple developers committing unrelated changes to the same codebase, and non-reversible changes are a possibility, and your CI/deployment time is sufficiently-long (>5minutes, maybe?), then yeah, feature flags are probably a better fit than "just revert the commit".
For one-man projects with a slow rate of change and a fast CI/CD pipeline, sure, feature flags are overkill.
In a CD setup without feature flags, commit equals deploy equals launch.
With feature flags, commit equals deploy. Launch is controlled by feature flags.
For a more mundane analogy, would you rather have a stove that allows cold or burning only, or one with every possible nuance of warm in between?
Sure, though if your back-end has multiple services this becomes harder. If your back-end uses more than one server, then you can run into issues.
Feature flags also let you do partial rollouts, where you release to 0.1% of users and see if stuff breaks. Or A/B test to see if your feature improves whatever it's meant to improve. It also lets you roll out the feature selectively to certain users if it's a breaking change and each user needs time to migrate.
There are other uses if your flagging system is not "all or nothing". They can be very useful if they support directing a subset of users, customers, or traffic to new code/infra. The bigger a team gets, the more value you get from this.
In fact, FF like this can be used for A/B testing too.
Yes. I've worked at a few places that have used them similarly. Some do call them "feature flags", others call them "permissions". As others are noting in the comments here, they are great to perform partial rollouts of new features, or to selectively grant users (both for a client or just for our own in house purposes) the ability to see how a feature will run in a real world scenario.
One company created a category of feature flags that they give customers the ability to opt into themselves from their settings page if they want. This lets users selectively try out new features if they want. It's helpful to gauge interest and to get feedback before rolling out to everyone too. Though, that's a relatively small portion of the flags that are generally created
One problem that is typically encountered is setting up and executing a plan to remove the feature flags when they're no longer needed. Once you roll a new feature out to everyone (assuming there isn't someone who doesn't want the change), you should remove the flag and any code that's checking for it to keep things clean.
This exactly. Feature flags are necessary for rollouts, but they also quickly become technical debt. Multiple outages happened at my last company when the feature flag service went down and old flags defaulted to “off”.
It got to the point that the supporting team restricted flag-based rollouts to 95%. If you wanted it on for everyone, you had to remove the flag. Not sure if I fully agree with that design, but as an organization we were consistently failing to clean up flags.
Sounds like you have too many services if you have one just for feature flags! At my company we bundle feature flagging into the main auth service, if you can login then you can access your feature flags.
The feature flag service was doing double duty as both the “marketing experimentation A/B testing” platform and the “feature rollout” platform. Not defending the service boundaries, but it had enough meat by itself that there was a full time team working on it.
I also think the intent of the service was to be used for temporary experiments and rollouts, with the assumption that everything in the system could safely turn off. If these assumptions held you might not want to allow a bug in the flag service to take down auth. Indeed, the outages we did have from feature flag service going down would have been worse outages if it was crashing all of auth and unaffected clients couldn’t log in.
In practice long running tech debt resulted in unexercised code paths that might be years out of date.
> The feature flag service was doing double duty as both the “marketing experimentation A/B testing” platform and the “feature rollout” platform. Not defending the service boundaries, but it had enough meat by itself that there was a full time team working on it.
Ah, we're small enough that we don't have a marketing team and we don't have capacity to do A/B testing either!
Config level defaults were set in the flag service. But when that goes down, the client has to decide what to do.
I think there was an optional arg for specifying default behavior, but even when used there’s the problem that the default you pick for a rollout will be “off”. If you don’t go back and change it later, you’re back in the same boat.
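A small sketch of that failure mode, with a made-up client API (`evaluate`, `FlagServiceDown`, and the fetch functions are hypothetical, not the service described above). The default passed at the call site is only used when the service is unreachable, which is exactly why a stale "off" default goes unnoticed until an outage.

```python
class FlagServiceDown(Exception):
    """Raised by the (hypothetical) client when the flag service is unreachable."""


def evaluate(fetch, flag_name, default):
    """Ask the flag service; fall back to the caller's default if it is down."""
    try:
        return fetch(flag_name)
    except FlagServiceDown:
        return default


def healthy_fetch(flag_name):
    # The flag was rolled out to everyone long ago, so the service says "on".
    return {"new_checkout": True}[flag_name]


def broken_fetch(flag_name):
    raise FlagServiceDown(flag_name)


STALE_DEFAULT = False  # chosen when the rollout started, never revisited

normal = evaluate(healthy_fetch, "new_checkout", STALE_DEFAULT)
during_outage = evaluate(broken_fetch, "new_checkout", STALE_DEFAULT)
print(normal, during_outage)
```

During the outage every user silently falls back to the old code path, which by then may be years out of date. Flipping the default once the rollout hits 100%, or removing the flag, avoids this.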
For example, consider a platform where users can sign up to post or listen to music. There might be three types of user: "listener", "artist", and "label". There may be permissions and configuration settings that would cater to each one of these types within the same interface, and maybe in some exceptional cases, they could be intermixed (eg: while an artist and label can post music, maybe an artist can also track stats about what they like to listen to, something that only a listener account can do normally).
The platform might want to configure each of these to streamline usage and to remove features that aren't relevant to that account type, but some accounts may be specialized for whatever reason.
Not just feature flags, but "knobs" - we use a percentage (of requests or customers) instead of a boolean to enable features so we can dial them up/down.
This let us slowly roll out new features to a subset of users in case there are any issues.
Using feature flags also requires testing the default (not enabled state), ensuring you have a robust realtime configuration manager to control the knobs, and metrics for everything - not just how many requests/customers are opted-in, but also the progress & state of the configuration change.
It does no good to first enable a feature at 1% if only 1% of your servers have received the updated configuration - that's only .01% impact. It also tells you when your rollback is complete - you want to be sure when you disable something that there's not some stuck server with the feature still enabled...
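For anyone who hasn't built one of these knobs, a minimal sketch of the two ideas above: a stable percentage check per customer, and the "effective" exposure once you account for how many servers have picked up the new config. Hashing the flag name together with the user id is one common way to bucket, not necessarily what we run, and all the names here are illustrative.

```python
import hashlib


def bucket(flag_name, user_id):
    """Stable bucket in [0, 10000) derived from the flag name and user id."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 10_000


def is_enabled(flag_name, user_id, percent):
    """True if this user falls inside the first `percent` percent of buckets."""
    return bucket(flag_name, user_id) < percent * 100


def effective_exposure(percent, servers_updated, servers_total):
    """Percent of traffic actually seeing the feature while config is still propagating."""
    return percent * servers_updated / servers_total


users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if is_enabled("new_checkout", u, 10)}
at_50 = {u for u in users if is_enabled("new_checkout", u, 50)}
print(at_10 <= at_50)  # dialing up never drops a user who was already enabled
print(effective_exposure(percent=1, servers_updated=1, servers_total=100))
```

Because the bucket depends only on the flag name and the user id, a given user gets the same answer on every server and every request, and turning the knob down removes users in the reverse order they were added.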
Yes, the main problem I have with actual "feature" flags is they increase complexity because you have to test multiple variants of features together with integration and unit tests. I like the idea of wrapping a whole release in a release flag instead because you have a set of released features that should all work together.
How do folks implement feature flags? Do most people use feature flags as "remove/add code from/to my application at compile time"? Or do people also use some kind of a runtime check system that enables them to toggle functionality while the application is running with some partial reloads or fast restarts?
    if FEATURE_FLAG_ENABLED():
        do_new_version()
    else:
        do_old_version()
all throughout code. "FEATURE_FLAG_ENABLED()" could be checking a database or an external service. I've also seen it done through environment variables, but I don't like that pattern as much.
The most elaborate I've seen IRL is when feature flag management was handled by a service, so they were handled at runtime. The service authenticated the user and then returned a bunch of flags for them. The service also had a web interface for admins to modify the flags; they could be specified on different levels, like system, user, group. The flag service was always called by the backend, so if the frontend needed the flags, the backend forwarded it to them. For a select few flags, the users themselves could toggle them, on a simple interface.
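A sketch of how flags on several levels could be resolved. The precedence used here (user beats group beats system) is my assumption for illustration; I don't remember the exact rules of that service, and the data layout is made up.

```python
def resolve_flag(name, user, groups, system_flags, group_flags, user_flags):
    """Return the most specific value set for a flag: user, then group, then system."""
    if name in user_flags.get(user, {}):
        return user_flags[user][name]
    for group in groups:  # the first group that sets the flag wins
        if name in group_flags.get(group, {}):
            return group_flags[group][name]
    return system_flags.get(name, False)  # unknown flags are treated as off


# Hypothetical flag data, as an admin UI might have stored it.
system_flags = {"dark_mode": False, "new_editor": False}
group_flags = {"beta_testers": {"new_editor": True}}
user_flags = {"carol": {"new_editor": False}}


def flags_for(user, groups):
    """The bundle the backend would forward to the frontend after login."""
    return {
        name: resolve_flag(name, user, groups, system_flags, group_flags, user_flags)
        for name in system_flags
    }


print(flags_for("dave", ["beta_testers"]))
print(flags_for("carol", ["beta_testers"]))
```

Here `dave` gets the new editor through his group, while `carol` has opted out at the user level even though she is in the same group.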
The simplest I've done on a small project was just if blocks that checked the username. We had just a few, long term users, and this was used to handle their unique requirements.
The app fetches its base configuration first from a known URL, before even loading anything except for the most essential parts.
This is where we can disable the most basic level of operations or do an emergency shutdown of the client (backend broken or front end exploit found for example)
Second level is a version-specific configuration, which can be used to disable all major features of the application.
Then there is a system that allows for A/B testing of features on a client, which can also be used to completely turn off features. It's mostly used to slowly roll out features on the client that might stress the backend in unexpected ways if enabled in one go.
The A/B tests are either moved to the second level config feature flags, or permanently turned on or off in a subsequent release.
Hi, I'm at LaunchDarkly, a commercial feature flag service. I'll explain how our flags work, which is different from more traditional implementations.
But first, as mentioned elsewhere, using a feature flag in code _usually_ looks something like...
    flagValue = getFlag(FLAG_NAME, user_context)
    if (flagValue == true)
        ... do the thing
    else
        ... do the other thing
So, a couple of those traditional implementations:
1. Environment variables: Good for fast checking; no I/O needed. But if you need to flip a flag, your code won't pick up that change without restarting, maybe even redeploying. Many uses for flags, such as user enablements and kill switches, rely on flag changes being picked up immediately. Plus, this has no targeting capabilities: if a flag is on or off, it's on or off for _everyone_.
2. Request on demand: You store the flags in a database or other external system, then fetch the value when the code evaluates it. This means that flag changes get picked up without restarts, at the cost of blocking I/O. So, the more flags you have, the worse your performance gets. Not great. But if you send the user_context with the flag request, your flag service can do targeting (different flag values for different users)
3. Background polling: Similar to request on demand, except that you keep an in-memory cache so that flag evaluation happens instantly, and that cache is kept up to date with a background thread that periodically checks the flag service. Better performance than request on demand, but updates have more latency. Increasing the poll rate reduces latency at the cost of load on the flag service. Also, the cache will presumably only cache the flag value for the current user, unless you want to build a rule engine which runs locally.
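To illustrate option 3, here is a minimal background-polling cache. It is a generic sketch, not any vendor's SDK; the class name, `fetch_all` callback, and default interval are assumptions. Flag reads only touch memory, and the one place that does I/O is `refresh()`.

```python
import threading


class PollingFlagCache:
    """In-memory flag cache refreshed periodically from a flag service."""

    def __init__(self, fetch_all, interval_seconds=30.0):
        self._fetch_all = fetch_all  # callable returning {flag_name: value}
        self._interval = interval_seconds
        self._flags = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def refresh(self):
        fresh = dict(self._fetch_all())  # the only call that does I/O
        with self._lock:
            self._flags = fresh

    def get(self, name, default=False):
        with self._lock:
            return self._flags.get(name, default)

    def start(self):
        self.refresh()  # populate once before serving any reads

        def loop():
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval):
                self.refresh()

        threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        self._stop.set()


# Simulated flag service; refresh() is called by hand to keep the demo deterministic.
remote = {"new_checkout": False}
cache = PollingFlagCache(lambda: remote)
cache.refresh()
remote["new_checkout"] = True
before = cache.get("new_checkout")  # still the cached value
cache.refresh()
after = cache.get("new_checkout")
print(before, after)
```

The gap between `before` and `after` is the update latency mentioned above: a change is invisible until the next poll, so a shorter interval trades load on the flag service for freshness.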
Here's how LaunchDarkly does it:
Our SDK connects at app startup and downloads the flag data. If it's a server-side SDK, that data will include all the targeting rules. (Yes, we built a rule engine that runs locally. It's pretty powerful.[1]) The SDK caches that data so flags can be evaluated for all user contexts without making a request. However, the SDK then keeps a persistent connection open to our server (or a local proxy). When a flag changes, our server pushes the update through that connection, which updates the SDK's cache instantly. At present, flag updates take about 200ms to propagate to SDKs, usually less.
Every developer makes a rollback-worthy mistake every x changes. This means your probability of having a clean release is (1-1/x)^NumDevs EDIT - Assuming everyone submits 1 change/release.
Let's give it 1/100 bad code change, and 100 devs. The probability of a clean push is:
(1-1/100)^100 ≈ 0.37. That's worse than a coin flip.
For 10 devs it is ≈ 0.90. Still not great: roughly one push in 10 will be bad.
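The arithmetic is easy to check for other team sizes. This assumes, as above, that every developer submits one change per release and that failures are independent. The function name is just for illustration.

```python
def clean_release_probability(bad_change_rate, num_devs):
    """Chance that none of num_devs independent changes needs a rollback."""
    return (1 - bad_change_rate) ** num_devs


for devs in (10, 100):
    print(devs, round(clean_release_probability(1 / 100, devs), 3))
# prints:
# 10 0.904
# 100 0.366
```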
When you're trying to push daily (or even more frequently), this will kill the team's velocity. It's even worse when there are interleaving changes making rollback and re-release impossible before the next release. Rollback and freeze is a problem when you're trying to meet deadlines (either marketing or regulatory).
Feature flags allow the code to go in and each change to be rolled back independently. This lets the rest of the team make progress while the bug is debugged.
Short release pipelines could be a solution, but they aren't necessarily sufficient.
Time to recovery when there isn't a feature flag is also a problem.
For example, some bugs take days/weeks to reveal themselves (think month end), so a "30 min build and full push" is not enough of a guarantee. Time to recovery needs to consider the time to find the change and roll it back, including dealing with any interleaving changes that happened in the meantime.
No, we do monthly releases instead and make sure they're stable or fix them with a follow up bug fix if they're not. If a feature isn't ready yet, it doesn't get into the release (or develop branch).
I personally think feature flags are useful if you're deploying very frequently, but they just add confusion to software that's meant to be released/stable, especially for those developing it (what's with all the half done code and TODOs everywhere?)
Feature flags aren't a gateway to half-done code and TODOs unless they're misused: all code should be production ready, whether it's behind a feature flag or not. A feature flag severs the relationship between the availability of a feature and its appearance in code, which is very useful for lots of reasons -- pushing half-baked code isn't one!
Feature flags can be used as a gate for in progress code. This is common in companies that enforce a linear commit history and encourage many frequent, small commits. It helps avoid merge conflicts that could happen with long living feature branches.
We simply have a column on a few of our models called "features". Just an array with plaintext values. It's just branching logic on the FE/BE.
The biggest thing with feature flags is using them at an appropriate level of granularity for the stage of company you're at. Branching logic adds complexity.
Consider starting with an entire feature/route/page set to keep the branching logic consolidated and simple. Only move to more granularity as you need it.
Not all projects are of the same duration or complexity. Most of the features get developed in a 2 week sprint, but from time to time, there are projects that take a month or so. When there are multiple teams with different working schedules, feature flags add a lot of value during deployments and reduce the dependencies on other teams. The business/product benefits that arise from feature flags are equally valuable.
Also biased! [0], we use them a lot, mainly on the front end. Most of the new features we are working on are gate-able via front end UI elements, and so we push out pretty much all our new features using our own platform.
We are also doing interesting stuff like controlling what features are in our Open Source Docker container via flags in the platform that are baked into our Docker images when they are built.
Once you start using and relying on flags, it's hard to go back, and helps with a bunch of 'good' engineering processes and patterns.
Another interesting aspect is that once you have gated a feature based on a flag, you can then AB test that feature, almost for free, as you have a way of bucketing users and showing or hiding the feature using most feature flag platforms.
We rarely use them where I work although we've recently had some devs who are trying to get us to adopt it.
Personally I think they should be used sparingly for things where you have to be able to configure the same software on different environments differently, for purposes of A/B testing or other such things.
Using them a lot increases your code complexity and is basically just tech debt imo. The gain is that you don't have to run a deployment to turn on a feature? If you invest the time to make your build/deploy process not suck, this isn't a very big win.
I'm honestly surprised to see so much love for them here. I'm going to take some time to read this thread, see if there are compelling reasons to take on the extra complexity that I haven't considered.
For me the biggest take away from this thread is the distinction between release, deploy, and launch. So no matter how good or bad your CI/CD process is, you can still control what gets launched, when, and to whom.
I did not know about the term as such, but after reading about it I realised that it's something that our very small team (< 5) has been doing as a way to make sure that new stuff doesn't break production (eg switching major dependencies).
I can see how being more formal about it can help small teams, especially when they want to have high velocity but do not have the capacity to test the hell out of everything, or simply have to support alternate features at the same time.
I think if you don't want certain features on production it's better to just not have them in the production branch. Putting them behind a feature flag always allows for a misconfiguration where suddenly a half built feature is visible and broken in production. It might be a small risk, but why take it?
> I think if you don't want certain features on production it's better to just not have them in the production branch
Totally agree. In our case however, we used a feature flag when launching a new feature for the first time (a new way to render pdfs) that depended heavily on per-customer configuration (eg their letterhead/logo) and on the data being rendered, and that also had very strict requirements (number of pages, etc). The only way to "test" this was in production, and it's something that must be reversible in an instant by us or the particular client (ie no redeployment or anything like that).
I guess it depends on the context as well, since we do b2b and on-prem, so there's no such thing as "move fast and break things". Our clients actually require slow moving, incremental, updates that never break things, and it's a bit of a pain for us to do deployments / updates (it takes time, even if kind of automated/scripted).
See, and that's an entirely reasonable scenario to require feature flags for. In my mind it's always been a thing to add only when you actually need to toggle a feature without a deploy. Certainly not something to sprinkle in your code liberally.
A lot of people in this thread seem to disagree and I find it fairly baffling.
We do. But, depending on your project, you may need to do extra work to corral the complexity that can grow up around them. We wrote a library for using them in our web apps that allows us to use them, and remove them, easily.
Some possible failures that can happen with feature flags:
1) "Accidentally perpetual" - Since feature flags are a part of the code, it's easy to create multiple dependencies on the flag value, which makes it difficult to remove from the code without mysterious null exceptions happening where you didn't expect them.
2) Cross-scope - Using multiple feature flags carelessly can result in situations where one flag value change doesn't do what's expected unless another flag's value is present or set to a certain value. Flags should always be independent from one another, even if they're controlling the same code. Instead of two flags whose values affect each other, you would instead create more flags (4, in the case here if using Boolean flags) to reflect each combined state.
3) Fallback - What happens if the systems or SaaS that supplies your feature flags becomes unavailable? Always consider this.
Feature flags are a great tool, and enabling your team to be able to "test in production" with them can be amazing. However, do watch out for the footguns.
Yep. Some features are complex and cannot be completed in one sprint. On the other hand, we have constantly changing front-end code and leaving code in a branch too long might make it drift and might get broken in a lower-layer change because constantly syncing gets annoying. The other case is when the front-end is ready but the back-end is not or there is a public announcement that is scheduled for a later date.
Drifting branches seems to be the main argument pro feature flags. I wonder how teams avoid breaking parts sleeping behind feature flags, by the ongoing development of its surroundings.
>I wonder how teams avoid breaking parts sleeping behind feature flags, by the ongoing development of its surroundings.
Just duplicate the testing jobs with the feature flag flipped. Configure new job to turn expected-failure test cases into expected-pass (or whatever analogous way you keep track of it).
If the flag is obtained from a configuration file, you have a testing configuration where the flag is toggled and you run the tests in your CI/CD pipeline. E.g. we have a development plan where everything is enabled and tested.
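A toy version of "run the same checks with the flag flipped", with a made-up function and flag name. In a real pipeline the loop at the bottom would be two CI jobs, or a parametrized test, rather than a plain `for`.

```python
def render_price(cents, flags):
    """Code under test: the new format is only used when the flag is on."""
    if flags.get("new_price_format", False):
        return f"${cents / 100:,.2f}"
    return f"{cents} cents"


# Expected output per flag state, so both paths stay covered until the flag is removed.
CASES = {
    False: {1999: "1999 cents", 5: "5 cents"},
    True: {1999: "$19.99", 5: "$0.05"},
}


def run_price_checks(flag_state):
    """Return (input, actual, expected) for every case that fails under this flag state."""
    flags = {"new_price_format": flag_state}
    return [
        (cents, render_price(cents, flags), expected)
        for cents, expected in CASES[flag_state].items()
        if render_price(cents, flags) != expected
    ]


for state in (False, True):
    failures = run_price_checks(state)
    print(f"new_price_format={state}: {len(failures)} failure(s)")
```

The point is that the "sleeping" path gets exercised on every commit, so a change in the surrounding code that breaks it shows up immediately instead of on launch day.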
Feature flags became essential when we began to open our offering as various configurations with varying features that align with business packaging. We use them as an easy way to toggle features per environment and end-user-configuration. Certain features are turned on/off for our Cloud version vs. OSS vs. Self-host. It's conceptually easy to understand across the team and easy to control.
HN seems like 90% web developers, and I don't really see the point outside of the web. I mean, we have other ways of gradually rolling out and configuring new software builds.
I've used them going back decades, long before they were a buzzword, only in back-end. The most common use case is where you have a bug fix or new feature that is highly data-specific. Often you'll have a user or set of users with the problematic data pattern. You code up a fix and redeploy, but with the fix disabled. Then enable the fixed code path for the user that reported the problem and ask them to re-test. From their feedback and/or log data you can confirm the fix and then enable for all users. This is helpful for example when there would be privacy issues debugging/testing a fix on real user data.
I've now worked at 3 different companies that built feature flags both internally and as a core part of their external product offering. I'm currently at Flagsmith (open source too).
Here are some of the more popular front-end feature flag use cases:
1. Gradual Roll Out: Build a feature and release it to 5% of your users, then increase as you see that it isn't "breaking anything". You might even do this AFTER a successful A/B Test concludes.
2. Test in Production: Build a feature and release it to only your internal team (or QA Team) to see how it works in a real production setting.
3. Feature Gating: Managing access to specific features based on a targeting condition. I've seen people do this for BETA features with key customers pretty often.
Most common reasons people don't use them:
1. They are concerned about feature flag creep. Managing them if they aren't deprecated can be a problem worth thinking through ahead of time.
2. They worry about giving access to important parts of their product in production. Thinking about your environment set-up and access control is smart.
Yes. The ability to turn features on and off, especially on demand, is an important part of the applications and services my team is responsible for. We have a few different levels at which features can be enabled or disabled, ranging from configuration injected into applications to configuration stores where settings are looked up, giving us the ability to expose or hide features.
To support per-user or per-group settings we have a `canary` role that can be set to allow access to new features that have been integrated but are not yet available to general users. The nice thing about having something tied to roles is that changes can take effect immediately, without the need to redeploy or reinitialize applications in our footprint. Also, the role-based model can be made as fine or coarse grained as suits the apps and users being served.
We tend to avoid encoding feature flags into URLs because users can bookmark them, revisit via history or navigate from old emails, messages, etc. and we'd rather not expose these flags or have them memorialized anywhere.
- enable percentage rollout only after validating for specific test accounts/ids
- when using percentage rollout, also have a 'killswitch' flag that can negate belonging to the enabled group
- if you don't need to test specific accounts/ids first, you can use only a 'killswitch' rollout flag starting at 100% and decreasing. it can still be used to remove particular ids from the feature-enabled group
- best experience was making a helper that everything goes through rather than query feature flag name directly. This lets you test things like what happens in CI/CD if I hard-code that return value to fully enabled. This gives you test coverage using the flag for all the tests that don't mention it at all. The helper can also do any combination of required flags/killswitches for something to be meaningfully active.
- and for expired flags, have a recurring nag mechanism, e.g. Slack post by team/channel owner
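A sketch of the single-helper approach from the list above, under assumptions of my own: the class name, the in-memory dictionaries and the `force_all` test override are all hypothetical, and a real system would load this state from a flag service or config store.

```python
import hashlib


def bucket(flag, user_id):
    """Stable bucket in 0..99 derived from the flag name and user id."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


class FeatureFlags:
    def __init__(self, force_all=None):
        self.allowlist = {}    # flag -> ids validated before any % rollout
        self.percent = {}      # flag -> rollout percentage, 0..100
        self.killswitch = {}   # flag -> ids excluded no matter what
        self.force_all = force_all  # CI override: True/False forces every flag

    def is_enabled(self, flag, user_id):
        if self.force_all is not None:
            return self.force_all
        if user_id in self.killswitch.get(flag, set()):
            return False       # the killswitch negates group membership
        if user_id in self.allowlist.get(flag, set()):
            return True
        return bucket(flag, user_id) < self.percent.get(flag, 0)
```

Because every caller goes through `is_enabled`, a CI job can construct `FeatureFlags(force_all=True)` and run the whole existing suite with everything on, including the tests that never mention a flag.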
We use them not as an A/B testing tool as I saw it suggested in a comment but more like a way to have controlled rollouts of features. As our team uses trunk based development we don't have a "development" branch so everything goes to master (we have a staging env and manual judgement prior to production deployment tho)
Perspective from a small startup: we used feature flags mainly for two types of features: 1. complex and 2. affects money.
Complex: when you roll out a complex feature, it's best to not make it available to all. Instead, focus on a small trusted subset of savvy users who will be easier to train. At the same time, you can use their experience with it to simplify it and make it easier for the rest of the userbase.
Money: we ran a marketplace so dealt with clients' money. We quickly realized that changes to the way payments are processed needed extensive feedback. Even if we assumed something was alright, chances are there would be objections. Rolling out changes in stages would allow our team to handle complaints and feedback without being overwhelmed.
Midsize SaaS here -- we switched from feature branches to feature flags a couple of years ago; major improvement in our process. Once or twice we've been tempted into foolishly "just this once" working on something out of a feature branch instead, and it always always always leads to messy merge conflicts.
Our React app just uses a simple home-rolled set of keys in localStorage; we have a "secret" route that lets users turn their own flags on or off, and we encourage developers to duplicate big chunks of code while working behind a flag instead of mixing old and new, to make cleanup easier.
(And yeah, someone does have to stomp around every few months and remind everyone to clean up their dead code again. Still worth it.)
Yes. Feature flags changed the mindset of my whole team. I absolutely love the fact that it pushes the whole team towards release-small-release-often mentality. It's a technical solution with many benefits but I love the cultural impact the most.
We're currently in the transition, any chance you have some insights to help ease the mentality shift as the team transitions to this vs bi-weekly/monthly deployments.
- automated tests are the pre-req of feature flags. kindly make sure that you have good test coverage. otherwise the team loses confidence in code & feature flags which defeats the whole purpose.
- as part of code review guidelines we added one major feedback item: can this PR be merged right now? why not? what can be done to merge it right away? as part of a cultural shift the code reviewers played a critical role. they kept engineers on their toes at all times.
- feature flags should be part of user stories. in many cases you can't expect engineers to add a feature flag at the end of implementation. the product team should know this at the time of writing user stories.
- as mentioned in many other comments try to clean up code after feature flags become stale. otherwise the code (and respective automated test cases) becomes a huge mess.
- in our case we decided the granularity of feature flags (per-user, per-customer, per-region, per-server etc). we started with per-customer and went from there. worked out fine for us.
- feature flags have two major benefits: "release often, release small" & "decouple releases from big launches". please make sure to instill this in your team every day. mindset change takes repetition and emphasis. if at any time you feel that you are not achieving either of these please take a step back and figure out why.
Not frontend but I use them extensively now in backend + REST endpoints. I still think they are necessary to prevent having to hotfix builds when something breaks with a new feature, however I don't think they are purely good. Too many feature flags add a lot of complexity to the testing surface of a project and if the test suite isn't well adapted to test all the configurations then people will end up unit testing their feature flag (if they do at all) and not doing any kind of integration testing of the specific combination of flags that production may find itself in. You need to have a well thought out test suite.
Biased [0], but we use feature flags a ton. It means that as an engineer we can throw new ideas at the wall, turn them on for ourselves and then a small group of trusted customers and super quickly iterate until we have a feature that works well and that we can safely release to the rest of the world.
We mostly use frontend feature flags for this, so we'd only show the link in the menu or the specific component if the feature flag is turned on.
Recently worked at a mid-sized learning technology company. Feature flags were transformative for our product delivery to fully decouple deploys from releases, reduce risk, and offer tons of control over feature rollouts. Really can't imagine releasing without them.
In my new role, I'm curious what teams do with feature flags post-release. Do you have a good process for cleaning them up? Do they have long term usefulness as a failsafe or for customer/user configuration? Is it really an issue if they just stay in the code forever? Does this cause issues for you?
Our team uses them extensively. It's also tied to our A/B and QA testing infrastructure as it performs a similar function of "turn this/that on/off for these particular users". This enables us to do continuous deployment (dead/unfinished code goes to live all the time) and running QA for features on live infrastructure that has actual production loads.
It's also a life saver when issues arise, though the correct term for this is "operational toggles". Flip a switch when functionality is causing issues and it's gone.
I know you said frontend, but we use them for both FE and BE. They work fine, but our problem comes with DB migrations. We just haven't found a good way to deal with DB changes and flags.
This isn't a perfect solution but we "solved" this by allowing services to know if migrations have been run. We store the migration data in a shared database and every service that depends on a specific migration, directly or indirectly, is configured to verify that migration has been run before health checks will pass.
The system is designed to support automatic migrations and deployments, but I don't trust it enough yet. (It's easy to write a migration that works on a local database but consumes too many resources in production, so it's just a matter of having a proper preprod, that I haven't created yet.)
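For illustration, the health-check gate described above could look something like this. It is a sketch, not the poster's actual system: the `applied_migrations` table, the migration names and the use of an in-memory SQLite database as the "shared database" are all assumptions.

```python
import sqlite3


def applied_migrations(conn):
    """Names of all migrations recorded in the shared database."""
    rows = conn.execute("SELECT name FROM applied_migrations").fetchall()
    return {name for (name,) in rows}


def health_check(conn, required):
    """Healthy only if every migration this service depends on has run."""
    missing = sorted(set(required) - applied_migrations(conn))
    return (not missing, missing)


def demo_connection():
    """In-memory stand-in for the shared migration database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE applied_migrations (name TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO applied_migrations VALUES ('001_create_users')")
    return conn
```

A service that lists `002_add_email` as a dependency would report unhealthy, along with the missing name, until that migration is recorded.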
Build a service/loop/CI/cron job/whatever to pull feature flag state from prod and check into your relevant branch(es). Make the dev deployment apply this state by default.
You’ll get a lot of extra value out of this if you have an automated test pass that exercises the bulk of the product. You get bonus points if you can integrate individual feature flags back to your branch(es) independently, because it will let you easily identify which flag has broken automation.
My team has basically implemented this twice, for two different products.
One (product A) does individual integrations from the production feature flag system to the repo when the flag is at 100% in production. It’s quite successful.
The other (product B) does bulk integration every few hours from production to the repo, bringing all flags at once. It enables flags that have been “approved” (as opposed to fully enabled in production). It’s less successful because of the bulk behavior, which makes it more difficult to isolate breaks.
Both of these integrations go through the same validation gates as pull requests (they are actually implemented as pull requests). This ensures that flag changes through this system cannot break pull requests.
The ideal system for protecting pull requests (or whatever other validation) is to require validation to pass before the flag can be enabled in production. I have not implemented this as a hard gate, but I do have an “fyi” validation run as part of the flag enabling for product A so engineers see if they’ve broken something before they start turning on the flag.
Getting philosophical, I strongly believe that feature flags should not be enabled by default in the validation environment until they are fully enabled (or close to fully enabled) in production. This ensures that the base state (flag off) is validated. You always want it to be safe to turn off the flag, which means you want validation running with the flag off. Product B does not have this behavior for legacy reasons but I think it’s a poor technical decision.
We don't really use feature flags, this migration code is just our way to get closer. But it would be possible to write a migration that handles the feature flag swaps, assuming it's a value in a database or something that works somewhat like a database (redis, storage bucket, k8s configmap, etc.) That's how I would do it anyway.
The migration content is all committed to a central repository (we're using a monorepo) and they can basically run "migrate all" to get them. There's also a script that creates a basic environment from scratch, but it comes with only minimal test data, and not nearly enough for load testing.
The dev would need to watch for updates to the main branch to know if something there requires their attention, which may not be totally scalable. We're minuscule and pre-launch so it's not a big deal for us yet.
What kind of problems do you run into? And how is your DB deployed/updated/migrated/whatever?
I’ve found it necessary to either always update the DB before the code/binaries or to expose DB version to the code so that it can automatically turn off features if the DB schema isn’t updated yet. Which of these options makes sense depends on how you deploy. If you can rigidly enforce that the DB always updates before binaries, that’s a simpler model (but your DB queries need to be backwards/forwards compatible, depending on whether they are in code or in stored procedures).
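The second option (exposing the DB version to the code) can be as small as the sketch below. The feature name and version numbers are invented, and how the schema version is read from the database is left out:

```python
# Hypothetical mapping: feature name -> minimum schema version it needs.
MIN_SCHEMA_VERSION = {"email_digest": 7}


def feature_active(feature, flag_on, db_schema_version):
    """A feature runs only if its flag is on AND the schema is new enough."""
    required = MIN_SCHEMA_VERSION.get(feature, 0)
    return flag_on and db_schema_version >= required
```

With this check in place, binaries that arrive before the schema update simply behave as if the flag were off.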
The other thing that’s been hugely successful is to have a comprehensive test suite that can be exercised in intermediate states. e.g. New DB with old binaries. Absent a strong test suite, it’s always human judgement when it’s safe to roll out.
I’ve also found success with taking certain actions out of the deployment entirely. e.g. Adding indexes to large tables gets done independent of deployments, because if things go south we don’t want to be trying to resolve the indexing issue in the middle of a deployment. It’s easier to do this in isolation from other changes.
At my previous job we used them. Sometimes it was to enable/disable features for customers that were part of a beta program. These could eventually evolve to be controlled by an admin as we rolled out the feature to all customers.
The product was a desktop file synchronization client. We had an API from which we would get administrator-controlled settings. Feature flags were usually part of that API.
We don't use them at my current job, but it's a much smaller company with far fewer customers.
Feature flags are core to CI/CD
Dark launches of features and slow rollouts all require feature flags, or at least canary releases and slow rollouts using k8s.
Yes, we do, not in the way a lot of people are describing here (push without disruption, a/b testing). We deploy on prem applications that can be managed from the cloud, so when we add a new feature that involves said applications we have to observe which features the deployed application can use (til upgraded) and only display the appropriate ones from the cloud UI.
We do, not sure if we do it "right", but we do have flags (both in the backend and frontend) to enable features only on "dev" while they are still in progress, which allows us to not delay pushing to prod. We also have feature flags in our infrastructure-as-code, actually (we use CDK, a TypeScript-based infra-as-code framework).
Yes. It’s the only way we’ve found to deliver large features in a timely manner without a giant push at the end before launch, because all functionality is tested internally and delivered by deploying it in a dark launch. Separating deployment from launch was one of the biggest improvements to our ability to deliver value to the business.
Yes, it mostly decouples "launch" of a feature from accidental side effects of it.
I find that reduces the peak/emergency workload; important for small numbers of developers.
Also aids in developing the whole codebase cohesively -- instead of "don't touch the core to add the new feature". Or trying to wrangle separate branches that have a dependency.
Old company used to, because deployment after merge took a day with hundreds of developers deploying a monolith. 1/3 of the incidents were related to misuse of feature flags. They definitely have their place, but letting them turn into a hammer to apply everywhere is wasteful and risky. Totally depends on the application and customer base though.
Yeah, my team doesn't do anything terribly complex with them, but when I'm developing services I always throw in a command-line flag to toggle debug mode - ups the log level and serves pprof data using Go's net/http/pprof module. So it's not a user-facing feature, but we haven't run into that need yet.
Yes. So damn many of them (on backend, don’t deal with frontend).
Everything new is throttled and feature flagged. A new feature rollout can have 10+ throttles to slowly ramp up. And yes, they’re difficult to keep track of at times.
What we don’t have (but should) is better management of who experiences which combination of flags, since they’re done randomly by default.
I use them in my desktop software. They are invaluable in helping transition to new features without sitting on the code and waiting weeks to merge, or forking the code which comes with its own set of headaches and problems.
They allow us to work on new code in the debug version, while ensuring stability and continuity in the production code.
This reminds me of this 3-year-old post about Oracle Database[1], where too many flags were used. Or perhaps the developers were just bad? Or maybe it was the mixing in of macros?
I think they are a necessary evil. I haven't seen a better mechanism to allow dynamic changes to production, but they come with a big downside for operations.
The problem with feature flags is that (assuming a flag can only be "on" or "off") once you introduce the flags you have 2^n different possible states the system can be in. When you have a bug or a crash, you have to reason about all of those states. If you have even 10 flags, that's over 1,000 combinations!
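To make the arithmetic concrete, here is a small sketch that counts and enumerates the states for n boolean flags. The flag names are placeholders:

```python
import itertools


def state_count(n_flags):
    """Number of distinct on/off combinations for n independent flags."""
    return 2 ** n_flags


def all_states(flag_names):
    """Every combination as a dict, e.g. for driving a test matrix."""
    return [
        dict(zip(flag_names, values))
        for values in itertools.product([False, True], repeat=len(flag_names))
    ]
```

`state_count(10)` is 1024, and each additional flag doubles it, which is why enumerating every state stops being practical quickly and stale flags are worth removing.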
Does anyone have a different way of enabling "experiments" or quickly rolling back bad changes?
Yes, when we want to test something that is really in production. We implement it so that it sits behind a well-hidden URL while the regular path is blocked by a feature flag. Then at launch date you just flip the flag, and the working thing also becomes available at the main URL.
I worked at the biggest retailer in the world and we used them a ton. They got even more useful once we built the concept of a versioned flag that would allow us to be true or false, depending on the app version we were on. That kind of flexibility is a must.
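One way a versioned flag like that might be modelled, purely as a guess at the concept: the class, its methods and the dotted numeric version format are assumptions, not a description of that retailer's system.

```python
def parse_version(text):
    """'2.10.0' -> (2, 10, 0), so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


class VersionedFlag:
    def __init__(self, default=False):
        self.default = default
        self.rules = []  # (minimum app version, value), sorted by version

    def set_from(self, min_version, value):
        """From min_version onwards the flag has this value."""
        self.rules.append((parse_version(min_version), value))
        self.rules.sort()

    def value_for(self, app_version):
        version = parse_version(app_version)
        result = self.default
        for min_version, value in self.rules:
            if version >= min_version:
                result = value  # the highest matching rule wins
        return result
```

This lets a feature stay off for old app builds that lack the code for it, while being on for newer ones.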
No, never used them in the past, but I'd like to give them a try for my next project.
Anyone got any good guides for building a feature flag system from scratch? (I'm not interested in just importing a dependency, I want to properly understand how this works.)
We do. We have one React app running on multiple instances that service different customers and not all customers need the same set of features. Some of the features are only supposed to work on our staging instance until they're ready for production.
Quite literally, we had a CMS that could have been named "FEATURE FLAG" - every single element of it was tucked away in its own feature flag. Had upsides and downsides but we made it work.
We have them, but they're not that elegant and often add more complexity than it seems like they should. But they can be very useful/valuable. Just wish they created less complexity.
Seems like a lot of folks here and at Spotify do, as do we at my gig. Where do you keep your feature flags? Do you have to perform a reset to update them or do you cache them for X minutes?
I used them in multiple projects and I'm constantly discussing the benefits with other teams.
Implemented correctly (don't forget to remove them!), they are awesome.
Yes, use them on frontend, backend, and mobile. They feed into an AB experimentation system which is the gate for shipping features and catching regressions.
We use them a lot, but in my case mostly on the back end. We have a 24/7 service that runs and our customers depend on our service being up to make money. So if a new feature or change goes out that has any risk, we feature flag it so that if there are any problems we can flip the new feature off immediately rather than have to wait for a fix and deploy, or wait for a rollback.
Our feature flags are nice to work with in that you can just add them in code. If the flag doesn't exist in the DB, it is created with a default value. This makes them pretty painless to work with for us.
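The create-on-first-use behaviour can be sketched like this. It uses an in-memory SQLite table as a stand-in for the real database, and the table layout and class name are assumptions:

```python
import sqlite3


class FlagStore:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flags "
            "(name TEXT PRIMARY KEY, enabled INTEGER NOT NULL)"
        )

    def is_enabled(self, name, default=False):
        # First use registers the flag with its default; later calls are no-ops.
        self.conn.execute(
            "INSERT OR IGNORE INTO flags (name, enabled) VALUES (?, ?)",
            (name, int(default)),
        )
        row = self.conn.execute(
            "SELECT enabled FROM flags WHERE name = ?", (name,)
        ).fetchone()
        return bool(row[0])

    def set(self, name, enabled):
        self.conn.execute(
            "INSERT OR REPLACE INTO flags (name, enabled) VALUES (?, ?)",
            (name, int(enabled)),
        )
```

Code just calls `store.is_enabled("new_export")`; the row appears in the table the first time that line runs, so nobody has to register the flag separately.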
You can activate feature flags one server at a time as well to roll things out gradually if you want.
We have a simple web ui in our admin site where you can see them, what they are set to, when last updated etc. A good idea which we haven't done yet, is to log who changed it each time, and why as well.
Being able to find flags that haven't changed in a long time is useful to identify ones you can clean up.
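A sketch of that cleanup query, assuming the last-updated timestamps are already available as a plain mapping. The 90-day threshold and the flag names in the usage note are arbitrary:

```python
from datetime import datetime, timedelta


def stale_flags(last_updated, now, max_age_days=90):
    """Names of flags whose last change is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, changed in last_updated.items() if changed < cutoff)
```

For example, with `now = datetime(2024, 6, 1)`, a flag last changed on `datetime(2024, 1, 10)` is reported and one changed on `datetime(2024, 5, 20)` is not. The result could feed a recurring reminder to the owning team.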
The team uses them. Outside of looking for bugs I still have not been convinced that it is useful to check if blue versus green button has useful conversion rates.
I guess you could do A/B testing with feature flags but honestly, it won't be any good. If you want a good answer on whether to go with A or B, you want to have many data points, not just from one specific customer.
Gradual rollout and specific customer opt-in to certain features can certainly be done with feature flags, I wouldn't call that A/B testing unless you're running experiments and reaching a conclusion that A is better than B (or vice-versa).
You can totally implement A/B tests on top of a feature-flag framework. Your “pass” function should be hashed on the User ID/Cookie/whatever and then you can distribute users into pass/fail. You should do this anyway for reproducibility. If you log whether they passed/failed and then have metrics, you compute experimental results.
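A minimal sketch of that hashed "pass" function plus the metric step. The function names are mine, SHA-256 is just one reasonable choice of hash, and real experiment analysis would need significance testing on top of the raw rates:

```python
import hashlib


def variant(experiment, user_id, variants=("A", "B")):
    """Deterministically assign a user to a variant for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]


def conversion_rates(exposures):
    """exposures: iterable of (variant, converted) pairs taken from logs."""
    shown, converted = {}, {}
    for name, did_convert in exposures:
        shown[name] = shown.get(name, 0) + 1
        converted[name] = converted.get(name, 0) + int(did_convert)
    return {name: converted[name] / shown[name] for name in shown}
```

Including the experiment name in the hash keeps assignments independent across experiments, and the same user always lands in the same group, which gives the reproducibility mentioned above.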