In startups, there’s a tension between founders and developers.
Founders need to move fast and get the product to market as quickly as possible, while developers feel compelled to build things properly, which tends to slow down the product release.
In many contexts, tension is good. It’s actually instrumental. Think of bridges: it’s what allows them to stand. Or violins and guitars: it’s what makes them create music.
However, in order to have a positive tension, you need to balance it.
If you don’t, tension will eventually wreck the whole structure. Bridge cables snap, violin strings break, and startups fail: slow product releases lose the market, while hasty development leads to debt and entanglement.
It is a difficult trade-off.
I like to think of this paradox as a variation of the classic exploration-exploitation problem.
The dilemma is simple: is it better to continue investing in something that worked well so far (exploit), or to try out something new and see if it works better (explore)?
You can see this particular conundrum at play in all search problems, and startups, in a way, are a form of search problem too: founders are looking for the fastest way to reach product-market fit.
By startups, I mean specifically those high-risk companies that pursue extreme growth by trying something altogether new. They usually have a new, different take on the market they’re targeting, and most of the time need to iterate a lot before finding the right execution that unlocks profitability.
In such a specific context, we can frame the exploration-exploitation problem as follows:
Is it better to continue investing in a product iteration that worked well so far (exploit), and to spend as much time as needed to make it perfect (slow)?
Or is it preferable to try out as many novel features as possible (explore), hoping to see them work better than what you already have (fast)?
To be clear, the goal is not to find the optimal proportion between exploration and exploitation, but rather the optimal behavior that will let you find what you’re looking for in the fastest way. In jargon, that would be the policy with the fastest rate of convergence.
In other words, there’s no clear-cut answer to the question “Should you prioritize management needs or developer needs?”. It depends. However, there could be some indication of an optimal way to find the answer to that question.
See, you might find product-market fit too late, burning all the money. Or you might be stuck with an inferior iteration that you committed to too fast. It all comes down to how the founders decide to handle the exploration-exploitation trade-off.
Mathematically, there are several sophisticated methods for finding such an optimal policy. However, most of them are convoluted and unintuitive, so it’s hard to derive an indication of how to behave in a real-world scenario.
Luckily, there’s a category of “approximate solutions” good enough to give some hints and ideas.
One of them is the epsilon-greedy framework.
It works like this: You decide what fraction of your decisions you want to spend exploring (epsilon: ε) and what fraction you want to spend exploiting (1-ε). Whenever you find something better than what you currently have, you opportunistically exploit it. Eventually, you should converge to an optimal solution.
You can see ε as a “rate of exploration”. The larger it is (i.e., the more time you spend exploring novel ideas without committing too much to any of them), the faster you converge to something promising, but potentially not optimal. The lower it is (i.e., the more time you spend building a proper product), the slower you’ll converge, but more likely to an optimal solution.
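To make the mechanics concrete, here is a minimal sketch of an ε-greedy loop in Python. Everything in it is an assumption made for illustration: the function name run_epsilon_greedy, the idea that each option pays off with a fixed hidden success rate, and the running-average estimate are one common way to set the problem up, not the only one.

```python
import random


def run_epsilon_greedy(true_rates, epsilon, steps, seed=0):
    """Simulate an epsilon-greedy search over a handful of options.

    true_rates: hidden success probability of each option (hypothetical).
    epsilon:    fraction of decisions spent exploring at random.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n          # how often each option was tried
    estimates = [0.0] * n     # running average reward per option
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any option uniformly at random.
            choice = rng.randrange(n)
        else:
            # Exploit: pick the option with the best estimate so far
            # (ties go to the lowest index).
            choice = max(range(n), key=lambda i: estimates[i])

        # Observe a success (1) or a failure (0) for the chosen option.
        reward = 1 if rng.random() < true_rates[choice] else 0

        # Update the running average for that option only.
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.5):
        _, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8], eps, steps=2000)
        print(f"epsilon={eps}: tries per option={counts}, total reward={total}")
```

With epsilon set to 0.0 this sketch never leaves the first option, because the untried options keep their initial estimate of zero. That is the "stuck with an inferior iteration" failure mode in miniature.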
Imagine an ice cream shop that wants to optimize its flavour selection. Chocolate has so far proved to sell best, so it gets 70% of the shop’s ice cream supply. The remaining 30% of the supply goes to a new flavour every week, picked randomly. For the ice cream shop, ε is at 30%. By increasing ε, the shop can try more new flavours, and thus it’s more likely to find one that customers like more than chocolate. However, it also has less room to sell the winner, and therefore it never fully maximizes revenue.
Conversely, by decreasing ε and erring on the side of “exploitation”, the shop can try fewer new flavours, so it’s going to take a lot of time to discover the best one. However, once it does, it can make the most of selling the winner.
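The short-term cost side of that trade-off can be put into a few lines of arithmetic. The sketch below is only an illustration: the function expected_weekly_sales, the sell-through rates and the supply of 100 units are all invented, and it deliberately ignores the long-term upside of discovering a better flavour.

```python
def expected_weekly_sales(epsilon, best_rate, candidate_rates, supply=100):
    """Expected units sold in one week under a fixed exploration share.

    epsilon:         share of supply given to a randomly picked new flavour.
    best_rate:       fraction of its supply the current best flavour sells.
    candidate_rates: sell-through fractions of the untested flavours
                     (hypothetical numbers, unknown to the shop in practice).
    """
    if not 0 <= epsilon <= 1:
        raise ValueError("epsilon must be between 0 and 1")
    # A random pick sells, on average, like the average candidate.
    avg_candidate = sum(candidate_rates) / len(candidate_rates)
    exploit_units = supply * (1 - epsilon) * best_rate
    explore_units = supply * epsilon * avg_candidate
    return exploit_units + explore_units


if __name__ == "__main__":
    candidates = [0.5, 0.7, 0.95]  # one of them beats chocolate (0.9)
    for eps in (0.0, 0.3, 0.6):
        print(eps, round(expected_weekly_sales(eps, 0.9, candidates), 1))
```

As long as the average new flavour sells worse than chocolate, every extra point of ε costs sales this week. The payoff only arrives later, if the shop finds the one flavour that beats the incumbent and switches to it.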
A real-life version of an ε-greedy approach is Google, which famously allocated 80% of development time to its main strategy and 20% to trying something new.
This core insight behind ε-greedy can guide our thinking. See, ε is not fixed and predetermined. We can change it depending on the context.
In fact, it’s totally possible to reduce ε over time to try to get the best of both high and low values. That is to say: when you’re starting out, you explore as much as you can, and then you gradually decrease your exploration rate.
It can be shown mathematically that, provided ε shrinks at a suitable pace, this is an effective strategy to converge, fast, to a solution that is very close to the optimal one (a so-called “approximate solution”).
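A decreasing exploration rate needs a schedule. The sketch below uses exponential decay with a floor, which is just one common choice and an assumption on my part. The name decayed_epsilon and the default values are made up for illustration.

```python
def decayed_epsilon(step, start=1.0, floor=0.05, decay=0.99):
    """Exploration rate after `step` decisions.

    Starts at `start`, shrinks by a factor of `decay` per decision,
    and never drops below `floor`, so a little exploration always remains.
    """
    return max(floor, start * decay ** step)


if __name__ == "__main__":
    for step in (0, 50, 150, 300, 1000):
        print(step, round(decayed_epsilon(step), 3))
```

With these made-up defaults, the rate starts at full exploration and reaches its floor after roughly 300 decisions. Keeping a non-zero floor is a design choice: it leaves room for the opportunistic, marginal exploration that still pays off late in the game.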
Early exploring, late exploiting
With this framework, it should be easy to derive a development strategy for high-risk high-growth startups.
The problem we’re trying to solve is the discrepancy between management and developers in such companies: management wants speed and iteration, developers want stability and long-termism.
A decreasing exploration rate — the optimal policy discussed before and a potential solution for this problem — implies a high level of exploration early on, and a high level of exploitation later on.
With this policy in mind, the startup should start by prioritizing the management need for speed. Everything is hacked together, to the point of even being no-code, and is guided by short-termism: whatever the startup builds will likely be torn down the day after it ships. The set of features that is going to win the market is still far from being found.
It’s all ultra-fast and super exciting, but also draining — especially for developers. That’s why, when startups are very early, they typically hire young risk-takers, excited by having the chance to explore different solutions to different problems. Early-stage startups hire pirates.
Later on, as ε decreases and exploitation takes the place of exploration, developers’ needs should be prioritized. Exploration becomes opportunistic and marginal. Whatever the company creates is carefully planned out and built atop the existing infrastructure. Robustness and scalability are the true priorities, and the general approach is guided by foresight and precaution.
It is a much less exciting environment, but also a very stable and profitable one, which is why late-stage companies tend to look for experience, solidity and reliability. They hire custodians.
Going deeper on a micro level
As you might have noticed, we just formalized the MVP framework: start small, and invest in stability only after you have found something that really works.
That’s, overall, a macro analysis, as it looks at the whole life of a startup and its product. What happens, though, on a micro scale?
Let’s focus our analysis on a specific moment of the startup’s life, say the seed stage: the startup has already raised some money, but it has not yet reached the Series A stage that follows product-market fit, with the company in full scale-up mode.
It’s with this lens that you start seeing the real tension between developers and managers emerging in all its beauty — especially when the company is neither super early nor super late stage.
When you look at things on a micro level, you will quickly notice an oddity: it takes time to verify assumptions.
See, the ε-greedy approach rests on a primary assumption: when you try something new, and that something is better than what you currently have, you can instantly switch. Changing the focus of exploitation is immediate.
Of course, this is not the case when you’re working in a startup. It usually takes some time before the market reacts, and then some more time to devise an appropriate response. Anyone who has worked in a data-driven way will tell you how long it usually takes to get statistical significance for an A/B test.
It’s an iterative process, by design.
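To get a feel for how long that waiting time can be, here is a rough sketch based on the standard sample-size approximation for comparing two conversion rates. The function names, the conversion rates and the traffic figure are all hypothetical, and a real experiment would need a proper power analysis.

```python
from math import ceil
from statistics import NormalDist


def samples_per_variant(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect the difference
    between two conversion rates (two-sided test, normal approximation)."""
    if p_control == p_variant:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def days_to_significance(n_per_variant, daily_visitors, variants=2):
    """Days of traffic needed when visitors are split evenly across variants."""
    return ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    # Hypothetical test: lift conversion from 10% to 12% with 500 visitors/day.
    n = samples_per_variant(0.10, 0.12)
    print(n, "visitors per variant,", days_to_significance(n, 500), "days")
```

Under these invented numbers, a modest improvement takes a couple of weeks of traffic to confirm. That gap is the window during which the development team has nothing new to ship for that experiment.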
This timing mismatch — between shipping the experiment and verifying its results — is actually a lifesaver.
The trick, here, is to carefully sync feature design and feature development, taking the asynchronicity into account. The result is that you can still move fast while avoiding debt at the same time.
The ebb and the flow
The process of developing a new feature starts in pure exploration mode. As we’ve established in the previous sections, exploration is the domain of speed.
Therefore, the startup kicks off the feature with fast and dirty development. Corners get cut, strings and values are hardcoded, and code gets duplicated. The goal is to test ideas quickly, with no regard for tidiness or elegance.
After the feature gets shipped, the product team moves on to see how the market reacts and how to respond. It first measures, and then designs the next iteration accordingly.
While it’s busy doing so, the dev team is focused on repairing the inevitable mess that they created before. This is essential for keeping the tech debt under reasonable control.
It’s like a hit-and-run warfare tactic. You hastily attack and then quickly retreat, to assess the situation while healing the wounded.
Founders are happy because they keep making fast decisions. Developers are happy because they can have time to fix the tangle and confusion produced by strict deadlines.
Now, it’s important to notice that this movement always happens, irrespective of the company stage. What changes — depending on whether the startup is in an exploration or exploitation phase — is the wavelength.
The earlier you are, the higher the frequency will be: every feature is radical and disruptive, so the market responds quickly, and you don’t have much need to take care of tech debt.
The farther you go in the startup journey, the slower the movement will become, with the product team spending more and more time studying the market reaction while the developers invest in infrastructure and scalability.
Exploration and exploitation are everywhere
The beautiful thing about the exploration-exploitation trade-off is that it’s everywhere.
The microanalysis will inevitably depend on the context, but some of the findings outlined above can be abstracted and applied to many other searching scenarios.
Let’s take writing, for instance. When writers, bloggers and researchers start out, on a macro level they should focus on exploring many things, without committing too early to a given topic or style. Their ε (the exploration rate) should be high, allowing for a high level of serendipity and chance. On a micro level, the focus should be more on how they write than on what they write about.
Later on, ε should decrease gradually, while a consistent style starts to emerge. Eventually, the solution should converge towards an optimal writing style and writing area.
Venture investment works like this too. If you look at capital allocation, VCs never invest all their fund money in new companies. The exact ratio depends on the investment strategy, but most of the time 50% or more of the fund is reserved for follow-on rounds, that is, for putting more money into companies the fund has already invested in.
And while the follow-on money is more than half of the fund, that doesn’t mean that more than half of the portfolio companies receive it. It’s actually more like a power law curve where only 20% of the portfolio companies get multiple investments.
And that’s precisely what an ε-greedy approach would look like. VCs start exploring with a broad investment thesis, and then gradually narrow down their focus until they only (re)invest in what they think are the winners of the portfolio (little to no ε, full exploitation).
In a broader sense, you can see this framework at play in any risk-taking scenario. From this perspective, career building is also a matter of balancing exploration with commitment. So are dating and love life.
The bottom line is simple and always the same:
Start exploring. Move fast. See what works and what doesn’t. Don’t linger too long, and see what the world has to offer.
However, be wary of too much exploration. It’s always fun to build new features, invest in new companies and try out new jobs, but after a while, it starts getting counterproductive. Without exploitation, you will never unlock the true potential of the opportunities you found along the way.
So, when you’re ready, invest as hard as you can, and reap what you sow!