One of the great developments in product design has been the adoption of A/B testing. Instead of just guessing what is best for your customers, you can offer a product variant to a subset of customers and measure how well it works. While undeniably useful, A/B testing is sometimes said to encourage too much “hill climbing”, an incremental and short-sighted style of product development that emphasizes easy and immediate wins.
Discussion around hill climbing can sometimes get a bit vague, so I thought I would make some animations that describe four distinct pitfalls that can emerge from an overreliance on hill climbing.
1. Local maxima
If you climb hills incrementally, you may end up in a local maximum and miss out on an opportunity to land on a global maximum with much bigger reward. Concerns about local maxima are often wrapped up in critiques of incrementalism.
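The local-maximum trap can be made concrete with a small simulation. Below is a minimal sketch (the reward function and step size are my own toy choices, not anything from a real product): a 1-D landscape with a small hill near x=1 and a taller hill near x=4, and a greedy climber that only accepts moves that immediately improve reward.

```python
import math

def reward(x):
    # Toy 1-D reward landscape: a small hill near x=1 and a taller hill near x=4
    return math.exp(-(x - 1)**2) + 2 * math.exp(-(x - 4)**2)

def hill_climb(x, step=0.1, iters=1000):
    # Greedy incremental search: move only when the immediate reward improves
    for _ in range(iters):
        for candidate in (x - step, x + step):
            if reward(candidate) > reward(x):
                x = candidate
    return x

x_final = hill_climb(0.0)
# Starting near the small hill, greedy search never crosses the valley,
# even though the hill near x=4 pays roughly twice as much
print(round(x_final, 1))  # 1.0
```

Every candidate step across the valley looks like a loss, so the climber parks on the small hill forever; that is the whole critique of pure incrementalism in one loop.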
Local maxima and global maxima can be illustrated with hill diagrams like the one above. The horizontal axis represents product space collapsed into a single dimension. In reality, of course, there are many dimensions that a product could explore.
2. Emergent maxima
If you run short A/B tests, or A/B tests that do not fully capture network effects, you might not realize that a change that initially seems bad may be good in the long run. This idea, which is distinct from concerns about incrementalism, can be described with a dynamic reward function animation. As before, the horizontal axis is product space. Each video frame represents a time step, and the vertical axis represents immediate, measurable reward.
When a product changes, the initial effect is negative. But eventually, customers begin to enjoy the new version, as shown by changes in the reward function. By waiting at a position that initially seemed negative, you are able to discover an emergent mountain, and receive greater reward than you would have from short-term optimization.
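The same idea can be sketched with a hypothetical time-varying reward function (the adoption ramp and the specific numbers below are illustrative assumptions, not measurements): a product change that looks worse than the status quo in a short A/B test, but better once customers have adapted.

```python
def reward(x, t):
    # Hypothetical time-varying reward: position 1 (the product change) starts
    # out worse than the status quo at position 0, but improves as customers adapt
    adoption = min(t / 10.0, 1.0)  # ramps from 0 to 1 over ten time steps
    return 1.0 if x == 0 else 0.5 + adoption

short_test = reward(1, 1) - reward(0, 1)    # negative: the change looks bad early
long_test = reward(1, 10) - reward(0, 10)   # positive: the emergent hill has appeared
print(short_test < 0 < long_test)  # True
```

A one-step experiment would reject the change; only an experiment long enough to capture the shifting reward function reveals the emergent maximum.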
3. Novelty effects
Short-term optimization can be bad, not only because it prevents discovery of emergent mountains, but also because some hills can be transient. One way a hill can disappear is through novelty effects, where a shiny new feature can be engaging to customers in the short term, but uninteresting or even negative in the long term.
4. Loss of differentiation
Another way a hill can disappear is through loss of differentiation from more dominant competitors. Your product may occupy a market niche. If you try to copy your competitor, you may initially see some benefits. But at some point, your customers may leave because not much separates you from your more dominant competitor. Differentiation matters in some dimensions more than others.
You can think of an industry as a dynamic ecosystem where each company has its own reward function. When one company moves, it changes its own reward function as well as the reward functions of other companies. If this sounds like biology, you’re not mistaken. The dynamics here are similar to evolutionary fitness landscapes.
While all of the criticisms of hill climbing have obvious validity, I think it is easy for people to overreact to them. Here are some caveats in defense of hill climbing:
The plots above probably exaggerate the magnitude and frequency with which reward functions change.
There is huge uncertainty and disagreement about what future landscapes will look like. In most cases, it’s better to explore regions that increase (rather than decrease) reward, making sure to run long-term experiments when needed.
The space is high dimensional. Even if your product is at a local maximum in one dimension, there are many other dimensions to explore and measure.
We may overestimate the causal relationship between bold product moves and company success. Investors often observe that companies who don’t make bold changes are doomed to fail. While I don’t doubt that there is some causation here, I think there is also some reverse causation. Bold changes require lots of resources. Maybe it’s mostly the success-bound companies who have enough resources to afford the bold changes.
Jupyter notebooks are great because they allow you to easily present interactive figures. In addition, these notebooks include the figures and code in a single file, making it easy for others to reproduce your results. Sometimes, though, you may want to present a cleaner report to an audience that may not care about the code. This blog post shows how to make code visibility optional, and how to remove various Jupyter elements to get a clean presentation.
On the top is a typical Jupyter presentation with code and some extra elements. Below that is a more polished version that removes some of the extra elements and makes code visibility optional with a button.
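One common way to make code visibility optional is to embed a small HTML/JS toggle in the notebook. The sketch below builds such a snippet as a string; the `div.input` selector matches code cells in classic Jupyter notebook exports, and the function name `toggleCode` is my own placeholder. In a notebook you would render it with `IPython.display.HTML`.

```python
# Hypothetical sketch: build an HTML/JS snippet that shows/hides all code
# cells when a button is clicked. 'div.input' is the class classic Jupyter
# uses for code-input cells in its HTML output.
toggle_snippet = """
<script>
function toggleCode() {
  document.querySelectorAll('div.input').forEach(function(cell) {
    cell.style.display = (cell.style.display === 'none') ? '' : 'none';
  });
}
</script>
<button onclick="toggleCode()">Toggle code</button>
"""

# In a Jupyter cell, you would display it like this:
# from IPython.display import HTML
# HTML(toggle_snippet)
```

Putting this in a cell near the top of the notebook gives readers a single button that collapses every code cell, which is what the polished version above does.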
One shortcoming with what we have so far is that users may still see some code or other unwanted elements while the page is loading. This can be especially problematic if you have a long presentation with many plots. To avoid this problem, add a raw cell at the very top of your notebook containing a preloader. This example preloader includes an animation that signals to users that the page is still loading. It is heavily inspired by this preloader created by @mimoYmima.
To work with these notebooks, you can clone my GitHub repository. While the notebooks render correctly on nbviewer (unpolished, polished), they do not render correctly on the GitHub viewer.
It was pretty fun to migrate everything over from WordPress – which is not to say there weren’t some hiccups along the way – but I was able to do so by following some nice instructions in this post by Joshua Lande.
For those of you using RSS, the feed for www.chris-said.io should be searchable in your RSS readers, but please let me know if it’s not.
Some videos on the internet are so good that I’ve watched them twice. Below is a list of 10 of my favorite interviews and dialogues. Obviously this isn’t an endorsement of all the positions taken. I just think they are very well done and fun to watch. The last four are best watched on 1.4x speed.
Everywhere you look, people are optimizing bad metrics. Sometimes people optimize metrics that aren’t in their self interest, like when startups focus entirely on signup counts while forgetting about retention rates. In other cases, people optimize metrics that serve their immediate short term interest but which are bad for social welfare, like when California corrections officers lobby for longer prison sentences.
The good news is that as we become a more data-driven society, there seems to be a broad trend — albeit a very slow one — towards better metrics. Take the media economy, for example. A few years ago, media companies optimized for clicks, and companies like Upworthy thrived by producing low quality content with clickbaity headlines. But now, thanks to a more sustainable business model, companies like Buzzfeed are optimizing for shares rather than clicks. It’s not perfect, but overall it’s better for consumers.
In science, researchers used to optimize for publication counts and citation counts, which biased them towards publishing surprising and interesting results that were unlikely to be true. These metrics still loom large, but increasingly scientists are beginning to optimize for other metrics like open data badges and reproducibility, although we still have a long way to go before quality metrics are effectively measured and incentivized.
In health care, hospitals used to profit by maximizing the quantity of care. Perversely, hospitals benefited whenever patients were readmitted due to infections acquired in the hospital or due to the lack of an adequate follow-up plan. Now, with ACA policies that penalize hospitals for avoidable readmissions, hospitals are taking real steps to improve follow-up care and to reduce hospital-acquired infections. While the metrics should be adjusted so that they don’t unfairly penalize low-income hospitals, the overall emphasis on quality rather than quantity is moving things in the right direction.
We are still light years from where we need to be, and bad incentives continue to plague everything from government to finance to education. But slowly, as we get better at measuring and storing data, I think we are getting better at picking the right metrics.