More A/B Tests Won't Save You

Why Volume Without Purpose Is the Fastest Way to Ruin a Testing Program

By jason thompson | CEO | 33 Sticks

Several years ago, we were working with a large specialty retailer. They'd made a real investment in optimization software and brought us in to help mature their program, to move beyond running occasional experiments and toward a practice that actually generated insight, influenced decisions, and improved the customer experience in measurable ways.

Then COVID hit. And this company, like so many others, was scrambling.

One of the things we pride ourselves on at 33 Sticks is the ability to pivot, not in the trendy startup sense, but in the deeply practical sense of meeting our clients where they are. So we shifted our focus. We helped them stand up a buy-online-pickup-in-store model they'd never attempted before. We used our understanding of consumer behavior and our experience testing different user flows to help them find the right experience for their customers during a period when getting it wrong wasn't just a conversion rate problem, it was an operational one.

It worked. The team we partnered with was exceptional. Many of them have since moved on to other brands, stayed in touch, and referred business our way. Those are the kinds of relationships we build. But then the leadership changed.

The New Mandate

New leadership is always a complicated moment for everyone involved, internal teams, external partners, the people who built what exists and the person walking in with a mandate to change it. In the best cases, it brings fresh perspective and necessary course correction. In the worst cases, it brings ego and urgency dressed up as strategy.

This was closer to the latter.

The new leader came from a different company, a different context, and carried the confidence of having run things a certain way elsewhere. The directive was clear: RUN MORE TESTS! More in volume, more concurrently. The metric of success for the testing program would be the number of tests executed.

We asked a simple question: What are we trying to accomplish?

It wasn't a challenge. It was a genuine attempt to understand. Our experience told us that leading with "just run more tests" is generally a flawed strategy. But we were open to the possibility that there was something deeper behind the ask, perhaps a business outcome, a competitive pressure, a board-level commitment that would make the volume goal make sense. We wanted to hear that this wasn't just "our key performance metric is number of tests run."

The answer we got was, "Don't worry about it. Just run more tests. More tests equals better results."

We pushed further. We explained that we wanted to support the goal, but that we needed to understand it in order to do that well. We talked about infrastructure. We talked about ensuring the right people were in place. We talked about the risk of spoiling tests by running combinations that would muddy results to the point where you literally couldn't trust what you were seeing.

The response, more or less: Stay in your lane and do your job.

Why "More" Isn't a Strategy

More tests can absolutely lead to better outcomes. But the relationship isn't linear, and volume alone doesn't create value. It creates the illusion of progress.

When an organization is genuinely ready to scale its testing volume, certain things are true. They have a strong understanding of data collection and how running multiple tests exponentially increases the complexity of results analysis. They have technical frameworks in place to ensure tests don't break the site and that the deployed experience actually matches the hypothesis the testing team believes they're evaluating. And to be honest, these aren't nice-to-haves. They're the floor.

Let's take the first one, because most people in this space haven't fully internalized it. If you're running two tests concurrently that both have the potential to touch the same consumer, you cannot analyze them in isolation. To the customer, there is no "Test 1" and "Test 2." There is no control group and variant. There is only the experience they just had with your brand. If they were in the control for one test and the variant for another, those two things combined to form their experience. Analyzing each test independently, as though the other doesn't exist, is a fundamental failure to understand what you're measuring.
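
For the analysts reading this, here's a minimal sketch of the difference, using a made-up exposure log (the column names and numbers are hypothetical, not from any client engagement). The marginal read treats each test as if the other doesn't exist; the joint read looks at the combined experience the customer actually had.

    import pandas as pd

    # Hypothetical exposure log: one row per visitor, the arm they received
    # in each of two concurrent tests, and whether they converted.
    log = pd.DataFrame({
        "visitor_id": range(8),
        "test1_arm":  ["control", "variant", "control", "variant"] * 2,
        "test2_arm":  ["control", "control", "variant", "variant"] * 2,
        "converted":  [0, 1, 0, 0, 1, 1, 0, 1],
    })

    # The marginal read: each test analyzed in isolation.
    print(log.groupby("test1_arm")["converted"].mean())
    print(log.groupby("test2_arm")["converted"].mean())

    # The joint read: the combined experience the customer actually had.
    print(log.groupby(["test1_arm", "test2_arm"])["converted"].mean())

The joint read is the one that reflects reality, and it's also the one that demands far more traffic and far more analytical care.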

And yet, this is exactly what happens so often when the goal is to run more tests. Teams analyze in silos because that's the only way to move fast enough to justify the volume. The data tells a story, but it's the wrong story.
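
Part of why the silo approach wins by default is simple arithmetic. A back-of-the-envelope sketch, with an invented traffic number, of how quickly the joint experiences multiply and how thin each one gets:

    # Back-of-the-envelope only: the traffic figure is invented for illustration.
    daily_visitors = 100_000

    for concurrent_tests in (1, 2, 4, 8):
        cells = 2 ** concurrent_tests       # joint experiences, two arms per test
        per_cell = daily_visitors // cells  # visitors landing in each combined experience
        print(f"{concurrent_tests} overlapping tests -> {cells} joint experiences, "
              f"~{per_cell:,} visitors per experience per day")

Eight fully overlapping two-arm tests produce 256 distinct combined experiences. At 100,000 visitors a day, that's a few hundred people per experience, which is nowhere near enough to read most of those cells honestly.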

The second issue is just as common and just as damaging. In a rush to push more experiments live, code gets deployed hastily. The experience the customer actually sees doesn't match what the testing team thinks they're testing. i've lost count of how many spoiled tests i've seen, tests where the hypothesis was never actually evaluated because the execution was misaligned with the intent. You're not learning anything from a test like that. You're just burning cycles and calling it optimization.
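
There's no clever fix for this beyond discipline, but even a lightweight pre-flight check helps: don't let an exposure count until something has confirmed the customer saw what the hypothesis says they saw. A minimal sketch, with invented variant specs and page-state fields (this isn't any particular tool's API):

    # Confirm the experience actually served matches the assigned variant
    # before counting the exposure. All names and values here are hypothetical.
    INTENDED = {
        "control": {"cta_text": "Add to Cart", "shipping_banner": False},
        "variant": {"cta_text": "Buy Now", "shipping_banner": True},
    }

    def exposure_is_valid(assigned_arm: str, rendered_state: dict) -> bool:
        """True only if what the customer saw matches the assigned arm."""
        expected = INTENDED[assigned_arm]
        return all(rendered_state.get(k) == v for k, v in expected.items())

    served = {"cta_text": "Add to Cart", "shipping_banner": True}  # hypothetical render
    if not exposure_is_valid("variant", served):
        print("Mismatch: the deployed experience is not the hypothesis being tested.")

The specific check matters less than the habit: if the deployed experience and the hypothesis can drift apart silently, some of your "results" will be measuring an experience nobody designed.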

When your measure of success is the number of tests you ran this quarter, regardless of the results, regardless of the experience it created for your customers, regardless of the impact on revenue, not just short-term but long-term, you don't have a testing strategy.

What's Really Behind the Volume Push

In our experience, the "just run more tests" mandate almost never comes from a deep understanding of experimentation. It comes from new leadership wanting to show quick impact, often combined with a lack of deep expertise in the subject matter and a lack of confidence to rely on the expert team already in place.

That's a hard sentence to write, but it's an honest one. And it's not a condemnation of any individual, it's a pattern. A new leader walks in under pressure to demonstrate change. They look at the testing program and see a lever they can pull that's easy to quantify. "We ran 40 tests last quarter. This quarter we'll run 80." It looks like progress in a slide deck. It feels decisive. It's a number that goes up.

But it's not strategy. It's signaling.

The teams on the ground know this. The analysts, the optimization leads, the people doing the actual work, they know that doubling the test count without doubling the infrastructure, the QA rigor, the analytical capacity, and the strategic clarity doesn't produce twice the insight.

What We Would Do Differently

i want to be honest about something. We lost this client, and while we stand behind the position we took, i don't think i personally handled the relationship as well as i could have.

Looking back, i could have invested more time understanding why the new leader was so attached to volume as a metric. i should have been more empathetic, more curious about the pressure they were under, the fears driving the mandate, the organizational dynamics i wasn't seeing. It's very possible we would have ended up at the same outcome. But it would have been worth putting in a better effort than i did.

Being right about the strategy doesn't mean you handled the human side well. That's a lesson we carry forward. At 33 Sticks, we're not yes men. We push back when we need to, and we're comfortable doing it. But pushing back effectively requires more than having the right answer. It requires understanding the person on the other side, what they need, what they're afraid of, what success looks like from where they're sitting.

i didn't do enough of that here.

The Cost of Holding Your Ground

We lost the client. It wasn't dramatic. There was no grand showdown. We understood our value, and we chose to move on knowing the lost revenue would be replaced.

That clarity didn't come from bravado. It came from how we've built this business. 33 Sticks has stayed deliberately small. We don't chase every dollar or try to be everything to everyone. There is more work out there than any one person or any one agency can do, so rather than trying to do all of it, we'd rather do the work that aligns with our values, our expertise, and the kind of partnerships where we can actually make a difference.

It's a choice we've made over and over again, and it's the reason we can walk away from an engagement when the conditions aren't right without it being a crisis.

It wasn't that our way was right and his way was wrong. They were differing opinions, differing objectives, differing views on what the program should prioritize. But we fundamentally believe that making "number of tests run" your north star, independent of outcomes, independent of customer experience, independent of long-term revenue impact, is a failing strategy. And we are not willing to execute a strategy we believe will fail just to keep a contract.

For the People in the Room

If you're an analyst, an optimization lead, or someone on a testing team who's living some version of this story right now, being told to run more tests without a clear "why," watching quality erode in the name of speed, feeling like your expertise is being sidelined by a mandate that sounds decisive but lacks substance, i want you to hear three things.

1. Have confidence in your expertise. You understand how testing works at a level that many of the people making volume demands do not. That knowledge has value, and it's worth defending.

2. Invest in learning how to communicate and defend your ideas more effectively. Being right isn't enough. The ability to translate technical understanding into language that resonates with leadership, to connect your concerns to their goals, is a skill worth developing. It's the difference between being heard and being sidelined.

3. Always understand why you're doing what you're doing. Not just the task, but the purpose behind it. If you can't articulate why a test matters, what question it answers, what decision it informs, what it means for the customer, then running it is just activity. And activity without purpose is theater.

The best testing programs we've seen aren't the ones that run the most tests. They're the ones where every test has a reason, every result gets analyzed with intellectual honesty, and the people doing the work are trusted to do it well. Volume is a byproduct of maturity, not a substitute for it.

jason thompson

Jason is CEO of 33 Sticks, a boutique analytics consultancy specializing in conversion optimization and analytics transformation. He works directly with Fortune 500 clients to maximize their use of data while helping team members reach their potential. He writes about data literacy, critical thinking, and why most "insights" aren't.

Subscribe to Hippie CEO Life for thoughts on doing good work in a world optimized for engagement.

https://www.hippieceolife.com/