

Ken Kocienda, one of the engineers who built the first iPhone, shared:
“When I think of the creative work that means the most to me: the music of Bach and Pink Floyd, the writings of Shakespeare and Lao-Tze, the art of Diane Arbus and Jasper Johns… Were any of these created with A/B testing? Nope.”
Well, most of us aren’t chasing the path of genius. Some of us are just trying to build a decently profitable business that pays the bills for us and our employees. And that’s exactly why this matters: the hidden costs of defaulting to A/B testing are the ones most ‘growth gurus’ rarely talk about.
If you’ve been in ecommerce long enough, you’ve seen the pattern: your team disagrees about a checkout change or a new page layout. Instead of making a call, someone says “let's just test it.”
Sounds reasonable. But what it really does is trade an uncomfortable conversation for three weeks of split traffic, an inconclusive result, and a decision that probably still doesn’t get made. And that’s the hidden cost of turning A/B testing into a reflex.

So in this blog, we’ll look at where A/B testing tactics break down, why even smart operators become addicted to experimentation, and how to make product decisions without treating every disagreement like a science fair project.
One Reddit user captured the essence with a single question:
“Is there any evidence that the use of A/B testing leads to better business outcomes?”
The thread filled up fast with what you might call healthy skepticism - the kind that comes from watching testing become a replacement for judgment. Here’s what one commenter shared:
“A/B testing gets way overused because it's easier to quantify results and appear "data driven", and weak management teams like this.”
So it turns out you’re not the only one biting your tongue. And somewhere in there is a thought you didn’t say out loud, because it felt like heresy in a world that worships at the altar of the data dashboard: is any of this actually helping?
The shift happens gradually. One day, testing is a tool you reach for when you’re genuinely unsure. The next, it becomes the default answer to everything.
“You can't A/B test the important strategic questions like "should we continue building classic product features or ditch it all for AI based features"”
- A marketer pointed out on Reddit
Harvard Business Review’s extensive report said: “A/B testing, once heralded as the gold standard for data-driven decision-making, often slows companies down due to an overemphasis on statistical significance.”
So the real cost isn’t even execution risk; it’s the opportunity you lose while waiting for statistical permission to take the next step. (A luxury that most early businesses simply can’t afford.)
You ran that latest ad and your click-through rate went up. The dashboard looks glorious and the team is celebrating another win. But your revenue hasn’t moved much. One Reddit user summed up the problem perfectly:
“I would always prioritise cpa, as the ad should also prevent irrelevant traffic from clicking through where possible.”
Teams often gravitate toward the metrics that move fastest in a two-week window: click-through rates, button clicks, page views. Things that look great in sprint reviews. As someone bluntly put it:
“If your LTV is low, your business sucks. Period. People don't like your product & business, or your business isn't solving enough problems.”

But the things that actually determine whether your store grows - lifetime value, retention, repeat purchases, brand trust - don’t neatly spike after a seven-day test. So a bigger button gets more clicks, or a more aggressive popup captures more emails? Fine. But did any of it improve revenue, retention, or customer quality six months later?
Restructuring a five-step checkout into two is cross-functional, risky, and difficult to isolate cleanly. Testing another headline variant? Easy. Clean data. Nice presentation slide by Friday. No points for guessing which one gets prioritized.
And even people deep in experimentation acknowledge the limitation:
“I'd make sure that I'm testing a single thing, and that single thing is mission critical… Every test tells you a tiny bit about a tiny thing.”
When the wrong metrics get prioritised, your checkout turns into a patchwork of experimental variants depending on which test bucket a customer landed in. One customer sees a clean experience. Another gets a half-baked upsell widget someone wanted to “validate.”
Once testing becomes habit, everything starts feeling like it needs statistical permission before anyone acts. Conviction gets traded for consensus. Real decisions get sidelined to “let’s just test it” discussions - which can remain inconclusive for weeks.
Because nobody gets blamed for letting the data decide. A strong opinion can be wrong, and wrong feels risky. An inconclusive test result at least looks like due diligence.
There’s also the tooling problem. Experimentation platforms are built to encourage volume. More experiments running means the dashboard looks busy. Someone reports “we shipped 12 experiments this quarter” and it sounds like great momentum.
Nobody wants to risk getting fired over instinct, taste, or judgment. It feels much safer to say:
“Well, the data pointed us there.”
But some of the biggest consumer trends would never survive a clean A/B testing framework in their early days. Take Labubu - the collectible that exploded into a global craze through scarcity and social virality. Pop Mart’s revenue tied to the brand crossed billions as demand surged worldwide.
No dashboard could have predicted that outcome early on. The product worked because people felt something irrational long before any metric could justify it.
Real experimentation requires strong hypotheses, controlled variables, enough traffic, and clarity about what question is actually being answered.
“an A/B test cannot explain why a version won. Here’s where qualitative research answers that question.”
If you’re merely running exploratory tweaks with vague goals like: “maybe orange converts better” or “let’s see what happens” - that’s hardly rigorous experimentation. It’s educated guessing at best and experimentation theatre at worst.
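To put “enough traffic” in perspective, here’s a rough back-of-the-envelope sketch. It assumes a standard two-proportion z-test at 95% confidence and 80% power; the baseline conversion rate and daily traffic figures are hypothetical, purely for illustration - not numbers from any source cited here.

```python
# Rough sketch: how many visitors a conversion-rate A/B test needs.
# Assumes a two-sided two-proportion z-test, 95% confidence, 80% power.
# Baseline rate and traffic are made-up numbers for illustration.
from math import sqrt

def visitors_per_variant(baseline, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size needed in EACH variant to detect the lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p2 - p1) ** 2

# A store converting at 2%, hoping to detect a 10% relative lift (2.0% -> 2.2%),
# with 1,500 visitors a day split evenly between the two variants:
n = visitors_per_variant(0.02, 0.10)
days = (2 * n) / 1_500
print(f"~{n:,.0f} visitors per variant, ~{days:.0f} days of split traffic")
# Roughly 80,000 visitors per variant - over three months, not a quick two-week test.
```

Chase a smaller lift, or start from a lower baseline, and the required traffic balloons further - which is exactly why so many casually launched tests end in “inconclusive.”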
And then there’s the real elephant in the room.
Even if you've done everything right - the world your test ran in two weeks ago isn’t the same world your customers are shopping in today. The way people find your store, how much attention they give it, and whether they even click through to it at all - that’s all shifting underneath your experiments.
One in three online shoppers now uses AI-powered tools during their purchase journey - chatbots, visual search, conversational product finders. Here’s how a consumer summarised this experience:
“Instead of navigating a website, you just... describe what you want. "I need durable carry-on luggage, under ₹8k, for frequent travel." The AI figures out the rest…”
Discovery, recommendations, checkout expectations - all shifting month to month.
If your team is spending three weeks testing badge placement while the way customers find and evaluate products is being rewritten, it doesn't matter how clean the experiment was.
But it’s worth adding that most of this “AI shopping” is still research behavior, not autonomous buying. We broke down the gap between the hype and what’s actually happening here: The AI Shopping Revolution Can Wait: What Your Store Actually Needs Right Now
The clean model of impression > click > convert got dismantled piece by piece. AI Overviews cut organic click-through rates by 34.5%, according to Ahrefs. 60% of searches now end without a click to any website - on mobile, it’s 77%.
The visitors who actually make it to your store are harder-won and more expensive than they were two years ago. And not all traffic is even human anymore - one major retailer reported that 72% of its Black Friday traffic came from malicious bots. When acquisition is this volatile and expensive, routing valuable users into half-baked test variants instead of your best-performing experience becomes a far costlier gamble than most teams admit.
The average ecommerce session on mobile lasts about 72 seconds. Screen-based attention has dropped to 47 seconds before task-switching.

You have a window measured in seconds. And when a meaningful portion of your traffic is routed into fragmented or half-polished experiments, those users don’t stick around long enough for you to learn much from the test anyway.
This blog isn’t saying you should throw out A/B testing tools. Just stop reaching for them by default.
A simpler way to think about it:
If every important decision in your business needs statistical permission first, don’t be surprised when faster competitors build the future while you’re still reviewing test results.
And if you want a no-nonsense solution for warranty and shipping protection that’s already been tried, tested, and trusted by 500+ merchants - SureBright is worth a look.