Most companies A/B test the wrong way: common mistakes, missed learnings, and why testing doesn’t substitute for conviction
May 8, 2026
3 min read

Most companies A/B test the wrong way: common mistakes, missed learnings, and why testing doesn’t substitute for conviction

Ken Kocienda, one of the engineers who helped build the first iPhone, shared:

“When I think of the creative work that means the most to me: the music of Bach and Pink Floyd, the writings of Shakespeare and Lao-Tze, the art of Diane Arbus and Jasper Johns… Were any of these created with A/B testing? Nope.”  

Well, most of us aren’t chasing genius anyway. Some of us are just trying to build a decently profitable business that pays the bills for us and our employees. And that’s exactly why this matters more: the hidden costs of defaulting to A/B testing are the ones most ‘growth gurus’ rarely talk about.

If you’ve been in ecommerce long enough, you’ve seen the pattern: your team disagrees about a checkout change or a new page layout. Instead of making a call, someone says “let's just test it.”  

Sounds reasonable. But what it really does is trade an uncomfortable conversation for three weeks of split traffic, an inconclusive result, and probably a decision that still doesn’t get made. And that's the hidden cost of turning A/B testing into a reflex.

So in this blog, we’ll look at where A/B testing tactics break down, why even smart operators become addicted to experimentation, and how to make product decisions without treating every disagreement like a science fair project.

Why conventional A/B testing assumptions are getting weaker

One Reddit thread truly captured the essence with a single question:

“Is there any evidence that the use of A/B testing leads to better business outcomes?”  

The thread filled up fast with what we’ll call healthy skepticism: the kind that comes from watching testing become a replacement for judgment. Here’s what one commenter shared:

“A/B testing gets way overused because it's easier to quantify results and appear "data driven", and weak management teams like this.”  

So it turns out you’re not the only one biting your tongue. And somewhere in there is a thought you didn’t say out loud, because it felt like heresy in a world that worships at the altar of the data dashboard: is any of this actually helping?

The shift happens gradually. One day, testing is a tool you reach for when you’re genuinely unsure. The next, it becomes the default answer to everything.  

“You can't A/B test the important strategic questions like "should we continue building classic product features or ditch it all for AI based features"”
- as one marketer pointed out on Reddit

A Harvard Business Review report put it plainly: “A/B testing, once heralded as the gold standard for data-driven decision-making, often slows companies down due to an overemphasis on statistical significance.”

So the real cost isn’t even execution risk; it’s the opportunity you lost while waiting for statistical permission to take the next step. (A luxury most early-stage businesses simply can’t afford.)

What you lose when testing becomes the default

You run a new ad and your click-through rate goes up. The dashboard looks glorious and the team is celebrating another win. But your revenue hasn’t moved much. One Reddit user summed up the problem perfectly:

“I would always prioritise cpa, as the ad should also prevent irrelevant traffic from clicking through where possible.”

Feature-level metric optimization kills long-term growth

Teams often gravitate toward the metrics that move fastest in a two-week window: click-through rates, button clicks, page views. Things that look great in sprint reviews. Someone bluntly put it:

“If your LTV is low, your business sucks. Period. People don't like your product & business, or your business isn't solving enough problems.”

But the things that actually determine whether your store grows - lifetime value, retention, repeat purchases, brand trust - don’t neatly spike after a seven-day test. So a bigger button gets more clicks, or a more aggressive popup captures more emails? Fine. But did any of it improve revenue, retention, or customer quality six months later?
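
To make that gap concrete, here’s a toy calculation (all numbers are made-up assumptions, not data from this article) showing how a variant can win the click-through test while losing on revenue:

```python
# Toy numbers (illustrative assumptions only): a flashier ad lifts CTR,
# but the extra clicks are less qualified, so fewer of them actually buy.
def revenue_per_impression(ctr, purchase_rate, avg_order_value):
    """Expected revenue earned per ad impression."""
    return ctr * purchase_rate * avg_order_value

# Variant A: the original ad
a = revenue_per_impression(ctr=0.020, purchase_rate=0.050, avg_order_value=60)
# Variant B: +50% CTR, but its clicks convert half as often
b = revenue_per_impression(ctr=0.030, purchase_rate=0.025, avg_order_value=60)

print(f"A: ${a:.3f}/impression, B: ${b:.3f}/impression")
```

Variant B “wins” the CTR test and still earns less per impression. That’s exactly the gap a two-week, top-of-funnel metric hides.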

The safest tests often beat the boldest ideas

Restructuring a five-step checkout into two is cross-functional, risky, and difficult to isolate cleanly. Testing another headline variant? Easy. Clean data. Nice presentation slide by Friday. No points for guessing which one gets prioritized.

And even people deep in experimentation acknowledge the limitation:

“I'd make sure that I'm testing a single thing, and that single thing is mission critical… Every test tells you a tiny bit about a tiny thing.”

Fragmented experiences damage product coherence

With the wrong metrics prioritized, your checkout turns into a patchwork of experimental variants depending on which test bucket a customer landed in. One customer sees a clean experience. Another gets a half-baked upsell widget someone wanted to “validate.”

Once testing becomes habit, everything starts feeling like it needs statistical permission before anyone acts. Conviction gets traded for consensus. Real decisions get sidelined to “let’s just test it” discussions - which can remain inconclusive for weeks.

So why can’t teams stop?

Because nobody gets blamed for letting the data decide. A strong opinion can be wrong, and wrong feels risky. An inconclusive test result at least looks like due diligence.

There’s also the tooling problem. Experimentation platforms are built to encourage volume. More experiments running means the dashboard looks busy. Someone reports “we shipped 12 experiments this quarter” and it sounds like great momentum.

“The data said so” becomes armor

Nobody wants to get fired over instinct, taste, or judgment. It feels much safer to say:

“Well, the data pointed us there.”

But some of the biggest consumer trends would never have survived a clean A/B testing framework in their early days. Take Labubu - the collectible that exploded into a global craze through scarcity and social virality. Pop Mart’s revenue tied to the brand crossed billions as demand surged worldwide.

No dashboard could have predicted that outcome early on. The product worked because people felt something irrational long before any metric could justify it.

Most teams aren’t running real experiments anyway

Real experimentation requires strong hypotheses, controlled variables, enough traffic, and clarity about what question is actually being answered.  

A marketer on Reddit shared:

“an A/B test cannot explain why a version won. Here’s where qualitative research answers that question.”

If you’re merely running exploratory tweaks with vague goals like: “maybe orange converts better” or “let’s see what happens” - that’s hardly rigorous experimentation. It’s educated guessing at best and experimentation theatre at worst.  

And then enters the real elephant in the room.  

Is your A/B testing even accurate?  

Even if you've done everything right - the world your test ran in two weeks ago isn’t the same world your customers are shopping in today. The way people find your store, how much attention they give it, and whether they even click through to it at all - that’s all shifting underneath your experiments.

AI is rewriting how customers buy  

One in three online shoppers now uses AI-powered tools during their purchase journey - chatbots, visual search, conversational product finders. Here’s how one consumer summarized the experience:

“Instead of navigating a website, you just... describe what you want. "I need durable carry-on luggage, under ₹8k, for frequent travel." The AI figures out the rest…”

Discovery, recommendations, checkout expectations - all shifting month to month.

If your team is spending three weeks testing badge placement while the way customers find and evaluate products is being rewritten, it doesn't matter how clean the experiment was.  

But it’s worth adding that most of this “AI shopping” is still research behavior, not autonomous buying. We broke down the gap between the hype and what’s actually happening here: The AI Shopping Revolution Can Wait: What Your Store Actually Needs Right Now

Zero-click is consuming the traffic you’re splitting

The clean model of impression > click > convert has been dismantled piece by piece. AI Overviews cut organic click-through rates by 34.5%, according to Ahrefs. 60% of searches now end without a click to any website - on mobile, it’s 77%.

The visitors who actually make it to your store are harder-won and more expensive than they were two years ago. And not all traffic is even human anymore - one major retailer reported that 72% of its Black Friday traffic came from malicious bots. When acquisition is this volatile and expensive, routing valuable users into half-baked test variants instead of your best-performing experience becomes a far costlier gamble than most teams admit.

Your customers’ attention doesn’t care about statistical significance

The average ecommerce session on mobile lasts about 72 seconds. Screen-based attention has dropped to 47 seconds before task-switching.  

You have a window measured in seconds. And when a meaningful portion of your traffic is routed into fragmented or half-polished experiments, those users don’t stick around long enough for you to learn much from the test anyway.

The smarter way to test (or not test at all)

This blog isn’t saying you should throw out A/B testing tools. Just stop reaching for them by default.

A simpler way to think about it:

Test when:

  • The stakes are high but reversible - e.g. pricing pages, subscription flows, checkout changes
  • You have a clear hypothesis - not “let’s see what happens,” but “autocomplete should reduce mobile checkout dropoff”
  • You have enough traffic - ideally enough to reach meaningful results within a few weeks without burning heavily on ads
  • The impact is isolated - one meaningful variable, not five changes bundled together
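
On the traffic point, a rough back-of-the-envelope check tells you before you start whether “a few weeks” is even realistic. This sketch uses the standard two-proportion sample-size formula at 95% confidence and 80% power; the baseline and lift figures are illustrative assumptions, not benchmarks:

```python
from math import ceil

def visitors_per_variant(baseline, relative_lift, z_alpha=1.96, z_power=0.84):
    """Approximate visitors needed per variant to detect `relative_lift`
    over a `baseline` conversion rate (95% confidence, 80% power)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Illustrative: 2% baseline checkout conversion, hoping to detect a 10% relative lift
n = visitors_per_variant(baseline=0.02, relative_lift=0.10)
print(f"~{n:,} visitors per variant")
```

Under these assumptions, you need roughly 80,000 visitors per variant. If that number dwarfs your monthly traffic, the test will stay inconclusive no matter how long you stare at the dashboard.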

Skip the test and just fix it when:

  • The problem is already obvious - broken mobile checkout, slow load times, confusing UX, 404s
  • The downside is tiny - typo fixes, decluttering, consistency improvements
  • Your traffic is too low - qualitative feedback is often more useful than weak statistical guesses
  • The change is strategic or systemic - full checkout rebuilds, onboarding redesigns, warranty flows; validate these with beta users and customer feedback instead of endless split tests

If every important decision in your business needs statistical permission first, don’t be surprised when faster competitors build the future while you’re still reviewing test results.  

And if you want a no-nonsense solution for warranty and shipping protection that’s already been tried, tested, and trusted by 500+ merchants - SureBright is worth a look.

A/B testing failure, ecommerce growth 2026, zero click attribution

Muskan Banga

About the author

Muskan is a content writer in the warranties and product protection industry, focused on demystifying and simplifying the industry for both her readers and herself. Her process begins with deep research, weaving in real-world examples to make complex ideas feel accessible and relatable. In her spare time, she obsessively devours Substack newsletters and books while losing herself in art films.
