
A/B test your emails: a practical guide without the BS

You're guessing what works. Stop. Here's a structured approach to A/B testing that gives you answers you can actually use.

Hermod Team · AI-powered email marketing

You don’t know what works. You think you do — you have a feel for good subject lines, you think your CTA buttons are strong, you always send on Tuesday morning because “that’s the best time.” But you’re guessing.

A/B testing replaces guessing with data. You send two variants of the same email to a small portion of your list, measure which one performs better, and send the winner to the rest. It takes 10 minutes to set up and can improve your results by 20-40% over time. Optimizely’s experimentation guide covers the statistical foundations in depth.

This guide shows you exactly what to test, how to do it right, and which mistakes to avoid.

What is an A/B test?

You take your email and create two versions that differ on only one point. Version A has one subject line, version B has another. You send A to 15% of your list and B to 15%. After 4-24 hours, you check which one got the highest open rate. The winner goes to the remaining 70%.

That’s it. No advanced statistics, no complexity. Just: test, measure, use the winner.
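If you want to see the mechanics spelled out, here’s a minimal sketch in Python of that 15/15/70 split. The contact list and function name are purely illustrative; any email platform does this for you behind the scenes.

```python
import random

def split_for_ab_test(contacts, test_fraction=0.15, seed=42):
    """Randomly split a contact list into variant A, variant B, and a holdout.

    With test_fraction=0.15, each variant gets 15% of the list and the
    remaining 70% waits for whichever variant wins.
    """
    shuffled = contacts[:]                    # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    variant_a = shuffled[:n_test]
    variant_b = shuffled[n_test:2 * n_test]
    holdout = shuffled[2 * n_test:]           # receives the winning variant later
    return variant_a, variant_b, holdout

# A list of 2,000 contacts splits into 300 / 300 / 1,400
contacts = [f"contact{i}@example.com" for i in range(2000)]
a, b, rest = split_for_ab_test(contacts)
print(len(a), len(b), len(rest))              # 300 300 1400
```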

The golden rule: Test only one thing at a time. If you change the subject line AND the sender name AND the send time, you won’t know what caused the difference. Isolate the variable.

What can you test?

Subject lines (start here)

Subject lines are the easiest and most impactful test. They determine whether your email gets opened at all.

Test ideas with real examples:

Variant A | Variant B | What you’re testing
“5 tips for better emails” | “Your emails are performing below average” | Positive vs. negative framing
“New guide: email segmentation” | “You’re sending the wrong emails to the wrong people” | Feature vs. problem
“Newsletter week 14” | “The one thing that changed our open rate” | Generic vs. specific
“Free whitepaper on AI” | “[First name], your AI guide is ready” | Impersonal vs. personal
“Boost your conversion” | “How we increased conversion by 34%” | Vague vs. specific numbers

What typically performs best:

  • Personalization (+10-15% open rate on average)
  • Numbers in subject lines (+8-12%)
  • Questions over statements (+5-10%)
  • 30-50 characters (short enough for mobile, long enough for context)

But those are averages. Your list might be different. That’s the entire point of testing.

Send timing

When you send has a surprisingly large effect.

Test setup: Send the same email (identical content, identical subject line) to two equal groups at different times. For example: Tuesday at 9 AM vs. Thursday at 2 PM.

What data typically shows:

  • B2B: Tuesday-Thursday, 9-11 AM (people check email in the morning)
  • B2C: Thursday-Sunday, 6-9 PM (people browse after work)
  • E-commerce: Friday-Sunday, peak around 8 PM

But again: test it for your list. Your contacts aren’t the average.

CTA (Call to Action)

What you ask people to do, and how you ask them.

Test options:

  • Text: “Read more” vs. “See the 5 tips”
  • Color: primary color vs. contrast color
  • Placement: above the fold vs. below the content
  • Count: one CTA vs. multiple CTAs

Benchmark: A specific CTA (“Download the free guide”) beats a generic CTA (“Click here”) by 15-25% in click-through rate in most tests.

Content format

The length and structure of your email.

Test ideas:

  • Short (150 words) vs. long (500 words)
  • Text-only vs. images
  • Listed format vs. storytelling
  • Single column vs. two columns

What we see: Shorter emails typically win for promotional content. Longer emails win for educational content. Text-only emails have higher click-through rate in B2B. Images win in e-commerce.

Sender name

Often overlooked, but it’s the first thing people see.

Test: “Mike from [Brand]” vs. “[Brand]” vs. “Mike Johnson”

Personal sender names beat brand names by 10-20% open rate in most B2B tests. For B2C, the brand name is often stronger.

Sample size: how many contacts do you need?

Here’s the table you actually need:

List size | Test group per variant | Minimum difference you can detect
2,000 | 300 (15%) | 5+ percentage points
5,000 | 750 | 3+ percentage points
10,000 | 1,500 | 2+ percentage points
25,000 | 3,750 | 1+ percentage points
50,000+ | 5,000 (cap) | Under 1 percentage point

The rule: The smaller your list, the larger the difference between variants needs to be for you to trust the result.

With a list of 2,000 contacts, you can detect that one subject line gives 25% open rate vs. 20% — that’s a 5 percentage point difference. But you can’t detect that one gives 22% vs. 20% — that difference is too small to be significant with your sample size.

Practical recommendation: If your list is under 1,000, use A/B tests as indicators rather than proof. Run the same type of test multiple times over time to spot patterns.
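For the curious, here’s a rough back-of-the-envelope version of that rule as a Python sketch. It uses the normal approximation for the difference of two proportions at 95% confidence and assumes a 20% baseline open rate; the exact cut-offs shift with your baseline rate and confidence settings, so treat the rounded table values above and these outputs alike as ballpark figures, not precise thresholds.

```python
import math

def minimum_detectable_difference(n_per_variant, baseline_rate=0.20, z=1.96):
    """Approximate smallest open-rate gap (in percentage points) that a test
    with this many contacts per variant can call significant at ~95% confidence.

    Uses the normal approximation for the difference of two proportions and
    assumes both variants hover around the baseline open rate.
    """
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_variant)
    return z * se * 100  # convert from a proportion to percentage points

for n in (300, 750, 1500, 3750, 5000):
    print(f"{n:>5} per variant -> ~{minimum_detectable_difference(n):.1f} pp")
    # e.g. 300 per variant -> ~6.4 pp, 5000 per variant -> ~1.6 pp
```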

Statistical significance: when is a result real?

Most email platforms calculate this for you. But you should understand the concept.

95% significance means that if the two variants were actually equally good, you would see a gap this large less than 5% of the time. In practice: you can be reasonably confident the difference is real and not random. That’s the standard. You can use a free A/B test significance calculator to check your results.

Example: You test two subject lines. A gets 22% open rate, B gets 18%. With a test group of 1,000 per variant, this result is statistically significant at 95%. You can trust that A is better.

But: A gets 21% and B gets 20%? Even with 1,000 per variant, that’s NOT significant. The difference is too small relative to the sample size. You can’t conclude anything.
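If your platform doesn’t show its math, this is roughly what happens under the hood: a standard two-proportion z-test, sketched here in Python with the numbers from the examples above. Some tools use Bayesian methods instead, so your platform’s exact figures may differ.

```python
import math

def two_proportion_z_test(opens_a, n_a, opens_b, n_b):
    """Two-sided z-test for the difference between two open rates.

    Returns the z statistic and the two-sided p-value; the result is
    significant at 95% confidence when p < 0.05.
    """
    p_a, p_b = opens_a / n_a, opens_b / n_b
    pooled = (opens_a + opens_b) / (n_a + n_b)       # combined open rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value from the normal CDF
    return z, p_value

# 22% vs 18% with 1,000 contacts per variant: 220 vs 180 opens
z, p = two_proportion_z_test(220, 1000, 180, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")   # p ~ 0.025 -> significant at 95%

# 21% vs 20% with 1,000 per variant: 210 vs 200 opens
z, p = two_proportion_z_test(210, 1000, 200, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")   # p ~ 0.58 -> not significant
```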

Rule of thumb: Wait to declare a winner until your platform shows 95%+ significance. If it doesn’t get there, the variants are probably equally good.

The 7 most common mistakes

1. Testing too many things at once

You change the subject line, the image, and the CTA. Variant B wins. But what was it that worked? You don’t know. Test one thing at a time.

2. Test groups too small

You send variant A to 50 contacts and B to 50. A gets 30% open rate, B gets 24%. Sounds like a clear winner. But with 50 contacts per variant, that gap is just three opens (15 versus 12). Not statistically significant.

3. Stopping the test too early

You check after 1 hour and variant A is leading. You send A to the rest. But B’s contacts live in a different time zone and haven’t checked email yet. Wait at least 4 hours for subject line tests, 24 hours to be sure.

4. Not logging results

You run 20 A/B tests over a year. What have you learned? If you don’t document results, you start over every time. Keep a simple log: date, what you tested, result, significance.
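Your log doesn’t need to be fancy. A spreadsheet works, or something like this hypothetical Python sketch that appends each finished test to a CSV file (the file name and columns are just an example).

```python
import csv
from datetime import date
from pathlib import Path

LOG_FILE = Path("ab_test_log.csv")   # illustrative file name
FIELDS = ["date", "tested", "variant_a", "variant_b", "winner", "significant"]

def log_test(tested, variant_a, variant_b, winner, significant):
    """Append one completed A/B test to a running CSV log."""
    write_header = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "tested": tested,
            "variant_a": variant_a,
            "variant_b": variant_b,
            "winner": winner,
            "significant": significant,
        })

# Example entry from a subject line test
log_test(
    tested="subject line",
    variant_a="Free whitepaper on AI",
    variant_b="[First name], your AI guide is ready",
    winner="B (+4 pp open rate)",
    significant=True,
)
```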

5. Testing things that don’t matter

Font color, footer text, image size — they have minimal impact. Focus on the big four: subject line, timing, CTA, and content format.

6. Ignoring segments

Your full list has one optimal subject line. But your segments might have completely different ones. A subject line that works for new signups might not work for customers who’ve bought three times. Test within segments, not just across the entire list.

7. Never acting on results

The most common mistake. You run the test, see that B wins, and next time you guess anyway. Use your learnings. If personalization wins three tests in a row, make it the default.

A practical testing plan

If you’ve never A/B tested before, follow this plan in order:

Month 1-2: Subject lines. Run 4-6 subject line tests. Log the results. Find patterns: does your list prefer questions or statements? Personalization or not? Short or long?

Month 3: Send timing. Test 2-3 different times. Find your optimal window.

Month 4: CTA. Test CTA text and placement. Find out if your list prefers direct (“Buy now”) or soft (“Learn more”) calls-to-action.

Month 5+: Content format. Test length, layout, and tone. These are the more nuanced tests that require larger lists.

Ongoing: Continue testing subject lines with every send. It takes no extra time and provides constant learning.

What do you do with the results?

Build a playbook for your email marketing based on your tests.

Example after 6 months of testing:

  • Subject lines with numbers perform 12% better → use numbers in 80% of your subject lines
  • Tuesday at 10 AM is the optimal send time → make it the default
  • “See the guide” beats “Click here” by 18% → use specific CTA text
  • Emails under 200 words have 25% higher CTR → keep it short for promotional emails

That’s your data. It applies to your list, your contacts, your business. It’s worth more than any best practice guide — including this one.

Summary

A/B testing isn’t complicated. It requires discipline: test one thing, use large enough groups, wait long enough, and log your results. Over time you build an understanding of your list that’s based on data, not gut feeling.

Start with subject lines next week. Run one test per send. In three months, you’ll know more about your contacts than most email marketers ever learn.

Frequently asked questions

How many contacts do I need for an A/B test?
Minimum 1,000 per variant for the result to be statistically significant. If you're testing two subject lines, you need at least 2,000 contacts total. With smaller lists you can still test, but your results will have greater uncertainty — treat them as indicators, not proof.
What should I test first?
Subject lines. They have the biggest impact on open rate, and they're fastest to test. Once you have a winning formula for subject lines, move to send timing, then CTA placement, then content format.
How long should an A/B test run?
At least 24 hours for subject line tests, and up to 48-72 hours for content tests. Most emails are opened within the first 4 hours, but you want to include late openers to avoid bias toward certain time zones or habits.
Can I test more than two variants?
Yes. Strictly speaking that's an A/B/n test (multivariate testing means varying several elements at once), but the principle is the same. Be aware: with three variants you need 50% more contacts for the same significance level. With four variants, double. For most lists, A/B (two variants) is the practical choice.
What if my test doesn't give a clear result?
Then the difference is probably too small to matter. That's also a result — it means both variants perform equally, and you can pick whichever you prefer. Focus your next test on something with bigger potential impact.