The CRO process in five steps: research, hypothesise, prioritise, test, learn
CRO is not a one-off project. It is a repeating loop where each round builds on the last, and research is what keeps the whole thing honest.
Most teams treat conversion optimisation as a series of isolated experiments. They spot something that looks broken, run a test, ship the winner, and move on. Over time they accumulate a pile of one-off changes with no connective tissue and nothing that compounds.
The difference between a programme that compounds and one that stalls is structure: a disciplined loop that moves from evidence to hypothesis to test to learning, and then back to evidence. Each stage has a job. Each one produces something the next stage needs. Get the loop right and every experiment makes the next one smarter.
Stage one: research
Job: understand what is actually happening, and why, before you touch anything.
Research is the engine of the whole loop. Without it you are guessing, and guessing is expensive when you are spending traffic on experiments. Research answers two questions: where are people dropping off, and why?
Inputs: your analytics (traffic, funnel drop-off, segment behaviour), session recordings, heatmaps, on-site survey responses, support tickets, and sales-call transcripts.
Outputs: a documented set of friction points, each with evidence explaining why it is a problem, not just that it exists.
The split between quantitative and qualitative data is what makes this stage work. Analytics tells you where the problem is. Qualitative research tells you why it is happening. Both are required. A drop in form completion is not a hypothesis; it is a symptom.
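The quantitative half of that split can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the step names and counts are hypothetical, and `step_drop_off` is a helper invented for the example. It only locates the worst step; the why still has to come from replays, surveys, and heatmaps.

```python
from typing import List, Tuple


def step_drop_off(funnel: List[Tuple[str, int]]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, drop_off_rate) for each adjacent pair of steps."""
    rows = []
    for (name_a, count_a), (name_b, count_b) in zip(funnel, funnel[1:]):
        # Share of visitors at step A who never reached step B
        rate = 1 - count_b / count_a if count_a else 0.0
        rows.append((name_a, name_b, rate))
    return rows


# Hypothetical weekly counts, for illustration only
funnel = [
    ("landing", 10_000),
    ("pricing", 4_000),
    ("signup_form", 1_200),
    ("form_submitted", 300),
]

drops = step_drop_off(funnel)
worst = max(drops, key=lambda row: row[2])

for from_step, to_step, rate in drops:
    print(f"{from_step} -> {to_step}: {rate:.0%} drop-off")
print(f"Investigate first: {worst[0]} -> {worst[1]}")
```

With these made-up numbers the largest leak sits between the form and its submission, which is where the qualitative research would start.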
The hypothesis lives in the qualitative layer. Imagine an on-site survey reveals that users are uncertain what happens after they submit, or a session replay shows most people hesitating on a field asking for company headcount, or a heatmap shows the primary CTA sitting in a dead zone. That is where the why comes from.
Most common failure: skipping this stage, or treating a shallow analytics pass as sufficient research. Teams that live in the dashboard see what is happening but not why. They end up testing random cosmetic changes (button colours, headline tweaks) that are grounded in no actual problem, and the win rate predictably suffers.
How it feeds the next stage: every documented finding becomes a candidate hypothesis. No finding, no hypothesis worth writing.
Research does not tell you what to build. It tells you what problem is worth solving, which is the harder question.
Rule of thumb: for every hour you spend setting up a test, spend at least as long on the research that justified it.
Stage two: hypothesise
Job: translate a research finding into a testable statement that connects a change to an outcome.
A hypothesis is not a guess. It is a structured claim that follows from evidence. The most useful format is: because [research finding], we believe [proposed change] will [expected outcome], measured by [metric].
For example: because session replays show visitors repeatedly hovering over the pricing toggle without clicking, we believe adding a short label showing what each billing cycle costs annually will increase plan-selection clicks, measured by click-through on the toggle.
This structure matters for two reasons. First, it forces you to name the mechanism (the why), which is the thing you learn about whether the test wins or loses. Second, it makes the claim falsifiable, so you know in advance what result would disprove it.
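One way to hold a team to that format is to store each hypothesis as a structured record rather than free text. The sketch below is an assumption about how that might look: the `Hypothesis` class, its field names, and the evidence reference are all hypothetical, chosen only to mirror the because / we believe / measured by template.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    finding: str   # the research finding ("because ...")
    change: str    # the proposed change ("we believe ...")
    outcome: str   # the expected outcome ("will ...")
    metric: str    # the single primary metric ("measured by ...")
    evidence: str  # pointer to the clip, survey response, or funnel report

    def statement(self) -> str:
        """Render the hypothesis in the because / we believe / measured by format."""
        return (
            f"Because {self.finding}, we believe {self.change} "
            f"will {self.outcome}, measured by {self.metric}."
        )

    def missing_fields(self) -> List[str]:
        """Names of fields left blank; a non-empty list means it is not ready to rank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


pricing_toggle = Hypothesis(
    finding="session replays show visitors repeatedly hovering over the pricing toggle without clicking",
    change="adding a short label showing what each billing cycle costs annually",
    outcome="increase plan-selection clicks",
    metric="click-through on the toggle",
    evidence="session replay clips (hypothetical reference)",
)

green_button = Hypothesis(
    finding="", change="a green button", outcome="beat a blue button",
    metric="", evidence="",
)

print(pricing_toggle.statement())
print("Green button is missing:", green_button.missing_fields())
```

The second record fails the check for the same reason the prose gives: it has no finding, no metric, and no evidence behind it.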
Inputs: documented research findings, ideally with supporting evidence attached: a session clip, a survey response, a specific funnel step with an anomalous drop-off.
Outputs: a set of written hypotheses, each linked to an evidence source and a proposed experiment design.
Most common failure: hypotheses that are really design decisions in disguise. “We believe a green button will beat a blue button” is not a hypothesis. It has no mechanism, so it teaches you nothing either way.
How it feeds the next stage: each hypothesis carries an evidence strength and an expected impact, which are exactly the inputs prioritisation needs to rank it.
A real hypothesis
- Starts from a named research finding, with evidence attached
- States the mechanism: why the change should work
- Is falsifiable: you know upfront what result disproves it
- Names one primary metric before the test runs
- Teaches you something whether it wins or loses
A design decision in disguise
- Starts from a preference or a meeting opinion
- Asserts a change with no mechanism behind it
- Cannot be disproven; any outcome gets rationalised
- Has no metric chosen until after the data arrives
- A loss tells you nothing; a win tells you nothing either
Stage three: prioritise
Job: decide which hypotheses to test first, given limited traffic and bandwidth.
You will always have more ideas than capacity. Prioritisation is how you keep the programme productive rather than chaotic. The goal is to sequence experiments so you spend traffic on the ideas most likely to produce meaningful learning, not the ones that are easiest to build or loudest in the room.
Inputs: your backlog of hypotheses, data on the traffic available to each affected page, and an honest estimate of implementation effort.
Outputs: a ranked experiment roadmap covering the next one to three months.
The most common scoring approach is ICE: Impact (how much could this move the number if it wins?), Confidence (how strong is the evidence?), and Ease (how quickly can you build and run it?). Each factor is scored and the product gives you a rank. ICE is fast and forces your assumptions into the open. It also has real blind spots. Read how to prioritise experiments with ICE before using it uncritically.
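As a minimal sketch of the scoring described above, the snippet below multiplies the three factors and sorts the backlog by the product. The 1 to 10 scale, the `Candidate` class, and the example scores are assumptions made for illustration; teams use different scales and some average the factors instead.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact: int      # how much could this move the number if it wins? (assumed 1-10)
    confidence: int  # how strong is the evidence? (assumed 1-10)
    ease: int        # how quickly can you build and run it? (assumed 1-10)

    @property
    def ice(self) -> int:
        # Product of the three factors, as described in the text
        return self.impact * self.confidence * self.ease


# Hypothetical backlog with made-up scores
backlog = [
    Candidate("Explain what happens after form submit", impact=7, confidence=8, ease=6),
    Candidate("Remove the company headcount field", impact=6, confidence=7, ease=9),
    Candidate("New hero image (opinion, no evidence)", impact=5, confidence=2, ease=7),
]

roadmap = sorted(backlog, key=lambda c: c.ice, reverse=True)

for rank, candidate in enumerate(roadmap, start=1):
    print(f"{rank}. {candidate.name}: ICE = {candidate.ice}")
```

Because the factors are multiplied, a weak Confidence score drags an idea down sharply, which is how an evidence-free suggestion ends up at the bottom regardless of who proposed it.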
One practical rule: weight your backlog toward high-traffic pages and high-intent funnel steps. A modest win on a page that sees ten thousand sessions a week is usually worth more in absolute conversions than a large win on one that sees five hundred. And for any given effect size, the high-traffic page collects the required sample far faster.
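The traffic rule is easy to check with arithmetic. The sketch below assumes both pages convert at the same 3% baseline and compares a 5% relative lift on the high-traffic page with a 30% relative lift on the low-traffic one. All three numbers are assumptions picked for illustration, not benchmarks.

```python
def extra_conversions_per_week(weekly_sessions, baseline_rate, relative_lift):
    """Additional conversions per week if the variant's lift holds after rollout."""
    return weekly_sessions * baseline_rate * relative_lift


BASELINE = 0.03  # assumed conversion rate on both pages

high_traffic = extra_conversions_per_week(10_000, BASELINE, 0.05)  # modest 5% lift
low_traffic = extra_conversions_per_week(500, BASELINE, 0.30)      # large 30% lift

# Relative lift the low-traffic page would need to match the high-traffic gain
break_even_lift = high_traffic / (500 * BASELINE)

print(f"High-traffic page: {high_traffic:.1f} extra conversions per week")
print(f"Low-traffic page:  {low_traffic:.1f} extra conversions per week")
print(f"Break-even lift on the low-traffic page: {break_even_lift:.0%}")
```

Under these assumptions the low-traffic page would need to double its conversion rate just to match the modest win. The comparison shifts if the two pages have very different baselines or conversion values, so plug in your own numbers.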
Most common failure: letting recency or politics drive the roadmap. A hypothesis from the CEO with no supporting evidence should score lower than one backed by a session replay showing real friction, even when that is uncomfortable to say. Prioritisation is a discipline, not a negotiation.
How it feeds the next stage: the top-ranked hypothesis becomes the next experiment design, with its primary metric and affected pages already specified.
Stage four: test
Job: run a controlled experiment that produces a credible answer.
Testing is where the hypothesis meets reality. Done well, it gives you a causal answer: the change caused the difference, not a coincidence. Done badly, it gives you a misleading result you act on, which is worse than no test at all. (For the full mechanics, see A/B testing explained.)
Inputs: a prioritised hypothesis, a defined experiment design (control vs. variant, primary metric, guardrail metrics), and a sample size and runtime calculated in advance.
Outputs: a result with a credible statistical interpretation: not just a number, but a judgement about whether the difference is real.
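As one illustration of what a statistical interpretation can look like, the sketch below runs a two-sided two-proportion z-test and reports a confidence interval for the difference in conversion rates. This is a common textbook approach, not necessarily the method your testing tool uses, and the visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_rates(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test plus a confidence interval for (rate_b - rate_a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the test of "no difference"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval around the observed difference
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled

    return {"diff": diff, "p_value": p_value, "ci": (diff - margin, diff + margin)}


# Hypothetical result: control converts 420 of 14,000, variant 504 of 14,000
result = compare_rates(conv_a=420, n_a=14_000, conv_b=504, n_b=14_000)

print(f"Absolute difference: {result['diff']:.4f}")
print(f"p-value: {result['p_value']:.4f}")
print(f"95% interval: {result['ci'][0]:.4f} to {result['ci'][1]:.4f}")
```

The interval is often more useful than the p-value alone: it shows the range of lifts the data is compatible with. The reading is only trustworthy if the test ran to its pre-planned sample size.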
Three things matter most here. First, calculate your sample size before you start; if you do not know your required sample upfront, you cannot know when to stop. See sample size and runtime. Second, resist peeking: a result at 60% of your target sample is not a result. Third, define guardrail metrics alongside the primary one. A change that lifts form submissions but doubles your support-ticket rate is not a win.
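For the first of those three points, here is a minimal sketch of an upfront sample-size calculation using the standard normal-approximation formula for comparing two proportions. The defaults (5% two-sided significance, 80% power), the even traffic split, and the example baseline and lift are assumptions; dedicated calculators may use slightly different formulas and give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed in each arm to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def runtime_weeks(n_per_arm, weekly_sessions, arms=2):
    """Whole weeks needed, assuming traffic is split evenly across all arms."""
    return ceil(n_per_arm * arms / weekly_sessions)


# Hypothetical page: 3% baseline, hoping to detect a 20% relative lift
n = sample_size_per_arm(baseline=0.03, relative_lift=0.20)

print(f"Required sample: {n} visitors per arm")
print(f"At 10,000 sessions a week: {runtime_weeks(n, 10_000)} weeks")
print(f"At 500 sessions a week: {runtime_weeks(n, 500)} weeks")
```

Running the same calculation for a smaller expected lift shows why subtle changes are expensive to test: the required sample grows roughly with the inverse square of the effect size.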
Most common failure: ending the test too early. A result looks significant on day five, the team calls it and ships, and the apparent win erodes after rollout because the sample was too small and the significance was noise. This is so common it has a name (the peeking problem) and it quietly ruins more programmes than almost anything else. See statistical significance without fooling yourself.
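The peeking problem can be demonstrated with a small simulation. The sketch below runs repeated A/A tests, where both arms share the same true conversion rate and any "significant" result is a false positive, and compares a team that checks at ten interim points with one that looks only at the end. The run count, sample size, conversion rate, and number of looks are arbitrary assumptions chosen to keep the example quick.

```python
import random
from math import sqrt


def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def false_positive_rates(runs=400, n_per_arm=2000, rate=0.10, looks=10, seed=1):
    """Simulate A/A tests; return (peeking false-positive rate, single-look rate)."""
    rng = random.Random(seed)
    checkpoints = {n_per_arm * k // looks for k in range(1, looks + 1)}
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        ever_significant = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate  # both arms share the same true rate
            conv_b += rng.random() < rate
            if n in checkpoints and abs(z_score(conv_a, n, conv_b, n)) > 1.96:
                ever_significant = True    # a peeking team would have shipped here
        peeking_hits += ever_significant
        final_hits += abs(z_score(conv_a, n_per_arm, conv_b, n_per_arm)) > 1.96
    return peeking_hits / runs, final_hits / runs


peeking_rate, single_look_rate = false_positive_rates()
print(f"False positives when stopping at the first significant peek: {peeking_rate:.1%}")
print(f"False positives when looking once at the planned sample: {single_look_rate:.1%}")
```

The single-look rate should land near the nominal 5%, while the peeking rate is typically several times higher, even though nothing was changed between the two arms.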
How it feeds the next stage: the result, paired with the original hypothesis, is the raw material the learning stage turns into insight.
Rule of thumb: if you did not calculate the required sample size before launching, you do not have a result. You have a reading.
Stage five: learn
Job: extract the insight that makes the next experiment smarter.
This is the most underinvested stage in almost every CRO programme. Teams call the winner, ship it, and move on. The learning (the why behind the result) gets lost. Six months later someone proposes the same test in a different form, and nobody remembers what happened last time.
Inputs: the test result, the original hypothesis and evidence, and the variant designs.
Outputs: a documented record that captures not just what happened, but what you now believe about your users, and what experiments that belief suggests next.
A useful learning document answers four questions: What did we test, and why? What did we find? What does this tell us about user behaviour? What should we test next? A loss is often more instructive than a win. If a change you expected to help made no difference, that is signal: the mechanism was wrong, the research finding was not representative, or the change was too subtle to register. Each of these possibilities sharpens the next round of hypotheses.
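The four questions map naturally onto a structured record that can be saved and searched later. The sketch below is one possible shape, not a standard: the `LearningRecord` class, its field names, the tags, and the example entry are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LearningRecord:
    tested: str            # What did we test, and why?
    found: str             # What did we find?
    user_insight: str      # What does this tell us about user behaviour?
    next_tests: List[str]  # What should we test next?
    outcome: str           # "win", "loss", or "inconclusive"
    tags: List[str] = field(default_factory=list)


def find_prior(records: List[LearningRecord], tag: str) -> List[LearningRecord]:
    """Look up earlier experiments on the same topic before proposing a new one."""
    return [record for record in records if tag in record.tags]


# Hypothetical entry, for illustration only
knowledge_base = [
    LearningRecord(
        tested="Annual-cost label on the pricing toggle, because replays showed hovering without clicking",
        found="No detectable change in toggle click-through at the planned sample size",
        user_insight="Hesitation may not be about cost clarity; the mechanism needs re-checking",
        next_tests=["Survey visitors who hover on the toggle but do not click"],
        outcome="loss",
        tags=["pricing", "toggle"],
    )
]

# Records serialise to plain JSON, so they can live in any shared store
print(json.dumps(asdict(knowledge_base[0]), indent=2))
print("Prior pricing tests:", len(find_prior(knowledge_base, "pricing")))
```

Tagging each record is what lets someone discover, six months on, that a similar idea has already been tried and what it taught.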
Most common failure: treating documentation as overhead. Teams under time pressure skip it, so the programme never compounds and each experiment starts from scratch. A well-run knowledge base turns a sequence of tests into an accumulating understanding of your audience, which is the actual long-term asset of a CRO programme.
How it feeds the next stage: new beliefs and unanswered questions flow straight back into research, closing the loop and setting the agenda for the next investigation.
The five stages at a glance
| Stage | Job | Output | Most common failure |
|---|---|---|---|
| 1. Research | Find where visitors drop off and why | Evidenced friction points | Skipping it, or mistaking a dashboard glance for research |
| 2. Hypothesise | Turn a finding into a falsifiable claim | Written hypotheses with a stated mechanism | Design decisions dressed up as hypotheses |
| 3. Prioritise | Sequence the backlog by expected value | A ranked one-to-three-month roadmap | Politics and recency outranking evidence |
| 4. Test | Run a controlled experiment | A statistically credible result | Stopping early: the peeking problem |
| 5. Learn | Extract the insight behind the result | A documented record feeding the next round | Treating documentation as optional overhead |
The loop, not the project
The five stages are not a linear process you complete once. They are a loop you run continuously. Research informs hypotheses. Hypotheses feed the prioritised backlog. Tests generate results. Learning feeds back into research, often by raising new questions that require a fresh investigation.
The cadence matters. A monthly cycle works for most programmes: two weeks of research and hypothesis writing, two weeks of active tests, and a standing review to update the roadmap. The specifics depend on your traffic volume and team size, but the principle holds: the loop has to keep turning.
What accelerates the loop over time is the accumulating knowledge base. After twelve months of documented tests you have a durable picture of what your audience responds to, which friction points cost the most, and which hypotheses consistently underperform. That picture is worth more than any single winning experiment.
If you are just starting, the priority is not to run ten tests at once. It is to complete one clean loop and document it well.
- Research one funnel step: pick your highest-traffic drop-off, then pair the analytics with a few session replays, a heatmap, and an on-site survey to find the why.
- Write one hypothesis: convert the strongest finding into the because / we believe / measured by format, with the evidence attached.
- Score the backlog: rate each candidate on Impact, Confidence, and Ease, and let the top score decide what runs first.
- Run one clean test: single variable, primary metric and guardrails defined, sample size calculated before launch, no peeking.
- Document the learning: record what you tested, what you found, what it means about your users, and what to test next, then feed it back into research.
Frequently asked questions
Where does the loop actually start?
With research, always. Every other stage depends on a real, evidenced problem to work on. Starting at hypothesis or test means you are optimising a guess, which is the most expensive way to spend traffic. If you have not grounded yourself in the basics first, “How to calculate and benchmark your conversion rate” is the right starting point, and “What is conversion rate optimisation” frames the whole discipline.
How long should one loop take?
For most programmes, about a month: roughly two weeks of research and hypothesis writing, two weeks of active testing, and a standing review to re-rank the roadmap. Lower-traffic sites run longer loops because tests need more runtime to reach significance. See sample size and test runtime for what your numbers can support.
What if my traffic is too low to run statistically clean tests?
Lean harder on research and on changes whose mechanism is strong and well-evidenced, where you can act with confidence without a formally significant test. You can still run the full loop; you just rely more on qualitative evidence and before-and-after measurement, and you reserve formal A/B tests for your highest-traffic pages.
Do losing tests still count?
Yes, and often they teach you more than wins. A loss tells you the mechanism was wrong, the research finding was unrepresentative, or the change was too small to register. Documented in your knowledge base, that is exactly the kind of insight that stops you repeating the same dead-end test next quarter.
The loop is simple. Running it with discipline (resisting the shortcut at every stage) is where most programmes either build momentum or lose it. For SaaS-specific context, CRO for SaaS maps the loop onto trial and subscription funnels. Either way, the rule is the same: convert more, guess less.
OptiWolf
OptiWolf is CRO and lead-generation software: A/B testing, personalization, and lead-capture popups on one measurement spine. The CRO Academy is where we share the playbooks. Convert more, guess less.
