Besides the binary conclusion on whether the variant lost or won, it would also be good to know how much better or worse the variant is. We could simply compare the metric between the experimental and control groups. But, just as with statistical significance, it's not that simple. Since there's randomness involved in this whole process, we can't just divide the experimental group's metric by the control group's and say that this is the uplift we'll get if we roll out the change to all users. What we'll do instead is calculate a confidence interval: the range of values within which we are 95% confident (or whatever confidence level we have) the true value (true mean) of the metric lies. We can then compare it to the control group's value and get a range of improvement over it that we're 95% confident we'll get if the variant is deployed to all users. So instead of just saying "we're 95% confident that the variant is better", we'll say "we're 95% confident that the variant improves the metric by 5%-6%", which is much more actionable.
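
Here's a minimal sketch of how such an interval can be computed, using a normal approximation for the difference in means and expressing it relative to the control mean. Note one simplification: dividing by the control mean treats it as a fixed number and ignores its own sampling noise (a delta-method correction would be more rigorous). All function names, sample sizes, and conversion rates below are illustrative:

```python
import numpy as np
from scipy import stats

def uplift_confidence_interval(control, variant, confidence=0.95):
    """CI for the relative uplift of variant over control,
    via a normal approximation of the difference in means."""
    control = np.asarray(control, dtype=float)
    variant = np.asarray(variant, dtype=float)

    mean_c, mean_v = control.mean(), variant.mean()
    # Standard error of the difference in means (unequal variances)
    se_diff = np.sqrt(control.var(ddof=1) / len(control)
                      + variant.var(ddof=1) / len(variant))

    # z critical value for the chosen two-sided confidence level
    z = stats.norm.ppf(1 - (1 - confidence) / 2)

    diff_low = (mean_v - mean_c) - z * se_diff
    diff_high = (mean_v - mean_c) + z * se_diff

    # Express the interval as improvement relative to the control mean
    # (simplification: the control mean is treated as fixed here)
    return diff_low / mean_c, diff_high / mean_c

# Example with simulated per-user conversion flags (0/1) for each group
rng = np.random.default_rng(42)
control = rng.binomial(1, 0.10, size=20_000)
variant = rng.binomial(1, 0.105, size=20_000)

low, high = uplift_confidence_interval(control, variant)
print(f"95% CI for uplift: {low:+.1%} to {high:+.1%}")
```

If the interval lies entirely above zero, we're confident the variant is an improvement; if it straddles zero, the experiment hasn't ruled out "no effect".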