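If an experiment goes through ramp-up (i.e., two or more periods with different percentages of traffic assigned to the variants), combining the results across periods can produce directionally incorrect estimates of the treatment effect. Treatment may be better than control in the first phase and in the second phase, yet worse overall when the two periods are combined. This is an instance of Simpson's paradox: each variant's pooled rate is a weighted average of its per-period rates, and the weights (the share of that variant's users in each period) differ between variants when the traffic split changes.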
Example:
Conversion rates over two days. Each day has 1M customers, and treatment (T) is better than control (C) on each day, yet worse overall.
Variant | Friday (traffic split 99%/1%) | Saturday (traffic split 50%/50%) | Total |
---|---|---|---|
C | 20k/990k = 2.02% | 5k/500k = 1.00% | 25k/1.49M = 1.68% |
T | 230/10k = 2.30% | 6k/500k = 1.20% | 6,230/510k = 1.22% |
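
The arithmetic in the table can be checked with a short script. The following is a minimal sketch, not part of the original analysis: the counts are copied from the table, and the names `data`, `pooled_rate`, and `friday_share` are illustrative assumptions. It computes the per-day and pooled conversion rates, plus the share of each variant's users that arrived on Friday, which is what drives the reversal.

```python
# (conversions, users) per variant and day, copied from the table above.
data = {
    "C": {"Friday": (20_000, 990_000), "Saturday": (5_000, 500_000)},
    "T": {"Friday": (230, 10_000), "Saturday": (6_000, 500_000)},
}


def pooled_rate(days):
    """Conversion rate after summing conversions and users over all days."""
    total_conversions = sum(conv for conv, _ in days.values())
    total_users = sum(users for _, users in days.values())
    return total_conversions / total_users


# Per-day conversion rate for each variant.
per_day = {
    variant: {day: conv / users for day, (conv, users) in days.items()}
    for variant, days in data.items()
}

# Pooled (two-day) conversion rate for each variant.
pooled = {variant: pooled_rate(days) for variant, days in data.items()}

# Fraction of each variant's users who arrived on Friday. This is the
# weight that Friday's rate receives in that variant's pooled rate.
friday_share = {
    variant: days["Friday"][1] / sum(users for _, users in days.values())
    for variant, days in data.items()
}

for variant in data:
    daily = ", ".join(f"{day} {r:.2%}" for day, r in per_day[variant].items())
    print(
        f"{variant}: {daily}, pooled {pooled[variant]:.2%}, "
        f"Friday share of users {friday_share[variant]:.1%}"
    )
```

Running this shows T ahead of C on each day but behind in the pooled rate. About two thirds of control's users come from Friday, the higher-converting day, while only about 2% of treatment's users do, so control's pooled rate is pulled toward Friday's level and treatment's toward Saturday's.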