Violations of SUTVA
The Stable Unit Treatment Value Assumption (SUTVA) says that experiment units (e.g. users) do not interfere with one another. This can be violated in social networks, where a user’s treatment spills over to their connections, or when variants share infrastructure: in one case the treatment crashed machines, and because those machines also served Control users, both populations suffered similarly and the delta showed no difference.
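A toy simulation makes the shared-infrastructure failure mode concrete. Everything here is hypothetical (crash rate, baseline, sample size); the point is only that when a treatment bug degrades machines serving both variants, the delta stays near zero even though every user is harmed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000       # users per variant (hypothetical)
baseline = 0.95   # success rate for some metric, e.g. page loads that complete

# Hypothetical: the treatment crashes 10% of machines. The machines serve
# BOTH variants, so the degradation hits Treatment and Control alike.
crash_rate = 0.10
degraded = baseline * (1 - crash_rate)

treatment = rng.binomial(1, degraded, n)
control = rng.binomial(1, degraded, n)  # Control suffers the same crashes

# The delta looks flat, hiding real harm to every user.
print(f"delta: {treatment.mean() - control.mean():+.4f}")  # ~ +0.0000
```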
Survivorship bias
If you only analyze users who have been active for a while (e.g. two months), you have survivorship bias: users who churned before then, possibly because of the treatment itself, are silently excluded from the comparison.
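A minimal sketch with made-up numbers: suppose the treatment doesn’t change engagement at all, but it churns the least-engaged users before the two-month mark. Filtering to survivors then makes the treatment look like a win.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical: the treatment has NO real effect on engagement...
control = rng.exponential(1.0, n)
treatment = rng.exponential(1.0, n)

# ...but it churns the bottom 20% of users, so an "active for two
# months" filter keeps only the survivors in Treatment.
survivors = treatment[treatment > np.quantile(treatment, 0.20)]

print("treatment (survivors):", round(survivors.mean(), 3))  # inflated, ~1.22
print("control (everyone):   ", round(control.mean(), 3))    # true mean, ~1.00
```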
Intention to treat
If you only analyze the subset of users who have opted into a new feature, you may overstate the treatment effect, because opting in is self-selected. Instead, we should allocate users based on the offer (the intention to treat), regardless of whether they actually use it.
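A sketch of the gap, with hypothetical numbers: more-engaged users are more likely to opt in, so comparing opted-in users against Control mixes the feature’s effect with self-selection. Comparing everyone offered against everyone not offered (ITT) avoids that.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
engaged = rng.random(n)        # latent engagement in [0, 1] (hypothetical)
offered = rng.random(n) < 0.5  # randomized: half the users get the offer

# More-engaged users are more likely to opt in; the feature itself is
# worth a true +0.02 per user who uses it.
opted_in = offered & (rng.random(n) < engaged)
outcome = engaged + 0.02 * opted_in + rng.normal(0, 0.1, n)

# Per-protocol: opted-in users vs. Control -- wildly overstated (~0.19).
print("per-protocol:", outcome[opted_in].mean() - outcome[~offered].mean())

# Intention to treat: offered vs. not offered -- honest (~0.01).
print("ITT:         ", outcome[offered].mean() - outcome[~offered].mean())
```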
Sample ratio mismatch (SRM)
We expect the experiment traffic to match the configured split (e.g. one-to-one), but it turns out it doesn’t. With large sample sizes, a ratio off by even +/- 0.01 indicates a likely issue.
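The standard check is a chi-squared goodness-of-fit test against the configured split; a tiny p-value means the mismatch is almost certainly real and the results shouldn’t be trusted. A sketch with hypothetical counts:

```python
from scipy.stats import chisquare

# Hypothetical counts from an experiment configured for a 50/50 split.
treatment, control = 821_588, 815_482
total = treatment + control

stat, p = chisquare([treatment, control], f_exp=[total / 2, total / 2])
print(f"ratio = {treatment / total:.4f}, p = {p:.2e}")
# The ratio is only ~0.502, but with counts this large the p-value is
# far below 0.001: this split is wildly improbable under a true 50/50.
```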
Some causes include:
- Browser redirects used as the means to get users into the treatment: redirects add latency the treatment alone wouldn’t have, bots handle them differently, and they are asymmetric (users bookmark the redirected URL or share it with friends).
- Click tracking can be lossy, and the treatment may make it lossier.
- Residual or carryover effects: when the experiment first rolls out, it causes a bug that affects users. The bug gets fixed, but those users were already affected, and the treatment can suffer as a result.
- A bad hash function for randomization (see the hashing sketch after this list).
- “Triggering impacted by treatment”: we select who enters the experiment based on an attribute that can change, so we must ensure the treatment has no effect on that attribute.
- Time-of-day effects: variant A sent at 9am, variant B sent at 9pm.
- Data pipeline impacted by treatment: a feature increases engagement so much that power users are re-classified as bots and dropped from the data.
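On the bad-hash-function point, a common pattern is to hash the user ID with a per-experiment salt using a cryptographic hash, so buckets are uniform and independent across experiments. A minimal sketch; `assign_variant` is a hypothetical helper, not any particular platform’s API:

```python
import hashlib

def assign_variant(user_id: str, experiment_salt: str, n_buckets: int = 1000) -> str:
    """Deterministically assign a user to a variant bucket.

    Hypothetical helper: SHA-256 spreads IDs uniformly, and the
    per-experiment salt keeps assignments independent across experiments.
    A weak hash (or reusing the same hash everywhere) can skew the
    split or correlate experiments, both of which show up as SRM.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % n_buckets
    return "treatment" if bucket < n_buckets // 2 else "control"

print(assign_variant("user-42", "exp-homepage-2024"))
```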