One thing I've learned from A/B testing large sites is that you should change only one metric at a time, but the domain clouding one sounds like an excellent candidate as well.
I agree that a given A/B test should change only one metric at a time. However, I've had excellent results from running multiple A/B tests in parallel. As long as inclusion in each test is independent and random, the results of each are informative. If you're concerned about interaction effects, you can look for signs of an interaction in a post-mortem analysis, then run a more expensive multivariate test if you have cause for concern.
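One common way to keep inclusion in parallel tests independent and random is to hash the user ID salted with the experiment name. Below is a minimal Python sketch under that assumption; the experiment names and the `assign`/`cell_counts` helpers are hypothetical, and SHA-256 bucketing is just one reasonable choice, not something the comment above specifies.

```python
import hashlib


def assign(user_id: str, experiment: str, arms=("A", "B")) -> str:
    """Deterministically assign a user to one arm of one experiment.

    Salting the hash with the experiment name means a user's bucket in one
    experiment tells you (approximately) nothing about their bucket in another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]


def cell_counts(user_ids, exp1: str, exp2: str) -> dict:
    """Count users in each combination of arms across two parallel experiments."""
    counts = {}
    for uid in user_ids:
        key = (assign(uid, exp1), assign(uid, exp2))
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10_000)]
    # With independent assignment, each of the four cells should hold roughly 25% of users.
    print(cell_counts(users, "domain_clouding", "new_headline"))
```

Because every combination of arms ends up with roughly equal traffic, the same data can later be split by cell to look for the interaction effects mentioned above.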
Yep, that's right. Although you can do multivariate analysis, sticking with a plain A/B test is best.
That said, you can (and should) measure performance on multiple metrics, such as clicks, comments, and time spent. That gives you an accurate picture of the tradeoffs.
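A minimal Python sketch of tracking several metrics per variant and reporting the relative lift of each. The session records, metric names, and helper functions are all hypothetical; a real analysis would also attach a significance test or confidence interval to each metric.

```python
from statistics import mean

# Hypothetical per-session records: (variant, clicks, comments, seconds on site).
SESSIONS = [
    ("A", 3, 0, 120.0),
    ("A", 1, 1, 45.0),
    ("B", 4, 0, 150.0),
    ("B", 2, 0, 60.0),
]

METRICS = ("clicks", "comments", "time_spent")


def summarize(rows):
    """Return {variant: {metric: mean value}} for every metric in METRICS."""
    by_variant = {}
    for variant, *values in rows:
        by_variant.setdefault(variant, []).append(values)
    return {
        variant: {name: mean(column) for name, column in zip(METRICS, zip(*vals))}
        for variant, vals in by_variant.items()
    }


def relative_lift(summary, control="A", treatment="B"):
    """Relative change of treatment vs. control per metric (None if the control mean is 0)."""
    lifts = {}
    for name in METRICS:
        base = summary[control][name]
        lifts[name] = None if base == 0 else (summary[treatment][name] - base) / base
    return lifts


if __name__ == "__main__":
    print(relative_lift(summarize(SESSIONS)))
```

In this made-up data, B raises clicks and time spent but loses all comments, exactly the kind of tradeoff a single headline metric would hide.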
A surprisingly simple question that is, I think, very hard to answer well...
If you only test one variable at a time, you simplify your tests to the point where you can extract a single metric to determine whether or not you've improved on the old situation.
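For a single conversion-style metric, one way to decide whether the new version beat the old one is a pooled two-proportion z-test. This is a minimal sketch assuming binary outcomes (converted or not) and samples large enough for the normal approximation; the function name and the counts in the example are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare the conversion rates of A and B with a pooled two-proportion z-test.

    Returns (z, two_sided_p_value). Positive z means B converts better than A.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both variants share one rate, estimated from pooled data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200/5000 conversions for A, 250/5000 for B.
    z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```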
Nothing stops you from then doing more A/B testing with other combinations relative to your 'new best'. This may include going back to the original setting with some other variable changed; that way you avoid getting stuck at a local maximum.
So say we have a site at version 'A', we make version 'B', and we test them against each other. If, according to our chosen metric, 'B' performs better, we now have two choices:
We can run another A/B test starting from 'B', changing some value to see if we can improve on 'B' directly, or we can go back and test 'B' against 'A' plus some new modification that is not 'B'.
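The two follow-up options above can be sketched as a small helper that, given the original 'A', the current champion 'B', and one new modification, builds the challengers to test against 'B'. This is a hypothetical illustration: configurations are modeled as plain dicts mapping setting names to values, and `propose_challengers` and the setting names in the example are invented.

```python
from typing import Dict, List, Tuple

Config = Dict[str, str]


def propose_challengers(original: Config, champion: Config,
                        setting: str, new_value: str) -> List[Tuple[str, Config]]:
    """Build the follow-up variants to test against the current champion.

    1. champion + modification: try to improve on 'B' directly.
    2. original + modification: go back to 'A' with a different change,
       so the search is not limited to the neighbourhood of 'B'.
    Variants identical to the champion (or to each other) are skipped.
    """
    from_champion = {**champion, setting: new_value}
    from_original = {**original, setting: new_value}
    challengers = []
    if from_champion != champion:
        challengers.append(("champion+mod", from_champion))
    if from_original not in (champion, from_champion):
        challengers.append(("original+mod", from_original))
    return challengers


if __name__ == "__main__":
    original = {"headline": "old", "button": "grey"}  # version 'A'
    champion = {"headline": "new", "button": "grey"}  # version 'B' won the first test
    for label, config in propose_challengers(original, champion, "button", "green"):
        print(label, config)
```

Each challenger would then go into its own A/B test against the champion, and whichever wins becomes the new 'best' for the next round.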
If you really believe that the A/B parameter space and the C/D parameter space interact with each other, then yes, you could get stuck in a local optimum. In that case you should test all combinations simultaneously. However, it will take a lot longer to collect enough data, because the traffic is split across more variants. So if you think the parameters are likely to be independent, it's better to change only one set at a time.
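Testing all combinations of A/B and C/D at once is a 2x2 factorial design. A minimal Python sketch of estimating the two main effects and the interaction from the four cell means; the function name and the cell means used below are hypothetical.

```python
from itertools import product

# The four cells of a full 2x2 design: every A/B level crossed with every C/D level.
CELLS = list(product("AB", "CD"))


def factorial_effects(cell_means):
    """Main effects and interaction for a 2x2 design.

    cell_means maps (ab_level, cd_level) -> mean metric for that cell,
    e.g. {("A", "C"): 0.040, ("A", "D"): 0.038, ...}.
    """
    m = cell_means
    # Effect of switching A -> B, averaged over both C/D levels.
    ab = ((m["B", "C"] - m["A", "C"]) + (m["B", "D"] - m["A", "D"])) / 2
    # Effect of switching C -> D, averaged over both A/B levels.
    cd = ((m["A", "D"] - m["A", "C"]) + (m["B", "D"] - m["B", "C"])) / 2
    # Interaction: how much the A -> B effect changes when C is replaced by D.
    # Zero means the two parameters act additively on this metric.
    interaction = (m["B", "D"] - m["A", "D"]) - (m["B", "C"] - m["A", "C"])
    return {"ab": ab, "cd": cd, "interaction": interaction}


if __name__ == "__main__":
    means = {("A", "C"): 0.040, ("A", "D"): 0.038, ("B", "C"): 0.039, ("B", "D"): 0.050}
    print(factorial_effects(means))
```

With those made-up means, changing one parameter at a time starting from (A, C) never finds an improvement, even though (B, D) is clearly best; the function reports an interaction of about 0.013. The cost is the one described above: each cell gets roughly a quarter of the traffic, and with equal cell sizes and variances the interaction estimate has four times the variance of a main-effect estimate.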