He is basically saying that the errors need to be independent, and that you can get independence by using different capture mechanisms. I think that is wrong. For example, objects that are small, fast, and poorly reflective will be similarly hard to detect for both lidar and cameras. I think the real problem is getting consistent statistical estimators and fusing them at a useful update rate. Has anyone seen the lidar data from the Uber crash? It would be interesting to see whether a person could quickly "tell" from lidar alone that the poor pedestrian wasn't an artifact.
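To put rough numbers on the independence point, here is a minimal sketch. It treats each sensor's miss as a Bernoulli event and computes the chance that both miss, given a correlation `rho` between the two. The 10% miss rates and the function name are made up for illustration, not measurements from any real system.

```python
import math


def joint_miss_probability(p1: float, p2: float, rho: float = 0.0) -> float:
    """Probability that two sensors both miss the same object.

    p1, p2: per-sensor miss probabilities (hypothetical values).
    rho:    correlation between the two miss events; 0 means independent.
    Uses E[XY] = p1*p2 + cov for two Bernoulli variables.
    """
    cov = rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return p1 * p2 + cov


# Illustrative miss rates only; not taken from any real lidar or camera.
LIDAR_MISS = 0.1
CAMERA_MISS = 0.1

for rho in (0.0, 0.5, 1.0):
    p_both = joint_miss_probability(LIDAR_MISS, CAMERA_MISS, rho)
    print(f"rho={rho:.1f}  P(both miss)={p_both:.3f}")
```

With these assumed numbers, independent sensors both miss about 1% of the time, but if the misses are fully correlated (same hard object defeats both), the second sensor buys you nothing and the joint miss rate stays at 10%.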
The radar, lidar, and cameras on the Uber that struck Elaine Herzberg all detected her. It wasn't a sensor failure that led to her death; it was a software failure: the autonomous OS was tuned not to respond to positives below a certain threshold.