With everyone using video conferencing a lot more, I have a couple of things on my wishlist.
1. Get the latency down. On my home DSL, it's around 250-500ms. In air, sound takes about 3ms per meter, so matching an in-person meeting leaves about two orders of magnitude of improvement possible. I think each order of magnitude will give qualitative gains. This might need a rethink right down to the hardware level. Our current packet-based approach to networking could be replaced by circuit switching. CPUs could be more about having many cores dedicated to specific tasks. Audio compression can be done at low latency using prediction rather than block-based Fourier transforms*. Maybe something similar is possible for video?
2. Shared virtual space. The monitors should show the projection of this virtual space from the point of view of the viewer. Depth estimation seems to be solved, and for example Apple's new phones seem to be able to do it even with a single camera.
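
The latency arithmetic in item 1 can be checked with a few lines of Python. This is only a rough sketch: the 343 m/s speed of sound (dry air at about 20 °C) is my assumption, and the function names are made up for illustration.

```python
import math

# Assumed speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel distance_m through air, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def orders_of_magnitude(network_ms: float, reference_ms: float) -> float:
    """How many factors of ten separate the two latencies."""
    return math.log10(network_ms / reference_ms)


per_meter_ms = acoustic_delay_ms(1.0)
print(f"sound in air: {per_meter_ms:.2f} ms per meter")

# The 250-500ms range quoted for home DSL.
for network_ms in (250.0, 500.0):
    gap = orders_of_magnitude(network_ms, per_meter_ms)
    print(f"{network_ms:.0f} ms is {gap:.1f} orders of magnitude above that")
```

Both ends of the DSL range come out at roughly two orders of magnitude above the delay of talking to someone a meter away.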
* I know from my old Bonk project that a predictor-based approach can perform similarly to transform-based lossy compression, i.e. errors have the same frequency envelope as the signal and are perceptually hidden. Bonk is still packet-based, but I'm sure a very low latency streaming version is possible.
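
To make the streaming idea concrete, here is a minimal sketch of a sample-by-sample predictive coder. To be clear, this is not Bonk's algorithm: it swaps in a plain normalized-LMS adaptive predictor with a fixed quantizer step, and every name and parameter (`AdaptivePredictor`, `order`, `mu`, `step`) is hypothetical. It only shows the structure that removes block buffering. The encoder and decoder both adapt from the reconstructed signal, so no side information is sent and each sample can be emitted as soon as it arrives. It makes no attempt to shape the error spectrum.

```python
import math


class AdaptivePredictor:
    """Normalized-LMS linear predictor driven only by reconstructed samples."""

    def __init__(self, order=8, mu=0.5):
        self.weights = [0.0] * order
        self.history = [0.0] * order  # most recent reconstructed sample first
        self.mu = mu

    def predict(self):
        return sum(w * h for w, h in zip(self.weights, self.history))

    def update(self, error, reconstructed):
        # Both sides know the quantized error, so both adapt identically.
        energy = sum(h * h for h in self.history) + 1e-9
        gain = self.mu * error / energy
        self.weights = [w + gain * h for w, h in zip(self.weights, self.history)]
        self.history = [reconstructed] + self.history[:-1]


def encode(samples, step=0.01):
    """Emit one quantized residual per input sample, with no lookahead."""
    predictor = AdaptivePredictor()
    codes = []
    for x in samples:
        prediction = predictor.predict()
        code = round((x - prediction) / step)  # integer residual to transmit
        codes.append(code)
        predictor.update(code * step, prediction + code * step)
    return codes


def decode(codes, step=0.01):
    """Mirror the encoder: same predictor state, same updates."""
    predictor = AdaptivePredictor()
    output = []
    for code in codes:
        prediction = predictor.predict()
        reconstructed = prediction + code * step
        output.append(reconstructed)
        predictor.update(code * step, reconstructed)
    return output


# Usage: a 440 Hz tone sampled at 8 kHz (arbitrary test values).
signal = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
decoded = decode(encode(signal))
worst = max(abs(a - b) for a, b in zip(signal, decoded))
print(f"worst-case reconstruction error: {worst:.4f}")
```

Because the quantizer sits inside the prediction loop, the reconstruction error on each sample stays within half a quantizer step, and the algorithmic delay is a single sample rather than a whole transform block.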