Vector-logical CPUs?

A very random thought. CPUs seem to be heading towards vector operations, in which the same operation is applied to a set of different inputs in parallel. 3D graphics, image/video/sound manipulation, and scientific simulations all use this kind of thing.

Ok, so if our main computing load these days is this kind of highly parallelizable operation, does each individual operation have to be fast? Instead of adding 16 values to 16 other values in parallel using 16 adder circuits, and doing that 16 times over, might we not add 256 values to 256 other values in a single parallel pass using only simple logical operations? Just calculate each result one bit at a time, carrying from one bit position to the next (bit-serial arithmetic, in other words).
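To make that concrete, here is a rough sketch in Python of what such an addition might look like. It is only an illustration under some assumptions of mine: each Python integer stands in for one 256-bit-wide vector register holding a single bit position of every value (a "bit-plane"), and the names `to_planes`, `from_planes` and `bitsliced_add` are made up for this example. The adder itself uses nothing but AND, OR and XOR.

```python
def to_planes(values, width):
    """Transpose a list of integers into `width` bit-planes.

    Plane i is a single integer whose bit j holds bit i of values[j].
    """
    planes = []
    for i in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_planes(planes, lanes):
    """Inverse of to_planes: rebuild one integer per lane."""
    values = []
    for lane in range(lanes):
        v = 0
        for i, plane in enumerate(planes):
            v |= ((plane >> lane) & 1) << i
        values.append(v)
    return values


def bitsliced_add(a_planes, b_planes):
    """Add every lane at once, one bit position per step (AND/OR/XOR only)."""
    carry = 0  # one carry bit per lane, packed into a single integer
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # sum bit for all lanes
        carry = (a & b) | (carry & (a ^ b))  # carry-out for all lanes
    return out  # final carry is dropped, so results wrap modulo 2**width


LANES, WIDTH = 256, 16
xs = [(7 * i + 3) % 1000 for i in range(LANES)]
ys = [(13 * i + 5) % 1000 for i in range(LANES)]
sums = from_planes(bitsliced_add(to_planes(xs, WIDTH), to_planes(ys, WIDTH)), LANES)
```

The loop in `bitsliced_add` runs 16 times for 16-bit values, but each step handles all 256 lanes together, which is the trade being proposed: more steps per addition, far simpler hardware per lane. The transposition helpers are only there so the sketch can be checked against ordinary integers; a real design would presumably keep data in bit-plane form.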

A CPU with only vector logical operations could be very much simpler, allowing either processing of very large vectors or multiple independent processors on a single chip.

This also has the cute effect of making arithmetic on arbitrarily large numbers as easy as arithmetic on numbers that fit into registers, since the width of a number is just the number of steps taken.
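A small sketch of why the width stops mattering, again in Python and again with made-up names (`planes_of`, `serial_add`, `gather`): the adder below consumes bit-planes as a stream, least significant first, and the only state it keeps between steps is one carry bit per lane. Nothing in it knows or cares how many bits the numbers have.

```python
from itertools import zip_longest


def planes_of(values):
    """Yield bit-planes (least significant first) until every value is used up."""
    values = list(values)
    while any(values):
        plane = 0
        for lane, v in enumerate(values):
            plane |= (v & 1) << lane
        yield plane
        values = [v >> 1 for v in values]


def serial_add(a_planes, b_planes):
    """Stream sum planes; the only state is one carry bit per lane."""
    carry = 0
    for a, b in zip_longest(a_planes, b_planes, fillvalue=0):
        yield a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
    yield carry  # emit the last carry so no lane overflows


def gather(planes, lanes):
    """Collect streamed planes back into one integer per lane."""
    values = [0] * lanes
    for i, plane in enumerate(planes):
        for lane in range(lanes):
            values[lane] |= ((plane >> lane) & 1) << i
    return values


# Three lanes holding numbers far wider than any ordinary register.
xs = [3**200, 1, 2**64 - 1]
ys = [5**150, 2**300, 1]
totals = gather(serial_add(planes_of(xs), planes_of(ys)), len(xs))
```

Here the cost of an addition grows with the number of bits in the widest operand, but the logic per step stays exactly the same whether the numbers are 8 bits or several hundred.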