Vectors are enough

homeblogmastodonblueskythingiverse



Numpy goes to some lengths to support multi-dimensional arrays, "tensors". This isn't necessary, vectors are enough.

R has a different solution. A data frame (similar to a table in a database) with column vectors for the dimensions and for the value stores the same information, and a data frame can store many other things besides. Sparse tensors are easy, for example.

As Judea Pearl might say, statistics is not about causality. A tensor implies causality, the dimensions of the tensor cause the values it holds. R, being a statistical language, rightly does not require causality to be specified and gains flexibility. Or we might say that the data frame approach does not require the primary key columns to be made explicit. So for example a bijective map need not privilege one direction of mapping (as a Python dict would).

A data frame dense in some of its columns (eg from a full factorial experiment) can be stored more compactly as a tensor, but this is a detail of implementation that could be hidden (eg in a new flavour of tibble). Similarly, one might want various indexes to be maintained alongside a data frame.




[æ]