Text/sub-text encoding


Imagine a redundant code in which multiple code-words map to the same meaning. Possibly it is redundant in order to be robust on a noisy channel, possibly not. With such a code we could send some further information through the choice of precisely which code-word to use for each meaning. Thus the text would have a sub-text.
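
As a minimal sketch of the idea, assume a made-up codebook in which each of four meanings has two interchangeable code-words. One sub-text bit per symbol then selects which of the two is sent. The codebook and the names `CODEBOOK`, `encode` and `decode` are illustrative assumptions, not part of any real code.

```python
# Hypothetical codebook: every meaning has two equally valid code-words.
CODEBOOK = {
    "a": ("000", "001"),
    "b": ("010", "011"),
    "c": ("100", "101"),
    "d": ("110", "111"),
}

# Reverse table: code-word -> (meaning, which alternative was chosen).
DECODE = {
    word: (meaning, bit)
    for meaning, words in CODEBOOK.items()
    for bit, word in enumerate(words)
}


def encode(text, subtext_bits):
    """Encode text, letting each sub-text bit pick the code-word variant."""
    if len(text) != len(subtext_bits):
        raise ValueError("this sketch needs one sub-text bit per text symbol")
    return [CODEBOOK[m][b] for m, b in zip(text, subtext_bits)]


def decode(words):
    """Recover both the text and the sub-text from the code-words."""
    pairs = [DECODE[w] for w in words]
    text = "".join(m for m, _ in pairs)
    subtext_bits = [b for _, b in pairs]
    return text, subtext_bits


words = encode("abad", [1, 0, 1, 1])
text, subtext_bits = decode(words)
```

A receiver who only cares about the text can ignore the second element of each pair; the text decodes the same way whichever variant was sent.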

The sub-text might encode a different type of data from the text. For example, a video stream might encode large features as text and fine detail as sub-text.

A noisy channel might only allow the text to be recovered, whereas a less noisy channel would allow both text and sub-text to be recovered. Carrying a sub-text reduces the robustness of the text encoding, but does not eliminate it entirely, even if the sub-text is transmitted at the maximum possible rate.
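
To make the trade-off concrete, here is a toy sketch under assumptions of my own choosing: each text bit is repeated five times, and the sub-text is carried by deliberately flipping at most one of the five copies. This is only one possible scheme and is not claimed to reach the maximum sub-text rate. The names `N`, `encode_bit` and `decode_word` are hypothetical.

```python
N = 5  # copies of each text bit (an arbitrary choice for illustration)


def encode_bit(text_bit, sub):
    """Repeat text_bit N times; sub in 0..N picks which copy to flip (0 = none)."""
    word = [text_bit] * N
    if sub > 0:
        word[sub - 1] ^= 1
    return word


def decode_word(word):
    """Majority vote gives the text bit; a lone minority copy gives the sub-text."""
    text_bit = 1 if sum(word) > N // 2 else 0
    minority = [i for i, b in enumerate(word) if b != text_bit]
    if not minority:
        sub = 0
    elif len(minority) == 1:
        sub = minority[0] + 1
    else:
        sub = None  # more than one disagreement: sub-text is not recoverable
    return text_bit, sub


clean = encode_bit(1, 3)             # [1, 1, 0, 1, 1]
one_error = [0, 1, 0, 1, 1]          # channel flipped the first copy
two_errors = [0, 0, 0, 1, 1]         # channel flipped the first two copies
results = [decode_word(w) for w in (clean, one_error, two_errors)]
```

With no noise both text and sub-text come back. With one channel error the text still decodes correctly but the sub-text is lost. With two channel errors the text is wrong, whereas a plain five-fold repetition without sub-text would have survived two errors. So the text is less robust, but not defenceless.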

One example of a redundant code is to encode a model for each datum. Having a model allows the datum to be encoded more concisely, but there are multiple possible models, introducing redundancy. Such codes are easier to de-code than non-redundant codes, since only the sender need perform model estimation. Minimum Message Length (MML) is an example of this.
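
A rough sketch of such a two-part code, assuming a toy setting of coin flips with nine candidate biases and a uniform prior over them. The model set, the data and the function names are assumptions for illustration only; a real MML code would choose the model precision more carefully.

```python
import math

# Hypothetical model set: nine candidate coin biases, equally likely a priori.
MODELS = [k / 10 for k in range(1, 10)]


def message_length(data, theta):
    """Two-part length in bits: name the model, then the data given the model."""
    model_bits = math.log2(len(MODELS))
    data_bits = sum(-math.log2(theta if x else 1 - theta) for x in data)
    return model_bits + data_bits


def best_model(data):
    """Sender-side estimation: pick the model giving the shortest message."""
    return min(MODELS, key=lambda theta: message_length(data, theta))


data = [1, 1, 0, 1, 1] * 8  # 32 ones and 8 zeros
lengths = {theta: message_length(data, theta) for theta in MODELS}
chosen = best_model(data)
```

Every entry in `lengths` corresponds to a valid message for the same data, which is the redundancy. The receiver never estimates anything: it reads the model from the first part and uses it to de-code the second.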

If an MML code is transmitted with a sub-text, the information from the choice of model can be recovered and perhaps shouldn't be counted in the message length. This kind of consideration already occurs in Snob, although hackishly: part of the text is converted into sub-text.
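
One way to write the accounting down, as a sketch in notation of my own: let H be the chosen model, D the data, and S the number of sub-text bits carried by the choice of H. The last line assumes an idealised case in which the sub-text selects H with probability P(H | D), so that the choice carries -log2 P(H | D) bits. That assumption is mine and is not something the MML literature is being quoted on here.

```latex
\begin{align*}
I(H, D) &= -\log_2 P(H) - \log_2 P(D \mid H) \\
I_{\text{net}} &= I(H, D) - S \\
S = -\log_2 P(H \mid D) \;\Rightarrow\; I_{\text{net}}
  &= -\log_2 \frac{P(H)\,P(D \mid H)}{P(H \mid D)} = -\log_2 P(D)
\end{align*}
```

Under that assumption the cost of stating the model is handed back in full as sub-text, and the net length depends on the data alone.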

I propose that



