I suspect this has a name, but the derivation is also very simple.

We are seeking optimal weights in a weighted linear regression. The choice of weights is parameterized by some \(\theta\).

\[ y = X \beta + \varepsilon \]

\[ \varepsilon_i \sim \mathcal{N}\left(0, {1 \over \tau w_i(\theta)}\right) \]

The residuals \(\varepsilon_i\) can be estimated initially by Ordinary Least Squares, and the procedure below then used to estimate \(\theta\). If desired, this process can be iterated further using Weighted Least Squares.
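As a concrete starting point, the initial OLS residuals might be obtained like this (a minimal sketch in Python with NumPy; the design matrix and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X beta + noise. The weight model w_i(theta) would come later;
# here we only need initial residuals from an unweighted fit.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary Least Squares fit, then residuals epsilon_i = y_i - x_i' beta_hat
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
```

These `residuals` play the role of the \(\varepsilon_i\) in the estimation of \(\theta\) below.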

We estimate \(\theta\) and \(\tau\) by maximizing the log likelihood.

\[ \begin{aligned} l(\theta,\tau) & = \ln \prod_{i=1}^n \sqrt{ \tau w_i(\theta) \over 2 \pi } e^{-\tau w_i(\theta) \varepsilon_i^2 / 2} \\ & = -{n\over 2}\ln{2\pi} + {n \over 2}\ln\tau + {1\over 2}\sum_i{\ln w_i(\theta)} -{\tau \over 2} \sum_i w_i(\theta) \varepsilon_i^2 \end{aligned} \]
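The expanded log likelihood is easy to sanity-check against per-observation normal densities, since each \(\varepsilon_i\) has standard deviation \(1/\sqrt{\tau w_i(\theta)}\). A small sketch with made-up weights and residuals:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 50
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical weights w_i(theta) for some fixed theta
eps = rng.normal(size=n)            # hypothetical residuals
tau = 1.3

# Log likelihood, written out as in the expanded expression above
l = (-n / 2 * np.log(2 * np.pi) + n / 2 * np.log(tau)
     + 0.5 * np.sum(np.log(w)) - tau / 2 * np.sum(w * eps**2))

# The same quantity via per-observation normal log densities, sd = 1/sqrt(tau * w_i)
l_check = np.sum(norm.logpdf(eps, scale=1 / np.sqrt(tau * w)))
```

The two computations agree, confirming the algebra.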

Maximum likelihood \(\tau\) for a given \(\theta\) can be obtained and substituted in (“profile likelihood”).

\[ \begin{aligned} 0 &= { \partial l(\theta,\tau) \over \partial \tau } \\ &= {n\over 2\tau} - {1\over 2}\sum_i w_i(\theta)\varepsilon_i^2 \\ \tau &= {n \over \sum_i w_i(\theta) \varepsilon_i^2} \end{aligned} \]
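This closed form for \(\tau\) can be checked numerically: the derivative of the log likelihood with respect to \(\tau\) should vanish at the profile value. A sketch with made-up weights and residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical weights w_i(theta) at some fixed theta
eps = rng.normal(size=n)            # hypothetical residuals

# Profile-likelihood estimate of tau
tau_hat = n / np.sum(w * eps**2)

# Derivative d l / d tau = n/(2 tau) - (1/2) sum_i w_i eps_i^2, evaluated at tau_hat
deriv = n / (2 * tau_hat) - 0.5 * np.sum(w * eps**2)
```

The derivative is zero (up to floating-point error) at `tau_hat`, as expected.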

So we need to maximize:

\[ \begin{aligned} l(\theta) & = -{n\over 2}\ln{2\pi} + {n \over 2}\ln {n \over \sum_i w_i(\theta) \varepsilon_i^2} + {1\over 2}\sum_i{\ln w_i(\theta)} - {n \over 2 \sum_i w_i(\theta) \varepsilon_i^2} \sum_i w_i(\theta) \varepsilon_i^2 \\ & = -{n\over 2}\ln{2\pi} + {n \over 2}\ln {n} - {n \over 2} \ln \left( \sum_i w_i(\theta) \varepsilon_i^2 \right) + {1\over 2}\sum_i{\ln w_i(\theta)} - {n \over 2} \end{aligned} \]

which can be achieved by maximizing the simpler expression:

\[ \sum_i \ln w_i(\theta) - n \ln \left( \sum_i w_i(\theta) \varepsilon_i^2 \right) \]
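Maximizing this expression over \(\theta\) is then a small one-dimensional optimization. A sketch, using a hypothetical weight model \(w_i(\theta) = 1/(v_i+\theta)\), where the \(v_i\) are known per-observation technical variances and \(\theta\) is an added biological variance (this particular form is an assumption for illustration, not part of the derivation above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n = 500
v = rng.uniform(0.1, 1.0, size=n)   # hypothetical known technical variances
theta_true = 0.5
eps = rng.normal(scale=np.sqrt(v + theta_true))  # residuals with variance v_i + theta

def w(theta):
    # Hypothetical weight model: inverse of total variance
    return 1.0 / (v + theta)

def objective(theta):
    # Negative of the simplified expression, so we can minimize
    wi = w(theta)
    return -(np.sum(np.log(wi)) - n * np.log(np.sum(wi * eps**2)))

res = minimize_scalar(objective, bounds=(1e-6, 10.0), method="bounded")
theta_hat = res.x
```

Since the simplified expression equals \(2(l(\theta) - \text{const})\), its maximizer is exactly the maximizer of the profile log likelihood.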

Jointly maximizing the likelihood over \(\tau\) and the regression coefficients overestimates \(\tau\) (i.e. underestimates \(\sigma^2\)), but this is not a problem when the goal is only to choose weights to plug into existing linear regression software.

My actual application here is to obtain weights for use with `limma`, where I have some outside estimate of the amount of technical variation but also want to allow for biological variation. \(\theta\) is chosen to optimally balance these two sources of noise.

The model for `limma` is a little more complicated, with different genes allowed to have different \(\tau\), but assuming a fixed \(\tau\) in the above should be good enough for choosing weights. My main application is differential poly(A) tail length, but this might also be applied to RNA-Seq differential gene expression or differential transcript expression.

With some data I’ve tested, the above seems to produce the same results as the use of a GLM with gamma distributed noise on the squared residuals. ~~I’m unsure if there is any deep meaning to this – in general, maximum likelihood estimates are altered when the data is transformed, and I’m not sure why this isn’t the case here.~~ Double GLMs are a thing that exist.

((Huh. One to one transformations of data don’t affect the maximum likelihood. This is one to one apart from sign, and losing the sign also doesn’t matter. So yes, this is a double GLM, although calling it this is overkill. There seem to be a variety of adjustments to profile likelihood to remove the variance under-estimation problem. I do still think the above is a cute formula for just optimizing the weights that approximately sidesteps the need for these.))