21 August 2019, 23:43 UTC: Slides from a talk putting Topconfects in context

This slideshow places my Topconfects method in the wider context of the current debate over the use of p-values.

7 August 2019, 22:25 UTC: E(f(rarefied count)) for consistently biassed transformation

This is a small improvement on the log transformation we use in RNA-Seq and scRNA-Seq.

- The idea and discussion on Twitter.

16 February 2019, 21:03 UTC: Lorne Genome 2019 poster - weighted principal components and canonical correlation with single cell data

Poster for the Lorne Genome conference 2019, looking at single cell data where each gene can produce two measurements: RNA expression level and choice of polyadenylation site. We're not yet sure what the correct tools for analysing this data are; this poster plays with weighted principal components and canonical correlation. I'm interested in expanding my use of multivariate techniques: there are whole histories of unfamiliar techniques, such as those from ecology and the Exploratory Factor Analysis methods used in psychology and marketing. Apparently multivariate techniques are particularly popular in France.

27 December 2018, 6:32 UTC: Recommender systems and the viral bubble

People worry about being trapped in a filter bubble, but I have a different concern. Amongst content with a viral coefficient close to one, the amount of attention equivalent content receives is highly variable. That is, we are all sucked into the same viral bubble, collectively seeing some things and missing others of equal merit. Furthermore we tend to see viral content over content of more specific interest to us.

Recommender systems -- now commonly called "algorithms" -- have the potential to enhance or reduce this effect. Recommender systems as applied to social network content are a little creepy, but also necessary as people accumulate large numbers of people to follow over time. It is important to see the announcement that a distant acquaintance has cancer, but not the latest cat picture they found funny. Given this necessity, perhaps the best we can aim for is for people to have control over their algorithm, rather than being forced to take what Facebook or Twitter (etc.) provide.

Recommender systems come in two forms:

**Explicit** recommender systems learn from a collection of ratings, and learn to predict the rating of any given content.

**Implicit** recommender systems learn only from observing what content a person consumes, and learn to predict what of all possible content they may consume in future.

I count systems such as Facebook that have only a "like" and no "dislike" rating among implicit systems, even though they take direct user input. However it may be that Facebook tracks exactly what it has shown a user, which would bring it closer to an explicit recommender system.

The problem with implicit recommender systems is that they are necessarily biassed by exposure: you can only like or consume something you see. Explicit recommender systems do not *necessarily* have this problem.

Some regularization is probably needed in a practical explicit recommender system to avoid being swamped by new content with few ratings. Compare "hot" and "new" on Reddit. Without regularization, a newish Reddit post with a single vote (other than by the author) will have an unbiassed estimate of the upvote proportion of either 0% or 100%. Regularization introduces bias, but this bias can at least be dialled up or down.
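One simple way to implement such a dial, sketched here with a Beta pseudo-count prior (the function and its parameter values are illustrative, not Reddit's actual scoring):

```python
def shrunk_score(upvotes, downvotes, prior_mean=0.5, prior_strength=10):
    """Posterior mean upvote proportion under a Beta prior.

    prior_strength is the bias dial: 0 gives the raw (unbiased but
    noisy) proportion, larger values pull posts with few votes
    harder toward prior_mean.
    """
    a = prior_mean * prior_strength
    b = (1 - prior_mean) * prior_strength
    return (upvotes + a) / (upvotes + downvotes + a + b)

# A new post with a single upvote no longer scores 100%:
print(shrunk_score(1, 0))        # pulled most of the way back toward 0.5
print(shrunk_score(900, 100))    # with many votes the data dominates: near 0.9
```

Setting `prior_mean` to the site-wide average upvote proportion rather than 0.5 would be the natural refinement.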

One useful observation is that an explicit recommender system can use implicit data from other people and still have low bias. The dependent variable is a specific user's ratings. We need "Missing At Random" (MAR) data for this, which means data in which the missingness is not dependent on the rating *given the independent variables*. Any information that helps predict missingness can be used as an independent variable to reduce bias and increase accuracy.
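A toy simulation of the MAR point (hypothetical numbers, numpy only): exposure depends on an observed covariate, here item popularity, and a regression that includes that covariate recovers the user's true preferences with low bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Independent variable: each item's overall popularity (implicit data
# pooled from other people). The user's true rating depends on it.
popularity = rng.normal(size=n)
rating = 2.0 * popularity + rng.normal(size=n)

# Exposure bias: popular items are far more likely to be seen, so the
# ratings are Missing At Random given popularity.
seen = rng.random(n) < 1 / (1 + np.exp(-2 * popularity))

# The naive mean of observed ratings is biased upward by exposure...
print(rating[seen].mean())

# ...but a regression conditioning on popularity recovers the true
# coefficient (close to 2) with low bias.
X = np.column_stack([np.ones(seen.sum()), popularity[seen]])
coef, *_ = np.linalg.lstsq(X, rating[seen], rcond=None)
print(coef[1])
```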

Having the choice on social networks to use an explicit recommender system algorithm with a bias dial is an important freedom we currently lack.

**Notes**

- The terms "bias", "regularization", and "Missing At Random" here have technical meanings.

- njh points out these systems are often thought of in terms of multi-armed bandits. A multi-armed bandit has a record of exactly what it has shown the user (what levers it has pulled), so it is an explicit system with the potential to manage bias. The bandit concept of exploration/exploitation trade-off may be a better way of thinking about what I've called regularization.

13 October 2018, 0:27 UTC: Ball hypothesis tests

Short note on ball hypothesis tests as a generalization of interval hypothesis tests.

5 October 2018, 11:17 UTC: Weighted least squares

Short note on choosing weights for weighted least squares, mostly for my own future reference.

9 March 2018, 10:59 UTC: Determining the sign of an effect size is quite similar from Frequentist and Bayesian perspectives

p-values and confidence intervals on an effect size have this correspondence: if p<0.05, the 95% confidence interval does not contain zero (or whatever cutoff α and corresponding 100(1-α)% confidence interval you prefer). This means the interval is either entirely above zero or entirely below zero, which is to say we have determined the sign of the effect size (see previous blog entry).

*Clarification:* The precise guarantee here is "whatever the effect size may be, we will only make a false claim about its sign with probability at most 0.05." We may make no claim at all, and this is counted as not making a false claim.

Formally, the p-value is a means of rejecting the hypothesis that the effect size is zero, but it seems it is often more than this. Significant p-values, at least those with an associated confidence interval, allow us to reject fully half of the number line of effect sizes.

Where Frequentists like to talk of p-values, Bayesians like to talk of posterior probabilities. It had always seemed to me that this failed at the first hurdle: trying to replicate the t-test. If we take as H0 that the effect size is zero, and as H1 that the effect size is non-zero and hence drawn from some prior distribution, P(H0|y) and P(H1|y) will be dependent on the prior distribution associated with H1, with an overly wide distribution leading to smaller P(H1|y). This seems hopelessly subjective. Furthermore it requires the machinery of measure theory to even represent these peculiar prior beliefs, with a point mass of probability at zero within a continuous distribution.

But now consider an H1 of an effect size less than zero, and an H2 of an effect size greater than zero. A perfectly natural prior belief is that the distribution of the effect size is symmetric around zero. We no longer need a point mass. This still corresponds to the Frequentist test in that we are attempting to determine the sign of the effect size.

For the t-test, there is a choice of prior* such that the p-value is simply twice the Bayesian posterior probability of the less likely hypothesis.

* Improper, but choose a proper prior to get as close as you like.
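For the simpler case of a z-test (known variance) the correspondence can be checked directly with an improper flat prior, under which the posterior for the effect is Normal(x̄, σ/√n). The `compare` helper below is illustrative, not part of any package:

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def compare(xbar, sigma, n):
    z = xbar / (sigma / sqrt(n))
    p_value = 2 * Phi(-abs(z))          # two-sided z-test p-value
    # Posterior tail mass on each side of zero under the flat prior:
    post_negative = Phi(-z)
    post_positive = 1 - post_negative
    less_likely = min(post_negative, post_positive)
    return p_value, 2 * less_likely

p, twice_tail = compare(xbar=0.3, sigma=1.0, n=50)
print(p, twice_tail)   # identical, up to floating point
```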

**Update:** @higherfiveprime notes that Andrew Gelman (of course) and Francis Tuerlinckx have a paper somewhat related to this. Errors determining the sign *conditional on having confidently determined the sign* are referred to as "Type S" errors, and their point is that these are not controlled by the Frequentist procedure. Frequentist "Type I" errors, which are not conditional on a determination of the sign being made, are still controlled.

For Frequentist Type S error control, it appears you need to perform a False Discovery Rate (FDR) correction (eg Benjamini & Hochberg's method). So now we also have a nice Bayesian equivalent of FDR control!
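The Benjamini & Hochberg step-up procedure is short enough to sketch; this minimal illustration returns the indices of the rejected hypotheses:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m)*q, then reject the
    # k smallest p-values (even if some smaller ranks failed the test).
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.74]))
# -> [0, 1]
```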

**See also:**

8 November 2017, 5:34 UTC: Topconfects talk

I gave an informal talk today about my Topconfects R package. If you do RNA-seq Differential Expression analysis it may be of interest.

28 October 2017, 3:55 UTC: Scatter plots with density quartiles

I think this is a better way to show density in scatter plots.

24 June 2017, 6:44 UTC: A Bayesian walks into a bar and observes that a statistical hypothesis has been rejected

A classical statistical test controls the probability P(R|H0) that H0 will be falsely rejected. Conventionally this is controlled to be α=0.05 or α=0.01.

( As a small complication, H0 may be a set of hypotheses. α is the probability of rejection of the hypothesis in the set that is most likely to be rejected. A Bayesian may believe some of the hypotheses are more likely than others, but whatever they believe they will still have that P(R|H0) ≤ α. )

A Bayesian, in order to update their belief in the odds of H1 as opposed to H0, P(H1)/P(H0), also needs to know the statistical power, P(R|H1), the probability that H0 will be rejected if it is actually false. The Bayes Factor is then at least P(R|H1)/P(R|H0), and the odds of the competing hypotheses can be conservatively updated as:

P(H1|R)/P(H0|R) = P(R|H1)/P(R|H0) * P(H1)/P(H0)

So we have the curious conclusion that a Bayesian will pay more heed to a statistical test they believe to have high statistical power. If the Bayesian believes that H0 had little chance of being correctly rejected if false, they will be surprised by the rejection but it will not update their beliefs much.
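A toy calculation with hypothetical numbers makes the point, plugging a high and a low power into the update above:

```python
def posterior_odds(prior_odds, power, alpha):
    """Update the odds of H1 vs H0 after observing that H0 was rejected.

    The Bayes factor is (at least) power / alpha."""
    return (power / alpha) * prior_odds

# High-powered test: rejection is strong evidence.
print(posterior_odds(prior_odds=1.0, power=0.8, alpha=0.05))   # about 16
# Underpowered test: the same rejection barely moves the odds.
print(posterior_odds(prior_odds=1.0, power=0.1, alpha=0.05))   # about 2
```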

( The classical test sought only to reject H0, not to confirm H1. If the test rejects H0, perhaps the Bayesian should consider that their H1 was not a sufficiently broad alternative hypothesis. )

Confidence intervals are increasingly accepted as a superior alternative to p-values. A Bayesian argument for this can be given: A confidence interval gives an indication of the accuracy to which an effect has been measured, and hence the statistical power. A Bayesian may use a confidence interval to update their beliefs immediately, whereas they would require further information if only provided with a p-value. ( Leaving aside the technical distinction between confidence intervals and credible intervals, which does not exist for simple tests such as the t-test. )

( The probabilities above are Bayesian -- personal and subjective quantities used to make decisions. If we were talking frequency probabilities, the above would earn a "No!" )

29 May 2017, 23:28 UTC: Melbourne Datathon 2017 - my Kaggle entry

1 April 2017, 22:58 UTC: Diagrams of classical statistical procedures

16 March 2017, 23:29 UTC: Finding your ut-re-mi-fa-sol-la on a monochord, and making simple drinking straw reedpipes

7 December 2016, 0:50 UTC: Briefly,

18 October 2016, 5:53 UTC: Shiny interactivity with grid graphics

7 August 2016, 2:17 UTC: Crash course in R a la 2016 with a biological flavour

17 July 2016, 1:45 UTC: Vectors are enough

28 May 2016, 22:04 UTC: Sci-Hub over JSTOR

4 November 2015, 23:41 UTC: Composable Shiny apps

14 September 2015, 4:39 UTC: Recorder technique and divisions in the 16th century

30 July 2015, 0:04 UTC: Linear models, a practical introduction in R

7 May 2015, 5:47 UTC: When I was a young lad in the '90s

3 November 2014, 11:41 UTC: Virtualenv Python+R

24 October 2014, 22:08 UTC: Sexism: spreading from computer science to biology

21 August 2014, 6:03 UTC: First-past-the-post voting outcomes tend to surprise the candidates

21 August 2014, 2:59 UTC: Dates in Google Search aren't trustworthy

27 June 2014, 2:22 UTC: Reading "Practical Foundations of Mathematics"

18 May 2014, 10:34 UTC: Cellular automaton tiles revisited

7 April 2014, 8:27 UTC: Selfish sweep

3 April 2014, 2:40 UTC: Bagpipes kickstarter