People worry about being trapped in a filter bubble, but I have a different concern. Amongst content with a viral coefficient close to one, the amount of attention equivalent content receives is highly variable. That is, we are all sucked into the same viral bubble, collectively seeing some things and missing others of equal merit. Furthermore we tend to see viral content over content of more specific interest to us.
Recommender systems -- now commonly called "algorithms" -- have the potential to enhance or reduce this effect. Recommender systems as applied to social network content are a little creepy, but also necessary as people build up large numbers of people to follow over time. It is important to see the announcement that a distant acquaintance has cancer, but not the latest cat picture they found funny. With this necessity, perhaps the best we can aim for is that people to have control over their algorithm, rather than being forced to take what Facebook or Twitter (etc) provide.
Recommender systems come in two forms:
- Explicit recommender systems learn from a collection of ratings, and learn to predict the rating of any given content.
- Implicit recommender systems learn only from observing what content a person consumes, and learn to predict what of all possible content they may consume in future.
I include systems which only have a "like" but no "dislike" rating such as Facebook among implicit systems, even though they take direct user input. However it might be that Facebook tracks exactly what it has shown a user, which would bring it closer to an explicit recommender system.
The problem with implicit recommender systems is that they are necessarily biassed by exposure: you can only like or consume something you see. Explicit recommender systems do not necessarily have this problem.
Some regularization is probably needed in a practical explicit recommender system to avoid being swamped by new content with few ratings. Compare "hot" and "new" on Reddit. Without regularization, a newish post on reddit with a single vote (other than by the author) will have an unbiassed estimate of the upvote proportion that is either 0% or 100%. Regularization introduces bias, but this can at least be dialled up or down.
One useful observation is that an explict recommender system can use implicit data from other people and still have low bias. The dependent variable is a specific user's ratings. We need "Missing At Random" (MAR) data for this, which means data in which the missingness is not dependent on the rating given the independent variables. Any information that helps predict missingness can be used as an independent variable to reduce bias and increase accuracy.
Having the choice on social networks to use an explicit recommender system algorithm with a bias dial is an important freedom we currently lack.
-- The terms "bias", "regularization", and "Missing At Random" here have technical meanings.
-- njh points out these systems are often thought of in terms of multi-armed bandits. A multi-armed bandit has a record of exactly what it has shown the user (what levers it has pulled), so it is an explicit system with the potential to manage bias. The bandit concept of exploration/exploitation trade-off may be a better way of thinking about what I've called regularization.