1. Modifying its own beliefs.
2. Modifying its own utility function.
One might suppose that an agent with such abilities would simply decide to believe "whatever happens, I win". However, this is not something that should evolve. A rational agent should only decide to go insane if it believes that doing so is in its best interests from the point of view of its current beliefs and utility function.
Still, on the face of it, both of these are incredibly stupid things to do. If you believe something that is not true, you might act on the basis of that belief and do something that harms your interests. Modifying your utility function is even worse: you are unlikely to maximize your initial "true" utility function by maximizing some different one.
However, suppose that this agent is interacting with other agents that have some ability to read its mind. I'm not supposing telepathy, just the ability to infer internal thought processes from the agent's actions. To make this easier, they might first take the agent down to the pub and get it drunk. Humans do seem to have some ability to do this.
This creates the possibility, of which the agent will be aware, that others will respond not only to its actions but also to its beliefs and goals.
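To make this concrete, here is a minimal sketch (the game, payoffs, and threshold are my own invented illustration, not anything essential to the argument): an ultimatum-style game in which the proposer can perfectly read the responder's acceptance rule and best-responds to it. A responder that has rewired its utility function to reject stingy offers, even at a cost to itself, ends up better off than a purely selfish one.

```python
# Toy ultimatum game: a proposer splits a pie of 10 units with a responder,
# who either accepts (both get their shares) or rejects (both get nothing).
# The proposer can "read the responder's mind": it knows the responder's
# acceptance rule and picks the split that is best for itself given that rule.

PIE = 10

def best_offer(accepts):
    """Return (proposer_payoff, offer) for the proposer's best choice,
    given that it can predict which offers the responder will accept."""
    best = (0, 0)
    for offer in range(PIE + 1):
        proposer_payoff = (PIE - offer) if accepts(offer) else 0
        if proposer_payoff > best[0]:
            best = (proposer_payoff, offer)
    return best

# A "sane" selfish responder: anything is better than nothing.
selfish = lambda offer: offer > 0

# A responder that has modified its utility function so that it genuinely
# prefers to punish offers below 4, even though rejecting costs it the offer.
committed = lambda offer: offer >= 4

for name, rule in [("selfish responder", selfish), ("committed responder", committed)]:
    proposer_payoff, offer = best_offer(rule)
    print(f"{name}: proposer keeps {proposer_payoff}, responder gets {offer}")

# selfish responder: proposer keeps 9, responder gets 1
# committed responder: proposer keeps 6, responder gets 4
```

The responder that "went insane" does better, but only because the proposer can see inside its head; the modified utility function acts as a credible commitment.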
One nice thing about the ability to go insane is that it provides the start of an explanation of where utility functions come from. When making an AI, one has to specify a utility function, and it is hard to consider an AI with a fixed utility function as being truly intelligent or conscious, even if that utility function has been artificially evolved. However, an AI that can change its goals might be in with a chance.
In a society of such agents, is there a fixed point for the utility function? Will any society of agents with this ability tend to some specific utility function? This would seem to be our society's inescapable fate, so let's hope it is a nice one. I am cautiously optimistic.
By the way, there are a number of basic misconceptions that need to be addressed before having a sensible discussion about this topic:
- A group of selfish rational agents does not itself act in a selfish rational way. This can be shown by inventing some (non-zero-sum) games and determining the actions rational players will take -- their actions will usually not maximize overall utility (see the sketch after this list). This is a straightforward consequence of game theory, and should not be surprising. What is surprising, and should be a cause of hope, is that humans sometimes do not behave as game theory predicts.
- The genetic corollary: A population of identical genes that have undergone natural selection will not act in the common interest of that population except by accident. Instead, such a population is best thought of as the environment in which further evolution may occur. A mutant gene may arise that exploits the existing population, making many copies of itself in a way that happens to be to the detriment of the original population of genes.
- As group rationality and individual rationality are different things, our modern gas-guzzling, McDonald's-eating, flamboyantly wasteful consumer society may represent a return to (individual) rationality, rather than a departure from it.
- Group selection is nonsense: there is only kin selection, and kin selection is quite a weak effect in humans. See The Selfish Gene by Dawkins. Kin selection is a strong effect in social insects because of the way they breed (all genes in the hive are only ever passed on via the queen and a small number of fertile males), but this should not be taken as evidence of true altruism in nature.
- There is no Gaia. Nature exists in a state of equilibrium, but not of overall optimality. Where apparent altruism exists in nature, it is usually either a kin selection effect or reciprocal altruism. Humans have done a lot of damage, but only by being successful. Any similarly successful species would do a similar amount of damage; there is no natural mechanism to prevent this. Humanity is no more "sinful" than any other species would be given the opportunity.
- Genes that cause behaviors are not memes. Memes replicate by means other than sexual reproduction. When I talk about evolved abilities and goals, I mean plain old Darwinian evolution of DNA.
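Here is the sketch promised in the first bullet point above. It is just the standard prisoner's dilemma with the usual textbook payoffs (my choice of numbers, purely for illustration): each selfish rational player's best response is to defect whatever the other does, so the equilibrium yields a lower total payoff than mutual cooperation would.

```python
# Prisoner's dilemma: each entry is (row player's payoff, column player's
# payoff) for strategies C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
STRATEGIES = ("C", "D")

def best_response(opponent_move, player):
    """The move a selfish rational player takes against a fixed opponent move."""
    def payoff(my_move):
        pair = (my_move, opponent_move) if player == 0 else (opponent_move, my_move)
        return PAYOFFS[pair][player]
    return max(STRATEGIES, key=payoff)

# Pure-strategy Nash equilibria: each move is a best response to the other.
equilibria = [(a, b) for a in STRATEGIES for b in STRATEGIES
              if a == best_response(b, 0) and b == best_response(a, 1)]

best_total = max(PAYOFFS, key=lambda pair: sum(PAYOFFS[pair]))

print("Nash equilibria:", equilibria)                                      # [('D', 'D')]
print("Total utility at equilibrium:", sum(PAYOFFS[equilibria[0]]))        # 2
print("Best total utility:", sum(PAYOFFS[best_total]), "at", best_total)   # 6 at ('C', 'C')
```

Both players defect, even though mutual cooperation would give everyone more; a group of rational players does not act like one big rational player.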
Finally, there is sometimes a notion that once a human level of intelligence is achieved, we can somehow transcend these mechanistic game-theoretic models of behavior. Peter Singer is one person I've seen play with this opinion. I'm not sure whether Dawkins goes that far as well. Christians are pretty keen on it. Free will.
A non-mechanistic theory is (I think) necessarily not a comprehensible one, nor one that can be conveyed by language. I can't prove that there is not something like this at work; I'm not even sure how to go about asking the right question about it. What I am hoping to show is that it is not a necessary thing to suppose. We can form an argument that some kind of motivation higher than pure selfishness will bootstrap itself into existence without the help of a by-definition-incomprehensible influence.