The Million Dollar Challenge (MDC) requires, as I understand it, only a few things from an applicant. A statement of what you can do, a statement of under what conditions you can do it, and a statement of how well you can do it. From there, a decision can be made as to whether you are actually claiming something suitably outside the realms of the accepted to be eligible for the prize, and between the applicant and the JREF a scheme for a controlled test can be devised (I may be slightly out in minor details there, but I think that covers the important points for this blog post).
What interests me here are the statement of how well you can do something, and the decision on how well you actually have to perform in the test to win the prize, and what impact the actual winning of a prize should have on our skeptical worldviews.
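The starting point is the definition of conditional probability, written once for each ordering of two events A and B:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

The first of these says simply that the probability of one thing (A) happening given that another thing (B) has happened, or will happen (which is the P(A|B) term), is the probability that both happen (the P(A and B) term) divided by the probability that the other thing B will happen regardless (P(B)). If we rearrange the second and insert it into the first to get rid of the P(A and B), we get Bayes' Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$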
This is a tremendously useful result, and you can easily find tutorials such as this one that explain how it works in settings of conditional probability such as medical tests for a disease (I recommend taking the time out to read that, at this point). In that case, you might have a medical test which returns some result, and you want to know the probability that you have the disease given that your test came back positive. You can use Bayes' Theorem to work that out from the overall probability that you have the disease, the probability that the test comes back positive whether or not you have the disease, and the probability that the test comes back positive given that you do actually have the disease. That's a bit of a hefty bunch of probabilities and 'given's flying around, so, as I said above, I recommend again the previously linked tutorial.
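As a rough illustration of the medical test case, here is a small Python sketch. The function name and the numbers (1% of people have the disease, the test catches 90% of real cases, and 5% of healthy people test positive anyway) are invented for the example and are not taken from the tutorial. The overall probability of a positive test is built up from the "has the disease" and "doesn't have the disease" cases.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' Theorem.

    prior               -- P(disease), the overall rate in the population
    sensitivity         -- P(positive | disease)
    false_positive_rate -- P(positive | no disease)
    """
    # P(positive) regardless of whether you have the disease:
    # add up the two ways a positive result can arise.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Hypothetical numbers, chosen only for illustration.
p = posterior_disease(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")
```

With these made-up numbers the answer comes out at roughly 15%, far lower than the 90% many people would guess, because healthy people vastly outnumber sick ones.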
This sort of thing is often linked to when someone comes along asking what all the fuss is about Bayes' Theorem. That's fine, it explains simple cases when Bayes' Theorem can be applied, but it doesn't at all explain why something so simple should cause a fuss.
To explain that, let's move on to a discussion about hypothesis testing. You may have studied hypothesis testing at school, if you happened to do some relatively advanced courses in mathematics. The standard procedure runs broadly along these lines - you set up two hypotheses. Let's say they are like this:
The Null Hypothesis: A dowser (who we will call Fred) cannot determine which of two boxes a bowl of water is in better than chance.
The Alternative Hypothesis: The dowser can determine which of two boxes a bowl of water is in.
You then go and collect some evidence by performing suitably controlled trials on Fred. You find he succeeded in x trials out of a total of N, and you calculate this quantity:
The probability that the dowser could succeed x or more times out of N purely by chance.
If this quantity is too small (commonly 5% or 1% or some similarly small number) you reject the Null Hypothesis and accept the Alternative Hypothesis.
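For the two-box test this quantity is just a binomial tail sum, assuming each trial is independent and a guesser is right half the time. A minimal sketch, where the function name and the example result of 15 successes in 20 trials are hypothetical:

```python
from math import comb


def tail_probability(x, n, p=0.5):
    """P(at least x successes in n independent trials), each with success chance p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


# Hypothetical result: Fred picks the right box in 15 of 20 trials.
p_value = tail_probability(15, 20)
print(f"P(15 or more out of 20 by chance) = {p_value:.4f}")  # about 0.0207
```

Under the conventional procedure that result would lead us to reject the Null Hypothesis at the 5% level but not at the 1% level.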
The problem with this is slightly subtle, but crucial. It's asking the wrong question. It's asking "If the dowser has no ability, what is the probability that he would do at least as well as he did?"
If you think about that for a moment, you'll see that's really actually a bit of a dull question and has rather an uninteresting answer. What we actually want to know is "Does he have a supernatural ability to find bowls of water that are obscured from his conventional senses?"
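In other words, we want to know

$$P(\text{no dowsing ability} \mid \text{success})$$

but we've calculated from our hypothesis testing

$$P(\text{success} \mid \text{no dowsing ability})$$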
(note that I'm phrasing it as 'no dowsing ability' in both cases to make things simpler, but P(no dowsing ability | success) = 1 - P(dowsing ability | success), as he can either dowse or he can't, so translating between the two isn't too difficult).
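If we look at Bayes' Theorem, we can see how we change from one to the other:

$$P(\text{no dowsing ability} \mid \text{success}) = \frac{P(\text{success} \mid \text{no dowsing ability})\,P(\text{no dowsing ability})}{P(\text{success})}$$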
More generally when applying Bayes' Theorem to this sort of problem, we talk about this:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

- the probability of a hypothesis H being true given some evidence E (P(H|E)) is equal to the probability of getting the evidence given that the hypothesis is true (P(E|H)) times the probability the hypothesis is true (P(H)) divided by the probability of getting the evidence regardless (P(E)).
This is great - we have a way of going from the answer to the wrong question ("could he succeed if he weren't a dowser?") to the answer to the right question ("is it likely he has a paranormal ability?").
However, there are a couple of catches when you do this, and it's these catches that are the source of the fuss surrounding Bayes' Theorem, but also the source of some interesting points about tests like the MDC.
Firstly, in our original example from the tutorial, with tests that sometimes work and sometimes don't, and people who may or may not have a disease, it's quite clear what the probability means. It fits naturally in with everyone's quite uncontroversial ideas of the actual meaning of the word 'probability' - you randomly select something from a population and the probability tells you the proportion of times you do this that you get a certain result.
However, in our case, we've now gone to a different thing - we've got a probability for "Fred has a paranormal ability". This wasn't drawn from a population about which we can say those kinds of things - Fred either definitely is or definitely isn't paranormal. It's not immediately clear that we can use probabilities when we talk about this kind of question. This comes up in my own field of cosmology where you might actually be asking questions about the entire universe, and then it's really not clear that there are multiple universes with population distributions from which we can draw results (now that's an understatement).
It's actually possible, however, to demonstrate that we can use probability as an expression of our degree of belief in something, in such a way that the mathematics is completely consistent with the mathematics of ordinary probabilities that deal with the frequency of events, and operates in exactly the same way when we do deal with those situations. This, fundamentally, is the difference between two schools of thought - the frequentist and the Bayesian. Frequentists think probabilities only work when you deal with frequencies of events drawn from a population. Bayesians hold that probabilities can be used considerably more generally. Note that frequentists do not claim that Bayes' Theorem is wrong - it clearly works just fine in our medical testing example - but that Bayesians (such as myself) misapply it. This is a philosophical, very interesting, and often heated discussion, but for the purposes of this post I'm just going to assume that being a Bayesian is right, and we can use probabilities to express our degree of belief in something.
Once we accept that, we can go on to our second big problem - what about those two other terms in Bayes' Theorem - P(H) and P(E)?
Let's start with P(E), as while at first sight it is hard to calculate a probability of getting some evidence regardless of the actual facts of the situation, it turns out to be easy to do away with. To do this, we note that either H is true or it isn't:

$$P(H \mid E) + P(\text{not } H \mid E) = 1$$

If we expand those two out using Bayes' Theorem and multiply through by P(E), we get to

$$P(E) = P(E \mid H)\,P(H) + P(E \mid \text{not } H)\,P(\text{not } H)$$
Now consider the hypothesis H itself, that Fred can dowse better than chance. This is a very vague statement. It might mean that Fred's dowsing ability is so weak that he spots bowls of water in boxes with a 50.1% rate of success, or a 70% rate of success or even 100%. It's a complete and continuous range of hypotheses which cover the full range of Fred's ability to exceed chance at his task. Fortunately, the framework we've been building up allows us to deal with this. We can basically use calculus to deal with this complete range of possibilities and compare them to a hypothesis like "Fred has a 50% chance of success". We can calculate overall probabilities for the two and look at the ratio, in a process called Bayesian model comparison - broadly speaking, take the ratio of the two probabilities for H (in all its possibilities) and our other idea that Fred is deluded and can't do better than chance.
Two very interesting things emerge from this, two principles which are very well-known to the skeptic. Firstly, because Fred might be succeeding with almost any rate of success, we have to work out the odds for all of the possible rates of success and kind of average over them. This weakens the relative strength of this hypothesis - essentially because it has a free parameter - just how good Fred is. This is Occam's Razor - the theory is penalised because it is more complex. In other situations where the hypothesis might have other complicating factors it would be penalised even more. In contrast, the idea that Fred succeeds at a rate of 50% is a strong hypothesis - it's easily falsified, as with sufficient evidence pointing at, say, 51% success we'd have to throw it out, whereas our original idea of Fred being paranormally endowed would cover this possibility. It's a simple and completely natural consequence of Bayesian ideas. It's a bit more rigorous and mathematically framed than many examples of Occam's Razor (how many free parameters does an invisible unicorn introduce?) but in this circumstance it's a powerful version of it that is fundamentally set out in a way that allows us to quantify how simple we should keep a theory in the face of evidence for complexity.
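To make the averaging concrete, here is a minimal sketch under some assumptions that are purely illustrative: the "Fred can dowse" hypothesis is taken to mean his success rate is equally likely to be anywhere between 50% and 100% (a uniform prior), the integral is done with a simple midpoint rule, and the function names and trial counts are invented. The ratio it returns compares how well the two ideas predicted the data.

```python
from math import comb


def likelihood(x, n, p):
    """Probability of exactly x successes in n trials at success rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def marginal_likelihood(x, n, lo=0.5, hi=1.0, steps=10_000):
    """Average the likelihood over a uniform spread of success rates in [lo, hi].

    Uses the midpoint rule; the uniform prior is an assumption for illustration.
    """
    width = (hi - lo) / steps
    total = sum(likelihood(x, n, lo + (i + 0.5) * width) for i in range(steps))
    return total / steps


def evidence_ratio(x, n):
    """P(data | Fred dowses at some unknown rate) / P(data | pure chance)."""
    return marginal_likelihood(x, n) / likelihood(x, n, 0.5)


# Hypothetical results out of 100 trials.
for successes in (55, 70):
    print(successes, "of 100:", evidence_ratio(successes, 100))
```

With 55 successes out of 100 the ratio comes out below 1: even though Fred beat 50%, the data favour plain chance, because the vague hypothesis spread its bets over every rate up to 100%. With 70 out of 100 the ratio is in the hundreds and the vague hypothesis wins comfortably.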
Secondly, let's look at P(H) - our prior probability that a hypothesis is true. This expresses our prior belief in an idea. If an idea is an extraordinary claim then P(H) is naturally a very small number, and to succeed against our more conventional idea it needs a P(E|H) that is really, really big compared with the probability of the same evidence under the conventional idea - it needs extraordinary evidence. Hence a simple and completely natural route to the idea that extraordinary claims require extraordinary evidence.
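A small sketch of how the prior and the evidence combine, using the odds form of Bayes' Theorem (posterior odds = prior odds times the ratio P(E|H) / P(E|not H)). The function name, the evidence ratio of 1000 and the three priors are all hypothetical values picked for illustration.

```python
def posterior_probability(prior, evidence_ratio):
    """Return P(H | E) given the prior P(H) and the ratio P(E | H) / P(E | not H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * evidence_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The same (hypothetical) evidence, 1000 times more likely if H is true,
# applied to an open-minded, a doubtful and a deeply skeptical prior.
for prior in (0.5, 1e-3, 1e-9):
    print(f"prior {prior:g} -> posterior {posterior_probability(prior, 1000.0):.6g}")
```

Evidence that all but settles the matter for someone starting at 50:50 leaves the deeply skeptical prior at roughly one in a million, which is the "extraordinary evidence" principle in numbers.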
Now let's go back to the MDC and what all this means for it.
Suppose Fred comes along and applies for the MDC. He fills in the application form and he reaches the question "What is your success rate?". He now has a number of options available to him. He might think from his previous experience that he succeeds 90% of the time, or maybe he doesn't really have a good idea of what he thinks. What we should be doing is encouraging him to make as strong a statement on this front as he can. Why? Because it reduces the amount of evidence he needs to produce to demonstrate his claim is true. It makes for a stronger statement. However, we should be clear that if he isn't quite sure he should suggest a range of success rates and we can marginalise over these - he'll need to produce more evidence to demonstrate his claim as a result, but he's more likely (in his opinion) to have covered the actual level of his ability. Note that this means that if he says he succeeds 90% of the time, but he succeeds at 70% in the test, he doesn't win. He made a claim and it wasn't true, even if it turns out that he apparently defied the laws of nature at the time.
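The trade-off between a strong claim and a hedged one can be sketched numerically. This is only an illustration under invented assumptions: a 20-trial test, a "strong" claim of exactly 90%, a "hedged" claim spread uniformly between 50% and 100%, and function names made up for the example. Each ratio compares the claim with pure chance.

```python
from math import comb


def likelihood(x, n, p):
    """Probability of exactly x successes in n trials at success rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def ratio_for_point_claim(x, n, claimed_rate):
    """How much better a single claimed rate predicts the data than chance (50%)."""
    return likelihood(x, n, claimed_rate) / likelihood(x, n, 0.5)


def ratio_for_range_claim(x, n, lo, hi, steps=10_000):
    """Same ratio, marginalising over a uniform range of claimed rates (midpoint rule)."""
    width = (hi - lo) / steps
    average = sum(likelihood(x, n, lo + (i + 0.5) * width) for i in range(steps)) / steps
    return average / likelihood(x, n, 0.5)


# Hypothetical outcomes of a 20-trial test.
for successes in (18, 14):
    strong = ratio_for_point_claim(successes, 20, 0.9)
    hedged = ratio_for_range_claim(successes, 20, 0.5, 1.0)
    print(f"{successes}/20: strong claim {strong:.3g}, hedged claim {hedged:.3g}")
```

If Fred really does hit 18 out of 20, the strong claim is supported more heavily than the hedged one, so he needs fewer trials to make his case. If he manages only 14 out of 20, the 90% claim does worse than chance while the hedged claim still comes out slightly ahead.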
One might compare this in a more extreme example to someone claiming to be able to dowse, and promptly proceeding to undergo the test while flying over the boxes Superman-style. He's demonstrated a paranormal ability but outside the remit of the test. Of course if he did this, he'd be in a much better situation to reapply having adjusted his claim to his actual dowsing success rate of 70%, or having submitted a claim not to be able to dowse but to be able to fly through the air faster than a speeding bullet. Similarly but less extremely, suppose he consistently performed worse than chance? In that case it's certainly more likely that he was simply unlucky than that he has some negative dowsing ability (which would be just as against our expectations as an effective dowsing ability), so we shouldn't be prepared to hand over a million dollars for that, even if it's contrary to our expectations from chance.
On top of this question, we need to decide what level of evidence should be expected from him, and whether this level of evidence should be the same to win a million dollars as it is to actually convince the unbeliever that he really can do something amazing.
I would argue that these should differ. For one thing, it would give a terribly negative appearance to the skeptical community if we announced that someone had to provide absolutely astonishing levels of evidence to win the million dollars. They could, probably rightly, claim that we've set the bar unrealistically high. It becomes especially problematic if a frequentist comes along and explains the nature of more conventional hypothesis testing, and then they might claim that by being Bayesian we're making their life harder. And we would be.
So we should set the bar for winning the million dollars to be pretty low (in Bayesian terms, we should be generous with P(H) rather than making it tiny), and we might even take the approach of throwing this Bayesian approach out altogether - even though we may lose the benefits of encouraging the claimant to make a strong claim from the start.
However, we (perhaps individually) should consider what P(H) we should set in advance - what evidence we demand to actually change our mind. I would argue that this should be much much beyond that needed to win a million dollars. We might consider it much more likely that something else happened - Fred got exceptionally lucky, or Fred managed to outwit the Amazing Randi and his colleagues (practically impossible, but arguably far more likely than really being able to dowse). From this point of view winning the MDC is not something that should convince you that paranormal abilities exist. It's a strong indication that scientists need to jump on the case and find out exactly what's going on, but it is very much justifiable to be far harder to persuade than the non-skeptical community might like.
The MDC clearly has great value beyond simply assessing claims - it's about highlighting the lack of evidence, and highlighting the importance of scientific testing, and highlighting the unwillingness of many people claiming unusual abilities to subject themselves to it, and I think it makes a stronger point when we are less demanding.
But for more significant decisions we should be more willing to throw out weak claims or weak evidence. We shouldn't need to argue about a tiny but statistically significant effect above chance for a homeopathy study because it's tiny, too small to account for any effect a homeopath might claim to see in their clinic and so small as to be medically worthless. And we should be pushing people to make strong claims from the outset. If a strong claim is true, its strength makes it easier to find the evidence, and if a strong claim is false, its strength makes it more easily falsified. It's ultimately of benefit to both sides.