Jake Vanderplas at Pythonic Perambulations just finished his series of posts on the bayesian/frequentist distinction, in which he provided his arguments in favor of the bayesian approach.

This reminded me that I also wanted to do a post on this topic. But first, let me say that I do not consider this issue so important. There are much more important issues in data analysis. To improve inference in psychology, we first need to get rid of sloppy theorizing, hypothesis testing and SPSS. If you are a frequentist and you can do this, then you are safe. Bayesian inference is somewhat correlated with these issues. Bayesians have an advantage in that model comparison is rather difficult to perform with modern MCMC samplers, and as a consequence parameter estimation has dominated. Furthermore, no bayesian SPSS exists at the moment. Instead, bayesians use general-purpose MCMC samplers such as BUGS, JAGS or STAN that enforce a certain discipline. However, our bayesian colleagues in psychology (Jeff Rouder and EJ Wagenmakers and their groups) are already working hard on popularizing bayesian hypothesis testing with bayes factors. This will presumably come packaged in a sort of bayesian SPSS. Then, finally, sloppy bayesian inference will become possible. In fact this is already happening. See for instance the study by Dambacher, Rolfs, and Cavanagh (2011), who unwittingly demonstrate the "difference between significant and nonsignificant is significant" fallacy (Gelman and Stern, 2006) with bayes factors. So just being bayesian won't save you.

Now back to my take on the two approaches. The delineation between the two is not always clear. The classical way to introduce them is to discuss the definition of probability $P(E)$. Frequentist probability is related to the frequency of observing the respective event. Bayesian probability quantifies the uncertainty in $E$. For bayesians, $E$ need not be an event with multiple potential realizations; it can be any statement. So bayesians are allowed to define $P(E)$ for $E=\mathrm{Sun \ will \ rise \ on \ 1 \ June \ 2060}$ or $E=\mathrm{I \ believe \ that \ precognition \ does \ not \ exist}$. Frequentists are not allowed to do this. Bayesians argue that the frequentist notion of probability is limited and does not allow researchers to quantify the uncertainty of the statements that they are interested in. Frequentists argue that such bayesian statements are subjective and should be excluded from scientific discourse. I think the issue of probability definition is a red herring. Frequentists are free to imagine multiple parallel universes such that any statement can be translated into an event. On the other hand, modern bayesian analysis eschews the notion of subjective probability and focuses on events that are much in accord with frequentists' taste.

The most interesting delineation of frequentist and bayesian statistics that I have seen so far has been presented by Mike Jordan. Jordan starts with a loss function $L(\theta,D)$ of a model with parameters $\theta$ for data $D$. In the context of hypothesis testing, $\theta$ is the hypothesis. Frequentists treat $\theta$ as fixed - $\theta$ is not a random variable. What varies is $D$, and frequentists integrate the loss over the space of all possible datasets $D_i$: they evaluate $\sum_i L(\theta,D_i)$. Bayesians on the other hand treat the data as fixed and integrate over the parameter space: they evaluate $\sum_{\theta \in \Theta} L(\theta,D)$, weighting each $\theta$ by its posterior probability. $D$ is trivially fixed to the dataset at hand. The frequentist point estimate of $\theta$ is usually obtained with the maximum likelihood method. The ugly thing is that each approach needs to define the space over which it wishes to integrate. Frequentists need to speculate about the potential datasets. This speculation is usually derived from information about the experiment design. If this information changes, the results of the analysis may change considerably. At the same time the observed data didn't change, and so the change in results is considered paradoxical by bayesians (e.g. see the voltmeter example in Pratt, 1962). On the other hand, bayesians need to define $\Theta$. This creates the much discussed trouble with subjective bayesian priors.
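Jordan's contrast can be made concrete with a small simulation. The sketch below is my own illustration (not from Jordan's talk), for estimating a binomial proportion under squared-error loss: the frequentist quantity averages the loss over many simulated datasets with $\theta$ held fixed, while the bayesian quantity averages the loss over a grid of $\theta$ values weighted by the posterior, with the one observed dataset held fixed. All numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, estimate):
    # squared-error loss between a parameter value and an estimate
    return (theta - estimate) ** 2

# --- Frequentist view: theta fixed, integrate the loss over datasets ---
theta_true = 0.7          # hypothetical fixed parameter
n = 20                    # sample size per dataset
datasets = rng.binomial(n, theta_true, size=10_000)  # many potential datasets D_i
mle = datasets / n                                   # maximum-likelihood estimate per dataset
freq_risk = loss(theta_true, mle).mean()             # average loss over the dataset space

# --- Bayesian view: data fixed, integrate the loss over the parameter space ---
k_observed = 14                                      # the one dataset we have: 14/20 successes
grid = np.linspace(0.001, 0.999, 999)                # discretized Theta
prior = np.ones_like(grid) / grid.size               # flat prior over Theta
posterior = prior * grid**k_observed * (1 - grid)**(n - k_observed)
posterior /= posterior.sum()                         # normalize
post_mean = (grid * posterior).sum()                 # minimizes posterior expected squared loss
bayes_loss = (loss(grid, post_mean) * posterior).sum()
```

Note how the two loops run over entirely different spaces: `freq_risk` sums over simulated datasets with `theta_true` never varying, while `bayes_loss` sums over the grid of parameter values with `k_observed` never varying.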

The different loss-minimization strategies have several consequences. Bayesian data analysis is optimistic. It assumes that the data we have obtained are the best picture we can get. The data are full of interesting patterns, and the purpose of data analysis is to describe these patterns. Frequentist data analysis is pessimistic. It assumes that measurement abounds with error, and the purpose of data analysis is to keep that error out of any conclusions the analysis makes.

As another difference, frequentists put emphasis on the calibration properties of a statistical tool. A well-calibrated analysis provides conclusions that hold across a wide range of potential datasets. Bayesian analysis on the other hand provides tools that allow researchers to formulate any model for a given dataset. In particular, this requires bayesians to provide rules for model building that are consistent - meaning that you always obtain a valid model by applying the rules, and that different orders of rule application derive the same model. Again, both approaches have pros and cons. If you are a frequentist, you either use a prepackaged existing method or you need to derive a new one, which amounts to doing primary statistics research and can be extremely time consuming. If you are a bayesian and you know the rules of probability, you are free to formulate any model you wish. This makes bayesian data analysis very flexible and appealing to many researchers. At the same time, the bayesian models you formulate may turn out to be computationally infeasible or they may perform poorly. As a consequence, much of the modern work in statistics mixes the two approaches to obtain coherent and well-calibrated statistical tools.
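Calibration in the frequentist sense can itself be checked by simulation. Here is a minimal sketch (my own example, with arbitrary parameter values) that estimates the coverage of the textbook normal-approximation (Wald) confidence interval for a binomial proportion: a well-calibrated 95% interval should contain the true parameter in roughly 95% of repeated datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

theta_true = 0.5      # hypothetical truth; a thorough check would sweep many values
n = 100               # sample size per dataset
n_datasets = 20_000   # number of simulated repetitions

k = rng.binomial(n, theta_true, size=n_datasets)
p_hat = k / n
se = np.sqrt(p_hat * (1 - p_hat) / n)          # Wald standard error
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
covered = (lower <= theta_true) & (theta_true <= upper)
coverage = covered.mean()                      # fraction of intervals containing the truth
```

For these values the Wald interval comes in slightly below the nominal 95% - exactly the kind of property a frequentist wants to know about a tool before trusting it across datasets.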

Having clarified my position on this, we can return to more important issues.