A paper by Birgit Träuble and me on the robustness of statistical methods when the measure is affected by ceiling and/or floor effects was published in PLOS ONE this week. Here is the abstract:
Goals and methods
A simulation study investigated how ceiling and floor effect (CFE) affect the performance of Welch’s t-test, F-test, Mann-Whitney test, Kruskal-Wallis test, Scheirer-Ray-Hare-test, trimmed t-test, Bayesian t-test, and the “two one-sided tests” equivalence testing procedure. The effect of CFE on the estimate of group difference and on its confidence interval, and on Cohen’s d and on its confidence interval was also evaluated. In addition, the parametric methods were applied to data transformed with log or logit function and the performance was evaluated. The notion of essential maximum from abstract measurement theory is used to formally define CFE and the principle of maximum entropy was used to derive probability distributions with essential maximum/minimum. These distributions allow the manipulation of the magnitude of CFE through a parameter. Beta, Gamma, Beta prime and Beta-binomial distributions were obtained in this way with the CFE parameter corresponding to the logarithm of the geometric mean. Wald distribution and ordered logistic regression were also included in the study due to their measure-theoretic connection to CFE, even though these models lack essential minimum/maximum. Performance in two-group, three-group and 2 x 2 factor design scenarios was investigated by fixing the group differences in terms of CFE parameter and by adjusting the base level of CFE.
Results and conclusions
In general, bias and uncertainty increased with CFE. Most problematic were occasional instances of biased inference which became more certain and more biased as the magnitude of CFE increased. The bias affected the estimate of group difference, the estimate of Cohen’s d and the decisions of the equivalence testing methods. Statistical methods worked best with transformed data, albeit this depended on the match between the choice of transformation and the type of CFE. Log transform worked well with Gamma and Beta prime distribution while logit transform worked well with Beta distribution. Rank-based tests showed best performance with discrete data, but it was demonstrated that even there a model derived with measurement-theoretic principles may show superior performance. Trimmed t-test showed poor performance. In the factor design, CFE prevented the detection of main effects as well as the detection of interaction. Irrespective of CFE, F-test misidentified main effects and interactions on multiple occasions. Five different constellations of main effect and interactions were investigated for each probability distribution, and weaknesses of each statistical method were identified and reported. As part of the discussion, the use of generalized linear models based on abstract measurement theory is recommended to counter CFE. Furthermore, the necessity of measure validation/calibration studies to obtain the necessary knowledge of CFE to design and select an appropriate statistical tool, is stressed.
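To give a rough feel for what a ceiling effect does to a group difference, here is a minimal toy sketch. It is not the simulation code from the paper. It assumes a normally distributed latent score squashed into (0, 1) by the logistic function, which is a simpler stand-in for the Beta model used in the study. The names `simulate_group`, `compare`, `base` and `shift` are made up for this illustration. The group difference is fixed on the latent scale, and only the base level is moved toward the ceiling.

```python
import math
import random
import statistics


def simulate_group(rng, latent_mean, n):
    """Draw n scores in (0, 1): a normal latent value squashed by the logistic function."""
    return [1.0 / (1.0 + math.exp(-rng.gauss(latent_mean, 1.0))) for _ in range(n)]


def logit(p):
    """Map a score in (0, 1) back to the unbounded latent scale."""
    return math.log(p / (1.0 - p))


def welch_t(x, y):
    """Welch's t statistic (mean of y minus mean of x, unequal variances allowed)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    diff = statistics.fmean(y) - statistics.fmean(x)
    return diff / math.sqrt(vx / len(x) + vy / len(y))


def compare(base, shift=1.0, n=5000, seed=1):
    """Return the group difference on the raw scale and on the logit scale.

    Assumption of this toy example: both groups differ by `shift` on the
    latent scale, and `base` controls how close the scores sit to the ceiling.
    """
    rng = random.Random(seed)
    a = simulate_group(rng, base, n)
    b = simulate_group(rng, base + shift, n)
    raw_diff = statistics.fmean(b) - statistics.fmean(a)
    logit_diff = statistics.fmean([logit(v) for v in b]) - statistics.fmean([logit(v) for v in a])
    return raw_diff, logit_diff


if __name__ == "__main__":
    for base in (0.0, 2.0, 4.0):
        raw_diff, logit_diff = compare(base)
        print(f"base={base:.1f}  raw difference={raw_diff:.3f}  logit difference={logit_diff:.3f}")
```

In this sketch the raw difference shrinks as the base level approaches the ceiling, although the latent difference stays the same, while the difference computed on logit-transformed scores stays close to `shift`. The `welch_t` helper can be applied to either scale to see how the test statistic reacts.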
This publication was a lot of work. This mozgostroje post was the original motivation, but the goals and the scope of the study changed drastically from where it started. As we argue in the paper, previous robustness research has predominantly focused on the influence of ordinality, skewness and heterogeneity of variance, while we think that when these factors occur and affect robustness, they appear together in a pattern which researchers commonly call a ceiling or floor effect. Making the notion of ceiling effect precise took a lot of work. We went to the cemetery of dead academic ideas, dug out two of them, cleaned them up, stitched them together and brought this monster to life.
If you have read this blog in the past, you have already encountered E.T. Jaynes' concepts in more or less explicit form. In the current work we use Jaynes' principle of maximum entropy (POME). Jaynes' idea was to derive the most plausible probability distribution from a set of constraints on its parameters. Jaynes' problem, especially with continuous distributions, was to motivate and formulate these constraints in a concise manner. Jaynes worked on POME in the 50s and 60s and his ideas are still being developed by a small group of researchers, mostly physicists, though a broader influence of his views can be seen across many areas of Bayesian statistics and machine learning.
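For readers who have not seen the principle of maximum entropy at work, the textbook form of the argument can be sketched as follows. This is the generic derivation with a uniform base measure, not the exact parameterisation used in the paper; the symbols $S$, $f_k$, $c_k$ and $\lambda_k$ are introduced here only for illustration.

```latex
% Maximise entropy over densities p on a support S, subject to
% normalisation and to expectation constraints with functions f_k.
\max_{p}\; -\int_S p(x)\,\log p(x)\,dx
\quad\text{s.t.}\quad
\int_S p(x)\,dx = 1, \qquad \int_S f_k(x)\,p(x)\,dx = c_k .

% Setting the derivative of the Lagrangian to zero gives an
% exponential-family solution with multipliers \lambda_k.
p(x) \;\propto\; \exp\Big(\sum_k \lambda_k f_k(x)\Big).

% Example 1: S = (0,1), f_1(x) = \log x, f_2(x) = \log(1-x),
% with \lambda_1, \lambda_2 > -1.
p(x) \propto x^{\lambda_1}(1-x)^{\lambda_2}
\;\Longrightarrow\;
X \sim \mathrm{Beta}(\lambda_1 + 1,\ \lambda_2 + 1).

% Example 2: S = (0,\infty), f_1(x) = \log x, f_2(x) = x,
% with \lambda_1 > -1 and \lambda_2 < 0.
p(x) \propto x^{\lambda_1} e^{\lambda_2 x}
\;\Longrightarrow\;
X \sim \mathrm{Gamma}(\text{shape} = \lambda_1 + 1,\ \text{rate} = -\lambda_2).
```

Note that the constraint on $E[\log X]$ is a constraint on the logarithm of the geometric mean, which is the quantity the abstract names as the CFE parameter.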
In the 60s, 70s and 80s Luce, Krantz, Suppes, Tversky and others worked on formal measurement theory. They wanted to know the best way to assign numbers to empirical entities. Stevens' scale types are the best-known exponent (actually a predecessor) of this work; however, the theory accounts for a much wider range of research scenarios than Stevens' four scales allow. Starting in the 90s, the work on formal measurement theory ceased and its applications failed to materialize. Cliff (1992) discusses this decline and the reasons for it. Notably, one of the reasons he mentions is the missing translation of concepts from formal measurement theory to the random variables which are commonly encountered in research. The theorems of formal measurement theory can be viewed as constraints on measured quantities which reflect the constraints found in the empirical objects. I believe that with POME these constraints can be used to derive actual probability distributions, which in turn can be used as statistical models. Effectively, such statistical models are derived from knowledge about the measurement tool.
In our publication we consider measurement structures with an essential maximum/minimum, which were derived back in the 70s and 80s, and we explore their probabilistic extension. I believe that combining formal measurement theory with POME can make both ideas more widely applicable. I hope to explore the probabilistic application of other measurement structures with POME in future publications. A straightforward next step would be to derive probabilistic extensions as generalized normal models - i.e. models in which the noise components are independent of each other and of the parameters, which is most easily achieved with normally distributed residuals. The study of these models should demonstrate the feasibility of the program of adapting measurement theory to a probabilistic setting.
Subsequent work should consider the formulation of existing statistical models, such as the drift-diffusion model of response times or the Rasch model for questionnaire data, within this framework. Apart from demonstrating the feasibility of the proposed program once more, formal measurement theory should give us an idea of how to extend these models to research contexts in which they haven't been applied previously, for some of which no appropriate models exist at all. The overarching goal of this program is to allow researchers to derive and select the appropriate statistical procedure (experiment design and statistical analysis) based on the research question and the available measurement tools.
In the past I wondered why some research blogs stop getting updates as the writer gains seniority and whether the same would happen to Mozgostroje. In 2017 I finished my graduate studies at Universität zu Köln and I have worked as a post-doc researcher since then. Mozgostroje stopped getting updates. Not because I don't have time to write. Rather, it seems that blogging has lost its niche in my academic life. The topics that I'm currently working on are complex and deserve in-depth discussion. Writing short, shallow reports does not seem worthwhile to me, and brief discussions of a specialized sub-topic would be difficult to understand even for somebody with the relevant domain-specific knowledge. Unless my perception changes, I will continue to use this blog to announce individual manuscripts/publications and to point out where these publications fit in my current thinking and my current plans.