Applications
- [1] arXiv:2405.09865 [pdf, ps, html, other]
Title: Assessing course difficulty and the effect of weather in amateur cross country running races
Subjects: Applications (stat.AP)
Cross country running races differ from track and road races in that the courses are typically not accurately measured and the condition of the course can have a strong effect on the finish times of the participants. In this paper we investigate these effects by modelling the finish times of all participants in 28 cross country running races over 5 seasons in the North East of England. We model the natural logarithm of the finish times using linear mixed effects models for both the senior men's and senior women's races. We investigate the effects of weather and underfoot conditions using windspeed and rainfall as covariates, fit distance as a covariate, and investigate the effect of time via the season of the race, in particular looking for any evidence of a pre- to post-Covid effect. We use random athlete effects to model the participant-to-participant variability and identify the most difficult courses using random course effects. The statistical inference is Bayesian. We assess model adequacy by comparing samples from the posterior predictive distribution of finish times to the observed distribution of finish times in each race. We find strong differences in difficulty between the courses. Rainfall in the month of the race and in the previous month increases finish times, and finish times increase with distance. We find no evidence that windspeed affects finish times.
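One way to write down a model of the kind this abstract describes is sketched below. The abstract does not give the model equation, so the symbols, the Gaussian forms of the random effects, and the way the season enters are all illustrative assumptions. Here $t_{ij}$ is the finish time of athlete $i$ in race $j$, $k(j)$ is the course used for race $j$, and $s(j)$ is its season.

```latex
\log t_{ij} = \beta_0
  + \beta_1\,\mathrm{dist}_j
  + \beta_2\,\mathrm{wind}_j
  + \beta_3\,\mathrm{rain}_j
  + \beta_4\,\mathrm{rain}^{\mathrm{prev}}_j
  + \gamma_{s(j)} + a_i + c_{k(j)} + \varepsilon_{ij},
\qquad
a_i \sim N(0, \sigma_a^2),\quad
c_k \sim N(0, \sigma_c^2),\quad
\varepsilon_{ij} \sim N(0, \sigma^2).
```

In this sketch a large positive course effect $c_k$ marks a course as difficult, and because the response is on the log scale each coefficient acts multiplicatively on the finish time.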
- [2] arXiv:2405.09989 [pdf, ps, html, other]
Title: A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics
Subjects: Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
With the proliferation of screening tools for chemical testing, it is now possible to create vast databases of chemicals easily. However, rigorous statistical methodologies for analysing these databases are in their infancy, and further development to facilitate chemical discovery is imperative. In this paper, we present conditional Gaussian process models to predict ordinal outcomes from chemical experiments, where the inputs are chemical compounds. We implement the Tanimoto distance, a metric on the chemical space, within the covariance of the Gaussian processes to capture correlated effects in the chemical space. A novel aspect of our model is that the kernel contains a scaling parameter, a feature not previously examined in the literature, that controls the strength of the correlation between elements of the chemical space. Using molecular fingerprints, a numerical representation of a compound's location within the chemical space, we show that accounting for correlation amongst chemical compounds improves predictive performance over the uncorrelated model, where effects are assumed to be independent. Moreover, we present a genetic algorithm to facilitate chemical discovery and to identify features important to a compound's efficacy. A simulation study is conducted to demonstrate the suitability of the proposed methods. Our proposed methods are demonstrated on a hazard classification problem of organic solvents.
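The idea of placing the Tanimoto distance inside a covariance function with a scaling parameter can be sketched as follows. The abstract does not state the kernel's functional form, so the exponential form `exp(-d / scale)` and the names `tanimoto_distance`, `tanimoto_kernel` and `scale` are assumptions made for illustration only. Fingerprints are taken to be binary vectors.

```python
import numpy as np

def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two binary fingerprints."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # convention: two all-zero fingerprints are identical
    return 1.0 - np.logical_and(a, b).sum() / union

def tanimoto_kernel(fingerprints, scale=1.0):
    """Covariance matrix K[i, j] = exp(-d(i, j) / scale).

    ASSUMPTION: this exponential form is illustrative, not the paper's
    kernel. A smaller `scale` makes correlation decay faster with distance.
    """
    n = len(fingerprints)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = tanimoto_distance(fingerprints[i], fingerprints[j])
            K[i, j] = np.exp(-d / scale)
    return K

# Three toy 4-bit fingerprints (made-up data).
fps = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
K = tanimoto_kernel(fps, scale=0.5)
```

Letting `scale` grow pushes every off-diagonal entry towards 1 (strong correlation), while shrinking it towards 0 approaches the uncorrelated model in which compound effects are independent.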
- [3] arXiv:2405.10247 [pdf, ps, html, other]
Title: Alternative ranking measures to predict international football results
Subjects: Applications (stat.AP)
Over the last few years, there has been a growing interest in the prediction and modelling of competitive sports outcomes, with particular emphasis placed on this area by the Bayesian statistics and machine learning communities. In this paper, we carry out a comparative evaluation of statistical and machine learning models to assess their predictive performance for the 2022 World Cup and for the 2024 Africa Cup of Nations by evaluating alternative summaries of the past performances of the teams involved. More specifically, we consider the Bayesian Bradley-Terry-Davidson model, a widely used statistical framework for ranking items based on paired comparisons that has been applied successfully in various domains, including football. The analysis was performed by including, in some canonical goal-based models, both the Bradley-Terry-Davidson derived ranking and the widely recognized Coca-Cola FIFA ranking commonly adopted by football fans and amateurs.
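For readers unfamiliar with the Bradley-Terry-Davidson model, the sketch below computes win, draw and loss probabilities from two team strengths using Davidson's extension of Bradley-Terry to ties. The function and argument names are hypothetical, the strengths are made-up numbers, and nothing here reproduces the paper's Bayesian fitting procedure.

```python
import math

def btd_probabilities(strength_i, strength_j, nu):
    """Return (P(i wins), P(draw), P(j wins)) under the Davidson tie model.

    strength_i, strength_j: positive team strengths.
    nu: non-negative tie parameter; nu = 0 recovers plain Bradley-Terry.
    """
    tie_term = nu * math.sqrt(strength_i * strength_j)
    denom = strength_i + strength_j + tie_term
    return strength_i / denom, tie_term / denom, strength_j / denom

# Made-up strengths: team i is twice as strong as team j.
p_win, p_draw, p_loss = btd_probabilities(2.0, 1.0, nu=0.5)
```

A ranking follows from sorting teams by estimated strength, which is what makes the fitted strengths usable as a covariate alongside an external ranking.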
New submissions for Friday, 17 May 2024 (showing 3 of 3 entries)
- [4] arXiv:2405.09906 (cross-list from stat.ME) [pdf, ps, html, other]
Title: Process-based Inference for Spatial Energetics Using Bayesian Predictive Stacking
Comments: 38 pages, 13 figures
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
Rapid developments in streaming data technologies have enabled real-time monitoring of human activity that can deliver high-resolution data on health variables over trajectories or paths carved out by subjects as they conduct their daily physical activities. Wearable devices, such as wrist-worn sensors that monitor gross motor activity, have become prevalent and have kindled the emerging field of "spatial energetics" in environmental health sciences. We devise a Bayesian inferential framework for analyzing such data while accounting for information available on specific spatial coordinates comprising a trajectory or path using a Global Positioning System (GPS) device embedded within the wearable device. We offer full probabilistic inference with uncertainty quantification using spatial-temporal process models adapted for data generated from "actigraph" units as the subject traverses a path or trajectory in their daily routine. Anticipating the need for fast inference for mobile health data, we pursue exact inference using conjugate Bayesian models and employ predictive stacking to assimilate inference across these individual models. This circumvents issues with iterative estimation algorithms such as Markov chain Monte Carlo. We devise Bayesian predictive stacking in this context both for models that treat time as discrete epochs and for models that treat time as continuous. We illustrate our methods with simulation experiments and analysis of data from the Physical Activity through Sustainable Transport Approaches (PASTA-LA) study conducted by the Fielding School of Public Health at the University of California, Los Angeles.
- [5] arXiv:2405.09929 (cross-list from q-fin.ST) [pdf, ps, html, other]
Title: The $\kappa$-generalised Distribution for Stock Returns
Subjects: Statistical Finance (q-fin.ST); Applications (stat.AP)
Empirical evidence shows stock returns are often heavy-tailed rather than normally distributed. The $\kappa$-generalised distribution, which originated in the context of statistical physics with Kaniadakis, is characterised by the $\kappa$-exponential function, which is asymptotically exponential for small values and asymptotically power law for large values. This proves to be a useful property and makes it a good candidate distribution for many types of quantities. In this paper we focus on fitting historic daily stock returns for the FTSE 100 and the top 100 Nasdaq stocks. Using a Monte Carlo goodness-of-fit test, we find evidence that the $\kappa$-generalised distribution is a good fit for a significant proportion of the 200 stock returns analysed.
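The two asymptotic regimes mentioned in this abstract can be checked numerically with a short sketch of the Kaniadakis $\kappa$-exponential, $\exp_\kappa(x) = (\sqrt{1+\kappa^2x^2} + \kappa x)^{1/\kappa}$. The survival function below uses the form $\exp_\kappa(-\beta x^\alpha)$ in which the $\kappa$-generalised distribution is commonly written for positive quantities. The parameter names `alpha` and `beta` are assumptions, and the abstract does not say how the paper applies the distribution to signed returns.

```python
import math

def kappa_exp(x, kappa):
    """Kaniadakis kappa-exponential; reduces to exp(x) as kappa -> 0.

    Uses the identity exp_k(x) = exp(asinh(k * x) / k), which avoids
    cancellation for large negative x.
    """
    if kappa == 0:
        return math.exp(x)
    return math.exp(math.asinh(kappa * x) / kappa)

def kappa_generalised_survival(x, alpha, beta, kappa):
    """P(X > x) for x >= 0 in the commonly stated form exp_k(-beta * x**alpha).

    ASSUMPTION: this parameterisation is illustrative, not taken from the paper.
    """
    return kappa_exp(-beta * x ** alpha, kappa)

# Small argument: close to the ordinary exponential.
small = kappa_exp(0.01, 0.5) / math.exp(0.01)
# Large argument: close to the power law (2 * kappa * u) ** (-1 / kappa).
u = 1.0e6
large = kappa_exp(-u, 0.5) / (2 * 0.5 * u) ** (-1 / 0.5)
```

Both ratios `small` and `large` come out close to 1, which is the exponential-body, power-law-tail behaviour that makes the distribution a candidate for heavy-tailed data.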
Cross submissions for Friday, 17 May 2024 (showing 2 of 2 entries)
- [6] arXiv:2403.10300 (replaced) [pdf, ps, other]
Title: The reliability of the gender Implicit Association Test (gIAT) for high-ability careers
Comments: 24 pages, 8 figures, 2 tables, 71 references
Subjects: Applications (stat.AP)
Males outnumber females in many high-ability careers, for example in the fields of science, technology, engineering, and mathematics (STEM) and in academic medicine. These differences are often attributed to subconscious bias as measured by the gender Implicit Association Test (gIAT). We compute p-value plots for results from two meta-analyses: one examines the predictive power of the gIAT, and the other examines the predictive power of vocational interests, i.e. personal interests and behaviors, for explaining gender differences in high-ability careers. The results are clear: the gender Implicit Association Test provides little or no information on male versus female differences, whereas vocational interests are strongly predictive. Researchers of implicit bias should expand their modeling to include additional relevant covariates. In short, these meta-analyses provide no support for the gender Implicit Association Test influencing choice and gender differences of high-ability careers.
- [7] arXiv:2310.11683 (replaced) [pdf, ps, html, other]
Title: Treatment bootstrapping: A new approach to quantify uncertainty of average treatment effect estimates
Subjects: Methodology (stat.ME); Applications (stat.AP)
This paper proposes a new non-parametric bootstrap method to quantify the uncertainty of estimates of the average treatment effect for the treated obtained from matching estimators. More specifically, it seeks to quantify this uncertainty by bootstrapping the treatment group only and finding the counterpart control group by pair matching on the estimated propensity score without replacement. We demonstrate the validity of this approach and compare it with existing bootstrap approaches through Monte Carlo simulation and analysis of a real-world data set. The results indicate that the proposed approach constructs confidence intervals and standard errors that have a coverage rate of 95 percent or above and better precision compared with existing bootstrap approaches, while these measures also depend on the percentage treated in the sample data and on the sample size.
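The resampling scheme described in this abstract, bootstrapping only the treated units and pair matching each draw to controls without replacement, can be sketched as below. This is a minimal illustration under several assumptions that the abstract does not confirm: propensity scores are treated as already estimated and held fixed across bootstrap draws, matching is greedy nearest-neighbour in the order the treated units appear, and all function names are hypothetical.

```python
import numpy as np

def att_pair_matched(y_t, ps_t, y_c, ps_c):
    """ATT estimate from greedy 1:1 nearest-neighbour matching on the
    propensity score, using each control at most once."""
    y_t, ps_t, y_c, ps_c = (np.asarray(v, dtype=float) for v in (y_t, ps_t, y_c, ps_c))
    available = np.ones(len(y_c), dtype=bool)
    diffs = []
    for y, p in zip(y_t, ps_t):
        # Already-used controls get an infinite gap so they are never chosen.
        gaps = np.where(available, np.abs(ps_c - p), np.inf)
        j = int(np.argmin(gaps))
        available[j] = False
        diffs.append(y - y_c[j])
    return float(np.mean(diffs))

def treatment_bootstrap(y_t, ps_t, y_c, ps_c, n_boot=500, seed=0):
    """Bootstrap distribution of the ATT, resampling the treated group only.

    ASSUMPTION: propensity scores are fixed inputs, not re-estimated per draw.
    """
    y_t, ps_t = np.asarray(y_t, dtype=float), np.asarray(ps_t, dtype=float)
    if len(y_c) < len(y_t):
        raise ValueError("matching without replacement needs at least as many controls as treated units")
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y_t), size=len(y_t))  # resample treated with replacement
        estimates[b] = att_pair_matched(y_t[idx], ps_t[idx], y_c, ps_c)
    return estimates
```

Given the returned array, `np.percentile(estimates, [2.5, 97.5])` gives a percentile interval and `estimates.std(ddof=1)` a bootstrap standard error. Whether the paper uses these particular summaries is not stated in the abstract.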