Abstract
Olympic medal totals are highly right-skewed and concentrated among a small group of top-performing nations, which complicates stable cross-country forecasting and interpretation. Using an unbalanced country-by-edition panel for the 1960 to 2024 Summer Games, we build a reproducible pipeline to predict national total medals and to examine plausible drivers. The target is log-transformed during training and then converted back to the medal scale with a non-negativity constraint. Predictors include host-country status, pre-Games macroeconomic conditions (gross domestic product, population, and gross domestic product per capita), performance inertia from the previous Olympic edition, and participation-related proxies such as athlete counts and coverage of sports and events when available. Random Forest and histogram-based gradient boosting models are evaluated using cross-validation that groups observations by Olympic year to better reflect next-edition forecasting and to limit within-year information leakage. For the Los Angeles 2028 Games, we construct features from the 2024 baseline and extrapolate macroeconomic inputs to the pre-Games year using country-specific compound growth rates. Uncertainty is quantified with bootstrap prediction intervals. Results indicate that the United States and China remain the leading medal producers, while interval overlap suggests that rankings among mid-tier nations are sensitive to uncertainty. Feature attribution highlights participation intensity and opportunity breadth as consistent positive contributors, whereas host status shows a smaller marginal role after controlling for observable factors. An event-study difference-in-differences specification yields imprecise host-effect estimates, underscoring the need for cautious causal interpretation.
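Two of the preprocessing steps described above, the log-scale target with a non-negative back-transform and the compound-growth extrapolation of macroeconomic inputs, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, the use of log(1 + y) rather than a plain logarithm is an assumption made so that zero-medal observations stay finite, and the numbers in the usage lines are illustrative values, not data from the study.

```python
import math


def to_log_scale(medals):
    """Training-time target transform.

    Assumption: log(1 + y) is used so that zero-medal rows remain finite;
    the abstract only states that the target is log-transformed.
    """
    return math.log1p(medals)


def to_medal_scale(log_prediction):
    """Invert the log1p transform and clip at zero (non-negativity constraint)."""
    return max(0.0, math.expm1(log_prediction))


def extrapolate_compound_growth(value_start, value_end, years_between, years_ahead):
    """Project a positive series forward using its compound annual growth rate.

    value_start, value_end: observed values at the start and end of the window.
    years_between: number of years separating the two observations.
    years_ahead: number of years to project beyond value_end.
    """
    if value_start <= 0 or value_end <= 0 or years_between <= 0:
        raise ValueError("compound growth needs positive values and a positive window")
    rate = (value_end / value_start) ** (1.0 / years_between) - 1.0
    return value_end * (1.0 + rate) ** years_ahead


# Illustrative values only (hypothetical, not taken from the paper's data).
projected_gdp = extrapolate_compound_growth(100.0, 121.0, years_between=2, years_ahead=1)
clipped_prediction = to_medal_scale(-0.5)  # a negative log-scale prediction is clipped to 0.0
```

In a setup like this, the growth rate would be computed separately for each country, and the clipping step matters mainly for nations whose predicted totals sit near zero.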

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Jiaxin Zhang, Luping Tang

