Applying machine learning models and simulation for the prediction of medal winners in the XIX World Athletics Championships.
From August 19 to 27, Budapest will not only be the capital of Hungary but also the world capital of athletics. The main Magyar city will host the XIX World Athletics Championships on those dates. The National Athletics Center in Budapest will host the 49 events scheduled for this edition: 24 women's events, 24 men's events and one mixed event.
The athletics family and the fans of the king of sports will spend these days following the athletes of their countries, their favorite athletes, or the events that interest them most.
Knowing which athletes are likely to lead each of the championship events is a valuable tool for everyone interested in this great event.
To this end, SYALIA has been monitoring the main athletes in each of the World Championships events and, in collaboration with the Artificial Intelligence Group of the Faculty of Mathematics and Computer Science of the University of Havana and with Postdata.club, has developed a brief guide with predictions of the results of all the events of the championships.
For this purpose, all the marks and times of all the athletes and teams that will participate in the World Championships were used. With these data, prediction models based on Artificial Intelligence and Data Science techniques were developed, which produced the results that we present today in this guide.
We hope this guide is a perfect complement to better enjoy the XIX World Athletics Championships - Budapest'2023.
The predictions were made following a simulation-based approach, which yields a result for each of the forecast events. For the relay events, the simulation methodology was not used; instead, only expert judgment was applied, based on the individual results of the relay members as well as the performances of the relay teams in other major competitions (World Championships, Olympic Games and relay championships).
The data for each of the athletes who will participate in the World Championships were extracted from the World Athletics website. From there, the marks or times of the athletes in competitions reported from 2021 to 2023 (up to August 10) were obtained. Indoor competitions were taken into account for events where indoor and outdoor conditions are relatively comparable (jumping events and shot put).
Subsequently, the set of results of each athlete was pre-processed so that the most recent marks carry more weight. A weighting of [4, 2, 1] was chosen, which means that marks made in 2023 appear four times in the set, those from 2022 appear twice, and those from 2021 appear only once.
In this way, the fewer marks an athlete has, the greater the resulting value of each mark will be (for events where the goal is to maximize the value of the mark, $\alpha$ is taken as negative). Consequently, athletes with more marks obtain better results, which can be interpreted as an experience factor.
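The year-based weighting of [4, 2, 1] described above can be illustrated with a minimal sketch. The function name `weight_marks`, the `(year, mark)` input layout and the sample times are assumptions made for this illustration only; they are not taken from the original pipeline, and the adjustment involving $\alpha$ is not modeled here.

```python
# Weights from the text: 2023 marks count four times, 2022 twice, 2021 once.
YEAR_WEIGHTS = {2023: 4, 2022: 2, 2021: 1}

def weight_marks(marks):
    """Repeat each mark according to the weight of the year it was made in.

    `marks` is a list of (year, mark) pairs (a hypothetical input layout).
    Marks from years without a weight are dropped.
    """
    weighted = []
    for year, mark in marks:
        weighted.extend([mark] * YEAR_WEIGHTS.get(year, 0))
    return weighted

# Illustrative 100 m times in seconds (not real data).
sample = [(2021, 10.05), (2022, 9.98), (2023, 9.91)]
weighted_sample = weight_marks(sample)
print(weighted_sample)
```

With one mark per year, the weighted set contains seven values, four of which come from 2023, so any model fitted to it leans toward the most recent form.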
To estimate the marks that the athletes will achieve in each event, a Kernel Density Estimation (KDE) model is used. A separate model is fitted for every athlete, which makes it possible to estimate the probability density function of that athlete's marks or times.
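As a rough illustration of a per-athlete KDE, the sketch below uses a Gaussian kernel with a common rule-of-thumb bandwidth. The kernel, the bandwidth rule, the fallback value and the function names are all assumptions; the text does not state which KDE configuration or library was actually used.

```python
import math
import random
import statistics

def rule_of_thumb_bandwidth(marks, fallback=0.01):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    n = len(marks)
    if n < 2:
        return fallback
    sigma = statistics.stdev(marks)
    # Identical marks give sigma == 0; fall back to a small positive width.
    return 1.06 * sigma * n ** (-1 / 5) if sigma > 0 else fallback

def kde_pdf(x, marks, h):
    """Gaussian KDE density at x: the average of normal bumps centred on each mark."""
    norm = 1.0 / (len(marks) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - m) / h) ** 2) for m in marks)

def kde_sample(marks, h, rng):
    """Draw from the KDE: pick one observed mark, then add Gaussian noise of width h."""
    return rng.gauss(rng.choice(marks), h)

# Illustrative weighted 100 m times in seconds (not real data).
marks = [9.91, 9.91, 9.91, 9.91, 9.98, 9.98, 10.05]
h = rule_of_thumb_bandwidth(marks)
draw = kde_sample(marks, h, random.Random(0))
print(f"bandwidth={h:.4f}, density at 9.95={kde_pdf(9.95, marks, h):.3f}, draw={draw:.3f}")
```

Because repeated marks appear several times in the input, the density places more mass near recent results, which is how the weighting step feeds into the estimate.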
Subsequently, at least 10,000 simulations were carried out for each of the events, and a forecast was obtained from the most frequent outcomes. That is, to determine the order of the participants in the competition, the mode of the places in which each athlete finished is calculated. The athlete who finishes in first place most often is selected first, then the one who finishes in second place most often among the remaining athletes, and so on.
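The simulation and mode-based ordering can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the athlete names are hypothetical, simple Gaussian samplers stand in for the per-athlete KDE models, and ties are broken alphabetically, which the text does not specify.

```python
import random
from collections import Counter

def simulate_event(samplers, n_sims=10_000, lower_is_better=True, seed=0):
    """Run n_sims simulated finals and count how often each athlete takes each place."""
    rng = random.Random(seed)
    place_counts = {name: Counter() for name in samplers}
    for _ in range(n_sims):
        draws = {name: draw(rng) for name, draw in samplers.items()}
        # Times are ranked ascending; distances and heights descending.
        order = sorted(draws, key=draws.get, reverse=not lower_is_better)
        for place, name in enumerate(order, start=1):
            place_counts[name][place] += 1
    return place_counts

def forecast_order(place_counts):
    """For each place in turn, pick the remaining athlete who took it most often."""
    remaining = set(place_counts)
    order = []
    for place in range(1, len(place_counts) + 1):
        # sorted() makes tie-breaking deterministic (alphabetical, an assumption).
        best = max(sorted(remaining), key=lambda name: place_counts[name][place])
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical athletes; Gaussian samplers stand in for fitted KDE models.
samplers = {
    "Athlete A": lambda rng: rng.gauss(9.90, 0.05),
    "Athlete B": lambda rng: rng.gauss(10.00, 0.05),
    "Athlete C": lambda rng: rng.gauss(10.10, 0.05),
}
counts = simulate_event(samplers)
predicted = forecast_order(counts)
print(predicted)
```

For field events, where a larger mark is better, the same sketch applies with `lower_is_better=False`.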