A Statistical Model for Predicting the Billboard Hot 100 Year-End Lists

Background

The Billboard Hot 100 ranks song popularity every week using a formula that weights streaming, sales, and radio airplay. For each song, this formula yields a numerical popularity score called points. If a song appears on the weekly chart, its points for that week count toward the Year-End list, which is typically released in early December.
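The accumulation rule just described can be sketched minimally in Python: sum each song's weekly points over only the weeks it charted, then rank songs by the total. The function names (`year_end_points`, `year_end_ranking`) and the input layout are hypothetical. The layout uses one list entry per chart week, with `None` for weeks the song was off the chart. The song names and points values in the example are also made up. The sketch does not reproduce Billboard's actual formula, which is not public.

```python
from typing import Dict, List, Optional

def year_end_points(weekly_points: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Sum each song's weekly points, counting only weeks in which it charted.

    A value of None marks a week the song was off the weekly Hot 100, so that
    week contributes nothing to its Year-End total.
    """
    return {
        song: sum((p for p in weeks if p is not None), 0.0)
        for song, weeks in weekly_points.items()
    }

def year_end_ranking(totals: Dict[str, float]) -> List[str]:
    """Order songs from most to fewest accumulated points."""
    return sorted(totals, key=lambda song: totals[song], reverse=True)

# Hypothetical points for three songs over a three-week window.
weekly = {
    "Song A": [120.0, 95.5, None],
    "Song B": [None, 80.0, 150.0],
    "Song C": [60.0, None, None],
}
totals = year_end_points(weekly)
print(totals)                    # → {'Song A': 215.5, 'Song B': 230.0, 'Song C': 60.0}
print(year_end_ranking(totals))  # → ['Song B', 'Song A', 'Song C']
```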

The exact weights, both across and within consumption media, are kept secret; only Billboard insiders have access to that data. However, Twitter accounts such as Simon Falk and Talk of the Charts have been able to estimate points values with high accuracy on a weekly basis. Tabulating these points takes substantial time and effort, which makes it impractical for people with other commitments, such as myself. Instead of estimating the specific numbers each week, one can observe the published estimates and discern a distinct pattern in the data: a statistical distribution of sorts. By sampling from this distribution, we can estimate the number of points each song earns each week, then sum these estimates over the entire year to predict the Year-End list months in advance.
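The sampling idea can be sketched as a small Monte Carlo simulation. The model's actual distribution is not described on this page. For illustration only, this sketch assumes that each charting week's points are drawn from a normal distribution, clipped at zero. Its mean and standard deviation are set per song and per week, hypothetically estimated from published weekly figures. `simulate_year_totals` and its parameters are hypothetical names. The default of 50 trials mirrors the n = 50 trials mentioned in the Results.

```python
import random
import statistics
from typing import Dict, List, Tuple

def simulate_year_totals(
    weekly_params: Dict[str, List[Tuple[float, float]]],
    n_trials: int = 50,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Simulate each song's Year-End points n_trials times.

    `weekly_params` maps a song to one (mean, std_dev) pair per week it charted.
    ASSUMPTION: weekly points ~ Normal(mean, std_dev), clipped at zero.
    """
    rng = random.Random(seed)  # seeded so the trials are reproducible
    totals: Dict[str, List[float]] = {song: [] for song in weekly_params}
    for _ in range(n_trials):
        for song, weeks in weekly_params.items():
            # One simulated year: draw points for each charting week, then sum them.
            year = sum(max(0.0, rng.gauss(mu, sigma)) for mu, sigma in weeks)
            totals[song].append(year)
    return totals

# Hypothetical per-week (mean, std_dev) estimates for two songs.
params = {
    "Song A": [(120.0, 8.0), (115.0, 9.0), (108.0, 10.0)],
    "Song B": [(90.0, 15.0), (130.0, 12.0)],
}
sims = simulate_year_totals(params)
# The mean of each song's simulated totals serves as its point estimate.
point_estimates = {song: statistics.fmean(v) for song, v in sims.items()}
print(point_estimates)
```

The per-song lists of simulated totals are kept rather than only their means, so that percentiles of each song's Year-End points can be read off later.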

Methodology

The mathematical model will be added here at a later time.

Results

This model generalizes to any year's list for which data is available, and as long as Billboard keeps its website bug-free and its data freely accessible, that means any year. However, because the points formula changes frequently and streaming has only counted toward song popularity since 2013, a model trained on 2019-2020 data may not be reasonably accurate for a year such as 1969. Below, I report more extensive findings for the years I currently have online.

For 2021, I am happy to report that my model was quite accurate. I track three metrics of my own naming: domain, radius, and accuracy. Across n trials (n = 50 as of this report), these metrics are based on each song's ranking by mean points value, as well as the highest and lowest rankings the song could plausibly take. Those bounds are obtained by comparing the song's 0.05 quantile (5th-percentile) points value with the other songs' 0.95 quantile (95th-percentile) points values. This range is more informative than a single computed ranking, because the interval acknowledges the stochasticity of the data. Take, for instance, the best party song of the year (song of the year!). It unfortunately fell just short of the 2021 list (likely at position 102) and, caught between years, will not make the 2022 list either.

"Pepas" by Farruko earned enough points to place as high as #88 and as low as #108 on the 2021 list, according to the model. Its rank computed from mean points was #102.

Moment of silence. This absolute banger may not have made the Billboard list, but it will always rest in our hearts as a phenomenal track that would have taken the world's club scene by storm had it not been for a global pandemic!
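The rank interval described in the 2021 results can be sketched as follows. The input is each song's simulated Year-End totals, for example one value per trial. `rank_summary` is a hypothetical name, and the direction of the comparisons is an assumption. In the worst case, a song's 5th-percentile total is compared with every other song's 95th-percentile total. The best case mirrors this, comparing the song's 95th percentile with the others' 5th. The domain, radius, and accuracy metrics are not defined on this page, so the sketch does not compute them.

```python
import statistics
from typing import Dict, List, Tuple

def rank_summary(totals: Dict[str, List[float]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each song to (rank by mean, best plausible rank, worst plausible rank).

    `totals` maps a song to its simulated Year-End points, one value per trial
    (at least two trials are needed for the percentile estimates).
    ASSUMPTION: best/worst ranks use the 5th/95th-percentile comparisons below.
    """
    mean = {s: statistics.fmean(v) for s, v in totals.items()}
    p05: Dict[str, float] = {}
    p95: Dict[str, float] = {}
    for s, v in totals.items():
        # n=20 yields 19 cut points: the first is the 5th percentile, the last the 95th.
        cuts = statistics.quantiles(v, n=20, method="inclusive")
        p05[s], p95[s] = cuts[0], cuts[-1]

    summary: Dict[str, Tuple[int, int, int]] = {}
    for s in totals:
        others = [o for o in totals if o != s]
        by_mean = 1 + sum(mean[o] > mean[s] for o in others)
        # Best case: only songs whose low end still beats this song's high end rank above it.
        best = 1 + sum(p05[o] > p95[s] for o in others)
        # Worst case: any song whose high end beats this song's low end may rank above it.
        worst = 1 + sum(p95[o] > p05[s] for o in others)
        summary[s] = (by_mean, best, worst)
    return summary

# Hypothetical simulated totals for three songs over three trials.
example = {
    "Song A": [100.0, 110.0, 120.0],
    "Song B": [90.0, 105.0, 115.0],
    "Song C": [10.0, 20.0, 30.0],
}
print(rank_summary(example))
# → {'Song A': (1, 1, 2), 'Song B': (2, 1, 2), 'Song C': (3, 3, 3)}
```

Each tuple reads (rank by mean, best rank, worst rank). In that format, the model's 2021 result for "Pepas" would be (102, 88, 108).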

Now, consider another banger capturing the ethos of the (unfortunately absent) club scene: "Need to Know" by Doja Cat.

More on the findings for this song, as well as the hyperlink and methodology, will be placed here later.

Conclusion

Conclusion will be added here later.
