league-of-legends-analysis

eecs398 portfolio hw

View the Project on GitHub calvinchou71/league-of-legends-analysis

Prediction of the outcomes of League of Legends Matches after 10 minutes have passed.

by Calvin Chou (calchou@umich.edu)

Introduction

League of Legends is a video game, where the ultimate goal is to be on the winning team. This Portfolio aims to accurately predict the outcome of League of Legends matches as this is the most important variable in the game. The dataset used is from 2022 League of Legends matches for various leagues. It contains information on a number different variables from the game. There are 150180 rows and 161 columns. Given we want to predict the outcome of match, certain variables are more relevant to our prediction than others.

Our model will predict the result of a league of legends after 10 minutes of the match have passed. The most important feature columns are goldat10, xpat10, csat10, opp_goldat10, opp_xpat10, opp_csat10, golddiffat10, xpdiffat10, csdiffat10, killsat10, assistsat10, deathsat10, opp_killsat10, opp_assistsat10, and opp_deathat10.

Your opponent is the person on the other team in the same position as you. There are 5 roles in the game, top, jungle, mid, adc, and support.

All of these columns reveal information about the state of the game at the 10 minute mark in the game.


Cleaning and EDA

Cleaning

We seperated the dataset into two datasets. One where the dataset is filtered to contain rows for the entire team rather than a single player. This is useful when we predict the outcomes for an entire team. We then created a dataset for only the players to predict the outcomes for a single player.

We then fitlered the rows where the datacompletness column is equal to complete as the rows that aren’t complete didn’t were missing the values for our columns of interests.

Team Dataframe

print(df_clean_team[['position', 'champion', 'goldat10', 'xpat10', 'csat10', 'opp_goldat10', 'opp_xpat10', 'opp_csat10', 'golddiffat10', 'xpdiffat10', 'csdiffat10', 'killsat10', 'assistsat10', 'deathsat10', 'opp_killsat10', 'opp_assistsat10', 'opp_deathsat10']].head().to_markdown(index=False))
position champion goldat10 xpat10 csat10 opp_goldat10 opp_xpat10 opp_csat10 golddiffat10 xpdiffat10 csdiffat10 killsat10 assistsat10 deathsat10 opp_killsat10 opp_assistsat10 opp_deathsat10
team nan 16218 18213 322 14695 18076 330 1523 137 -8 3 5 0 0 0 3
team nan 14695 18076 330 16218 18213 322 -1523 -137 8 0 0 3 3 5 0
team nan 14939 17462 317 16558 19048 344 -1619 -1586 -27 1 1 3 3 3 1
team nan 16558 19048 344 14939 17462 317 1619 1586 27 3 3 1 1 1 3
team nan 15466 19600 368 15569 18787 355 -103 813 13 0 0 1 1 1 0

Player Data Frame

print(df_clean_player[['position', 'champion', 'goldat10', 'xpat10', 'csat10', 'opp_goldat10', 'opp_xpat10', 'opp_csat10', 'golddiffat10', 'xpdiffat10', 'csdiffat10', 'killsat10', 'assistsat10', 'deathsat10', 'opp_killsat10', 'opp_assistsat10', 'opp_deathsat10']].head().to_markdown(index=False))
position champion goldat10 xpat10 csat10 opp_goldat10 opp_xpat10 opp_csat10 golddiffat10 xpdiffat10 csdiffat10 killsat10 assistsat10 deathsat10 opp_killsat10 opp_assistsat10 opp_deathsat10
top Renekton 3228 4909 89 3176 4953 81 52 -44 8 0 0 0 0 0 0
jng Xin Zhao 3429 3484 58 2944 3052 63 485 432 -5 1 2 0 0 0 1
mid LeBlanc 3283 4556 81 3121 4485 81 162 71 0 0 1 0 0 0 1
bot Samira 3600 3103 78 3304 2838 90 296 265 -12 1 1 0 0 0 0
sup Leona 2678 2161 16 2150 2748 15 528 -587 1 1 1 0 0 0 1

We did not impute any values as we did not see a need to.

EDA

This plot shows the proportion of teams in each league that won a game condtionally on whether they had more kills, the same number of kills, or fewer kills 10 minutes into the game for some of the more popular leagues/competitions. This plot demonstrates the importance of having the greater or an equal amount of kills to the other team as it unlikely to win with fewer kills even at only 10 minutes into the game.

We know kills are very important on the outcome of a gaame from the previous plot, so we graphed the distribution of the number of kills at 10 minutes for eahc player. We see that almost all the values are 0 and 1, which tells us that kills are rare to come by, which makes them so important as in the game, it is probably the easiest way to gain an advantage.

We then created a scatter plot of number of kills against the gold differential at 10 minutes with mean highlighted and connected for each number of kills for players. We did this as we established kills are important, but now wanted to see how kills give players an advantage in terms of gold. When you get a kill 300 gold is gained. The general trend of the graph is that mean of points move upward as the number of kills increases. This tells us that the more kills one has the more likely they are to have a larger gold differential. It also tells us that a kill is more important in terms of gold than just the gold gained from the kill, but as well as gold gained from being able to gain more cs by being alive as the difference between the means for many of the nummber of kills is greater than 300.

This plot shows the top 5 most popular champions win percentage. We chose to show this as it interesting to see which champions win more often.

print(pivot_table.reset_index().head().to_markdown(index=False))
position Aatrox Ahri Akali Akshan Alistar Amumu Anivia Annie Aphelios Ashe Aurelion Sol Azir Bard Bel’Veth Blitzcrank Brand Braum Caitlyn Camille Cassiopeia Cho’Gath Corki Darius Diana Dr. Mundo Draven Ekko Elise Evelynn Ezreal Fiddlesticks Fiora Fizz Galio Gangplank Garen Gnar Gragas Graves Gwen Hecarim Heimerdinger Illaoi Irelia Ivern Janna Jarvan IV Jax Jayce Jhin Jinx K’Sante Kai’Sa Kalista Karma Karthus Kassadin Katarina Kayle Kayn Kennen Kha’Zix Kindred Kled Kog’Maw LeBlanc Lee Sin Leona Lillia Lissandra Lucian Lulu Lux Malphite Malzahar Maokai Master Yi Miss Fortune Mordekaiser Morgana Nami Nasus Nautilus Neeko Nidalee Nilah Nocturne Nunu & Willump Olaf Orianna Ornn Pantheon Poppy Pyke Qiyana Quinn Rakan Rammus Rek’Sai Rell Renata Glasc Renekton Rengar Riven Rumble Ryze Samira Sejuani Senna Seraphine Sett Shaco Shen Shyvana Singed Sion Sivir Skarner Sona Soraka Swain Sylas Syndra Tahm Kench Taliyah Talon Taric Teemo Thresh Tristana Trundle Tryndamere Twisted Fate Twitch Udyr Urgot Varus Vayne Veigar Vel’Koz Vex Vi Viego Viktor Vladimir Volibear Warwick Wukong Xayah Xerath Xin Zhao Yasuo Yone Yorick Yuumi Zac Zed Zeri Ziggs Zilean Zoe Zyra
bot -709 429 -679 nan nan nan nan nan 62.4877 -135.88 nan 270.667 nan nan -1113 354 57 241.325 nan -611 -157.125 -296 -695 nan -490 412.545 -134 nan nan -11.467 nan 141 nan 49 nan nan nan -1902 226 nan 138 -19.9333 nan 850 -801.667 nan 122 nan nan -88.3527 -22.9949 nan -24.424 314.243 -495.667 -183.333 nan nan nan nan nan nan 104.667 nan -68.625 nan -415.833 -1515.5 nan nan 109.501 -751.5 90.4 -2662 nan nan nan 4.75728 nan 320 -1722 -1017 -797 173.5 nan 60.0781 nan nan 527 nan -1310 nan -690 nan nan nan nan nan nan nan -1370 -222 -488 nan nan 875 -3.66346 -750.667 -577.366 -189.512 -148.333 nan nan nan nan -423 -80.7679 nan -601.5 -354.231 -172.483 816.667 -59.84 -308.789 -600.625 nan nan nan nan 147.952 nan nan -2012 -86.8564 nan nan 19.9159 -138.523 -4.54545 510 nan nan 556 -92.5 -18 nan nan -29.3333 -4.39118 -44.2222 211.5 -45.2037 nan nan -790 nan -308.333 -25.0483 -179.759 -170.667 nan nan
jng nan -315.5 nan nan nan 218 nan nan nan nan nan nan nan 262.796 -901.25 nan nan 852 nan nan nan nan 112 151.538 159.25 nan -334.667 154.222 -118.148 -111 -106.125 -330 nan nan 355 nan nan -69.7746 197.326 1.65532 135.432 nan 742 447 -348.133 -1165 -81.0894 188 nan -287 nan nan nan nan -301.667 308.068 nan nan nan 122.948 nan -30.6706 84.6054 nan nan nan -14.603 nan 105.486 nan nan 57 82.5 -262 nan -226.519 38.25 1304 -23.4737 86.7895 nan nan nan -147 325.316 nan 44.8746 -33.6667 89.7381 nan -779.667 27.6087 -16.4333 nan 4.48 nan nan -293.25 234.025 nan nan nan 183.829 637 318.806 nan nan -81.7827 nan 142 -342.25 99.4286 -547 36.8333 304 nan nan -94.7273 nan nan nan 69.4 nan nan 142.606 276.3 -1323 -204 nan -177 -141.605 83 nan nan 120.284 nan nan -99 nan -1329 -779 -134.845 48.6307 nan nan -40.6231 nan -91.1729 nan nan -61.3534 nan nan nan nan -94.1186 68.4211 349 nan nan nan nan
mid 329.208 9.80447 -39.6323 -18.0159 nan nan 75.7872 -30.25 411 nan -207 72.0499 nan nan nan 79 nan 299 -156.333 6.20339 26.5 38.6931 725 -203.1 nan -92.5 99.3 nan nan -34.3077 nan 446 -633.333 -184.26 221 nan -99 -275.353 7.16667 -268.667 -297 -27.7778 -521 117.315 -260.727 -1342 nan nan 57.3636 -439.667 nan nan -97.8824 452.5 -160.419 222.545 -163.782 -220.083 6.44444 -34.5 -740 nan nan 148.75 65.1786 110.781 -414.833 nan 231.4 -1.80976 109.252 -119.6 73.3684 -1110.8 -240.238 nan nan nan 308 120.375 nan -319 nan 232.211 nan nan -274 nan 440 -25.8687 -155.268 141 170 nan -140.706 -219 nan nan nan nan nan 100.01 nan -446 -41.6316 48.8855 nan -70.6667 -135 -135.732 50.8462 nan nan -1059.5 -376.286 -143.24 -203 nan -1586 -569.708 -71.6264 -60.873 101.493 nan -132.818 -381.6 nan -474 nan 131.329 -582 62.3884 211.588 nan 576 -229 -43.8947 473.8 -79.6157 61.74 -39.4055 50.8 136.333 -26.9747 -155.13 212 nan 731 529 -31.3109 -133 21.1776 -22.1804 nan -776 -719 17.88 -25.9744 -54.4565 -238.232 92.1548 nan
sup nan nan nan nan -52.7232 -18.061 nan nan nan 9.77011 nan 716 -95.5882 nan -36.6557 -328 -56.9706 126 465.333 nan 739.556 nan nan nan 1299 nan 1220 nan nan 1262 nan nan nan -10.9651 nan nan nan -102.217 nan nan nan 455.857 nan nan -400 -55.8137 29.25 246 nan nan nan nan nan nan 31.2162 nan nan nan nan nan 524 nan nan nan nan nan 1060.22 -19.6288 nan nan 489 -35.2473 198.714 nan nan -116.37 nan 669 367 188.63 -59.8039 -33.2308 -18.9847 236.667 nan nan nan nan nan nan 538.077 273.8 68 70.5627 nan nan -58.7082 nan nan -49.8445 -3.10013 nan nan nan 319.2 nan nan -45.619 300.177 336.528 138.787 -1669 44.9444 nan 181.188 804.348 834 nan -75.3402 7.7337 590.611 -102.857 20 145.877 60.5 nan -66.7473 nan -2.28297 nan 10.4 nan nan 194 nan nan 1290 -140 74.25 nan -988 27.6667 nan 420 926 nan nan 1115.56 nan -13.4 nan 753.643 1307 nan -123.409 -34 nan nan 164.333 -111.39 -46.5 181.8
top 80.5134 594.833 -50.1492 199.283 nan nan nan nan 268.5 nan nan -379 -1347.5 317.5 nan nan nan nan -60.8864 -189.5 -126 205.412 302.289 nan -99.2857 33 -693.333 509 nan nan nan 102.012 nan nan 117.65 -375.056 29.5949 -209.74 45.3809 -34.0412 nan 149.667 207.062 168.285 -1952.67 -1786.71 -115.167 -4.64655 138.237 nan nan -951 -584 160 -307.768 -98.25 nan nan -146.707 -590 89.6508 nan 1669 143.057 nan 805.5 93.8148 -967 146.056 489 251.545 -1076.25 -486 -339.165 nan -263.224 nan nan 70.9128 nan -734 -459.211 -59 568.571 -284 2148 214 nan 179.51 284.5 -173.129 -56.75 -46.0108 nan 724 -220.5 -2398 -363 nan nan nan 279.615 88.1667 85.5 293.446 44.6154 -193 -134.923 nan -651.333 107.074 -1914 -306.821 -213.007 -248.333 -204.596 nan nan nan -273.167 49.375 7.36667 nan -215.444 nan nan -1580.5 640.727 nan 961.3 82.0952 182.191 nan nan -389.462 -48.156 -763 157.412 -12 nan 263 18 7.64706 -141.909 16.6964 -34.7818 924 -137.748 nan nan 59.5 163.696 56.9346 146.231 nan -157.889 nan 174.15 358 78.1765 nan nan

This table shows the average gold diffferential at 10 minutes for each championn in each role. The nan’s mean the champion was never played in a given role. We chose to show this as it illustrates which champions are stronger early and give their team an advantange at this point in the game.

Framing a Prediction Problem

The variable we are trying to predict is the result column, which only has 2 possible values (1 and 0). 1 means the team won and 0 means the team lost. Therefore, the prediction problem is a binary classification problem. We chose the result as it is most important column as winning is the objective of the game.

We want to predict the result after the 10 minute mark of the game. Given this we will have all match information that has occured within the first 10 minutes. We chose 10 minutes as it was the a good point for predicting the result. We did not want to choose a later time as much of the result is already determined. We chose to use a logistic regression model as it works well on binary classification.

To test the efficacy of our model, we will use log loss and accuracy. We chose log loss as it works well on classification data because it penalizes confident misclassifications. We reported accuracy as it is an easy to interpret number.

Baseline Model

For our baseline model, we used a logistic regression model on goldat10, xpat10, and csat10. All these features are quantitative and did not require any encodings. These features describe how strong a champion is at the 10 minute mark. The more gold, more experience, and the more cs, the stronger the champion will be. (For team same logic applies)

The loss of the model was 0.6896490354686542 for players only. The accuracy of the model was 0.5186914054838583 for players only.

The loss of the model was 0.6214919289200207 for teams only. The accuracy of the model was 0.6552129912920687 for teams only.

The log loss for the player only data was 0.6896490354686542 and the team only data was 0.5186914054838583. The accuracy for the player only data was 0.5186914054838583 and the team only data was 0.6552129912920687.

The log loss isn’t that low. On a player by player basis our accuracy was essentially the same as random chance. We do expect the team only data to have a higher accuracy as it takes the aggreate of the entire team against the aggregate of the other team, so it should be more represenative of the game state. This model is not that good based on our results.

Final Model

For our final model, we used a logistic regression model on golddiffat10, xpdiffat10, csdiffat10, killsat10, assistsat10, deathsat10, opp_killsat10, opp_assistsat10. All these features are quantitative and did not require any encodings. The gold, xp, and cs features are the difference between a player and their opponent (team vs team for team data). This adds context to the previous gold, experience, and cs features we used from before. The more kills and assists is better, while the fewer deaths, opponent kills, and opponent assists. These features help us understand the game state and determine which team is doing better.

We added two features being the kills difference and assists difference. We added kills difference as it measures how many kills over their opponent they had, which is a better measure of purely kills. We added assists difference because in the game, multiple players can assist the same kill, which changes how the gold is distributed among the team. If more people assist the kill than the gold is distributed over more people. There are also some addditional more complicated calculations that can increase the amount of gold for an assist, but generally it is better if more players on your team get an assist as it is better to distribute the gold.

We again used a logistic regression model.

We selected hyperparameters using a GridSearch. We iterated over the C, the regularization parameter, for values of .5, 1, 2. For the tolerance, we iterated over .01, .001, .0001, and .00001.

Results

The loss of the model was 0.6428710526042916 for players only. The accuracy of the model was 0.6313890087474993 for players only.

Best Parameters: C=0.5, tol=0.01

Compared to the player only data from the base model, the loss is lower and the accuracy is higher, which tell us that the this model works better on the player data.

The loss of the model was 0.5712023541680418 for teams only. The accuracy of the model was 0.708166627441751 for teams only.

Best Parameters: C=0.5, tol=0.01

Compared to the team only data from the base model, the loss is lower and the accuracy is higher, which tell us that the this model works better on the team data as well.

Overall, the final model performs much better than the base model.