eecs398 portfolio hw
View the Project on GitHub calvinchou71/league-of-legends-analysis
by Calvin Chou (calchou@umich.edu)
League of Legends is a video game, where the ultimate goal is to be on the winning team. This Portfolio aims to accurately predict the outcome of League of Legends matches as this is the most important variable in the game. The dataset used is from 2022 League of Legends matches for various leagues. It contains information on a number different variables from the game. There are 150180 rows and 161 columns. Given we want to predict the outcome of match, certain variables are more relevant to our prediction than others.
Our model will predict the result of a league of legends after 10 minutes of the match have passed. The most important feature columns are goldat10, xpat10, csat10, opp_goldat10, opp_xpat10, opp_csat10, golddiffat10, xpdiffat10, csdiffat10, killsat10, assistsat10, deathsat10, opp_killsat10, opp_assistsat10, and opp_deathat10.
goldat10 is how much gold a player has at the 10 minute mark. Gold is used to buy items to make your character stronger.
xpat10 is how much xp a player has at the 10 minute mark. XP is used to level up your character, which allows you to improve your abilities and makes your character stronger.
csat10 is how many minions/monsters (cs) a player has killed at the 10 minute mark. cs is similiar to the amount of gold as getting cs gives you gold.
opp_goldat10, opp_xpat10, opp_csat10 is the amount of gold, xp, and cs the opponent has at the 10 minute mark.
Your opponent is the person on the other team in the same position as you. There are 5 roles in the game, top, jungle, mid, adc, and support.
golddiffat10, xpdiffat10, csdiffat10 is the difference between you and your opponent.
killsat10 is the number of kills you have at 10 minutes.
assistsat10 is the number of assists you have at 10 minutes. Assists are when you don’t kill a player on the opposing team, but help in the kill.
deathsat10 is the number of deaths you have at 10 minutes.
opp_killsat10, opp_assistsat10, and opp_deathsat10 are the number of kills, assists, and deaths for your opponent.
All of these columns reveal information about the state of the game at the 10 minute mark in the game.
We seperated the dataset into two datasets. One where the dataset is filtered to contain rows for the entire team rather than a single player. This is useful when we predict the outcomes for an entire team. We then created a dataset for only the players to predict the outcomes for a single player.
We then fitlered the rows where the datacompletness column is equal to complete as the rows that aren’t complete didn’t were missing the values for our columns of interests.
Team Dataframe
print(df_clean_team[['position', 'champion', 'goldat10', 'xpat10', 'csat10', 'opp_goldat10', 'opp_xpat10', 'opp_csat10', 'golddiffat10', 'xpdiffat10', 'csdiffat10', 'killsat10', 'assistsat10', 'deathsat10', 'opp_killsat10', 'opp_assistsat10', 'opp_deathsat10']].head().to_markdown(index=False))
position | champion | goldat10 | xpat10 | csat10 | opp_goldat10 | opp_xpat10 | opp_csat10 | golddiffat10 | xpdiffat10 | csdiffat10 | killsat10 | assistsat10 | deathsat10 | opp_killsat10 | opp_assistsat10 | opp_deathsat10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
team | nan | 16218 | 18213 | 322 | 14695 | 18076 | 330 | 1523 | 137 | -8 | 3 | 5 | 0 | 0 | 0 | 3 |
team | nan | 14695 | 18076 | 330 | 16218 | 18213 | 322 | -1523 | -137 | 8 | 0 | 0 | 3 | 3 | 5 | 0 |
team | nan | 14939 | 17462 | 317 | 16558 | 19048 | 344 | -1619 | -1586 | -27 | 1 | 1 | 3 | 3 | 3 | 1 |
team | nan | 16558 | 19048 | 344 | 14939 | 17462 | 317 | 1619 | 1586 | 27 | 3 | 3 | 1 | 1 | 1 | 3 |
team | nan | 15466 | 19600 | 368 | 15569 | 18787 | 355 | -103 | 813 | 13 | 0 | 0 | 1 | 1 | 1 | 0 |
Player Data Frame
print(df_clean_player[['position', 'champion', 'goldat10', 'xpat10', 'csat10', 'opp_goldat10', 'opp_xpat10', 'opp_csat10', 'golddiffat10', 'xpdiffat10', 'csdiffat10', 'killsat10', 'assistsat10', 'deathsat10', 'opp_killsat10', 'opp_assistsat10', 'opp_deathsat10']].head().to_markdown(index=False))
position | champion | goldat10 | xpat10 | csat10 | opp_goldat10 | opp_xpat10 | opp_csat10 | golddiffat10 | xpdiffat10 | csdiffat10 | killsat10 | assistsat10 | deathsat10 | opp_killsat10 | opp_assistsat10 | opp_deathsat10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
top | Renekton | 3228 | 4909 | 89 | 3176 | 4953 | 81 | 52 | -44 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
jng | Xin Zhao | 3429 | 3484 | 58 | 2944 | 3052 | 63 | 485 | 432 | -5 | 1 | 2 | 0 | 0 | 0 | 1 |
mid | LeBlanc | 3283 | 4556 | 81 | 3121 | 4485 | 81 | 162 | 71 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
bot | Samira | 3600 | 3103 | 78 | 3304 | 2838 | 90 | 296 | 265 | -12 | 1 | 1 | 0 | 0 | 0 | 0 |
sup | Leona | 2678 | 2161 | 16 | 2150 | 2748 | 15 | 528 | -587 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
We did not impute any values as we did not see a need to.
This plot shows the proportion of teams in each league that won a game condtionally on whether they had more kills, the same number of kills, or fewer kills 10 minutes into the game for some of the more popular leagues/competitions. This plot demonstrates the importance of having the greater or an equal amount of kills to the other team as it unlikely to win with fewer kills even at only 10 minutes into the game.
We know kills are very important on the outcome of a gaame from the previous plot, so we graphed the distribution of the number of kills at 10 minutes for eahc player. We see that almost all the values are 0 and 1, which tells us that kills are rare to come by, which makes them so important as in the game, it is probably the easiest way to gain an advantage.
We then created a scatter plot of number of kills against the gold differential at 10 minutes with mean highlighted and connected for each number of kills for players. We did this as we established kills are important, but now wanted to see how kills give players an advantage in terms of gold. When you get a kill 300 gold is gained. The general trend of the graph is that mean of points move upward as the number of kills increases. This tells us that the more kills one has the more likely they are to have a larger gold differential. It also tells us that a kill is more important in terms of gold than just the gold gained from the kill, but as well as gold gained from being able to gain more cs by being alive as the difference between the means for many of the nummber of kills is greater than 300.
This plot shows the top 5 most popular champions win percentage. We chose to show this as it interesting to see which champions win more often.
print(pivot_table.reset_index().head().to_markdown(index=False))
position | Aatrox | Ahri | Akali | Akshan | Alistar | Amumu | Anivia | Annie | Aphelios | Ashe | Aurelion Sol | Azir | Bard | Bel’Veth | Blitzcrank | Brand | Braum | Caitlyn | Camille | Cassiopeia | Cho’Gath | Corki | Darius | Diana | Dr. Mundo | Draven | Ekko | Elise | Evelynn | Ezreal | Fiddlesticks | Fiora | Fizz | Galio | Gangplank | Garen | Gnar | Gragas | Graves | Gwen | Hecarim | Heimerdinger | Illaoi | Irelia | Ivern | Janna | Jarvan IV | Jax | Jayce | Jhin | Jinx | K’Sante | Kai’Sa | Kalista | Karma | Karthus | Kassadin | Katarina | Kayle | Kayn | Kennen | Kha’Zix | Kindred | Kled | Kog’Maw | LeBlanc | Lee Sin | Leona | Lillia | Lissandra | Lucian | Lulu | Lux | Malphite | Malzahar | Maokai | Master Yi | Miss Fortune | Mordekaiser | Morgana | Nami | Nasus | Nautilus | Neeko | Nidalee | Nilah | Nocturne | Nunu & Willump | Olaf | Orianna | Ornn | Pantheon | Poppy | Pyke | Qiyana | Quinn | Rakan | Rammus | Rek’Sai | Rell | Renata Glasc | Renekton | Rengar | Riven | Rumble | Ryze | Samira | Sejuani | Senna | Seraphine | Sett | Shaco | Shen | Shyvana | Singed | Sion | Sivir | Skarner | Sona | Soraka | Swain | Sylas | Syndra | Tahm Kench | Taliyah | Talon | Taric | Teemo | Thresh | Tristana | Trundle | Tryndamere | Twisted Fate | Twitch | Udyr | Urgot | Varus | Vayne | Veigar | Vel’Koz | Vex | Vi | Viego | Viktor | Vladimir | Volibear | Warwick | Wukong | Xayah | Xerath | Xin Zhao | Yasuo | Yone | Yorick | Yuumi | Zac | Zed | Zeri | Ziggs | Zilean | Zoe | Zyra |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bot | -709 | 429 | -679 | nan | nan | nan | nan | nan | 62.4877 | -135.88 | nan | 270.667 | nan | nan | -1113 | 354 | 57 | 241.325 | nan | -611 | -157.125 | -296 | -695 | nan | -490 | 412.545 | -134 | nan | nan | -11.467 | nan | 141 | nan | 49 | nan | nan | nan | -1902 | 226 | nan | 138 | -19.9333 | nan | 850 | -801.667 | nan | 122 | nan | nan | -88.3527 | -22.9949 | nan | -24.424 | 314.243 | -495.667 | -183.333 | nan | nan | nan | nan | nan | nan | 104.667 | nan | -68.625 | nan | -415.833 | -1515.5 | nan | nan | 109.501 | -751.5 | 90.4 | -2662 | nan | nan | nan | 4.75728 | nan | 320 | -1722 | -1017 | -797 | 173.5 | nan | 60.0781 | nan | nan | 527 | nan | -1310 | nan | -690 | nan | nan | nan | nan | nan | nan | nan | -1370 | -222 | -488 | nan | nan | 875 | -3.66346 | -750.667 | -577.366 | -189.512 | -148.333 | nan | nan | nan | nan | -423 | -80.7679 | nan | -601.5 | -354.231 | -172.483 | 816.667 | -59.84 | -308.789 | -600.625 | nan | nan | nan | nan | 147.952 | nan | nan | -2012 | -86.8564 | nan | nan | 19.9159 | -138.523 | -4.54545 | 510 | nan | nan | 556 | -92.5 | -18 | nan | nan | -29.3333 | -4.39118 | -44.2222 | 211.5 | -45.2037 | nan | nan | -790 | nan | -308.333 | -25.0483 | -179.759 | -170.667 | nan | nan |
jng | nan | -315.5 | nan | nan | nan | 218 | nan | nan | nan | nan | nan | nan | nan | 262.796 | -901.25 | nan | nan | 852 | nan | nan | nan | nan | 112 | 151.538 | 159.25 | nan | -334.667 | 154.222 | -118.148 | -111 | -106.125 | -330 | nan | nan | 355 | nan | nan | -69.7746 | 197.326 | 1.65532 | 135.432 | nan | 742 | 447 | -348.133 | -1165 | -81.0894 | 188 | nan | -287 | nan | nan | nan | nan | -301.667 | 308.068 | nan | nan | nan | 122.948 | nan | -30.6706 | 84.6054 | nan | nan | nan | -14.603 | nan | 105.486 | nan | nan | 57 | 82.5 | -262 | nan | -226.519 | 38.25 | 1304 | -23.4737 | 86.7895 | nan | nan | nan | -147 | 325.316 | nan | 44.8746 | -33.6667 | 89.7381 | nan | -779.667 | 27.6087 | -16.4333 | nan | 4.48 | nan | nan | -293.25 | 234.025 | nan | nan | nan | 183.829 | 637 | 318.806 | nan | nan | -81.7827 | nan | 142 | -342.25 | 99.4286 | -547 | 36.8333 | 304 | nan | nan | -94.7273 | nan | nan | nan | 69.4 | nan | nan | 142.606 | 276.3 | -1323 | -204 | nan | -177 | -141.605 | 83 | nan | nan | 120.284 | nan | nan | -99 | nan | -1329 | -779 | -134.845 | 48.6307 | nan | nan | -40.6231 | nan | -91.1729 | nan | nan | -61.3534 | nan | nan | nan | nan | -94.1186 | 68.4211 | 349 | nan | nan | nan | nan |
mid | 329.208 | 9.80447 | -39.6323 | -18.0159 | nan | nan | 75.7872 | -30.25 | 411 | nan | -207 | 72.0499 | nan | nan | nan | 79 | nan | 299 | -156.333 | 6.20339 | 26.5 | 38.6931 | 725 | -203.1 | nan | -92.5 | 99.3 | nan | nan | -34.3077 | nan | 446 | -633.333 | -184.26 | 221 | nan | -99 | -275.353 | 7.16667 | -268.667 | -297 | -27.7778 | -521 | 117.315 | -260.727 | -1342 | nan | nan | 57.3636 | -439.667 | nan | nan | -97.8824 | 452.5 | -160.419 | 222.545 | -163.782 | -220.083 | 6.44444 | -34.5 | -740 | nan | nan | 148.75 | 65.1786 | 110.781 | -414.833 | nan | 231.4 | -1.80976 | 109.252 | -119.6 | 73.3684 | -1110.8 | -240.238 | nan | nan | nan | 308 | 120.375 | nan | -319 | nan | 232.211 | nan | nan | -274 | nan | 440 | -25.8687 | -155.268 | 141 | 170 | nan | -140.706 | -219 | nan | nan | nan | nan | nan | 100.01 | nan | -446 | -41.6316 | 48.8855 | nan | -70.6667 | -135 | -135.732 | 50.8462 | nan | nan | -1059.5 | -376.286 | -143.24 | -203 | nan | -1586 | -569.708 | -71.6264 | -60.873 | 101.493 | nan | -132.818 | -381.6 | nan | -474 | nan | 131.329 | -582 | 62.3884 | 211.588 | nan | 576 | -229 | -43.8947 | 473.8 | -79.6157 | 61.74 | -39.4055 | 50.8 | 136.333 | -26.9747 | -155.13 | 212 | nan | 731 | 529 | -31.3109 | -133 | 21.1776 | -22.1804 | nan | -776 | -719 | 17.88 | -25.9744 | -54.4565 | -238.232 | 92.1548 | nan |
sup | nan | nan | nan | nan | -52.7232 | -18.061 | nan | nan | nan | 9.77011 | nan | 716 | -95.5882 | nan | -36.6557 | -328 | -56.9706 | 126 | 465.333 | nan | 739.556 | nan | nan | nan | 1299 | nan | 1220 | nan | nan | 1262 | nan | nan | nan | -10.9651 | nan | nan | nan | -102.217 | nan | nan | nan | 455.857 | nan | nan | -400 | -55.8137 | 29.25 | 246 | nan | nan | nan | nan | nan | nan | 31.2162 | nan | nan | nan | nan | nan | 524 | nan | nan | nan | nan | nan | 1060.22 | -19.6288 | nan | nan | 489 | -35.2473 | 198.714 | nan | nan | -116.37 | nan | 669 | 367 | 188.63 | -59.8039 | -33.2308 | -18.9847 | 236.667 | nan | nan | nan | nan | nan | nan | 538.077 | 273.8 | 68 | 70.5627 | nan | nan | -58.7082 | nan | nan | -49.8445 | -3.10013 | nan | nan | nan | 319.2 | nan | nan | -45.619 | 300.177 | 336.528 | 138.787 | -1669 | 44.9444 | nan | 181.188 | 804.348 | 834 | nan | -75.3402 | 7.7337 | 590.611 | -102.857 | 20 | 145.877 | 60.5 | nan | -66.7473 | nan | -2.28297 | nan | 10.4 | nan | nan | 194 | nan | nan | 1290 | -140 | 74.25 | nan | -988 | 27.6667 | nan | 420 | 926 | nan | nan | 1115.56 | nan | -13.4 | nan | 753.643 | 1307 | nan | -123.409 | -34 | nan | nan | 164.333 | -111.39 | -46.5 | 181.8 |
top | 80.5134 | 594.833 | -50.1492 | 199.283 | nan | nan | nan | nan | 268.5 | nan | nan | -379 | -1347.5 | 317.5 | nan | nan | nan | nan | -60.8864 | -189.5 | -126 | 205.412 | 302.289 | nan | -99.2857 | 33 | -693.333 | 509 | nan | nan | nan | 102.012 | nan | nan | 117.65 | -375.056 | 29.5949 | -209.74 | 45.3809 | -34.0412 | nan | 149.667 | 207.062 | 168.285 | -1952.67 | -1786.71 | -115.167 | -4.64655 | 138.237 | nan | nan | -951 | -584 | 160 | -307.768 | -98.25 | nan | nan | -146.707 | -590 | 89.6508 | nan | 1669 | 143.057 | nan | 805.5 | 93.8148 | -967 | 146.056 | 489 | 251.545 | -1076.25 | -486 | -339.165 | nan | -263.224 | nan | nan | 70.9128 | nan | -734 | -459.211 | -59 | 568.571 | -284 | 2148 | 214 | nan | 179.51 | 284.5 | -173.129 | -56.75 | -46.0108 | nan | 724 | -220.5 | -2398 | -363 | nan | nan | nan | 279.615 | 88.1667 | 85.5 | 293.446 | 44.6154 | -193 | -134.923 | nan | -651.333 | 107.074 | -1914 | -306.821 | -213.007 | -248.333 | -204.596 | nan | nan | nan | -273.167 | 49.375 | 7.36667 | nan | -215.444 | nan | nan | -1580.5 | 640.727 | nan | 961.3 | 82.0952 | 182.191 | nan | nan | -389.462 | -48.156 | -763 | 157.412 | -12 | nan | 263 | 18 | 7.64706 | -141.909 | 16.6964 | -34.7818 | 924 | -137.748 | nan | nan | 59.5 | 163.696 | 56.9346 | 146.231 | nan | -157.889 | nan | 174.15 | 358 | 78.1765 | nan | nan |
This table shows the average gold diffferential at 10 minutes for each championn in each role. The nan’s mean the champion was never played in a given role. We chose to show this as it illustrates which champions are stronger early and give their team an advantange at this point in the game.
The variable we are trying to predict is the result column, which only has 2 possible values (1 and 0). 1 means the team won and 0 means the team lost. Therefore, the prediction problem is a binary classification problem. We chose the result as it is most important column as winning is the objective of the game.
We want to predict the result after the 10 minute mark of the game. Given this we will have all match information that has occured within the first 10 minutes. We chose 10 minutes as it was the a good point for predicting the result. We did not want to choose a later time as much of the result is already determined. We chose to use a logistic regression model as it works well on binary classification.
To test the efficacy of our model, we will use log loss and accuracy. We chose log loss as it works well on classification data because it penalizes confident misclassifications. We reported accuracy as it is an easy to interpret number.
For our baseline model, we used a logistic regression model on goldat10, xpat10, and csat10. All these features are quantitative and did not require any encodings. These features describe how strong a champion is at the 10 minute mark. The more gold, more experience, and the more cs, the stronger the champion will be. (For team same logic applies)
The loss of the model was 0.6896490354686542 for players only. The accuracy of the model was 0.5186914054838583 for players only.
The loss of the model was 0.6214919289200207 for teams only. The accuracy of the model was 0.6552129912920687 for teams only.
The log loss for the player only data was 0.6896490354686542 and the team only data was 0.5186914054838583. The accuracy for the player only data was 0.5186914054838583 and the team only data was 0.6552129912920687.
The log loss isn’t that low. On a player by player basis our accuracy was essentially the same as random chance. We do expect the team only data to have a higher accuracy as it takes the aggreate of the entire team against the aggregate of the other team, so it should be more represenative of the game state. This model is not that good based on our results.
For our final model, we used a logistic regression model on golddiffat10, xpdiffat10, csdiffat10, killsat10, assistsat10, deathsat10, opp_killsat10, opp_assistsat10. All these features are quantitative and did not require any encodings. The gold, xp, and cs features are the difference between a player and their opponent (team vs team for team data). This adds context to the previous gold, experience, and cs features we used from before. The more kills and assists is better, while the fewer deaths, opponent kills, and opponent assists. These features help us understand the game state and determine which team is doing better.
We added two features being the kills difference and assists difference. We added kills difference as it measures how many kills over their opponent they had, which is a better measure of purely kills. We added assists difference because in the game, multiple players can assist the same kill, which changes how the gold is distributed among the team. If more people assist the kill than the gold is distributed over more people. There are also some addditional more complicated calculations that can increase the amount of gold for an assist, but generally it is better if more players on your team get an assist as it is better to distribute the gold.
We again used a logistic regression model.
We selected hyperparameters using a GridSearch. We iterated over the C, the regularization parameter, for values of .5, 1, 2. For the tolerance, we iterated over .01, .001, .0001, and .00001.
The loss of the model was 0.6428710526042916 for players only. The accuracy of the model was 0.6313890087474993 for players only.
Best Parameters: C=0.5, tol=0.01
Compared to the player only data from the base model, the loss is lower and the accuracy is higher, which tell us that the this model works better on the player data.
The loss of the model was 0.5712023541680418 for teams only. The accuracy of the model was 0.708166627441751 for teams only.
Best Parameters: C=0.5, tol=0.01
Compared to the team only data from the base model, the loss is lower and the accuracy is higher, which tell us that the this model works better on the team data as well.
Overall, the final model performs much better than the base model.