2023-08-19
I got the idea for this project from a previous project I did in my STAT 632 class, in which I tried to model the relationship between Points Per Game and several other predictors.
For this project, I am interested in forecasting wins and losses (W/L), the response variable, from three predictors: team points, opponent points, and games played.
To start, I did a brief exploratory analysis in R.
Fig 1. Plot of Games versus Team points and Opponent points. For most games, the blue curve (Lakers' score) was higher than the yellow curve (opponents' score).
Here is the head of the dataset.
| Games | Team Score | Opponent Score | Wins/Losses |
|---|---|---|---|
| 1 | 102 | 112 | L |
| 2 | 95 | 86 | W |
| 3 | 120 | 101 | W |
| 4 | 120 | 91 | W |
| 5 | 102 | 112 | L |
I split the data 80/20 into training and test sets.
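As a rough illustration of what an 80/20 split does, here is a minimal sketch in standard-library Python. It is not the R code used for the analysis. The helper name `train_test_split`, the seed, and the use of only the five rows from the table above are assumptions made for the example.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# The five rows shown in the table above:
# (game, team score, opponent score, result)
games = [
    (1, 102, 112, "L"),
    (2, 95, 86, "W"),
    (3, 120, 101, "W"),
    (4, 120, 91, "W"),
    (5, 102, 112, "L"),
]

train, test = train_test_split(games)
print(len(train), "training rows,", len(test), "test rows")
```

With five rows, this puts four in the training set and one in the test set. On the full dataset the same proportions apply.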
I decided to test four different models to see which one was most accurate for my purposes. The dotplot comparing them showed that LDA (linear discriminant analysis) would be the best model for predicting W/L.
Fig 2. Analysis of the most effective model.
\[ LD1 = -0.033\,G + 1.4956\,Tm - 1.431\,Opp \]
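To get a feel for the discriminant function, the sketch below evaluates the raw linear combination on the five games from the table above. It assumes that G is the game number, Tm the team score, and Opp the opponent score. It is written in Python for illustration and is not the original R code. LDA software typically centers these scores, and the cutoff between W and L depends on group means and priors that are not given here, so only the relative ordering of the scores is meaningful.

```python
def ld1(game, team, opp):
    """Raw LD1 linear combination using the coefficients reported above."""
    return -0.033 * game + 1.4956 * team - 1.431 * opp

# (game, team score, opponent score, result) from the head of the dataset
rows = [
    (1, 102, 112, "L"),
    (2, 95, 86, "W"),
    (3, 120, 101, "W"),
    (4, 120, 91, "W"),
    (5, 102, 112, "L"),
]

for game, team, opp, result in rows:
    print(game, result, round(ld1(game, team, opp), 2))

# Collect the scores by outcome to compare the two groups
win_scores = [ld1(g, t, o) for g, t, o, r in rows if r == "W"]
loss_scores = [ld1(g, t, o) for g, t, o, r in rows if r == "L"]
```

On these five rows, every win gets a higher raw score than every loss. That is expected, because the team-score coefficient is positive and the opponent-score coefficient is negative.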
Since I already knew the game numbers of the missing games, the next step was to obtain a team score and an opponent score for each of them based on the data.
Here are the first few rows of the predicted scores and W/L.
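One possible way to generate scores for unplayed games is to draw them from a normal distribution fitted to the observed scores. The sketch below shows that idea in Python. It is only an assumed approach: the write-up does not say how the scores were actually generated. The function name `simulate_scores`, the seed, and the game range 64 to 82 (inferred from the prediction table and the 82-game season) are all illustrative.

```python
import random
import statistics

def simulate_scores(observed, n_games, seed=0):
    """Draw n_games scores from a normal distribution with the
    mean and standard deviation of the observed scores."""
    rng = random.Random(seed)
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [round(rng.gauss(mu, sigma)) for _ in range(n_games)]

# Observed scores from the head of the dataset (the real analysis
# would use every game played so far)
team_scores = [102, 95, 120, 120, 102]
opp_scores = [112, 86, 101, 91, 112]

missing_games = list(range(64, 83))  # assumed: games 64 through 82
sim_team = simulate_scores(team_scores, len(missing_games), seed=1)
sim_opp = simulate_scores(opp_scores, len(missing_games), seed=2)

for game, team, opp in zip(missing_games, sim_team, sim_opp):
    print(game, team, opp)
```

The simulated pairs can then be passed to the fitted classifier to get a predicted W/L for each missing game.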
| G | Predicted Team Score | Predicted Opponent Score | Predicted W/L |
|---|---|---|---|
| 64 | 109 | 134 | W |
| 65 | 93 | 102 | L |
| 66 | 119 | 97 | L |
| 67 | 97 | 114 | W |
From the simulation, I was able to conclude that had the Lakers played all 82 games, they would have won 55 games and lost 27, for a final win percentage of 67% and a loss percentage of 33%.
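The percentages follow directly from the win and loss counts. Here is a quick check in Python, using only the numbers stated above:

```python
wins, losses = 55, 27
total = wins + losses

# Convert the counts to percentages of the full season
win_pct = 100 * wins / total
loss_pct = 100 * losses / total

print(f"{total} games: {win_pct:.0f}% wins, {loss_pct:.0f}% losses")
```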