Bootstrapping a Sport-Betting Assistant: Part II

Dan Gray

2021/01/31

Preamble

With the source data defined, and the majority of the pre-processing completed, a few final steps remain before the data can be used as model input.

Motivation

The pre-processed dataset contains observations for both outcomes of every event, stored as individual rows in a single table. The machine learning models we employ should see the complete data for an event, i.e. a single row that holds both fighters and one of the two observed outcomes (the other observation is simply the reversed representation of its partner row).

Self Join Logic

We first need to bind the fighter IDs to the results data. This is done for both the Fighter and the Opponent.

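library(dplyr)

# Cast rowid to integer before joining on it.
wc %>% mutate(rowid = as.integer(rowid)) -> wx

# Rename the lookup's Opponent column to Fighter, then join it onto the results by rowid.
div_lightweights %>% rename(Fighter = Opponent) -> div_lightweights

wx %>% left_join(div_lightweights, by = "rowid") -> wx

# The join suffixes the clashing Fighter columns; rename them to Fighter and Opponent.
wx %>% rename(Fighter = Fighter.y, Opponent = Fighter.x) -> wx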

And once again for the opponent.

div_lightweights %>% rename(Opponent = Fighter) -> div_lightweights

wx %>% left_join(div_lightweights,by="Opponent") -> wx

wx %>% rename(FighterId = rowid.x, OpponentId = rowid.y) -> wx

We can remove the columns we no longer need.

wx %>% select(-fighterName.x,-fighterName.y) -> wx
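
As a minimal sketch of the same idea on toy data, the two lookups can also be written without renaming the lookup table back and forth. The objects `fights` and `ids`, and their column names, are illustrative assumptions and not the structures used above.

```r
library(dplyr)

# Toy stand-ins for the real objects; all names and values are illustrative.
fights <- tibble(
  Fighter  = c("A", "B", "A"),
  Opponent = c("B", "A", "C"),
  Result   = c("win", "loss", "win")
)

# Lookup table of fighter names to IDs (fighter "C" is deliberately absent).
ids <- tibble(Name = c("A", "B"), Id = c(1L, 2L))

fights_with_ids <- fights %>%
  # Attach the fighter's own ID.
  left_join(ids, by = c("Fighter" = "Name")) %>%
  rename(FighterId = Id) %>%
  # Attach the opponent's ID; opponents missing from the lookup get NA.
  left_join(ids, by = c("Opponent" = "Name")) %>%
  rename(OpponentId = Id)

print(fights_with_ids)
```

Opponents that are absent from the lookup end up with an `NA` ID, which is the same situation that produces the unmatched rows later on.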

Self Join via Composite Key

We can combine the Fighter/Opponent IDs, the Fighter/Opponent names and the event date (DateEvent) into a composite key, and then use that key to join the table onto itself so that each row is matched with its mirrored record.

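library(stringr)

# Build one key from each side of the fight: a row's FighterComposite
# equals the OpponentComposite of its mirrored row.
wx %>% mutate(FighterComposite = str_c(FighterId, Opponent, DateEvent, sep = "_"),
              OpponentComposite = str_c(OpponentId, Fighter, DateEvent, sep = "_")) -> wx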

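Make a copy of the object and join the table to it, matching each row's FighterComposite against the copy's OpponentComposite.

wx -> wy
wx %>% left_join(wy, by = c("FighterComposite" = "OpponentComposite")) -> resultset

The first rows of resultset:
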
X FighterId.x Result.x Opponent.x Event.x Round.x Time.x Method.x DateEvent.x Outcome.x PreviousOutcome.x start.x streak_id.x streak.x StreakLength.x WinRatio.x CountFights.x FinishRatio.x FinishPrevious.x FinishedRatio.x FinishedPrevious.x FinishedCount.x FinishCount.x DamageDiff.x Fighter.x OpponentId.x FighterComposite OpponentComposite FighterId.y Result.y Opponent.y Event.y Round.y Time.y Method.y DateEvent.y Outcome.y PreviousOutcome.y start.y streak_id.y streak.y StreakLength.y WinRatio.y CountFights.y FinishRatio.y FinishPrevious.y FinishedRatio.y FinishedPrevious.y FinishedCount.y FinishCount.y DamageDiff.y Fighter.y OpponentId.y FighterComposite.y
1 1 win Kevin Croom ROF 41 - Bragging Rights 1 1:01 KO (Slam) 2011-08-20 1 NA TRUE 1 1 NA NA 0 NA NA NA NA 0 0 0 Justin Gaethje NA 1_Kevin Croom_2011-08-20 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 1 win Joe Kelso BTT MMA 2 - Genesis 1 4:32 TKO (Punches) 2011-10-01 1 1 FALSE 1 2 1 1 1 1.00 TRUE 0 FALSE 0 1 1 Justin Gaethje NA 1_Joe Kelso_2011-10-01 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 1 win Donnie Bell ROF 42 - Who’s Next 2 2:57 TKO (Punches) 2011-12-17 1 1 FALSE 1 3 2 1 2 1.00 TRUE 0 FALSE 0 2 2 Justin Gaethje 556 1_Donnie Bell_2011-12-17 556_Justin Gaethje_2011-12-17 556 loss Justin Gaethje ROF 42 - Who’s Next 2 2:57 TKO (Punches) 2011-12-17 0 1 TRUE 2 1 2 1.0 2 0.5000000 FALSE 0.0000000 FALSE 0 1 1.000000 Donnie Bell 1 556_Justin Gaethje_2011-12-17
4 1 win Marcus Edwards ROF 43 - Bad Blood 3 5:00 Decision (Unanimous) 2012-06-02 1 1 FALSE 1 4 3 1 3 1.00 TRUE 0 FALSE 0 3 3 Justin Gaethje NA 1_Marcus Edwards_2012-06-02 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5 1 win Sam Young RITC - Rage in the Cage 162 2 1:58 Submission (Rear-Naked Choke) 2012-09-29 1 1 FALSE 1 5 4 1 4 0.75 FALSE 0 FALSE 0 3 3 Justin Gaethje NA 1_Sam Young_2012-09-29 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6 1 win Drew Fickett RITC - Rage in the Cage 163 1 0:12 KO (Punch) 2012-10-20 1 1 FALSE 1 6 5 1 5 0.80 TRUE 0 FALSE 0 4 4 Justin Gaethje 382 1_Drew Fickett_2012-10-20 382_Justin Gaethje_2012-10-20 382 loss Justin Gaethje RITC - Rage in the Cage 163 1 0:12 KO (Punch) 2012-10-20 0 0 FALSE 20 2 -1 0.7 60 0.5666667 FALSE 0.2333333 TRUE 14 34 2.266667 Drew Fickett 1 382_Justin Gaethje_2012-10-20
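
The composite-key self join can be reproduced on a small toy table to see why some rows come back filled with NA. This is only a sketch: the table `toy` and its values are made up, and only the column names follow the ones used above.

```r
library(dplyr)
library(stringr)

# Illustrative data: the A-vs-B fight appears from both sides,
# while A-vs-C has no mirrored row because C has no ID.
toy <- tibble(
  FighterId  = c(1L, 2L, 1L),
  Fighter    = c("A", "B", "A"),
  OpponentId = c(2L, 1L, NA),
  Opponent   = c("B", "A", "C"),
  DateEvent  = c("2020-01-01", "2020-01-01", "2020-06-01"),
  Result     = c("win", "loss", "win")
)

# str_c() returns NA when any of its inputs is NA,
# so the third row gets no OpponentComposite.
keyed <- toy %>%
  mutate(
    FighterComposite  = str_c(FighterId, Opponent, DateEvent, sep = "_"),
    OpponentComposite = str_c(OpponentId, Fighter, DateEvent, sep = "_")
  )

# Match each row with its mirrored row from the same table.
paired <- keyed %>%
  left_join(keyed, by = c("FighterComposite" = "OpponentComposite"))

print(paired %>% select(Fighter.x, Result.x, Fighter.y, Result.y))
```

The first two rows pair up with each other, and the third keeps its own data but has NA in every column that came from the copy.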

Note that some rows have no matching record; these should be removed. In addition, we can remove the duplicated (reversed) representation of each result by matching across columns that together identify a fight, in this case Event, Round, Time and Method.

Remove Duplicates

resultset[!duplicated(resultset[c("Event.x","Round.x","Time.x","Method.x")]),] -> unduplicated_resultset
write.csv(unduplicated_resultset,paste0(Sys.Date(),"_","undup_resultset.csv"))
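
An alternative way to identify the mirrored pair, sketched here on made-up data, is to build a key from the two fighter IDs in sorted order plus the event date. `PairKey` and the toy table `joined` are hypothetical names introduced for illustration; this is not the method used above.

```r
library(dplyr)

# Illustrative joined rows: one fight seen from both sides, plus a second fight.
joined <- tibble(
  FighterId.x = c(1L, 2L, 3L),
  FighterId.y = c(2L, 1L, 4L),
  DateEvent.x = c("2020-01-01", "2020-01-01", "2020-01-01"),
  Result.x    = c("win", "loss", "win")
)

deduped <- joined %>%
  mutate(
    # Order the two IDs so that A-vs-B and B-vs-A produce the same key.
    PairKey = paste(pmin(FighterId.x, FighterId.y),
                    pmax(FighterId.x, FighterId.y),
                    DateEvent.x, sep = "_")
  ) %>%
  # Keep the first row seen for each fight.
  distinct(PairKey, .keep_all = TRUE)

print(deduped)
```

This key only makes sense for rows where both IDs are present; rows with a missing ID would all collapse onto the same `NA`-based key for a given date, so they would need to be filtered out first.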

Filter To Use Only Complete Records

unduplicated_resultset %>% filter(complete.cases(.)) -> modelset
write.csv(modelset,paste0(Sys.Date(),"_","modelset.csv"))
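
Because `complete.cases()` drops a row when any column is NA, it can be worth checking which columns are responsible before filtering. The sketch below uses base R on an invented data frame; the choice of `needed` columns is purely hypothetical.

```r
# Illustrative rows: row 1 is a debut (no previous outcome),
# row 2 has no matched opponent record, row 3 is fully populated.
toy <- data.frame(
  Result.x          = c("win", "win", "loss"),
  PreviousOutcome.x = c(NA, 1, 0),
  WinRatio.x        = c(0.5, 1.0, 0.5),
  WinRatio.y        = c(0.6, NA, 0.7)
)

# Count missing values per column to see which columns drive the row loss.
na_per_column <- colSums(is.na(toy))

# Rows that survive a filter on every column...
n_all_columns <- sum(complete.cases(toy))

# ...versus a filter restricted to the columns the model would actually use
# (here, hypothetically, only the two win ratios).
needed <- c("WinRatio.x", "WinRatio.y")
n_needed_only <- sum(complete.cases(toy[, needed]))

print(na_per_column)
print(c(all_columns = n_all_columns, needed_only = n_needed_only))
```

In this toy case the debut row is lost only when the filter covers every column, which shows how the scope of the filter changes the size of the final set.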

The first rows of the processed basic modelset:

X FighterId.x Result.x Opponent.x Event.x Round.x Time.x Method.x DateEvent.x Outcome.x PreviousOutcome.x start.x streak_id.x streak.x StreakLength.x WinRatio.x CountFights.x FinishRatio.x FinishPrevious.x FinishedRatio.x FinishedPrevious.x FinishedCount.x FinishCount.x DamageDiff.x Fighter.x OpponentId.x FighterComposite OpponentComposite FighterId.y Result.y Opponent.y Event.y Round.y Time.y Method.y DateEvent.y Outcome.y PreviousOutcome.y start.y streak_id.y streak.y StreakLength.y WinRatio.y CountFights.y FinishRatio.y FinishPrevious.y FinishedRatio.y FinishedPrevious.y FinishedCount.y FinishCount.y DamageDiff.y Fighter.y OpponentId.y FighterComposite.y
1 1 win Donnie Bell ROF 42 - Who’s Next 2 2:57 TKO (Punches) 2011-12-17 1 1 FALSE 1 3 2 1 2 1.0000000 TRUE 0 FALSE 0 2 2 Justin Gaethje 556 1_Donnie Bell_2011-12-17 556_Justin Gaethje_2011-12-17 556 loss Justin Gaethje ROF 42 - Who’s Next 2 2:57 TKO (Punches) 2011-12-17 0 1 TRUE 2 1 2 1.0000000 2 0.5000000 FALSE 0.0000000 FALSE 0 1 1.000000 Donnie Bell 1 556_Justin Gaethje_2011-12-17
2 1 win Drew Fickett RITC - Rage in the Cage 163 1 0:12 KO (Punch) 2012-10-20 1 1 FALSE 1 6 5 1 5 0.8000000 TRUE 0 FALSE 0 4 4 Justin Gaethje 382 1_Drew Fickett_2012-10-20 382_Justin Gaethje_2012-10-20 382 loss Justin Gaethje RITC - Rage in the Cage 163 1 0:12 KO (Punch) 2012-10-20 0 0 FALSE 20 2 -1 0.7000000 60 0.5666667 FALSE 0.2333333 TRUE 14 34 2.266667 Drew Fickett 1 382_Justin Gaethje_2012-10-20
3 1 win Gesias Cavalcante WSOF 2 - Arlovski vs. Johnson 1 2:27 TKO (Doctor Stoppage) 2013-03-23 1 1 FALSE 1 8 7 1 7 0.8571429 TRUE 0 FALSE 0 6 6 Justin Gaethje 270 1_Gesias Cavalcante_2013-03-23 270_Justin Gaethje_2013-03-23 270 loss Justin Gaethje WSOF 2 - Arlovski vs. Johnson 1 2:27 TKO (Doctor Stoppage) 2013-03-23 0 1 TRUE 12 1 1 0.6538462 26 0.5000000 TRUE 0.1538462 FALSE 4 13 2.600000 Gesias Cavalcante 1 270_Justin Gaethje_2013-03-23
4 1 win Brian Cobb WSOF 3 - Fitch vs. Burkman 2 3 2:19 TKO (Leg Kicks) 2013-06-14 1 1 FALSE 1 9 8 1 8 0.8750000 TRUE 0 FALSE 0 7 7 Justin Gaethje 437 1_Brian Cobb_2013-06-14 437_Justin Gaethje_2013-06-14 437 loss Justin Gaethje WSOF 3 - Fitch vs. Burkman 2 3 2:19 TKO (Leg Kicks) 2013-06-14 0 1 TRUE 10 1 1 0.7407407 27 0.5555556 FALSE 0.1851852 FALSE 5 15 2.500000 Brian Cobb 1 437_Justin Gaethje_2013-06-14
5 1 win Richard Patishnock WSOF 8 - Gaethje vs. Patishnock 1 1:09 TKO (Punches and Elbows) 2014-01-18 1 1 FALSE 1 11 10 1 10 0.9000000 TRUE 0 FALSE 0 9 9 Justin Gaethje 265 1_Richard Patishnock_2014-01-18 265_Justin Gaethje_2014-01-18 265 loss Justin Gaethje WSOF 8 - Gaethje vs. Patishnock 1 1:09 TKO (Punches and Elbows) 2014-01-18 0 1 TRUE 4 1 2 0.8571429 7 0.2857143 FALSE 0.1428571 FALSE 1 2 1.000000 Richard Patishnock 1 265_Justin Gaethje_2014-01-18
6 1 win Melvin Guillard WSOF 15 - Branch vs. Okami 3 5:00 Decision (Split) 2014-11-15 1 1 FALSE 1 13 12 1 12 0.9166667 TRUE 0 FALSE 0 11 11 Justin Gaethje 78 1_Melvin Guillard_2014-11-15 78_Justin Gaethje_2014-11-15 78 loss Justin Gaethje WSOF 15 - Branch vs. Okami 3 5:00 Decision (Split) 2014-11-15 0 1 TRUE 24 1 1 0.6530612 49 0.4693878 TRUE 0.2857143 FALSE 14 23 1.533333 Melvin Guillard 1 78_Justin Gaethje_2014-11-15

Derive Additional Metrics

We can explicitly define some comparison metrics between the two fighters, here as simple differences between the Fighter (.x) and Opponent (.y) values. Other options would be to look at activity, competitiveness and measures such as the perceived favorite (an analogue for odds), which would be needed if the model were later supplemented by human inputs.

modelset %>% mutate(delta_FP=FinishPrevious.x-FinishPrevious.y,
                    delta_FIP=FinishedPrevious.x-FinishedPrevious.y,
                    delta_FC=FinishCount.x-FinishCount.y,
                    delta_FIC=FinishedCount.x-FinishedCount.y) -> modelset_derived

write.csv(modelset_derived,paste0(Sys.Date(),"_","modelset_derived.csv"))
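
As one possible reading of the "activity" option mentioned above, the sketch below computes the number of days since each fighter's previous fight. The column `DaysSincePrevious` and the toy table `fights` are assumptions made for illustration; nothing like this is computed in the pipeline above.

```r
library(dplyr)

# Illustrative fight dates for two fighters.
fights <- tibble(
  Fighter   = c("A", "A", "A", "B", "B"),
  DateEvent = as.Date(c("2020-01-01", "2020-03-01", "2021-03-01",
                        "2020-02-01", "2020-02-15"))
)

activity <- fights %>%
  group_by(Fighter) %>%
  # Order each fighter's bouts chronologically.
  arrange(DateEvent, .by_group = TRUE) %>%
  # Days elapsed since the same fighter's previous bout; NA for a debut.
  mutate(DaysSincePrevious = as.integer(DateEvent - lag(DateEvent))) %>%
  ungroup()

print(activity)
```

If a metric like this were computed before the self join, a difference between the .x and .y values could then be taken in the same way as the other delta columns.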

Select the Final Model Data Table

Finally, we select the columns for the final model table.

# FinishedPrevious.x/.y are not selected on their own; they enter the table through delta_FIP.
modelset_derived %>% select(Result.x,
                    Method.x,
                    Fighter.x,
                    Opponent.x,
                    CountFights.x,
                    PreviousOutcome.x,
                    StreakLength.x,
                    WinRatio.x,
                    FinishRatio.x,
                    FinishPrevious.x,
                    FinishedRatio.x,
                    FinishCount.x,
                    FinishedCount.x,
                    DamageDiff.x,
                    CountFights.y,
                    PreviousOutcome.y,
                    StreakLength.y,
                    WinRatio.y,
                    FinishRatio.y,
                    FinishPrevious.y,
                    FinishedRatio.y,
                    FinishCount.y,
                    FinishedCount.y,
                    DamageDiff.y,
                    delta_FP,
                    delta_FIP,
                    delta_FC,
                    delta_FIC) -> modelset_selected

write.csv(modelset_selected,paste0(Sys.Date(),"_","modelset_selected.csv"))

The first rows of modelset_selected:

X Result.x Method.x Fighter.x Opponent.x CountFights.x PreviousOutcome.x StreakLength.x WinRatio.x FinishRatio.x FinishPrevious.x FinishedRatio.x FinishCount.x FinishedCount.x DamageDiff.x CountFights.y PreviousOutcome.y StreakLength.y WinRatio.y FinishRatio.y FinishPrevious.y FinishedRatio.y FinishCount.y FinishedCount.y DamageDiff.y delta_FP delta_FIP delta_FC delta_FIC
1 win TKO (Punches) Justin Gaethje Donnie Bell 2 1 2 1 1.0000000 TRUE 0 2 0 2 2 1 2 1.0000000 0.5000000 FALSE 0.0000000 1 0 1.000000 1 0 1 0
2 win KO (Punch) Justin Gaethje Drew Fickett 5 1 5 1 0.8000000 TRUE 0 4 0 4 60 0 -1 0.7000000 0.5666667 FALSE 0.2333333 34 14 2.266667 1 -1 -30 -14
3 win TKO (Doctor Stoppage) Justin Gaethje Gesias Cavalcante 7 1 7 1 0.8571429 TRUE 0 6 0 6 26 1 1 0.6538462 0.5000000 TRUE 0.1538462 13 4 2.600000 0 0 -7 -4
4 win TKO (Leg Kicks) Justin Gaethje Brian Cobb 8 1 8 1 0.8750000 TRUE 0 7 0 7 27 1 1 0.7407407 0.5555556 FALSE 0.1851852 15 5 2.500000 1 0 -8 -5
5 win TKO (Punches and Elbows) Justin Gaethje Richard Patishnock 10 1 10 1 0.9000000 TRUE 0 9 0 9 7 1 2 0.8571429 0.2857143 FALSE 0.1428571 2 1 1.000000 1 0 7 -1
6 win Decision (Split) Justin Gaethje Melvin Guillard 12 1 12 1 0.9166667 TRUE 0 11 0 11 49 1 1 0.6530612 0.4693878 TRUE 0.2857143 23 14 1.533333 0 0 -12 -14
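
With a long hand-written column list it is easy to repeat or misspell a name, so a quick check of the list before selecting can help. The sketch below uses base R only; the vectors `wanted` and `available` are hypothetical examples, not the real column lists.

```r
# Hypothetical column list, containing one repeated name and one misspelling.
wanted <- c("Result.x", "FinishPrevious.x", "FinishPrevious.x", "FinishedPrevios.x")

# Column names of a stand-in data frame.
available <- c("Result.x", "FinishPrevious.x", "FinishedPrevious.x")

# Names that appear more than once in the list.
repeated <- unique(wanted[duplicated(wanted)])

# Names that do not exist in the data.
missing <- setdiff(wanted, available)

print(repeated)
print(missing)
```

In practice `available` would be `names(modelset_derived)`, and both results would be expected to be empty before the selection is run.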