Preamble
With the source data defined, and the majority of the pre-processing completed, a few final steps remain before the data can be used as model input.
Motivation
The pre-processed dataset contains observations for both outcomes of every bout, stored as individual rows in a single table, so each bout appears twice: once from each fighter's perspective. The machine learning models we employ should instead see each bout once, i.e. a single row holding the data for both fighters and one of the two observed outcomes (the other observation is simply a reversed representation of this partner row).
Self Join Logic
We first need to bind the fighter IDs to the results data. This is done twice: once for the Fighter and once for the Opponent.
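library(dplyr)
# Cast the ID to integer so the join key types agree
wc %>% mutate(rowid = as.integer(rowid)) -> wx
# Rename the lookup's name column to Fighter (this overwrites div_lightweights), then look up each fighter by ID
div_lightweights %>% rename(Fighter = Opponent) -> div_lightweights
wx %>% left_join(div_lightweights, by = "rowid") -> wx
# Fighter.y is the name found from the ID; Fighter.x (from wc) holds the opponent's name
wx %>% rename(Fighter = Fighter.y, Opponent = Fighter.x) -> wx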
And once again for the opponent.
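# Rename the name column back to Opponent and look up the opponent's ID by name
div_lightweights %>% rename(Opponent = Fighter) -> div_lightweights
wx %>% left_join(div_lightweights, by = "Opponent") -> wx
# rowid.x is the fighter's ID (from wc); rowid.y is the opponent's ID
wx %>% rename(FighterId = rowid.x, OpponentId = rowid.y) -> wx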
We can remove the columns we no longer need.
# Drop the name columns that the two lookups brought along
wx %>% select(-fighterName.x, -fighterName.y) -> wx
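Both lookups boil down to translating between IDs and names through the fighter table. A minimal base-R sketch of the same mapping, assuming a hypothetical lookup table `fighters` (columns `id`, `name`) and a hypothetical `bouts` table in place of the real `div_lightweights` and `wc`, and using `match()` instead of joins (the IDs and names are borrowed from the tables in this section for illustration):

```r
# Hypothetical lookup table: one row per fighter (illustrative values)
fighters <- data.frame(id = c(1L, 382L, 556L),
                       name = c("Justin Gaethje", "Drew Fickett", "Donnie Bell"),
                       stringsAsFactors = FALSE)

# Hypothetical results rows: the fighter is known by ID, the opponent by name
bouts <- data.frame(FighterId = c(1L, 556L),
                    Opponent  = c("Donnie Bell", "Justin Gaethje"),
                    DateEvent = c("2011-12-17", "2011-12-17"),
                    stringsAsFactors = FALSE)

# ID -> name for the fighter, name -> ID for the opponent (NA when there is no match)
bouts$Fighter    <- fighters$name[match(bouts$FighterId, fighters$id)]
bouts$OpponentId <- fighters$id[match(bouts$Opponent, fighters$name)]

print(bouts)
```

One design difference to keep in mind: `match()` returns only the first hit, so a name that appears twice in the lookup is silently resolved to one ID, whereas `left_join()` would duplicate the row.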
Self Join via Composite Key
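We can use the combination of the Fighter/Opponent IDs, the Fighter/Opponent names and the event date (DateEvent) to create a composite key for joining the table onto matching records. Each row gets two keys: FighterComposite (fighter ID, opponent name, date) and OpponentComposite (opponent ID, fighter name, date). Because the partner row describes the same bout from the other side, a row's FighterComposite equals its partner's OpponentComposite.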
library(stringr)
# Columns can be referenced directly inside mutate(), without wx$
wx %>% mutate(FighterComposite = str_c(FighterId, Opponent, DateEvent, sep = "_"),
              OpponentComposite = str_c(OpponentId, Fighter, DateEvent, sep = "_")) -> wx
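To see why the two keys line up, here is a base-R sketch on the two rows of the Gaethje vs Bell bout (IDs, names and date taken from the tables in this section), using `paste()` in place of `str_c()`:

```r
# The same bout recorded from each fighter's side
wx <- data.frame(FighterId  = c(1L, 556L),
                 Fighter    = c("Justin Gaethje", "Donnie Bell"),
                 OpponentId = c(556L, 1L),
                 Opponent   = c("Donnie Bell", "Justin Gaethje"),
                 DateEvent  = as.Date(c("2011-12-17", "2011-12-17")),
                 stringsAsFactors = FALSE)

# Fighter key: own ID + opponent name + date; Opponent key: opponent ID + own name + date
wx$FighterComposite  <- paste(wx$FighterId, wx$Opponent, wx$DateEvent, sep = "_")
wx$OpponentComposite <- paste(wx$OpponentId, wx$Fighter, wx$DateEvent, sep = "_")

# Row 1's FighterComposite is row 2's OpponentComposite, and vice versa
print(wx[, c("FighterComposite", "OpponentComposite")])
```

One difference worth knowing: `paste()` turns a missing ID into the literal string "NA", whereas `str_c()` propagates `NA`, which is why unmatched rows show `NA` in OpponentComposite in the tables of this section.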
Make a copy of the table so it can be joined onto itself (in R, wx could also be joined to itself directly).
wx -> wy
# Attach to each row the partner row whose OpponentComposite equals its FighterComposite
wx %>% left_join(wy, by = c("FighterComposite" = "OpponentComposite")) -> resultset
X | FighterId.x | Result.x | Opponent.x | Event.x | Round.x | Time.x | Method.x | DateEvent.x | Outcome.x | PreviousOutcome.x | start.x | streak_id.x | streak.x | StreakLength.x | WinRatio.x | CountFights.x | FinishRatio.x | FinishPrevious.x | FinishedRatio.x | FinishedPrevious.x | FinishedCount.x | FinishCount.x | DamageDiff.x | Fighter.x | OpponentId.x | FighterComposite | OpponentComposite | FighterId.y | Result.y | Opponent.y | Event.y | Round.y | Time.y | Method.y | DateEvent.y | Outcome.y | PreviousOutcome.y | start.y | streak_id.y | streak.y | StreakLength.y | WinRatio.y | CountFights.y | FinishRatio.y | FinishPrevious.y | FinishedRatio.y | FinishedPrevious.y | FinishedCount.y | FinishCount.y | DamageDiff.y | Fighter.y | OpponentId.y | FighterComposite.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | win | Kevin Croom | ROF 41 - Bragging Rights | 1 | 1:01 | KO (Slam) | 2011-08-20 | 1 | NA | TRUE | 1 | 1 | NA | NA | 0 | NA | NA | NA | NA | 0 | 0 | 0 | Justin Gaethje | NA | 1_Kevin Croom_2011-08-20 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
2 | 1 | win | Joe Kelso | BTT MMA 2 - Genesis | 1 | 4:32 | TKO (Punches) | 2011-10-01 | 1 | 1 | FALSE | 1 | 2 | 1 | 1 | 1 | 1.00 | TRUE | 0 | FALSE | 0 | 1 | 1 | Justin Gaethje | NA | 1_Joe Kelso_2011-10-01 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
3 | 1 | win | Donnie Bell | ROF 42 - Who’s Next | 2 | 2:57 | TKO (Punches) | 2011-12-17 | 1 | 1 | FALSE | 1 | 3 | 2 | 1 | 2 | 1.00 | TRUE | 0 | FALSE | 0 | 2 | 2 | Justin Gaethje | 556 | 1_Donnie Bell_2011-12-17 | 556_Justin Gaethje_2011-12-17 | 556 | loss | Justin Gaethje | ROF 42 - Who’s Next | 2 | 2:57 | TKO (Punches) | 2011-12-17 | 0 | 1 | TRUE | 2 | 1 | 2 | 1.0 | 2 | 0.5000000 | FALSE | 0.0000000 | FALSE | 0 | 1 | 1.000000 | Donnie Bell | 1 | 556_Justin Gaethje_2011-12-17 |
4 | 1 | win | Marcus Edwards | ROF 43 - Bad Blood | 3 | 5:00 | Decision (Unanimous) | 2012-06-02 | 1 | 1 | FALSE | 1 | 4 | 3 | 1 | 3 | 1.00 | TRUE | 0 | FALSE | 0 | 3 | 3 | Justin Gaethje | NA | 1_Marcus Edwards_2012-06-02 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
5 | 1 | win | Sam Young | RITC - Rage in the Cage 162 | 2 | 1:58 | Submission (Rear-Naked Choke) | 2012-09-29 | 1 | 1 | FALSE | 1 | 5 | 4 | 1 | 4 | 0.75 | FALSE | 0 | FALSE | 0 | 3 | 3 | Justin Gaethje | NA | 1_Sam Young_2012-09-29 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
6 | 1 | win | Drew Fickett | RITC - Rage in the Cage 163 | 1 | 0:12 | KO (Punch) | 2012-10-20 | 1 | 1 | FALSE | 1 | 6 | 5 | 1 | 5 | 0.80 | TRUE | 0 | FALSE | 0 | 4 | 4 | Justin Gaethje | 382 | 1_Drew Fickett_2012-10-20 | 382_Justin Gaethje_2012-10-20 | 382 | loss | Justin Gaethje | RITC - Rage in the Cage 163 | 1 | 0:12 | KO (Punch) | 2012-10-20 | 0 | 0 | FALSE | 20 | 2 | -1 | 0.7 | 60 | 0.5666667 | FALSE | 0.2333333 | TRUE | 14 | 34 | 2.266667 | Drew Fickett | 1 | 382_Justin Gaethje_2012-10-20 |
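The self-join above can be reproduced in base R with `merge()`. This sketch runs it on three hypothetical rows whose keys follow the same pattern as the table (they are illustrative, not the real data). `all.x = TRUE` plays the role of `left_join()`, so the matched pair picks up each other's columns with a `.y` suffix while the unmatched row is kept with `NA`s:

```r
# Hypothetical rows: one matched pair plus one row whose partner is missing
wx <- data.frame(
  FighterComposite  = c("1_Donnie Bell_2011-12-17", "556_Justin Gaethje_2011-12-17",
                        "1_Kevin Croom_2011-08-20"),
  OpponentComposite = c("556_Justin Gaethje_2011-12-17", "1_Donnie Bell_2011-12-17", NA),
  Result            = c("win", "loss", "win"),
  stringsAsFactors  = FALSE
)

# Left self-join: FighterComposite of x is matched against OpponentComposite of y
resultset <- merge(wx, wx, by.x = "FighterComposite", by.y = "OpponentComposite",
                   all.x = TRUE, suffixes = c(".x", ".y"))
print(resultset)
```

Note that `merge()` sorts the output by the key by default (`sort = TRUE`), so the row order can differ from `left_join()`.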
Note that some rows have no matching partner record (their .y columns are all NA, as in rows 1, 2, 4 and 5 above); these should be removed. In addition, we can remove the duplicated (reversed) representation of each bout, keeping only the first row of each pair, by matching across columns that together identify the bout, in this case Event, Round, Time and Method.
Remove Duplicates
# Keep the first row of each group sharing Event, Round, Time and Method
resultset[!duplicated(resultset[c("Event.x", "Round.x", "Time.x", "Method.x")]), ] -> unduplicated_resultset
write.csv(unduplicated_resultset, paste0(Sys.Date(), "_", "undup_resultset.csv"))
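Event, Round, Time and Method may not single out one bout: two bouts on the same card that both end by, say, unanimous decision at 5:00 of round 3 share all four values, so one of them would be dropped along with the true duplicates. A possible alternative, offered as a suggestion rather than the method used here, is an order-independent key built from the two fighter IDs (smaller first) and the event date. The sketch uses hypothetical IDs 101 to 104, a hypothetical event name and date:

```r
# Hypothetical card: 101 vs 102 and 103 vs 104, each bout seen from both sides,
# and both ending by unanimous decision at 5:00 of round 3
resultset <- data.frame(
  FighterId.x  = c(101L, 102L, 103L, 104L),
  OpponentId.x = c(102L, 101L, 104L, 103L),
  DateEvent.x  = "2020-01-01",
  Event.x      = "Hypothetical Event 1",
  Round.x      = 3L,
  Time.x       = "5:00",
  Method.x     = "Decision (Unanimous)",
  stringsAsFactors = FALSE
)

# Original key: the two different bouts collapse into a single row
by_event <- resultset[!duplicated(resultset[c("Event.x", "Round.x", "Time.x", "Method.x")]), ]

# Pair key: smaller ID first, so both sides of a bout produce the same key
pair_key <- paste(pmin(resultset$FighterId.x, resultset$OpponentId.x),
                  pmax(resultset$FighterId.x, resultset$OpponentId.x),
                  resultset$DateEvent.x, sep = "_")
by_pair <- resultset[!duplicated(pair_key), ]

nrow(by_event)  # 1
nrow(by_pair)   # 2
```

Under this scheme, rows with a missing OpponentId.x all get a key of the form NA_NA_<date>, but those rows are removed by the complete-records filter in any case.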
Filter To Use Only Complete Records
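# complete.cases() keeps only rows with no NA in any column (this also removes the unmatched rows)
unduplicated_resultset %>% filter(complete.cases(.)) -> modelset
write.csv(modelset, paste0(Sys.Date(), "_", "modelset.csv"))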
The processed basic modelset.
X | FighterId.x | Result.x | Opponent.x | Event.x | Round.x | Time.x | Method.x | DateEvent.x | Outcome.x | PreviousOutcome.x | start.x | streak_id.x | streak.x | StreakLength.x | WinRatio.x | CountFights.x | FinishRatio.x | FinishPrevious.x | FinishedRatio.x | FinishedPrevious.x | FinishedCount.x | FinishCount.x | DamageDiff.x | Fighter.x | OpponentId.x | FighterComposite | OpponentComposite | FighterId.y | Result.y | Opponent.y | Event.y | Round.y | Time.y | Method.y | DateEvent.y | Outcome.y | PreviousOutcome.y | start.y | streak_id.y | streak.y | StreakLength.y | WinRatio.y | CountFights.y | FinishRatio.y | FinishPrevious.y | FinishedRatio.y | FinishedPrevious.y | FinishedCount.y | FinishCount.y | DamageDiff.y | Fighter.y | OpponentId.y | FighterComposite.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | win | Donnie Bell | ROF 42 - Who’s Next | 2 | 2:57 | TKO (Punches) | 2011-12-17 | 1 | 1 | FALSE | 1 | 3 | 2 | 1 | 2 | 1.0000000 | TRUE | 0 | FALSE | 0 | 2 | 2 | Justin Gaethje | 556 | 1_Donnie Bell_2011-12-17 | 556_Justin Gaethje_2011-12-17 | 556 | loss | Justin Gaethje | ROF 42 - Who’s Next | 2 | 2:57 | TKO (Punches) | 2011-12-17 | 0 | 1 | TRUE | 2 | 1 | 2 | 1.0000000 | 2 | 0.5000000 | FALSE | 0.0000000 | FALSE | 0 | 1 | 1.000000 | Donnie Bell | 1 | 556_Justin Gaethje_2011-12-17 |
2 | 1 | win | Drew Fickett | RITC - Rage in the Cage 163 | 1 | 0:12 | KO (Punch) | 2012-10-20 | 1 | 1 | FALSE | 1 | 6 | 5 | 1 | 5 | 0.8000000 | TRUE | 0 | FALSE | 0 | 4 | 4 | Justin Gaethje | 382 | 1_Drew Fickett_2012-10-20 | 382_Justin Gaethje_2012-10-20 | 382 | loss | Justin Gaethje | RITC - Rage in the Cage 163 | 1 | 0:12 | KO (Punch) | 2012-10-20 | 0 | 0 | FALSE | 20 | 2 | -1 | 0.7000000 | 60 | 0.5666667 | FALSE | 0.2333333 | TRUE | 14 | 34 | 2.266667 | Drew Fickett | 1 | 382_Justin Gaethje_2012-10-20 |
3 | 1 | win | Gesias Cavalcante | WSOF 2 - Arlovski vs. Johnson | 1 | 2:27 | TKO (Doctor Stoppage) | 2013-03-23 | 1 | 1 | FALSE | 1 | 8 | 7 | 1 | 7 | 0.8571429 | TRUE | 0 | FALSE | 0 | 6 | 6 | Justin Gaethje | 270 | 1_Gesias Cavalcante_2013-03-23 | 270_Justin Gaethje_2013-03-23 | 270 | loss | Justin Gaethje | WSOF 2 - Arlovski vs. Johnson | 1 | 2:27 | TKO (Doctor Stoppage) | 2013-03-23 | 0 | 1 | TRUE | 12 | 1 | 1 | 0.6538462 | 26 | 0.5000000 | TRUE | 0.1538462 | FALSE | 4 | 13 | 2.600000 | Gesias Cavalcante | 1 | 270_Justin Gaethje_2013-03-23 |
4 | 1 | win | Brian Cobb | WSOF 3 - Fitch vs. Burkman 2 | 3 | 2:19 | TKO (Leg Kicks) | 2013-06-14 | 1 | 1 | FALSE | 1 | 9 | 8 | 1 | 8 | 0.8750000 | TRUE | 0 | FALSE | 0 | 7 | 7 | Justin Gaethje | 437 | 1_Brian Cobb_2013-06-14 | 437_Justin Gaethje_2013-06-14 | 437 | loss | Justin Gaethje | WSOF 3 - Fitch vs. Burkman 2 | 3 | 2:19 | TKO (Leg Kicks) | 2013-06-14 | 0 | 1 | TRUE | 10 | 1 | 1 | 0.7407407 | 27 | 0.5555556 | FALSE | 0.1851852 | FALSE | 5 | 15 | 2.500000 | Brian Cobb | 1 | 437_Justin Gaethje_2013-06-14 |
5 | 1 | win | Richard Patishnock | WSOF 8 - Gaethje vs. Patishnock | 1 | 1:09 | TKO (Punches and Elbows) | 2014-01-18 | 1 | 1 | FALSE | 1 | 11 | 10 | 1 | 10 | 0.9000000 | TRUE | 0 | FALSE | 0 | 9 | 9 | Justin Gaethje | 265 | 1_Richard Patishnock_2014-01-18 | 265_Justin Gaethje_2014-01-18 | 265 | loss | Justin Gaethje | WSOF 8 - Gaethje vs. Patishnock | 1 | 1:09 | TKO (Punches and Elbows) | 2014-01-18 | 0 | 1 | TRUE | 4 | 1 | 2 | 0.8571429 | 7 | 0.2857143 | FALSE | 0.1428571 | FALSE | 1 | 2 | 1.000000 | Richard Patishnock | 1 | 265_Justin Gaethje_2014-01-18 |
6 | 1 | win | Melvin Guillard | WSOF 15 - Branch vs. Okami | 3 | 5:00 | Decision (Split) | 2014-11-15 | 1 | 1 | FALSE | 1 | 13 | 12 | 1 | 12 | 0.9166667 | TRUE | 0 | FALSE | 0 | 11 | 11 | Justin Gaethje | 78 | 1_Melvin Guillard_2014-11-15 | 78_Justin Gaethje_2014-11-15 | 78 | loss | Justin Gaethje | WSOF 15 - Branch vs. Okami | 3 | 5:00 | Decision (Split) | 2014-11-15 | 0 | 1 | TRUE | 24 | 1 | 1 | 0.6530612 | 49 | 0.4693878 | TRUE | 0.2857143 | FALSE | 14 | 23 | 1.533333 | Melvin Guillard | 1 | 78_Justin Gaethje_2014-11-15 |
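Because `complete.cases()` rejects a row with an NA in any column, the filter can remove more than the unmatched rows: a matched bout where one fighter lacks a feature, for example PreviousOutcome for a first recorded fight, is dropped too. A quick audit before filtering shows how many rows each reason accounts for. The sketch uses a hypothetical three-row table with a few of this section's column names:

```r
# Hypothetical slice of the de-duplicated table:
# row 1 is matched and complete, row 2 has no partner (.y columns NA),
# row 3 is matched but lacks a previous outcome
undup <- data.frame(
  Result.x          = c("win", "win", "loss"),
  PreviousOutcome.x = c(1, NA, NA),
  Result.y          = c("loss", NA, "win"),
  stringsAsFactors  = FALSE
)

# Missing values per column before filtering
print(colSums(is.na(undup)))

# Split the rows complete.cases() would drop by reason
incomplete <- !complete.cases(undup)
unmatched  <- is.na(undup$Result.y)
cat("rows dropped:", sum(incomplete),
    "| unmatched:", sum(incomplete & unmatched),
    "| matched but missing a feature:", sum(incomplete & !unmatched), "\n")
```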
Derive Additional Metrics
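We can explicitly define some comparison metrics between the two fighters, here the differences (Fighter minus Opponent) in their finish flags and counts (FinishPrevious, FinishedPrevious, FinishCount and FinishedCount). Other options could capture activity, competitiveness and measures such as the perceived favorite (an analogue for betting odds), which would be needed if the model were later supplemented by human inputs.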
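# Fighter minus Opponent; TRUE/FALSE flags are coerced to 1/0, so their deltas are -1, 0 or 1
modelset %>% mutate(delta_FP = FinishPrevious.x - FinishPrevious.y,
                    delta_FIP = FinishedPrevious.x - FinishedPrevious.y,
                    delta_FC = FinishCount.x - FinishCount.y,
                    delta_FIC = FinishedCount.x - FinishedCount.y) -> modelset_derived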
write.csv(modelset_derived,paste0(Sys.Date(),"_","modelset_derived.csv"))
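Because FinishPrevious and FinishedPrevious hold TRUE/FALSE values (as the tables show), R coerces them to 1/0 before subtracting, so those deltas take the values -1, 0 or 1, while the count deltas are plain differences. A small base-R check, reproducing the deltas of the Bell and Fickett bouts from the values in the model set table (only the relevant columns are included):

```r
# Two rows copied from the model set table (Bell and Fickett bouts), relevant columns only
modelset <- data.frame(
  FinishPrevious.x   = c(TRUE, TRUE),   FinishPrevious.y   = c(FALSE, FALSE),
  FinishedPrevious.x = c(FALSE, FALSE), FinishedPrevious.y = c(FALSE, TRUE),
  FinishCount.x      = c(2, 4),         FinishCount.y      = c(1, 34),
  FinishedCount.x    = c(0, 0),         FinishedCount.y    = c(0, 14)
)

# Same differences as the mutate() call, written with base within()
modelset_derived <- within(modelset, {
  delta_FP  <- FinishPrevious.x - FinishPrevious.y
  delta_FIP <- FinishedPrevious.x - FinishedPrevious.y
  delta_FC  <- FinishCount.x - FinishCount.y
  delta_FIC <- FinishedCount.x - FinishedCount.y
})
print(modelset_derived[c("delta_FP", "delta_FIP", "delta_FC", "delta_FIC")])
```

The results (1, 0, 1, 0 and 1, -1, -30, -14) match the first two rows of the selected model table.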
Select the Final Model Data Table
Finally we can select the columns for the final tabular model.
# Keep the label, identifiers, each fighter's features and the derived deltas
modelset_derived %>% select(Result.x,
Method.x,
Fighter.x,
Opponent.x,
CountFights.x,
PreviousOutcome.x,
StreakLength.x,
WinRatio.x,
FinishRatio.x,
FinishPrevious.x,
FinishedRatio.x,
FinishedPrevious.x,
FinishCount.x,
FinishedCount.x,
DamageDiff.x,
CountFights.y,
PreviousOutcome.y,
StreakLength.y,
WinRatio.y,
FinishRatio.y,
FinishPrevious.y,
FinishedRatio.y,
FinishedPrevious.y,
FinishCount.y,
FinishedCount.y,
DamageDiff.y,
delta_FP,
delta_FIP,
delta_FC,
delta_FIC) -> modelset_selected
write.csv(modelset_selected, paste0(Sys.Date(), "_", "modelset_selected.csv"))
X | Result.x | Method.x | Fighter.x | Opponent.x | CountFights.x | PreviousOutcome.x | StreakLength.x | WinRatio.x | FinishRatio.x | FinishPrevious.x | FinishedRatio.x | FinishedPrevious.x | FinishCount.x | FinishedCount.x | DamageDiff.x | CountFights.y | PreviousOutcome.y | StreakLength.y | WinRatio.y | FinishRatio.y | FinishPrevious.y | FinishedRatio.y | FinishedPrevious.y | FinishCount.y | FinishedCount.y | DamageDiff.y | delta_FP | delta_FIP | delta_FC | delta_FIC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | win | TKO (Punches) | Justin Gaethje | Donnie Bell | 2 | 1 | 2 | 1 | 1.0000000 | TRUE | 0 | FALSE | 2 | 0 | 2 | 2 | 1 | 2 | 1.0000000 | 0.5000000 | FALSE | 0.0000000 | FALSE | 1 | 0 | 1.000000 | 1 | 0 | 1 | 0 |
2 | win | KO (Punch) | Justin Gaethje | Drew Fickett | 5 | 1 | 5 | 1 | 0.8000000 | TRUE | 0 | FALSE | 4 | 0 | 4 | 60 | 0 | -1 | 0.7000000 | 0.5666667 | FALSE | 0.2333333 | TRUE | 34 | 14 | 2.266667 | 1 | -1 | -30 | -14 |
3 | win | TKO (Doctor Stoppage) | Justin Gaethje | Gesias Cavalcante | 7 | 1 | 7 | 1 | 0.8571429 | TRUE | 0 | FALSE | 6 | 0 | 6 | 26 | 1 | 1 | 0.6538462 | 0.5000000 | TRUE | 0.1538462 | FALSE | 13 | 4 | 2.600000 | 0 | 0 | -7 | -4 |
4 | win | TKO (Leg Kicks) | Justin Gaethje | Brian Cobb | 8 | 1 | 8 | 1 | 0.8750000 | TRUE | 0 | FALSE | 7 | 0 | 7 | 27 | 1 | 1 | 0.7407407 | 0.5555556 | FALSE | 0.1851852 | FALSE | 15 | 5 | 2.500000 | 1 | 0 | -8 | -5 |
5 | win | TKO (Punches and Elbows) | Justin Gaethje | Richard Patishnock | 10 | 1 | 10 | 1 | 0.9000000 | TRUE | 0 | FALSE | 9 | 0 | 9 | 7 | 1 | 2 | 0.8571429 | 0.2857143 | FALSE | 0.1428571 | FALSE | 2 | 1 | 1.000000 | 1 | 0 | 7 | -1 |
6 | win | Decision (Split) | Justin Gaethje | Melvin Guillard | 12 | 1 | 12 | 1 | 0.9166667 | TRUE | 0 | FALSE | 11 | 0 | 11 | 49 | 1 | 1 | 0.6530612 | 0.4693878 | TRUE | 0.2857143 | FALSE | 23 | 14 | 1.533333 | 0 | 0 | -12 | -14 |
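Listing every .x and .y column by hand in select() makes it easy to repeat a name by accident. One option, sketched here as a suggestion rather than the original workflow, is to build the selection from a single list of per-fighter features (the names below are the ones used in this section) and assert that no name repeats:

```r
# Per-fighter features used on both sides of the bout
features <- c("CountFights", "PreviousOutcome", "StreakLength", "WinRatio",
              "FinishRatio", "FinishPrevious", "FinishedRatio", "FinishedPrevious",
              "FinishCount", "FinishedCount", "DamageDiff")

# Build the selection from the feature list so the .x and .y sides always stay in step
model_cols <- c("Result.x", "Method.x", "Fighter.x", "Opponent.x",
                paste0(features, ".x"), paste0(features, ".y"),
                "delta_FP", "delta_FIP", "delta_FC", "delta_FIC")

# A repeated name (such as a second FinishPrevious.x) would stop execution here
stopifnot(!anyDuplicated(model_cols))
length(model_cols)  # 30

# With the real data: modelset_derived[, model_cols] or select(modelset_derived, all_of(model_cols))
```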