Before reading this dissertation, you had better brush up on your statistical background, particularly the terminology, because the authors make many comparisons between the various forms of regression.
The main ones are as follows:
Linear and logistic regressions are important statistical methods for testing relationships between variables and quantifying the direction and strength of the association. Linear regression is used with continuous outcomes, and logistic regression is used with categorical outcomes.
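To make the distinction concrete, here is a minimal sketch in plain Python with one predictor. The data, the function names, and the learning-rate settings are made up for illustration and do not come from the dissertation. The linear fit uses the closed-form least-squares solution for a continuous outcome; the logistic fit uses simple gradient ascent on the log-likelihood for a 0/1 outcome.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Logistic regression with one predictor, fitted by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted P(y = 1)
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: a continuous outcome and a binary (won / did not win) outcome.
continuous_x = [1.0, 2.0, 3.0, 4.0, 5.0]
continuous_y = [3.0, 5.0, 7.0, 9.0, 11.0]
binary_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
binary_y = [0, 0, 1, 0, 1, 1]

intercept, slope = fit_linear(continuous_x, continuous_y)
b0, b1 = fit_logistic(binary_x, binary_y)
print("linear:", intercept, slope)
print("logistic:", b0, b1)
```

The sign of each slope gives the direction of the association and its size gives the strength, which is the sense in which both methods quantify a relationship.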
Conditional logistic regression is an extension of logistic regression that allows one to account for stratification and matching. Its main field of application is observational studies.
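In a racing setting, a natural choice of stratum is the race itself, with exactly one winner per race. Under that assumption (mine, for illustration), the conditional logistic model gives each runner a win probability equal to its exponentiated score divided by the sum over all runners in the same race. The sketch below uses hypothetical feature values and coefficient names; it only evaluates the model and does not fit it.

```python
import math

def win_probabilities(runners, beta):
    """Win probability for each runner in ONE race (one stratum).

    runners: list of feature lists, one per runner
    beta:    list of coefficients, one per feature
    """
    scores = [sum(b * x for b, x in zip(beta, feats)) for feats in runners]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def conditional_log_likelihood(races, beta):
    """Sum of log win probabilities of the actual winners.

    races: list of (runners, winner_index) pairs
    """
    return sum(
        math.log(win_probabilities(runners, beta)[winner])
        for runners, winner in races
    )

# Hypothetical data: two races, two made-up features per runner.
races = [
    ([[0.9, 1.0], [0.5, 0.0], [0.2, 1.0]], 0),
    ([[0.4, 0.0], [0.7, 1.0]], 1),
]
beta = [2.0, 0.5]

probs = win_probabilities(races[0][0], beta)
print(probs, sum(probs))
print(conditional_log_likelihood(races, beta))
```

Because the probabilities are normalised within each race, anything that is the same for every runner in a race cancels out, which is how the stratification is accounted for.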
Support Vector Regression (SVR) is a type of machine learning algorithm used for regression analysis. The goal of SVR is to find a function that approximates the relationship between the input variables and a continuous target variable, while minimizing the prediction error.
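What "minimizing the prediction error" means for SVR is a little unusual: in the standard linear formulation, errors smaller than a tolerance epsilon cost nothing, and larger errors are penalised in proportion to how far they exceed it, traded off against keeping the weights small. The sketch below only evaluates that objective for a given set of weights; it is not a solver, and the data and parameter values are made up.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)

def svr_objective(weights, bias, X, y, epsilon=0.1, C=1.0):
    """Standard linear SVR objective: flatness term plus C times total loss."""
    flatness = 0.5 * sum(w * w for w in weights)
    penalty = 0.0
    for row, target in zip(X, y):
        predicted = bias + sum(w * x for w, x in zip(weights, row))
        penalty += epsilon_insensitive_loss(target, predicted, epsilon)
    return flatness + C * penalty

# Hypothetical data: one feature, two observations.
X = [[1.0], [2.0]]
y = [2.0, 4.5]
objective = svr_objective([2.0], 0.0, X, y, epsilon=0.1, C=1.0)
print(objective)
```

A larger C punishes out-of-tolerance errors more heavily, while a larger epsilon ignores more of them.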
Although it’s suggested that Benter used these two approaches (above), I’m wondering why the authors stated the following, and I quote: “To the best of our knowledge, most published algorithms fail to show systematic profitability.”
Their recommended approach is:
Partial Least Squares (PLS) regression is a method that reduces the predictor variables to a smaller set of components, each a combination of the original variables. These components are then used to perform the regression.
They claim:
“This approach has been shown to be stable with respect to collinearity and is able to generate unbiased prediction equations from pre-processed datasets. The advantage of PLS over other approaches is that it identifies only relevant predictor variables, while other linear models require pre-selection of potential predictor variables prior to regression analysis. The disadvantage of PLS is that the resulting components don't necessarily correspond to a specific 'factor' unlike typical regressions that identify predictive attributes directly related to the data.”
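To show what a PLS component looks like in practice, here is a minimal one-component sketch in plain Python. It is my own illustration and not the authors' implementation: the data are made up, the function name is hypothetical, and a real analysis would extract several components with a statistics library. The two predictors are deliberately perfectly collinear, a case where ordinary least squares has no unique solution.

```python
import math

def pls_one_component(X, y):
    """Fit a single PLS component; returns (weights, coefficient, predict)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered X
    yc = [v - y_mean for v in y]                                # centered y

    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Component scores, then a simple regression of y on that one component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_means[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return w, q, predict

# Hypothetical data: the second predictor is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]

w, q, predict = pls_one_component(X, y)
print(w, q, predict([5.0, 10.0]))
```

The weights in w blend both predictors into one component, which also illustrates the disadvantage quoted above: the component is a mixture and does not correspond to any single factor.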
They list all of the factors used in the analysis of racing data for the logistic regression and PLS models. While the majority of those factors are objective in nature, there are a few key factors that are obviously very subjective and could produce misleading results.
The two very significant factors that are not presented are the horse’s current (pre-race) mental and physical condition, and whether or not the intention is for the horse to attempt to win the race.