
View Full Version : Need help in Multi Logit Regression


wkcalvin
02-06-2006, 03:39 PM
Hi, everybody. I am a newbie using SPSS Multinomial Logit Regression for handicapping, and I have a BIG PROBLEM using the SPSS results.



For example:



Step 1

I used SPSS to read the data from my database, and the following coefficients came out:


--------------B
Career Starts -0.008
Finish Position 0.01



Step 2



I used these 2 coefficients to calculate the probabilities for a race. The formula looks like this:

--------Starts Finish Position Total Mark

Horse A 10 25 ------------(10*-0.008)+(25*0.01) = 0.17

Horse B 20 -10 -----------(20*-0.008)+(-10*0.01) = -0.26

Horse C 1 1 --------------(1*-0.008)+(1*0.01) = 0.002



Probability (each total mark divided by the sum of all marks, -0.088)

Horse A 0.17/-0.088 = -1.93

Horse B -0.26/-0.088 = 2.954

Horse C 0.002/-0.088 = -0.022



Questions:



I think my formula or calculation is totally wrong, because a probability can not be LESS THAN 0 or GREATER than 1



Can anybody help me with this??? I don't have any idea.



Thank you very much

garyoz
02-06-2006, 03:50 PM
I think the problem is probably with your data input, maybe your variable definitions? Also high correlation between predictor variables can lead to instability in the model, including the signs (+,-). I used to know SPSS much better. Use it or lose it. If you isolate a specific issue I can probably get it answered.

GameTheory
02-06-2006, 03:55 PM
You're leaving the "logit" part out of the equation. You should also probably have a constant that comes out of the SPSS report.

The final form of the equation should be:

p = 1 / ( 1 + exp( -(CONSTANT + FEATURE1 * COEF1 + FEATURE2 * COEF2) ))

Then normalize.

exp() is the exponential function, which is the opposite of ln(), the natural log.

Google "logistic regression"...
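GameTheory's recipe can be sketched in Python using the example horses from earlier in the thread. The CONSTANT below is a placeholder, since the actual SPSS intercept was never posted:

```python
import math

# Coefficients from the SPSS output posted above; CONSTANT is a
# placeholder -- take the real intercept from the SPSS report.
CONSTANT = 0.0
COEF_STARTS = -0.008
COEF_FINISH = 0.01

def logit_prob(starts, finish):
    """Logistic transform of the linear score for one horse."""
    score = CONSTANT + starts * COEF_STARTS + finish * COEF_FINISH
    return 1.0 / (1.0 + math.exp(-score))

# The three horses from the example: (Career Starts, Finish Position)
horses = {"A": (10, 25), "B": (20, -10), "C": (1, 1)}
raw = {name: logit_prob(s, f) for name, (s, f) in horses.items()}

# Normalize so the probabilities for the race sum to 1.
total = sum(raw.values())
probs = {name: p / total for name, p in raw.items()}
```

Because each horse's logistic output lands strictly between 0 and 1, the normalized values are all valid probabilities, unlike the raw-mark division in the original post.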

wkcalvin
02-06-2006, 04:00 PM
You're leaving the "logit" part out of the equation. You should also probably have a constant that comes out of the SPSS report.

The final form of the equation should be:

p = 1 / ( 1 + exp( -(CONSTANT + FEATURE1 * COEF1 + FEATURE2 * COEF2) ))

Then normalize.

exp() is the exponential function, which is the opposite of ln(), the natural log.

Google "logistic regression"...

:D :D :D ..Thank you very much...but I'm still a little bit confused

What is the CONSTANT?? In my case, or in general???

Thank you in advance

garyoz
02-06-2006, 04:29 PM
The constant is theoretically where the best-fit line would intersect the Y axis. In this case it is the number associated with the S-shaped probability function (0 to 1) that represents the dependent variable (probability of winning).

Not to be discourteous, but if you don't know standard regression pretty well, you'll have a lot of problems with logistic regression. Getting the data clean enough to use in SPSS is another issue.

You can't just throw variables at a logistic function dependent variable and get reliable and interpretable models. But, come to think of it, I think that's what Allways does. Or, actually they may regress against the 0,1 dependent variable--forget the logit stuff, or maybe they really don't do regression?

Do a search for "regression analysis" on this board or "logistic" or maybe "probit analysis" and you'll find a bunch of posts.

GameTheory
02-06-2006, 04:59 PM
The constant is the intercept. It is basically a coefficient that is multiplied by 1, so you don't actually need the 1. So you end up with a linear equation:

CONSTANT + F1*C1 + F2*C2 ... + Fn*Cn

n being the number of features

If it was really a linear equation, you'd stop there. But you have a logit, so to convert it to a (unnormalized) probability, you apply the transformation:

p = 1 / ( 1 + exp( -[result from above] ))

Where does the constant come from? It should spit out along with your coefficients by the program doing the regression. I've never used SPSS, so if it ain't there, I can't tell you where to find it...
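The two steps GameTheory describes (linear score with intercept, logit transform, then normalizing within the race) can be combined into one function. This is a generalized sketch for n features; the intercept of 0.5 below is made up purely for illustration:

```python
import math

def win_probabilities(constant, coefs, field):
    """CONSTANT + F1*C1 + ... + Fn*Cn per horse, logistic transform,
    then normalize across the whole field of runners."""
    raw = []
    for features in field:
        score = constant + sum(f * c for f, c in zip(features, coefs))
        raw.append(1.0 / (1.0 + math.exp(-score)))
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical intercept 0.5 with the thread's two coefficients:
probs = win_probabilities(0.5, [-0.008, 0.01], [(10, 25), (20, -10), (1, 1)])
```

The ordering of the horses is preserved by the logistic transform (it is monotone in the score), so normalization only rescales the results to sum to 1.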

JustRalph
02-06-2006, 05:01 PM
Did you ever feel like a pair of Brown Shoes in a room full of Tuxedos?

highnote
02-06-2006, 06:35 PM
Did you ever feel like a pair of Brown Shoes in a room full of Tuxedos?


:confused: :lol:

NoDayJob
02-06-2006, 08:57 PM
Did you ever feel like a pair of Brown Shoes in a room full of Tuxedos?

:D How 'bout bare foot at the Queen's audience? :D

wkcalvin
02-06-2006, 09:43 PM
Did you ever feel like a pair of Brown Shoes in a room full of Tuxedos?

yeah....I am a newbie :lol: :D :cool:

Thank you all and will try to do it again:bang:

PaceAdvantage
02-06-2006, 10:18 PM
yeah....I am a newbie :lol: :D :cool:

Thank you all and will try to do it again:bang:

When it comes to this subject wkcalvin, lots of us are newbies....

traynor
02-06-2006, 11:12 PM
Logistic regression has two constraints ("constants") that define the function as logistic. Those are "horizontal holes," or, more properly, horizontal asymptotes, that the function can approach but never reach. The "lower" constraint is usually (if not always, for all practical purposes) zero. The formula for the logistic function is the upper limit ("constant") divided by one plus an exponential function of x.

Try putting your x and y values in a simple table, then selecting logistic regression as the option in SPSS. All it does is try to fit the data points (the x and y values) into a formula that will "predict" where other data points should fall (providing tomorrow exactly replicates yesterday and today).

I think I missed the part about why you want to model a logistic function in the first place. Does that fit your data set the best? Look at the r or r^2 values associated with the logistic function that SPSS spits out to see how closely it fits; the closer to 1 the better. It should be .94 or .95, and higher is even better. If it is below .90, you might try modeling other types of functions.
Good Luck

garyoz
02-07-2006, 06:22 AM
When it comes to this subject wkcalvin, lots of us are newbies....

IMHO, you're not missing much. Regression really doesn't work that well for handicapping. I've tried applying it many different ways--I consider the results more a distraction than an insight.

LaughAndBeMerry
02-07-2006, 09:59 AM
Traynor:

Are you saying that assuming you're looking at the right data, that a logistic equation should explain 90+% of whether a horse won or lost? I'm assuming that the 0/1 variable is win/lose. That seems like an awfully high r2. Do you adjust the 0/1 in any way(e.g. give horses credit for finishing a close second or third)? I would think there's much more than 5% randomness in the outcome due to trip, pace, horse having an off day, etc.

LBM

garyoz
02-07-2006, 01:28 PM
It does seem much too high. For the confidence level of one variable as a significant predictor, 95-plus would be ok, but not for the predictive ability of the model.

What too many of the "impact value" models do is measure the variables one at a time as predictors (bivariate analysis), get an impact value (the beta value), then try to sum the impact values for each variable together. This approach overstates the importance of each variable, and any model using it will way overestimate its predictive ability.

Allways software, when backfitting, rarely produces a model that predicts 40% of the winners. Usually its models predict about 35% of the winners.

The problem is that class, speed, and pace ratings really measure much the same things and shouldn't be treated as independent measures. You are double counting in models that include them all. I think one approach would be to create composite variables using factor analysis and then use those variables in a logistic analysis. However, these new variables would be pretty much like power figures (Bris Prime Power, etc.) and you'd just be reinventing the wheel.

I wish statistical modeling such as regression worked better. This is not to say that database modeling doesn't work, but that is really a different approach than doing linear equation modeling.
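One quick way to see the double counting garyoz describes is to check pairwise correlations between predictors before fitting anything. A minimal sketch, with entirely made-up speed and class ratings for four horses:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical ratings: if r is close to 1, the two factors are largely
# measuring the same thing, and including both double counts it.
speed = [88, 92, 95, 101]
class_rating = [70, 74, 76, 81]
r = pearson_r(speed, class_rating)
```

A near-1 correlation like this is exactly the multicollinearity problem: the regression cannot cleanly separate the two factors' contributions.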

traynor
02-07-2006, 01:33 PM
LaughAndBeMerry wrote: <Are you saying that assuming you're looking at the right data, that a logistic equation should explain 90+% of whether a horse won or lost? I'm assuming that the 0/1 variable is win/lose. That seems like an awfully high r2. Do you adjust the 0/1 in any way(e.g. give horses credit for finishing a close second or third)? I would think there's much more than 5% randomness in the outcome due to trip, pace, horse having an off day, etc.>

We are talking about apples and oranges. The logistic function models the data. You put in numbers, SPSS calculates the "fit" of those numbers to a function. The function is a model of the data entered; it is not a prediction of which horse will win some other race. The 0/1 variable (r or r^2) is a statement of how closely the function developed by SPSS fits the data points you entered.

That is looking backwards. Looking forward, if you use that function, entering the new values for the race in question, you can determine how closely the new values fit the previous values. That is pretty much all it will tell you. You could do the same thing a bit more easily if you enter the new data as a scatter plot along with the old data, and then graph the function on the same screen to see how closely the "new" data points coincide with the function developed to model the "old" data points (presumably from previous winners).
Good Luck

garyoz
02-07-2006, 02:45 PM
The logistic function does not model the data. The logistic function is a transformation of the dependent variable from one or zero to an S-shaped curve that approximates a normal curve. Regressing against a dichotomous (1,0) variable violates the assumption in regression analysis of the independence of error terms (observed data minus the expected value on the best-fit curve). Other than that, you can approach logistic regression pretty much as standard multiple linear regression. There is no way you can get an R-squared above .95 (an indicator of the overall explanatory ability of the model--in other words, how much variance it accounts for) for predicting the probability of winning.

arkansasman
02-07-2006, 07:04 PM
Pseudo R2 in a horse racing logit model should be computed as follows:

1 - LL(model) / LL(null), where LL(null) = sum of ln(1/Nj) over all races

Traynor - Is this the formula that SPSS uses?
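A sketch of this pseudo-R², reading the LN(...) terms as log-likelihoods and taking the null model to be the one that assigns 1/Nj to each of the Nj runners in race j (the form Benter used). The inputs below are hypothetical:

```python
import math

def mcfadden_pseudo_r2(winner_probs, field_sizes):
    """winner_probs[j]: probability the model assigned to the actual
    winner of race j; field_sizes[j]: number of runners Nj in race j."""
    ll_model = sum(math.log(p) for p in winner_probs)
    ll_null = sum(math.log(1.0 / n) for n in field_sizes)
    return 1.0 - ll_model / ll_null
```

It is 0 when the model is no better than picking at random from the field, and 1 only if the model assigned probability 1 to every winner, which is why values far below 1 are normal here.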

obeguy
02-07-2006, 07:35 PM
If probabilities don't sum to 1 you are using the wrong model.
Also the goal is not to predict the winner but to find inefficiencies in the betting pool where the expected value of the probability of winning times the payoff would show a profit.

traynor
02-07-2006, 08:47 PM
garyoz wrote: <The logistic function does not model the data. The logistic function is a transformation of the dependent variable from one or zero to an S-shaped curve that approximates a normal curve. >

When you generate a logistic function in SPSS, the function does not exist; SPSS models the existing data to generate the function. Whether the use of the logistic regression is appropriate or not depends on the constraints. For example, human age can be modeled by a logistic--it can't be less than zero, and the upper limit is somewhere in the 125-130 range.
Good Luck

traynor
02-07-2006, 08:52 PM
arkansasman wrote: <Is this the formula that SPSS uses?>

Click the link for a better (and really simple) explanation.
Good Luck

http://www.wmueller.com/precalculus/families/1_80.html

garyoz
02-07-2006, 09:30 PM
From the SPSS help file:

Logistic regression is useful for situations in which you want to be able to predict the presence or absence of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic regression is applicable to a broader range of research situations than discriminant analysis.

Example. What lifestyle characteristics are risk factors for coronary heart disease (CHD)? Given a sample of patients measured on smoking status, diet, exercise, alcohol use, and CHD status, you could build a model using the four lifestyle variables to predict the presence or absence of CHD in a sample of patients. The model can then be used to derive estimates of the odds ratios for each factor to tell you, for example, how much more likely smokers are to develop CHD than nonsmokers.

Statistics. For each analysis: total cases, selected cases, valid cases. For each categorical variable: parameter coding. For each step: variable(s) entered or removed, iteration history, -2 log-likelihood, goodness of fit, Hosmer-Lemeshow goodness-of-fit statistic, model chi-square, improvement chi-square, classification table, correlations between variables, observed groups and predicted probabilities chart, residual chi-square. For each variable in the equation: coefficient (B), standard error of B, Wald statistic, R, estimated odds ratio (exp(B)), confidence interval for exp(B), log-likelihood if term removed from model. For each variable not in the equation: score statistic, R. For each case: observed group, predicted probability, predicted group, residual, standardized residual.

Hence you are using the probability from zero to one as the dependent variable, the so-called odds ratio (or probability) of the dependent variable being 1 (the horse winning) or 0 (the horse losing), based upon the predictor variables. This dependent variable usually takes the form of a curve (otherwise known as a probability density function). Other than that, as stated in the help file, it is very similar to linear regression. BTW, the constant (alpha in the regression model) is set to zero unless specified to be measured by the model.

Traynor, I've got no idea what your age example is getting at. In a logistic regression the dependent variable is dichotomous--one or zero, success or failure, has disease or disease free.

Logistic regression should be a natural for handicapping modeling, but for the reasons listed above (and in numerous other posts) it does not work well.

traynor
02-08-2006, 12:33 AM
garyoz wrote: <Traynor, I've got no idea what your age example is getting at. In a logistic regression the dependent variable is dichotomous--one or zero, success or failure, has disease or disease free.>

I think we are talking about different things. My reference is to a logistic function, generated by SPSS to model a set of data. My question is why a logistic function would be considered appropriate for horse racing. As for the age example, that should be self-explanatory. The nature of logistic functions is the S-shape--indicating both an upper limit and a lower limit. That is why it is S-shaped; it has both an upper and lower bounded region that are never reached.
Good Luck

garyoz
02-08-2006, 06:44 AM
Traynor, It seems you don't have a handle on logistic regression. It would be highly suitable for horse race modeling because it uses the probability of an outcome as what the model is trying to predict. Hence the dependent variable (what the equation is predicting) can be expressed as a percentage of the likelihood of the event occurring. The dependent variable is usually a normal curve with values between 0 and 1 (although theoretically the 0 and 1 values are never reached--correct there). The problem with the modeling is on the predictor variable side, with the major issue being a high correlation between the variables (class and speed, etc.) leading to what is called multicollinearity. There have been extensive posts on this before.

GameTheory
02-08-2006, 01:03 PM
It is not appropriate to use logistic regression to PREDICT age. It would be appropriate to use age TO PREDICT some other binary variable, like alive or dead. The result of logistic regression is an S-shaped probability curve. Age itself would tend to be normally-distributed, wouldn't it? (Bell-shaped)

wkcalvin
02-09-2006, 02:56 AM
Hi every body, I am back, thanks so much for your helpful information, special thanks to GameTheory.



In the past few days, I have worked on the MLR using SPSS, and the results (simulated) are not bad.



I am from HK; here we have only 2 racecourses and around 1,000 horses, and all track work is recorded, so MLR fits HK's races well.



I am using around 15 factors and will try to find more in the future. The most important one that I found is:



Career Starts - the GOLDEN PERIOD of a horse is between its 3RD - 15TH starts



Another one is related to Pedigree. I am not talking about the Sire or Dam, but the yearling price: the yearling price can reflect 'HOW GOOD IS THIS HORSE'. I only bet on horses with a yearling price over 50K AUD (most of HK's horses came from AUS).



I think most of you may already know these factors, but I would still like to share them with you.



If you guys need HK information, send me a message and I will try my best.



Thank you all again :D :cool: ;)

traynor
02-09-2006, 11:34 PM
garyoz wrote: <Traynor, It seems you don't have a handle on logistic regression. It would be highly suitable for horse race modeling because it uses the probability of an outcome as what the model is trying to predict. Hence the dependent variable (what the equation is predicting) can be expressed as a percentage of the likelihood of the event occurring. The dependent variable is usually a normal curve with values between 0 and 1 (although theoretically the 0 and 1 values are never reached--correct there). The problem with the modeling is on the predictor variable side, with the major issue being a high correlation between the variables (class and speed, etc.) leading to what is called multicollinearity. There have been extensive posts on this before.>

gary, it seems you don't have a handle on modeling. Models never, ever "try to predict" anything. The best they can do is approximate a description of past events. And as much as I appreciate the information about terminology, it is not necessary. I have been wading through the academic slums of the osmosis of the cosmosis for the past seven years, and I much prefer the use of simple terms to describe simple phenomena. I would prefer to dismiss "multicollinearity" as meaningless correlation, and let it go at that.
Good Luck

traynor
02-09-2006, 11:49 PM
GameTheory wrote: <It is not appropriate to use logistic regression to PREDICT age. It would be appropriate to use age TO PREDICT some other binary variable, like alive or dead. The result of logistic regression is an S-shaped probability curve. Age itself would tend to be normally-distributed, wouldn't it? (Bell-shaped)>

I think I may have insufficiently explained the point. Logistic regression is useful for modeling bounded regions. Age is a bounded region--it has no negative values, and it has a definite upper limit. The distribution of age within a random sample can be modeled quite accurately with a logistic regression on age distribution in representative subsets of that sample. So I would have to disagree that its use would be inappropriate.

Without going off into jargon land, I would be interested in your opinion as to why you believe logistic regression is appropriate to model horse races. That is not sarcasm; I am a highly pragmatic person, and if you can tell me a better way, I would sincerely appreciate it. To my view, logistic regression is one of the least useful models for horse races.

"Dichotomous" and "binary" are not synonyms. Dichotomous simply means divided into two groups. That set includes, but is not limited to, the set of binary values. Applying it to age, a dichotomy exists between ages < 25 and ages >= 25 (or any other set of values). Dichotomous refers to the division only, and does not imply equivalencies in the number of values in each of the resulting segments.
Good Luck

GameTheory
02-10-2006, 02:26 AM
I think I may have insufficiently explained the point. Logistic regression is useful for modeling bounded regions. Age is a bounded region--it has no negative values, and it has a definite upper limit. The distribution of age within a random sample can be modeled quite accurately with a logistic regression on age distribution in representative subsets of that sample. So I would have to disagree that its use would be inappropriate.

You'll have to give me an example of what you're talking about. We're talking about using one set of variables to predict another, a set of input variables to predict an output variable, a set of independent variables to predict a dependent one. In the case of logistic regression (at least the form we are discussing), the dependent variable is always 0/1 and predictions generated are on a probability curve in the range of 0 to 1.

Without going off into jargon land, I would be interested in your opinion as to why you believe logistic regression is appropriate to model horse races. That is not sarcasm; I am a highly pragmatic person, and if you can tell me a better way, I would sincerely appreciate it. To my view, logistic regression is one of the least useful models for horse races.

I never stated that I did believe logistic regression is appropriate to model horse races, and it is certainly not my preferred method. But it does have a lot in common with other approaches (machine learning approaches) that I do use. And the fact that Bill Benter made millions of dollars using logistic regression to model horse races certainly gives it some credibility as to its appropriateness to the task.

"Dichotomous" and "binary" are not synonyms. Dichotomous simply means divided into two groups. That set includes, but is not limited to, the set of binary values. Applying it to age, a dichotomy exists between ages < 25 and ages >= 25 (or any other set of values). Dichotomous refers to the division only, and does not imply equivalencies in the number of values in each of the resulting segments.

Point? I thought you didn't want to go off into jargon-land? Logistic regression uses 0/1 for the dependent variable. It has two possible values; it's binary. Each value MAY represent a range of some other variable, like age < 25, but that is irrelevant. And who said anything at all about "equivalencies in the number of values in each of the resulting segments"? What are you talking about?

Let's not take something real simple and obfuscate it, ok?


Actually, now that I think about it please don't respond to any of this. I have no interest in discussing this subject or any other with you since I think you are just a trouble-maker. It really is amazing how you "misunderstand" and are simultaneously "misunderstood" in most of the threads in which you inflict yourself...

traynor
02-10-2006, 01:24 PM
GameTheory wrote: <You'll have to give me an example of what you're taking about. We're talking about using one set of variables to predict another, a set of input variables to predict an output variable, a set of independent variables to predict a dependent one. In the case of logistic regression (at least the form we are discussing), the dependent variable is always 0/1 and predictions generated are on a probability curve in the range of 0 to 1.>

Much research is based on independent variables being tweaked to gauge the effect of the tweaking on a dependent variable. Because there are no independent (controllable) variables in horse racing (at least for the average bettor), all you get are correlations. One variable is associated with, rather than influencing, another variable. To view correlations as dependent/independent is a convenience of modeling. That convenience should not come at the expense of attributing causality. What happens if you flip X and Y? Not much, because neither "causes" the other. Neither "predicts" the other--they are simply associated.

The only thing that can be stated about correlations is that in z cases in which both X and Y appeared, the outcome was whatever. If you divide X by Y, (or divide Y by X, it doesn't matter) you can call it an impact value, or you can call it nonsense. In that situation, I don't understand why anyone would think a logistic function would be any better than a simple percentage; Z% of the time, when X exists, Y also exists.
Good Luck

traynor
02-10-2006, 01:37 PM
GameTheory wrote: <Actually, now that I think about it please don't respond to any of this. I have no interest in discussing this subject or any other with you since I think you are just a trouble-maker. It really is amazing how you "misunderstand" and are simultaneously "misunderstood" in most of the threads in which you inflict yourself...>

If you accuse me of failing to be dazzled by a flurry of words and a superior attitude, I plead guilty. Self-proclaimed expertise and pontification impress me very little. I am similarly unimpressed by long, involved postings that conclude with "please don't respond." I think you have some good ideas and some valuable information, and I think it is more useful to expose those ideas than it is to digress into petulance.

I have no interest whatsoever in attacking you or your ideas, nor in "making points" to create an impression of expertise. I bet on horse races for a living, and I am always interested in new (or different) ideas.
Good Luck

garyoz
02-10-2006, 01:52 PM
We are not talking about an experiment in a lab, but about observations from reality--variables are not being tweaked. Multiple regression allows you to examine more than one variable at a time, and potentially to build a model that combines the effects of multiple variables--in this case, on a dependent variable which is the probability of winning. These variables are observations from past performances--this is Statistics 101 and Research Design 100. This discussion is getting tedious.

traynor
02-10-2006, 08:43 PM
garyoz wrote: <We are not talking about an experiment in a lab, but about observations from reality--variables are not being tweaked. Multiple regression allows you to examine more than one variable at a time, and potentially to build a model that combines the effects of multiple variables--in this case, on a dependent variable which is the probability of winning. These variables are observations from past performances--this is Statistics 101 and Research Design 100. This discussion is getting tedious.>

I agree that it is not an experiment; it is uncontrolled observation. Yes, a model can be constructed that combines (correctly or incorrectly) the association of a number of factors. If you get beyond Statistics 101 and Research Design 100--which I did a long time ago, just as you probably did--the point becomes, "Do these numbers really mean anything?" or, more to the point, "Do they mean what I think they mean?" In most cases, the answer is "no."

Another observation from basic stats: correlation does not equal causality. With the type of observations from past performances you mention, all that can ever be determined is a suggested, possible trend--not a prediction. Whether simple percentages, a linear model, or a logistic function derived from a multiple regression analysis is used to create the model, it is still only an approximation of a function that "fits" what has already happened. One is no more "predictive" than another. Which has been my point all along.
Good Luck

prank
04-28-2006, 12:05 AM
With the type of observations from past performances you mention, all that can ever be determined is a suggested, possible trend--not a prediction.

Ouch. Actually, if you assume that the sample distribution approximates the population distribution (pretty much the standard assumption in modeling), then we expect that we can make predictions.



Whether simple percentages, a linear model, or a logistic function derived from a multiple regression analysis is used to create the model, it is still only an approximation of a function that "fits" what has already happened. One is no more "predictive" than another. Which has been my point all along.

Yes, but you can try cross-validation to get started.
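Cross-validation here just means fitting the model on part of the past races and scoring it on the held-out part, so the fit is tested on data it never saw. A minimal k-fold index splitter (no particular library assumed):

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train, test) index lists for k-fold cross-validation;
    the last fold absorbs any remainder."""
    indices = list(range(n_samples))
    fold = n_samples // k
    for i in range(k):
        if i < k - 1:
            test = indices[i * fold:(i + 1) * fold]
        else:
            test = indices[i * fold:]
        test_set = set(test)
        train = [j for j in indices if j not in test_set]
        yield train, test
```

For race data, splitting by race (or by date) rather than by individual runner keeps horses from the same race out of both sides of the split.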

dav4463
04-28-2006, 12:11 AM
This thread makes me feel dumb! Is it about horseracing? :confused: :)

toetoe
04-28-2006, 12:16 AM
Prankster,

that's what PA is all about --- you validate me, and I validate you. :)

prank
04-29-2006, 06:17 PM
Prankster,

that's what PA is all about --- you validate me, and I validate you. :)

Thanks for the welcome. :)

I'm obviously new here, but I'm scouting the scene. I won't pretend to be a horse race fan or gambler. Instead, I'm a statistician and my interest is in rank prediction. Naturally, horse racing is one really great example of rank prediction. This summer, I will be working on several projects in ranking, and I am sort of checking out the literature, as the case may be. I guess you all are more interested in ranking than almost any other field, though it's getting popular elsewhere -- those search engines are really coming along nicely. :D

Unfortunately, I think that with all the money at stake in this field, getting involved, although appealing in some ways, requires a significant investment in data collection. I'll post more asking for ideas. I guess the "Computer Software" forum is the correct one, is that right?

Thanks!

Prank

sjk
04-29-2006, 07:10 PM
Don't know much about the application of ranking elsewhere but I would say if you want to bet races you need an understanding of each horse's probability of win and place.

Probably Overlay will support and JeffP will disagree.

Actually a scientific perspective is welcome.

prank
04-29-2006, 09:28 PM
Don't know much about the application of ranking elsewhere but I would say if you want to bet races you need an understanding of each horse's probability of win and place.

Probably Overlay will support and JeffP will disagree.

Actually a scientific perspective is welcome.

You're right: the probability of an event, as I understand the betting systems that rely upon information theory, is critical in deciding how to allocate your bets (or investments / portfolio allocations, in other contexts). I know that Shannon had a method, and there's Cover (of Cover & Thomas fame) who's also written on it. I agree that this is very important. From the theoretical perspective that I've had so far, it seems it's important in any kind of betting.

Unfortunately, most work on ranking systems uses a surrogate convex loss function that can help in optimizing ranking and usually yields a probability estimate, but these loss functions may not necessarily correspond to real-world loss functions like actual losses and wins at the track. I'm actually fascinated by the different loss functions and the different methods for minimizing the loss. I'm also fascinated by many non-linear transformations (not just, say, ordinal logistic regression) that can be used.

I'll admit that I'm completely ignorant when I'm reading about variables like pace & speed numbers. Do I want to learn more about them? Only insofar as they may help me work on ranking. :) In each domain that I'm working on, I have to learn about how the data is collected, what's important, etc., and some domains are more involved than others. In the other projects I'm working on, I have data from academic contacts and data I've been able to collect by myself. For horse racing, I think it's like much data in finance: the rich data sets are hard for academics to get their hands on.

So, here I am. I'll probably drop in a bit, but I've got to prepare & take finals over the next few weeks.

Well, I'm unfortunately contributing to taking this thread off-topic, and my apologies. I'll stick to my data interests in the thread I started over in the computer software forum.

Thanks!

sjk
04-29-2006, 09:35 PM
When it comes to betting races neither scientific analysis nor traditional handicapping can stand tall without respect for the other.

prank
04-29-2006, 09:45 PM
When it comes to betting races neither scientific analysis nor traditional handicapping can stand tall without respect for the other.

I'll agree with that. Actually, I know very little about handicapping, and I found the articles by Benter, et al., to be very interesting. The book on the efficiency of horse race markets is interesting, but I guess I'm not much of an efficient market kind of guy (at least not for now). In grad school, we just have the time to focus on one or two things, get done, get out, and get to work. :) So, my apologies if I seem too narrowly focused. Perhaps, on the other hand, someone here would be interested in some kind of collaboration.

Thanks again for the input.