Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 09-24-2014, 09:41 AM   #76
classhandicapper
Quote:
Originally Posted by Cratos
Thus I would not use either Equibase or DRF data because the data from those sources is collected toward race performance and not horse performance. I would use Trakus data and I would collect historical Trakus data from Santa Anita Park for two basic reasons: 1. Santa Anita Park has multiple years of Trakus data and 2. Santa Anita Park dirt track is a “perfect” 1 mile track layout.
Trakus data is very useful for a variety of things, but there are ground loss and timing errors in it also.

The entire issue with fractions is accuracy.

No one has fully accurate wind data. Weather reports from locations away from the track don't cut it. The best available data is probably obtained by experienced players at the track who can look at trees and flags, feel what's going on (including gusts) in various sections of the track, and then provide a very general assessment of wind direction, intensity, and gusts for each race to the figure maker. That's the kind of data TG and Ragozin supposedly gather.

The other problem is that different parts of a track get different levels of maintenance between races and are exposed to different amounts of sun and wind due to the grandstand and other track features. That changes moisture content, etc. The chute could be very deep and tiring, but the rest of the backstretch glib. The clubhouse turn could be way different from the far turn. The backstretch could be way different from the stretch. So the first 2F of a 6F race could be way different from the first 2F of a mile race out of a chute for no other reason than that the surface is different in different sections of the track.

That is only half of the problems.

Good pace figure makers will try to account for some of this stuff, but there will still be loads of significant inaccuracies.

This is partly why running styles are generally preferred over fractions for predicting the leaders, unless the fraction differences are large enough to overcome the potential inaccuracies. The other reason is that front runners generally run only as fast as they have to in order to secure the lead. So the fastest horse in the world could have slow fractions for multiple races simply because he happened to draw into fields without much other early speed.
__________________
"Unlearning is the highest form of learning"
Old 09-24-2014, 03:53 PM   #77
thoroughbred
Quote:
Originally Posted by classhandicapper
Trakus data is very useful for a variety of things, but there are ground loss and timing errors in it also.

Classhandicapper,

EXCELLENT
__________________
Thoroughbred
Old 09-24-2014, 04:43 PM   #78
DeltaLover
As a first step towards creating the model, I built a neural network that predicts the second call leader 23% of the time in 7-horse races. Note that a completely random model succeeds only about 14% of the time (1 in 7).

To train the model, I am using a set of 800 races, and for back testing a set of 200 races that the NN has never seen before. All the races consist of exactly seven horses, and each horse has at least three past performances.
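
As a rough sanity check on those figures (a sketch that is not part of the linked code; the function names are hypothetical), the chance baseline for a 7-horse field and the distance of a 23% hit rate from it on 200 back-test races can be computed with a simple normal approximation:

```python
import math

def chance_baseline(field_size):
    """Hit rate of picking one horse uniformly at random."""
    return 1.0 / field_size

def z_score(hit_rate, baseline, n_races):
    """Number of standard errors the observed hit rate sits above the baseline."""
    std_err = math.sqrt(baseline * (1.0 - baseline) / n_races)
    return (hit_rate - baseline) / std_err

baseline = chance_baseline(7)        # 1 in 7, the "14%" quoted in the post
z = z_score(0.23, baseline, 200)     # 23% on the 200 back-test races
print(f"baseline={baseline:.3f}, z={z:.2f}")
```

Under this approximation the 23% figure sits roughly 3.5 standard errors above chance. The same function can be reused to compare two model variants, keeping in mind that 200 races is a small sample.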

For each starter I am using six data points; the full algorithm that selects them can be seen here:

Code:
def get_input_for_starter(race, starter):
    """Creates the six NN inputs for the specified starter."""
    # normalize() is defined elsewhere in the linked code and is not shown here.
    # Prefer past performances where the horse led at the first call;
    # otherwise fall back to all of its past performances.
    pps = [pp for pp in starter.past_performances if int(pp.first_call_position) == 1]
    candidates = pps if pps else starter.past_performances
    # Use the past performance with the lowest second call position.
    pp = sorted(candidates, key=lambda pp: pp.second_call_position)[0]
    data = []
    data.append(normalize(abs(pp.distance), 2000, 900))
    data.append(normalize(int(starter.post_position), 40, 1))
    data.append(normalize(int(pp.post_position), 40, 1))
    data.append(normalize(int(pp.first_call_position), 40, 1))
    data.append(normalize(int(pp.start_call_position), 40, 1))
    if pp.first_call_beaten_lengths_only.strip() == '':
        # Blank beaten lengths at the first call are encoded as 0.
        data.append(0)
    else:
        data.append(normalize(float(pp.first_call_beaten_lengths_only), 10., 0.1))
    return data

The network I am using consists of a 42-node input layer and a 7-node output layer, with three hidden layers of 100, 30, and 30 nodes.

The related code can be seen here:

Training:
https://github.com/deltalover/hoplat...pp/apply_nn.py

Back testing:
https://github.com/deltalover/hoplat...backtest_nn.py

Creating the input
https://github.com/deltalover/hoplat...ld_nn_input.py

The neural network implementation I am using for this exercise can be found here:

pybrain


If there is still interest in the topic, I will continue with a Logistic Regression model calculating a probability vector.

Last edited by DeltaLover; 09-24-2014 at 04:46 PM.
Old 09-24-2014, 05:35 PM   #79
Cratos
Quote:
Originally Posted by classhandicapper
Trakus data is very useful for a variety of things, but there are ground loss and timing errors in it also.

“Trakus data is very useful for a variety of things, but there are ground loss and timing errors in it also”

Your above statement is spurious and without merit because you have provided no proof for your assertion that there are ground loss and timing errors in the Trakus data. Yes, the Trakus system is not perfect; coming from an engineering culture, I have never seen a perfect system, and I have worked on some very sophisticated aerospace and medical device systems at the Six Sigma level of quality.

However, I will attempt to give a brief comparison of the two existing race timing systems, the legacy beam system and the Trakus sensor system.

It is useful to understand that Trakus is a speed-distance curve measurement system for thoroughbred horseracing. It uses sensor technology to measure the speed of each horse in the race with respect to its distance traveled in the race, and from that measurement the distance between horses at the time of the POC (point of call) can be measured.

In contrast, the legacy “beaten length” system is a point measurement system which uses beam technology to measure the time of the leading horse at the predetermined POC of the race and does not measure distance. Distance between horses is calculated in this system from the non-standard metric of the length.

In essence, the legacy “beaten length” system does not measure distance and only measures the time of the leading horse at the aforementioned predetermined POC. Distance between horses using this method is fanciful.

An example of the difference between the two methods comes from the 10th race at Belmont on October 14, 2013 as follows:

The race was run at the 7F distance on the turf, but only the 1/4M POC is used here to illustrate the difference between the two methods because it would be redundant to do the other POCs of the race.

Trakus Method

The leader at the 1/4M POC is the #7, Giant Jo, and it traveled a distance of 1,328 feet in 22.35 seconds. Mark My Way, the #2 horse, is in 9th place at the 1/4M POC of the race, and it also traveled 1,328 feet, but its time is 23.48 seconds.

Therefore, at the 1/4M POC, what is the distance between the two horses?

Divide the distance traveled, 1,328 feet, by the time elapsed to get the speed in feet/second, which for Giant Jo is 59.42 feet/second. For Mark My Way the metric is 56.56 feet/second.

At 22.35 seconds we know that Giant Jo had traveled 1,328 feet, which is the 1/4M POC of the race as measured by Trakus. But Mark My Way at this point would have traveled only 1,264.08 feet (1,328*56.56/59.42).

Therefore the distance between the two horses at the 1/4M POC of the race would be: 1,328 - 1,264.08 = 63.92 feet.

Converting that into lengths behind, at the leader's 11.88 feet per 1/5 second (59.42/5), it would be:
63.92/11.88 = 5.38 lengths.

Checking our calculation:

23.48 - 22.35 = 1.13 seconds or 63.92/56.56 = 1.13 seconds.
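
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the steps in the post (the function name is hypothetical), and it makes the same implicit assumption the post does: that the trailing horse ran at its average speed for the whole quarter.

```python
def trakus_gap(distance_ft, leader_time_s, trailer_time_s):
    """Gap behind the leader at the moment the leader reaches the point of call."""
    leader_speed = distance_ft / leader_time_s        # feet per second
    trailer_speed = distance_ft / trailer_time_s      # average speed to the same point
    trailer_distance = trailer_speed * leader_time_s  # where the trailer is at that moment
    gap_ft = distance_ft - trailer_distance
    gap_lengths = gap_ft / (leader_speed / 5.0)       # leader's feet per 1/5 second
    return gap_ft, gap_lengths

gap_ft, gap_lengths = trakus_gap(1328.0, 22.35, 23.48)
print(f"{gap_ft:.2f} ft, {gap_lengths:.2f} lengths")
```

With unrounded speeds this gives about 63.9 feet and 5.38 lengths; any small difference from the hand calculation comes from rounding the two speeds to two decimals first.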

Equibase Method (Beaten Length)

If you go to the Equibase chart for the 10th race at Belmont on October 14, 2013, you will see that Giant Jo was in the lead at the 1/4M POC, but its time according to Equibase was 22.43 seconds, or 0.08 seconds slower than Trakus, which had Giant Jo in 22.35 seconds. Mark My Way was in 9th place at the 1/4M POC on the Equibase chart, and if you count the lengths behind you will have 13.

Converting that into time using 9 feet per length, you will have 117 feet, and 117/11.88 = 9.85 fifths of a second, or 1.97 seconds; adding that to 22.43, you get 24.40 seconds as the estimated time for Mark My Way at the 1/4M POC.
However, there are different metrics used for the length, and therefore Mark My Way's time can be one of many different times under the beaten length methodology.
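
To show how much the result depends on the length metric, here is a minimal sketch of the beaten-length conversion. The function name is hypothetical; 9 feet per length and the 59.42 feet/second speed come from the example above, while 8 and 10 feet per length are illustrative values only, not figures from any chart.

```python
def estimated_time(leader_time_s, lengths_behind, feet_per_length, speed_ft_per_s):
    """Estimate a trailing horse's time from the leader's time and beaten lengths."""
    gap_ft = lengths_behind * feet_per_length
    return leader_time_s + gap_ft / speed_ft_per_s

# 9 ft/length is from the post; 8 and 10 are illustrative alternatives.
for feet_per_length in (8.0, 9.0, 10.0):
    t = estimated_time(22.43, 13, feet_per_length, 59.42)
    print(f"{feet_per_length:.0f} ft/length -> {t:.2f} s")
```

At 9 feet per length this reproduces the 24.40 seconds above; changing the assumed length moves the estimate by a couple of tenths of a second either way.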

Summary

Trakus is the better of the two timing systems and can be used to measure the distance between horses. If put into a spreadsheet, which is easy to do, these calculations can be made very quickly in a non-vapid manner.

Your comments about wind calculation are so far-fetched they are hard to believe.

The following statement needs further explanation by you: “No one has fully accurate wind data. Weather reports from locations away from the track don't cut it”, because either you don’t understand meteorology or you are just being cynical.

In determining the weather, sets of surface measurements are important data to meteorologists. They give a snapshot of a variety of weather conditions at a single location, usually a weather station, a ship, or a weather buoy. The measurements taken at a weather station can include any number of atmospheric observables. Usually, temperature, pressure, wind, and humidity are the variables measured, by a thermometer, barometer, anemometer, and hygrometer, respectively.
Upper air data are of crucial importance for weather forecasting. The most widely used technique is the launching of radiosondes. Supplementing the radiosondes, a network of aircraft data collection is organized by the World Meteorological Organization.

Therefore, with satellite technology, weather measurements can be made very accurately at any place in the world.

There are two different impediments to a racehorse during a race: wind force, which you attempted to address, and air resistance (aerodynamic drag).
In the realm of things near the ground (as a racehorse would be), the wind is very erratic due to interaction with ground features. This can make it difficult to really know what speed is effectively acting on a structure in close proximity to the ground. The generic wind pressure formula is accurate enough for horseracing use; however, figuring out what wind speed to use with it is not as straightforward as I would like.

I am not going into the math of calculating the wind speed, but your notion of using wind data “probably obtained by experienced players located at the track that can look at trees, flags, and feel what's going on (including gusts) in various sections of the track and then provide a very general assessment of wind direction, intensity, and gusts for each race to the figure maker” is beyond my comprehension, and I will leave it there.
The aerodynamic effect on a racehorse is different, and it has a force (F) component and a power (P) component.

I am not going into any mathematical calculation of air resistance, but I will say that the greater the speed of the horse during the race, the greater the air resistance.
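
For reference, the textbook drag relations show why air resistance grows with speed. These are not given in the post; the symbols ρ (air density), C_d (drag coefficient), A (frontal area), v_rel (speed of the horse relative to the air) and v (speed over the ground) are introduced here only for illustration:

```latex
F = \tfrac{1}{2}\,\rho\,C_d\,A\,v_{\mathrm{rel}}^{2},
\qquad
P = F\,v
```

In still air v_rel equals v, so the force grows with the square of the speed and the power needed to overcome it grows with the cube.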
__________________
Independent thinking, emotional stability, and a keen understanding of both human and institutional behavior are vital to long-term investment success – My hero, Warren Edward Buffett

"Science is correct; even if you don't believe it" - Neil deGrasse Tyson
Old 09-24-2014, 05:40 PM   #80
DeltaLover
Quote:
Originally Posted by Cratos
“Trakus data is very useful for a variety of things, but there are ground loss and timing errors in it also”

I think the information you are presenting here, is really interesting and comprehensive but I am not exactly sure that it belongs to this thread... What do you think?
Old 09-24-2014, 05:47 PM   #81
DeltaLover
Quote:
Originally Posted by DeltaLover
A first step towards the creation of the model, I created a Neural Network that is able to predict the second call leader 23 % of the time assuming a 7 horse race. Note that a completely random model shows only 14% success.

Playing with the input and the structure of the NN is where the bulk of the work lies in this type of approach. Since there is no deterministic way to decide on either, the task becomes more of a trial-and-error process once you set up the platform.

Tweaking the data a bit, adding a couple of data points per horse, and adding a hidden layer to the network was enough to raise the success rate to over 25%... I am sure that spending some more time on it will improve the hit rate even more...
Old 09-24-2014, 05:49 PM   #82
cashmachine
Quote:
Originally Posted by DeltaLover
I think the information you are presenting here, is really interesting and comprehensive but I am not exactly sure that it belongs to this thread... What do you think?
I think it doesn't belong to this website either
Old 09-24-2014, 05:55 PM   #83
Cratos
Quote:
Originally Posted by DeltaLover
I think the information you are presenting here, is really interesting and comprehensive but I am not exactly sure that it belongs to this thread... What do you think?
I agree and I sincerely apologize, but I responded which I probably shouldn't have.

I will not respond to anymore "off thread" topics and allow you to make your case for your "model" construction
__________________
Independent thinking, emotional stability, and a keen understanding of both human and institutional behavior are vital to long-term investment success – My hero, Warren Edward Buffett

"Science is correct; even if you don't believe it" - Neil deGrasse Tyson
Old 09-24-2014, 05:57 PM   #84
DeltaLover
Quote:
Originally Posted by Cratos
I agree and I sincerely apologize, but I responded which I probably shouldn't have.

I will not respond to anymore "off thread" topics and allow you to make your case for your "model" construction
Please do not take me wrong... I really think you are providing valuable information and presenting some provocative ideas. I only propose that we try to group our discussions by topic as opposed to mixing things up.
Old 09-24-2014, 05:59 PM   #85
cashmachine
Quote:
Originally Posted by DeltaLover
The network I am using consists of a 42 point input layer and a 7 point output, while it has three hidden layers with 100 – 30 – 30 nodes.
I am not an expert in NNs, but it seems to me that the size of your NN is unreasonably huge. Its expressive power is enormous. I would limit the number of hidden layers to 2 at most, with at most 3 vertices in each hidden layer.

Also, why do you need 42 input vertices? You don't have to submit the whole race as input at once. Just make 6 input vertices, one for each indicator, and submit one horse at a time, and you will get a float number as the output of the NN, which represents the "strength" of this particular horse; to get probabilities you just apply your logistic transformation to the 7 outputs of the NN.
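
To put the size argument in numbers, here is a small sketch that counts trainable parameters for the two layouts discussed. It assumes plain fully connected layers with one bias per node, which the thread does not confirm; the function name is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases in a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

whole_race_net = [42, 100, 30, 30, 7]  # layout described earlier in the thread
per_horse_net = [6, 3, 3, 1]           # 6 inputs, two hidden layers of 3, one output

print(count_parameters(whole_race_net))
print(count_parameters(per_horse_net))
```

Under these assumptions the whole-race layout has 8,477 parameters against 800 training races, while the small per-horse layout has 37.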

Last edited by cashmachine; 09-24-2014 at 06:04 PM.
Old 09-24-2014, 06:07 PM   #86
DeltaLover
Quote:
Originally Posted by cashmachine
I am not an expert in NN but it seems to me that size of your NN is unreasonably huge. It's expressive power is enormous. I would limit number of hidden layers by 2 max, and max 3 vertices in each hidden layer.

Also why you need 42 input vertices? You don't have to submit whole race as input at once. Just make 6 input vertices, one for each indicator, and submit one horse at a time, and you will get float number as output of NN; to get probabilities you just use your logistic transformation to the 7 outputs of NN.
I need 42 inputs since the model targets exactly 7 horses and each horse has 6 variables.

Not exactly sure I understand what you mean here:
Quote:
Just make 6 input vertices, one for each indicator, and submit one horse at a time, and you will get float number as output of NN; to get probabilities you just use your logistic transformation to the 7 outputs of NN.
If you mean converting the output of the NN to a probability, no, this is not how a NN is supposed to be used. It might be doable, but it involves additional development that can be better implemented by a genetic algorithm.


Quote:
Originally Posted by cashmachine
I am not an expert in NN but it seems to me that size of your NN is unreasonably huge. It's expressive power is enormous. I would limit number of hidden layers by 2 max, and max 3 vertices in each hidden layer.
You are right that we need to be cautious about the number of hidden layers but the size you are proposing is very small given the space we are trying to search. Still, it is possible to have equally good results with just one hidden layer, although the number of nodes is going to be large...
Old 09-24-2014, 06:25 PM   #87
cashmachine
Quote:
Originally Posted by DeltaLover
I need 42 inputs since the model targets exactly 7 horses and each horse has 6 variables.

Not exactly sure I understand what you mean here:
It is just a different approach to designing the NN. How do you interpret the output of your NN? Is it like, you have 7 output vertices and you want one of them to light up: "horse number 3 is the best guess"? Is that how you do it?

You can design the NN differently; just think along the following lines. "I want my NN to output the "strength" or "worthiness" or "quality" of a horse as a float number, so I will have one output vertex. I guess that it is a function of the 6 indicators that I have chosen, so I have 6 input vertices. So if I input the indicator values <1.2, 3, 0.5, ...> for horse 1 and I get 0.37 as output, then horse 1 has strength 0.37. Now I input the indicator values <1.1, 2, 0.7, ...> for horse 2, and I get output 0.25. Since 0.37 is greater than 0.25, horse 1 is more likely to be the leader at the first call. And the greater the difference between the output values, the more confidence I have in my prediction. In order to get the most likely leader at the first call, I submit the indicator values for all horses (one horse at a time, so you do it 7 times) and choose the horse for which the output was largest. If I want to get probabilities for every horse, I just do a logistic transformation on the 7 outputs of the NN".

PS. If you design the NN this way you can train it on races with varying numbers of horses; you won't be limited to 7-horse races.
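
The per-horse design described above can be sketched roughly as follows. The scoring function here is a plain weighted sum standing in for the trained network, and the "logistic transformation" is read as a softmax; both are assumptions for illustration, and the weights and indicator values are made up.

```python
import math

def strength(indicators, weights, bias=0.0):
    """Stand-in for the per-horse network: one score from one horse's indicators."""
    return sum(w * x for w, x in zip(weights, indicators)) + bias

def softmax(scores):
    """Turn any number of scores into probabilities that sum to 1."""
    top = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_leader(field, weights):
    """Score each horse separately, then compare scores across the field."""
    scores = [strength(horse, weights) for horse in field]
    probs = softmax(scores)
    best = max(range(len(field)), key=lambda i: scores[i])
    return best, probs

# Made-up weights and indicator values, for illustration only.
weights = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4]
field = [
    [1.2, 3.0, 0.5, 0.1, 0.9, 0.2],
    [1.1, 2.0, 0.7, 0.4, 0.3, 0.6],
    [0.4, 1.0, 0.2, 0.8, 0.5, 0.9],
]
leader, probs = predict_leader(field, weights)
print(leader, [round(p, 3) for p in probs])
```

Because each horse is scored on its own, the same code works for a field of any size, which is the point of the PS above.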

Last edited by cashmachine; 09-24-2014 at 06:39 PM.
Old 09-24-2014, 06:47 PM   #88
DeltaLover
Quote:
Originally Posted by cashmachine
It is just different approach to design NN. How do you interpret output of your NN? Is it like, you have 7 output vertices and you want one of them light up: "horse number 3 is the best guess"? Is it how you do it?

Your approach does not consider the competition; it always treats every horse individually. It is closer to a logit model. Passing all starters at once considers the race as a whole. Also, I am only trying to find the leader, not a prob vector. For the probs I use logit models.
Old 09-24-2014, 06:48 PM   #89
cashmachine
Quote:
Originally Posted by DeltaLover
Your approach is not considering the competition, but always treat every horse individually. it is closer to a logit model. Passing all starters at once considers the race as a whole. also i am only trying to find the leader, not a prob vector. For the probs i use logit models.
You are considering the competition when you compare outputs. Moreover, you can design indicators specifically to take the competition into account; for example, instead of inputting the raw value of the speed index, you can input the horse's rank according to the speed index.
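
A rank-based indicator of that kind could look like the sketch below. The function name and the speed index values are hypothetical, and giving tied horses the same (better) rank is just one possible convention.

```python
def rank_within_race(values, higher_is_better=True):
    """Rank each horse's value within its own race (1 = best); ties share the better rank."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]

speed_index = [90, 85, 95, 85]   # hypothetical raw speed indexes for one race
ranks = rank_within_race(speed_index)
print(ranks)
```

The rank depends on the other starters, so the same raw figure can produce a different input in a different field.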

Last edited by cashmachine; 09-24-2014 at 06:56 PM.
Old 09-24-2014, 06:58 PM   #90
DeltaLover
Quote:
Originally Posted by cashmachine
You considering competition when you compare outputs.
How is your model going to know that the particular horse is the only one who ever took the lead before? Also, how about the same horse running against three similar sprinters? Obviously these situations should result in different signals, which is impossible if you apply them individually.
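
The two situations in this objection can be made concrete with a small sketch. The first horse has the same record in both fields below, but the field around it differs; the count of rivals who have led before is an illustrative field-level indicator in the spirit of the rank input suggested earlier in the thread. All names and data here are hypothetical.

```python
def has_led_before(first_call_positions):
    """True if the horse was first at the first call in any past race."""
    return any(pos == 1 for pos in first_call_positions)

def with_field_context(field_history):
    """For each starter: (own 'has led' flag, number of rivals who have led)."""
    led = [has_led_before(history) for history in field_history]
    total_leaders = sum(led)
    return [(int(flag), total_leaders - int(flag)) for flag in led]

# First-call positions in past races, one list per starter (made-up data).
lone_speed = [[1, 2, 1], [5, 4, 6], [3, 3, 4], [7, 6, 5]]
contested = [[1, 2, 1], [1, 1, 2], [2, 1, 1], [1, 3, 1]]

print(with_field_context(lone_speed)[0])
print(with_field_context(contested)[0])
```

Scored purely from its own record, the first horse looks identical in both races; only an input that summarizes the rest of the field separates the lone-speed case from the contested one.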