Is there any need for an open source data processing tool?

DeltaLover
06-25-2014, 09:53 AM
Would you like to have an open source framework to process bris files, to do handicapping factor research, print custom reports and racing programs, create custom figures etc?

Note that I am not referring to a front end GUI application but to programming APIs instead.

DJofSD
06-25-2014, 10:03 AM
As a web based tool or something else?

When I see "API" I think programming using a language translator on my system. Data could be local, on a server or "cloud based."

Capper Al
06-25-2014, 10:10 AM
Isn't JCapper something like this?

DeltaLover
06-25-2014, 10:39 AM
As a web based tool or something else?

When I see "API" I think programming using a language translator on my system. Data could be local, on a server or "cloud based."

Not necessarily web based. The library will be able to collect the data from the file system or databases, running as a Python object. Of course, this functionality can always be exposed through a web server hiding all the data retrieval and processing details.
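
To give a rough idea of the shape I have in mind, here is a minimal sketch; every name in it is just a placeholder, not a committed design:

import csv

class BrisFileSource:
    """Pulls raw records out of a local BRIS-style comma-delimited file."""
    def __init__(self, path):
        self.path = path

    def records(self):
        with open(self.path, newline='') as f:
            for row in csv.reader(f):
                yield row

class HandicappingData:
    """The Python object the user works with; it hides where the data lives."""
    def __init__(self, source):
        self.source = source

    def select(self, predicate):
        return [r for r in self.source.records() if predicate(r)]

# usage (hypothetical file name):
# data = HandicappingData(BrisFileSource('CDX0625.DRF'))
# hits = data.select(lambda r: len(r) > 0)   # any filter over the raw fields

A database-backed source would expose the same records() method, so nothing above it has to change.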

DeltaLover
06-25-2014, 10:41 AM
Isn't JCapper something like this?

I am not exactly sure about JCapper. Is it open source anyway? I thought it was a proprietary system, but I might be wrong..

DJofSD
06-25-2014, 10:45 AM
Not sure if Jcapper would be what is originally being asked about.

Jeff is a member of this community and can better describe what JCapper is all about. But, to help narrow the focus of the thread, this is how I would describe JCapper: a Windows platform app which uses handicapping data from different web based data providers and gives the end user a means of handicapping races using either predefined criteria or user created criteria. A back-end database is stored locally, and updates to and maintenance of the database are incorporated into JCapper and performed in an unobtrusive manner, keeping the local database current without the end user needing knowledge of relational databases.

Now, again, I am offering what I know about JCapper from many years ago, when I obtained it and learned what I needed about it as part of an exploration of my own. JCapper is written in Visual Basic and would likely not lend itself too readily to any extension via an open source API of the kind being discussed in this thread. But then I could be very wrong, and Jeff can correct my errors.

SpotPlays
06-25-2014, 10:57 AM
DeltaLover, I don't believe this is something BRIS would approve of, per their customer terms and agreements.

DJofSD
06-25-2014, 11:06 AM
DeltaLover, I don't believe this is something BRIS would approve of, per their customer terms and agreements.

Do you mean this notice:


Products created with data that were supplied by and are proprietary to Equibase Company LLC. All rights reserved. Reuse of this data is expressly prohibited.

Data provided or compiled by Equibase Company LLC generally are accurate but errors and omissions occur as a result of incorrect data received from others, mistakes in processing and other causes. Churchill Downs Technology Initiatives Company and Equibase Company LLC disclaim responsibility for the consequences, if any, of such errors, but would appreciate having any such errors called to their attention.

The copyright and disclaimer applies to any inquiry run during this session, or to any information retrieved from our World Wide Web Site now or in the future. No part of this inquiry may be reproduced in any form or by any means without written permission of Copyright © 2014 Bloodstock Research Information Services. All rights reserved.

raybo
06-25-2014, 01:32 PM
Storing and distributing proprietary data from another provider would be illegal, unless the distributor has an express license to do so from the data provider. Providing a program that uses purchased data from another provider, for the program's users' own use, without further distribution, would not be illegal.

So, creating any open source, or closed source, software that imports and manipulates proprietary data from localized files (stored on the user's own computer or personal "cloud" storage account) would be legal, as long as the proprietary data itself is not included with the software offered to others. And if the software's creator has entered into a legal contractual agreement with the data provider for access, download/purchase, and import of that data into the software, then that could also be legal, as long as the users of the software do not circumvent any portions of the original agreement.

So, as long as the individual users of the software purchase their own data legally, the creator of the software itself would not be held responsible for any illegal activities by the individual users of the software.

That is my understanding anyway.

DeltaLover
06-25-2014, 01:36 PM
Of course, I am only referring to the library itself, assuming the data will be stored in a local repository. There is no intention of publishing racing data at all; each user will have their own copy of the data stored locally and configure the API to connect to it.

Tom
06-25-2014, 01:51 PM
OK, like the old punch card days.
We had a big computer on campus, and we all learned how to keypunch cards that we took there and they ran them for us in a program of our choice; we came back a few days later and picked up 345 pounds of printouts. I used my slide rule to separate the sheets! :lol:

But I get your drift.
I think yes, it would be very useful. I have no clue and no desire to do this on my own, but if a third party existed to allow me to run my data, I would be very interested.

It is sad the INDUSTRY does not look for ways to improve itself by cultivating and encouraging its customers. Unless of course you like pink hats.

Good idea. :ThmbUp::ThmbUp::ThmbUp:

raybo
06-25-2014, 01:52 PM
Of course, I am only referring to the library itself, assuming the data will be stored in a local repository. There is no intention of publishing racing data at all; each user will have their own copy of the data stored locally and configure the API to connect to it.

I think that there would be great interest in such a program, as long as one doesn't need to be a programmer in order to "make it their own", by being able to easily manipulate the raw data to create their own factors.

That was my original intent with the "AllData Project", provide a simple means of importing local Bris files into a widely held app, like Excel, provide the basic PPs, and allow the user to create their own factors and views, without having to learn a programming language. We accomplished all of that, and more.

Robert Goren
06-25-2014, 03:06 PM
Most of us would not mind paying a reasonable fee for historical data. Unfortunately, what most of us consider reasonable and what the providers charge are oceans apart. That is the problem that all software of this type faces. Trying to operate on a few months of data because of cost concerns, and adding to it as you bet, seldom works, although I think that is what most users try. Who wants to invest a couple of grand into buying data with no guarantee that you'll find anything useful?

Robert Goren
06-25-2014, 03:09 PM
OK, like the old punch card days.
We had a big computer on campus, and we all learned how to keypunch cards that we took there and they ran them for us in a program of our choice; we came back a few days later and picked up 345 pounds of printouts. I used my slide rule to separate the sheets! :lol:

But I get your drift.
I think yes, it would be very useful. I have no clue and no desire to do this on my own, but if a third party existed to allow me to run my data, I would be very interested.

It is sad the INDUSTRY does not look for ways to improve itself by cultivating and encouraging its customers. Unless of course you like pink hats.

Good idea. :ThmbUp::ThmbUp::ThmbUp:

Been there, done that. Even doing a sample of 50 races was a huge undertaking.

Tom
06-25-2014, 03:49 PM
Remember the dreaded data clerk who came back to the window without the reams of paper printouts?

"You must have an error in one of your (3 million) punch card! Go fix it and bring them back in!" :eek::faint:

raybo
06-25-2014, 05:17 PM
Been there, done that. Even doing a sample of 50 races was a huge undertaking.

Are punch cards similar to flipping individual binary bit switches, for each 8 or 16 bit byte of data? That is no bed of roses either. That's what had to be done in the USAF to program the '70s era (state of the art - LOL) air navigation/weapons control computers. Those programs ran on punched tape using light readers.

Sapio
06-25-2014, 06:45 PM
Of course, I am only referring to the library itself, assuming the data will be stored in a local repository. There is no intention of publishing racing data at all; each user will have their own copy of the data stored locally and configure the API to connect to it.

Hi DeltaLover

I think it is a great idea, especially if you make analytical tools available.

Thomas Sapio

DeltaLover
06-25-2014, 08:23 PM
Hi DeltaLover

I think it is a great idea, especially if you make analytical tools available.

Thomas Sapio

I am thinking of developing independent packages including data storage and retrieval, an analytic engine (statistical calculations, clustering, etc.), a DSL layer to simplify interaction with the system, and a web-based presentation layer.

raybo
06-25-2014, 09:13 PM
I am thinking of developing independent packages including data storage and retrieval, an analytic engine (statistical calculations, clustering, etc.), a DSL layer to simplify interaction with the system, and a web-based presentation layer.

What do you have in mind for "composite" factor creation, regarding the user interface? How could one select 3 raw factors and combine and weight them, for example, into a new compound factor? Would programming skills be needed for that, or could one simply select the raw data field names and tell the program how to combine them in some non-programmatic manner?

I guess what I'm getting at is, if you are writing this program for programmers, why bother? If they are programmers already couldn't they do it themselves?

DeltaLover
06-25-2014, 10:01 PM
What do you have in mind for "composite" factor creation, regarding the user interface? How could one select 3 raw factors and combine and weight them, for example, into a new compound factor? Would programming skills be needed for that, or could one simply select the raw data field names and tell the program how to combine them in some non-programmatic manner?

I guess what I'm getting at is, if you are writing this program for programmers, why bother? If they are programmers already couldn't they do it themselves?

Nice question.

I do not think that the GUI approach is the best when it comes to what you describe here. Although it is possible to come up with a UI to allow factor composition, I still believe that approach is primitive. What makes this interesting is the ability of the computer itself to compose factors without human intervention, something that will allow for a more extensive search. This can be accomplished by creating a simple DSL, which can be viewed as a mini programming language with a very low learning curve.
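
As a sketch of what I mean (pure illustration; the factor names, fields and weights are all invented), composition can be as small as this:

class Factor:
    """A handicapping factor; arithmetic operators build composites."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, horse):
        return self.fn(horse)
    def __add__(self, other):
        return Factor(lambda h: self(h) + other(h))
    def __rmul__(self, weight):
        return Factor(lambda h: weight * self(h))

speed = Factor(lambda h: h['speed'])        # invented field names
early_pace = Factor(lambda h: h['e1'])

composite = 0.7 * speed + 0.3 * early_pace  # the "mini language" in action
print(composite({'speed': 95, 'e1': 88}))   # 0.7*95 + 0.3*88, about 92.9

Because composites are plain objects, a search loop can generate thousands of candidate weightings and keep whichever back-tests best, with no human in the loop.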

raybo
06-25-2014, 10:15 PM
Nice question.

I do not think that the GUI approach is the best when it comes to what you describe here. Although it is possible to come up with a UI to allow factor composition, I still believe that approach is primitive. What makes this interesting is the ability of the computer itself to compose factors without human intervention, something that will allow for a more extensive search. This can be accomplished by creating a simple DSL, which can be viewed as a mini programming language with a very low learning curve.

Well, so far, you're talking as if programmers are your only audience. So, likely, they will be your only users. In order to reach the non-programmer you have to be able to forget that you're a programmer, and think like a non-programmer. Many "techy" people can't do that.

DeltaLover
06-25-2014, 10:35 PM
I am not visualizing a system that needs a programmer to use it. This is why I am thinking of a DSL; this approach is widely used.

For example, there are computer games that expose simple interaction with the user through verbs like 'turn', 'right', 'left', etc. Lua (http://gamedev.stackexchange.com/questions/56189/why-lua-is-so-important-frequently-used-in-game-development) is used for this purpose.

Excel's macro language is another example of this approach.

Here you can read more about it:

http://en.wikipedia.org/wiki/Domain-specific_language#Examples_2

Robert Goren
06-26-2014, 01:40 PM
When you start talking Excel Macros, you lose at least half the posters here. You would need to come down to the level of PokerTracker to reach most of us. Even that is a reach for some.

raybo
06-26-2014, 02:13 PM
When you start talking Excel Macros, you lose at least half the posters here. You would need to come down to the level of PokerTracker to reach most of us. Even that is a reach for some.

I assume you're addressing me about Excel macros. Excel macros are small programs that can be run inside Excel, or Word, or Access, etc. They can be "recorded", written from scratch, or a combination of the two.

When you start the macro recorder in Excel, you will be prompted to name the macro (something that describes what the macro will do); once named, the "recording" begins. It works just like a tape recorder or video recorder: everything that happens is recorded, as VBA programming code. So you don't have to write the code, Excel does it for you; all you have to do is perform the mouse and keyboard actions required to do what you want to replicate in the future (automate repetitive tasks, or perform tasks that cannot be done via formulas, etc.). After you have completed all the actions, you just stop the recorder.

At that point you can open the code window for that macro and see the code that Excel wrote. After you look at a little code you will begin to see what it does, and ways it can be modified to do other things. You can also place a button on your worksheet and assign that macro to it (simply by right clicking the button, selecting "Assign macro", then selecting the macro you want from the list of macros already in the workbook), so when the button is clicked, the code in that macro will run, the same way every time. You can even combine many small macros into a large macro that runs all of the smaller ones, in the order you want, with the click of a single button.

You can often record a very simple macro and then open the code window and modify that code to perform other actions that can't be recorded. (You need to learn a little VBA to do this, or you can simply go to one of the hundreds of Excel tech sites on the internet, post a question asking how to perform a certain task or write a certain snippet of code, and someone will tell you how, or write the code for you, which you can then copy and paste into your macro code window.) So you don't need to be a programmer to create macros that automate things in Excel, or perform things you can't do using formulas; you just need to ask questions to get answers. I use "MrExcel.com" frequently, and have for years, because after all these years of working in Excel, I still can't write VBA macros from scratch. I am definitely NOT a programmer; I just know how to ask questions and solicit help.

JJMartin
06-27-2014, 01:42 AM
Imo, Excel is the best tool for handicapping research and development based on pp's. My program can calculate whatever you can imagine, as long as the data needed is in the file, such as Bris single data files. Example: what are the long term stats on a horse in races only on Wednesdays, after a previous win, within 21 days, going from sprints to routes, on a last minute surface change, ranking first in speed but 3 points more than the next best speed, with no more than 20% front runners in the field, from only post 3, allowance only, NW3, and trained only by a guy named "Frank"? It can be done. And easily too! I could have added 10 more conditions; it does not matter. But like Raybo pointed out, you have to know how to write the formulas and the macros. Good luck trying to accomplish this in Access or any database using queries. You cannot do it. It took me 8 years of constant refinements and innovations to put this program together, so I am not saying it is easy with Excel, but I am saying it is the best way for database research. Ultimately, the versatility within Excel is the key, along with your own imagination and skill in programming it.
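
(For what it's worth, the same kind of stacked-condition query can also be expressed in code; below is a sketch in Python/pandas with invented column names, since the real field layout isn't shown here.)

import pandas as pd

# Invented columns standing in for fields parsed from a data file.
starts = pd.DataFrame([
    {'weekday': 'Wed', 'won_last': True,  'days_off': 14, 'post': 3, 'trainer': 'Frank'},
    {'weekday': 'Sat', 'won_last': False, 'days_off': 35, 'post': 6, 'trainer': 'Smith'},
])

mask = ((starts.weekday == 'Wed') & starts.won_last
        & (starts.days_off <= 21) & (starts.post == 3)
        & (starts.trainer == 'Frank'))   # each extra condition is one more term
print(starts[mask])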

Uncle Salty
06-27-2014, 04:30 AM
I too would be interested in seeing a program like this, especially if it can replicate the features in the BRIS PP Gen application.

raybo
06-27-2014, 04:34 AM
I am not visualizing a system that needs a programmer to use it. This is why I am thinking of a DSL; this approach is widely used.

For example, there are computer games that expose simple interaction with the user through verbs like 'turn', 'right', 'left', etc. Lua (http://gamedev.stackexchange.com/questions/56189/why-lua-is-so-important-frequently-used-in-game-development) is used for this purpose.

Excel's macro language is another example of this approach.

Here you can read more about it:

http://en.wikipedia.org/wiki/Domain-specific_language#Examples_2

Go for it! It will either be usable by most players and of value to them, or it won't.

JJMartin
06-27-2014, 01:03 PM
Wasn't there a site called RDSS that tried to do this before?

DeltaLover
06-27-2014, 01:57 PM
Wasn't there a site called RDSS that tried to do this before?

Is this an open source project?

JJMartin
06-27-2014, 02:05 PM
I am not sure, don't know too much about it.

DJofSD
06-27-2014, 02:05 PM
http://paceandcap.com/forums/showthread.php?t=9105

DeltaLover
06-27-2014, 03:12 PM
http://paceandcap.com/forums/showthread.php?t=9105


from what I read here, this is not open source, plus it seems tightly coupled to a specific methodology..

traynor
06-28-2014, 09:12 AM
What do you believe the advantage(s) would be of such an effort, that would make it better than apps already available?

DJofSD
06-28-2014, 09:20 AM
from what I read here, this is not open source, plus it seems tightly coupled to a specific methodology..
That's right.

Not being a smart-ass, but, why would you expect it to be open source?

Yes, it is tightly coupled to the Sartin Methodology. It basically is the Sartin methods, written in a more modern language and for a Windows GUI.

DeltaLover
06-28-2014, 12:06 PM
What do you believe the advantage(s) would be of such an effort, that would make it better than apps already available?

see here:

http://open-source.gbdirect.co.uk/migration/benefit.html

http://www.pcworld.com/article/209891/10_reasons_open_source_is_good_for_business.html

https://www.google.com/?gws_rd=ssl#q=advantages+of+open+source+software

DJofSD
06-28-2014, 12:19 PM
see here:

http://open-source.gbdirect.co.uk/migration/benefit.html

http://www.pcworld.com/article/209891/10_reasons_open_source_is_good_for_business.html

https://www.google.com/?gws_rd=ssl#q=advantages+of+open+source+software

Interesting reads, but those somewhat flawed perspectives only partially answer the question, and only in a general manner.

What is missing in the panoply of handicapping software offerings that your idea will provide?

BTW, in case you believe I am against open source, I'm agnostic. If you can get adequate solutions using open source, great, but, most companies do not fully realize all of the ramifications. But as a solution for an individual, it is a lot simpler -- either it fits the bill or it doesn't.

traynor
06-28-2014, 02:29 PM
see here:

http://open-source.gbdirect.co.uk/migration/benefit.html

http://www.pcworld.com/article/209891/10_reasons_open_source_is_good_for_business.html

https://www.google.com/?gws_rd=ssl#q=advantages+of+open+source+software

I was more interested in specifics than generic comments on open source vs proprietary. Specifically, what would your app do that others do not already do?

Using software entails an investment of time and effort (for the user). A simile might be OpenOffice--aside from being free, there is almost nothing to recommend it, and most consider it a (very) poor substitute for MS Office. For most end users, the proof is in the doing, not in the cost.

What would the app you envision do that is not already being done? In short, what would you provide as reason for users to discard whatever they are using already (Excel, home-grown, business analysis, data mining) to use the app you envision? Answering that question might enable you to focus more clearly on exactly what you are trying to accomplish.

DeltaLover
06-28-2014, 07:02 PM
I was more interested in specifics than generic comments on open source vs proprietary. Specifically, what would your app do that others do not already do?

Using software entails an investment of time and effort (for the user). A simile might be OpenOffice--aside from being free, there is almost nothing to recommend it, and most consider it a (very) poor substitute for MS Office. For most end users, the proof is in the doing, not in the cost.

What would the app you envision do that is not already being done? In short, what would you provide as reason for users to discard whatever they are using already (Excel, home-grown, business analysis, data mining) to use the app you envision? Answering that question might enable you to focus more clearly on exactly what you are trying to accomplish.

OS solutions enjoy great success these days. Technologies like Linux, g++, Python, Ruby, Pylons, OpenOffice, MySQL, PostgreSQL, MongoDB, Eclipse, git, Vagrant, vim, emacs, Apache and many many more are the first choices of any developer who is not locked in the sad world of M$...

The general comments about OS apply to horse handicapping as to any other domain.

I find existing proprietary handicapping solutions very limited.

Good software needs to be open, so it can attract many different contributors, assuring its evolutionary path. Ideally, software should have no couplings to external dependencies like data providers, data feeds, databases etc.

The best way I know towards this direction is to develop based on open standards, focusing more on contracts than on implementation details.

In other words, I find it way more important to define the WHAT than the HOW. The latter can have more than one implementation, and the user base is going to decide on the fittest approach.

What matters the most is the definition of the interface, which will outlive any specific implementation.

Details like what programming language or database to use should never be a requirement, and good software needs to allow a very diverse ecosystem of third parties to function with it....

As far as what application I am envisioning, how about a library that can provide historical data in a LINQ-like fashion, assuring complete transparency from the back end, while conforming to a specific API that can use any kind of back end (SQL, NoSQL, CSV or whatever else), with the ability to easily create a web wrapper around it using, for example, RESTful services or classical SOAP...

On top of this library, we can create layers of data transformation and analysis (including any kind of statistical analysis, pattern recognition, real time odds analysis and many more)...

We can have many developers working in parallel extending the functionality, something that will eventually allow the best solution to become the standard.

Based on the OS nature of the project, we are able to use any other OS library we need, as long as we have an adapter to bridge the two solutions.

For example, if we need neural networks, we are not limited to a specific implementation but can select from any possible solution like FANN, ffnet, neurolab or whatever else, or even a custom solution...

In parallel, we can have a presentation layer completely isolated from the specifics of the other layers, allowing for any kind of user interaction: command line, desktop application, web browser, Android or whatever else...

Any way you see it, opening the source to the public is beneficial for any kind of software, which, as history shows, will either match or outperform proprietary solutions by following this route...
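
To make the LINQ-like idea concrete, here is a minimal sketch of the contract; every name is provisional, and a real version would add projections, ordering and so on:

from abc import ABC, abstractmethod

class Backend(ABC):
    """The contract: any storage engine only has to yield row dicts."""
    @abstractmethod
    def rows(self):
        ...

class MemoryBackend(Backend):
    def __init__(self, records):
        self.records = records
    def rows(self):
        yield from self.records

class Query:
    """LINQ-like chaining; it never knows which backend it is talking to."""
    def __init__(self, backend):
        self.backend = backend
        self.predicates = []
    def where(self, predicate):
        self.predicates.append(predicate)
        return self
    def run(self):
        return [r for r in self.backend.rows()
                if all(p(r) for p in self.predicates)]

races = MemoryBackend([{'track': 'CD', 'dist': 6.0}, {'track': 'SA', 'dist': 8.5}])
print(Query(races).where(lambda r: r['track'] == 'CD').run())

A CSV, SQL or NoSQL backend is then just another subclass, and a REST wrapper is one more thin layer that calls Query.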

raybo
06-28-2014, 07:36 PM
OS solutions enjoy great success these days. Technologies like Linux, g++, Python, Ruby, Pylons, OpenOffice, MySQL, PostgreSQL, MongoDB, Eclipse, git, Vagrant, vim, emacs, Apache and many many more are the first choices of any developer who is not locked in the sad world of M$...

The general comments about OS apply to horse handicapping as to any other domain.

I find existing proprietary handicapping solutions very limited.

Good software needs to be open, so it can attract many different contributors, assuring its evolutionary path. Ideally, software should have no couplings to external dependencies like data providers, data feeds, databases etc.

The best way I know towards this direction is to develop based on open standards, focusing more on contracts than on implementation details.

In other words, I find it way more important to define the WHAT than the HOW. The latter can have more than one implementation, and the user base is going to decide on the fittest approach.

What matters the most is the definition of the interface, which will outlive any specific implementation.

Details like what programming language or database to use should never be a requirement, and good software needs to allow a very diverse ecosystem of third parties to function with it....

As far as what application I am envisioning, how about a library that can provide historical data in a LINQ-like fashion, assuring complete transparency from the back end, while conforming to a specific API that can use any kind of back end (SQL, NoSQL, CSV or whatever else), with the ability to easily create a web wrapper around it using, for example, RESTful services or classical SOAP...

On top of this library, we can create layers of data transformation and analysis (including any kind of statistical analysis, pattern recognition, real time odds analysis and many more)...

We can have many developers working in parallel extending the functionality, something that will eventually allow the best solution to become the standard.

Based on the OS nature of the project, we are able to use any other OS library we need, as long as we have an adapter to bridge the two solutions.

For example, if we need neural networks, we are not limited to a specific implementation but can select from any possible solution like FANN, ffnet, neurolab or whatever else, or even a custom solution...

In parallel, we can have a presentation layer completely isolated from the specifics of the other layers, allowing for any kind of user interaction: command line, desktop application, web browser, Android or whatever else...

Any way you see it, opening the source to the public is beneficial for any kind of software, which, as history shows, will either match or outperform proprietary solutions by following this route...

So, is that what all this was really about, hating on Microsoft and supporting anything non-Microsoft? You have a definite history of that here, but I hope that is not your underlying theme in this thread. There are plenty of good things going for Microsoft, and many of the things we take for granted with computers are due to them.

Why not just forget about the differences between open source and non-open source and just concentrate on ease of use, ability to personalize, and power, regardless of the platform?

Why not just put something together and post it, so people can download it and see what it does. It will either invite interest and participation, and begin to evolve, or it won't. The proof is in the pudding, ask Doug about the "HandiFast" evolution, and I certainly can attest to the fact that presenting something that actually works (the original "AllData PPs" program), even if it is preliminary work, creates a lot of feedback and participation and leads to more evolution. You can talk and poll all you want, but until something is actually created, in the real world, it's only talk.

DeltaLover
06-28-2014, 07:43 PM
I am not necessarily anti-MS... What I am trying to say is that open source leads to better results... Starting a successful OS project for handicapping is more about getting the right people to take an interest in it and contribute to the analysis, rather than the implementation, which is relatively easy... Take a group of experienced bettors like Thask, Traynor, PA, CJ, Raybo, Dave S and many more whose names I do not remember right now, and have them decide what needs to be done; in parallel, have an implementation team delivering solutions in small increments... Loop over and over... This is what OS is all about..

raybo
06-28-2014, 07:51 PM
I am not necessarily anti-MS... What I am trying to say is that open source leads to better results... Starting a successful OS project for handicapping is more about getting the right people to take an interest in it and contribute to the analysis, rather than the implementation, which is relatively easy... Take a group of experienced bettors like Thask, Traynor, PA, CJ, Raybo, Dave S and many more whose names I do not remember right now, and have them decide what needs to be done; in parallel, have an implementation team delivering solutions in small increments... Loop over and over... This is what OS is all about..

Couldn't the same be said about non-open source? Collaboration is nothing new, and programming in more than one language is nothing new either. Just put something on the board and see where it leads. I would suggest you start with data importation. That shouldn't be too difficult, you probably already have something like that in your portfolio. Or, would you rather spend your time on this forum talking about it?

traynor
06-28-2014, 08:31 PM
I am not necessarily anti-MS... What I am trying to say is that open source leads to better results... Starting a successful OS project for handicapping is more about getting the right people to take an interest in it and contribute to the analysis, rather than the implementation, which is relatively easy... Take a group of experienced bettors like Thask, Traynor, PA, CJ, Raybo, Dave S and many more whose names I do not remember right now, and have them decide what needs to be done; in parallel, have an implementation team delivering solutions in small increments... Loop over and over... This is what OS is all about..

That seems like you are saying you don't really have any new ideas, but rather hope that if a number of knowledgeable people work together, they might come up with something useful.

JJMartin
06-28-2014, 09:24 PM
That seems like you are saying you don't really have any new ideas, but rather hope that if a number of knowledgeable people work together, they might come up with something useful.

That would be cool to try.

traynor
06-28-2014, 09:52 PM
I know a number of very serious bettors. I know a number of developers who do occasional contract work for those bettors, including two (of the best I have ever known) who work on a retainer basis for healthy fees. Because we are a discreet bunch, we tend to share ideas and use the occasional "brainstorming" session for mutual benefit.

However, I can very honestly say that I am not interested in how they do what they do. It is the ideas, the strategies, the innovations that are of value. The programming and development part can be parceled out to Bangalore or St Pete for modest fees, in a manner that precludes anyone from putting the pieces together. The value of the programming/development part of the process is minimal.

I have read most of the postings on this forum. I have not seen any that stir the slightest bit of envy, or even curiosity on my part. If what someone is doing works for them, great. If anyone is doing anything interesting enough to be profitable, they do not seem to be loudly proclaiming the nuts-and-bolts of how they do it to anyone who will listen.

One of the big pluses of 2014 is that a reasonably competent developer can implement his or her own ideas, concepts, and approaches, test them readily, and then implement them. I have not seen anything that would convince me there is value in a project such as you describe, either as a user or as a participant. It looks to me like the marketing version of an empty box.

raybo
06-28-2014, 10:18 PM
You can hardly predict where something put in the public domain will go, or how far. What's the harm? It either results in something of value, to someone, or it doesn't. I say, go for it, and quit talking about going for it.

DeltaLover
06-28-2014, 10:56 PM
You can hardly predict where something put in the public domain will go, or how far. What's the harm? It either results in something of value, to someone, or it doesn't. I say, go for it, and quit talking about going for it.

I do not view this as a one man show.

This kind of project needs several people who believe in it, forming its core team and dictating its evolution.

Judging from the negativity I can sense here, something like this might not be feasible within the horse gamblers community, for various reasons, ego being one of them..

Traynor said :


That seems like you are saying you don't really have any new ideas, but rather hope that if a number of knowledgeable people work together, they might come up with something useful.


It is not my lack of ideas that makes me think an OS solution might be useful, but the fact that a communal effort usually has superior results compared to a solo performer who tries to wear many hats.

headhawg
06-28-2014, 11:03 PM
I have read most of the postings on this forum. I have not seen any that stir the slightest bit of envy, or even curiosity on my part.
And yet you hang out here. Color me curious. On second thought, never mind. I'll guess it's probably akin to feeding pigeons at the park.

raybo
06-28-2014, 11:06 PM
I do not view this as a one man show.

This kind of project needs several people who believe in it, forming its core team and dictating its evolution.

Judging from the negativity I can sense here, something like this might not be feasible within the horse gamblers community, for various reasons, ego being one of them..

Traynor said :



It is not my lack of ideas that makes me think an OS solution might be useful, but the fact that a communal effort usually has superior results compared to a solo performer who tries to wear many hats.

Someone always has to start things rolling. Might as well be you, since it's your idea.

DeltaLover
06-28-2014, 11:08 PM
Someone always has to start things rolling. Might as well be you, since it's your idea.

I agree..

Hoofless_Wonder
06-29-2014, 05:36 AM
Delta, I think the idea of an "open source" handicapping package or toolbox has some merit, but at the end of the day it all comes back to the conflict of people's time versus making money; of creating something with value versus having something without value. I'm not sure the open source model fits.

Currently, there's a large gulf between what the data providers give us (data files and their structure, a few rudimentary tools) and what the higher-end proprietary value-add vendors have (nice GUIs, weightings, odds-lines, etc.). It's $1 to buy a BRIS data file, but it can cost hundreds for a decent program, so I must (hopefully) be getting something for my money. I would love to see some open source code developed that I can "borrow" for my own tweaks and purposes, but I would be extremely hesitant to share anything that might be usefully unique and profitable.

For example, if I'm developing an API that can be used to project the pace pressure of a race from BRIS or DRF data files, let's say I get something useful working in a month. I check it into the open source library, and perhaps it's even added to the base for the "race predictor" program that the non-techies would want. Then another person comes along, let's call him BangWhiz, checks out the API, applies a few tweaks over the course of a couple of days, and finds the tweaks take the race predictor program from break even to a positive 30 percent ROI for sprint races at 2/3 of U.S. tracks.

Does BangWhiz ever check his version of the API into the library?

On the other hand, if the goal of an open-source handicapping library is simply to provide the basic tools so people like BangWhiz can come along and become a fan and even a successful player, then that's great. There's certainly tons of work necessary just to keep basic code up to date, and compatible with different operating systems. But where do you draw the line?

After all, we're talking horse betting and money here, and not making an mp3 player better than VLC or Windows Media Player.....
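
(For concreteness, the pace-pressure API in my example might start as small as the toy sketch below; the formula and threshold are completely made up for illustration.)

def pace_pressure(early_pace_figs, threshold=90):
    """Toy measure: the share of the field showing high early pace.
    A real version would weight running styles, distance, etc."""
    speed_types = sum(1 for fig in early_pace_figs if fig >= threshold)
    return speed_types / len(early_pace_figs)

print(pace_pressure([95, 92, 88, 78, 70]))  # 2 of 5 horses -> 0.4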

DeltaLover
06-29-2014, 07:22 AM
I think you are mixing software with its applications.

Discovering an approach that can show you a consistent 30% ROI (an extremely unlikely scenario) using a specific code base is comparable to writing a best seller using OpenOffice; checking your code changes into OO will not reveal your secret of success, while it might improve OO's quality and features.

More than this, I do not believe in any “secrets” when it comes to horse betting.

There is no formula, software, AI solution or even magic spell that can show consistent profitability; although these things can be useful to improve our understanding of the game, the bottom line still involves personal judgment calls, and this is the beauty of the game.

Sapio
06-29-2014, 11:24 AM
I think you are mixing software with its applications.

Discovering an approach that can show you a consistent 30% ROI (an extremely unlikely scenario) using a specific code base is comparable to writing a best seller using OpenOffice; checking your code changes into OO will not reveal your secret of success, while it might improve OO's quality and features.

More than this, I do not believe in any “secrets” when it comes to horse betting.

There is no formula, software, AI solution or even magic spell that can show consistent profitability; although these things can be useful to improve our understanding of the game, the bottom line still involves personal judgment calls, and this is the beauty of the game.

Hi DeltaLover

Again, I like your idea. The problem I see is attracting the "right" type of individuals. Most, if not nearly all, of the individuals you'd seek are already associated with individual whales or syndicates, or are going it alone in secrecy.

Thomas Sapio

raybo
06-29-2014, 11:59 AM
Why don't you list, and describe, the individual elements you have in mind, in layman's terms that everyone can understand? Then maybe people can start making suggestions. That is in regards to what the program will do with the data.

As far as attracting other programmers to the project, the open source thing might be causing problems, as most programmers here probably work in a specific traditional language. If your open source program can encompass traditional programming languages, then those kinds of programmers should be able to contribute, and then you could write/include the ability to utilize those other languages, maybe?

I assume you aren't looking for specific handicapping/analysis stuff that others would not want to reveal? We're just talking "tools" aren't we, not personal/proprietary things?

Sapio
06-29-2014, 12:21 PM
There is a misconception here. I don't believe he is seeking out programmers. I believe he is primarily seeking analysts, and there is a world of difference between a programmer and a systems analyst.

Thomas Sapio

raybo
06-29-2014, 12:38 PM
There is a misconception here. I don't believe he is seeking out programmers. I believe he is primarily seeking analysts, and there is a world of difference between a programmer and a systems analyst.

Thomas Sapio

Heck, I thought he was a systems analyst! I just thought he was asking for other programmers/system builders, to handle some of the tasks, so he doesn't have to do it all.

Sapio
06-29-2014, 12:50 PM
Heck, I thought he was a systems analyst! I just thought he was asking for other programmers/system builders, to handle some of the tasks, so he doesn't have to do it all.

Hi Raybo

I believe he is an analyst as well as a programmer, but the programming part is the trivial part. He is seeking more people like him wanting/willing to contribute (my guess would be Magister Ludi, TM types and others). Only he can clarify; I am guessing.

For example: someone adding Logistic Regression to your AllData thing.

Thomas Sapio

raybo
06-29-2014, 12:56 PM
Hi Raybo

I believe he is an analyst as well as a programmer, but the programming part is the trivial part. He is seeking more people like him wanting/willing to contribute (my guess would be Magister Ludi, TM types and others). Only he can clarify; I am guessing.

For example: someone adding Logistic Regression to your AllData thing.

Thomas Sapio

Adding logistic regression to AllData wouldn't require a systems analyst, would it? Just someone proficient in logistic regression. But the integration between Excel and the LR would require someone familiar with that kind of integration, right?

Magister Ludi
06-29-2014, 04:47 PM
Hi Raybo

I believe he is an analyst as well as a programmer, but the programming part is the trivial part. He is seeking more people like him wanting/willing to contribute (my guess would be Magister Ludi, TM types and others). Only he can clarify; I am guessing.

For example: someone adding Logistic Regression to your AllData thing.

Thomas Sapio

I would be a poor choice to be a contributor to such a project. All of our hardware is based upon proprietary high-performance highly-parallel multi-FPGA systems which we build in-house. We use no commercial proprietary software, much less FOSS. All of the software that we use is also proprietary (not Windows, OS X, or Linux-compatible) and written in-house. All of our systems are security-hardened, air-gapped, with both hardware and software encryption, and housed in SCIFs.

I'm certainly not suggesting that a system like this is necessary or even desirable for most. However, we are active in numerous markets where this level of performance and security is necessary. We’re as far from FOSS as we can be.

Sapio
06-29-2014, 05:50 PM
I would be a poor choice to be a contributor to such a project. All of our hardware is based upon proprietary high-performance highly-parallel multi-FPGA systems which we build in-house. We use no commercial proprietary software, much less FOSS. All of the software that we use is also proprietary (not Windows, OS X, or Linux-compatible) and written in-house. All of our systems are security-hardened, air-gapped, with both hardware and software encryption, and housed in SCIFs.

I'm certainly not suggesting that a system like this is necessary or even desirable for most. However, we are active in numerous markets where this level of performance and security is necessary. We’re as far from FOSS as we can be.

That would be a no for Magister Ludi. Not surprising. As I said most, if not all, qualified individuals would not participate in such a project.

Thomas Sapio

JJMartin
06-29-2014, 05:58 PM
I would be a poor choice to be a contributor to such a project. All of our hardware is based upon proprietary high-performance highly-parallel multi-FPGA systems which we build in-house. We use no commercial proprietary software, much less FOSS. All of the software that we use is also proprietary (not Windows, OS X, or Linux-compatible) and written in-house. All of our systems are security-hardened, air-gapped, with both hardware and software encryption, and housed in SCIFs.

I'm certainly not suggesting that a system like this is necessary or even desirable for most. However, we are active in numerous markets where this level of performance and security is necessary. We’re as far from FOSS as we can be.

What do you use this software for? It sounds like it's for more than horse racing.

DeltaLover
06-30-2014, 09:54 AM
Hi Raybo

I believe he is an analyst as well as a programmer, but the programming part is the trivial part. He is seeking more people like him wanting/willing to contribute (my guess would be Magister Ludi, TM types and others). Only he can clarify; I am guessing.

For example: someone adding Logistic Regression to your AllData thing.

Thomas Sapio

For this kind of project, we need programmers, testers, statisticians and end users to work together as a team.

Some related links:

http://opensource.com/business/13/6/four-types-organizational-structures-within-open-source-communities

http://www.smashingmagazine.com/2013/01/03/starting-an-open-source-project/

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&ved=0CGAQFjAH&url=http%3A%2F%2Fwww.flossproject.org%2Fworkshop%2Fpresentations%2FStewart-presentation.ppt&ei=tGuxU_CcMNOSyATo64HICQ&usg=AFQjCNFH5yY-6Af2iLrgjDN_TD5LCSlptg&bvm=bv.69837884,d.aWw

DJofSD
06-30-2014, 10:00 AM
For this kind of project, we need programmers, testers, statisticians and end users to work together as a team.

Some related links:

http://opensource.com/business/13/6/four-types-organizational-structures-within-open-source-communities

http://www.smashingmagazine.com/2013/01/03/starting-an-open-source-project/

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&ved=0CGAQFjAH&url=http%3A%2F%2Fwww.flossproject.org%2Fworkshop%2Fpresentations%2FStewart-presentation.ppt&ei=tGuxU_CcMNOSyATo64HICQ&usg=AFQjCNFH5yY-6Af2iLrgjDN_TD5LCSlptg&bvm=bv.69837884,d.aWw
First, I cannot open that PPT document in the MS Visualizer. And, no, I do not have MS Office -- I'm too cheap.

Next, have you selected a programming language and source control/version control?

traynor
06-30-2014, 10:05 AM
And yet you hang out here. Color me curious. On second thought, never mind. I'll guess it's probably akin to feeding pigeons at the park.

It is actually really simple. I write code and analyze data the major portion of most days. When I hit a snag, I use diversion to allow my mind to work on the resolution of the problem without interference. It is primarily during those times that I read/post on PA. I could probably watch re-runs of Gilligan's Island and accomplish the same thing. I like PA better.

traynor
06-30-2014, 10:34 AM
I think you are mixing software with its applications.

Discovering an approach that can show you a consistent 30% ROI (an extremely unlikely scenario) using a specific code base is comparable to writing a best seller using OpenOffice; checking your code changes into OO will not reveal your secret of success, while it might improve OO's quality and features.

More than this, I do not believe in any “secrets” when it comes to horse betting.

There is no formula, software, AI solution or even magic spell that can show consistent profitability; although these things can be useful to improve our understanding of the game, the bottom line still involves personal judgment calls, and this is the beauty of the game.

That sounds like a good argument for working on one's (uniquely individual) observational and decision-making skills, rather than fiddling around writing software apps that do nothing more than those already in existence do.

I am a big fan of Gordon Gekko. Take away the profit motivation and little or nothing is left. Deciding early on that consistent profit is impossible seems a poor starting point. Your inability to show consistent profitability should not be considered a universal that applies to everyone else.

DeltaLover
06-30-2014, 10:46 AM
First, I cannot open that PPT document in the MS Visualizer. And, no, I do not have MS Office -- I'm too cheap.


IMHO you are doing the right thing by not using MS Office. Installing
https://www.openoffice.org/
will allow you to open the presentation..



Next, have you selected a programming language and source control/version control?

Yes. Python (with C++ bindings where a performance boost is needed) and git / GitHub.
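
To show the binding pattern, here is a sketch demonstrated against the standard C math library, since that is already compiled on most systems; a real project would expose its own C++ number-crunching core the same way:

import ctypes, ctypes.util

# Load a compiled C library and declare one function's signature.
libm = ctypes.CDLL(ctypes.util.find_library('m'))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # the math runs in compiled code, not in Python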

Ted Craven
06-30-2014, 12:00 PM
from what I read here, this is not open source, plus it seems tightly coupled to a specific methodology..

That's right.

Not being a smart-ass, but, why would you expect it to be open source?

Yes, it is tightly coupled to the Sartin Methodology. It basically is the Sartin methods, written in a more modern language and for a Windows GUI.
FWIW, RDSS is not open-source, though many (but not all) of the older Sartin Methodology & RDSS concepts and computations are in the public domain. Profitable use is all in the implementation of 'best practices'. A good tennis racquet works best in the hands of a pro or a dedicated enthusiast, and less so in the hands of an occasional fan.

Also FWIW, I suspect it unlikely that those individuals who are making significant profits using RDSS or other Sartin implementations (or profitable users of HTR, HSH, Jcapper, BLAM, AllData, etc) would stop doing what works now in order to spend time participating in a project just to support the (laudable) philosophy of 'open-source'.

In the end - if one's aim is to take profit out of pari-mutuel pools, one simply has to decide what cost to pay (in tools, data, education and one's own time) - then pay it and start moving toward your goal.

There is an 'opportunity cost' associated with deferring use of proven existing tools in favor of home-grown, open-source solutions. For those with a philosophical antipathy towards 'not-invented-here', or who require access to all computations for the ability to improve and experiment on their own timelines, open-source would seem to be a solution.

cheers,

Ted

P.S. Also FWIW, I believe the poster who earlier speculated on 'RDSS' as open-source might have been thinking of 'DDSS' - another type of DSS (Decision Support System (http://en.wikipedia.org/wiki/Decision_support_system)), which I believe Traynor once had some connection with. Just do a search of PA posts.

DeltaLover
06-30-2014, 12:32 PM

Also FWIW, I suspect it unlikely that those individuals who are making significant profits using RDSS or other Sartin implementations (or profitable users of HTR, HSH, Jcapper, BLAM, AllData, etc) would stop doing what works now in order to spend time participating in a project just to support the (laudable) philosophy of 'open-source'.



The main reasons to use OS are its high quality and the ability to customize it by just cloning the code, without any issues about copyrights and intellectual property.


There is an 'opportunity cost' associated with deferring use of proven existing tools in favor of home-grown, open-source solutions.


OS does not necessarily mean home-grown. On the contrary, quite the opposite is true. I am sure that some related research will change your view about OS...

For starters try the following:

https://www.ibm.com/developerworks/community/blogs/6e6f6d1b-95c3-46df-8a26-b7efd8ee4b57/entry/why_big_companies_are_embracing_open_source119?lang=en

http://www.openlogic.com/blog/bid/244265/How-Open-Source-Software-Can-Give-Your-Company-a-Competitive-Advantage

https://www.google.com/#q=open+source+software+companies&start=10

traynor
06-30-2014, 03:48 PM
The main reasons to use OS are its high quality and the ability to customize it by just cloning the code, without any issues about copyrights and intellectual property.


I think most bettors would measure quality by how many winners an app selects, and at what prices those selections win, rather than by the "ability to customize" that application.

In short, unless it is designed primarily for recreational users, the "ability to customize" can be interpreted as a failure in the design stage. If the app is well-designed, it should not need to be continually tweaked, nor should the output need to be "interpreted." Assuming, of course, that the intent is profit. Apps that do little more than massage numbers and present them back in a (slightly) different output format are not especially useful. There are hundreds available (many freeware or shareware) and many more available for little or no cost (depending on the seller's kickback arrangement with the data providers) if one signs up for data subscriptions.

DeltaLover
06-30-2014, 04:09 PM
I think most bettors would measure quality by how many winners an app selects, and at what prices those selections win, rather than by the "ability to customize" that application.

In short, unless it is designed primarily for recreational users, the "ability to customize" can be interpreted as a failure in the design stage. If the app is well-designed, it should not need to be continually tweaked, nor should the output need to be "interpreted." Assuming, of course, that the intent is profit. Apps that do little more than massage numbers and present them back in a (slightly) different output format are not especially useful. There are hundreds available (many freeware or shareware) and many more available for little or no cost (depending on the seller's kickback arrangement with the data providers) if one signs up for data subscriptions.


Software quality is measured by its longevity and number of users, and it cannot be determined by individual users... We can certainly define indicators of high quality software, but the final verdict depends on many more factors.

As far as the ability to customize, I think you misunderstood what I am trying to say here. How many winners, and at what price, are selected by an application does not represent its quality...

What is important from the application point of view is not to provide a winning approach.

What is needed is a dynamic environment with the ability to express any possible betting scenario or handicapping idea, while talking to any kind of back end, plus having the ability to evolve in directions completely unexpected by the original designer.

As an analogy, take the design of a programming language, for example C++ (which resembles the design of the application), its implementation (for example g++ or Intel C++), and finally its clients, which can be seen in any conceivable programming task …

In the same way, the winning system is not an attribute of the application we are talking about, but a completely different construct that happens to be implemented on top of the specific application....

traynor
06-30-2014, 04:55 PM
Software quality is measured by its longevity and number of users, and it cannot be determined by individual users... We can certainly define indicators of high quality software, but the final verdict depends on many more factors.

As far as the ability to customize, I think you misunderstood what I am trying to say here. How many winners, and at what price, are selected by an application does not represent its quality...

What is important from the application point of view is not to provide a winning approach.

What is needed is a dynamic environment with the ability to express any possible betting scenario or handicapping idea, while talking to any kind of back end, plus having the ability to evolve in directions completely unexpected by the original designer.

As an analogy, take the design of a programming language, for example C++ (which resembles the design of the application), its implementation (for example g++ or Intel C++), and finally its clients, which can be seen in any conceivable programming task …

In the same way, the winning system is not an attribute of the application we are talking about, but a completely different construct that happens to be implemented on top of the specific application....

I disagree. It is equivalent to designing the best widget in the world while ignoring the fact that the widget really doesn't do anything worthwhile. The "software as a tool" simile breaks down when the expectation is that it is going to enable users to make a profit. If it fails in that regard, the "quality" (however explained or not explained) in other areas is irrelevant.

That "doing something worthwhile" is an attribute that seems to be missing in the design stage of a lot of handicapping software. The tool simile is fine, but unless that tool does something useful that all the many, many other tools do not, it is of no more value than all the many, many other tools. For bettors, software applications "doing something worthwhile" does not seem related to how easy (or difficult) it is to tweak the code of that application. It seems far more related to Win% and ROI.

traynor
06-30-2014, 04:58 PM
Unless, of course, one is selling the software, or one is being paid to develop the software.

DeltaLover
06-30-2014, 05:12 PM
For bettors, software applications "doing something worthwhile" does not seem related to how easy (or difficult) it is to tweak the code of that application. It seems far more related to Win% and ROI.

By “customization”, I am not referring to tweaks but to the ability to adapt to new specs and requirements, ultimately resulting in completely new uses that were not anticipated by the original designer.

traynor
06-30-2014, 06:58 PM
By “customization”, I am not referring to tweaks but to the ability to adapt to new specs and requirements, ultimately resulting in completely new uses that were not anticipated by the original designer.

From a philosophical view of software development in general, that might be true (at least to some extent). The weakness is in the premise of "let's build in a lot of stuff we don't need so someone somewhere can do something else with it someday. Maybe." I think most rational people (including most business analysts) would prefer that developers focus more on software that does what it is intended to do better than anything else available, and leave the philosophy to philosophers.

In the specific case of race analysis software, why would anyone work on developing something that someone else somewhere else could (possibly) use someday for some totally unrelated-to-horse-racing purpose? Unless someone with VERY deep pockets is willing to foot the bill for that effort?

I think that may be why most "race analysis software" fails to do more than (relatively) trivial data massage and formatting. Too much emphasis on "software" and not enough on "race analysis."

DeltaLover
06-30-2014, 10:23 PM
From a philosophical view of software development in general, that might be true (at least to some extent). The weakness is in the premise of "let's build in a lot of stuff we don't need so someone somewhere can do something else with it someday. Maybe." I think most rational people (including most business analysts) would prefer that developers focus more on software that does what it is intended to do better than anything else available, and leave the philosophy to philosophers.

In the specific case of race analysis software, why would anyone work on developing something that someone else somewhere else could (possibly) use someday for some totally unrelated-to-horse-racing purpose? Unless someone with VERY deep pockets is willing to foot the bill for that effort?

I think that may be why most "race analysis software" fails to do more than (relatively) trivial data massage and formatting. Too much emphasis on "software" and not enough on "race analysis."

Joel Spolsky coined the term "architecture astronaut" for a developer who never delivers a working solution because he is lost in creating an overly generic solution that absorbs all his energy and keeps him from seeing the real problem he is trying to solve.

As with many other things in life, the optimal approach lies somewhere between the "astronaut" and the doer who only tries to satisfy the immediate needs of his users. Experience and talent are required for a developer to strike the right balance between addressing specific needs and building for extensibility and generalization; that balance is one of the most important pillars of successful software development.

whodoyoulike
06-30-2014, 10:45 PM
By “customization”, I am not referring to tweaks but to the ability to adapt to new specs and requirements, ultimately resulting in completely new uses that were not anticipated by the original designer.


I may not be following your idea correctly. Are you trying to develop something like MS Excel? Why not just use Excel?

DeltaLover
06-30-2014, 10:56 PM
I may not be following your idea correctly. Are you trying to develop something like MS Excel? Why not just use Excel?

What I am discussing here has absolutely nothing to do with Excel. It is about creating a set of libraries to assist with data I/O, factor application, statistical analysis, pattern recognition and other similar approaches.

whodoyoulike
06-30-2014, 11:16 PM
What I am discussing here has absolutely nothing to do with Excel. It is about creating a set of libraries to assist with data I/O, factor application, statistical analysis, pattern recognition and other similar approaches.


I realize you're attempting something new and different. But, in your comments above, couldn't Excel do what you're suggesting? Maybe it would be more practical to "create a set of libraries to assist with data I/O, factor application, statistical analysis, pattern recognition" etc., if it's possible using Excel?

DeltaLover
06-30-2014, 11:20 PM
I realize you're attempting something new and different. But, in your comments above, couldn't Excel do what you're suggesting? Maybe it would be more practical to "create a set of libraries to assist with data I/O, factor application, statistical analysis, pattern recognition" etc., if it's possible using Excel?

Excel is mainly a spreadsheet; it can be customized to perform calculations, interface with databases or even create a GUI, but it is rather limited for what we are discussing here.

Hoofless_Wonder
07-01-2014, 03:12 AM
Hey Delta,

I think I'm getting a clearer picture of what the main purpose of the project would be, though I'm still a bit fuzzy on where the lines would be drawn between where the data i/o routine and analysis would end, and where the application of those APIs in handicapping would begin.

I'm one of the people who responded that this is something I need, 'cause I can't program my way out of a paper bag. However, I can swipe somebody else's code and often get it to do what I want, especially if it's FORTRAN 77, SAS or a Korn shell script.

Although I'm not entirely sold on the idea that open APIs like this would be too popular with horse racing, I believe a lot of the same code could be used to generate similar analyses of football or baseball teams, and that could be very interesting in terms of growth and/or popularity.

So, when I upgrade my desktop to Linux Mint 18, will I be able to download the "Horse Racing Toolbox 0.5" under the python umbrella? :)

DeltaLover
07-01-2014, 06:22 AM
I believe a lot of the same code could be used to generate similar analyses of football or baseball teams, and that could be very interesting in terms of growth and/or popularity.

Yes, this is one of the possibilities.


"Horse Racing Toolbox 0.5"

This sounds like a nice name :ThmbUp:

hcap
07-01-2014, 07:33 AM
Excel is mainly a spreadsheet; it can be customized to perform calculations, interface with databases or even create a GUI, but it is rather limited for what we are discussing here.

Other than speed of execution, what specifically can't Excel do that a horse player could use practically?

DeltaLover
07-01-2014, 08:23 AM
Other than speed of execution, what specifically can't Excel do that a horse player could use practically?

Excel is good at what it does, but it certainly cannot be compared to a full-blown programming language. Excel does not compete against languages like Python, C++ or Ruby; it solves a different problem.

Would you ever think of writing a DSL for handicapping purposes, a REST server or an out-of-process server in Excel?

hcap
07-01-2014, 08:41 AM
OK, but what, specifically and practically, would an average non-programmer horse player not be able to do with Excel if someone with some Excel expertise created a useful set of tools in it?

Any useful application must be geared to easily perform queries or "what if" scenarios generated by the user to answer questions about the game, not the programming behind it. If you look at existing software packages, that in fact is what they try to do. What you are proposing might be overkill, except if you are both a horse player and a programmer.

DJofSD
07-01-2014, 08:47 AM
Can you use Excel to filter data?

If you can, great, but I can see where an app could provide tools for performing data selection and analysis. Yes, I know Excel has built-in libraries for computations, but it becomes cumbersome.

raybo
07-01-2014, 09:02 AM
Can you use Excel to filter data?

If you can, great, but I can see where an app could provide tools for performing data selection and analysis. Yes, I know Excel has built-in libraries for computations, but it becomes cumbersome.

Of course Excel can filter data, unless your definition of "filter" is not the traditional one. In the AllData Project, Excel's "advanced filter" function was used extensively, particularly in "AllDataBase", the free database workbook created for querying and modeling. In the "Black Box" we didn't use "advanced filter" but rather, pivot tables, because of the savings in overhead and processing time. They both accomplish the same thing, filtering.

PaceAdvantage
07-01-2014, 09:40 AM
Why must the pro-Excel guys try to discourage the development of this thread/idea? We get where you are coming from...no need to continue knocking...but please sit back and let it play out without actively trying to derail it...thanks...

DeltaLover
07-01-2014, 09:44 AM
Excel or OpenOffice can certainly be used as front ends to a middle layer serving as a data or calculation (or whatever else) server. The idea is to maintain clear tier separation, allowing heterogeneous technologies to work in parallel. For example, we should be able to swap an Excel client for a browser-based interface without any code changes, or change the data source from plain CSV to Mongo without our clients noticing any difference.

Of course, the END USER does not need to interact with such a system at the level of a programming language (at least not a very imperative one; as I said before, it is quite feasible to create a simple domain-level vocabulary that allows relatively complicated filters and factors to be described in an easy way)...
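
To make the "domain-level vocabulary" idea a bit more concrete, here is a minimal sketch in Python (all field names and sample values are hypothetical, just for illustration) of how filters could be composed from simple predicates without the end user writing imperative code:

# A tiny, hypothetical filter vocabulary: the primitives return predicates
# over a "horse" record, and all_of() combines them declaratively.

def field_equals(name, value):
    return lambda horse: horse.get(name) == value

def field_at_most(name, limit):
    return lambda horse: horse.get(name, float("inf")) <= limit

def all_of(*predicates):
    return lambda horse: all(p(horse) for p in predicates)

# Example data; the real records would come from the data tier.
horses = [
    {"name": "A", "days_since_last": 21, "last_finish": 2, "surface": "dirt"},
    {"name": "B", "days_since_last": 90, "last_finish": 7, "surface": "dirt"},
]

# "Recent form on dirt" expressed in the vocabulary, not in imperative code:
recent_dirt_form = all_of(field_equals("surface", "dirt"),
                          field_at_most("days_since_last", 45),
                          field_at_most("last_finish", 3))

contenders = [h for h in horses if recent_dirt_form(h)]
print([h["name"] for h in contenders])   # -> ['A']

The same vocabulary could later be given a friendlier text syntax for non-programmers; the point is only the separation between describing a filter and executing it.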

DJofSD
07-01-2014, 09:55 AM
Sometimes giving a concrete example of a problem or task helps to shed light on the perceived problem.

hcap
07-01-2014, 10:03 AM
Why must the pro-Excel guys try to discourage the development of this thread/idea? We get where you are coming from...no need to continue knocking...but please sit back and let it play out without actively trying to derail it...thanks...

I voted yes, I will give it a try. My observation is just the opposite of yours: many programmers assume Excel is not suited for sophisticated applications and analysis.

I agree that it is limited in the amount of data and speed of execution as compared to other languages, but sometimes fresh data is quite useful.

However, it is quite possible to filter on 2 or 3 years of data at a few tracks (up to 64,000 records and 100+ fields), in any combination of those fields, in seconds rather than minutes. That means queries can be altered and modified quickly, allowing the user to zero in on what was working in at least that data set. More sophisticated queries can then test those suppositions going forward, as well as most commercial programs can, which of course is always the catch. And why we are not all rich :lol:

hcap
07-01-2014, 10:12 AM
Sometimes giving a concrete example of a problem or task helps to shed light on the perceived problem.

Particularly by the end user. Knowing what questions to ask is part of the mix.

DJofSD
07-01-2014, 10:25 AM
Particularly by the end user. Knowing what questions to ask is part of the mix.
Yes, good point. Asking the correct question, and, at the correct time, can be extremely helpful.

And, another aspect to the current impasse, as I perceive it, is like this: telling me what something is not does not tell me what it is.

traynor
07-01-2014, 10:32 AM
This seems a good point to make a suggestion. Spend a bit of time with Google researching what is already out there in data analysis for horse racing. Seeing examples of what has already been done by others (and is more or less available already) may help to clarify both what is already available (as in "don't re-invent the wheel") and what is not (as in deficiencies and areas requiring further development).

One of the advantages of open-source is that someone else's code can be used initially to get a project (or idea) up and running in short order, minimizing the slow, ponderous, talking stage in which great plans are made but little or nothing of value is accomplished.

hcap
07-01-2014, 10:32 AM
Yes, good point. Asking the correct question, and, at the correct time, can be extremely helpful.

And, another aspect to the current impasse, as I perceive it, is like this: telling me what something is not does not tell me what it is.

It may be useful as an eliminator. For instance, filtering out any runner whose last 3 finishes were cumulatively more than X lengths behind may work often.

Giving the user the option to specify "X" would be nice.

hcap
07-01-2014, 10:41 AM
This seems a good point to make a suggestion. Spend a bit of time with Google researching what is already out there in data analysis for horse racing. Seeing examples of what has already been done by others (and is more or less available already) may help to clarify both what is already available (as in "don't re-invent the wheel") and what is not (as in deficiencies and areas requiring further development).

One of the advantages of open-source is that someone else's code can be used initially to get a project (or idea) up and running in short order, minimizing the slow, ponderous, talking stage in which great plans are made but little or nothing of value is accomplished.

Many specific "finds" by many users are held tightly, and although general approaches are out there, much weeding out is often needed. Not all general approaches have value.

raybo
07-01-2014, 11:04 AM
Why must the pro-Excel guys try to discourage the development of this thread/idea? We get where you are coming from...no need to continue knocking...but please sit back and let it play out without actively trying to derail it...thanks...

Hey, I said it was interesting and to go for it, but he seems to keep resisting putting anything together. If you want to show what open source can do, put something out there. Otherwise the project appears to be dead before it gets started.

DeltaLover
07-01-2014, 11:08 AM
Hey, I said it was interesting and to go for it, but he seems to keep resisting putting anything together. If you want to show what open source can do, put something out there. Otherwise the project appears to be dead before it gets started.

Not resisting at all... It is a matter of time and schedule, as you can understand... The 4th of July weekend is coming and I might have some time for this...

raybo
07-01-2014, 11:16 AM
Not resisting at all... It is a matter of time and schedule, as you can understand... The 4th of July weekend is coming and I might have some time for this...

I completely understand time constraints, believe me, but just acknowledgement that you plan on posting something material, in the near future, to get the ball rolling, would be fine.

And again, expressing what you are talking about in layman's terms would be extremely helpful for those of us who are not programmers, or database designers, or engineers, or IT guys/gals, etc.; most of us here are just horse players.

If I had done what you have done so far, asking for interest and for people to join a team of developers, and people for testing, etc., instead of going ahead and posting a downloadable app that imported common Brisnet data files and created DRF-style PPs, the AllData Project might never have happened at all. There has to be a first step taken, by someone, and who better than the idea man himself?

Looking forward to what you put together!!

DJofSD
07-01-2014, 11:20 AM
And, for those of us who are IT guys, we don't all speak the same dialects. I don't speak Python. C, yes, and C++ or Obj-C and others, but not Python, Ruby, etc.

raybo
07-01-2014, 11:26 AM
And, for those of us who are IT guys, we don't all speak the same dialects. I don't speak Python. C, yes, and C++ or Obj-C and others, but not Python, Ruby, etc.

I think that is what DL is trying to get across, that open source can handle traditional languages also, without having the whole project written and designed and implemented in a single traditional language. If I'm wrong, then I am just tech dumb I guess, and please accept my apologies.

TrifectaMike
07-01-2014, 12:32 PM
Hi Raybo

I believe he is an analyst as well as a programmer, but the programming part is the trivial part. He is seeking more people like him wanting/willing to contribute (my guess would be Magister Ludi, TM types and others). Only he can clarify; I am guessing.

For example: someone adding logistic regression to your AllData thing.

Thomas Sapio

On an academic level, I like the idea.
On a practical level, I would not participate.

Best wishes, DL.

Mike

DeltaLover
07-01-2014, 01:05 PM
On an academic level, I like the idea.
On a practical level, I would not participate.

Best wishes, DL.

Mike

I understand Doc...

DeltaLover
07-01-2014, 01:13 PM
I think that is what DL is trying to get across, that open source can handle traditional languages also, without having the whole project written and designed and implemented in a single traditional language. If I'm wrong, then I am just tech dumb I guess, and please accept my apologies.

Correct. What matters most is the interface definition, meaning the contracts used for message exchange among the various components.

Which language, database, or UI will be used is more of an implementation detail; ideally it should be easy to switch among similar technologies (for example, the choice among mongo, MySQL, SQL Server, Oracle or even flat files should represent a minor commitment, easy enough to change at any point).
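
As a sketch of what such a contract might look like in Python (class and method names here are only assumptions for illustration, not part of any existing code), clients program against the interface and never against mongo or the file system directly:

import csv
import os
from abc import ABC, abstractmethod

class RaceStore(ABC):
    """The contract every back end must honor; clients see only this."""

    @abstractmethod
    def races_for_day(self, track, date):
        """Return a list of race records (dicts) for a track/date."""

class CsvRaceStore(RaceStore):
    """Flat-file back end: one CSV per track/date under a root directory."""

    def __init__(self, root):
        self.root = root

    def races_for_day(self, track, date):
        path = os.path.join(self.root, f"{track}_{date}.csv")
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

class MongoRaceStore(RaceStore):
    """Mongo back end; 'collection' is a pymongo collection object."""

    def __init__(self, collection):
        self.collection = collection

    def races_for_day(self, track, date):
        return list(self.collection.find({"track": track, "date": date}))

def print_card(store, track, date):
    # Client code depends only on the RaceStore contract, so swapping
    # CsvRaceStore for MongoRaceStore requires no changes here.
    for race in store.races_for_day(track, date):
        print(race)

Swapping back ends is then a one-line change at the point where the store is constructed.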

whodoyoulike
07-01-2014, 04:07 PM
I thought you were trying to have a discussion. I wasn't trying to discourage you. It's just that I felt your objectives could be accomplished with Excel. It has limitations, but a lot of that comes down to the user's knowledge of how to apply the formulas. I feel a well-designed Excel program could do what you wanted.

DeltaLover
07-01-2014, 04:09 PM
I thought you were trying to have a discussion. I wasn't trying to discourage you. It's just that I felt your objectives could be accomplished with Excel. It has limitations, but a lot of that comes down to the user's knowledge of how to apply the formulas. I feel a well-designed Excel program could do what you wanted.

Although Excel can be used for the presentation layer, it still is not suitable for what we are discussing here..

traynor
07-02-2014, 09:59 AM
I think the suggestion that "whales are making millions by churning rebates"--if accepted as true by a few members of this group--would result in software applications that would knock your socks off.
http://www.bbc.com/news/business-28062071

There is no lack of programming skill or expertise--open source or otherwise--for anything even remotely perceived as potentially profitable. While few may be motivated by the creation of hobbyist-grade applications, enthusiasm for projects with a goal of profitability should not be underestimated.

Anyone who doesn't realize that has not spent much time at Sha Tin.

DJofSD
07-02-2014, 10:02 AM
Anyone who doesn't realize that has not spent much time at Sha Tin.

You lost me with that last statement.

raybo
07-02-2014, 10:21 AM
I think the suggestion that "whales are making millions by churning rebates"--if accepted as true by a few members of this group--would result in software applications that would knock your socks off.
http://www.bbc.com/news/business-28062071

There is no lack of programming skill or expertise--open source or otherwise--for anything even remotely perceived as potentially profitable. While few may be motivated by the creation of hobbyist-grade applications, enthusiasm for projects with a goal of profitability should not be underestimated.

Anyone who doesn't realize that has not spent much time at Sha Tin.

Why would you even mention that people here may not spend much time at Sha Tin (I would bet the vast majority of people on this forum don't even know where Sha Tin is, or what it is, as most people on this forum live in the USA and have never visited that part of the world, nor have any interest in what happens at that track)? You really amaze me with your attitude, and you continually hit new highs in that regard.

Must you always post as if you are the only one on the planet who knows anything, about anything? Without a doubt, you aggravate me more than any other person on this forum. You really ought to come down off your golden throne, you'll find it much more enjoyable, and we would certainly appreciate it, too.

I apologize to all the other members, but this guy really goes over the top sometimes.

PaceAdvantage
07-02-2014, 10:57 AM
I think the suggestion that "whales are making millions by churning rebates"--if accepted as true by a few members of this group--would result in software applications that would knock your socks off.

Which no doubt exist...just not within the view of the general public.

DJofSD
07-02-2014, 11:03 AM
Happy Valley and Sha Tin: the meccas of the T'bred wagering world. Huge fields, huge pools and highly regulated racing, with information the casual US fan would not have a clue how to use.

traynor
07-02-2014, 11:10 AM
You lost me with that last statement.

Horse racing is immensely popular. So is the obsession with developing software apps that show a profit.

traynor
07-02-2014, 11:13 AM
Which no doubt exist...just not within the view of the general public.
No doubt.

traynor
07-02-2014, 11:24 AM
Happy Valley and Sha Tin: the meccas of the T'bred wagering world. Huge fields, huge pools and highly regulated racing, with information the casual US fan would not have a clue how to use.

Unless they had an open-source (or proprietary) data processing tool to crunch the numbers and mine the data readily available. The question posed on this thread about the need for an open-source data processing tool is a good one. Limiting that/those tool(s) to North American racing and BRIS data files may not be useful. There are lots of horse races, lots of fans, lots of bettors, lots of interest in data processing tools in Asia, Australia, UK, and elsewhere that would make a project of this nature highly desirable.

Creating generic tools that could be adapted for Sha Tin or the Swedish harness racing circuit, for example, would make the project far more interesting than if it were limited to US/Canada/BRIS.

DeltaLover
07-02-2014, 11:31 AM
Horse racing is immensely popular. So is the obsession with developing software apps that show a profit.


What does “showing profit” really mean?

As I have said before, I do not have enough evidence to believe in the existence of an application that, serving as a black box, is able to overcome the (extremely high) takeout to the point of becoming profitable, constantly squeezing profit out of the pools.

This does not mean that there is no room for horse betting R&D, mainly focused on improving our understanding of the game and possibly revealing hidden aspects that might serve as catalysts for the ultimate goal of profitability.

The topic of a “profitable application” is very controversial, and after so many related debates, continuing the same discussion becomes useless rhetoric with no real value at all...

I would prefer to see more pragmatic conversations, with the potential to create something new and useful, rather than repeating the same aphorisms over and over, adding nothing to what we already know...

DJofSD
07-02-2014, 11:41 AM
Unless they had an open-source (or proprietary) data processing tool to crunch the numbers and mine the data readily available. The question posed on this thread about the need for an open-source data processing tool is a good one. Limiting that/those tool(s) to North American racing and BRIS data files may not be useful. There are lots of horse races, lots of fans, lots of bettors, lots of interest in data processing tools in Asia, Australia, UK, and elsewhere that would make a project of this nature highly desirable.

Creating generic tools that could be adapted for Sha Tin or the Swedish harness racing circuit, for example, would make the project far more interesting than if it were limited to US/Canada/BRIS.
Ah ha -- the light bulb is burning a bit brighter now.

So, IOW, better to start with a larger set of data; i.e., any and every data type available to the racing crowds at HV/ST should encompass what we use here in the US.

That would be a better approach than starting small (the US collection of unique data points) and then scaling up. And, I would speculate, with the correct initial design, being able to exclude missing data items would make the tool universal -- you don't have, can't get, or don't use the weight of the horse? Uncheck that option.

raybo
07-02-2014, 11:55 AM
What does “showing profit” really mean?

As I have said before, I do not have enough evidence to believe in the existence of an application that, serving as a black box, is able to overcome the (extremely high) takeout to the point of becoming profitable, constantly squeezing profit out of the pools.

This does not mean that there is no room for horse betting R&D, mainly focused on improving our understanding of the game and possibly revealing hidden aspects that might serve as catalysts for the ultimate goal of profitability.

The topic of a “profitable application” is very controversial, and after so many related debates, continuing the same discussion becomes useless rhetoric with no real value at all...

I would prefer to see more pragmatic conversations, with the potential to create something new and useful, rather than repeating the same aphorisms over and over, adding nothing to what we already know...

Well, isn't any "viable" software product created with the goal of enabling the user to explore things that he/she could not do without software? When we put data in a receptacle/app/suite, and we are able to manipulate that data in any way we wish, and also combine that data with our own reasoning, gained by previous experience and knowledge, we combine the "objective" data and the "subjective" data.

Is there any other kind of data out there? If not, then isn't that about as good as a tool gets? If the tool also allows us to access subjective data, within the confines of the software itself, without our having to rely on our own memories of previous experience and knowledge, then we have, in effect, created a "black box", a very good one.

That is what many handicappers have been trying to accomplish for a long time. Many are very close to accomplishing that, and some may have even reached that point already, and I'm not just talking about the "whales" and their super powerful computer technology and "unfair" wagering technology edge, being able to bet after everyone else, except the others like themselves, has already bet.

You see, it's not that the whales know any more than the rest of us, it's that they have leveraged their abilities in ways that enable them to "cheat". That is their edge, IMO.

So, this proposed project has merit, in that it may well, eventually, allow users to accomplish things that they could not do before. If so, then certainly, some will take the ball and run with it, and most others will not put in the work to learn to use it, and remain mired in the past, never to progress regardless of the tools at their disposal.

traynor
07-02-2014, 12:14 PM
What does “showing profit” really mean?

As I have said before, I do not have enough evidence to believe in the existence of an application that, serving as a black box, is able to overcome the (extremely high) takeout to the point of becoming profitable, constantly squeezing profit out of the pools.

This does not mean that there is no room for horse betting R&D, mainly focused on improving our understanding of the game and possibly revealing hidden aspects that might serve as catalysts for the ultimate goal of profitability.

The topic of a “profitable application” is very controversial, and after so many related debates, continuing the same discussion becomes useless rhetoric with no real value at all...

I would prefer to see more pragmatic conversations, with the potential to create something new and useful, rather than repeating the same aphorisms over and over, adding nothing to what we already know...

Simple cost/benefit. Software tools are not needed to lose money.

DeltaLover
07-02-2014, 12:20 PM
Well, isn't any "viable" software product created with the goal of enabling the user to explore things that he/she could not do without software?

Yes.


When we put data in a receptacle/app/suite, and we are able to manipulate that data in any way we wish, and also combine that data with our own reasoning, gained by previous experience and knowledge, we combine the "objective" data and the "subjective" data.


One challenge I can see here is the automation of this process: allowing the application to start with primitive data only and compose factors, metrics and figures by itself.


Is there any other kind of data out there?


Yes, there is plenty of data that is not known to the public but is known to insiders.

If not, then isn't that about as good as a tool gets?

If the tool also allows us to access subjective data, within the confines of the software itself, without our having to rely on our own memories of previous experience and knowledge, then we have, in effect, created a "black box", a very good one.


You see, it's not that the whales know any more than the rest of us, it's that they have leveraged their abilities in ways that enable them to "cheat". That is their edge, IMO.


I am not convinced about how effective the whales are. There is not enough data to form a good opinion. The closest analogy I can see to the whales is hedge funds, and most of them eventually go out of business, so I tend to believe the same about the whales as well.

Robert Goren
07-02-2014, 01:04 PM
I am not convinced about how effective the whales are. There is not enough data to form a good opinion. The closest analogy I can see to the whales is hedge funds, and most of them eventually go out of business, so I tend to believe the same about the whales as well.

Bad analogy! Hedge funds are run with investors' money and are designed to make the manager extremely wealthy. They operate in markets over which they have little control. The whale probably has investors, but I doubt he can get those kinds of fees. He also operates in a market in which he has more ability to set the odds (price).
The original idea behind a hedge fund was to limit risk, but in practice they take huge risks because if they don't, the fund cannot possibly show a profit after fees. The idea is to show a good profit for a few years, before the bottom falls out of the market, to draw in more investors and generate more fees. (All of this is very easy to see when you have no money to invest, but not so much when you are looking for a place to park your money.)

DeltaLover
07-02-2014, 01:30 PM
They operate in markets over which they have little control.

Don't be so sure about that...

whodoyoulike
07-02-2014, 03:16 PM
On an academic level, I like the idea.
On a practical level, I would not participate.

Best wishes, DL.

Mike


I take it you're the one Sapio is always making reference to in his posts? Looking forward to reading your posts and racing-related insights.

DeltaLover called you Doc. Are you a vet? Because I speak a little horse.

Hoofless_Wonder
07-03-2014, 05:02 AM
So Delta, do you have an idea of the "suite" of tools the project would start out with?

Being more of an "idea hamster" and less of a programmer, I'd like to see things like:

- command line tool to analyze a BRIS PP, MCP and/or DRF data file and perform a health check on all fields
- command line tool and web/browser gui to display PPs from data files, and like one poster mentioned, provide some of the display options like the BRIS PP Generator tool
- command line tool that imports a PP file, and assigns variables for all fields, to be used for other processing
- web/browser gui to "scrape" some of the popular web sites for info for things like odds tracking, scratches and changes, and most importantly - data mining from the Hong Kong Jockey Club web site - I want to know the last time my horse worked 4 furlongs, its splits during the last barrier trial, and how many times it went swimming in the last 21 days

These are just some basic building blocks - the APIs and routines for actual analysis and/or handicapping are endless. For example, DukeofPerl provided a tool called TASK (I think) which generates par times for class, track and distance from a directory of BRIS PPs. I don't know how many lines of code that thing is, but it ran awfully fast.

I'd argue that to have long-term success or momentum as a project, the toolbox would also need to include at least a rudimentary predictor/handicapping app, and these days it would have to include the option to run on the web and mobile devices. Something like Handifast, which allows for changing the weights of multiple factors, would let general-public non-programmers (and even Excel users) easily install, run, maintain and modify it to take advantage of the expanding library of APIs.....
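
For what it's worth, the first item on that list (a field "health check") could start out as small as this Python sketch. It assumes only that the file is comma-delimited, which is true of the BRIS single-file formats; everything else would have to grow from the provider's field map:

import csv
import sys
from collections import Counter

def health_check(path):
    """Count how many rows populate each field position in a comma-delimited PP file."""
    filled = Counter()
    rows = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            rows += 1
            for i, value in enumerate(row):
                if value.strip().strip('"'):
                    filled[i] += 1
    for i in sorted(filled):
        print(f"field {i:4d}: {filled[i]}/{rows} populated")

if __name__ == "__main__":
    health_check(sys.argv[1])

A real version would check each field against its documented type and range, not just emptiness, but even this much flags truncated or malformed downloads quickly.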

Robert Goren
07-03-2014, 10:01 AM
Is there a way to still get the DukeOfPerl program?

DeltaLover
07-03-2014, 10:36 AM
So Delta, do you have an idea of the "suite" of tools the project would start out with?

Being more of an "idea hamster" and less of a programmer, I'd like to see things like:

- command line tool to analyze a BRIS PP, MCP and/or DRF data file and perform a health check on all fields
- command line tool and web/browser gui to display PPs from data files, and like one poster mentioned, provide some of the display options like the BRIS PP Generator tool
- command line tool that imports a PP file, and assigns variables for all fields, to be used for other processing
- web/browser gui to "scrape" some of the popular web sites for info for things like odds tracking, scratches and changes, and most importantly - data mining from the Hong Kong Jockey Club web site - I want to know the last time my horse worked 4 furlongs, its splits during the last barrier trial, and how many times it went swimming in the last 21 days

These are just some basic building blocks - the APIs and routines for actual analysis and/or handicapping are endless. For example, DukeofPerl provided a tool called TASK (I think) which generates par times for class, track and distance from a directory of BRIS PPs. I don't know how many lines of code that thing is, but it ran awfully fast.

I'd argue that to have long-term success or momentum as a project, the toolbox would also need to include at least a rudimentary predictor/handicapping app, and these days it would have to include the option to run on the web and mobile devices. Something like Handifast, which allows for changing the weights of multiple factors, would let general-public non-programmers (and even Excel users) easily install, run, maintain and modify it to take advantage of the expanding library of APIs.....

Nice ideas, I like all of them and would like to hear more...

I created a wiki where you or anyone else can go ahead and add comments and suggestions:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Brainstrorming

DeltaLover
07-03-2014, 11:23 AM
Also, please feel free to append anything you might be interested in to the following design document:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Design

TrifectaMike
07-03-2014, 11:28 AM
DL,

I'm certain that when it comes to the integration portion of your project, you'll get it right. I understand your approach and I like it (it reminds me of the discussions about functional modules and information hiding in the '70s and '80s).

In my opinion, the success or lack of real success will depend on "your" ability to take "primitive" data and combine it into components which have MAXIMUM VARIANCE. If you can accomplish that, arriving at a black box is not too difficult.

Mike

DeltaLover
07-03-2014, 11:43 AM
DL,


In my opinion, the success or lack of real success will depend on "your" ability to take "primitive" data and combine it into components which have MAXIMUM VARIANCE. If you can accomplish that, arriving at a black box is not too difficult.

Mike

This is a great comment that we should keep in mind..

I think that where a computerized approach might be most helpful is exactly in the creation of derivatives, since we can automate their parameterization and composition as well...

For example, a very common factor used today has to do with the third race off a layoff. This factor can be generalized to use any number of days as the "layoff" and any number of races after it; with this type of factor, the unknown becomes the proper combination of layoff days and the number of races after it. This approach assumes that we have a valid method to rank the derivative factors and pick the fittest...
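
A sketch of what that parameterization might look like in Python (the field names and parameter grid are hypothetical, and the fitness measure here is just raw win rate; the real ranking method is the open question):

from itertools import product

def layoff_factor(min_days, nth_start):
    """Build a predicate: horse is making its Nth start back after a layoff of at least min_days."""
    def predicate(horse):
        return (horse["layoff_days"] >= min_days
                and horse["starts_since_layoff"] == nth_start)
    return predicate

def sweep(horses, results):
    """Enumerate (days, Nth-start) variants of the factor and measure each one.

    'results' maps a horse id to 1 for a win and 0 otherwise.
    """
    for days, nth in product((30, 45, 60, 90, 180), (1, 2, 3)):
        factor = layoff_factor(days, nth)
        hits = [results[h["id"]] for h in horses if factor(h)]
        if hits:
            win_rate = sum(hits) / len(hits)
            print(f"layoff >= {days:3d}d, start #{nth}: n={len(hits):5d}  win%={win_rate:.3f}")

The same generate-and-measure loop applies to any parameterized derivative, which is why automating it is attractive.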

JJMartin
07-04-2014, 01:15 PM
Well, after all this dialog this project must go into development :)

raybo
07-04-2014, 01:31 PM
Well, after all this dialog this project must go into development :)

I agree! The first step appears to be accessing and importing data, keeping in mind that this must be "open-ended" regarding data sources/suppliers and their various formats. Without accessible basic data, nothing else related to horse racing can be done, IMO.

DeltaLover
07-04-2014, 01:47 PM
I agree! The first step appears to be accessing and importing data, keeping in mind that this must be "open-ended" regarding data sources/suppliers and their various formats. Without accessible basic data, nothing else related to horse racing can be done, IMO.

This is what I am doing right now:

https://github.com/deltalover/hoplato

Robert Goren
07-04-2014, 02:54 PM
DL,

I'm certain that when it comes to the integration portion of your project, you'll get it right. I understand your approach and I like it (it reminds me of the discussions about functional modules and information hiding in the '70s and '80s).

In my opinion, the success or lack of real success will depend on "your" ability to take "primitive" data and combine it into components which have MAXIMUM VARIANCE. If you can accomplish that, arriving at a black box is not too difficult.

Mike

I would like to welcome you back, TM. Your contributions have been missed. I hope you stay.

DeltaLover
07-04-2014, 03:01 PM
If memory serves, I have seen a spreadsheet-based application that displays DRF files; can anyone point me to it?

DeltaLover
07-04-2014, 03:33 PM
If memory serves, I have seen a spreadsheet-based application that displays DRF files; can anyone point me to it?

I am not referring to Raybo's, but a different one that I tried before; it displayed past performances in a spreadsheet using a custom format...

traynor
07-04-2014, 06:46 PM
I am not referring to Raybo's, but a different one that I tried before; it displayed past performances in a spreadsheet using a custom format...

If you are building from the ground up, you may find it easier to use XML. That seems to be the most common format used by data providers. Whichever data provider(s) you use should have schemas on their sites.

DeltaLover
07-04-2014, 06:53 PM
If you are building from the ground up, you may find it easier to use XML. That seems to be the most common format used by data providers. Whichever data provider(s) you use should have schemas on their sites.

Since I am using mongo, JSON makes more sense...

traynor
07-05-2014, 12:33 PM
Then--at least in theory and in the spirit of open source--one subscriber could download BRIS files, parse them, and distribute the result to others?

raybo
07-05-2014, 01:15 PM
If memory serves, I have seen a spreadsheet-based application that displays DRF files; can anyone point me to it?

I don't know which spreadsheet you're talking about, but it's just a matter of importing the data file and then, using the mapping for that file, referencing the fields you want to display in the PPs in the cell locations where you want them to appear. Of course, there is a lot of formatting required in order to get the PPs to look like the standard DRF PPs.

In Excel 2007 and subsequent versions, you can import a data file directly, 1 horse per row, 1 field per cell. Doing this will cause a lot of duplication of data on each horse's row, like general track and race info, etc. But the import will follow that data source's file mapping for the CSV file you're importing.

I assume OpenOffice will accomplish the same thing, if it has enough columns to import all the fields in the file: 1435 fields in the Brisnet single file (.drf and .mcp file extensions) and 1435 fields in the JCapper data file (.jcp extension), for instance.

If the spreadsheet app doesn't have enough columns, then you have to split the fields across several rows for each horse on the card. Mine uses 6 rows per horse for Excel 2003 and earlier versions (255 columns wide x 6 rows = 1530 fields).
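
For anyone who would rather pull fields straight out of the single-file format in code, a minimal Python sketch looks like this (the field indices in FIELD_MAP are hypothetical placeholders; the real positions come from the data provider's file map):

import csv

# Hypothetical positions -- substitute the indices from your provider's file map.
FIELD_MAP = {"track": 0, "race_number": 2, "post_position": 3, "horse_name": 44}

def read_card(path):
    """Read a one-horse-per-row CSV and keep only the mapped fields."""
    horses = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            horses.append({name: row[index].strip('"')
                           for name, index in FIELD_MAP.items()})
    return horses

This sidesteps the column-count limits entirely, since nothing ever has to fit on a spreadsheet grid.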

raybo
07-05-2014, 01:20 PM
Then--at least in theory and in the spirit of open source--one subscriber could download BRIS files, parse them, and distribute the result to others?

That would be illegal of course - LOL.

Each user will have to purchase and download their own data files; only the open source software is distributable if it is PC-based, or shareable if it is online-based.

DeltaLover
07-05-2014, 01:48 PM
I don't know which spreadsheet you're talking about, but it's just a matter of importing the data file and then, using the mapping for that file, referencing the fields you want to display in the PPs in the cell locations where you want them to appear. Of course, there is a lot of formatting required in order to get the PPs to look like the standard DRF PPs.

In Excel 2007 and subsequent versions, you can import a data file directly, 1 horse per row, 1 field per cell. Doing this will cause a lot of duplication of data on each horse's row, like general track and race info, etc. But the import will follow that data source's file mapping for the CSV file you're importing.

I assume OpenOffice will accomplish the same thing, if it has enough columns to import all the fields in the file: 1435 fields in the Brisnet single file (.drf and .mcp file extensions) and 1435 fields in the JCapper data file (.jcp extension), for instance.

If the spreadsheet app doesn't have enough columns, then you have to split the fields across several rows for each horse on the card. Mine uses 6 rows per horse for Excel 2003 and earlier versions (255 columns wide x 6 rows = 1530 fields).


Sure, I know all this, but what I need is just an alternative importing mechanism so I can automate retrieval tests against it...

Hoofless_Wonder
07-06-2014, 03:15 AM
Is there a way to still get the DukeOfPerl program?

Looks like the links in the TASK thread are all dead, and I don't believe the Duke has posted for a while.

http://www.paceadvantage.com/forum/showthread.php?t=66828

I've got v1.3 of the EasyTask setup file which runs on Windows - I don't believe there's an issue with distributing it, since dukeofperl had it on a freeware site at one point. PM me if you'd like me to email it to you - it's just a smidge over 3 MB in size....

raybo
07-06-2014, 10:33 AM
Quote:
Originally Posted by TrifectaMike
DL,


In my opinion, the success or lack of real success will depend on "your" ability to take "primitive" data and combine it into components which have MAXIMUM VARIANCE. If you can accomplish that, arriving at a black box is not too difficult.

Mike


This is a great comment that we should keep in mind..

I think that where a computerized approach might be most helpful is exactly in the creation of derivatives, since we can automate their parameterization and composition as well...

For example, a very common factor used today has to do with the third race off a layoff. This factor can be generalized to use any number of days as the "layoff" and any number of races after it; with this type of factor, the unknown becomes the proper combination of layoff days and the number of races after it. This approach assumes that we have a valid method to rank the derivative factors and pick the fittest...

By "primitive data", are you talking about primitive "factors" like date, track, surface, distance, post position, sex, age, medication, equipment, field size, fractional and final times, fractional and final beaten lengths, fractional and final positions, etc.. My understanding of primitive data is data that is not duplicated or does not include other factors, like raw times versus pace/speed figures, as the figures are derivatives of raw times.

If this is what you mean, what is "maximum variance" referring to? Please give me an example from racing data.

hcap
07-06-2014, 10:43 AM
By "primitive data", are you talking about primitive "factors" like date, track, surface, distance, post position, sex, age, medication, equipment, field size, fractional and final times, fractional and final beaten lengths, fractional and final positions, etc.. My understanding of primitive data is data that is not duplicated or does not include other factors, like raw times versus pace/speed figures, as the figures are derivatives of raw times.

If this is what you mean, what is "maximum variance" referring to? Please give me an example from racing data.I am guessing "composite" factors that are more effective than those simpler data components they are based upon. Remember, in Alldata I added that factor which combines race class with early speed within a certain time period.

DeltaLover
07-06-2014, 11:22 AM
I am guessing "composite" factors that are more effective than the simpler data components they are based upon. Remember, in AllData I added that factor which combines race class with early speed within a certain time period.


I have to disagree.

Factor composition is a very important aspect of handicapping, and I think it hides a lot of traps. The more complex a factor is, the less testing data will be available, decreasing our level of confidence in it.

The most we have to gain from a factor is to understand how the crowd is going to bet. The crowd always uses simple factors that can be expressed with a single condition or two, for example a MSW dropping to MCL for the first time; understanding these main (simple) factors that drive the pools adds the most value from a betting perspective.
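
As an illustration of how little code such a "single condition or two" factor needs once the data layer exists (the field names here are hypothetical), the MSW-to-MCL drop is one predicate:

def first_time_mcl_drop(horse):
    """MSW last out, entered in MCL today, never ran in MCL before."""
    return (horse["last_race_type"] == "MSW"
            and horse["race_type"] == "MCL"
            and horse.get("prior_mcl_starts", 0) == 0)

# droppers = [h for h in todays_card if first_time_mcl_drop(h)]

The hard part is not expressing the condition but estimating how heavily the crowd bets it.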

traynor
07-06-2014, 11:51 AM
I have to disagree.

Factor composition is a very important aspect of handicapping, and I think it hides a lot of traps. The more complex a factor is, the less testing data will be available, decreasing our level of confidence in it.

With all due respect to the importance of levels of confidence in statistics supposedly representative of reality, I think confidence directly related to money in hand is a far better measure. One's level of confidence should be more related to success or failure in the real world than to statistical "proof."

If one is winning--and winning consistently--and winning over time, it really doesn't matter what the statistical level of confidence may be or not be. Reality trumps theory every time.

The most we have to gain from a factor is to understand how the crowd is going to bet. The crowd always uses simple factors that can be expressed with a single condition or two, for example a MSW dropping to MCL for the first time; understanding these main (simple) factors that drive the pools adds the most value from a betting perspective.


Perhaps in the old days, but not in 2014. "The crowd" often includes some very serious people using some very serious computer applications to outperform everyone else. If profit is the motive, that is the real competition--not the elderly gentleman with a long overcoat and a stunned expression staring blankly at a Daily Racing Form.

raybo
07-06-2014, 12:11 PM
I am guessing "composite" factors that are more effective than those simpler data components they are based upon. Remember, in Alldata I added that factor which combines race class with early speed within a certain time period.

Harry, I understand what you're talking about, but I'm just trying to understand what Mike's post was saying, and what Delta's response to it was saying, in layman's terms.

What does Mike mean by "primitive data" and "maximum variance", and since Delta thought Mike's statement was very important, regarding the project's development, what was Delta's own understanding of that statement?

It would certainly make things easier to understand if terms and examples were used that do not require formal statistics or mathematical education.

hcap
07-06-2014, 12:18 PM
Factor composition is a very important aspect of handicapping, and I think it hides a lot of traps. The more complex a factor is, the less testing data will be available, decreasing our level of confidence in it.

It is quite possible to assign a value to each runner in each race, except for the more limited, obvious cases of first-time starters and foreign unknowns. So the majority of runners will have things like win %, days away and early speed. If those three are available, each of them may be tested statistically using any weighted mix of ratios and compositions, resulting in equal sample sizes of runners to test.

Many of us have attempted a "silver bullet" mix. From personal experience, I know which ratios of what factors outperform others.

TrifectaMike:

In my opinion, the success or lack of real success will depend on "your" ability to take "primitive" data and combine it into components which have MAXIMUM VARIANCE. If you can accomplish that, arriving at a black box is not too difficult.

Mike

Ray, I was typing my post as you were posting. As you can see, I agree.

raybo
07-06-2014, 12:18 PM
I have to disagree.

Factor composition is a very important aspect of handicapping, and I think it hides a lot of traps. The more complex a factor is, the less testing data will be available, decreasing our level of confidence in it.

The most we have to gain from a factor is to understand how the crowd is going to bet. The crowd always uses simple factors that can be expressed with a single condition or two, for example a MSW dropping to MCL for the first time; understanding these main (simple) factors that drive the pools adds the most value from a betting perspective.

I don't believe Harry (Hcap) said anything about "complex" when talking about "composite" factors. Surely you don't suggest that we use raw times and beaten lengths to handicap races. One invariably must "combine" raw factors in order to obtain anything useful from them, but one doesn't have to make them so complex that there is not enough data to analyze them. His example of a composite factor contained only 3 factors, which is not complex at all.

DeltaLover
07-06-2014, 12:26 PM
Harry, I understand what you're talking about, but I'm just trying to understand what Mike's post was saying, and what Delta's response to it was saying, in layman's terms.

What does Mike mean by "primitive data" and "maximum variance", and since Delta thought Mike's statement was very important, regarding the project's development, what was Delta's own understanding of that statement?

It would certainly make things easier to understand if terms and examples were used that do not require formal statistics or mathematical education.

My understanding is the following:

Primitive data: raw data, as unprocessed as possible, meaning fractional times, fractional positions and anything else that does not add a calculation layer like a speed figure, track variant, etc.

Combine into components: a layer of composition receiving primitives and producing derivatives.

Maximum variance: a property of any component (derivative) that differentiates it from the average. For example, consider a component picking horses completely randomly. The value of any other component is measured by its deviation from that baseline (either positive or negative).

Note that the same concept can be applied to any other condition. For example, we might consider a player betting all the favorites and compare him with other players who also play only favorites but use some logic to skip some of them. Again, among them we need as much variance as possible to detect the good and the bad.
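
One way to make the "deviation from random" idea operational (a sketch only, not DL's or Mike's actual method; 'results' maps horse ids to 1 for a win, 0 otherwise) is to bootstrap random picks of the same size and see how far the factor sits from them:

import random

def win_rate(picks, results):
    picks = list(picks)
    return sum(results[p] for p in picks) / len(picks) if picks else 0.0

def deviation_from_random(factor, horses, results, trials=1000):
    """Observed win rate of a factor, minus the mean win rate of random picks of the same size."""
    selected = [h["id"] for h in horses if factor(h)]
    observed = win_rate(selected, results)
    ids = [h["id"] for h in horses]
    baseline = [win_rate(random.sample(ids, len(selected)), results)
                for _ in range(trials)]
    return observed - sum(baseline) / trials

A strongly positive (or strongly negative) deviation is exactly the "variance from the random component" described above; values near zero mean the derivative adds nothing.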

hcap
07-06-2014, 01:12 PM
I think there are 2 general approaches. One is the glorified spot play: mostly an AND query, filtering out of contention any runners which do not meet all the conditions specified. Many of us have used Excel to do this. As I mentioned, I have done this using the advanced filter utility built into Excel, filtering on up to 64,000 records and as many as 100 fields of raw and composite factors, in any combination of those fields, in seconds (composites pre-processed in an importer program, also in Excel).

(BTW, Excel 2003 is still the query champ, unless Excel Power Pivot can be made simple and workable in the newer versions.)

This approach does have the drawback of cutting down sample size. The other common approach is to assign each runner some mix of value derived from lots of factors. This way all runners get rated.

Generally, most programs, commercially available and home-grown, cook up some combination of filtering and rating.

How would what is being proposed here work, in concept, in comparison to these existing approaches?

I left out model construction, both automatic and manual, so as not to add another layer to the discussion now. It is usually employed after the first 2 approaches are completed but, IMO, should be part of the overall mix of a program.
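
The two approaches can be sketched side by side in a few lines of Python (the thresholds, weights and field names are all hypothetical):

# Approach 1: the "glorified spot play" -- an AND filter that throws out
# every runner failing any condition; survivors are the plays.
def spot_play(horse):
    return (horse["win_pct"] >= 0.20
            and horse["days_away"] <= 45
            and horse["early_speed"] >= 90)

# Approach 2: rate every runner with a weighted mix, so nobody is thrown out.
WEIGHTS = {"win_pct": 100.0, "early_speed": 0.5, "days_away": -0.2}

def rating(horse):
    return sum(weight * horse[field] for field, weight in WEIGHTS.items())

def rank_field(field_of_horses):
    return sorted(field_of_horses, key=rating, reverse=True)

The filter shrinks the sample; the rating keeps it whole. Most packages, as noted, blend the two: filter first, then rank the survivors.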

raybo
07-06-2014, 01:22 PM
My understanding is the following:

Primitive data: raw data, as unprocessed as possible, meaning fractional times, fractional positions and anything else that does not add a calculation layer like a speed figure, track variant, etc.

Combine into components: a layer of composition receiving primitives and producing derivatives.

Maximum variance: a property of any component (derivative) that differentiates it from the average. For example, consider a component picking horses completely randomly. The value of any other component is measured by its deviation from that baseline (either positive or negative).

Note that the same concept can be applied to any other condition. For example, we might consider a player betting all the favorites and compare him with other players who also play only favorites but use some logic to skip some of them. Again, among them we need as much variance as possible to detect the good and the bad.

OK, so your understanding of "primitive" data is the lowest level of data, like track code, surface, age, sex, days since last race, post position, assigned weight, listed equipment, listed medication. Are there any more "lowest level" factors? I assume that, because there are so few of these lowest-level factors, one must necessarily add other "2nd tier" factors to the list of "primitive" data, like fractional times, final times, beaten lengths, class designation (clm nw2, etc.), race distance. There aren't a lot of 2nd-tier factors either; most other factors are further manipulations of those two lower tiers (variant, pace and speed figures, fractional and final/total velocities, etc.).

So, is the goal to analyze every possible combination of those primitive factors into composite factors (derivatives), and then compare all of them against the race results from multiple horses and multiple races, to see which ones produce higher-than-average and which produce lower-than-average finish-position results? The "variance" being the degree of difference between the results produced by random selection and the results produced by those composite factors?

Is it necessary to recalculate pace and speed figures, running styles, etc., within the project, or would it simply use the ones supplied in the data files? I guess what I'm trying to get at is: how far down the list of factors do we start, regarding "primitive data", before we begin combining them?

My own decision, years ago, was to get away from data-source-supplied proprietary factors and go back to more basic raw data. But there is a point at which one must rely on at least some calculated data, or you hit a wall regarding analysis.

hcap
07-06-2014, 01:27 PM
My understanding is the following:

Primitive data: raw data, as unprocessed as possible. Meaning fractional times, fractional positions and anything else that does not add a calculation layer, like a speed figure, track variant etc.


Combine into components: A layer of composition receiving primitives and providing derivatives

Maximum Variance: A property of any component (derivative) that differentiates it from the average. For example, consider a component picking horses completely randomly. The value of any other component is measured relative to its deviation from this baseline (either positively or negatively).

Delta, could you expand on derivatives as used here and in statistics?

DeltaLover
07-06-2014, 01:37 PM
I have just pushed an early release implementing a utility to import Bris PPs into a mongo db.

Source code can be cloned from here:
https://github.com/deltalover/hoplato

PDF documentation can be found here:
http://www.themindofagambler.com/api.pdf

You will need python, nosetests, pip, pymongo and mongodb to run the code. You need to place your PP files in directories named after the year they represent, and put those directories under a parent directory whose name you will set in an environment variable, as described in the documentation of importdata.py...

Also before you run nose or importdata, you need to do a python setup.py develop under the main directory..

Preferably you will have a 64-bit box, since the volume of the data is very large... Let me know about any help you might need.
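For the curious, the import loop amounts to something like the following minimal sketch; the environment variable, db/collection names and parser stub here are hypothetical, and importdata.py is the authoritative version:

import os
from pymongo import MongoClient

def parse_pp_file(f):
    # Stand-in for the real Bris parser: wrap each line as a document.
    for line in f:
        yield {"raw": line.rstrip("\n")}

root = os.environ["PP_ROOT"]         # one subdirectory per year
db = MongoClient().handicapping      # assumes mongod is running locally

for year in sorted(os.listdir(root)):
    year_dir = os.path.join(root, year)
    for name in os.listdir(year_dir):
        with open(os.path.join(year_dir, name)) as f:
            for record in parse_pp_file(f):
                db.past_performances.insert_one(record)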

raybo
07-06-2014, 01:53 PM
I have just pushed an early release implementing a utility to import Bris PPs into a mongo db.

Source code can be cloned from here:
https://github.com/deltalover/hoplato

PDF documentation can be found here:
http://www.themindofagambler.com/api.pdf

You will need python, nosetests, pip, pymongo and mongodb to run the code. You need to place your PP files in directories named after the year they represent, and put those directories under a parent directory whose name you will set in an environment variable, as described in the documentation of importdata.py...

Also before you run nose or importdata, you need to do a python setup.py develop under the main directory..

Preferably you will have a 64-bit box, since the volume of the data is very large... Let me know about any help you might need.

Dang, how many prospective users/contributors did you just eliminate? Sheesh :bang:

DeltaLover
07-06-2014, 02:25 PM
Dang, how many prospective users/contributors did you just eliminate? Sheesh :bang:

If you are referring to the 64-bit: it is not a requirement, a 32-bit will work as well, but with limited capacity due to mongo. Anyway, upgrading to 64-bit is very easy and the suggested thing to do. Yesterday I installed a 64-bit OS in 20 minutes while I was watching the races at Aqueduct.. no reason you should not do it.

raybo
07-06-2014, 02:43 PM
If you are referring to the 64-bit: it is not a requirement, a 32-bit will work as well, but with limited capacity due to mongo. Anyway, upgrading to 64-bit is very easy and the suggested thing to do. Yesterday I installed a 64-bit OS in 20 minutes while I was watching the races at Aqueduct.. no reason you should not do it.

By "64 bit box" I assume you mean a 64 bit processor. That means buying another processor, or a new computer, just to run this thing. But that is not really what my concern was about, I can deal with 32 bit not being optimal. But, having to download and install 5 pieces of software, and then go through all the stuff that that entails, and then being able to get the code into it, for someone unfamiliar with those pieces of software, would probably run most people away from this project, screaming!

Again, if you're going to aim this thing at programmers/tech people, why bother, they could probably do it themselves.

DeltaLover
07-06-2014, 03:05 PM
By "64 bit box" I assume you mean a 64 bit processor. That means buying another processor, or a new computer, just to run this thing. But that is not really what my concern was about, I can deal with 32 bit not being optimal. But, having to download and install 5 pieces of software, and then go through all the stuff that that entails, and then being able to get the code into it, for someone unfamiliar with those pieces of software, would probably run most people away from this project, screaming!

Again, if you're going to aim this thing at programmers/tech people, why bother, they could probably do it themselves.

Most modern processors can run a 64-bit OS; see here for more: http://superuser.com/questions/39713/what-are-the-requirements-for-running-a-64-bit-operating-system

I repeat, that is not an absolute requirement, but I really see no reason to keep using a 32-bit OS these days...

raybo
07-06-2014, 03:11 PM
Most modern processors can run a 64-bit OS; see here for more: http://superuser.com/questions/39713/what-are-the-requirements-for-running-a-64-bit-operating-system

I repeat, that is not an absolute requirement, but I really see no reason to keep using a 32-bit OS these days...

Never mind, you're missing the point.

DeltaLover
07-06-2014, 03:12 PM
Never mind, you're missing the point.

Can you explain your point then?

JJMartin
07-06-2014, 04:49 PM
Can you make an Excel interface for this thing? The user would select options through an Excel program, and the background programs could do the work and display the results. Whatever options are chosen, it would have to be user friendly or there will be no audience.

Hoofless_Wonder
07-06-2014, 05:02 PM
Delta, I believe the point that Raybo is making is that the skill level of the potential end user will make a big difference in who can use the toolbox.

If a user has enough skills to understand your toolset and successfully implement them, then they may very well be inclined to switch to their own preferred set of tools. Python? Why not perl or ruby or c++? Mongo DB? Why not mysql or Access? In the time it takes me to understand what your line of thinking is, the equivalent of the tool can be written in something else.

OTOH, if the end user is a technically challenged person, then they may find the implementation well beyond their level of expertise. Read through the TASK thread to understand what I'm saying there - just the basic install and maintenance of a few pieces of open source code is going to be daunting to some folks here....

traynor
07-07-2014, 09:53 AM
Delta, I believe the point that Raybo is making is that the skill level of the potential end user will make a big difference in who can use the toolbox.

If a user has enough skills to understand your toolset and successfully implement them, then they may very well be inclined to switch to their own preferred set of tools. Python? Why not perl or ruby or c++? Mongo DB? Why not mysql or Access? In the time it takes me to understand what your line of thinking is, the equivalent of the tool can be written in something else.

OTOH, if the end user is a technically challenged person, then they may find the implementation well beyond their level of expertise. Read through the TASK thread to understand what I'm saying there - just the basic install and maintenance of a few pieces of open source code is going to be daunting to some folks here....

Some might be even more interested in what--exactly--this project will do that is not already being done by other (simpler) means? It would be dismaying to get it all glued together and discover the end capabilities are little more than could be accomplished with Excel and a bit of VBA code.

Tools are nice, but they are only useful insofar as they do meaningful work and accomplish meaningful tasks. Having a better hammer is only useful to those who have nails to drive. No nails, no use for a hammer.

DeltaLover
07-07-2014, 09:57 AM
Having a better hammer is only useful to those who have nails to drive. No nails, no use for a hammer.

:ThmbUp:

Robert Goren
07-07-2014, 10:39 AM
Remember that a lot of posters here, if not most, cannot write one line of code and have no interest in learning to do so. You have to decide whether you want a more or less finished product that is free to use, or something that the techies here can play with. I don't think you can make both work, but I could be wrong.

DeltaLover
07-07-2014, 10:59 AM
Remember that a lot of posters here, if not most, cannot write one line of code and have no interest in learning to do so. You have to decide whether you want a more or less finished product that is free to use, or something that the techies here can play with. I don't think you can make both work, but I could be wrong.

A finished product, as you describe, is built on top of multiple tiers, encapsulating several layers of abstraction leading from a simple GUI down to more detailed concepts implemented in formal programming languages and means of storage like mongo, mysql, csv etc. At this stage we should concentrate on the definition and implementation of the necessary APIs that will later allow the creation of higher-level applications.
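As a minimal sketch of what such an API boundary could look like (all names here are hypothetical; the point is only that higher layers never see the storage engine):

from abc import ABC, abstractmethod

class RaceStore(ABC):
    # The tier a GUI or report generator programs against;
    # mongo, mysql or csv implementations all sit behind it.
    @abstractmethod
    def races(self, track=None, year=None):
        """Yield race records matching the filters."""

class InMemoryRaceStore(RaceStore):
    def __init__(self, rows):
        self.rows = rows
    def races(self, track=None, year=None):
        for r in self.rows:
            if track and r["track"] != track:
                continue
            if year and r["year"] != year:
                continue
            yield r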

DeltaLover
07-07-2014, 11:06 AM
Some might be even more interested in what--exactly--this project will do that is not already being done by other (simpler) means? It would be dismaying to get it all glued together and discover the end capabilities are little more than could be accomplished with Excel and a bit of VBA code.

Since you are making the point about simpler (?) means, I have a question for those using Excel or VBA for the construction of their software:


Assuming a medium-size database (let's say 300K races), how long does it take your application to process a universe of, let's say, 100 handicapping factors?


Each handicapping factor represents a scenario that can be expressed using any of the PP data for each starter of the race, while it might also need access to all the other starters of the race. For example:

Third off the layoff

or

One of the top two BRIS PRIME POWER in a race where more than two horses are coming from a layoff
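Both examples reduce to predicates over a starter and, for the second one, the whole race. A minimal Python sketch (the field names and the 45-day layoff threshold are hypothetical):

def third_off_the_layoff(starter):
    # Third start since returning from a layoff.
    return starter["starts_since_layoff"] == 3

def top_two_prime_power_with_layoffs(starter, race):
    # Fires only when more than two starters come off a layoff.
    layoffs = sum(1 for s in race["starters"] if s["days_off"] >= 45)
    if layoffs <= 2:
        return False
    ranked = sorted(race["starters"],
                    key=lambda s: s["prime_power"], reverse=True)
    return starter in ranked[:2]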

raybo
07-07-2014, 11:35 AM
300,000 races? I don't know if the later versions of Excel, that have over 1,000,000 rows and over 16,000 columns would filter on 300,000 races without bogging down, but Hcap said earlier that he could use the Advanced Filter function built into Excel to filter up to 100 factors, or combinations of factors, on 64,000 rows/races, in seconds. So, assuming Excel would handle 300,000 rows/races, then a few seconds times about 5?

I assume that the filtering could be split into 5 sections to alleviate having to look at 300,000 rows each cycle, and send the filtered results to a receiving area for each of the 5 sections, and then display the total results for all 5 sections. Hcap could answer this better than I, because I have never found a need to database 300,000 races. You're talking about several years of data files for every track in the US. I guess my point is: why would one want to look back several years and mix in every track in the US? Ideally, wouldn't one want to either be track specific, or circuit specific, or database only the tracks one plays? And wouldn't data that old get stale and not be particularly representative of the future?

DeltaLover
07-07-2014, 11:45 AM
300,000 races? I don't know if the later versions of Excel, that have over 1,000,000 rows and over 16,000 columns would filter on 300,000 races without bogging down, but Hcap said earlier that he could use the Advanced Filter function built into Excel to filter up to 100 factors, or combinations of factors, on 64,000 rows/races, in seconds. So, assuming Excel would handle 300,000 rows/races, then a few seconds times about 5?

Does this timing involve the actual calculation of each factor for each horse, or does it assume that they are precalculated?

raybo
07-07-2014, 11:52 AM
Does this timing involve the actual calculation of each factor for each horse, or does it assume that they are precalculated?

One would probably have 2 workbooks, an importer/processor workbook, and a separate database workbook. All the factor calculations would be done during the processing routine, but once it's done then it's done, all those factors would already be in the database workbook, and you would simply filter on each of them.

DeltaLover
07-07-2014, 12:13 PM
One would probably have 2 workbooks, an importer/processor workbook, and a separate database workbook. All the factor calculations would be done during the processing routine, but once it's done then it's done, all those factors would already be in the database workbook, and you would simply filter on each of them.

OK, my question was referring to what you call the processing routine.

How long does it take this application to calculate the factors from raw data? (Starting from raw DRF files with nothing else precalculated.)

raybo
07-07-2014, 12:26 PM
OK, my question was referring to what you call the processing routine.

How long does it take this application to calculate the factors from raw data? (Starting from raw DRF files with nothing else precalculated.)

Harry will have to answer that; his importer is much faster than the one I use. Mine takes about 1/2 second or so per race to import the card and process each race on the card.

DeltaLover
07-07-2014, 12:31 PM
Harry will have to answer that; his importer is much faster than the one I use. Mine takes about 1/2 second or so per race to import the card and process each race on the card.

So, this means that we need 150,000 seconds, or 2,500 minutes, or about 41 hours, to import 300K races?

raybo
07-07-2014, 12:53 PM
So, this means that we need 150,000 seconds, or 2,500 minutes, or about 41 hours, to import 300K races?

I suppose, with my program. I said Hcap's importer is much faster than mine because mine also creates PPs, Pace Ratings, Summaries, and Black Box views, among other things, including all of my formatting, which means splitting each horse's raw data onto 6 rows in Excel because of the 255-column restriction in Excel 2002 (which I use), while the raw data file has 1,435 fields.

If you were only importing the file, one row per horse, and only calculating the factors to create an export file for the database, then the processing time would be much faster.

How fast will yours import, process, and calculate?


We're getting off topic here. The problem with your project, as I see it, is that in order to run anything the user will have to download several pieces of software, install them, configure them, etc., etc., while many of the handicappers I have dealt with have trouble creating folders on their computer and putting data files into them, much less trying to learn how to operate all these pieces of software, to say nothing of any more technical aspects that might be required of them.

You're saying that before I can even look at what you have created, I will have to download 5 pieces of software, plus whatever else you said had to be done. Are all these pieces of software free? What is involved in installing and configuring them? What else has to be done to get the code into the software, and how do you do that? Which software actually runs the code, and how do you do that? I'm not stupid when it comes to computer stuff, but even I have doubts about ever being able to work in the tool you have envisioned.

DeltaLover
07-07-2014, 01:11 PM
We're getting off topic here. The problem with your project, as I see it, is that in order to run anything the user will have to download several pieces of software, install them, configure them, etc., etc., while many of the handicappers I have dealt with have trouble creating folders on their computer and putting data files into them, much less trying to learn how to operate all these pieces of software, to say nothing of any more technical aspects that might be required of them.

You're saying that before I can even look at what you have created, I will have to download 5 pieces of software, plus whatever else you said had to be done. Are all these pieces of software free? What is involved in installing and configuring them? What else has to be done to get the code into the software, and how do you do that? Which software actually runs the code, and how do you do that? I'm not stupid when it comes to computer stuff, but even I have doubts about ever being able to work in the tool you have envisioned.

Please do not be scared by this "5 pieces of software and whatever else you said had to be done". I assure you that it is a very simple process, and once you have done it once or twice it looks like a piece of cake!

The installation process is very simple. Usually all you need to do is execute one command line, and there is little or no configuration needed.

All the technologies I am working with are Open Source and can be downloaded and installed completely free of any charge. The source code for all of them is also available in case we need to contribute to it by adding any necessary feature or fixing a known bug.

In most of my systems, I have automated builds that take a machine from scratch and install all the necessary components completely automatically, without any kind of human intervention... The software I use for automated builds is http://jenkins-ci.org/ which of course is open source and very high quality (used by thousands of shops across the globe).. I almost always install my software on a VM, more precisely with http://www.vagrantup.com/ which works great on both M$ and LINUX platforms and should certainly be considered for any kind of development.


How fast will yours import, process, and calculate?


I will answer this question shortly, providing all the related code and time stats. But let's continue the discussion about the timing of the DRF importing for a few more postings...

Anyone else care to say how long it takes?

JJMartin
07-07-2014, 01:31 PM
Since you are making the point about simpler (?) means, I have a question for those using Excel or VBA for the construction of their software:


Assuming a medium-size database (let's say 300K races), how long does it take your application to process a universe of, let's say, 100 handicapping factors?


Each handicapping factor represents a scenario that can be expressed using any of the PP data for each starter of the race, while it might also need access to all the other starters of the race. For example:

Third off the layoff

or

One of the top two BRIS PRIME POWER in a race where more than two horses are coming from a layoff

I have built a program in Excel that does what you're saying, and this is how it works:
Step 1 - select track and number of cards with results.
Step 2 - load and convert the card to columns in Excel.
Step 3 - process the structuring routine and calculations based on a specific UDM.
Step 4 - add the processed card to the database.
Step 5 - repeat until all cards are done.

Total time: about 6.5 seconds per card on XP (1.81 GHz) with Excel 2010.
It used to be more than 30 seconds per card, but the time was reduced after applying improved methods and code.

With your idea, you would need a database program that already has all the cards and results data preloaded, and possibly some structuring already done, for processing-time purposes. 300k races is not medium size; that is tremendous. But I guess that is a matter of opinion. With my program I make track profile databases per track per year (racing season). This amounts to about 100 to 200 days (about 1,000 to 2,000 races per database). Excel is not going to be suitable for 300k, except perhaps as an interface with all the data somewhere else.

raybo
07-07-2014, 01:42 PM
I have built a program in Excel that does what you're saying, and this is how it works:
Step 1 - select track and number of cards with results.
Step 2 - load and convert the card to columns in Excel.
Step 3 - process the structuring routine and calculations based on a specific UDM.
Step 4 - add the processed card to the database.
Step 5 - repeat until all cards are done.

Total time: about 6.5 seconds per card on XP (1.81 GHz) with Excel 2010.
It used to be more than 30 seconds per card, but the time was reduced after applying improved methods and code.

With your idea, you would need a database program that already has all the cards and results data preloaded, and possibly some structuring already done, for processing-time purposes. 300k races is not medium size; that is tremendous. But I guess that is a matter of opinion. With my program I make track profile databases per track per year (racing season). This amounts to about 100 to 200 days (about 1,000 to 2,000 races per database). Excel is not going to be suitable for 300k, except perhaps as an interface with all the data somewhere else.

Yeah, that's about what my program does: on my machine (slow) about 9-12 seconds per card, on Hcap's machine using my program about 6-8 seconds per card, depending on the number of horses and races. But once the card is imported it only takes about 1/2 second to process a race, and that includes all the 30+ MB of formulas and formatting in the program, which severely degrades processing time.

I agree, Excel is not the best receptacle for large data storage (and I agree 300k races is not "medium"; IMO, not a lot of players have access to that many cards), but then Excel was not designed for that; it is an analysis app, not a database app. Access was designed to be the MS database app.

hcap
07-07-2014, 01:44 PM
So, this means that we need 150,000 seconds, or 2,500 minutes, or about 41 hours, to import 300K races?
Generally, because of Excel's memory limitations, it is wise to do import and pre-processing in one program and analyze and filter data in another. To import 1,400+ fields in 2003 needs VBA, just to open all the fields. 2007 and up do it natively, but require VBA after the fact to move data around and for some other housekeeping chores. Filtering is pretty fast in 2003 up to its row limit, and pivot table usage is fast in all versions. Importing takes 4 seconds per card in 2003 and 2 seconds in later versions. I do not see a much faster way soon, but could be wrong. My VBA is OK, but not professional by any means.

Later versions, 2007 and up, also have the new PowerPivot feature, which is supposed to handle millions of rows. I found it very buggy, but really did not work at it. 2003 does many things quicker than 2007 and 2010, my 2 versions. The advanced filter is fast.
PowerPivot was an add-in in both; it came built in in later versions. Maybe the newest does it well. Theoretically it is a giant leap for data analysis.

http://msdn.microsoft.com/en-us/library/gg413497%28v=sql.110%29.aspx

http://msdn.microsoft.com/en-us/library/gg399131%28v=sql.110%29.aspx

The data that you work on in the PowerPivot window is stored in an analytical database inside the Excel workbook, and a powerful local engine loads, queries, and updates the data in that database. The PowerPivot data can be further enriched by creating relationships between the tables in the PowerPivot window. And because PowerPivot data is in Excel, it is immediately available to PivotTables, PivotCharts, and other features in Excel that you use to aggregate and interact with data. All data presentation and interactivity are provided by Excel 2010; and the PowerPivot data and Excel presentation objects are contained within the same workbook file. PowerPivot supports files up to 2GB in size and enables you to work with up to 4GB of data in memory.

In addition to the graphical tools that help you to analyze your data, PowerPivot includes Data Analysis Expressions (DAX). DAX is a new formula language that extends the data manipulation capabilities of Excel to enable more sophisticated and complex grouping, calculation, and analysis. The syntax of DAX formulas is very similar to that of Excel formulas, using a combination of functions, operators, and values. For more information, see Data Analysis Expressions (DAX) Overview.

whodoyoulike
07-07-2014, 02:04 PM
... Assuming a medium-size database (let's say 300K races), how long does it take your application to process a universe of, let's say, 100 handicapping factors?


Each handicapping factor represents a scenario that can be expressed using any of the PP data for each starter of the race, while it might also need access to all the other starters of the race. For example:

Third off the layoff

or

One of the top two BRIS PRIME POWER in a race where more than two horses are coming from a layoff


300,000 races?... I guess my point is: why would one want to look back several years and mix in every track in the US? Ideally, wouldn't one want to either be track specific, or circuit specific, or database only the tracks one plays? And wouldn't data that old get stale and not be particularly representative of the future?

DeltaLover:

Raybo has raised a very good question which needs to be answered before continuing with this project. Why would you think you would need a database this large when a much smaller (and practical) one could provide a similar answer?

JJMartin
07-07-2014, 02:11 PM
DeltaLover:

Raybo has raised a very good question which needs to be answered before continuing with this project. Why would you think you would need a database this large when a much smaller (and practical) one could provide a similar answer?

To be meaningful to handicapping, a per-track and prior-year analysis would be ideal imo. A big-ass DB with half a million races would address different types of questions, which could also be useful.

whodoyoulike
07-07-2014, 02:19 PM
To be meaningful to handicapping, a per-track and prior-year analysis would be ideal imo. A big-ass DB with half a million races would address different types of questions, which could also be useful.


Why not a million races? Do you really think your results would be different with a half million races versus a million versus 10,000?

JJMartin
07-07-2014, 02:24 PM
Why not a million races? Do you really think your results would be different with a half million races versus a million versus 10,000?
Yes

But just to clarify, I am in favor of smaller (few thousand) amounts by track.

whodoyoulike
07-07-2014, 02:29 PM
Yes

But just to clarify, I am in favor of smaller (few thousand) amounts by track.

You can test this with your own current DB. Query 1/2 (or some portion) of your DB and see if you come up with a significantly different answer. If there is a difference, analyze why it's so.

Do you own a mainframe computer?

raybo
07-07-2014, 02:31 PM
I'm not trying to suggest that Delta not design something that will handle 300k races, because that may well be what some user in the future might want to do, and we need to plan for all types of usage. Large databases DO answer generalized questions, while smaller, more specific, databases answer more specific questions. Both methods have their own place in analysis. I'm just concerned about all the technicality issues of what he has presented thus far. I really don't envision myself going through, what I perceive as a whole bunch of prep, before ever being able to operate the app.

Also, I wonder if Delta realizes how much user support function he will have to perform. Sure, he may get a couple of people to help with that, but my own experience is that one will spend several hours per day, 6 or 7 days per week performing updates, upgrades, and user support functions, not to mention trying to fit in your own personal play. It requires a huge commitment of time and energy.

raybo
07-07-2014, 02:34 PM
Why not a million races? Do you really think your results would be different with a half million races versus a million versus 10,000?

Yes, depending on what questions you are wanting to answer, and how long you want to wait for your live play to actually obtain what your answer was. If you're looking at 6 or 7 years of data, and you ask a question about the whole set, how long do you suppose it will take, in live play, to realize that answer?

JJMartin
07-07-2014, 02:40 PM
Also, I wonder if Delta realizes how much user support function he will have to perform. Sure, he may get a couple of people to help with that, but my own experience is that one will spend several hours per day, 6 or 7 days per week performing updates, upgrades, and user support functions, not to mention trying to fit in your own personal play. It requires a huge commitment of time and energy.

That is for sure, it took about 8 yrs to get where I am now. However with many experienced people working together...maybe less time, lol.

Robert Goren
07-07-2014, 02:50 PM
The likelihood of finding something profitable in a database of 10,000 races is close to zero. For some things you have to go back to the 1990s or earlier. There is a horse racing article in the 2+2 magazine for July where the author went back 9 years to get a thousand NYRA races that fit his system. An average of 2 plays a week for a +13.5% return over 9 years. How huge is his total database?

sjk
07-07-2014, 02:51 PM
For perspective on the number of races in a db, I have right at 300k races in the US since 1/1/2008 and I am only missing some of the small tracks and odd distances.

With today's computers process time is not an issue for me.

DeltaLover
07-07-2014, 02:54 PM
OK... Let's forget 300K and talk about only 20K... I agree that my original number of 300K was very large and certainly not a medium-size db... Taking the 1/2 sec per race speed, RAYBO's answer is going to be 10,000 seconds, or close to three hours, for 20K races... Correct?

whodoyoulike
07-07-2014, 02:56 PM
Yes, depending on what questions you are wanting to answer, and how long you want to wait for your live play to actually obtain what your answer was. If you're looking at 6 or 7 years of data, and you ask a question about the whole set, how long do you suppose it will take, in live play, to realize that answer?

... For example:

Third off the layoff

or

One of the top two BRIS PRIME POWER in a race where more than two horses are coming from a layoff

These two questions above.

I don't understand the "... and how long you want to wait for your live play to actually obtain what your answer was."

sjk
07-07-2014, 02:58 PM
OK... Let's forget 300K and talk about only 20K... I agree that my original number of 300K was very large and certainly not a medium-size db... Taking the 1/2 sec per race speed, RAYBO's answer is going to be 10,000 seconds, or close to three hours, for 20K races... Correct?

Downloads of even 20k races take time and cost a lot of money.

I saw Ed Bain was trying to sell a database of races back to 2006 for $30,000.

LOL

Hoofless_Wonder
07-07-2014, 03:09 PM
I agree with Raybo's statements here on the complexity of the toolbox. The assumption that a database is required for better long-term results is probably valid, but in my mind that's a Phase 2 or 3 or 4 stage function, not out of the gate. It's like handing me an air-wrench or chain saw when I don't yet know how to use a screwdriver (or hammer).

IMHO, Phase 1 should be focused on parsing out a BRIS file to readable PPs or variables that can be used for other functions. At first, this won't provide any more functionality (and actually less) than Excel, but without the foundation to build on, many of us wanna be developers will be lost.

Delta, you're certainly within your rights to draw the lines where you see fit, but I believe Raybo is correct that many folks will roll their eyes and go back to using Acroread and a PDF file of PPs.

Since I use a 64-bit machine, dual bootable in Linux and Vista, I'll give the toolset a try. I also have 7 years of test and integration experience, so this should be within my capabilities. But, just searching on the mongo DB, the package manager listed out several dozen "plugins" for it......and my eyes started rolling......

whodoyoulike
07-07-2014, 03:12 PM
The likelihood of finding something profitable in a database of 10,000 races is close to zero. For some things you have to go back to the 1990s or earlier. There is a horse racing article in the 2+2 magazine for July where the author went back 9 years to get a thousand NYRA races that fit his system. An average of 2 plays a week for a +13.5% return over 9 years. How huge is his total database?

So, he found something which allows him about 100 plays a year for the last 9 years? I wonder what he noticed that made him feel he needed that much data over such an extended period of time? I guess my point is: did he really need to go back that far? Why not back to the 1980's?

I believe one should gather just enough data to be able to make a decision.

whodoyoulike
07-07-2014, 03:24 PM
OK... Let's forget 300K and talk about only 20K... I agree that my original number of 300K was very large and certainly not a medium-size db... Taking the 1/2 sec per race speed, RAYBO's answer is going to be 10,000 seconds, or close to three hours, for 20K races... Correct?

I don't know much about this database stuff, but wouldn't it be better to work off a subset of the entire database? Wouldn't that allow your program to run faster? But again, I'm not a programmer.

DeltaLover
07-07-2014, 03:26 PM
Take a look here:


http://themindofagambler.com/screen-shot-fast-processing.jpeg


Over 6K races processed per second for 4 handicapping factors (I do not have access to a full universe of factors coded in C++ right now)....

Processing 18,472 races in 3 seconds...

(I have not bound the code to retrieve results; this will add a few seconds for the whole universe of races)...

Code can be seen and cloned from here

https://github.com/deltalover/samples

Please let me know if you have any build issues or any questions about how this speed is achieved (it is all in the code though!)

whodoyoulike
07-07-2014, 03:43 PM
What info were you trying to see?

whodoyoulike
07-07-2014, 03:45 PM
For perspective on the number of races in a db, I have right at 300k races in the US since 1/1/2008 and I am only missing some of the small tracks and odd distances.

With today's computers process time is not an issue for me.

How does 300k races work for you?

DeltaLover
07-07-2014, 03:49 PM
What info were you trying to see?

Only four handicapping factors, which can be seen here:

https://github.com/deltalover/samples/blob/master/fast_factor_maker/handicappingfactors.h

Adding more factors will increase the timing very, very little... I have comparable results with as many as 64 factors... Having this kind of speed makes it possible to optimize using a GA. For example, if you have an expression containing several constants, like layoff days and average layoff, you might need to optimize them for better results; in this case you need a very fast way to go through the universe of races, since the algorithm will require several iterations...

It is not that we will necessarily use 300K races, but we might use 20K many times during one run. This is why we need a fast algorithm for this kind of processing.
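To illustrate why the raw speed matters, here is a minimal sketch of the sort of loop an optimizer runs; this is a plain random search, not my actual GA, and every name and field in it is hypothetical:

import random

def score_race(race, layoff_days):
    # Stand-in for real factor evaluation plus payoff lookup:
    # flat-bet every starter off a layoff of at least layoff_days.
    bets = [s for s in race["starters"] if s["days_off"] >= layoff_days]
    return sum(s["payoff"] - 1.0 for s in bets)

def optimize(races, iterations=500):
    # Each candidate constant forces a full pass over the race
    # universe, so 20K races really do get scanned hundreds of times.
    best_cutoff, best_score = None, float("-inf")
    for _ in range(iterations):
        cutoff = random.randint(30, 180)
        score = sum(score_race(r, cutoff) for r in races)
        if score > best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff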

Robert Goren
07-07-2014, 03:49 PM
So, he found something which allows him about 100 plays a year for the last 9 years? I wonder what he noticed that made him feel he needed that much data over such an extended period of time? I guess my point is: did he really need to go back that far? Why not back to the 1980's?

I believe one should gather just enough data to be able to make a decision.

I suspect, since he is a writer, he wanted a large enough sample to ward off any criticism.

sjk
07-07-2014, 04:23 PM
How does 300k races work for you?

I actually have races back to 1993 but see no reason to carry all that history around in a file I use every day.

I think 100k is probably big enough for all practical purposes.

whodoyoulike
07-07-2014, 04:38 PM
I remember you now. You're the Access guy. Doesn't it have file size limitations? Do you still use Access?

sjk
07-07-2014, 05:08 PM
Yes I still use Access and have never found anything it was not able to get done.

It is limited to 2Mb which is one of the reasons I have truncated the really old stuff.

whodoyoulike
07-07-2014, 05:12 PM
Trying to understand your results in post #192: what is the difference between BigWinnerLastOut and WinnerLastOut? Were there 11,305 (8.22%) horses, out of the 137,364 total horses, included in WinnersLastOut?

Thanks.

whodoyoulike
07-07-2014, 05:14 PM
Yes I still use Access and have never found anything it was not able to get done.

It is limited to 2Mb which is one of the reasons I have truncated the really old stuff.

Is it 2Mb or 2Gb? I think my version was 1Gb.

sjk
07-07-2014, 05:16 PM
Sorry. It is GB. The size limitation doubled back around 2000 but the files converted to twice the size so there was no real gain.

It takes some effort to work around the size limit but nothing unmanageable.

DeltaLover
07-07-2014, 05:23 PM
Yes I still use Access and have never found anything it was not able to get done.

It is limited to 2Mb which is one of the reasons I have truncated the really old stuff.

Given the wide range of enterprise-quality open source databases available today, I would not touch Access, JET or mdb files even with a ten-foot pole.. Any data sector is limited to only 2GB, while its SQL flavor feels very strange and capricious. The fact that it is a one-file database limits the db's capabilities, and this is one of the main reasons why it becomes so slow as data grows.. I also believe that it has a relatively low limit on concurrent users.. If memory serves, MS Access does not even support nested subqueries...

sjk
07-07-2014, 05:24 PM
I hope I don't have to give the winnings back

DeltaLover
07-07-2014, 05:26 PM
Trying to understand your results in post #192: what is the difference between BigWinnerLastOut and WinnerLastOut? Were there 11,305 (8.22%) horses, out of the 137,364 total horses, included in WinnersLastOut?

Thanks.

Nothing interesting is going on there... Two silly factors just to demo the retrieval speed and nothing more...

Big winners are horses coming off a last performance where they won by more than 5 lengths, while winners won by any margin.. This code can be found here:

https://github.com/deltalover/samples/blob/master/fast_factor_maker/handicappingfactors.h

Again this is just for demo purposes..

DeltaLover
07-07-2014, 05:27 PM
I hope I don't have to give the winnings back

Hmm...

As far as your winnings go, I guess we have to take your word for it...

sjk
07-07-2014, 05:29 PM
I have been around here a lot longer than you so perhaps you should.

DeltaLover
07-07-2014, 05:36 PM
I have been around here a lot longer than you so perhaps you should.

Let's concentrate on the topic of this thread please...

You can always start as many new threads as you wish about Access, bragging and red-boarding, if that is why you are here...

sjk
07-07-2014, 05:39 PM
A point that I have been trying to make is that it takes a substantial effort in time and money to build and use a database. You might guess that if I have spent $10-$20k, and 15 minutes a day for the last 20 years, just to keep the data complete and up to date, it is probably not all in order to lose money.

A conundrum that I live with as a keeper of a database is that if I miss a day, I am done for life. Anytime I go on vacation I need to take the time to catch things up, or all the past work gets me nowhere.

To bring it back to the current thread: I think it will be difficult to get many others engaged to the point of actually spending time and money without a clearer vision of the end point.

whodoyoulike
07-07-2014, 05:48 PM
Let's concentrate on the topic of this thread please...

You can always start as many new threads as you wish about Access, bragging and red-boarding, if that is why you are here...

Hey! I may have unintentionally sidetracked things. I appreciated his previous posts.

Getting back to my question in post #200: were there 11,305 (8.22%) horses, out of the 137,364 total horses, included in WinnersLastOut? I'm just trying to understand the results, nothing more. I'm not trying to hold you or your programming to anything. I'm also guessing the database is for actual data.

Thanks.

DeltaLover
07-07-2014, 05:51 PM
A point that I have been trying to make is that it takes a substantial effort in time and money to build and use a database. You might guess that if I have spent $10-$20k, and 15 minutes a day for the last 20 years, just to keep the data complete and up to date, it is probably not all in order to lose money.

A conundrum that I live with as a keeper of a database is that if I miss a day, I am done for life. Anytime I go on vacation I need to take the time to catch things up, or all the past work gets me nowhere.

To bring it back to the current thread: I think it will be difficult to get many others engaged to the point of actually spending time and money without a clearer vision of the end point.

I do not think it is such a difficult task to maintain a horse racing database.

On the contrary, I am tempted to say it is a trivial task, given the limited size of the data and the ease of its retrieval. Compared to today's Big Data landscape, horse racing information feels like a very tiny domain that can be handled fairly easily (and it certainly is)...

All the related steps, like file FTP from bris and data cleaning, can be automated to the degree of almost never having to manually intervene in the process.

Missing a day does not really make any difference; more than this, any single day can easily be retrieved from BRIS archives...
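A minimal sketch of that kind of automation (the host, credentials and file layout below are placeholders, not BRIS's actual endpoints):

import os
from ftplib import FTP

def fetch_missing(local_dir):
    # Nightly job: pull any daily files we do not have yet, then
    # hand them to the importer. All connection details are fake.
    ftp = FTP("ftp.example.com")
    ftp.login("user", "password")
    for name in ftp.nlst():                  # remote file listing
        path = os.path.join(local_dir, name)
        if not os.path.exists(path):         # only what we missed
            with open(path, "wb") as f:
                ftp.retrbinary("RETR " + name, f.write)
    ftp.quit()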

DeltaLover
07-07-2014, 05:53 PM
Hey! I may have unintentionally sidetracked things. I appreciated his previous posts.

Getting back to my question in post #200: were there 11,305 (8.22%) horses, out of the 137,364 total horses, included in WinnersLastOut? I'm just trying to understand the results, nothing more. I'm not trying to hold you or your programming to anything. I'm also guessing the database is for actual data.

Thanks.

Obviously a BIG WINNER is contained in WINNERS as well.. Again, you can check the code to verify this.

hcap
07-08-2014, 08:02 AM
Take a look here:


http://themindofagambler.com/screen-shot-fast-processing.jpeg


Over 6K races processed per second for 4 handicapping factors (I do not have access to a full universe of factors coded in C++ right now)....

Processing 18,472 races in 3 seconds...

(I have not bound the code to retrieve results; this will add a few seconds for the whole universe of races)...

Code can be seen and cloned from here

https://github.com/deltalover/samples

Please let me know if you have any build issues or any questions about how this speed is achieved (it is all in the code though!)

Pretty impressive. :cool:

Is there any way to create dynamic models for those of us who would rather work with "fresh" data? In other words, take a current snapshot (a selectable period) at a track, at a specific distance or surface or class, possibly using all three or more, and gauge what is effective in terms of win% and ROI, then test new races not yet included in this model's data set? I do understand that by manually altering dates one can do this, but can it be automated?

traynor
07-08-2014, 09:07 AM
Pretty impressive. :cool:

Is there any way to create dynamic models for those of us who would rather work with "fresh" data? In other words, take a current snapshot (a selectable period) at a track, at a specific distance or surface or class, possibly using all three or more, and gauge what is effective in terms of win% and ROI, then test new races not yet included in this model's data set? I do understand that by manually altering dates one can do this, but can it be automated?

I think that creating a GUI for end users to create simple queries (a la HTR and others) is easy. The more complex the query (meaning queries involving more than a few elements of primary data) the more likely it is that one will need to get into the actual coding of those queries.

However, the upside is that the more you learn about asking the right questions and creating the right queries, the more useful the information generated as a result. That is a gentle way of saying you will probably need to learn enough code and coding techniques to write your own queries to really benefit from the tools being created.

hcap
07-08-2014, 09:57 AM
I sort of knew you would say that. Although it is certainly tempting to increase processing speed by a factor of 100x or so, learning a whole new language at my age (67), and after many years of work in Excel and VBA (self-taught like most Excel users posting here), is a bit beyond what I am willing or able to do. Thanks :)

DeltaLover
07-08-2014, 10:17 AM
Pretty impressive. :cool:

Is there any way to create dynamic models for those of us who would rather work with "fresh" data? In other words, take a current snapshot (a selectable period) at a track, at a specific distance or surface or class, possibly using all three or more, and gauge what is effective in terms of win% and ROI, then test new races not yet included in this model's data set? I do understand that by manually altering dates one can do this, but can it be automated?

I think that creating a GUI for end users to create simple queries (a la HTR and others) is easy. The more complex the query (meaning queries involving more than a few elements of primary data) the more likely it is that one will need to get into the actual coding of those queries.

However, the upside is that the more you learn about asking the right questions and creating the right queries, the more useful the information generated as a result. That is a gentle way of saying you will probably need to learn enough code and coding techniques to write your own queries to really benefit from the tools being created.


As Traynor says, a GUI-driven approach is not suitable for this kind of query; this is why we need to code them directly. Coding a factor can either be done in a more imperative language (like C++, C#, Java or Python) or in a mini language (DSL) which will provide enough declarativeness to the point of being usable by an end (power) user.

I have followed this kind of approach in the past, mainly to allow stock and forex traders to describe and optimize their strategies, with great success, and I have been doing the same in my own handicapping software for years. Of course there is some learning curve associated with it, but when the user reaches a certain level of expertise he can usually work quickly and independently..

The main challenge I can see with the DSL approach has to do with performance. The necessary interpretation layers slow down the whole process, and this is the main reason why I prefer coding my factors directly in C++ and recompiling...

One interesting, related topic is automating the generation of the factors, which represents an even more time-consuming process.. In this direction we need to use some form of expression tree, which can be implemented in various ways: LISP is a perfect fit, or C#/.NET, which natively supports them since they were needed for the implementation of LINQ... There is a lot to say about this component and it probably deserves its own thread..
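A toy expression tree in Python shows the shape of the idea; this is a sketch, not the planned DSL, and the field names are hypothetical:

class Field:
    def __init__(self, name): self.name = name
    def eval(self, starter): return starter[self.name]

class Const:
    def __init__(self, value): self.value = value
    def eval(self, starter): return self.value

class Gt:
    def __init__(self, left, right): self.left, self.right = left, right
    def eval(self, starter):
        return self.left.eval(starter) > self.right.eval(starter)

class And:
    def __init__(self, *kids): self.kids = kids
    def eval(self, starter):
        return all(k.eval(starter) for k in self.kids)

# "Won last out by more than 5 lengths, off a 45+ day layoff":
factor = And(Gt(Field("last_win_margin"), Const(5)),
             Gt(Field("days_off"), Const(44)))

A generator can then build, mutate and recombine such trees automatically, which is exactly where a GA or a LISP-style representation comes in.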

I sort of knew you would say that. Although it is certainly tempting to increase processing speed by a factor of 100x or so, learning a whole new language at my age (67), and after many years of work in Excel and VBA (self-taught like most Excel users posting here), is a bit beyond what I am willing or able to do. Thanks :)

I understand your concern about learning a new language (and age has nothing to do with it!), but again, here we are talking about something very declarative and easy to use; you should learn its basics in an hour and start feeling comfortable after a few days of use..

TrifectaMike
07-08-2014, 10:42 AM
DL, an admirable start to what I believe will be a powerful tool suite.

Check your pm's.

Mike

DeltaLover
07-08-2014, 10:46 AM
DL, an admirable start to what I believe will be a powerful tool suite.

Check your pm's.

Mike

No pm yet... Maybe send it again?

DeltaLover
07-08-2014, 10:55 AM
Superior Speed Figures as a dependency of high processing speed

One of the most important advantages of having a fast parser able to go through thousands of cards per second is the ability to create superior speed ratings that are customizable on the fly. Decorating the PPs with this type of rating, covering each horse individually and the race itself as a component, will be one of the next steps towards the creation of the toolkit...
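A minimal sketch of such an on-the-fly rating; the par/variant arithmetic and the one-point-per-fifth scale are illustrative only, not a real figure-making method:

def speed_rating(final_time, par_time, daily_variant,
                 beaten_lengths, pts_per_fifth=1.0, variant_weight=1.0):
    # Each fifth of a second off the par costs pts_per_fifth points;
    # losers are scaled back from the winner by beaten lengths, and
    # the daily variant corrects for track speed. Every knob here
    # can be changed between passes over the card files.
    fifths_off_par = (final_time - par_time) * 5
    fig = 100 - pts_per_fifth * fifths_off_par
    fig -= pts_per_fifth * beaten_lengths
    fig += variant_weight * daily_variant
    return fig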

traynor
07-09-2014, 04:47 PM
You might find a great deal more enthusiasm for an open-source project to scrape results (free) from HKJC, parse them, and load the data into a usable format for research and handicapping. Subscriptions are all well and good for those already addicted to horse racing as a hobby. The same subscriptions (and the expense involved) are a serious turnoff to potential bettors interested in knowing more. Especially in conjunction with endless warnings that winning is either impossible or only (barely) possible after decades of study, trial-and-error, and losing.

The biggest advantage of open-source is that it is free. An open-source data mining and handicapping app that works on free data seems like a really good idea. (Not to mention that it would save me many hours of coding my own apps.)

DeltaLover
07-09-2014, 04:53 PM
You might find a great deal more enthusiasm for an open-source project to scrape results (free) from HKJC, parse them, and load the data into a usable format for research and handicapping. Subscriptions are all well and good for those already addicted to horse racing as a hobby. The same subscriptions (and the expense involved) are a serious turnoff to potential bettors interested in knowing more. Especially in conjunction with endless warnings that winning is either impossible or only (barely) possible after decades of study, trial-and-error, and losing.

The biggest advantage of open-source is that it is free. An open-source data mining and handicapping app that works on free data seems like a really good idea. (Not to mention that it would save me many hours of coding my own apps.)

links??

traynor
07-09-2014, 06:59 PM
links??

Doubt that will be the issue in Hong Kong.

Blood Horse Quote
The Hong Kong Jockey Club reported a record total handle of HK$101.838 billion for the 2013-14 season, which concluded July 6.
:eek: :eek: :eek:
Read more on BloodHorse.com: http://www.bloodhorse.com/horse-racing/articles/86035/hong-kong-season-ends-with-handle-

From THE DEMISE OF RACING??? in General Racing Discussion. More links on that thread.

Seabiscuit@AR
07-10-2014, 12:28 AM
Make sure you ask the HKJC for permission to scrape their website first; their results have a copyright notice on the bottom of each page, and there is this message on their website:

http://www.hkjc.com/english/corporate/corp_copyright.asp


All contents and information, including but not limited to graphical design, navigation links and programming, are proprietary to The Hong Kong Jockey Club and are subjected to copyright protection. Republication, redistribution or unauthorized use of any content or information contained in this website is expressly prohibited without the prior written consent of The Hong Kong Jockey Club.

DJofSD
07-10-2014, 08:53 AM
Make sure you ask the HKJC for permission to scrape their website first; their results have a copyright notice on the bottom of each page, and there is this message on their website:

http://www.hkjc.com/english/corporate/corp_copyright.asp


All contents and information, including but not limited to graphical design, navigation links and programming, are proprietary to The Hong Kong Jockey Club and are subjected to copyright protection. Republication, redistribution or unauthorized use of any content or information contained in this website is expressly prohibited without the prior written consent of The Hong Kong Jockey Club.
Poppycock.

You cannot copyright facts.

DeltaLover
07-10-2014, 10:05 AM
For now I will focus on our racing only..

As a side note: although Hong Kong racing looks very impressive at first glance, the only reason I might bet into a foreign pool would be the offering of fixed odds (which I believe does not exist for HK racing) and nothing else.

Seabiscuit@AR
07-10-2014, 10:35 AM
DJofSD

I believe British racing lost a copyright case. However, Australian racing fields are copyrighted. They got the various state governments to enact legislation after the British lost their case, so Australian vs British racing is different in that regard.

Not sure the stuff on the HKJC website is simply facts. It would take a fair bit of work to reproduce the info that the HKJC supplies on their website.

DJofSD
07-10-2014, 10:42 AM
Some links for those two instances would be appreciated.

Copyrighted material, at least in the US and Europe, can still be used under the fair use concept.

What the laws and guidelines are for HK is beyond me.

traynor
07-10-2014, 10:42 AM
For now I will focus on our racing only..

As a side note: although Hong Kong racing looks very impressive at first glance, the only reason I might bet into a foreign pool would be the offering of fixed odds (which I believe does not exist for HK racing) and nothing else.

Not surprising. There are a number of apps already, and more being developed. None are open source. That, too, is not surprising.

Seabiscuit@AR
07-11-2014, 01:00 AM
DJofSD

Here is a news story about British racing losing their copyright case in 2004

http://www.telegraph.co.uk/sport/football/2389824/Football-and-racing-lose-data-cash-case.html

Here is a news story on the Australian case

http://www.abc.net.au/news/2012-03-30/racing-nsw-wins-court-battle/3922762

DJofSD
07-11-2014, 07:39 AM
Thanks!

TrifectaMike
07-11-2014, 08:26 PM
Superior Speed Figures as a dependency of high processing speed

One of the most important advantages of having a fast parser able to go through thousands of cards per second is the ability to create superior speed ratings that are customizable on the fly. Decorating the PPs with this type of rating, covering each horse individually and the race itself as a component, will be one of the next steps towards the creation of the toolkit...

DL, this area can surely use some new approaches.

Mike

DeltaLover
07-12-2014, 12:20 PM
Here:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Horse_Player_Toolkit

you can see an initial list of factors that I put together very quickly; I am sure that I am missing a lot.. Feel free to propose any factors you might like...

traynor
07-12-2014, 12:23 PM
I have just pushed an early release implementing a utility to import Bris PPs into a mongo db.

Source code can be cloned from here:
https://github.com/deltalover/hoplato

PDF documentation can be found here:
http://www.themindofagambler.com/api.pdf

You will need python, nosetests, pip, pymongo and mongodb to run the code. You need to place your PP files in directories named after the year they represent, and put those directories under a parent directory whose name you will set in an environment variable, as described in the documentation of importdata.py...

Also before you run nose or importdata, you need to do a python setup.py develop under the main directory..

Preferably you will have a 64-bit box, since the volume of the data is very large... Let me know about any help you might need.

Has anyone downloaded and set this up? If so, how much tinkering was involved, how long did it take, and how does it work? Meaning, is the data downloaded appropriately, and in a usable state?

raybo
07-12-2014, 12:38 PM
Has anyone downloaded and set this up? If so, how much tinkering was involved, how long did it take, and how does it work? Meaning, is the data downloaded appropriately, and in a usable state?

I was wondering the same things. I personally don't have the guts yet. :lol:

DeltaLover
07-12-2014, 12:39 PM
Has anyone downloaded and set this up? If so, how much tinkering was involved, how long did it take, and how does it work? Meaning, is the data downloaded appropriately, and in a usable state?

As far as I know, nobody has downloaded this library so far. For the moment I will concentrate on a very fast PP loader and factor analyzer (written in C++, not in python), which can be seen here:

https://github.com/deltalover/thogar

The output of this will be used by the python application (which is going to be too slow for this kind of processing; this is why I am using C++ for it).


The factors I am planning to implement for the first pass are the following:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Horse_Player_Toolkit#Changes


We can always discuss the factors and ratings that will be used...

DeltaLover
07-12-2014, 01:22 PM
I was wondering the same things. I personally don't have the guts yet. :lol:


When I complete the current project I will post detailed instructions on how to install and run it.. It should be pretty simple and straightforward...

eurocapper
07-12-2014, 01:26 PM
Perhaps it would provide some added value to some people over other (easier to use) existing tools if it also provided Bayesian methods (which I know you have mentioned here) or other data mining research tools (not for me, though; I believe in theory and deduction over data mining). Otherwise, what would one do with a database?

raybo
07-12-2014, 02:07 PM
Perhaps it would provide some added value to some people over other (easier to use) existing tools if it also provided Bayesian methods (which I know you have mentioned here) or other data mining research tools (not for me, though; I believe in theory and deduction over data mining). Otherwise, what would one do with a database?

Obviously, research, "what ifs", modeling by track, distance, surface, class, etc., custom factor creation is a biggie. Lots that can be done with such a database model, but for it to be widely accepted to maximize collaboration, it must be easy for the average player to set up and operate. As we are seeing, those here who really "get it", and could offer programming and other technical help, are choosing to not participate in its development. So, most users will probably be more traditional players, with little or no programming or technical knowledge.

DeltaLover
07-12-2014, 02:20 PM
Obviously, research, "what ifs", modeling by track, distance, surface, class, etc., custom factor creation is a biggie. Lots that can be done with such a database model, but for it to be widely accepted to maximize collaboration, it must be easy for the average player to set up and operate. As we are seeing, those here who really "get it", and could offer programming and other technical help, are choosing to not participate in its development. So, most users will probably be more traditional players, with little or no programming or technical knowledge.

I am not concerned about getting technical help.. Testers, reviewers and early adopters (who do not really need to be programmers) are more helpful..

TrifectaMike
07-12-2014, 03:23 PM
Here:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Horse_Player_Toolkit

you can see an initial list of factors that I put together very quickly; I am sure I am missing a lot. Feel free to propose any factors you might like...

I noticed that the majority of the horse and race factors are naturally set up for chi-square testing. If it is by design, very clever. If not, it is still a nice design.

Mike

DeltaLover
07-12-2014, 03:46 PM
I noticed that the majority of the horse and race factors are naturally set up for chi-square testing. If it is by design, very clever. If not, it is still a nice design.

Mike

Sure, it is by design.

Ideally, for each category each horse should have exactly one match. For example layoffs: a horse is either coming off a layoff, second off it, or third or more. By following this approach we can easily apply chi-square tests, and more than this, we can represent the horse as a sequence of bits (for example 111010100001), something that simplifies grouping, clustering and filtering...
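
To make the design concrete, below is a minimal sketch of the one-match-per-category idea and a chi-square test on a single factor; the category names and the win/loss counts are invented purely for illustration, and scipy is assumed to be available:

# One bit group per category; exactly one bit set within each group.
from scipy.stats import chi2_contingency

LAYOFF = ["first_off_layoff", "second_off_layoff", "third_or_more"]

def one_hot(categories, value):
    return [1 if c == value else 0 for c in categories]

def horse_bits(horse):
    bits = one_hot(LAYOFF, horse["layoff"])
    # ...append the other category groups the same way...
    return "".join(str(b) for b in bits)      # e.g. "100"

# Chi-square on one factor: one row per category, columns = wins / losses.
# The counts are invented just to show the mechanics.
table = [[120, 880],     # first off the layoff
         [150, 850],     # second off it
         [200, 800]]     # third or more
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)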

raybo
07-12-2014, 04:02 PM
Sure, it is by design.

Ideally, for each category each horse should have exactly one match. For example layoffs: a horse is either coming off a layoff, second off it, or third or more. By following this approach we can easily apply chi-square tests, and more than this, we can represent the horse as a sequence of bits (for example 111010100001), something that simplifies grouping, clustering and filtering...

I used the bit sequencing (1s and 0s) for my auto-paceline selection method in the free AllData workbook, and it worked quite well. Today's race had a sequence, and all the horses in that race had a sequence for each of their pacelines. I just used the paceline sequence that most closely matched the race sequence, with the user being able to assign their personal priority ranking to the different categories of factors: surface, distance, recency, class, etc..
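
A minimal sketch of that matching scheme, with hypothetical bit positions and priority weights standing in for the workbook's actual categories:

# One weight per bit position (say surface, distance, recency, class).
WEIGHTS = [3, 3, 2, 2]

def match_score(race_bits, line_bits, weights=WEIGHTS):
    # Count weighted agreements between the race and paceline sequences.
    return sum(w for r, l, w in zip(race_bits, line_bits, weights) if r == l)

def best_paceline(race_bits, pacelines):
    # pacelines: list of (label, bit sequence) pairs for one horse.
    return max(pacelines, key=lambda p: match_score(race_bits, p[1]))

print(best_paceline("1101", [("last race", "1001"), ("two back", "1100")]))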

DeltaLover
07-12-2014, 04:14 PM
I used the bit sequencing (1s and 0s) for my auto-paceline selection method in the free AllData workbook, and it worked quite well. Today's race had a sequence, and all the horses in that race had a sequence for each of their pacelines. I just used the paceline sequence that most closely matched the race sequence, with the user being able to assign their personal priority ranking to the different categories of factors: surface, distance, recency, class, etc..

Nice.

The challenge is to find the closest match. Since the bits are correlated to some degree, you need to clean them up, removing correlated points and also favoring 'seldom' (rare) bits...

Ideally you should not rely on the user to prioritize filters; something like that is subjective and open to any possible interpretation. It should be the algorithm that decides on the optimal filters to use. This process can become very calculation intensive, which is why we need a very fast parser and loader, able to go through large samples quickly and decide the optimal way to match. This is exactly what I will implement in the next weeks and make available to everyone.

Anyone curious about the approach I will follow can read here:

http://en.wikipedia.org/wiki/Dynamic_programming

and here for a problem more closely related to what we are trying to accomplish:

http://en.wikipedia.org/wiki/Knapsack_problem
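
Since those links point at dynamic programming and the knapsack problem, here is a textbook 0/1 knapsack table as a sketch of how the filter selection could be framed; the filter names, costs and values are invented, and in practice they would come out of the fast sample runs:

# Standard 0/1 knapsack by dynamic programming: pick the subset of
# filters with the highest total "value" (e.g. measured predictive
# strength) under a "cost" budget (e.g. a cap on overlap/correlation).
def knapsack(items, budget):
    # items: list of (name, cost, value); costs are non-negative ints.
    best = [0] * (budget + 1)
    pick = [[] for _ in range(budget + 1)]
    for name, cost, value in items:
        for b in range(budget, cost - 1, -1):
            if best[b - cost] + value > best[b]:
                best[b] = best[b - cost] + value
                pick[b] = pick[b - cost] + [name]
    return best[budget], pick[budget]

filters = [("layoff", 2, 3.1), ("early_pace", 3, 4.0), ("class_drop", 4, 4.2)]
print(knapsack(filters, 6))   # -> best total value and the filters chosen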

raybo
07-12-2014, 04:24 PM
I agree, I had a "default" priority that was used unless the user changed it (which was very easy for them to do). The default paceline settings proved to be better in the long run, in my testing, but the user still had the option to alter the priority if they wished.

DJofSD
07-12-2014, 04:56 PM
Nice.

The challenge is to find the closest match. Since the bits are correlated to some degree, you need to clean them up, removing correlated points and also favoring 'seldom' (rare) bits...

Ideally you should not rely on the user to prioritize filters; something like that is subjective and open to any possible interpretation. It should be the algorithm that decides on the optimal filters to use. This process can become very calculation intensive, which is why we need a very fast parser and loader, able to go through large samples quickly and decide the optimal way to match. This is exactly what I will implement in the next weeks and make available to everyone.

Anyone curious about the approach I will follow can read here:

http://en.wikipedia.org/wiki/Dynamic_programming

and here for a problem more closely related to what we are trying to accomplish:

http://en.wikipedia.org/wiki/Knapsack_problem
Egads!

Flashbacks to http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm and the little red book.

DeltaLover
07-12-2014, 05:09 PM
Egads!

Flashbacks to http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm and the little red book.

I am using graphs in several places in my handicapping programs, but I admit I cannot see how maximum flow can be applied in this domain.

Related algorithms I am currently using are:

http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

http://en.wikipedia.org/wiki/Longest_path_problem


Mainly to detect the longest cycles. As I have said before, one of the best approaches to estimating the track variant is to construct a graph where the vertices are the dates and the edges are the horses, and then minimize the variance of the time differences..
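
A rough least-squares illustration of that graph idea: every horse that ran on two dates contributes one equation relating the variants of those dates. The dates and time differences below are made-up numbers:

# Dates are vertices; a horse that ran on two dates is an edge carrying
# the difference of its two times. Solve for one variant per date so
# that the remaining differences are as small as possible.
import numpy as np

# (date_a, date_b, time_a - time_b), one tuple per horse -- invented data.
edges = [("0607", "0614", 1.2), ("0607", "0621", -0.4), ("0614", "0621", -1.5)]

dates = sorted({d for a, b, _ in edges for d in (a, b)})
idx = {d: i for i, d in enumerate(dates)}

# Each edge gives one equation: v[a] - v[b] = diff.
A = np.zeros((len(edges) + 1, len(dates)))
y = np.zeros(len(edges) + 1)
for row, (a, b, diff) in enumerate(edges):
    A[row, idx[a]], A[row, idx[b]], y[row] = 1.0, -1.0, diff
A[-1, :] = 1.0    # anchor: variants sum to zero (the system is shift-free)
v, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(dates, v.round(2))))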

DJofSD
07-12-2014, 06:02 PM
You took me too literally. All I tried to say is seeing the mention of the knapsack problem reminded me of classes and material in the same arena.

traynor
07-12-2014, 06:50 PM
I am sure that if anyone who is betting serious money believes this approach could be profitable, your project (and ideas) will be followed with great enthusiasm and interest.

DeltaLover
07-12-2014, 07:17 PM
You took me too literally. All I tried to say is seeing the mention of the knapsack problem reminded me of classes and material in the same arena.

ok, i see...

DeltaLover
07-24-2014, 07:10 PM
Extract from here:

http://themindofagambler.com/mediawiki-1.23.1/index.php?title=Pool_Data_Retriever

The interesting conversation we had about real time odds and the importance of pools in our betting decisions made me change my plans about the sequence of the components to implement for the horse player's toolkit. As soon as I found a little time, I created the foundation for a real time odds retriever, which can be found here:

https://github.com/deltalover/hoplato/tree/master/sample_odds_server


A primitive pilot demoing the basic abilities of this server can be seen here:


http://themindofagambler.com/hoplato/sample_odds_server/test1.html

At this point I have the following ideas about improving its functionality:

(1) Add polling to the html client for the race detail (I am already polling for the race summary)

(2) Extend the REST API to return filtered JSON based on the needs of the user

(3) Write a historical data retriever that will store the pool to a mongo database from the time the pool opens until post time (see the sketch after this list)

(4) Provide a sample mechanism to make this data available from excel or open office

(5) Design an HTML/JavaScript page to display fancy graphs for the movement of the pool etc
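
A minimal sketch of idea (3): poll the odds server on an interval and append each pool snapshot to mongo until post time. The endpoint URL, JSON shape and database/collection names are all assumptions:

# Poll the (hypothetical) REST endpoint and append each pool snapshot
# to mongo until post time (given as epoch seconds).
import time
import requests
from pymongo import MongoClient

POOL_URL = "http://localhost:8080/pools/{race_id}"   # hypothetical

def record_pool(race_id, post_time, every=30):
    snaps = MongoClient().hoplato.pool_history        # hypothetical db/collection
    while time.time() < post_time:
        snap = requests.get(POOL_URL.format(race_id=race_id)).json()
        snap["race_id"], snap["ts"] = race_id, time.time()
        snaps.insert_one(snap)
        time.sleep(every)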

If anyone has any related ideas / suggestions / recommendations please let me know