View Full Version : ms Access database to handicap races


oldman
01-24-2013, 01:30 PM
Does anyone use Microsoft Access? I use Bris to download my races, and Access puts them in various tables. I see some use Excel, which would do some of what I do (rank speed, etc.), but I don't see how it can research the data.

Red Knave
01-24-2013, 04:09 PM
I don't use Abcess, uh I mean Access ( ;) ), but a number of PA members do. You can do a lot of research using its built-in tools.
You can also query an Access database with Excel so you can use all the math and stats functions etc. and return the results to your spreadsheet. That way your database won't get mucked up by you doing a drop or delete you didn't mean to do.

oldman
01-24-2013, 06:32 PM
Yes, I know; I use Access. Programming is a hobby. The program shows contenders and possible long shots. I don't bet much. Today at Gulfstream I picked number one in the first two races; in the first he bombed, in the second he was third at 20-1. I only bet 5 total on three races and won 40 on a trifecta.

ranchwest
02-03-2013, 07:44 PM
You are on the correct path with a database. Spreadsheets are easier to get started, but in the long run the database is the better way to go.

sjk
02-04-2013, 05:21 AM
I have used Access for 20 years and have never found anything I was not able to do with it.

raybo
02-04-2013, 09:08 AM
If you're already proficient in Access, or any database program, that's the way to go. But, if you're not, Excel would be much easier and quicker to use, and can accomplish many of the things databases can do. Many use both.

ranchwest
02-04-2013, 09:17 PM
Spreadsheets are fine for manipulating relatively small amounts of data, but for the volume of data that would be attractive for data mining, a database is a must.

raybo
02-04-2013, 09:20 PM
Spreadsheets are fine for manipulating relatively small amounts of data, but for the volume of data that would be attractive for data mining, a database is a must.

I agree. We have a database spreadsheet that uses Excel's Advanced Filter, and it works fine, but with huge amounts of race data it would probably bog down.

JJMartin
02-05-2013, 12:53 PM
IMO, you are better off working with Excel and experimenting with different calculations. Some of my calculations involve comparing horses within each race vertically and further comparing against all the races. I don't see how you would do that with a database.

Red Knave
02-05-2013, 04:40 PM
Some of my calculations involve comparing horses within each race vertically and further comparing against all the races. I don't see how you would do that with a database.
Any database could do that. Your concept of a database is incomplete. Instead of, say, 256 or 1024 rows and columns a database can have multiple thousands of rows and columns, all with whatever data elements you want, same as Excel. You can do any kind of math, logic, string manipulation etc. that you want.

Capper Al
02-05-2013, 05:17 PM
Any database could do that. Your concept of a database is incomplete. Instead of, say, 256 or 1024 rows and columns a database can have multiple thousands of rows and columns, all with whatever data elements you want, same as Excel. You can do any kind of math, logic, string manipulation etc. that you want.

How does one get beyond the 255-column limit in Access 2007 without using several tables? I'd like to save all my BRIS MultiCaps files as is (about 1430 columns), but this seems impossible.

BIG49010
02-05-2013, 06:49 PM
How does one get beyond the 255-column limit in Access 2007 without using several tables? I'd like to save all my BRIS MultiCaps files as is (about 1430 columns), but this seems impossible.

Depending on what you're trying to do, Excel would be your only option, I think.

raybo
02-05-2013, 11:47 PM
How does one get beyond the 255-column limit in Access 2007 without using several tables? I'd like to save all my BRIS MultiCaps files as is (about 1430 columns), but this seems impossible.

You'd have to divide the raw data file into six sets of data in Excel or some other app, import those sets into Access as six separate tables, then join those tables to suit. Or something similar.

The free AllData_J1 workbook puts each horse's data on 6 separate rows in Excel, but I don't know how you would get that into Access.
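The splitting described above can be sketched in a few lines of Python (the language DeltaLover recommends later in the thread). The two leading "key" fields and the chunking scheme here are assumptions for illustration, not the actual BRIS layout:

```python
def split_record(row, chunk_size=255, key_fields=2):
    """Split one wide record into chunks that fit Access's
    255-field-per-table limit. The first `key_fields` values
    (assumed here to identify the race) are repeated in every
    chunk so the resulting tables can be joined back together."""
    key = row[:key_fields]
    rest = row[key_fields:]
    step = chunk_size - key_fields  # room left after the join key
    return [key + rest[i:i + step] for i in range(0, len(rest), step)]

# A toy 1435-field record: 2 key fields plus 1433 data fields.
record = ["GP", "1"] + [str(n) for n in range(1433)]
tables = split_record(record)
print(len(tables))                        # 6 sub-tables needed
print(all(len(t) <= 255 for t in tables))  # True
```

With a 255-field ceiling and a 2-field join key, a 1435-field record does indeed come out to six tables, matching the post.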

vegasone
02-05-2013, 11:51 PM
Why not try one of the free databases available.

raybo
02-05-2013, 11:56 PM
Why not try one of the free databases available.

I don't know of any that have 1435 columns/fields.

DeltaLover
02-06-2013, 12:00 AM
I don't know of any that have 1435 columns/fields.

try this:
http://www.mongodb.org/

(of course, you don't need 1435 fields... by no means)

raybo
02-06-2013, 01:21 AM
try this:
http://www.mongodb.org/

(of course, you don't need 1435 fields... by no means)

No, you don't, but to do a single import to gather all the fields in a Bris file would require an app that has at least 1435 columns/fields. If the db program you linked to has that many fields, then great, but does Al already have the ability to work in it? He's working in Access now; maybe he only has the knowledge to operate in that platform, maybe he doesn't know C++. Or are you suggesting he stop his current racing activities and study whatever would be needed to work in that app?

DeltaLover
02-06-2013, 08:32 AM
No, you don't, but to do a single import to gather all the fields in a Bris file would require an app that has at least 1435 columns/fields. If the db program you linked to has that many fields, then great, but does Al already have the ability to work in it? He's working in Access now; maybe he only has the knowledge to operate in that platform, maybe he doesn't know C++. Or are you suggesting he stop his current racing activities and study whatever would be needed to work in that app?


Raybo,

C++ is not a requirement for this task, and I would not recommend it as the best tool for data research.

There is no need for a db supporting thousands of columns, and the specific database technology you are using makes very little difference.

What you need is to understand and create your data model with normalization and query efficiency in mind.

You have to know that designing a software platform consists of a sequence of decisions you are always making between abstractions and specifications.

Abstract is a design direction generalizing your approach, while specific is a direction which is more aware of the specifics of the problem you are trying to solve.

As far as your data schema goes, I think you seem a little confused by the representation of the data as they appear in the DRF file. The fact that these data are delivered in this format does not mean that we should maintain it as they enter our systems. This format is convenient for information delivery but not equally good as a data schema to be used for further processing.

Just to give an example:

As we know, many of these 1435 fields are used to represent similar information, for example the last 10 races of a horse.

A good design decision here is to create a new data entity (a table in SQL, a sheet in Excel, a flat file, or whatever) which will contain the data for each past performance and will be linked somehow with the starter name and race.

By doing so you are getting rid of hundreds of fields just by adding one abstraction.

The same process can be repeated until we are satisfied with the leanness and effectiveness of our schema, which will now be much easier to visualize and use than before...

Take a look here; you might find it interesting:

http://en.wikipedia.org/wiki/Database_normalization
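As a rough sketch of this normalization idea, using Python's built-in sqlite3 in place of Access, with invented table and column names rather than the real BRIS fields:

```python
import sqlite3

# In-memory database standing in for Access; the schema is
# illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE starter (
    id INTEGER PRIMARY KEY,
    race_date TEXT, track TEXT, race_no INTEGER, name TEXT
);
-- One row per past performance, instead of 10 repeated
-- blocks of columns in one wide table.
CREATE TABLE past_performance (
    starter_id INTEGER REFERENCES starter(id),
    pp_index INTEGER,        -- 1 = most recent race back
    speed_rating INTEGER
);
""")
con.execute("INSERT INTO starter VALUES (1, '2013-02-06', 'GP', 1, 'Rachel')")
con.executemany(
    "INSERT INTO past_performance VALUES (1, ?, ?)",
    [(1, 95), (2, 88), (3, 102)],
)
# Best speed rating over the recent starts: one query, no matter
# how many past-performance columns the raw file had.
best = con.execute(
    "SELECT MAX(speed_rating) FROM past_performance WHERE starter_id = 1"
).fetchone()[0]
print(best)  # 102
```

The same join-key idea carries over to Access: the `starter_id` column replaces hundreds of repeated past-performance fields.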

BIG49010
02-06-2013, 09:30 AM
Bris has the Multi File for this purpose; whoever set it up in the beginning did a pretty good job. I have been playing with the idea of separating a JCapper file in the same manner, for my use, to save on the data costs per month.

raybo
02-06-2013, 10:05 AM
Raybo,

C++ is not a requirement for this task, and I would not recommend it as the best tool for data research.

There is no need for a db supporting thousands of columns, and the specific database technology you are using makes very little difference.

What you need is to understand and create your data model with normalization and query efficiency in mind.

You have to know that designing a software platform consists of a sequence of decisions you are always making between abstractions and specifications.

Abstract is a design direction generalizing your approach, while specific is a direction which is more aware of the specifics of the problem you are trying to solve.

As far as your data schema goes, I think you seem a little confused by the representation of the data as they appear in the DRF file. The fact that these data are delivered in this format does not mean that we should maintain it as they enter our systems. This format is convenient for information delivery but not equally good as a data schema to be used for further processing.

Just to give an example:

As we know, many of these 1435 fields are used to represent similar information, for example the last 10 races of a horse.

A good design decision here is to create a new data entity (a table in SQL, a sheet in Excel, a flat file, or whatever) which will contain the data for each past performance and will be linked somehow with the starter name and race.

By doing so you are getting rid of hundreds of fields just by adding one abstraction.

The same process can be repeated until we are satisfied with the leanness and effectiveness of our schema, which will now be much easier to visualize and use than before...

Take a look here; you might find it interesting:

http://en.wikipedia.org/wiki/Database_normalization

The program you linked to is written in C++; that's why I mentioned that language. I'm sure he already knows how to use a db, but he's been operating in Access, I believe.

raybo
02-06-2013, 10:08 AM
Bris has the Multi File for this purpose; whoever set it up in the beginning did a pretty good job. I have been playing with the idea of separating a JCapper file in the same manner, for my use, to save on the data costs per month.

Al would have to separate all his MultiCaps files into several different files. MultiCaps is a single-file-format data file, not a multi-file format. That is the PPs Multi file.

DeltaLover
02-06-2013, 10:10 AM
Bris has the Multi File for this purpose; whoever set it up in the beginning did a pretty good job. I have been playing with the idea of separating a JCapper file in the same manner, for my use, to save on the data costs per month.

The multi file is just a breakdown on a per-race basis, while the single one has the whole card in one file. This has nothing to do with data normalization.

raybo
02-06-2013, 10:13 AM
The multi file is just a breakdown on a per-race basis, while the single one has the whole card in one file. This has nothing to do with data normalization.

The multi file is a whole-card file also; it just breaks the fields into 6 categories/files of data, I believe, never exceeding 255 fields.

DeltaLover
02-06-2013, 10:24 AM
The program you linked to is written in C++; that's why I mentioned that language. I'm sure he already knows how to use a db, but he's been operating in Access, I believe.

If you are referring to Mongo (or anything else, including Access, SQL Server, Excel, whatever you can think of), it does not matter what language it is written in. This is completely transparent to you as a user of the technology.
All of today's operating systems are written in C, and the same applies to most low-level tools; the whole idea of computing is to impose layers of abstraction where execution is controlled by gradually more declarative paradigms.
What this means is that, starting from the lowest possible level, which is machine language, we build layers of abstraction on top of each other, going to assembly code, C / C++, .NET / C#, SQL, or even something more concrete like what is called a Domain Specific Language: http://en.wikipedia.org/wiki/Domain-specific_language

Now, talking about Access and Jet in general... It is a very old technology and, although it can serve our purposes to some extent, there is absolutely no reason to use it today considering the plethora of open source databases that are running circles around it. Things like MySQL, PostgreSQL, and SQLite, to name a few relational databases, and MongoDB and CouchDB when it comes to NoSQL databases, are superior to Access any way you are going to view it.

DeltaLover
02-06-2013, 10:25 AM
The multi file is a whole-card file also; it just breaks the fields into 6 categories/files of data, I believe, never exceeding 255 fields.

I never used it, so it could be

raybo
02-06-2013, 10:38 AM
I never used it, so it could be

My mistake, it contains 4 files, not 6. But none of them exceeds the 255-field limit, which means you're not getting all 1435 fields of the single-file-format file. Probably most of the fields you would use, though.

There are apps available that supposedly change single to multi, but I have never tried any. I've always used the single file, and everything I've created in Excel was designed to use those. That was the problem in the past: how to get all 1435 fields of data into Excel, or Access. It took several months of work to do it. That was all done using a data parser, until AllData NI was created; then it was all automated within Excel, and batch processing was added. We've come a long way since the data parsing days.

Capper Al
02-06-2013, 11:19 AM
try this:
http://www.mongodb.org/

(of cource you dont need 1435 fields... BY NO MEANS)

I couldn't verify the field size. It's scary to invest time in a small upstart. The advantage with MS is that they'll be around in five years. Thanks for the info. This is what I am figuring out at the moment.

DeltaLover
02-06-2013, 11:25 AM
I couldn't verify the field size. It's scary to invest time in a small upstart. The advantage with MS is that they'll be around in five years. Thanks for the info. This is what I am figuring out at the moment.

What exactly do you mean by verifying the field size?

As far as MS tools go, most of them are outdated and inferior to their open source counterparts (the only exception I can see is C#, which is a great static language)...

Capper Al
02-06-2013, 11:31 AM
No, you don't but to do a single import to gather all the fields in a Bris file, would require an app that has at least 1435 columns/fields. If the db program you linked to has that many fields then great, but does Al already have the ability to work in it? He's working in Access now, maybe he only has the knowledge to operate in that platform, maybe he doesn't know C++, or are you suggesting he stop his current racing activities and study whatever would be needed to operate work in that app?

FYI -- in a past life, I was a DBA. My app is 1,200 lines of code or more, written in C++. I would just like to upload my MultiCaps files as is without breaking them up. I'm about to start my journey on my laptop to store information. Access would do well for everything except the raw MultiCaps file.

headhawg
02-06-2013, 11:32 AM
What you need is to understand and create your data model having normalization and query efficiency in mind.
.
.
.
As far as your data schema goes, I think you seem a little confused by the representation of the data as they appear in the DRF file. The fact that these data are delivered in this format does not mean that we should maintain it as they enter our systems. This format is convenient for information delivery but not equally good as a data schema to be used for further processing.
If you're going to do datamining then this info from DeltaLover is really important to understand. If the amount of data is small maybe less so; you can brute force your queries. But efficiency starts with good design.

DeltaLover
02-06-2013, 11:35 AM
I would just like to upload my Multicaps files as is without breaking them up.

Why would you want to do something like this? A better approach is to create an object model which is populated from MultiCaps files, which should only serve this purpose without dictating implementation directions...

raybo
02-06-2013, 11:59 AM
Why would you want to do something like this? A better approach is to create an object model which is populated from MultiCaps files, which should only serve this purpose without dictating implementation directions...

So, you'd somehow parse the fields in the raw data file, extracting only the data you want to populate the object with, then you would use that object, along with others I assume, to manipulate or research or whatever you need to do, calling the individual objects and joining them as needed?

raybo
02-06-2013, 12:05 PM
If someone who knows nothing, or very little, of programming languages wanted to learn a language from the very beginning steps, and those lessons had to be free and on the internet, what would you recommend? Keep in mind that this individual has tried to learn VB before, and failed miserably because he couldn't wrap his head around the object-oriented basis of the language.

DeltaLover
02-06-2013, 12:38 PM
So, you'd somehow parse the fields in the raw data file, extracting only the data you want to populate the object with, then you would use that object, along with others I assume, to manipulate or research or whatever you need to do, calling the individual objects and joining them as needed?


sure...

Remember, our interaction with the machine is based on abstractions. The higher the abstraction, the closer the 'messages' we send to the machine are to real-world analogies.

To understand this concept, think a little about your Excel program. From what I read, you are loading all the fields into your spreadsheet, and if you want, for example, to find the best speed rating of a starter, you somehow iterate through a specific range of cells, knowing explicitly which one contains the speed rating, and you use some mechanism to locate the largest one for the starter.

This is a very 'imperative' approach, where you need to be aware of too many details and your instructions must contain a lot of low-level noise.

Assume now that, instead of your race being represented as a collection of cells in a spreadsheet, it was an actual 'object' providing all the needed functionality in the form of an interface. This means that your job as a developer will now be as simple as asking the race object to give you a reference to a starter object and then asking the starter object to give you the top speed figure.

Code-wise this will look like: race.starters['Rachel'].best_speed_rating.

Similarly, you can have another function to load the race into your Excel spreadsheet if you need something like this, always keeping a connection between your race object and your container (Excel in this case) so you will be able to exchange messages from one end to the other.

Following such an object-centric design will make it possible (and easy, if designed correctly) to achieve platform independence. You will be able to use any storage means (Access, SQL Server, Mongo, flat files, or Excel) without any coupling to the internals of your application.

I strongly encourage you and anyone else who is an amateur programmer to experiment with this type of approach, and never be reluctant to try new technologies...

For more, you can read my related blog posting:

http://alogatas.wordpress.com/category/programming/
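A minimal Python sketch of this object-centric idea; only the race.starters['Rachel'].best_speed_rating shape comes from the post, while the class names, attributes, and numbers are invented for illustration:

```python
class Starter:
    """A starter holding its past-performance speed ratings."""
    def __init__(self, name, speed_ratings):
        self.name = name
        self.speed_ratings = speed_ratings

    @property
    def best_speed_rating(self):
        # The caller never needs to know where this lives or
        # how it is computed -- that is the whole point.
        return max(self.speed_ratings)


class Race:
    def __init__(self, starters):
        # Keyed by name, so race.starters['Rachel'] works.
        self.starters = {s.name: s for s in starters}


race = Race([Starter("Rachel", [95, 88, 102]),
             Starter("Zenyatta", [100, 99])])
print(race.starters["Rachel"].best_speed_rating)  # 102
```

The same one-liner would keep working if `best_speed_rating` were later backed by Access, SQL Server, or a flat file instead of an in-memory list.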

Capper Al
02-06-2013, 12:39 PM
If someone who knows nothing, or very little, of programming languages wanted to learn a language from the very beginning steps, and those lessons had to be free and on the internet, what would you recommend? Keep in mind that this individual has tried to learn VB before, and failed miserably because he couldn't wrap his head around the object-oriented basis of the language.

I took a class on C++ at a community college. C++ outputting to Excel for displaying the data works well. Now the database is the question. I like how Access interacts with Excel and is available with Office. I'll look at Oracle next. Oh, another thing about Access is that a lot of handicappers use it.

Ciao

DeltaLover
02-06-2013, 12:41 PM
If someone who knows nothing, or very little, of programming languages wanted to learn a language from the very beginning steps, and those lessons had to be free and on the internet, what would you recommend? Keep in mind that this individual has tried to learn VB before, and failed miserably because he couldn't wrap his head around the object-oriented basis of the language.

Tonight I will do a little research and let you know. One thing I can assure you is that there are plenty of (always FREE) resources, and this is something that you can certainly accomplish if you put in the right effort!

raybo
02-06-2013, 01:15 PM
Tonight I will do a little research and let you know. One thing I can assure you is that there are plenty of (always FREE) resources, and this is something that you can certainly accomplish if you put in the right effort!

I noticed you mentioned "right effort" and not "enough effort". The amount of effort was never the problem, maybe your "right effort" is?

raybo
02-06-2013, 01:23 PM
sure...

Remember, our interaction with the machine is based on abstractions. The higher the abstraction, the closer the 'messages' we send to the machine are to real-world analogies.

To understand this concept, think a little about your Excel program. From what I read, you are loading all the fields into your spreadsheet, and if you want, for example, to find the best speed rating of a starter, you somehow iterate through a specific range of cells, knowing explicitly which one contains the speed rating, and you use some mechanism to locate the largest one for the starter.

This is a very 'imperative' approach, where you need to be aware of too many details and your instructions must contain a lot of low-level noise.

Assume now that, instead of your race being represented as a collection of cells in a spreadsheet, it was an actual 'object' providing all the needed functionality in the form of an interface. This means that your job as a developer will now be as simple as asking the race object to give you a reference to a starter object and then asking the starter object to give you the top speed figure.

Code-wise this will look like: race.starters['Rachel'].best_speed_rating.

Similarly, you can have another function to load the race into your Excel spreadsheet if you need something like this, always keeping a connection between your race object and your container (Excel in this case) so you will be able to exchange messages from one end to the other.

Following such an object-centric design will make it possible (and easy, if designed correctly) to achieve platform independence. You will be able to use any storage means (Access, SQL Server, Mongo, flat files, or Excel) without any coupling to the internals of your application.

I strongly encourage you and anyone else who is an amateur programmer to experiment with this type of approach, and never be reluctant to try new technologies...

For more, you can read my related blog posting:

http://alogatas.wordpress.com/category/programming/

Code wise this will look like this: race.starters['Rachel'].best_speed_rating.

So, in the highlighted code, I assume "race" is an object and "starters" is another object containing the horses' names, among other horse data. Now the last part gets me: how does the program know what it is you're asking, or is that another object too?

JJMartin
02-06-2013, 01:44 PM
How would you write a query that would return the difference in the average speed for the field between the 2nd race back and 3rd race back, claiming races only, when the lead horse is no more than 5 lengths ahead at the stretch call and at the finish, and then what the strike rate is for the highest speed under those conditions for every race in the db? lmao

DeltaLover
02-06-2013, 01:45 PM
Code wise this will look like this: race.starters['Rachel'].best_speed_rating.

So, in the highlighted code, I assume "race" is an object and "starters" is another object containing the horses' names, among other horse data. Now the last part gets me: how does the program know what it is you're asking, or is that another object too?

Very good!

You can think of the race as an object containing other objects called horses.
In our example, starters could be the array holding the horse objects, while ['Rachel'] is a mechanism to find a specific horse object using an id (which in this case is the name of the horse). Finally, best_speed_rating is an attribute of the horse object representing some internal information that we do not really know, nor even care to know, where exactly it lives or how it is populated!

DeltaLover
02-06-2013, 01:47 PM
How would you write a query that would return the difference in the average speed for the field between the 2nd race back and 3rd race back, claiming races only, when the lead horse is no more than 5 lengths ahead at the stretch call and at the finish? lmao

Personally, I would not mess with SQL code to do this.

What you are describing is a simple handicapping factor that can easily be described in a language like Python or C#.

There is a great deal of convenience gained following this approach.
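A hedged sketch of how such a factor might look in Python, over made-up per-start records; the field layout, names, and numbers are invented for illustration, and the strike-rate half of the question is left out to keep it short:

```python
# Hypothetical per-start records:
# (horse, race_back_index, speed, race_type, lead_at_stretch, lead_at_finish)
starts = [
    ("A", 2, 90, "CLM", 3.0, 2.0),
    ("A", 3, 84, "CLM", 1.5, 0.5),
    ("B", 2, 88, "CLM", 4.0, 4.5),
    ("B", 3, 80, "CLM", 2.0, 1.0),
    ("C", 2, 95, "ALW", 1.0, 1.0),  # not a claiming race: excluded
]

def qualifies(s):
    # Claiming races only, leader within 5 lengths at the
    # stretch call and at the finish.
    return s[3] == "CLM" and s[4] <= 5 and s[5] <= 5

second = [s[2] for s in starts if s[1] == 2 and qualifies(s)]
third = [s[2] for s in starts if s[1] == 3 and qualifies(s)]
diff = sum(second) / len(second) - sum(third) / len(third)
print(diff)  # (90+88)/2 - (84+80)/2 = 7.0
```

The point is that the filtering conditions read as plain predicates; extending this to a strike rate is a second pass over the same records.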

raybo
02-06-2013, 01:56 PM
Very good!

You can thing of the race as an object containing other objects called horses.
In our example starters could the array holding the object horses while the ['Rachel'] is a mechanism to find a specific horse object using an id (which in this case is the name of the horse). Finally best_speed_rating is an attribute of the horse object representing some internal information of it, that we do not really know not even care to know where exactly leaves and how it is populated!

OK, so the "best speed rating" attribute is just the name of one of many fields in the horses object/table? Or must you first join several fields using another language like C++ or Python, or calculate it in an application like Excel, making it an individual composite factor?

DeltaLover
02-06-2013, 02:04 PM
Exactly... Your horse object will provide a contract to each environment so other objects can exchange data (attributes) with it and also send 'messages' (call a function).

For example, a horse object might comply with the following interface:


starter:
# attributes
name: string
jockey: string
trainer: string
age: int
prime_power: int

# methods
is_first_time_out() : bool
tries_turf_first_time() : bool
get_real_time_odds() : double


while the race can be something like this:

race:
date: string
track: string
race_number: int
starters: starter[] #array of starter
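The interface above could be translated into runnable Python roughly as follows. The `career_starts` and `turf_starts` attributes are assumed helpers (the original interface doesn't say how the boolean methods are computed), and `get_real_time_odds` is omitted since live odds would need an outside feed:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Starter:
    # attributes from the interface above
    name: str
    jockey: str
    trainer: str
    age: int
    prime_power: int
    # assumed helpers backing the boolean methods
    career_starts: int = 0
    turf_starts: int = 0

    def is_first_time_out(self) -> bool:
        return self.career_starts == 0

    def tries_turf_first_time(self) -> bool:
        return self.career_starts > 0 and self.turf_starts == 0

@dataclass
class Race:
    date: str
    track: str
    race_number: int
    starters: List[Starter] = field(default_factory=list)

r = Race("2013-02-06", "GP", 1,
         [Starter("Rachel", "J. Smith", "T. Jones", 4, 120,
                  career_starts=6, turf_starts=0)])
print(r.starters[0].tries_turf_first_time())  # True
```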

JJMartin
02-06-2013, 02:22 PM
Personally, I would not mess with SQL code to do this.

What you are describing is a simple handicapping factor that can easily be described in a language like Python or C#.

There is a great deal of convenience gained following this approach.

Although I admit not having any experience doing queries within a db, I am sure my example query could be done as you suggested. The issue with me really is not whether a certain complex calculation can be done or not. It is rather that a db, IMO, doesn't provide a stimulating environment for the type of insight required to perform the kinds of analysis needed to gain an edge in profitability.

Because I use Excel extensively, and because of the way that I use it, I am able to be immersed in the data in a way that allows me to view subtle differences between each entry in the field, between the races, and so on. So I might be biased toward the use of a spreadsheet, but the types of queries I've seen other people perform here and there with a large db are always of a very static nature, IMO. Things like how often the #1 speed wins, or dropping in class combined with this or that factor, or trying to recreate all the conditions of a race and compare it historically, never really amount to anything of practical use. Sure, you might find a spot play where you may be profitable at the end of the year, but at the expense of finding such a play once every thousand races.

That being said, the PPs' data alone are really not enough, because they don't show you a big enough picture, and nothing shows you the whole picture. But that's just my opinion; maybe I'm wrong.

DeltaLover
02-06-2013, 02:34 PM
Things like how often the #1 speed wins, or dropping in class combined with this or that factor, or trying to recreate all the conditions of a race and compare it historically, never really amount to anything of practical use.


Your thought process is certainly correct. The technology you are using is not as important as the algorithm.

Having the proper methodology to make this type of judgement is the main objective. Everything else is an implementation detail.

To see more about my approach to your handicapping dilemma, you can read this posting on my blog:


http://alogatas.wordpress.com/2013/02/06/dont-be-fooled-by-da-hoss/

Red Knave
02-07-2013, 04:22 PM
Now the database is the question. I like how Access interacts with Excel and is available with Office. I'll look at Oracle next.
Going from Access to Oracle is like going from a tricycle to a fighter jet.
They can both get you around, but they differ in the training required to operate them, as well as other things (and yes, I like to be dramatic :) ). I assume you are looking at Standard Edition One. It's a lot of db power for a pretty reasonable cost, but they get you on annual maintenance fees.
Microsoft has a free-to-use version of SQL Server called Express. It might suit your purpose. If you set up your db properly, you can get at it with Excel. The only limit, as far as I am aware, is that the maximum database size is 10GB, which is still pretty big. I *think* it has built-in tools to allow upgrading from Access; if not, it isn't that hard. It's what we used to use to test and prototype with before moving software to clients that had MSSQL.

Red Knave
02-07-2013, 05:21 PM
How would you write a query that would return the difference in the average speed for the field between the 2nd race back and 3rd race back, claiming races only, when the lead horse is no more than 5 lengths ahead at the stretch call and at the finish, and then what the strike rate is for the highest speed under those conditions for every race in the db? lmao
As DeltaLover says, this is trivial for virtually any database software.

The differences between spreadsheets and databases are many but the concept behind them is similar, rows and columns of data. For databases, cells or data elements have morphed into much more abstract types of information; objects, graphics, other databases and who knows what else. Spreadsheets are excellent for presenting or visualizing limited sets of information. With a database you have few limits on the size of the dataset but you need to do the work to get the data into a presentable, visual format.

Your explanation that you like to see the subtle differences between current contenders makes me think that you misunderstand how others use a database. When I do it, I have today's information in front of me and I can compare each entrant to any other entrant in this race and/or against any entrant in any other race in the last 10 years and/or against any entrant 'like' that one in any race 'like' this one etc. etc. etc.

When you only have a hammer, everything looks enough like a nail that you can't see that you may get better results using an impact gun. Try it, you might like it. :)

DeltaLover
02-07-2013, 05:37 PM
When you only have a hammer, everything looks enough like a nail :ThmbUp: :ThmbUp:

traynor
02-07-2013, 08:28 PM
If you're going to do datamining then this info from DeltaLover is really important to understand. If the amount of data is small maybe less so; you can brute force your queries. But efficiency starts with good design.

Of all the mistakes it is possible to make in designing or building an application, poor initial design of the database is probably the most frustrating (and the most costly and time-consuming to correct later). It would be well to pay attention to what DeltaLover and headhawg are saying. It is not frivolous.

DeltaLover
02-07-2013, 09:42 PM
Of all the mistakes it is possible to make in designing or building an application, poor initial design of the database is probably the most frustrating (and the most costly and time-consuming to correct later). It would be well to pay attention to what DeltaLover and headhawg are saying. It is not frivolous.

This is a statement that cannot be overemphasized.

A wrong decision made during the early stages of development soon becomes a significant burden to the whole platform, as it tends to appear in so many components that it is very time-consuming to get rid of.

When starting the development of a system, one of the most fundamental decisions we need to make is the functionality of the data tier. We can make the platform database-centric or middle-tier-centric, a judgment call that I think is the most important for any kind of system.

A database-centric platform is one that not only implements the Data Model using a relational database but also uses higher-level mechanisms (like stored procedures, functions, triggers etc.) to define business logic and rules. Following this model, the rest of the application essentially becomes an implementation detail revolving around the database, feeding it with data and requesting data from it.

The other direction we can take is to use the database as a relatively thin layer of data storage, keeping it completely ignorant of business rules, which are implemented in an object-oriented middle tier that serves as the proxy between clients and data.

Whenever I can I always select the latter architecture.

Although SQL represents a great DSL paradigm, excelling in its declarative and expressive features, its higher-level abstractions, namely stored procedures and functions, remain second-class citizens, clearly inferior to any modern general-purpose programming language like Python or C++, for example.

Concepts like object orientation, polymorphic behavior, lambdas, iterators and generic programming are not supported by SQL, meaning that the implementor is limited to very primitive decisions that will later be reflected in the application as inflexibility and rigidity. Such an application will rapidly exceed its evolutionary capabilities and will require very radical reverse engineering to adapt to new needs.

The alternative is to implement an Object Model that acts as a client of the database and a server to the end clients. Following this approach, the database's role is diminished to the point that it can easily be replaced by a completely different database engine (for example, moving from SQL Server to Oracle) or by other means of data storage (like flat files, XML or NoSQL), transparently to the rest of the ecosystem.

This direction is well supported by modern frameworks and usually implemented with ORMs, which serve as technology-neutral abstraction proxies between the data and business layers, making the implementation of very sophisticated platforms possible with minimal effort and risk.
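As a rough illustration of the middle-tier-centric idea, here is a minimal Python sketch. The class and method names are made up for the example; in a real system an ORM or a database-backed store would sit behind the same interface:

```python
# Sketch of a thin, swappable storage layer behind a middle tier.
class InMemoryStore:
    """Could be replaced by a SQLite, SQL Server, or flat-file backend
    exposing the same add()/all() interface."""
    def __init__(self):
        self._rows = []
    def add(self, row):
        self._rows.append(row)
    def all(self):
        return list(self._rows)

class HandicappingService:
    """Business rules live here, ignorant of how the data is stored."""
    def __init__(self, store):
        self.store = store
    def best_speed(self):
        return max(r["speed"] for r in self.store.all())

store = InMemoryStore()
store.add({"horse": "A", "speed": 88.0})
store.add({"horse": "B", "speed": 92.0})
service = HandicappingService(store)
print(service.best_speed())  # 92.0
```

Swapping the storage engine then means writing a new store class, not rewriting the business rules.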

vegasone
02-07-2013, 10:50 PM
There is a hard limit of 4096 columns per table, but the effective maximum may be less for a given table. The exact limit depends on several interacting factors.

Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.

MySQL 5.0 Reference Manual

http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html

traynor
02-07-2013, 11:20 PM
This is a statement that cannot be overemphasized. A wrong decision we make during early stages of our development will soon become a significant burden to the whole platform as it tends to appear in so many components that it is very time consuming to get rid of. ...

I could not agree more. I think it is short-sighted to go in the reverse direction, piling stuff into a database just to amass a bunch of data, and then trying to devise queries to manipulate that data into meaningful information. Databases do well with relatively stable data that persists over long periods, less well with data that is continually being searched in new and different ways to uncover new and different insights about that data.

Considering the relative capabilities of a programming language and SQL, it is not so much that it cannot be done in SQL as that it is clumsy and cumbersome to do things beyond the simplistic. I understand that hardcore Access, SQL Server, and Excel users may have incredibly complex stored procedures or formulas developed that do (what they believe to be) complex things, but those same complex things can be done (and much more) with relatively simple programming in a decent programming language.

For anyone serious about handicapping horse races, the time spent writing SQL queries and Excel formulas would be much better spent learning to code basic applications in Python, Java, or C#. And the more you learn, the more you can do.

traynor
02-07-2013, 11:30 PM
There is a hard limit of 4096 columns per table, but the effective maximum may be less for a given table. The exact limit depends on several interacting factors.

Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.

MySQL 5.0 Reference Manual

http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html

A good argument can be made for DeltaLover's suggestion that the database be used essentially as a storage container (what it is designed for) rather than being duct-taped, hose-clamped, and glued together as a combination storage container and information generator/data manipulator (something it is not designed for in anything more than rudimentary processes). The same argument could be made against the use of spreadsheets--good for storage and simple manipulations, but definitely NOT the best tool for analyzing races.

JJMartin
02-07-2013, 11:41 PM
As DeltaLover says, this is trivial for virtually any database software.

The differences between spreadsheets and databases are many but the concept behind them is similar, rows and columns of data. For databases, cells or data elements have morphed into much more abstract types of information; objects, graphics, other databases and who knows what else. Spreadsheets are excellent for presenting or visualizing limited sets of information. With a database you have few limits on the size of the dataset but you need to do the work to get the data into a presentable, visual format.

Your explanation that you like to see the subtle differences between current contenders makes me think that you misunderstand how others use a database. When I do it, I have today's information in front of me and I can compare each entrant to any other entrant in this race and/or against any entrant in any other race in the last 10 years and/or against any entrant 'like' that one in any race 'like' this one etc. etc. etc.

When you only have a hammer, everything looks enough like a nail that you can't see that you may get better results using an impact gun. Try it, you might like it. :)

Ok, it seems you are under-estimating what my skills are, so let me explain it another way. First of all, I have nothing against using a db; I've been wanting to try it for a while, and I've been using Excel for horse racing for over 8 years. I really don't believe a db environment (imo) is conducive to the stimulation of the creativity needed for the handicapping-factor creation process, and I speak for myself. It's not so much a matter of what a db can do versus what a spreadsheet can do, but more a question of what types of ideas you are generating and implementing, and what is the vehicle, the inspiration, for that generation process.

In the beginning, as a novice, I looked at what would be considered really obvious questions in handicapping: will a single factor, such as betting the fastest speed horse, be profitable by itself? How about combining this factor with that factor? How about 3 or 4 factors? The reality is that each single factor (long term) is a loser in ROI, not a particular revelation on this forum. Combining losing factors, no matter how you do it, will not magically create winning ones. You have to somehow make your own proprietary factors that don't exist publicly and test them. This requires a lot of ingenuity and mental process. I suppose some ideas could be carried out with a db, but an increasing level of complexity will reach a limit. I am pretty confident in saying, at least in my experience, that with Excel you would probably not reach such a limit. I make my own dbs with Excel, although smaller, and several of them. My school of thought is that if something doesn't work in a small batch of races, like 1,000-2,000, then I really don't care if it works in 50,000; it's not worth doing. Your factors should be solid and effective enough to work often and with some degree of reasonable predictability.

I make profiles for each track. When I'm working in Excel I have all the columns easily accessible and ready to experiment with. I can put together extremely complex formulas in minutes because I've been doing it so long. Beyond the 1435 BRIS columns I have an additional 108 columns of my own with calculations. Some of the formulas in just one cell have over 30 nested 'IF' statements. Trying to recreate what those formulas do in a db query would be a nightmare. The query-creation process is not flexible enough to handle intricate handicapping factors in a practical and timely way, and the amount of convoluted lines of code required would cause an undesirable mass of confusion. Before I receive another response saying anything can be done in a db, just remember I said in a practical and timely way for complex calculations. If you are content with your level of analysis with a db, that's good for you, but frankly you have no idea what I do in Excel. At the risk of sounding a bit pretentious, the type of things I do with formulas and macros in Excel would make your head explode, and I say that with respect. I also do web queries within Excel (Import Data) and scrape info from sites on the internet that automatically integrate the imported data directly into my files, which trigger calculations with a series of macros for automation. I can add live odds with an auto-scratch sensor for late scratches for live betting, all with formulas. I've built elaborate spreadsheets for Betfair also, with autobet triggers based on preset criteria. All that being said, I can see how certain types of simpler queries in a db would be desirable from time to time, and I would like to try it eventually. I looked at Access a couple of times but I really don't like it. So again, I don't question the value and capability of db power, I just don't find the environment conducive to creativity in factor creation and analysis. Just my opinion of course, and I'm not trying to impose or imply anything.

One last thing I should emphasize is that the results an individual gets with this type of work will be commensurate with their individual input and abilities. Dedication and personal ability cannot be taught. Learning Excel is a task in itself, not hard, but still requiring time and a desire to learn. After developing some skills, you can focus on the handicapping, and don't forget to look outside for what you can't find in the PP files. If after some time you still keep losing and you want to give up, I don't blame you; it's a hard game.

DeltaLover
02-08-2013, 12:31 AM
Combining losing factors no matter how you do it, will not magically create winning ones.

I disagree..

traynor
02-08-2013, 12:32 AM
Ok, it seems you are under the impression of under-estimating what my skills are so let me explain it another way. First of all I have nothing against using a db... So again, I don't question the value and capability of db power, I just don't find the environment conducive to creativity in factor creation and analysis.

What you have accomplished with Excel is truly something to be proud of, but that is the result of your effort and work--not Excel. You could more than likely have done the same thing in Python or C# with much less work, as well as having the potential to do considerably more.
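To make the point concrete: a cell with dozens of nested Excel IFs, like the ones described earlier in the thread, typically collapses into a small table-driven function in code. The cutoffs and grades below are invented for illustration:

```python
# A chain of nested Excel IFs often reduces to a lookup table in code.
# The speed brackets and letter grades here are made-up examples.
SPEED_BANDS = [(95, "A"), (85, "B"), (75, "C")]

def rate(speed):
    """Equivalent of IF(speed>=95,"A",IF(speed>=85,"B",IF(speed>=75,"C","D")))."""
    for cutoff, grade in SPEED_BANDS:
        if speed >= cutoff:
            return grade
    return "D"

print([rate(s) for s in (98, 88, 70)])  # ['A', 'B', 'D']
```

Adding a 31st band means adding one row to the table, not re-nesting a formula.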

DeltaLover
02-08-2013, 12:40 AM
What you have accomplished with Excel is truly something to be proud of, but that is the result of your effort and work--not Excel. You could more than likely have done the same thing in Python or C# with much less work, as well as having the potential to do considerably more.

Exactly... What seems so complicated and extreme using Excel can be greatly simplified with a deeper understanding of how to write your application. What you are describing here is like trying to cross the Atlantic Ocean by swimming... There are much better alternatives you can try... As traynor says, try to learn a different way of thinking when it comes to programming, use a modern programming language like the ones suggested, and everything will look much simpler...

traynor
02-08-2013, 12:41 AM
I disagree..

Yes. "Losing" factor A and "losing" factor B in combination AB, with another "losing" factor C, in combination ABC, may be highly predictive as well as profitable. That is something you will never know unless you extensively test the specific combinations of AB and ABC--in conjunction with the presence or absence of DEFGHIJKLMN, etc.--regarding each "losing" or "winning" factor combination as separate and discrete.

One of the reasons handicappers lose so often is that they try to add together things--"good" or "bad"--that are individually considered good or bad, but form something completely different when combined. Considering only single factors and simplistic combinations of several factors while ignoring a plethora of confounding variables is not useful for serious race analysis.
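A toy numeric example of that point, with fabricated data: two factors that each lose money on flat bets can still show a positive ROI where they overlap.

```python
# Fabricated illustration: factors A and B each lose on their own,
# but the subset where both are present is profitable.
def roi(bets):
    """ROI of flat $1 bets; each entry is the return (0 if the horse lost)."""
    return (sum(bets) - len(bets)) / len(bets)

# (has_A, has_B, return_per_dollar)
races = [
    (1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 2.0),
    (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 2.0),
    (1, 1, 3.0), (1, 1, 0),
]

roi_a  = roi([r for a, b, r in races if a])        # A alone: negative
roi_b  = roi([r for a, b, r in races if b])        # B alone: negative
roi_ab = roi([r for a, b, r in races if a and b])  # A and B together: positive
print(round(roi_a, 3), round(roi_b, 3), round(roi_ab, 3))
```

Whether any real pair of factors behaves this way is exactly what the extensive testing described above would have to establish.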

vegasone
02-08-2013, 12:49 AM
Excel is essentially a database with lots of built-in formulas and queries, as well as the ability to write your own.

Any decent current database, free or otherwise, gives you the freedom and expandability to do as much as and more than Excel. There are lots of front-end options available to get the information out in any way imaginable. You can store all the data and create other databases out of your primary one to focus on certain variables, etc.
You can also use Excel to pull your variables into a spreadsheet if you want.
Your imagination and skill are your only limits.

DeltaLover
02-08-2013, 01:06 AM
If someone who knows nothing, or very little, of programming language, wanted to learn a language, from the very beginning steps, and those lessons had to be free and on the internet, what would you recommend. Keeping in mind that this individual has tried to learn VB before, and failed miserably because he couldn't wrap his head around the object oriented basis of the language.

I congratulate you on your intention to learn more about computer programming, and I assure you that you can do it.

You mentioned that you have tried VB before but did not really like it... That's fine, don't worry, there are better ways to go about it...

Let me tell you here that while learning you must forget for a while the domain you are trying to develop software for, and focus only on programming... This will accelerate your learning curve and give you a more complete education in the craft of software development.


The introduction I consider to be the best (and I have suggested it even to intermediate programmers who worked for me in the past) is based on a monumental book of computer science:

Structure and Interpretation of Computer Programs

http://www.amazon.com/Structure-Interpretation-Computer-Programs-Edition/dp/0070004846/ref=sr_1_1?ie=UTF8&qid=1360302683&sr=8-1&keywords=structure+and+interpretation+of+computer+programs

You can find the book in pdf format for free under this link:

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2&ved=0CDIQFjAA&url=http%3A%2F%2Fdeptinfo.unice.fr%2F~roy%2Fsicp.pdf&ei=epIUUdvsK8fC0AHFv4G4Bg&usg=AFQjCNHq1z2ocM5kh8XHnlJcZqlK3ByOrQ&sig2=5o0hVgfhbVDONoB9IGaQ8Q&bvm=bv.42080656,d.dmQ

On youtube you can find the whole MIT course video taped:

http://www.youtube.com/watch?v=2Op3QLzMgSY

Here you can find a very good free Scheme interpreter:

http://racket-lang.org/

that you can use while you study the book...

Again, while studying, never think for a moment that the knowledge you are getting is useless... This is the worst mistake you can make... I assure you everything in this book will eventually become useful to you as you mature as a programmer...

After completing this course I would suggest downloading Python (which of course is free, as are most of its classical texts) and starting to learn it... Its learning curve is not steep, at least to start being productive to some degree... Of course, the simplicity of the language is deceiving, as it presents extremely advanced capabilities, but you will not immediately need to be aware of them; they will come gradually... With Python you will be able to access any database you want, control OpenOffice to any level you want, read real-time feeds and do pretty much anything you can imagine... You can even start using HADOOP, since Mr TRAYNOR finds it useful :)

When it comes to Python, YouTube is full of videos (since it is one of Google's favorite technologies):

For example:

http://www.youtube.com/watch?v=tKTZoB2Vjuk

traynor
02-08-2013, 03:20 AM
Excel is essentially a database with lots of built-in formulas and queries, as well as the ability to write your own.

Any decent current database, free or otherwise, gives you the freedom and expandability to do as much as and more than Excel. There are lots of front-end options available to get the information out in any way imaginable. You can store all the data and create other databases out of your primary one to focus on certain variables, etc.
You can also use Excel to pull your variables into a spreadsheet if you want.
Your imagination and skill are your only limits.

It is true that Excel has a lot of built-in stuff. What many fail to understand (and thereby make things way more complex than they need to be) is that much of the functionality of Excel is in DLLs (dynamic-link libraries)

http://en.wikipedia.org/wiki/Dynamic-link_library

that are available with a few simple instructions in VB, C#, etc. Other languages have similar libraries. Rather than doing a whole lot of coding, many of the functions of Excel can be executed directly (in a programming language) with very little code.

The basics of programming are relatively simple. Learn the basics, then learn a few built-in function calls. If programming seems complex, start off with:
http://www.math.ucdavis.edu/~exploration/mme/2012/byteOfPython.pdf

and then:
http://www.greenteapress.com/thinkpython/thinkpython.pdf

In a couple of hours you will be coding your own functions, creating and sorting arrays, parsing strings, and a world of other things that will amaze you.

Most people are able to start coding in Python almost immediately. The basics are easy, but it can do almost anything. Easier, faster, simpler, and way more elegantly than Excel.
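For instance, a first session with Python really can cover a function, a sort, and some string parsing. The record format and horse names below are made up:

```python
# A first-session Python sketch: define a function, parse strings, sort a list.
def parse_line(line):
    """Split a 'name,speed' record into a (name, float) pair."""
    name, speed = line.split(",")
    return name.strip(), float(speed)

lines = ["Beau Brass, 88.5", "Iron Bid, 92.0", "Fast Cat, 85.0"]
horses = [parse_line(l) for l in lines]
horses.sort(key=lambda h: h[1], reverse=True)  # fastest first
print(horses[0][0])  # Iron Bid
```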

If you want to build your own user interface, rather than looking at that horrible Excel columns and rows nonsense, download the NetBeans IDE for Python:
http://dlc.sun.com.edgesuite.net/netbeans/6.7/python/ea2/

There are many other ways to learn programming; the process above is one of the very best. The difficulties in learning to program are not in the programming itself--they are in getting past the nonsense taught by instructors who barely understand the topic themselves, to people who don't know enough to realize that the instructors are not much more skilled than they are, and are mostly faking it. Unfortunately, many of those instructors teach introductory programming classes, and they are the cause of many people abandoning any interest they may have had in learning to program.

Capper Al
02-08-2013, 09:22 AM
Sartin was a believer in using the last 20 races to tag a trend. This would render a database useless. I think this would be extreme. But having been a DBA and a programmer and developer of handicapping software, I agree with those using Excel first. A database is just a collection. If it is good data, good for you. Beware if it isn't. Too many handicappers believe that they can uncover handicapping truths with a database. It's not so. All a database can do for you is help you confirm your beliefs, provide a reference, and refine (not define) your system. That's why I waited until now before considering a database.

Red Knave
02-08-2013, 09:51 AM
Ok, it seems you are under the impression of under-estimating what my skills are
Sorry if it came across that way. I see that Traynor and Delta Lover have covered the idea much better than I did. Actually I'm in awe of how you and guys like raybo can wring information out of Excel.

I suppose some ideas could be carried out with a db but an increasing level of complexity will reach a limit. I am pretty confident in saying at least in my experience, that with Excel you would probably not reach such a limit.
I think the opposite is true, that you will reach the limit sooner with Excel.
And, you're right, I don't know what you do in Excel but I do know that some time in the future you will come up against a problem that you can't solve with Excel or at least not to your complete satisfaction. I'd actually bet that you have had a few moments like that already since you have already looked into Access.
All I was trying to say is that it seemed that you had blinkers on when it came to databases. That there was no way a db could do anything better than Excel. I don't happen to agree.

Tom
02-08-2013, 09:56 AM
I find lots of stuff that is new from my dbs.
I use the HTR robot to find out what is winning and where really quick.
Saturday morning, over a tasty Westrock, I run the robot, find out what tracks are "having sales" and zoom in on those. No need to be wed to any track anymore.

I use Access to uncover other things that are not just confirming anything, but all new stuff. There are not enough hours in the day to make queries.

JJMartin
02-08-2013, 10:00 AM
Sorry if it came across that way. I see that Traynor and Delta Lover have covered the idea much better than I did. Actually I'm in awe of how you and guys like raybo can wring information out of Excel.

I think the opposite is true, that you will reach the limit sooner with Excel.
And, you're right, I don't know what you do in Excel but I do know that some time in the future you will come up against a problem that you can't solve with Excel or at least not to your complete satisfaction. I'd actually bet that you have had a few moments like that already since you have already looked into Access.
All I was trying to say is that it seemed that you had blinkers on when it came to databases. That there was no way a db could do anything better than Excel. I don't happen to agree.
I know what you were trying to say, but I think we are emphasizing two different points here. My point is not whether Excel can do something more than a db, but rather that the atmosphere in Excel is more productive toward the goal of race-factor creation. And frankly, some things are better in Excel. Let's just call them two different tools for different ideas.

Capper Al
02-08-2013, 10:09 AM
I find lots of stuff that is new from my dbs.
I use the HTR robot to find out what is winning and where really quick.
Saturday morning, over a tasty Westrock, I run the robot, find out what tracks are "having sales" and zoom in on those. No need to be wed to any track anymore.

I use Access to uncover other things that are not just confirming anything, but all new stuff. There are not enough hours in the day to make queries.

No doubt that database queries can help. It's just part of the research. The whys aren't found there.

JJMartin
02-08-2013, 10:30 AM
Traynor
Most people are able to start coding in Python almost immediately. The basics are easy, but it can do almost anything. Easier, faster, simpler, and way more elegantly than Excel.

Excel uses VB, which I code in. What you're suggesting is that I abandon VB and start learning and using something else like Python. When something better than Python emerges in the future, should we abandon Python and start using that too? Here's the thing: I don't have time for all that, I'm not a software developer. I don't want to keep re-inventing the wheel. But that's just me; I have nothing against learning new skills, it should be encouraged as you do.


...horrible Excel columns and rows nonsense...


LOL!! You are making a judgment on the appearance of Excel versus custom-made software. For me, the horrible Excel rows and columns put your face right in the raw data that has served as the inspiration to gain insight on how to analyze it. I'm not really concerned with appearance at this point. Raybo has done a good job with the appearance of his work. Which software do you like for its appearance and usability?

DeltaLover
02-08-2013, 10:51 AM
Sartin was a believer in the last 20 races to tag a trend. This would render a database useless. I think this would be extreme.

I am not sure exactly what you mean here. What is it that makes the db useless, assuming the last 20 races are tagging a trend? If anything, it is the database that should be used to make this decision.

Please note that I am not sure what you mean when you refer to db use. In my mind, higher-level conclusions are driven by some sort of middle tier which adds a processing level on top of the raw data.

I can also make a point about the phrase 'Sartin was a believer'. The reason we do research is exactly to get rid of the need for beliefs; this is one of the reasons I consider Sartin's work something of a pseudo-science, where arbitrary concepts are used to derive conclusions with no substantial evidential backup.

All a database can do for you is help you confirm your beliefs, provide a reference, and refine (not define) your system. That's why I waited until now before considering a database.

I don't agree.

Using your definition of a database as a data collection, a db by itself cannot help you with anything other than as a means to store and retrieve data. To reach any kind of confirmation you will need higher-level components encapsulating the algorithmic processing of the data to convert it into useful metrics.

More than this, nothing is further from reality than a statement like 'All a database can do for you is help you confirm your beliefs'. This might have been the case in the very early stages of the computing era, but in our day 'beliefs' are shaped by the machine in a completely automatic and, I dare say, unpredicted way.

The database per se should be viewed as a small detail of the complete ecosystem. It should be used in the places where it excels, as for example with data models that can be expressed in relational form, while other technologies should be used for other types of data.

Excel or any other spreadsheet should also be used where it excels, which is mostly in the early research phases, where a lot of numerical calculation is needed to understand a model.

Excel is not a relational database and was never meant to be one. It has some characteristics of a db, but this is not what its core responsibility should be. If you really like it, you can always use it as a front end to an RDBMS.

By the same token, an RDBMS does not excel as a middle-tier implementer; yes, it has some features that can be used for something like this, but this layer can be implemented much more efficiently using general-purpose programming languages.

We should always try to use the right tool for the job, minimizing dependencies on the technologies we use, while avoiding getting married to any specific programming language, database, or anything else. Our solution should be as platform- and technology-independent as possible.

sjk
02-08-2013, 10:53 AM
I believe that those of you who have never used Access or another db underestimate its capabilities. I have a black box Access program that does all the work and which has afforded me good results over tens of thousands of races bet with real money.

Don't just think in terms of what you can do with a few tables and queries but rather what can be done with hundreds of tables and hundreds of queries.

I am sure I could have done similar things in a different programming environment but Access has served me well. Having learned through the handicapping program I have used it for lots of applications in my work.

DeltaLover
02-08-2013, 10:57 AM
When something better than Python emerges in the future, should we abandon Python and start using that also?

This is absolutely correct.

It is not as bad as it sounds, though... Learning Python (which I strongly suggest) will help you realize things that you cannot see right now. From several perspectives it is a much better language than VB; the problem is that you cannot appreciate this right now, since you only know one specific paradigm of programming...

You will become a better programmer and you will be able to learn new languages more easily than before...

You have to try...

JJMartin
02-08-2013, 11:08 AM
This is absolutely correct.

It is not as bad as it sounds, though... Learning Python (which I strongly suggest) will help you realize things that you cannot see right now. From several perspectives it is a much better language than VB; the problem is that you cannot appreciate this right now, since you only know one specific paradigm of programming...

You will become a better programmer and you will be able to learn new languages more easily than before...

You have to try...

Since you are acquainted with Python: I have read that Python slows Excel down due to the interpretation process. What are your thoughts on this?

Capper Al
02-08-2013, 11:23 AM
Delta,

What I'm objecting to is the idea that technology will provide the answer. It doesn't. It finds correlations, and the handicapper has to figure out what to do with them.

DeltaLover
02-08-2013, 11:30 AM
Since you are acquainted with Python: I have read that Python slows Excel down due to the interpretation process. What are your thoughts on this?

Martin,

Python has been my tool of choice over the last several years; of course my toolkit contains many more technologies, so that I am in a position to make the correct judgment calls when it comes to selecting the proper tool for the job. For me the specific technology I am using has the same significance as a telescope has to the astronomer. What really makes the difference is the algorithmic design; I see everything else as a mere implementation detail.

You are asking me about performance issues when using Python with Excel. I have to admit this is something I have never tried; I am completely opposed to proprietary software, so Excel would be among the applications I try to avoid at any cost. For my few spreadsheet needs I use Open Office, and I have used Python with it, although not very extensively.

Since your question is about performance and Python, I can give you a more generic answer that will probably still be on topic. It is true that Python, like any other 'scripting' environment, will always be inferior performance-wise to a compiled language like C; this is largely the reason this family of languages did not have a wide audience until recently, when modern computers began providing the processing power needed to overcome the disadvantage. A casual look at the history of computing will easily convince you of the truth of this statement.

More than this, Python is not meant to be the most efficient programming language in terms of execution speed. Where it really excels is in expressiveness and flexibility. The good thing is that its close relationship with C makes it very easy to implement the lower-level chunks of code that are truly performance-critical directly in C, and to embed them as first-class Python citizens in any program you need.
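As a rough, runnable illustration of the interpreted-loop point (not a statement about Excel specifically): Python's built-in sum() runs its loop in C, while a hand-written loop goes through the interpreter one step at a time. The timings printed will vary by machine.

```python
import timeit

def py_sum(n):
    """Pure-Python loop: every iteration goes through the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total

N = 200_000
t_py = timeit.timeit(lambda: py_sum(N), number=5)
t_c = timeit.timeit(lambda: sum(range(N)), number=5)  # sum() loops in C

# Both compute the same value; only the execution model differs.
assert py_sum(N) == sum(range(N))
print(f"pure Python loop: {t_py:.3f}s  built-in sum(): {t_c:.3f}s")
```

On typical hardware the C-backed version is several times faster, which is exactly the argument for pushing hot loops down into C while keeping the orchestration in Python.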

traynor
02-08-2013, 11:31 AM
Excel uses VB, which I code in. What you're suggesting is that I abandon VB and start learning and using something else like Python. When something better than Python emerges in the future, should we abandon Python and start using that too? Here's the thing: I don't have time for all that; I'm not a software developer. I don't want to keep re-inventing the wheel. But that's just me; I have nothing against learning new skills, and it should be encouraged, as you do.



LOL!! You are making a judgment on the appearance of Excel versus custom-made software. For me, the horrible Excel rows and columns put your face right in the raw data that has served as the inspiration to gain insight on how to analyze it. I'm not really concerned with appearance at this point. Raybo has done a good job with appearance in his work. Which software do you like for its appearance and usability?

If you are already coding in VB, there is no reason at all to learn Python. VB is great, especially using the Visual Studio IDE. The problem with starting out in Excel is that it is initially simple but quickly becomes clunky. That is, for more than simple functions, the coding in Excel becomes progressively more convoluted and complex compared to coding the same thing in (for example) straight VB or Python. Sticking VB code blocks in Excel is fine, as long as what you are doing is not overly complex. When you get to the point that you really want to look at your data, a spreadsheet is NOT the best way to do it.

Appearance is subjective. I design my own GUIs (graphical user interface) so I can pick and choose exactly what I want to see where, and how I want it displayed.

It is not so much that I am "against" Access or Excel as that both are designed to do very simple business processes. Anything more complex may be possible to do, but becomes increasingly clunky the more complex the processes become. You wind up using a hammer when what you need is a screwdriver. Or a socket wrench.

Microsoft made billions by designing simplistic apps that enable people with little or no skill to do very simple things and pretend they are "programming." It is at the point that their analytical skills develop and they seek the answers to more complex questions that Access and Excel become increasingly the "wrong" tool to use to find the answers to those questions. Most programming languages have no such limitations. That is the whole point of preferring a programming language to Excel or Access.

None of that is to say you cannot take out a screw or remove a nut with a hammer. Of course you can. Just keep banging away at it until it does what you need it to do.

oldman
02-08-2013, 11:39 AM
I guess I have my answer: yes, some of you use Access.
Access or Excel: use whichever works for you.
I use Access because if I need a formula I can do it; if I need to rank a column, that too; ROI, yes; win%, yep; combine this with that, of course; find a trend, NOT interested. The funny thing is that in doing all this programming I failed to learn how to handicap. I'm in good company, though; there are very few pros out there, but the process is fun.

BIG49010
02-08-2013, 12:23 PM
I believe that those of you who have never used Access or another db underestimate its capabilities. I have a black box Access program that does all the work and which has afforded me good results over tens of thousands of races bet with real money.

Don't just think in terms of what you can do with a few tables and queries but rather what can be done with hundreds of tables and hundreds of queries.

I am sure I could have done similar things in a different programming environment but Access has served me well. Having learned through the handicapping program I have used it for lots of applications in my work.

You're on the money here. I started my database with Paradox 20 years ago, and still use it today in combination with Excel. The two are not perfectly compatible, but you can't do 600 searches in Excel on a race card for 120 horses. A database can do it, and report back to you with an answer table that expresses your handicapping ideas, and do it consistently, day after day.
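A toy version of that "answer table" idea, using Python's standard sqlite3 module so anyone can run it. The schema, trainer names, and numbers are all invented for illustration; each WHERE/GROUP BY combination stands in for one of the hundreds of searches described above.

```python
import sqlite3

# Hypothetical starters table: trainer, finish last race, finish, odds.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE starters (
    trainer TEXT, finish_last_race INTEGER, finish INTEGER, odds REAL)""")
con.executemany("INSERT INTO starters VALUES (?, ?, ?, ?)", [
    ("Smith", 1, 1, 3.5), ("Smith", 1, 4, 6.0), ("Smith", 2, 1, 2.2),
    ("Jones", 1, 2, 9.0), ("Jones", 5, 1, 4.8), ("Jones", 5, 6, 12.0),
])

# One "search": how does each trainer do with horses that won last time out?
rows = con.execute("""
    SELECT trainer,
           COUNT(*)            AS starters,
           SUM(finish = 1)     AS wins,
           ROUND(AVG(odds), 2) AS avg_odds
    FROM starters
    WHERE finish_last_race = 1
    GROUP BY trainer
    ORDER BY trainer""").fetchall()
for r in rows:
    print(r)
```

Swapping the WHERE clause (finish last race, layoff, year, and so on) generates the whole family of searches; the result set is the answer table.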

raybo
02-08-2013, 01:00 PM
A good argument can be made for DeltaLover's suggestion that the database be used essentially as a storage container (what it is designed for) rather than being duct-taped, hose-clamped, and glued together as a combination of storage container and information generator/data manipulator (something it is not designed for beyond rudimentary processes). The same argument could be made against the use of spreadsheets--good for storage and simple manipulations, but definitely NOT the best tool for analyzing races.

While I agree databases offer many more "resources" for analysis than Excel does, it really depends on what and how you want to "analyze races". If what you want to do is ask which factors, or combinations of factors, performed well, or anything similar, then databases are the way to go. But if you already know the factors, and any composite factors you have created, that you want to use to analyze a race, then there are simple ways to set up "views" in Excel that give you a picture of all the horses and their attributes, including every factor you want to use, for each of them. That allows great visual handicapping value, because you can see the whole race, with exactly what you want to see, in one picture. Of course you can also do that in a database, but it can take months or years to get the database created and set up to do what takes a day or two in Excel.

headhawg
02-08-2013, 01:43 PM
What I'm objecting to is the idea that technology will provide the answer. It doesn't. It finds correlations, and the handicapper has to figure out what to do with them.

Well, this is the fundamental difference between "data" and "information". Unless one is doing their own workout clocking, race timing, or physicality coding, the raw data is more or less the same, with the differences being the integrity of the data provider. But I didn't think this was a discussion of how to use the information; rather, what might be the most effective tool? I mean, Word contains data. And you can "search" a document. But would it be an efficient way to see how variables might be related? Hardly.

It seems as if the "Excel guys" and the "db guys" are approaching handicapping (or the presentation of the information) from two different perspectives which is fine. The OP was asking about Access, and I was working under the assumption that he was trying to find out how the data might be related, and how to arrive at that in the quickest manner. If you already know that relationship then, yes, Excel, Python, Basic, or whatever will work.

Capper Al
02-08-2013, 02:12 PM
Headhawg,

I'm saying there is a place for each. Most handicappers would be better served with the Excel and Access combination, in that order. Let the necessity of managing the data be the reason to go to a database. Of course, it's a whole other world if one wants to buy their way into a database instead of developing their own methodology.

traynor
02-08-2013, 06:48 PM
This is absolutely correct.

It is not as bad as it sounds, though... Learning Python (which I strongly suggest) will help you realize things that you cannot see right now. From several perspectives it is a much better language than VB; the problem is that you cannot appreciate this right now, since you only know one specific paradigm of programming...

You will become a better programmer and you will be able to learn new languages more easily than before...

You have to try...

To extend your comments a bit, the specific paradigm of programming developed by Excel and Access users is the root of the problem, not deficiencies in the applications. They can do a lot of things, but so can GW BASIC (if you can still find it). That does not say that GW BASIC (or Excel or Access) are the best choice for race analysis applications. Unfortunately, I have tried, worked with, become frustrated with, and abandoned all three in favor of other programming languages.

What that paradigm includes is getting a basic app running and doing stuff, then adding on, and adding on, and adding on--a process generally called "spaghetti code." That type of sloppy coding and minimal planning is what Microsoft caters to--it enables them to sell a lot of software.

What is lacking is design, or planning. That is a different way of thinking about programming--a different paradigm--rather than starting with something that does some trivial thing and adding more and more and more to it until it is almost dysfunctional. Not non-functional, dysfunctional.

On the other hand, even a concept as old and clunky as object orientation is way ahead of the "Access/Excel paradigm of duct tape, hose clamps, superglue, and spaghetti code." It makes changes simple, because the "things"--the objects--are separated from the "stuff the things do or have done to them"--the functions and subroutines.

Why is that important and what does it have to do with Access databases or Excel spreadsheets? Because both are best viewed as storage areas for data, not jumbled together with the processes that a separate component should do. And because the display functions should be separated from both. It is that hopelessly cluttered mess of spaghetti code generated by old-time "programmers" and Excel and Access users playing at being programmers (because they could write a few simple instructions in SQL or VBA) that was the driving force behind object-oriented analysis and design (OOAD) and object-oriented programming (OOP).
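A minimal Python sketch of that separation, with invented field names and an invented weighting, purely to show the three layers (data, processing, display) living apart:

```python
from dataclasses import dataclass

@dataclass
class Starter:                 # data layer: just holds values
    name: str
    speed_fig: int
    class_rating: int

def rate(s: Starter) -> float:  # processing layer: a pure function of the data
    # The 70/30 weighting is arbitrary, for illustration only.
    return 0.7 * s.speed_fig + 0.3 * s.class_rating

def show(field):                # display layer: knows nothing about the math
    for s in sorted(field, key=rate, reverse=True):
        print(f"{s.name:12s} {rate(s):6.1f}")

field = [Starter("Alydar", 92, 88), Starter("Affirmed", 95, 90)]
show(field)
```

Because the layers are separate, changing the rating formula touches only rate(), and changing the display touches only show(); neither change risks corrupting the stored data.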

DeltaLover
02-08-2013, 06:59 PM
To extend your comments a bit, the specific paradigm of programming developed by Excel and Access users is the root of the problem, not deficiencies in the applications. They can do a lot of things, but so can GW BASIC (if you can still find it). That does not say that GW BASIC (or Excel or Access) are the best choice for race analysis applications. Unfortunately, I have tried, worked with, become frustrated with, and abandoned all three in favor of other programming languages.

What that paradigm includes is getting a basic app running and doing stuff, then adding on, and adding on, and adding on--a process generally called "spaghetti code." That type of sloppy coding and minimal planning is what Microsoft caters to--it enables them to sell a lot of software.

What is lacking is design, or planning. That is a different way of thinking about programming--a different paradigm--rather than starting with something that does some trivial thing and adding more and more and more to it until it is almost dysfunctional. Not non-functional, dysfunctional.

On the other hand, even a concept as old and clunky as object orientation is way ahead of the "Access/Excel paradigm of duct tape, hose clamps, superglue, and spaghetti code." It makes changes simple, because the "things"--the objects--are separated from the "stuff the things do or have done to them"--the functions and subroutines.

Why is that important and what does it have to do with Access databases or Excel spreadsheets? Because both are best viewed as storage areas for data, not jumbled together with the processes that a separate component should do. And because the display functions should be separated from both. It is that hopelessly cluttered mess of spaghetti code generated by old-time "programmers" and Excel and Access users playing at being programmers (because they could write a few simple instructions in SQL or VBA) that was the driving force behind object-oriented analysis and design (OOAD) and object-oriented programming (OOP).

:ThmbUp: :ThmbUp: :ThmbUp:

traynor
02-08-2013, 07:01 PM
Headhawg,

I'm saying there is a place for each. Most handicappers would be better served with the Excel and Access combination, in that order. Let the necessity of managing the data be the reason to go to a database. Of course, it's a whole other world if one wants to buy their way into a database instead of developing their own methodology.

I think most handicappers would be better served by tossing both Excel and Access and saving their data in plain text or XML-format files. Both are faster, easier, and simpler to use than either a database or spreadsheet. Whatever analysis one wants to do can be done quickly and easily in almost any programming language, VB, Python, Ruby, whatever. Displays are much simpler, and much easier to customize.

Bluntly, having used all three extensively, I cannot imagine why anyone handicapping horse races would want to use Excel or Access as anything more than a glorified cardboard box to hold raw data, with manipulation of that data handled by an app in a regular programming language. I suppose one could put wings and jet engines on a Cadillac Escalade and make it fly. That does not mean it is a good idea.
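A sketch of the plain-text approach traynor describes, assuming a hypothetical CSV layout; the same few lines work on a real file by swapping the StringIO for open("races.csv"):

```python
import csv
import io

# Raw data lives in a plain text (CSV) file; all manipulation
# happens in ordinary code. The columns here are invented.
raw = """track,race,horse,odds,finish
GP,1,Alpha,3.5,1
GP,1,Bravo,8.0,4
GP,2,Charlie,2.1,2
"""

winners = []
with io.StringIO(raw) as f:   # stand-in for open("races.csv")
    for row in csv.DictReader(f):
        if row["finish"] == "1":
            winners.append((row["horse"], float(row["odds"])))

print(winners)  # [('Alpha', 3.5)]
```

Any filter, ranking, or report is then just more ordinary code over the same rows, with no database or spreadsheet in the loop.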

vegasone
02-08-2013, 07:25 PM
Just because you can't do it doesn't mean everyone else is in the same boat. It doesn't take much to write code to get any kind of info, calculations, etc. done using any kind of data. If I want a new report or calculation, it gets done using prior code as a basis. To each his own. If I want to see results in Excel format, I write the results to an Excel spreadsheet, etc. Imagination, knowledge, and research go a long way toward solving problems. Saying my way is the only way to do something doesn't fly in the real world. There are lots of ways to get things done.

And it doesn't take months or years. Hours, maybe days, depending on how industrious I feel and how complicated the problem is. It usually takes longer to think through and research the problem than it does to solve it. My way works for me, your way for you.

Besides, the coding to solve any problem is similar no matter where the underlying data is, assuming you are using the same language.

traynor
02-08-2013, 08:08 PM
Just because you can't do it doesn't mean everyone else is in the same boat. It doesn't take much to write code to get any kind of info, calculations, etc. done using any kind of data. If I want a new report or calculation, it gets done using prior code as a basis. To each his own. If I want to see results in Excel format, I write the results to an Excel spreadsheet, etc. Imagination, knowledge, and research go a long way toward solving problems. Saying my way is the only way to do something doesn't fly in the real world. There are lots of ways to get things done.

And it doesn't take months or years. Hours, maybe days, depending on how industrious I feel and how complicated the problem is. It usually takes longer to think through and research the problem than it does to solve it. My way works for me, your way for you.

Besides, the coding to solve any problem is similar no matter where the underlying data is, assuming you are using the same language.

Whoever said I "can't do it"? If someone is just starting out, and using Access for raw data storage (or using Excel for raw data storage), I think that person can accomplish far more by using the Access or Excel for the raw data storage, and building a simple app in VB to manipulate that raw data, rather than trying to do the same thing in Access or Excel. YMMV.

raybo
02-08-2013, 10:07 PM
Whoever said I "can't do it"? If someone is just starting out, and using Access for raw data storage (or using Excel for raw data storage), I think that person can accomplish far more by using the Access or Excel for the raw data storage, and building a simple app in VB to manipulate that raw data, rather than trying to do the same thing in Access or Excel. YMMV.

I agree, if you're storing lots of data. However, not all of us using a database, in either Excel or a database app, are storing lots of data. Yes, if I wanted to store more than one track and more than 100 or so cards and results, your way would of course be the better choice. But I don't include more than one track in the same Excel workbook, and the database in that workbook seldom exceeds 30 cards and results at any one time. Plus, I started using Excel with only horse racing in mind, learning to do what I do mostly by myself, with a little help on the VBA code that I can't simply record and modify. I never used it in a business or educational environment, so what I do in Excel, and how I do it, is probably not even close to what someone with a business or educational Excel background would consider. You'd probably laugh at what I do in Excel and the ways I do it, but it's based strictly on racing, with no previous habits to forget.

MightBeSosa
02-12-2013, 09:25 PM
...but you can't do 600 searches in Excel on a race card for 120 horses. ..

Say what?

raybo
02-12-2013, 10:16 PM
That one got me too. 600 searches for what? And if he's talking about queries, why would you need 600 for a single card? You might have hundreds of models set up in Excel, based on research, done in Excel, on 1000s of races. I think hcap mentioned to me once that he was databasing thousands of races in Excel, and using VBA, within Excel, to query that database. I don't remember his ever saying he had any problems doing that. I know that in AllDataBase we can query about 60 individual factors, and any combination of those factors, for as many races as there are rows in Excel. Then we can create models from that research and apply them in daily play, getting selections from them. No other app is needed.

hcap
02-13-2013, 08:42 AM
That one got me too. 600 searches for what? And if he's talking about queries, why would you need 600 for a single card? You might have hundreds of models set up in Excel, based on research, done in Excel, on 1000s of races. I think hcap mentioned to me once that he was databasing thousands of races in Excel, and using VBA, within Excel, to query that database. I don't remember his ever saying he had any problems doing that. I know that in AllDataBase we can query about 60 individual factors, and any combination of those factors, for as many races as there are rows in Excel. Then we can create models from that research and apply them in daily play, getting selections from them. No other app is needed.

Yes, using the built-in "advanced filter", I have created workbooks with 64,000 runners (divide by 8, or approximately 8,000 races), and I can query those runners with any combination of 100 or more individual factors. I have set up my own UI in Excel which makes going from one query to another easy; no coding required. On my older Win 7 machine a query takes approximately 8-10 seconds; on my newer i7 machine, much faster. The advanced filter does flat-file queries, but pre-processing first in another workbook using pivot tables produces aggregates of, say, all trainers, jockeys, breeders, etc. at ONE time, and those aggregate totals can be imported into the flat-file advanced-filter workbook and combined with the advanced filter to generate both flat and relational data queries. I have also done pivot tables based on rankings of any factor, generally 10 deep in rank. Coding in VBA is not as fast as using both the pivot table and advanced filter "built-ins". I believe Excel and most of Office are written in C++, so using VBA to hook into and manipulate these built-in applications is generally the way to go. Although there are some routines that may be competitive in pure VBA, like some sorting code, I would still use Excel's built-in sorting feature for simplicity.

I generally use one track at a time, and although some year-round tracks may provide a few years' worth of races to model on, I will not create models from old data. I have experimented with using recent races and find that short-term trends at particular tracks are very useful. Ray generally uses fewer cards than I do, but at some tracks trends can change in a few weeks, and finding the balance between fresh data and normal variance is the key.

BTW, there are substantial differences between the older versions of Excel and the newer large-workbook versions. Sometimes Excel 2002 and 2003 are superior to 2007 and 2010. For instance, the advanced filter will bog down in the later versions, but the pivot tables calculate quicker.

A simple flat-file database, "AllDataBase", which I created with Ray, is available for free download on Ray's site. Some modeling features are included. It works much better in 2002 and 2003.
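For readers without Excel handy, the pattern hcap describes (aggregates precomputed "pivot table" style, joined back onto the flat file, then filtered) can be mimicked in a few lines of Python; the column names and numbers here are invented:

```python
from collections import defaultdict

# Flat file of runners, one dict per row.
runners = [
    {"horse": "A", "trainer": "Smith", "speed": 90, "win": 1},
    {"horse": "B", "trainer": "Smith", "speed": 84, "win": 0},
    {"horse": "C", "trainer": "Jones", "speed": 88, "win": 0},
]

# "Pivot" step: trainer win percentage across the whole table.
totals = defaultdict(lambda: [0, 0])        # trainer -> [wins, starts]
for r in runners:
    totals[r["trainer"]][0] += r["win"]
    totals[r["trainer"]][1] += 1
for r in runners:                           # join aggregates back onto the flat file
    wins, starts = totals[r["trainer"]]
    r["trainer_win_pct"] = 100.0 * wins / starts

# "Advanced filter" step: a raw field AND an aggregate field in one criterion.
hits = [r["horse"] for r in runners
        if r["speed"] >= 85 and r["trainer_win_pct"] >= 50]
print(hits)  # ['A']
```

The join of precomputed aggregates onto flat rows is what lets a single pass answer questions that would otherwise need a relational query.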

BIG49010
02-14-2013, 10:58 AM
Say what?

600+ queries for a single horse is what I was referring to. Just look at a single item like finish last race: use where the horse finished last race, and look at how the trainer has done with that particular type of horse.

FLR (finish last race): 1st, 2nd, 3rd, Good (lost by less than 3 lengths), Out of the Money, and Bad.

1. Total Runners for this trainer
2. Wins
3. Places
4. Ave. Odds

Now you look at the current year, the previous 4 years, and the total for the last 5 years or more for some items.

Now perhaps you look at it and ask whether this was off a layoff or part of a current form cycle, so you double the queries for this particular angle. If my math is correct, you're looking at roughly 250 queries for this particular handicapping angle.

All I am trying to say is that it's tough to build all this logic into Excel; if it isn't, then perhaps I should change all my programming over.
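The multiplication in that breakdown can be made concrete with a few lines of Python. The category lists follow the post above, but the grouping of the time windows is one plausible reading, so the product here (144) lands near rather than exactly on the ~250 estimate:

```python
from itertools import product

# Category lists from the post; grouping of time windows is an assumption.
flr_groups   = ["1st", "2nd", "3rd", "good", "out_of_money", "bad"]
stats        = ["starters", "wins", "places", "avg_odds"]
time_windows = ["current_year", "prev_4_years", "last_5_plus"]
form_splits  = ["off_layoff", "current_form"]

# Every combination is one query variant for this single angle.
variants = list(product(flr_groups, stats, time_windows, form_splits))
print(len(variants))  # 6 * 4 * 3 * 2 = 144
```

Splitting the windows more finely (say, each of the previous 4 years separately) pushes the count toward the 250 figure, and a handful of such angles quickly reaches the 600+ queries per horse being discussed.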

MightBeSosa
02-14-2013, 11:45 AM
There's nothing that can't be done effectively and fast in Excel, given a competent programmer.

raybo
02-14-2013, 11:48 AM
While it might be easier to do such a large number of queries in a DB app, if you already know how to write your queries, I doubt Harry would have much trouble doing the same thing in Excel. My point is that if you already know how to create a racing database, which is a monumental task in the first place for non-DB professionals or semi-experts, then use a DB; but if you are like most of us, with average technical computer and Excel skills, Excel is much easier to build with and use, and can accomplish almost anything you would want to do.

So, if you're thinking about creating a database app, either in a DB or in Excel, decide before you start which would be more appropriate for you.

That being said, I wish I could wrap my head around creating a true DB, I've been trying for years and have failed every time.

DeltaLover
02-14-2013, 12:47 PM
Ray,

creating a horse racing DB should not be a monumental task. Actually it is a very straightforward process that does not present any special challenges beyond common things like data cleaning and some decisions about normalization.

I like your excel approach and your enthusiastic attitude about its capabilities. I also believe you have done a very good job pushing excel to its limits.

One thing you have to view with a little more flexibility, though, is that technology moves very fast, and what today is considered state of the art will very soon be replaced by something better. This process never ends in the computing world, and failure to follow the evolution of the industry results in obsolete frameworks very quickly.

Under today's landscape we have a very wide spectrum of technologies to choose from when it comes to data storage and retrieval, most of them light-years ahead of the stars of the past. VBA, Access, and Paradox, to name a few, are proprietary solutions from the previous century that have already completed their life cycle and do not have much more to offer compared against open-source technologies like MySQL, MongoDB, SQLite, Python, etc. Excel remains a great tool when it comes to number crunching and visualization, but by no means can it be seen as a competitive back-end solution. This does not mean that you cannot use it as such; it just means that there are much more suitable options to choose from.

When deciding about technology directions, always remember that you never know all your options, and most likely the ones you are aware of are already stale and obsolete.

BIG49010
02-14-2013, 02:33 PM
That being said, I wish I could wrap my head around creating a true DB, I've been trying for years and have failed every time.

Two things I would recommend you think about, if you're going to approach a database:

1. The items in the database need to be pulled from data available in charts or DRF files; you really don't want to be entering data manually.
2. When you come up with the list of items you want to put in the database, review it a few times before you actually start. Once you start feeding data into it, it's tough to change after you've compiled a few weeks, months, or years of data.

Delta is correct, there are many great new solutions for the young at heart, but after you've done this for a few years and are successful, you find it difficult to drop everything and start over.

raybo
02-14-2013, 05:24 PM
Two things I would recommend you think about, if you're going to approach a database:

1. The items in the database need to be pulled from data available in charts or DRF files; you really don't want to be entering data manually.
2. When you come up with the list of items you want to put in the database, review it a few times before you actually start. Once you start feeding data into it, it's tough to change after you've compiled a few weeks, months, or years of data.

Delta is correct, there are many great new solutions for the young at heart, but after you've done this for a few years and are successful, you find it difficult to drop everything and start over.

I realize what you're saying. I have created the tables several times, but I can't get any further. It's the querying part that I can't get into my head, just as with the object portion of VB, for example. Sure, I can stumble along in VBA, but that's because I already have example code for most of what I need to do. Writing something different, from scratch? Nope, gotta get help from the gurus.

hcap
02-14-2013, 06:33 PM
Not to change the subject too much, but I think the real problem is: how do we find predictive factors, or predictive compound factors? After many years of fooling around with analyzing old data and then going forward, I have been frustrated by what looked like dynamite. I have found that one problem is the subtleties that arise from back fitting. Even though, by statistical tests, certain queries using past data should project forward, they usually fail to do so.

Of course the final test is your bankroll.

To avoid the main problems associated with back fitting, what I have done is simulate betting in real time. I break a few years at one track into numerous segments, selectable by weeks, months, or anything in between: 10 days to 100 days. My program builds a model on the first segment, handicaps the day following that segment, and records the results. It then updates the model by dropping the first day in the model and adding the data for the day just handicapped. It repeats the process for all days in the data table and provides total results at the end of the run. My model is constantly changing to reflect fresh data, and it simulates what I would bet based on this dynamic model over the course of the entire data table, sometimes a year, sometimes more. I do this all in Excel, using the database built-ins and VBA to sequence them together. Generally, each new day handicapped with the updated model takes a few seconds, so running a year's worth of races may take under an hour.
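The sliding-window procedure described above is a walk-forward simulation, and its skeleton is compact enough to sketch. This is not hcap's actual Excel/VBA program; it is a minimal Python illustration in which the "model" is deliberately trivial (a rolling mean of one hypothetical factor) so the windowing logic stands out.

```python
def walk_forward(daily_data, window):
    """Walk-forward sketch: fit on `window` past days, bet the next day,
    then slide the window forward one day and repeat.

    daily_data: list of (factor_value, won) pairs, one per day.
    Returns (winning_bets, total_bets)."""
    hits = total = 0
    for i in range(window, len(daily_data)):
        train = daily_data[i - window:i]                   # model window
        threshold = sum(d[0] for d in train) / window      # "fit" the model
        factor, won = daily_data[i]                        # next day's race
        if factor > threshold:                             # simulated bet
            total += 1
            hits += won
    return hits, total

# Toy data: a hypothetical factor value and a 0/1 win flag per day.
days = [(50, 0), (60, 1), (55, 0), (70, 1), (65, 0), (80, 1)]
print(walk_forward(days, window=3))
```

The key property, as the post says, is that each day is handicapped only with data that preceded it, and the model is refreshed before every new day, which is much closer to real betting than fitting one static model to the whole file.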

BTW, Ray does something similar with his RS program.

Of course, I am still back fitting to a certain extent, since at initial setup I choose the spectrum of factors and how to judge which should be included in the model. But after setup, most of the grunt work, and the determination of which of those hundred or so factors to use, is done for me.

I find it more predictive than the usual statistical testing with relatively fixed models. Of course, it is helpful to create decent factors that outperform the publicly available ones.