PDA

View Full Version : What happened?


raybo
01-28-2007, 10:08 PM
I was searching the archives and ran across this thread. Was wondering what happened, if anything, to this proposed project. Sounded very interesting and I wish I had been around the forum at the time as I would have loved to participate in any role but primarily on the handicapping side.

COUGAR
Off The Pace

Join Date: Oct 2003
Location: Houston
Posts: 42
vCash: 3000

Lets design a program
So many people on this board talk about so many different programs, and it seems nobody is exactly happy with what they have ("if it just had a tweak here, a tweak there"), well, you know what I mean. Personally, I don't think it matters much; numbers are all you need. The main thing is how you push the bets through the windows: discipline and patience. There are two programs I use and they work for me, but IF I could have the perfect program, I would design it myself, as I'm sure anyone would. But time and know-how are a big lacking gap in my brain (where is that "on" switch?). Well, I know there are at least four designers on this site, and I thought it would be fun for us average Joes to create OUR PERFECT PROGRAM. Maybe we can get someone to design it for us and we can all pool the cost. Is that possible? Hmmm...

I will start and just be brief. I would just like the basic stuff on one screen, 20 factors or so (pace, speed, class, velocity, feet per second, etc.), so I can see it well and print it out on one sheet. The main thing I would like is a SUPER DATABASE: the kind that would find all the best combinations of winning factors. Let's say you had 30 factors to go through. The engine would test every factor one by one, and every combination up to all 30, to find the best results for ANY CLASSIFICATION you want, not only for the top horse but maybe up to 5 horses deep in each factor. It would seek exactas, trifectas, and superfectas, along with pick 3s, finding the best combinations of factors and the best ROI...

Anyone have any ideas????


COUGAR

Greyfox
01-29-2007, 12:03 AM
I will start and just be brief. I would just like the basic stuff on one screen, 20 factors or so... The main thing I would like is a SUPER DATABASE. Anyone have any ideas????

Yeah. You've got too many factors. Obviously God could deal with that many; you and I can't.
I was watching a race today at Golden Gate where a spill took place. They would not replay it. You could have 35 factors and that wouldn't account for that.
The bottom line is: yes, probably everything, including the crows on the track, has an impact on a horse race. But if you want to make money on the "basic stuff," 20 factors won't help you in this game.

raybo
01-29-2007, 12:14 AM
Yeah. You've got too many factors... 20 factors won't help you in this game.

I agree, but the idea of a collaborative project is what interested me about the thread. I assume either: 1. It died on the vine without any progress at all. 2. It went forward, and after no one could take the lead well enough to enable productive decisions, it died on the vine. 3. It went forward, the result was disastrous, and it died. 4. It succeeded, and now none of them are saying a word because they're getting rich.

I favor #1, but who knows?

tupper
01-29-2007, 12:41 AM
I intend to start a database that involves a lot of fields. I can't deal with 30 factors, but the database could combine many factors into a small, human-readable spreadsheet or chart.

The database will probably use the regular DRF files (which probably lack info on a horse's reaction to crows :)). The data will be pasted into a spreadsheet, and I imagine that the formulas and factoring will evolve over time. I will probably pay someone to get this database started.

To use the DRF files, I think I need to make a corresponding csv file that lists the field names and column formatting shown in this Bris DRF data structure guide: http://www.brisnet.com/cgi-bin/static.cgi?page=drfsff

If anyone can provide a csv file with these DRF field names and formatting info, please post it. Otherwise, I guess it will take a couple of hours to make one manually (which I will post here).
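For what it's worth, here is a minimal sketch (Python) of how such a field-name file could be put to work once it exists: pair each field name with the corresponding comma-delimited value from a DRF row. The field names below are made up for illustration; the real Bris structure guide lists far more fields, and a full spec list would be built the same way, just longer.

```python
import csv
import io

# Hypothetical subset of the Bris single-file DRF field spec (illustrative only;
# the real data structure guide defines the actual names and order).
FIELD_SPEC = ["track", "race_date", "race_number", "program_number", "horse_name"]

def label_row(raw_csv_line, spec=FIELD_SPEC):
    """Pair the first len(spec) comma-delimited values with their field names."""
    values = next(csv.reader(io.StringIO(raw_csv_line)))
    return dict(zip(spec, values))

row = label_row('"SUF","20061111","10","12","SOME HORSE"')
# row["track"] -> "SUF", row["horse_name"] -> "SOME HORSE"
```

With the full spec list in place of `FIELD_SPEC`, every column in a DRF file becomes addressable by name instead of by position.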

Any suggestions or collaboration would be welcome.

Thanks,
-Dale

raybo
01-29-2007, 01:46 AM
I intend to start a database that involves a lot of fields... If anyone can provide a csv file with these DRF field names and formatting info, please post it.


Are you talking about making a DB table with all those field names? Or are you talking about making a control file to be used with a parser like "Infotran"? If it's the latter, I can send you a control file that has most of the fields in it, the ones I feel are important for the information I want to use.

If the former, I'm DB dumb, but wish I wasn't. I've been using Excel for years and am very proficient with it. I have been wanting to create my own DB but can't get past creating a couple of tables and importing some data into them. I've read some DB theory on relational databases, but it's pretty much Greek to me. That was why I was interested in the post in the first place: I was searching the archives for info on basic DB construction when I ran across it.

tupper
01-29-2007, 02:38 AM
Are you talking about making a DB table with all those field names? Or are you talking about making a control file to be used with a parser like "Infotran"?

I want in my database all of the data from every DRF file that I purchase, to retain the possibility of using the data for future queries/research. So, I think that every DRF field name and every DRF field property must be entered into the database. If I start with one DRF file that has this field info in the top rows, I think it would be easier to create the database.

By the way, I use Linux, so INFOTRAN is not a great option for me. I want something universal -- a cross-platform database that can funnel figures into any spreadsheet, so anybody can use the system on any OS.


If it's the latter, I can send you a control file that has most of the fields in it, the ones I feel are important for what information I want to use.

I'm not sure if your control file is what I seek, but I would love to see it, regardless.

If the former, I'm DB dumb, but wish I wasn't... I was searching the archives for info on basic DB construction when I ran across it.

I'm "DB dumb," too -- I can't even import (nor create a table from) the DRF files, which is why I will probably pay someone to get me started. I just need the framework right now.

K9Pup
01-29-2007, 08:27 AM
I intend to start a database that involves a lot of fields... Any suggestions or collaboration would be welcome.

Well, I'm "horse dumb," but I can help a little here. I am most familiar with MS products like Excel and Access, but the principles apply to almost all DBs.

Those files ARE CSV files. Problem is they have more columns than most standard software can EASILY handle at once. I think the INFOTRAN software is just a tool to help extract the COLUMNS (fields) you want to use.

The data in the form presented is not really "database friendly"; it isn't truly in relational format, something databases like to see. Quickly looking at the file description, I see several "sections" of data: today's race data, today's horse/trainer/jockey/owner data, horse history data, horse lifetime records, workout data, past-performance data, etc.

In a database you would probably want to make each of these sections a separate table (file) in the database. Then to produce reports you would write queries to combine data from 2 or more of these tables.
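A rough sketch of that table-per-section idea, using Python's built-in sqlite3 in place of Access (the table and column names here are invented for illustration, not the actual DRF layout):

```python
import sqlite3

# Two toy "sections" of DRF data as separate tables, joined by a shared race key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE races  (race_id TEXT PRIMARY KEY, track TEXT, race_date TEXT, race_no INTEGER);
CREATE TABLE horses (race_id TEXT, program_no TEXT, horse_name TEXT, trainer TEXT);
""")
conn.execute("INSERT INTO races  VALUES ('20061111SUF10', 'SUF', '20061111', 10)")
conn.execute("INSERT INTO horses VALUES ('20061111SUF10', '12', 'SOME HORSE', 'J SMITH')")

# A "report" query recombining data from the two tables.
row = conn.execute("""
    SELECT h.horse_name, h.trainer, r.track
    FROM horses h JOIN races r ON h.race_id = r.race_id
""").fetchone()
# row -> ('SOME HORSE', 'J SMITH', 'SUF')
```

The same CREATE/INSERT/JOIN pattern carries over to Access, MySQL, or PostgreSQL; only the import tooling differs.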

To get the data into the database you would probably need to have SOME knowledge of VBA coding OR use the INFOTRAN tool to create input FILES that correspond to each of the DB tables you want to create/maintain. The imported data could be appended to the existing tables in order to build historic data.

As for a joint effort, I can tell you from past experience it is a hard thing to do. Getting two or more handicappers to agree on how to do things can be a problem. Making the software flexible can help, but making software flexible for the user means making it more complicated for the programmer.

hcap
01-29-2007, 09:28 AM
To make a relational database with a table for each category, as K9 points out, requires at least one "key" field. For instance, if you want to store a horse's running line and then connect it to the trainer, you would build a key like "20061111SUF1012": date year 2006, date month and day 1111, track SUF, race #10, program #12.

The running-line table would carry that piece of data, and so would the trainer table and the other tables, for linking those tables in queries.

I do have an Excel program that imports all fields, so you don't have to use Infotran. It works in batch mode; you can import six months' worth of races and results at a time.

At the moment it outputs one table of info and compound factors on one sheet.
I could modify it to create 4 or 5 tables on separate sheets, and Access could then import each sheet into a corresponding table.

But be aware that after all this, you would still have to know what you are doing in Access when writing multi-table queries.
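The composite-key idea can be sketched in a few lines of Python (the record contents here are invented; the two dicts stand in for two tables sharing the key, which is what a multi-table query does behind the scenes):

```python
def race_key(year, month_day, track, race_no, program_no):
    """Build a composite key like '20061111SUF1012' from its parts."""
    return f"{year}{month_day}{track}{race_no}{program_no}"

# Two "tables" (here just dicts keyed by the composite key) with toy contents.
running_lines = {"20061111SUF1012": {"final_time": 71.2}}
trainers      = {"20061111SUF1012": {"trainer": "J SMITH"}}

k = race_key("2006", "1111", "SUF", "10", "12")   # -> "20061111SUF1012"
combined = {**running_lines[k], **trainers[k]}     # "join" via the shared key
```

In a real database the key would be a column present in every table, and the lookup would be a JOIN condition instead of a dict access.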

raybo
01-29-2007, 09:47 AM
I want in my database all of the data from every DRF file that I purchase... I just need the framework right now.


As K9Pup suggests, having header names for every field in a Bris DRF file would be difficult, if not impossible, at least in Access. Much of the data in the Bris files, in my opinion, is not needed for handicapping. My control file extracts 50 columns and includes 13 rows under those columns for each horse in each race on a race card: the last 10 rows are the horse's last 10 races, and the first 3 hold race data and workouts. The control file I allude to is what tells Infotran which data you want to extract from the comma-delimited file, Bris' DRF in this case.

If you were to create, say, 4 or 5 sheets in an Excel workbook and then "draw" into each one the data you want for each corresponding table in Access, or whatever DB you are using, then you could import each of the sheets into the DB, thus separating the data into race-specific data, horse-specific data, results data, trainer/jockey data, etc. I am considering this approach myself but will have to grind my way through the DB portion. As I said, I'm a whiz at Excel, including macros and automation, but the Access portion is foreign to me. If you want to see my control file, send me your email or tell me how to put an attachment into a private message here on the forum.

K9Pup
01-29-2007, 10:41 AM
As I said I'm a whiz at Excel, including macros and automation, but the Access portion is foreign to me. If you want to see my control file send me your email or tell me how to put an attachment into a private message here on the forum.


Raybo, based on what I've seen of your Excel code, getting into a DB won't be a big problem for you. If you need some help getting started, contact me.

hcap
01-29-2007, 10:45 AM
Although you may feel that only running lines, workouts, and a few other factors are the way to go now, you may not feel the same after feeding your database for six months. Limiting yourself now to the fields you think are pertinent makes it really tough later on to change your database. Probably better to go for more than less.

Recently I decided to look at sires, dams, and breeders. They are listed in the data files for each horse, along with 32 columns of trainer moves. That changed my mind about how important some stuff is.

Access is limited to under 256 columns per table. It would require six tables to store all the info in the data files without even breaking it down into logical categories; more if you do break it down into logical categories.

extracts 50 columns and includes 13 rows under those columns for each horse in each race on a race card. The last 10 rows being the last 10 races the horse ran, the first 3 include race data, and workouts.

To import into a database, it's best to have each field in a separate column. If workout fields are listed below running-line fields, it is better to increase the number of columns instead. And you would need a separate table for works, since there is not a one-to-one correspondence between the number of running lines and the number of works. I know this layout works in Excel, but databases don't like this format.
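A toy illustration of that normalization step, assuming a made-up wide-row layout: one wide spreadsheet row for a single horse gets split into two keyed record lists, one per running line and one per workout, so the two tables can grow independently even though their counts don't match.

```python
# One wide row for a single horse, with running-line slots and a different
# number of workout slots packed side by side (field names are invented).
wide_row = {
    "horse_key": "20061111SUF1012",
    "line_speed_1": 82, "line_speed_2": 79,
    "work_dist_1": "4f", "work_dist_2": "5f", "work_dist_3": "4f",
}

def normalize(row):
    """Split one wide row into keyed record lists: running lines and workouts."""
    key = row["horse_key"]
    lines = [{"horse_key": key, "line_no": i, "speed": row[f"line_speed_{i}"]}
             for i in range(1, 11) if f"line_speed_{i}" in row]
    works = [{"horse_key": key, "work_no": i, "dist": row[f"work_dist_{i}"]}
             for i in range(1, 13) if f"work_dist_{i}" in row]
    return lines, works

lines, works = normalize(wide_row)
# 2 running-line records and 3 workout records, all carrying the shared horse_key
```

Each list then maps cleanly onto its own database table, linked back to the horse through the key field.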

raybo
01-29-2007, 12:03 PM
Although you may feel that only running lines and workouts and a few other factors are the way to go now... I know this works in Excel, but databases don't like this format.


So, basically, what we would have to do is get the data into Excel, into 4 or 5 different sheets by category, and then import it into Access, into 4 or 5 corresponding tables?

If so, Bris' DRF can be bought in a single-file or a multi-file format, which means I could write a different control file for the multi-file format and have the data sent to sheets named: 1. Race; 2. Trainer/Jockey/Owner [current-race data for trainer/jockey/owner/horse (pedigree, starts vs. WPS, trainer/jockey starts vs. WPS%, horse's workouts, etc.)]; 3. Past Performances; 4. Horse/Trainer/Jockey Key Stats Categories.
(The multi-file format has 4 files in it.)

raybo
01-29-2007, 12:10 PM

Of course, then we would have to have another sheet for the results files, and in my case, I would want to have my numbers and rankings in the database for testing purposes (track to track, distance to distance, surface to surface, adjustments for each regarding the algorithms I use for my numbers, etc.).

tupper
01-29-2007, 12:28 PM
So, basically, what we would have to do is get the data into Excel, into 4 or 5 different sheets by category, and then import into Access, into 4 or 5 different corresponding tables?

I run Linux. Excel and Access are no good for me.

I would use free, open-source, cross-platform programs that anyone can use, like OpenOffice Base/Calc, MySQL, PostgreSQL, mysqladmin, Gnumeric, etc.

raybo
01-29-2007, 12:36 PM
I run Linux. Excel and Access are no good for me.

I would use free, open-source, cross-platform programs that anyone can use, like OpenOffice Base/Calc, MySQL, PostgreSQL, mysqladmin, Gnumeric, etc.

That part doesn't bother me; the main thing is getting the data into a database in the correct way and then proceeding from there. I'm sure any good database program would suffice. Would another spreadsheet program be needed to do the import from, or could it all be done within the database?

hcap
01-29-2007, 12:50 PM
tupper, sorry I can't help out in Linux.

raybo,

Yes, correct. The only difference is I use my Excel importer and the single TSN or Bris file instead of Infotran. K9Pup uses VB code. But any one of us would have to spend a considerable amount of time setting this up.

But putting stuff in separate sheets, all nicely lined up in separate columns, allows easy import. You can have Excel save each table from each sheet as a separate Excel file and export them to one folder. Call one Raceinfo.xl, another Runnerinfo.xl, trainer_jockeyinfo.xl, etc. You can tell Excel to replace the older file in that folder with the same name with the new one.

Access has a macro section that allows automated imports of spreadsheet tables. You can tell it which folder and the name of the file to import. You can either run each macro that imports each specific file separately, or you can loop it all together.

Of course, then we would have to have another sheet for the results files...

Results would be imported into a different table and linked again with each runner's key identifier fields. I import results at the same time I import data file info. I do rankings in Excel. There are guys who know how to do it in Access, however.

Some thought is really needed when you set up a relational database. How you structure it now seriously affects what you can do with it down the line. Like you, I am more skilled with Excel than Access. However, I know enough to know you are in a different area: what is visually apparent in Excel is done very differently in a relational database.
Major project tho', no matter how you do it.
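A rough sketch of the split-by-category step in Python (standing in for the Excel macro; the column groupings below are invented, and a real control file would map the full Bris layout):

```python
import csv
import io

# Hypothetical column groupings, one per output file/table.
CATEGORIES = {
    "raceinfo":   ["track", "race_date", "race_no"],
    "runnerinfo": ["track", "race_date", "race_no", "program_no", "horse_name"],
}

def category_csv(rows, cols):
    """Render only the chosen columns of each row as one CSV text, header included."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)          # extra keys in each row are ignored
    return buf.getvalue()

rows = [{"track": "SUF", "race_date": "20061111", "race_no": "10",
         "program_no": "12", "horse_name": "SOME HORSE"}]
files = {name: category_csv(rows, cols) for name, cols in CATEGORIES.items()}
# files["raceinfo"] -> "track,race_date,race_no\nSUF,20061111,10\n"
```

Each resulting CSV text would be written to its own file in the import folder, ready for the database's import macro to pick up.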

raybo
01-29-2007, 01:11 PM
Some thought is really needed when you set up a relational database. How you structure it now seriously affects what you do with it down the line... Major project tho' no matter how you do it.

I have read enough about relational DBs to know this statement is correct. This is the reason I haven't created my own DB before now. I have 20+ years of handicapping experience and started using Excel for handicapping in the '90s, self-taught, with many, many hours of trial and error. DBing is a major step for me.

hcap
01-29-2007, 01:31 PM
raybo,

Are you familiar with the Advanced Filter in Excel? It's much better than the AutoFilter, though it requires more setup. I use it to query 65,000 runners, 100+ fields, at once. Although this is using Excel as a "flat" database, it does allow rather quick queries. Pivot tables are also very useful.

Although I have used Access a lot, I still use Excel most of the time.
Recently I found an older program that I set up 4 or 5 years ago that used pivot tables extensively. You need a year's worth of data, tho'. I am looking at it again. It did trainer, jockey, owner, breeder, sire, and dam studies, also in combinations, like trainer-jockey, sire-dam, and trainer-breeder duos. I never really perfected it, but I've got it working anew with some interesting new twists.

That's why I said you never know for sure what you will want to look at.

tupper
01-29-2007, 01:48 PM
...the main thing is getting the data into a database in the correct way and then proceeding from there.
I agree. Starting the database is the hardest part.

It seems that it would be easier to start the database if one began with a file that contains all the DRF column headings and column properties, configured in the same format/order as a DRF file. Import the headings/properties file first -- then the format of the database is set, and is identical to the DRF format.

I'm sure any good database program would suffice. Would another spreadsheet program be needed to do the import from? Or could it all be done within the database?
I don't plan on importing with a spreadsheet program. I think that the most recent version of Excel might be the only such program that allows 1400+ columns. Most of the database people I have talked to have suggested using a script or database administration app to do the importing.

As I understand it, MySQL and PostgreSQL are fairly robust compared to the database cores of Access and OpenOffice Base. I think that Access and OO Base can connect to MySQL/PostgreSQL and provide a front end, which makes it easier to work with the data.

By the way, thank you, Raybo, for resurrecting this thread, and thank you to everyone who is contributing suggestions and comments in this forum. If I am ever able to get this database started, I will happily email (or post here) its basic form to anyone who wants to start their own database.

It would be great if someone could briefly explain the difference between a relational database and a normal database, and the advantages of the former. Touching on object-oriented databases would be welcome, too.

Thanks,
-Dale

raybo
01-29-2007, 01:54 PM
I agree. Starting the database is the hardest part... If I am ever able to get this database started, I will happily email (or post here) its basic form to anyone who wants to start their own database.


No prob. Gotta go for now but will continue this evening. Thanks all for your input!!!

K9Pup
01-29-2007, 03:03 PM
So, basically, what we would have to do is get the data into Excel, into 4 or 5 different sheets by category, and then import into Access, into 4 or 5 different corresponding tables?

You don't really need to move it to Excel first. Access can import (open) CSV files, much as Excel does. So if you can take the DRF data and create control files to produce the 4 files in CSV format, those files can be used to populate tables in Access.

You CAN move things to Excel first, if you desire. Once the tables are defined in Access, you could copy and paste "records" from the Excel sheet directly into Access tables.

K9Pup
01-29-2007, 03:33 PM
It seems that it would be easier to start the database if one began with a file that contains all the DRF column headings and properties, configured in the same format/order as a DRF file...


Most DBs have limits on the number of columns, and the DRF data as-is is not in relational format.


I don't plan on importing with a spreadsheet program. I think that the most recent version of Excel might be the only such program that allows 1400+ columns. Most of the database people I have talked to have suggested using a script or database administration app to do the importing.


I don't know of any version of Excel that can handle 1,400 columns. And there really isn't any reason to pass the data through Excel before sending it to the DB. Most if not all DBs have import functions that allow a TXT (or CSV) file to be imported into DB table(s). Again, I think you will be forced to create multiple tables because of column limits (and the data should be separated anyway).


As I understand it, MySQL and PostgreSQL are fairly robust compared to the database cores of Access and OpenOffice Base. I think that Access and OO Base can connect to MySQL/PostgreSQL and provide a front end, which makes it easier to work with the data.

I know for sure Access can front-end MySQL. I suspect it can handle PostgreSQL and others. The advantage of using Access as the front end is that it has a drag-and-drop tool to build queries; you don't even have to know native SQL. Also, forms, macros, and code can be used to automate things and make them "user-friendly."


It would be great if someone could briefly explain the difference between a relational database and a normal database, and the advantages of the former. Touching on object oriented databases would be welcome, too.

Relational DBs are composed of groups of tables of related data. From the DRF data I can easily see 4 or 5 tables made up of horse data, trainer data, past-performance lines, workouts, etc. Tables are made up of rows and columns; Excel sheets are a form of "relational" tables. In a relational database, data is not replicated in several places. Each "row" in a table represents one "record," and each column represents one field.

hcap
01-29-2007, 03:37 PM
I have to say I never used Infotran; I went directly to Excel. I guess the following concerns affected my decision: How do you deal with importing many data files at one time? If you want to import many files, do you have to type in the name of each, or can it be automated? Do you have to specify each field you want in the control file(s)? And as the number of fields increases, does it bog down?

And for tupper: will it run under Linux?
Then again, does anybody know if MS Office programs will?

tupper
01-29-2007, 05:00 PM
Relational DBs are composed of groups of tables of related data... Each column represent one field.
Thanks for the explanation.

I don't know if I need a relational database. I anticipate starting out like someone who uses Infotran and a spreadsheet: I'll select the fields to use and then apply calculations and formulas to the data to get results.

Later on, I'll want to use the database to test a theory. For instance, I might want to know the win percentage and ROI of 3-year-olds who just came off of a 67-73 day layoff, who are running on even-numbered dates with blinkers, etc. Would one need a relational database to get these results, or can one merely do a complex search of the data?

And for tupper - will it run under Linux?
Then again, anybody know if MS Office programs will?
MS Office programs run in Linux via emulators and VMs (virtual machine software). Some combinations of these emulators/VMs, MS software and hardware can behave a little kludgy. There have been rumors of an MS Office for Linux, now that M$oft is "partnering" with Novell (who controls SUSE Enterprise Linux). A few software companies have claimed to provide exact clones of Excel and Word for Linux.

However, OpenOffice is similar to MS Office in many ways, and MS Office macros can be ported to OO (Visual Basic --> StarBasic). And I would prefer to use native, open-source software rather than proprietary software through an emulator/VM.

K9Pup
01-29-2007, 05:18 PM
Thanks for the explanation.

I don't know if I need a relational database. I anticipate starting out like someone who uses Infotran and a spreadsheet -- I'll select the fields to use and then apply calculations and formulas to the data to get results.

Later on, I'll want to use the database to test a theory. For instance, I might want to know the win percentage and ROI of 3-year-olds who just came off of a 67-73 day layoff, who are running on even-numbered dates with blinkers, etc. Would one need a relational database to get these results, or can one merely do a complex search of the data?
.

If you just used the "flat" file data as-is, you could query it as above. You don't even need a DB for that. Save all the DRF files and write a program to read through them all, using the criteria above to calculate the results. You would just need to know how to do some programming to get there.

If the data was in a DB in relational format, you would have one table that contained all past performance lines for every horse. You would have another table with the results and payouts of races. Using the Access front end, you could easily query the LINES table and select the lines of horses that meet the criteria above. The output of that query would be "joined" to the results table to match up each horse with its result in that race. Summaries could be created to determine Win% and ROIs.

In the first case it would require a decent amount of programming to get the final reports. In the second case the amount of work would be on the front end getting the data into the DB.
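The first (flat-file) approach above might look roughly like this in Python. The file layout, column names, and criteria are placeholders for illustration, not real BRIS/DRF fields:

```python
import csv
import glob

# Sketch of the "flat file" approach: scan saved result files and tally
# win% and $2 ROI for lines matching some criteria. Columns are invented.
def win_pct_and_roi(pattern, matches):
    plays = wins = 0
    returned = 0.0
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if not matches(row):
                    continue
                plays += 1
                if row["finish_pos"] == "1":
                    wins += 1
                    returned += float(row["win_payout"])  # payout per $2 bet
    if plays == 0:
        return 0.0, 0.0
    return 100.0 * wins / plays, returned / (2.0 * plays)

# Hypothetical criteria: 3-year-olds off a 67-73 day layoff.
criteria = lambda r: r["age"] == "3" and 67 <= int(r["days_off"]) <= 73
```

A relational DB replaces the explicit file loop and criteria function with a query, but the tallying logic is the same either way.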

raybo
01-29-2007, 08:32 PM
Most DBs have limits on the number of columns, and the DRF data as-is is not in relational format.


I don't know of any version of Excel that can handle 1400 columns. And there really isn't any reason to pass the data through Excel before sending it to the DB. Most if not all DBs have import functions that allow a TXT (or CSV) type file to be imported into DB table(s). Again, I think you will be forced to create multiple tables because of column limits (the data should be separated anyway).


I know for sure Access can front end MySQL. I suspect it can handle PostgreSQL and others. The advantage of using Access as the front end is that Access contains a drag-and-drop tool to build queries. You don't even have to know native SQL. Also, forms, macros and code can be used to automate things and make them "user-friendly".


Relational DBs are composed of groups of tables of related data. From the DRF data I can easily see 4 or 5 tables made up of horse data, trainer data, past performance lines, workouts, etc. Tables are made up of rows and columns. Excel sheets are a form of "relational" tables. In a relational database, data is not replicated in several places. Each "row" in a table represents one record. Each column represents one field.

It appears that, at least in my case, extracting all the Bris data in single file format into a master spreadsheet, then drawing out the relevant data to 4 or 5 other sheets and pasting them into the tables in Access, would be the way to go. That way I only have to use "Infotran" once, which is a DOS program requiring typing from the keyboard. Once I get the data into each table and assign the correct columns as keys, I should be able to start working on some queries. How does Access handle missing data? Will this screw up the query outputs? If so, how do I handle errors? I imagine this can be pretty tricky.

Anyway, first things first, I need to make the sheets in Excel and the tables in Access, correct?

K9Pup
01-29-2007, 09:46 PM
It appears that, at least in my case, extracting all the Bris data in single file format into a master spreadsheet, then drawing out the relevant data to 4 or 5 other sheets and pasting them into the tables in Access, would be the way to go. That way I only have to use "Infotran" once, which is a DOS program requiring typing from the keyboard. Once I get the data into each table and assign the correct columns as keys, I should be able to start working on some queries. How does Access handle missing data? Will this screw up the query outputs? If so, how do I handle errors? I imagine this can be pretty tricky.

Anyway, first things first, I need to make the sheets in Excel and the tables in Access, correct?

Missing data depends on field definitions. By default, missing data will be just that: null. You also can define default values for fields if that would be better. But since you are good in Excel, I would suggest maybe having your Excel code check field values you might be concerned with and either correct them automatically or point them out to you so they can be manually corrected before you import them to Access.
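As one possible shape for that pre-import check (shown here in Python rather than Excel code; the field names and rules are made up for illustration):

```python
# Sketch of a pre-import validation pass: flag missing or non-numeric
# fields so they can be fixed before the records hit the database.
def check_record(rec, required=("horse", "race_date"), numeric=("odds",)):
    """Return a list of problems; an empty list means the record looks clean."""
    problems = []
    for field in required:
        if not rec.get(field):                 # missing or empty -> flag it
            problems.append(f"missing {field}")
    for field in numeric:
        try:
            float(rec.get(field, ""))
        except ValueError:
            problems.append(f"non-numeric {field}: {rec.get(field)!r}")
    return problems

good = check_record({"horse": "Example", "race_date": "20070131", "odds": "3.5"})
bad = check_record({"horse": "", "race_date": "20070131", "odds": "n/a"})
```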

raybo
01-29-2007, 09:48 PM
Missing data depends on field definitions. By default, missing data will be just that: null. You also can define default values for fields if that would be better. But since you are good in Excel, I would suggest maybe having your Excel code check field values you might be concerned with and either correct them automatically or point them out to you so they can be manually corrected before you import them to Access.

Gotcha, good idea. Think I'm ready to start building tables yet?

burrhog
01-29-2007, 10:11 PM
It sounds like ya'll have figured out that this is a big project. Big.
A couple of Questions:
1. Specifically, what data source will you be using? DRF PP's in the "Single File" format
2. What database management system (DBMS) will you use? Access is not going to cut it. An Access Project can only hold 10,000 records per table. A season in Ruidoso (sixty-two days) produces 38K Past Performance Records. If you eliminate duplication you still have near or over 10K.
3. What source will you use for results files?


It is going to take a full-fledged DBMS like Oracle, MSSQL or MySQL. If you are going to go to the trouble and expense of setting it up, you might as well make it scalable. Even these full DBMSs have limitations on how big a table can be, usually 1024 fields. This is due to the way a DBMS works when querying (for instance, a simple query on a single table of 1000 fields would begin with the DBMS creating a virtual table of every field joined with every other field: 1000x1000. That's a million pigeonholes in memory before anything happens).


Need a robust DBMS. You can get a state of the art, full-blown version of MSSQL 2005 for $50 from Microsoft. It’s a developer edition. This is not the free SQL2005 desktop engine. This even has neural networks and genetic algorithm components built in.


If you are going to go to the trouble of building this thing, I’d strongly recommend this software. If you have done any work in Access you shouldn’t have that many problems jumping to MSSQL. Once you do, you won’t look back.
There are all kinds of free tutorials and help support.

The choice of DBMS will probably be where the project flies apart. It doesn't have to be that way. There are many pre-DBMS decisions that are going to be common no matter what software you use.
Once the DBMS, data source, and purpose are decided it would be time to look at the data with consideration to how it will be used. More later…

raybo
01-29-2007, 10:32 PM
It sounds like ya'll have figured out that this is a big project. Big.
A couple of Questions:
1. Specifically, what data source will you be using? DRF PP's in the "Single File" format
2. What database management system (DBMS) will you use? Access is not going to cut it. An Access Project can only hold 10,000 records per table. A season in Ruidoso (sixty-two days) produces 38K Past Performance Records. If you eliminate duplication you still have near or over 10K.
3. What source will you use for results files?


It is going to take a full-fledged DBMS like Oracle, MSSQL or MySQL. If you are going to go to the trouble and expense of setting it up, you might as well make it scalable. Even these full DBMSs have limitations on how big a table can be, usually 1024 fields. This is due to the way a DBMS works when querying (for instance, a simple query on a single table of 1000 fields would begin with the DBMS creating a virtual table of every field joined with every other field: 1000x1000. That's a million pigeonholes in memory before anything happens).


Need a robust DBMS. You can get a state of the art, full-blown version of MSSQL 2005 for $50 from Microsoft. It’s a developer edition. This is not the free SQL2005 desktop engine. This even has neural networks and genetic algorithm components built in.


If you are going to go to the trouble of building this thing, I’d strongly recommend this software. If you have done any work in Access you shouldn’t have that many problems jumping to MSSQL. Once you do, you won’t look back.
There are all kinds of free tutorials and help support.

The choice of DBMS will probably be where the project flies apart. It doesn't have to be that way. There are many pre-DBMS decisions that are going to be common no matter what software you use.
Once the DBMS, data source, and purpose are decided it would be time to look at the data with consideration to how it will be used. More later…

Thanks for the info burrhog. Where is Cedar Creek, Tx.? Anywhere near Cedar Creek Lake? I live on Cedar Creek Lake.

raybo
01-30-2007, 12:32 AM
It sounds like ya'll have figured out that this is a big project. Big.
A couple of Questions:
1. Specifically, what data source will you be using? DRF PP's in the "Single File" format
2. What database management system (DBMS) will you use? Access is not going to cut it. An Access Project can only hold 10,000 records per table. A season in Ruidoso (sixty-two days) produces 38K Past Performance Records. If you eliminate duplication you still have near or over 10K.
3. What source will you use for results files?


It is going to take a full-fledged DBMS like Oracle, MSSQL or MySQL. If you are going to go to the trouble and expense of setting it up, you might as well make it scalable. Even these full DBMSs have limitations on how big a table can be, usually 1024 fields. This is due to the way a DBMS works when querying (for instance, a simple query on a single table of 1000 fields would begin with the DBMS creating a virtual table of every field joined with every other field: 1000x1000. That's a million pigeonholes in memory before anything happens).


Need a robust DBMS. You can get a state of the art, full-blown version of MSSQL 2005 for $50 from Microsoft. It’s a developer edition. This is not the free SQL2005 desktop engine. This even has neural networks and genetic algorithm components built in.


If you are going to go to the trouble of building this thing, I’d strongly recommend this software. If you have done any work in Access you shouldn’t have that many problems jumping to MSSQL. Once you do, you won’t look back.
There are all kinds of free tutorials and help support.

The choice of DBMS will probably be where the project flies apart. It doesn't have to be that way. There are many pre-DBMS decisions that are going to be common no matter what software you use.
Once the DBMS, data source, and purpose are decided it would be time to look at the data with consideration to how it will be used. More later…

10 running lines per horse x 10 horses per race x 10 races per card (I don't do turf, any races over 1m70, any 2-year-old races, or any stakes/high allowance races, so I wouldn't be storing all races on a card) still gives me at least 100 cards or 100 race days, and if I create tables for each track I deal with, then I have at least 100 cards for each track. Wouldn't that be enough? I plan to replace oldest with newest anyway, in order to keep the DB up to date. Couldn't tables for older records also be created in cases where a larger sample is wanted? From what I've read in this forum, many people are using Access and seem to have plenty of room. Are all these guys/gals running out of space for records?

raybo
01-30-2007, 12:35 AM
I'm not even sure I can handle creating a DB in the first place, so I sure don't want to buy a new one, like you suggest, and maybe not even be able to use it.

raybo
01-30-2007, 01:01 AM
It sounds like ya'll have figured out that this is a big project. Big.
A couple of Questions:
1. Specifically, what data source will you be using? DRF PP's in the "Single File" format

3. What source will you use for results files?




1. I'm not tied to DRF single file, but it appears that running data into Excel first is logical for me, since my fully automated handicapping program is in Excel and it would be simple to separate the data into different sheets before sending it to the DB.

3. I will probably use Bris' .xrd results files, as I have already used them in my spreadsheet and I have tons of them stored on my computer, along with both corresponding DRF single or MCP past performance files. That would allow me to use the DB immediately, albeit with older records than I prefer. Easily updated as I go, though.

tupper
01-30-2007, 02:10 AM
I'm not even sure I can handle creating a DB in the first place, so I sure don't want to buy a new one, like you suggest, and maybe not even be able to use it.
All the software I will use for my database is free and open-source (free as in beer and free as in speech). I don't have to buy anything.

As was mentioned, Access can act as a front-end for more robust databases, such as the free and open-source MySQL. But OpenOffice Base can probably also be used as a front-end for MySQL and PostgreSQL, and OpenOffice is free and open-source.

In regards to throwing away or not loading certain data, such as turf records: what if some day one wants to try a theory about turf horses that start running on dirt? I want to have all the data that I paid for at my disposal -- I would not trash any of it. And robust databases like MySQL and PostgreSQL can handle a lot of info.

BillW
01-30-2007, 02:21 AM
tupper,

I don't use OOBase, but I have it installed on my system. In looking at the connection wizard, I don't see postgreSQL, only MySQL. I do use postgreSQL which I can highly recommend, it is rock solid.

Kelso
01-30-2007, 04:06 AM
I buy BRIS "ultimate" PPs. They are in .pdf format.

I download the free "flash" results. They are in .csv format ... which I then save as .drf for consistency.

I keep these files with an eye toward doing, eventually, what Raybo is undertaking to do now (with Excel and/or Access).

To K9 and other DB gurus in this thread: will I be able to use the .drf files in the same manner as you have described using .csv files?

Thanks very much.

BillW
01-30-2007, 04:20 AM
Kelso,

If you are talking about these flash results:

http://www.brisnet.com/cgi-bin/trk_report.cgi?frr

they are in .html/text format (not .csv) and won't do what you want without some programming. The .pdf files are not compatible with importing either. As a matter of fact, BRIS changed their chart format to .pdf to prevent exactly what you are trying to do. Of course, with programming anything can be done (a lot more is required with the .pdf files than with the .html/text files).

BTW I am not a DB guru but the above I'm relatively sure of.

Bill

Kelso
01-30-2007, 04:40 AM
Kelso,

If you are talking about these flash results:

http://www.brisnet.com/cgi-bin/trk_report.cgi?frr

they are in .html/text format (not .csv) and won't do what you want without some programming. The .pdf files are not compatible with importing either. As a matter of fact, BRIS changed their chart format to .pdf to prevent exactly what you are trying to do. Of course, with programming anything can be done (a lot more is required with the .pdf files than with the .html/text files).

BTW I am not a DB guru but the above I'm relatively sure of.

Bill



Bill,

I guess I mislabeled the "results" I download. BRIS calls them "Instant Charts" and they are here:
http://www.brisnet.com/cgi-bin/instant_pdf.cgi?type=inc

I just saved one in original format and it shows up in Windows Explorer as simply "file" format. Now I don't remember where I saw that I was downloading .csv files from BRIS. Perhaps it was some other info of theirs that I was trying out.

Very disappointed to learn that .pdf files will not export easily to Excel or Access. Any insight as to why BRIS would object to a customer using their files in a private database/spreadsheet after the customer has paid to download them?

Thank you very much for responding to my original question.

BillW
01-30-2007, 04:50 AM
Bill,

I guess I mislabeled the "results" I download. BRIS calls them "Instant Charts" and they are here:
http://www.brisnet.com/cgi-bin/instant_pdf.cgi?type=inc

Very disappointed to learn that .pdf files will not export easily to Excel or Access. Any insight as to why BRIS would object to a customer using their files in a private database/spreadsheet after the customer has paid to download them?

Thank you very much for responding to my original question.

Those are the charts I was referring to that BRIS changed to .pdf format. The original .html format was too easy to convert to import format. BRIS, Equibase and other data providers sell files for import. I think it was Equibase that squeezed the data providers to take steps to prevent free databasing of their data. The .pdf PP files you pay for are simply for display/printing. If you want to build a database of PPs, you can buy BRIS .csv import data files for $1.00 ->

http://www.brisnet.com/cgi-bin/trk_report.cgi?drs

Of course you cannot display or print these files without using a compatible program, of which there are many.

sjk
01-30-2007, 06:34 AM
What database management system (DBMS) will you use? Access is not going to cut it. An Access Project can only hold 10,000 records per table. A season in Ruidoso (sixty-two days) produces 38K Past Performance Records.


I have 3.75 million race records in an Access table with no issue.

K9Pup
01-30-2007, 07:46 AM
Gotcha, good idea. Think I'm ready to start building tables yet?

Yep! Time to take the leap.

K9Pup
01-30-2007, 07:53 AM
2. What database management system (DBMS) will you use? Access is not going to cut it. An Access Project can only hold 10,000 records per table. A season in Ruidoso (sixty-two days) produces 38K Past Performance Records. If you eliminate duplication you still have near or over 10K.
3. What source will you use for results files?


It is going to take a full-fledged DBMS like Oracle, MSSQL or MySQL. If you are going to go to the trouble and expense of setting it up, you might as well make it scalable. Even these full DBMSs have limitations on how big a table can be, usually 1024 fields. This is due to the way a DBMS works when querying (for instance, a simple query on a single table of 1000 fields would begin with the DBMS creating a virtual table of every field joined with every other field: 1000x1000. That's a million pigeonholes in memory before anything happens).


Need a robust DBMS. You can get a state of the art, full-blown version of MSSQL 2005 for $50 from Microsoft. It’s a developer edition. This is not the free SQL2005 desktop engine. This even has neural networks and genetic algorithm components built in.


If you are going to go to the trouble of building this thing, I’d strongly recommend this software. If you have done any work in Access you shouldn’t have that many problems jumping to MSSQL. Once you do, you won’t look back.
There are all kinds of free tutorials and help support.

The choice of DBMS will probably be where the project flies apart. It doesn't have to be that way. There are many pre-DBMS decisions that are going to be common no matter what software you use.
Once the DBMS, data source, and purpose are decided it would be time to look at the data with consideration to how it will be used. More later…

That's not true. I have over 2 million rows in one Access table. You can use these more robust DBs, but then you throw in the learning curve of DB management, along with coding of the import, writing queries, etc. Access "management" is simple because Access actually stores the tables in a flat sequential file. So the major management function is to occasionally compact the database file.

And since Access can front end most of these other "robust" DBs, you can start using the Access DB. Then, once you reach a point where Access doesn't meet your needs, the data can be easily ported to other products. Personally, I haven't reached that point yet.

K9Pup
01-30-2007, 08:00 AM
I buy BRIS "ultimate" PPs. They are in .pdf format.

I download the free "flash" results. They are in .csv format ... which I then save as .drf for consistency.

I keep these files with an eye toward doing, eventually, what Raybo is undertaking to do now (with Excel and/or Access).

To K9 and other DB gurus in this thread: will I be able to use the .drf files in the same manner as you have described using .csv files?

Thanks very much.

Any "normal" TXT or CSV type file can be fairly easily imported. Even HTML files can be "read" and the data stripped and added to records. That may require coding; the TXT and CSV files can be imported more easily.

Sorry, but PDF files aren't a good candidate for programs to read. There IS software available that converts PDFs to TXT, but I have found it not to work as well as you need. Any file that you can open in a text editor, look at, and easily understand where the data is can be imported. Open a PDF file and you will see that isn't the case with them.
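As a rough sketch of how easily a plain CSV file goes into a database table, here in Python with SQLite and an invented three-column layout:

```python
import csv
import io
import sqlite3

# A "normal" CSV file maps straight onto a table: one row per record.
# The layout here (track, race, horse) is hypothetical.
sample = io.StringIO("track,race,horse\nRUI,1,Example Runner\nRUI,2,Another Horse\n")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entries (track TEXT, race INTEGER, horse TEXT)")
reader = csv.reader(sample)
next(reader)                       # skip the header row
con.executemany("INSERT INTO entries VALUES (?, ?, ?)", reader)

count = con.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

A PDF has no such row/column structure to iterate over, which is why it resists this kind of import.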

burrhog
01-31-2007, 12:06 AM
I have 3.75 million race records in an Access table with no issue.
I stand corrected. I had been using Access Project as the front end for a SQL DB. My queries only returned 10,000 records. I just looked it up, and the limitation is on the JET connector, specifically the locking. There is a way around this by altering the registry (per the Microsoft Knowledge Base). I quit using Access the first time I realized my queries were coming up short. Could have hacked around it, I guess.

I was certain that Access had cut me off!

burrhog
01-31-2007, 12:22 AM
I am not sure what people are talking about when they say they load the files up in Excel, or that they intend to load them into Excel before importing them into Access. Are they using the multifile format? I am positive (famous last words) Excel won’t handle a sheet with more than 256 fields.

Are people pre-processing drf files or using something else?

BillW
01-31-2007, 12:35 AM
Excel won’t handle a sheet with more than 256 fields.


Supposedly the new version does.

raybo
01-31-2007, 12:35 AM
Yep! Time to take the leap.

After thinking about this for a day and looking at all the data in the Bris .MCP files, it appears I need the following DB tables:

1. Current Race Data (Approx. 50 field names)
2. Trainer/Jockey/Owner Data (Approx. 75 field names)
3. Horse Data (to include both current and historical data) (Approx. 75 field names)
4. Workouts (Approx 10 field names)
5. Past Performances (Approx. 100 field names)
6. Results (Bris .XRD files) (Approx. 120 field names)

Does this look about right?

(I'm thinking I might add another one later containing handicapped races with running lines selected, etc., so I can track what I'm doing on the handicapping side)

BillW
01-31-2007, 12:41 AM
After thinking about this for a day and looking at all the data in the Bris .MCP files, it appears I need the following DB tables:

1. Current Race Data (Approx. 50 field names)
2. Trainer/Jockey/Owner Data (Approx. 75 field names)
3. Horse Data (to include both current and historical data) (Approx. 75 field names)
4. Workouts (Approx 10 field names)
5. Past Performances (Approx. 100 field names)
6. Results (Bris .XRD files) (Approx. 120 field names)

Does this look about right?

(I'm thinking I might add another one later containing handicapped races with running lines selected, etc., so I can track what I'm doing on the handicapping side)

I put exotic payouts in a separate table. It simplifies the multiple to one relationship with the race table.
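The separate-table idea can be sketched like this: since one race can have several exotic payouts, the payouts get their own table keyed back to the race. Schema and column names are invented for illustration:

```python
import sqlite3

# One row per race in the races table, any number of payout rows per race
# in exotic_pays -- the classic one-to-many split.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE races (race_id INTEGER PRIMARY KEY, track TEXT, race_no INTEGER);
CREATE TABLE exotic_pays (
    race_id INTEGER REFERENCES races(race_id),
    wager TEXT,
    payout REAL);
INSERT INTO races VALUES (1, 'RUI', 5);
INSERT INTO exotic_pays VALUES (1, 'Exacta', 42.80);
INSERT INTO exotic_pays VALUES (1, 'Trifecta', 310.20);
""")
pays = con.execute("""SELECT wager, payout FROM exotic_pays
                      WHERE race_id = 1 ORDER BY payout""").fetchall()
```

The alternative (one wide row per race with a column per wager type) also works, but the child table avoids empty columns when a race doesn't offer every wager.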

raybo
01-31-2007, 12:43 AM
I am not sure what people are talking about when they say they load the files up in Excel, or that they intend to load them into Excel before importing them into Access. Are they using the multifile format? I am positive (famous last words) Excel won’t handle a sheet with more than 256 fields.

Are people pre-processing drf files or using something else?

I use Excel and load almost all the data from a single file format (.DRF or .MCP) file into my master sheet, then reference those cell locations on other sheets where I do calculations, print formatting, etc. In Excel you don't have to have every data field named at the top of the sheet in a single row. For example: on a card of 10 races with 10 horses in each race, I would have 1300 rows of data about 50 columns wide (13 rows for each horse on the card). The only trouble I ran into was with dates: I have to do some things to some of the dates to get them all to appear in the same format.
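That date clean-up usually comes down to normalizing every incoming date to one format. A small sketch in Python, where the two input formats are assumptions for illustration:

```python
from datetime import datetime

# Data files can carry dates in more than one format; convert them all
# to one canonical form before comparing or sorting. Formats are assumed.
def normalize_date(raw):
    for fmt in ("%Y%m%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            pass
    return None   # unrecognized -> flag for manual review

norm = [normalize_date(d) for d in ("20070131", "01/31/2007", "bad")]
```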

raybo
01-31-2007, 12:44 AM
I put exotic payouts in a separate table. It simplifies the multiple to one relationship with the race table.

Appreciate the suggestion, sounds logical, that's where all my wagering is done anyway.

raybo
01-31-2007, 12:55 AM
Will have to revamp my Infotran control file to accommodate a complete single file format Bris file. Currently I'm not extracting every field in the Bris files, just the ones I currently use. Oh well, it'll be worth it in the end.

Anyone have code to separate the data without having to use Infotran? That would probably be more efficient than making a control file for each category of data.

tupper
01-31-2007, 01:58 AM
Evidently, Excel 2007 can handle 16,000 columns:
http://msdn2.microsoft.com/en-us/library/aa730921.aspx#Office2007excelPerf_Overview

tupper
01-31-2007, 02:09 AM
I don't use OOBase, but I have it installed on my system. In looking at the connection wizard, I don't see postgreSQL, only MySQL. I do use postgreSQL which I can highly recommend, it is rock solid.
Thanks for the info on OOBase/MySQL.

Do you use a GUI manager for postgreSQL, like pgAdmin? If so, could such a GUI manager help in working with handicapping data?

Thanks,
-Dale

BillW
01-31-2007, 02:25 AM
Thanks for the info on OOBase/MySQL.

Do you use a GUI manager for postgreSQL, like pgAdmin? If so, could such a GUI manager help in working with handicapping data?

Thanks,
-Dale
Dale,

No, just using the command line interface (psql). It does the same thing as pgAdmin which is to perform administration functions on the database. I'm not sure it will do anything application specific that would be useful. I'm doing the actual application stuff using G++/Gtkmm with the postgreSQL interface library. If you're not into programming, you're probably better off using OOBase/MySQL - after confirming that it will do what you want of course.

Bill

hcap
01-31-2007, 07:43 AM
Anyone have code to separate the data without having to use Infotran? That would probably be more efficient than making a control file for each category of data.
I mentioned this before. I have an Excel program that will. Not 2007. It puts all fields from the data files on 6 rows on one sheet, 256 or fewer columns per row. If you want to stay in Excel and do some pre-processing before you import to Access or another database, you may find it useful.

I calculate factors from each runner's 10 lines and consolidate those 10 lines into a hundred-plus factors. Although I have set it up to output those 10 lines as raw info to be imported into another program.

If you want to set up tables on 5 or 6 sheets to be imported into Access, you would just reference the raw data. It will run in batch mode: you put data files in one folder, result files in another. It will process 100+ cards at a time automatically.

K9Pup
01-31-2007, 07:54 AM
After thinking about this for a day and looking at all the data in the Bris .MCP files, it appears I need the following DB tables:

1. Current Race Data (Approx. 50 field names)
2. Trainer/Jockey/Owner Data (Approx. 75 field names)
3. Horse Data (to include both current and historical data) (Approx. 75 field names)
4. Workouts (Approx 10 field names)
5. Past Performances (Approx. 100 field names)
6. Results (Bris .XRD files) (Approx. 120 field names)

Does this look about right?

(I'm thinking I might add another one later containing handicapped races with with running lines selected, etc., so I can track what I'm doing on the handicapping side)

2. Would you store 1 record per trainer, another for jockey and another for owner? Or would you combine them into one, so you would have one record per trainer/jockey/owner combination? Over time, would you have only one occurrence, or would you want to store this information each time?

3. Same question: would you only want to maintain one record per horse, replacing it with the recent info, or multiple records?

6. I see that BillW says to store exotics in a separate table. I don't do that. I have a payouts table that contains one row for each race. The columns are the finish information and pays for WPS and exotics. I'm not sure what his breakout would be.

In most of these cases I suspect you will want to ADD the new records and retain the old. If so, you probably want to include fields in each file that can be used to "link" them together. For example, in the jockey file you might want Track, RaceDate, Race. That might make it easier to join up the info later.

Also define unique keys where needed so you don't get unwanted duplicate records. In the workouts table, the key would probably be horse/date. That way Access will not add a second record for the same horse/date.
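The unique-key point can be demonstrated with SQLite (Access and MySQL have their own equivalents). Declaring horse/date as the key means a re-import can't create duplicate workout rows; "INSERT OR IGNORE" is the SQLite spelling of "skip duplicates":

```python
import sqlite3

# Horse/date as the composite primary key: the second copy of the same
# workout is silently rejected instead of duplicated.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE workouts (
    horse TEXT, work_date TEXT, distance TEXT,
    PRIMARY KEY (horse, work_date))""")

rows = [("Example Runner", "20070120", "4f"),
        ("Example Runner", "20070120", "4f"),   # duplicate -> ignored
        ("Example Runner", "20070127", "5f")]
con.executemany("INSERT OR IGNORE INTO workouts VALUES (?, ?, ?)", rows)

n = con.execute("SELECT COUNT(*) FROM workouts").fetchone()[0]
```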

tupper
01-31-2007, 12:31 PM
... just using the command line interface (psql). It does the same thing as pgAdmin which is to perform administration functions on the database. I'm not sure it will do anything application specific that would be useful. I'm doing the actual application stuff using G++/Gtkmm with the postgreSQL interface library. If you're not into programming, you're probably better off using OOBase/MySQL - after confirming that it will do what you want of course.

Bill
Another database person suggested that something other than a GUI or command-line interface is needed, but I am not capable of programming.

At any rate, it looks like OOBase works with postgreSQL (see 3rd paragraph): http://en.wikipedia.org/wiki/OpenOffice.org_Base

Thanks,
-Dale

BillW
01-31-2007, 12:42 PM
Another database person suggested that something other than a GUI or command-line interface is needed, but I am not capable of programming.

At any rate, it looks like OOBase works with postgreSQL (see 3rd paragraph): http://en.wikipedia.org/wiki/OpenOffice.org_Base

Thanks,
-Dale

Of course, via ODBC/JDBC duh! :bang: :) Sorry about the misinfo.

richrosa
01-31-2007, 03:12 PM
I'll stick my nose in here, if only to promote a viable solution.

HandicappingOS was built using MySQL. Not only is it feature-rich and scalable, it is also well supported on the Internet and thoroughly documented. Maybe most important of all, it is FREE!!

I don't want to take too many arrows for this, but IMHO, writing queries in MySQL is probably easier than trying to stretch Excel to do something it's not really good at, or doing anything in Access. Once you get the hang of writing SQL directly, you'll have the most power, and you may never go back to the clunky, hard-to-use GUI tools that Microsoft is so famous for.

I have a book in me that I'm threatening to write about how to build a handicapping program using purely open-source tools.

K9Pup
01-31-2007, 03:22 PM
I'll stick my nose in here, if only to promote a viable solution.

HandicappingOS was built using MySQL. Not only is it feature-rich and scalable, it is also well supported on the Internet and thoroughly documented. Maybe most important of all, it is FREE!!

I don't want to take too many arrows for this, but IMHO, writing queries in MySQL is probably easier than trying to stretch Excel to do something it's not really good at, or doing anything in Access. Once you get the hang of writing SQL directly, you'll have the most power, and you may never go back to the clunky, hard-to-use GUI tools that Microsoft is so famous for.

I have a book in me that I'm threatening to write about how to build a handicapping program using purely open-source tools.

I think it depends on where you are coming from as to whether it is easier to write SQL queries than it is to use the Access drag-and-drop tool. Writing simple SELECTs and doing some things in SQL isn't bad. But the drag-and-drop query writer in Access requires almost zero programming knowledge.

Besides, Access can front-end MySQL. If one chooses to use MySQL, I would suggest they also use Access to front-end it, at least initially. Queries can be built using the Access design tool, and then you can easily select SQL view and see the native SQL code. That code can then be copied and pasted into MySQL if so desired.

I can't imagine that form building, macro automation, and grouping of queries can be that much easier than what Access already gives you.

Just my opinion of course.

richrosa
01-31-2007, 03:56 PM
I think it depends on where you are coming from as to whether it is easier to write SQL queries than it is to use the Access drag-and-drop tool. Writing simple SELECTs and doing some things in SQL isn't bad. But the drag-and-drop query writer in Access requires almost zero programming knowledge.

I agree with you on simple queries. Access makes it easy to do simple selects and joins. My point is targeted at the more complex queries, where Access does not perform well. When you take advantage of an RDBMS by using normalized table structures to represent complex relationships, you are probably writing queries that cannot be handled through the GUI. Since you are going to wind up writing SQL anyway, it might as well be ANSI-compliant (and free!!!), as opposed to the strange variant that Access uses (SQL Server is ANSI).

sjk
01-31-2007, 04:08 PM
I don't know what all of the acronyms mean but would be curious to see examples of what would be hard to get done in Access. I have never found anything that I needed to do and was unable to do.

K9Pup
01-31-2007, 04:10 PM
I don't know what all of the acronyms mean but would be curious to see examples of what would be hard to get done in Access. I have never found anything that I needed to do and was unable to do.

Yeah I agree. I have tons of queries, but none of them were written in native SQL.

richrosa
01-31-2007, 04:18 PM
This is far from complex, but maybe meaningful as an example. I can certainly elaborate if needed. As you get into deeper questions, the queries get more complicated. This query contains left joins and subqueries, but not derived tables.



SELECT ppID, ppRaceDate, ppDaysPrev, ppTrackCode, raceComments, tripNote
FROM pp as p
LEFT JOIN race as r
ON (r.raceDate = p.ppRaceDate
AND p.ppTrackCode = r.track
AND p.ppRaceNum = r.raceNUM)
LEFT JOIN race_Comments as rc
ON (r.raceID = rc.raceID
AND rc.saveComments = 1
AND rc.userID = 1)
LEFT JOIN entry as e ON (p.entryID = e.entryID)
LEFT JOIN trip_notes as tn ON (r.raceID = tn.raceID
AND e.horseName = tn.horseName
AND tn.userID = 1)
WHERE p.entryID = 394877
ORDER BY p.ppRaceDate DESC

sjk
01-31-2007, 04:33 PM
I was hoping for a problem posed in English that I could think about how to deal with in Access. I have never used SQL, so it is not worthwhile for me to take the time to understand the example above.

Probably my question is a silly one. Everyone has a language that they are comfortable with and that works for them.

I have a full black-box handicapping program written in Access, so I am guessing that an example I would have a hard time getting done with Access would be more than a few lines of SQL.

K9Pup
01-31-2007, 04:48 PM
This is far from complex, but maybe meaningful as an example. I can certainly elaborate if needed. As you get into deeper questions, the queries get more complicated. This query contains left joins and subqueries, but not derived tables.



SELECT ppID, ppRaceDate, ppDaysPrev, ppTrackCode, raceComments, tripNote
FROM pp as p
LEFT JOIN race as r
ON (r.raceDate = p.ppRaceDate
AND p.ppTrackCode = r.track
AND p.ppRaceNum = r.raceNUM)
LEFT JOIN race_Comments as rc
ON (r.raceID = rc.raceID
AND rc.saveComments = 1
AND rc.userID = 1)
LEFT JOIN entry as e ON (p.entryID = e.entryID)
LEFT JOIN trip_notes as tn ON (r.raceID = tn.raceID
AND e.horseName = tn.horseName
AND tn.userID = 1)
WHERE p.entryID = 394877
ORDER BY p.ppRaceDate DESC


Ok so you have
1. PP table that contains post position data?
2. RACE table that contains the race data
3. RACE_COMMENTS table that contains comments for that race?
4. ENTRY table that contains the entries for multiple races?
5. TRIP_NOTES table that contains notes by horse.

In Access I would
1. add each of these tables to the query grid.
2. drag and drop (JOIN) PPRACEDATE in the PP table to RACEDATE in RACE table.
3. JOIN PPTC to TRACK in RACE table
4. JOIN ENTRYID to ENTRYID in the ENTRY table.
5. Drag ENTRYID to the field grid and specify criteria as =394877
6. From the RACE table I would JOIN RACEID to RACEID in the RACE_COMMENTS table.
7. JOIN RACEID to RACEID in the TRIP_NOTES table
8. Drag SAVECOMMENTS and USERID to field grid, specify =1 for both.
9. Drag TRIP_NOTES USERID to grid and also specify =1.
10. Drag the PP fields you wish to output to the field grid.

sjk
01-31-2007, 04:56 PM
Of course Access can show the SQL for its queries. Here are a couple that would seem like reasonably complex queries in Access (as translated to SQL):

SELECT DISTINCTROW [PP RPT ODDS MK1x].POST, [PP RPT ODDS MK1x].TRACK, [PP RPT ODDS MK1x].RACE, [PP RPT ODDS MK1x].NAME, [PP RPT ODDS MK1x].ODDS, [PP RPT ODDS MK1x].PCT AS P, [PP RPT ODDS MK1x].[ODDS CALC], [PP RPT ODDS MK1x].MPCT AS MP, [PP RPT ODDS MK1x].ENTRY, [P]*([LS]*[SFACT]+[LN]*[NFACT]) AS SP, [P]*([MS]*[SFACT]+[MN]*[NFACT]+[MC]*[CFACT]+[MC+]*[C+FACT]) AS NP, [P]*([UC]*[CFACT]+[UC+]*[C+FACT]) AS CP, [p]*([SPCT]*[Sfact]+[NPCT]*[Nfact]+[CPCT]*[Cfact]+[C+PCT]*[C+fact]) AS AP1, [p]*(IIf([SdPCT] Is Null,[sadpct],[sdpct])*[Sfact]+IIf([NdPCT] Is Null,[nadpct],[ndpct])*[Nfact]+IIf([CdPCT] Is Null,[cadpct],[cdpct])*[Cfact]+IIf([C+dPCT] Is Null,[c+adpct],[c+dpct])*[C+fact]) AS AP2, 0.9*IIf([AP2] Is Null,[ap1],[ap2])+0.1*IIf([AP1] Is Null,[ap2],[ap1]) AS AP, [ap]*[place ax factor] AS pap, IIf([prob factor]=0.0001,"SCR",[pp rpt odds factors].[NEW DIST]) AS [new dist], [PP RPT ODDS MK1x].[odds replace], [PP RPT ODDS MK1x].[TWO TURNS], 0 AS [trainer good layoff], [PP RPT ODDS MK1x].badjockadd, [distance/bias all pcts].POLY
FROM (((([PP RPT ODDS FACTORS] RIGHT JOIN [PP RPT ODDS MK1x] ON ([PP RPT ODDS FACTORS].TRACK = [PP RPT ODDS MK1x].TRACK) AND ([PP RPT ODDS FACTORS].RACE = [PP RPT ODDS MK1x].RACE) AND ([PP RPT ODDS FACTORS].NAME = [PP RPT ODDS MK1x].NAME)) LEFT JOIN [PP RPT TRACK BIAS NEW] ON ([PP RPT ODDS MK1x].SURFACE = [PP RPT TRACK BIAS NEW].SURFACE) AND ([PP RPT ODDS MK1x].TRACK = [PP RPT TRACK BIAS NEW].TRACK)) LEFT JOIN [distance/bias pcts] ON ([PP RPT ODDS MK1x].[f/m] = [distance/bias pcts].[f/m]) AND ([PP RPT ODDS MK1x].[DISTANCE TR] = [distance/bias pcts].[DISTANCE TR]) AND ([PP RPT ODDS MK1x].TRACK = [distance/bias pcts].TRACK) AND ([PP RPT ODDS MK1x].SURFACE = [distance/bias pcts].SURFACE)) INNER JOIN [distance/bias all pcts] ON ([PP RPT ODDS MK1x].[f/m] = [distance/bias all pcts].[f/m]) AND ([PP RPT ODDS MK1x].[DISTANCE TR] = [distance/bias all pcts].[DISTANCE TR]) AND ([PP RPT ODDS MK1x].SURFACE = [distance/bias all pcts].SURFACE)) INNER JOIN [QUARTILE FACTORS] ON ([PP RPT ODDS MK1x].[DISTANCE TR] = [QUARTILE FACTORS].[DISTANCE TR]) AND ([PP RPT ODDS MK1x].SURFACE = [QUARTILE FACTORS].SURFACE)
WHERE ((([distance/bias all pcts].POLY)=IIf([IS POLY] Is Null,"N",[IS POLY])))
WITH OWNERACCESS OPTION;

Or
SELECT DISTINCTROW [PP REPORT ENTRIES].NAME, [PP REPORT ENTRIES].PDATE, [PP REPORT ENTRIES].PTRACK, [PP REPORT ENTRIES].TTRACK AS TRACK, [PP REPORT ENTRIES].date AS TDATE, [pp report entries].[Date]-[PP REPORT ENTRIES].[PDATE] AS DAYS, [PP REPORT ENTRIES].RACE, [PP REPORT ENTRIES].[1ST CALL], [PP REPORT ENTRIES].[2ND CALL], [PP REPORT ENTRIES].[3RD CALL], [PP REPORT ENTRIES].PSURF, IIf([PP REPORT ENTRIES].[RACE INT]=1,0.4243*(1-[days]*0.001349),IIf([PP REPORT ENTRIES].[RACE INT]=2,0.2648*(1-0.000355*([LATEST DATE]-[PP REPORT ENTRIES].[PDATE])),IIf([PP REPORT ENTRIES].[RACE INT]=3,0.1945,IIf([pp report entries].[race int]=4,0.1617,0.12))))*IIf([days]>366,0,IIf([days]<200,(1-[DAYS]/300),0.3333-([days]-200)/500))*IIf([TRK CND]="FT",1,[MUD WT]) AS [RACE WT], [PP REPORT ENTRIES].[RACE INT], -([5TH CALL] Is Null) AS PSPRINT, [PP REPORT ENTRIES].PDIST, [PP TRAINER CHG 3B].[PCT ADJ], [PP RPT ASR PROJ S2].[RUN IN ADJ], [SPEED RATING]+[ADJ/AGE]+IIf([pp report entries].[RUN IN ADJ] Is Null,0,[pp report entries].[RUN IN ADJ])+IIf([AVGOFSRD] Is Null,0,[FACTOR]*([AVGOFSRD]-[pp report entries].[run in adj]))+[CLEAR ADJ]+[BIAS/STYLE ADJ]+[PCT ADJ]-[wt pts]+IIf([JOCK PCT] Is Null,0,19.75*([jock pct]-0.12))-IIf([prob factor]=0.0001,30,0) AS SR, [PP RPT ASR PROJ S2].[STYLE VAL] AS [AvgOfSTYLE VAL], [PP REPORT ENTRIES].[RACE PAR] AS PPAR INTO [PP RPT ASR PROJ T0A]
FROM (([PP REPORT MUD WT 2] INNER JOIN ([PP RPT ASR PROJ T0XXX] INNER JOIN (([PP TRAINER CHG 3B] RIGHT JOIN ([PP REPORT ENTRIES] LEFT JOIN [PP RUN BACK FACTORS] ON [PP REPORT ENTRIES].START = [PP RUN BACK FACTORS].START) ON ([PP TRAINER CHG 3B].[RACE INT] = [PP REPORT ENTRIES].[RACE INT]) AND ([PP TRAINER CHG 3B].NAME = [PP REPORT ENTRIES].NAME)) INNER JOIN [PP RPT ASR PROJ S2] ON ([PP REPORT ENTRIES].PDATE = [PP RPT ASR PROJ S2].PDATE) AND ([PP REPORT ENTRIES].NAME = [PP RPT ASR PROJ S2].NAME)) ON [PP RPT ASR PROJ T0XXX].NAME = [PP REPORT ENTRIES].NAME) ON [PP REPORT MUD WT 2].NAME = [PP REPORT ENTRIES].NAME) INNER JOIN [exacta odds adjust] ON ([PP REPORT ENTRIES].TTRACK = [exacta odds adjust].TRACK) AND ([PP REPORT ENTRIES].RACE = [exacta odds adjust].RACE) AND ([PP REPORT ENTRIES].NAME = [exacta odds adjust].NAME)) INNER JOIN [today's tracks] ON ([PP REPORT ENTRIES].date = [today's tracks].date) AND ([exacta odds adjust].TRACK = [today's tracks].TRACK)
WHERE ((([PP REPORT ENTRIES].PDATE)>[pp report entries].[date]-IIf([SURF]="DIRT",75,120)) AND (([PP REPORT ENTRIES].PSURF)=[SURF]) AND (([today's tracks].[in])=1)) OR ((([PP REPORT ENTRIES].PSURF)=[SURF]) AND (([PP REPORT ENTRIES].[RACE INT])<5) AND (([today's tracks].[in])=1))
ORDER BY [PP REPORT ENTRIES].NAME, [PP REPORT ENTRIES].PDATE
WITH OWNERACCESS OPTION;

If you like these there are hundreds more where these came from.

njcurveball
01-31-2007, 05:16 PM
Nice job fellas. That is far from complex! :jump:

raybo
01-31-2007, 05:49 PM
Nice job fellas. That is far from complex! :jump:





LOL!!!

K9Pup
01-31-2007, 05:52 PM
Nice job fellas. That is far from complex! :jump:

:) Complex, of course, is always relative. The code SJK posted was NOT code he had to type manually himself. That is the code generated by using the Access query-grid process like I TRIED to describe above.

Trust me, describing it is much harder than actually doing it. But it IS always beneficial to have SOME programming knowledge. At least the basic concepts.

richrosa
01-31-2007, 05:55 PM
Ok so you have
1. PP table that contains post position data?
2. RACE table that contains the race data
3. RACE_COMMENTS table that contains comments for that race?
4. ENTRY table that contains the entries for multiple races?
5. TRIP_NOTES table that contains notes by horse.

In Access I would
1. add each of these tables to the query grid.
2. drag and drop (JOIN) PPRACEDATE in the PP table to RACEDATE in RACE table.
3. JOIN PPTC to TRACK in RACE table
4. JOIN ENTRYID to ENTRYID in the ENTRY table.
5. Drag ENTRYID to the field grid and specify criteria as =394877
6. From the RACE table I would JOIN RACEID to RACEID in the RACE_COMMENTS table.
7. JOIN RACEID to RACEID in the TRIP_NOTES table
8. Drag SAVECOMMENTS and USERID to field grid, specify =1 for both.
9. Drag TRIP_NOTES USERID to grid and also specify =1.
10. Drag the PP fields you wish to output to the field grid.

I'm not trying to sway you from Access. I'm trying to establish the point that many things that are hard to do in Access through the GUI are easy in SQL.

Your response above is correct but not exact.
1) How would you deal with the possibility that there are no entries in the trip_notes table? I assume (maybe incorrectly) that your query would return no rows if it was a straight join and there wasn't a trip note.
2) How would you handle the subqueries on the left joins?

K9Pup
01-31-2007, 06:09 PM
I'm not trying to sway you from Access. I'm trying to establish the point that many things that are hard to do in Access through the GUI are easy in SQL.

I understand what you are saying. But I think it has a lot to do with what you are used to using. That SQL code may seem simple to you, but to me the Access solution is easier. To others, digging through a stack of old programs and index cards might be the answer.


Your response above is correct but not exact.
1) How would you deal with the possibility that there are no entries in the trip_notes table? I assume (maybe incorrectly) that your query would return no rows if it was a straight join and there wasn't a trip note.

Depends on which option you choose on the join. In this case you would select the option to include rows from the first table AND rows from the joined table when the joined fields match. So I would get the rows from the entries table, and the fields from the trip_notes table would be null when the join fields didn't exist in trip_notes.

If instead I selected the option that requires BOTH tables to have the join fields, then yes, no rows would be returned.

I know this process pretty well. It is almost exactly what I do for my dog handicapping program.


2) How would you handle the subqueries on the left joins?

You mean where you have selection criteria on those other tables? Those fields can be dropped on the query grid and the criteria for each specified there. The criteria can be either ANDs or ORs. The fields can be marked for output or not depending on the desired results.
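To make the LEFT JOIN vs. straight (inner) join distinction above concrete, here is a minimal sketch in Python with SQLite; the table and column names echo the earlier query but the data is invented:

```python
import sqlite3

# Tiny demo of the two join options being discussed.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entry (entryID INTEGER, horseName TEXT);
    CREATE TABLE trip_notes (horseName TEXT, tripNote TEXT);
    INSERT INTO entry VALUES (1, 'Grafton'), (2, 'Seabiscuit');
    INSERT INTO trip_notes VALUES ('Grafton', 'wide trip, closed well');
""")

# LEFT JOIN: every entry comes back; a missing note shows up as NULL (None).
left_rows = con.execute("""
    SELECT e.horseName, tn.tripNote
    FROM entry AS e
    LEFT JOIN trip_notes AS tn ON e.horseName = tn.horseName
    ORDER BY e.entryID
""").fetchall()

# INNER JOIN: entries with no trip note are dropped entirely.
inner_rows = con.execute("""
    SELECT e.horseName, tn.tripNote
    FROM entry AS e
    INNER JOIN trip_notes AS tn ON e.horseName = tn.horseName
""").fetchall()

print(len(left_rows), len(inner_rows))  # 2 1
```

The LEFT JOIN result includes `('Seabiscuit', None)`, which is exactly the "include rows from the first table" option in the Access join dialog.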

raybo
01-31-2007, 06:10 PM
raybo, I mentioned this before. I have an Excel program that will do this (not Excel 2007). It puts all fields from the data files on 6 rows on one sheet, 256 or fewer columns per row. If you want to stay in Excel and do some pre-processing before you import to Access or another database, you may find it useful.

I calculate factors from each runner's 10 lines and consolidate those 10 lines into a hundred-plus factors, although I have set it up to output those 10 lines as raw info to be imported into another program.

If you want to set up tables on 5 or 6 sheets to be imported into Access, you would just reference the raw data. It will run in batch mode: you put data files in one folder, result files in another. It will process 100+ cards at a time automatically.

This sounds like the ticket! Infotran is pretty klutzy, to say the least, and you can only extract data from one card at a time.

If you don't mind letting me look at it, I would appreciate it.

You can message me or Yahoo me at the link under my name.

Thanks hcap !!

raybo
01-31-2007, 06:43 PM
2. Would you store one record per trainer, another for jockey, and another for owner? Or would you combine them into one, so you would have one record per trainer/jockey/owner combination? Over time, would you keep only one occurrence, or would you want to store this information each time?

3. Same question: would you only want to maintain one record per horse, replacing it with the most recent info, or multiple records?

6. I see that BillW says to store exotics in a separate table. I don't do that. I have a payouts table that contains one row for each race. The columns are the finish information and the pays for WPS and the exotics. I'm not sure what his breakout would be.

In most of these cases I suspect you will want to ADD the new records and retain the old. If so, you probably want to include fields in each file that can be used to "link" them together. For example, in the jockey file you might want Track, RaceDate, and Race. That might make it easier to join up the info later.

Also define unique keys where needed so you don't get unwanted duplicate records. In the workouts table the key would probably be horse/date. That way Access will not add a second record for the same date/horse.

K9Pup,

Let me think about this a little. I had been thinking that keeping the number of tables smaller would be better; that way I wouldn't have so many tables to worry about joining. However, as you suggest, it might be better, for future considerations, to do as you say. At the moment I am concerned with keeping it simple, since this is my first foray into the DB realm.

I would want to retain all records for a period of time, as yet undecided, and once I reach that period, start replacing the oldest records with new ones to keep the DB up to date with what is currently happening in all areas of the industry. However, I would like to be able to archive the records that get replaced somewhere in the DB, in case I want to run something against a larger sample.

I would think that for Race, Horse, PPs, and Results I would need Track, Date, and Race# as common fields. Possibly I would also need that info in the trainer/jockey/owner table. I don't really know at the moment, as I don't normally use this data much in my handicapping unless I am wondering why money is being bet on a horse in a way that I feel is unusual, or I'm looking hard for signs of a trainer really going for it by putting up a jock that was successful on the horse in the past. I guess I haven't thought far enough ahead concerning possible uses for the data in the future. I realize that when creating any program one should know what the end result needs to be before starting, in order to do it right from the get-go, but I just don't know for sure how this thing will evolve over time.

Any suggestions along these lines would certainly be appreciated, from you or anyone reading or participating in this thread. But please be as specific as possible, and state reasons why you would suggest doing things one way or another.

K9Pup
02-01-2007, 07:47 AM
K9Pup,

Let me think about this a little. I had been thinking that keeping the number of tables smaller would be better; that way I wouldn't have so many tables to worry about joining. However, as you suggest, it might be better, for future considerations, to do as you say. At the moment I am concerned with keeping it simple, since this is my first foray into the DB realm.


The joining isn't a big issue, provided you have fields in common to join on. I would just suggest that you group together data that makes logical sense. You wouldn't want "race" data in the same table as owner data, etc. No different than a spreadsheet: the rows in the table need to be logical.


I would want to retain all records for a period of time, as yet undecided, and once I reach that period, start replacing the oldest records with new ones to keep the DB up to date with what is currently happening in all areas of the industry. However, I would like to be able to archive the records that get replaced somewhere in the DB, in case I want to run something against a larger sample.


In some of the tables you might also want to include fields like DateAdded and maybe SourceFile, to later be able to tell where a piece of data came from. SourceFile is a good field to have when you find data problems down the road: then you know exactly which raw file provided that data and can determine WHY it was a problem.
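A tiny sketch of that provenance idea (Python/SQLite here; the column names, jockey, and file name are all hypothetical):

```python
import sqlite3

# Provenance columns as suggested: when each row was loaded, and from which file.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE jockeys (
        name       TEXT,
        wins       INTEGER,
        DateAdded  TEXT DEFAULT CURRENT_TIMESTAMP,  -- filled in automatically on load
        SourceFile TEXT                             -- which raw file the row came from
    )
""")
con.execute(
    "INSERT INTO jockeys (name, wins, SourceFile) VALUES (?, ?, ?)",
    ("J. Smith", 42, "AQU20070131.DRF"),
)

row = con.execute("SELECT name, SourceFile FROM jockeys").fetchone()
print(row)  # ('J. Smith', 'AQU20070131.DRF')
```

When a bad value turns up months later, `SourceFile` tells you exactly which download to re-examine.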


I would think that for Race, Horse, PPs, and Results I would need Track, Date, and Race# as common fields. Possibly I would also need that info in the trainer/jockey/owner table. I don't really know at the moment, as I don't normally use this data much in my handicapping unless I am wondering why money is being bet on a horse in a way that I feel is unusual, or I'm looking hard for signs of a trainer really going for it by putting up a jock that was successful on the horse in the past. I guess I haven't thought far enough ahead concerning possible uses for the data in the future. I realize that when creating any program one should know what the end result needs to be before starting, in order to do it right from the get-go, but I just don't know for sure how this thing will evolve over time.


Think of this whole thing as a learning process. You can spend weeks or months designing what you want, but I can tell you that no matter how much time you spend, changes will occur. So I suggest spending SOME time on the design, but also expect the design to change many times. I was a mainframe systems programmer, so our motto was always JUST DO IT!! Heck, problems can be fixed later. :)

For the trainer/jockey/owner table you might use the race date and track. You might also want to include fields in the PP data with the names of the trainer, jockey, and owner. Those fields can then be joined to the t/j/o table to get DETAIL information about that t/j/o.



Any suggestions along these lines would certainly be appreciated, from you or anyone reading or participating in this thread. But please be as specific as possible, and state reasons why you would suggest doing things one way or another.

The data you are dealing with is a little different than what I am used to. In my case I get past program lines and "today's" race setup from a race program. There is some "historic" data in the race programs, but I don't use that.

In your case the file contains "current" data but also some historic data. So when you get a jockey's lifetime record in today's data, it includes all of his races up to that day. Next week, when you get that jockey's history, it is updated through that race. So it probably makes sense to keep BOTH of those history records in your database. After you have collected more data you may want to go back and do some "what ifs", and you would probably want to base t/j/o selections on the data that was current as of that race day.
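One way to sketch that "current as of race day" lookup (Python/SQLite; the schema, names, and numbers are invented):

```python
import sqlite3

# Keep every weekly snapshot of a jockey's record, then pick the one that
# was current "as of" a given race day.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE jockey_history (name TEXT, snapshot_date TEXT, career_wins INTEGER);
    INSERT INTO jockey_history VALUES
        ('J. Smith', '2007-01-07', 1200),
        ('J. Smith', '2007-01-14', 1204),
        ('J. Smith', '2007-01-28', 1210);
""")

race_date = '2007-01-20'
# Latest snapshot on or before the race date -- no peeking at future data.
wins = con.execute("""
    SELECT career_wins FROM jockey_history
    WHERE name = ? AND snapshot_date <= ?
    ORDER BY snapshot_date DESC
    LIMIT 1
""", ('J. Smith', race_date)).fetchone()[0]
print(wins)  # 1204 -- the record that was current on race day
```

For back-testing "what ifs", this as-of query is what keeps tomorrow's statistics from leaking into yesterday's handicapping.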

Do you plan to keep trainer, owner and jockey data in the same table? Or separate tables for each? Is the data for each basically the same?

raybo
02-01-2007, 07:52 AM
The joining isn't a big issue, provided you have fields in common to join on. I would just suggest that you group together data that makes logical sense. You wouldn't want "race" data in the same table as owner data, etc. No different than a spreadsheet: the rows in the table need to be logical.



In some of the tables you might also want to include fields like DateAdded and maybe SourceFile, to later be able to tell where a piece of data came from. SourceFile is a good field to have when you find data problems down the road: then you know exactly which raw file provided that data and can determine WHY it was a problem.



Think of this whole thing as a learning process. You can spend weeks or months designing what you want, but I can tell you that no matter how much time you spend, changes will occur. So I suggest spending SOME time on the design, but also expect the design to change many times. I was a mainframe systems programmer, so our motto was always JUST DO IT!! Heck, problems can be fixed later. :)

For the trainer/jockey/owner table you might use the race date and track. You might also want to include fields in the PP data with the names of the trainer, jockey, and owner. Those fields can then be joined to the t/j/o table to get DETAIL information about that t/j/o.



The data you are dealing with is a little different than what I am used to. In my case I get past program lines and "today's" race setup from a race program. There is some "historic" data in the race programs, but I don't use that.

In your case the file contains "current" data but also some historic data. So when you get a jockey's lifetime record in today's data, it includes all of his races up to that day. Next week, when you get that jockey's history, it is updated through that race. So it probably makes sense to keep BOTH of those history records in your database. After you have collected more data you may want to go back and do some "what ifs", and you would probably want to base t/j/o selections on the data that was current as of that race day.

Do you plan to keep trainer, owner and jockey data in the same table? Or separate tables for each? Is the data for each basically the same?

Thanks for the info, gotta run right now but will get back to you this afternoon.

RobinFromIreland
02-01-2007, 09:40 AM
Ok, I'll just mention what I'm currently doing - I'll try not to use computer-speak. I'm parsing racing data from a website (Java), normalising the data (sectioning it up into logical blocks) and putting it into my own relational database (MySQL). I'm building it so that it should be able to parse any type of data source and, hopefully, in the end it can come out with a worldwide racing results database to be used for handicapping purposes. I'm hoping to make it open-source (free to use).

I'm designing my code so that, if the webpages or data sources change (they do, more often than you would think), the code can self-configure and find the new data, or, perhaps more usefully, you simply plug in a new version of the code that looks for a certain bit of data.

So, in terms of the parsing, I have the following interfaces:

a Parser - this parses the webpage/datasource and gets all data from the source into a type of container
an Extractor - this limits the parsed data to the locations (defined in an XML file - easy to read/modify) that contain the data I think is useful
a Filterer - this filters out the actual data (it could be text, hyperlinks, images or whatever) that's contained within the extracted data; another XML file specifies what to get
a Sanitizer - this sanitises the filtered data by first checking it against a regex (basically an expression saying what the data should generally look like - I don't want crap data) and then, if everything's A-OK, splitting it up into the format I want (a simple example would be converting a name into surname, firstname; or converting 1m 48.01s into 108.01)
a Participant - this represents a logical object in horseracing (a trainer, a jockey, a horse, a race, a performance and so on) - often I have more specific interfaces, e.g. a Horse, but they all descend from a Participant

The thing is that the code just expects these higher-level interfaces - but you can replace them with more specific instances, e.g. creating an extractor for a certain object at a certain website. This well-defined interface means that if a new data source is introduced (say, some kind of new BRIS file), it should be easy to plug it in (so to speak).
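For anyone who prefers code to prose, here is a loose Python sketch of that Parser → Extractor → Filterer → Sanitizer chain; the function names follow the interfaces above, but everything else (the regexes, the sample page) is invented for illustration:

```python
import re

def parse(source: str) -> list:
    """Parser: pull every <td>...</td> cell out of the raw page."""
    return re.findall(r"<td[^>]*>(.*?)</td>", source, re.S)

def extract(cells: list, locations: list) -> list:
    """Extractor: keep only the cells at the configured locations."""
    return [cells[i] for i in locations]

def filter_text(cell: str) -> str:
    """Filterer: strip any remaining tags, leaving bare text."""
    return re.sub(r"<[^>]+>", "", cell).strip()

def sanitize_time(raw: str) -> float:
    """Sanitizer: validate '1m 48.01s' against a regex, convert to seconds."""
    m = re.fullmatch(r"(\d+)m (\d+\.\d+)s", raw)
    if not m:
        raise ValueError(f"rejecting bad data: {raw!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

page = "<td>GRAFTON</td><td><a href='#'>1m 48.01s</a></td><td>junk</td>"
cells = parse(page)                       # all cells
name, time_cell = extract(cells, [0, 1])  # just the ones we care about
seconds = sanitize_time(filter_text(time_cell))
print(f"{name} {seconds:.2f}")  # GRAFTON 108.01
```

Each stage only sees the previous stage's output, so swapping in a site-specific Extractor (as the post describes) leaves the rest of the chain untouched.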

Now, the database: I just model the tables on the participants, and they would be similar to richrosa's tables - performances, races, horses, trainers, entries, race comments etc. I need to work more with Americans here, as you have access to a lot more data plus a different system of betting.

Any questions, ask away.

K9Pup
02-01-2007, 10:06 AM
So, in terms of the parsing, I have the following interfaces:

a Parser - this parses the webpage/datasource and gets all data from the source into a type of container
an Extractor - this limits the parsed data to the locations (defined in an XML file - easy to read/modify) that contain the data I think is useful
a Filterer - this filters out the actual data (it could be text, hyperlinks, images or whatever) that's contained within the extracted data; another XML file specifies what to get
a Sanitizer - this sanitises the filtered data by first checking it against a regex (basically an expression saying what the data should generally look like - I don't want crap data) and then, if everything's A-OK, splitting it up into the format I want (a simple example would be converting a name into surname, firstname; or converting 1m 48.01s into 108.01)
a Participant - this represents a logical object in horseracing (a trainer, a jockey, a horse, a race, a performance and so on) - often I have more specific interfaces, e.g. a Horse, but they all descend from a Participant



So does the parser go get the data automatically? Or do I have to go to the website and find the data myself? What formats can you parse? How far along are you?

richrosa
02-01-2007, 10:23 AM
Please let me jump in again, because I want to help.

Let me think about this a little. Had been thinking that keeping the number of tables smaller would be better. That way I wouldn't have so many tables to worry about joining or whatever. However, as you suggest, it might be better, for future considerations, to do as you say. At the moment, I am concerned with keeping it simple, since this is my first foray into the DB realm.

Think of this whole thing as a learning process. You can spend weeks or months designing what you want, but I can tell you that no matter how much time you spend, changes will occur. So I suggest spending SOME time on the design, but also expect the design to change many times. I was a mainframe systems programmer, so our motto was always JUST DO IT!! Heck, problems can be fixed later.

For my two cents, and whatever that's worth, I think it's important for you to consider a normalized design structure in your database. I found an article for you that explains this.

http://b62.tripod.com/doc/dbbase.htm

What I'd want you to avoid is creating a design that has you making all kinds of data changes. I built the HandicappingOS data structures once, and have added different data elements to the normalized design, but in no way have I needed to make large changes. You will find that this generally creates more tables, but is generally far more efficient and easier to manage later on.
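A small sketch of what normalisation buys you (Python/SQLite here; the schema is a made-up miniature, not HandicappingOS's actual structure):

```python
import sqlite3

# Race-level facts live once in a race table; each starter just points at
# its race instead of repeating the track/date/surface on every row.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE race (
        raceID   INTEGER PRIMARY KEY,
        track    TEXT, raceDate TEXT, raceNum INTEGER, surface TEXT
    );
    CREATE TABLE entry (
        entryID   INTEGER PRIMARY KEY,
        raceID    INTEGER REFERENCES race(raceID),  -- link, not repeated data
        horseName TEXT, post INTEGER
    );
    INSERT INTO race VALUES (1, 'AQU', '2007-01-31', 5, 'DIRT');
    INSERT INTO entry VALUES (10, 1, 'Grafton', 3), (11, 1, 'Seabiscuit', 7);
""")

# Changing the surface once fixes it for every entry in the race.
con.execute("UPDATE race SET surface = 'MUD' WHERE raceID = 1")
rows = con.execute("""
    SELECT e.horseName, r.surface
    FROM entry AS e JOIN race AS r ON e.raceID = r.raceID
    ORDER BY e.entryID
""").fetchall()
print(rows)  # [('Grafton', 'MUD'), ('Seabiscuit', 'MUD')]
```

If the surface had been copied onto every entry row instead, that one correction would have meant hunting down and updating every affected row.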

K9Pup
02-01-2007, 10:33 AM
Please let me jump in again, because I want to help.





For my two cents, and whatever that's worth, I think it's important for you to consider a normalized design structure in your database. I found an article for you that explains this.

http://b62.tripod.com/doc/dbbase.htm

What I'd want you to avoid is creating a design that has you making all kinds of data changes. I built the HandicappingOS data structures once, and have added different data elements to the normalized design, but in no way have I needed to make large changes. You will find that this generally creates more tables, but is generally far more efficient and easier to manage later on.

Yeah, that's the difference between an application programmer and us systems guys. You are correct, it is best to do it right the first time. But there is also something to be said for the learning process of doing it "wrong" a few times first. :)

RobinFromIreland
02-01-2007, 11:31 AM
So does the parser go get the data automatically? Or do I have to go to the website and find the data myself?
Yes and no. I started with the Racing Post. Have a look at the following page's HTML source (View->Source):

Grafton (IRE) (http://www.racingpost.co.uk/horses/racing_horse_form.sd?horse_id=636539)

It's all tables within tables within tables - messy. However, since each bit of data is contained within a <td> (table data/table column) tag - even though the data may be in an <a> (hyperlink) tag or an <img> (image) tag - you can still identify the location of all the pieces of data. So what I do is extract all the <td> tags (and this includes all the child nodes of each <td> tag). Now, about finding the data yourself. I knocked up some code that told me the "location" of every piece of data in the page. So, for instance, the horse's name GRAFTON is actually available in two "locations":

0, 6, 1, 1, 0, 2
6, 0, 2

What the first one means is that if you went to the 1st <td> tag (index starts at zero - it's a computer thing), got its 7th child tag, got that tag's 2nd child, got that child's 2nd child, got that child's 1st child and got that child's 3rd child - you would find yourself at a type of node that contained the text "GRAFTON".

Due to the (bad?) design of the RP website (probably due to an unclosed <td> somewhere), if you went straight to the 7th top-level <td> tag and took 2 more steps (as above), you'd be at the same node "location". So I obviously use that one. So it's pretty easy to pick out the bits of data that you want and put them into an XML file:


<horse>
<field type="namebred">
<location>6/0/2</location>
<type>TextNode</type>
</field>
...

"namebred" just means that the horse's name and country of birth are both available at the same node (this value is important - is used often). The <type> just refers to that I want to extract out text (as opposed to a hyperlink address or an image source)

Now, about self-configuring - I'd just use a retired horse (details won't change), give it the locations as above, then run a parse and compare the actual values versus the expected. If there is a difference, then I'll check it out. If the page design has changed, then I'll just run some code that tries to find the new location of the data, and then it will replace the field's location in the XML.
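That verify-and-relocate loop might look roughly like this. All names and the location strings here are made up for illustration; the point is just the compare-then-search flow Robin describes:

```python
# Sketch of the self-checking idea: parse a retired horse whose details
# can't change, compare against known values, and if a field moved,
# search the page for its new location.
EXPECTED = {"name": "GRAFTON", "country-bred": "IRE"}

def verify(parsed, locations, all_nodes):
    """parsed: field -> value read from the current locations.
    all_nodes: (location, text) pairs for every node on the page."""
    for field, want in EXPECTED.items():
        if parsed.get(field) == want:
            continue  # location still good
        # Page layout changed: hunt for the expected text elsewhere.
        matches = [loc for loc, text in all_nodes if text == want]
        if matches:
            locations[field] = matches[0]  # update the XML config
    return locations

locs = verify({"name": "???", "country-bred": "IRE"},
              {"name": "0/6/1", "country-bred": "6/0/3"},
              [("6/0/2", "GRAFTON"), ("6/0/3", "IRE")])
print(locs)  # {'name': '6/0/2', 'country-bred': '6/0/3'}
```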

XML is cool, by the way, because computers like it (strict, well-defined structure), techies like it (simple way of expressing hierarchies, text based, non-proprietary) and normies like it (it's easy to read).

What formats can you parse? How far along are you?
At the moment I'm just working with HTML, but anything text based should work just fine. I've had this parsing goal for a couple of years, and I've built it before and parsed over 1GB of data. But I didn't do much with the raw data, a lot of bad data got parsed, and when the RP changed their design, it took ages to fix the code up - basically I designed it badly. So I put a lot of effort into learning about proper software engineering and now, on my 3rd attempt at it, I'm happy enough with the way it's going. I'm still working with the RP, but with basically one click of a button I can go from the webpage linked above to:


<?xml version="1.0" encoding="UTF-8"?>
<RPHorse>
<id>636539</id>
<name>GRAFTON</name>
<country-bred>IRE</country-bred>
<date-of-birth>25 Apr 2003</date-of-birth>
<colour>b</colour>
<gender>g</gender>
<trainer-id>11748</trainer-id>
<owner-id>149754</owner-id>
<breeder>Yeomanstown Stud</breeder>
<sire-id>94325</sire-id>
<dam-id>102127</dam-id>
<dam-sire-id>21402</dam-sire-id>
</RPHorse>

rich is dead right about normalization - can't go wrong with it.

richrosa
02-01-2007, 12:21 PM
Yeah that's the difference between an application programmer and us systems guys. You are correct, it is best to do it right the first time. But there is also something to be said about the learning process of doing it "wrong" a few times first.

Trust me, I've done it wrong plenty of times. I just do it wrong a lot less now.

BillW
02-01-2007, 12:37 PM
6. I see that BillW says to store exotics in a separate table. I don't do that. I have a payouts table that contains one row for each race. The columns are the finish information and pays for WPS, exotics. I'm not sure what his breakout would be.


I have a separate table too, with one record per exotic wager. Just a way to handle the many-to-one relationship with a unique date/track/racenumber. I guess I could have made these fields variable-length arrays, but this is the way I chose at the time and have had no substantial problems since.

Bill

K9Pup
02-01-2007, 01:44 PM
I have a separate table too, with one record per exotic wager. Just a way to handle the many to one relationship with a unique date/track/racenumber. I guess I could have made these fields variable length arrays, but this is the way I chose at the time and have had no substantial problems since.

Bill

My PAYOUTS table has one row per race. The columns are the different bet types.

richrosa
02-01-2007, 03:51 PM
Here is my exotics table.

ChartRaceID is the foreign key to the chartRace table, which contains all of the chart information. That table is keyed to a race table. There is one row per exotics type with the program numbers contained in their own consolidated field. I don't do calculations based on that field. If I did, it would require a slightly different design.

Field Type Collation Null Key Default Extra Privileges Comment
-------------- -------- ----------------- ------ ------ ------- -------------- ------------------------------- -------
chartExoticsID int(11) (NULL) NO PRI (NULL) auto_increment select,insert,update,references
chartRaceID int(11) (NULL) YES (NULL) select,insert,update,references
wagerType char(2) latin1_swedish_ci YES (NULL) select,insert,update,references
wagerUnit float (NULL) YES (NULL) select,insert,update,references
wagerPayoff float (NULL) YES (NULL) select,insert,update,references
programNumbers char(30) latin1_swedish_ci YES (NULL) select,insert,update,references
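Rich's one-row-per-exotic layout translates fairly directly to DDL. Here is a sketch in sqlite (his original is MySQL; types are adjusted and the sample values are invented), which also shows why the design absorbs dead heats without schema changes:

```python
import sqlite3

# Rich's exotics layout, translated to sqlite for illustration.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE chartExotics (
    chartExoticsID INTEGER PRIMARY KEY AUTOINCREMENT,
    chartRaceID    INTEGER,           -- FK to the chartRace table
    wagerType      TEXT,              -- 'EX', 'TR', ...
    wagerUnit      REAL,
    wagerPayoff    REAL,
    programNumbers TEXT               -- e.g. '4-7', consolidated field
)
""")
con.execute("INSERT INTO chartExotics (chartRaceID, wagerType, wagerUnit, "
            "wagerPayoff, programNumbers) VALUES (101, 'EX', 2.0, 54.60, '4-7')")
# One row per payoff makes a dead heat trivial: just add another row.
con.execute("INSERT INTO chartExotics (chartRaceID, wagerType, wagerUnit, "
            "wagerPayoff, programNumbers) VALUES (101, 'EX', 2.0, 31.20, '7-4')")
n = con.execute("SELECT COUNT(*) FROM chartExotics "
                "WHERE chartRaceID = 101 AND wagerType = 'EX'").fetchone()[0]
print(n)  # 2
```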

BillW
02-01-2007, 03:52 PM
My PAYOUTS table has one row per race. The columns are the different bet types.

How do you handle multiple exotics in the case of a dead heat or multiple pick-n's in case of scratches/dead heats?

K9Pup
02-01-2007, 04:13 PM
How do you handle multiple exotics in the case of a dead heat or multiple pick-n's in case of scratches/dead heats?

I don't handle multiples. I decided a long time ago that the payoff data should only be used as a gauge. In the dog world the pools are so small that using past payoff information too strongly can just cause problems.

But multiple pays COULD be handled. You would just need multiple columns for each wager, WIN1, WIN2, EX1, EX2, TRI1, TRI2, etc.

How do you use your payout table? In your case you will have several rows for each race? Each row has race,track,date,wagertype?,finish? and pay?

BillW
02-01-2007, 04:47 PM
How do you use your payout table? In your case you will have several rows for each race? Each row has race,track,date,wagertype?,finish? and pay?

Yes, you got it. Typically, if I am analyzing, for instance, exacta payouts, I'm selecting on date/track/racenumber/wagertype. Finish comes from the chart table. This is probably not the optimum solution, but it was my first notion and it has worked, so I've never explored anything better. Such is feeling your way around in the dark with DBs. I'm far from a DB expert - having never fooled with one professionally, only at play :) .

K9Pup
02-01-2007, 05:05 PM
Yes, you got it. Typically if I am analyzing for instance, exacta payouts I'm selecting on date/track/racenumber/wagertype. Finish comes from the chart table. This is probably not the optimum solution, but it was my first notion and it has worked so I've never explored anything better. Such is feeling your way around in the dark with DB's. I'm far from a DB expert - having never fooled with one professionally, only at play :) .

Well, the only things that matter are that it works and you understand it. And your solution seems to meet both.

sjk
02-01-2007, 05:52 PM
It is pretty straightforward how to structure the key datasets, which would include a table for race-specific information (date, track, race #, times, conditions, etc.) and a table for horse-specific info (date, track, race #, name, running lines, jockey, trainer, etc.).


It seems to me the payoff information is something of a red herring. It is not clear how to db the information in a way that is really usable. Unless you have a clear idea how you plan to use it, I would not hold up the whole project while you think about it.

PriceAnProbability
02-01-2007, 06:28 PM
I don't handle multiples. I decided a long time ago that the payoff data should only be used as a gauge. In the dog world the pools are so small that using past payoff information too strongly can just cause problems.

But multiple pays COULD be handled. You would just need multiple columns for each wager, WIN1, WIN2, EX1, EX2, TRI1, TRI2, etc.

How do you use your payout table? In your case you will have several rows for each race? Each row has race,track,date,wagertype?,finish? and pay?

Actually, would you even need multiple columns? You could just assign each wager an ID#, add a "wager#" field, and then the data for one wager.

BillW
02-01-2007, 06:43 PM
It seems to me the payoff information is something of a red herring. It is not clear how to db the information in a way that is really usable. Unless you have a clear idea how you plan to use it, I would not hold up the whole project while you think about it.

Yea Steve, I have included the data in a report or two but don't really use it for much analysis. I do use the info in conjunction with my bet table for wager reports though.

Bill

K9Pup
02-01-2007, 06:48 PM
Actually, would you even need multiple columns? You could just assign each wager an ID#, add a "wager#" field, and then the data for one wager.

This isn't really wager data, it is the payoff data from the charts. For me, at least, I would have multiple columns for dead heats. But in reality I don't worry about DHs. So my table has columns like WIN1, PLACE1, SHOW1, PLACE2, SHOW2, PLACE3, QUPay, QUFin, EXPay, EXFin, TRIPay, TRIFin, etc.

So if I want to test wagers I match up on track,date,race and compare my wager to the finish fields.
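K9Pup's wide one-row-per-race layout makes wager testing a simple key lookup plus a finish comparison. A rough sketch (a dict standing in for the table, field names as he lists them, the grading logic and sample payoffs invented for illustration):

```python
# One row per race, one column per bet type; grading a wager is
# a lookup on (track, date, race) and a compare against the *Fin field.
payouts = {
    ("WON", "2007-02-01", 3): {"WIN1": 9.40, "EXFin": "4-7", "EXPay": 54.60},
}

def grade_exacta(track, date, race, my_combo, cost=2.0):
    row = payouts.get((track, date, race))
    if row is None or row["EXFin"] != my_combo:
        return -cost                 # bet lost: out the ticket cost
    return row["EXPay"] - cost       # net return on a $2 ticket

print(round(grade_exacta("WON", "2007-02-01", 3, "4-7"), 2))  # 52.6
print(grade_exacta("WON", "2007-02-01", 3, "7-4"))            # -2.0
```

The trade-off versus Rich's one-row-per-payoff table is visible here: this version is a single lookup, but a dead heat would need extra EXFin2/EXPay2 columns rather than extra rows.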

sjk
02-01-2007, 06:55 PM
Yea Steve, I have included the data in a report or two but don't really use it for much analysis. I do use the info in conjunction with my bet table for wager reports though.

Bill

Hi Bill,

Hope things are going well and you will be coming to Kee this spring.

SK

BillW
02-01-2007, 06:57 PM
Hi Bill,

Hope things are going well and you will be coming to Kee this spring.

SK
:ThmbUp:

raybo
02-01-2007, 08:24 PM
Do you plan to keep trainer, owner and jockey data in the same table? Or separate tables for each? Is the data for each basically the same?

I'm not sure. If I wanted to look at jockey/trainer records, as a team, would I be better off having a trainer table and a jockey table? Same with owner/trainer records. I definitely want jockey/trainer/owner names, at least, in the HORSES table with his today's race info.

Trainer data includes current meet, current year, and previous year: starts, wins, places, shows. Also includes the $2 ROI for current year and previous year. In addition, there are also 6 Key Trainer Stats categories with starts, win%, ITM% and $2 ROI for each of the 6 categories.

Jockey data includes the same info as trainer except for the 6 key trainer stats, of course. Jockey data also includes his turf starts, wins, places, shows, ROI and earnings broken out from his overall stats.

All that's available for the owner is name and silks.

raybo
02-01-2007, 09:10 PM
raybo, I mentioned this before. I have an Excel program that will do this (not Excel 2007). It puts all fields from the data files on 6 rows on one sheet, 256 or fewer columns per row. If you want to stay in Excel and do some pre-processing before you import to Access or another database, you may find it useful.

I calculate factors from each runner's 10 lines and consolidate those 10 lines into a hundred-plus factors. Although I have set it up to output those 10 lines as raw info to be imported into another program.

If you want to set up tables on 5 or 6 sheets to be imported into Access, you would just reference the raw data. It will run in batch mode. You put data files in one folder, result files in another. Will process 100+ cards at a time automatically.

hcap,

Got your email but haven't had a chance to look at it yet, been busy all day. Just now able to sit down for a few minutes and do what I want. I'll definitely get busy on this thing this weekend, for sure.

I'm reading all these posts and trying to assimilate, analyze and prioritize all the info so I at least get off to a reasonably efficient start. As is usually the case, for me anyway, I'll probably end up doing what I think makes sense as I go along and change things that don't work. This is the way I learned Excel, even after reading everything I could and asking questions, I ended up jumping in and finding out how to do things myself.

Thanks for what you sent me, I am sure I will use it rather than Infotran. Almost anything has to be better than that DOS dinosaur, LOL!

raybo
02-01-2007, 10:30 PM
Here are some goals I hope to be able to accomplish with my database, for those of you who are participating in this project:

1. Establish accurate relationships between tracks concerning surface speed, class of animals, and superfecta payouts vs. pool sizes (what approximate % of the pool can I expect to get when I hit? Are there many good super players at a particular track or just a few lucky ones? How often, if at all, can I get the whole pool at a particular track?), and how those animals tend to perform when shipping from one track to another.

2. Be able to better adjust for distance-to-distance and surface-to-surface changes in order to see form cycles more easily and widen my choices for running line selections. (Finding races at the distance and surface is not always easy to do with all the moving around that is going on today.)

3. Test formulas against a sample of races large enough to be representative of the true value of the formulas, but small enough to not dilute the results too much.

4. Be able to better see how the pace will set up, and whether this pace structure will advantage or disadvantage higher-priced horses. (I hate low-priced animals that look like they will handle the pace well.)

5. Analyze what is proper coverage on my super tickets according to my estimated minimum expected payout. (If I can't separate 2 or 3 horses on the win line, will the minimum payout warrant playing them all? Can I afford to add higher-priced non-contenders for ITM to the bottom of the ticket?)

I'm sure I will find more ways to use the DB as I become familiar with what is possible with it.

The ability to run my numbers against a large database of similar types of races at 5 or 6 other tracks should enable me to make minor adjustments to the numbers for each of those tracks and, therefore, widen the opportunities for wagering at different times of the year. Also, I should be able to adjust the numbers as things change by keeping the DB updated with more recent races. Nothing remains entirely constant in this game.

I hope that others will gain knowledge of DB'ing as this thing evolves. I know there are many members who would love to be able to access a DB that is tailored to their needs and not according to someone else's beliefs or prejudices.

I, for one, will publish all tables, queries, forms, etc., as they are developed and will offer, at no cost, the complete DB, upon completion, to any who want to use it as is or as a guide for developing their own. (minus my formulas, of course.)

sjk
02-02-2007, 07:21 AM
You will probably run into trouble with your data supplier if you make the database available to others inclusive of data.

K9Pup
02-02-2007, 07:43 AM
I'm not sure. If I wanted to look at jockey/trainer records, as a team, would I be better having a trainer table and a jockey table? Same with owner/trainer records. I definately want jockey/trainer/owner names, at least, in the HORSES table with his today's race info.



All that's available for the owner is name and silks.

Since the data elements for each is different I would keep them in separate tables. Then you use the jockey/trainer/owner fields in the horses tables to join each when needed.

Another suggestion on design would be to "borrow" from others. You can look at commerical software packages and base your design of tables on theirs. Rich has done lots of work and his design might be a good place to look.

K9Pup
02-02-2007, 07:51 AM
You will probably run into trouble with your data supplier if you make the database available to others inclusive of data.

Raybo, are you planning to use pay or free data? sjk is probably right: if you use pay data you should be careful not to distribute it. You can actually split the data and code into separate DBs if it comes to that.

AFA the things you want to do with the DB the hardest ones will be the impact of super wagers and such. Just be careful with the results, especially when playing small pools.

richrosa
02-02-2007, 12:55 PM
What K9 refers to was my open-source contribution of some time ago. Hundreds downloaded it. I wonder if anyone really used it. They are posted as FREE and are without any support. If you are looking for SQL table constructs and other ideas to start a program, this may be your ticket. Here's the quote from a posting of some time ago.

I have made available a BRIS DRF file parser written in PHP. You can get it for free at https://sourceforge.net/projects/handicappingos. You probably could apply lots of the code in my DRF parser to use against their comma-delimited chart files.

My HandicappingOS tools are released as GPL. They are a bit raw and are part of a commercial project that will be released soon, but they are a really good start.

FYI, the commercial version of HandicappingOS does not use BRIS data. We swapped that out and upgraded the program to take advantage of the much higher quality HDW data.

raybo
02-02-2007, 06:16 PM
You will probably run into trouble with your data supplier if you make the database available to others inclusive of data.

Thanks for the heads-up. The data isn't necessary anyway, the table structures, etc., are what unskilled DB creators, like me, want to know. We already know about the data.

raybo
02-02-2007, 06:24 PM
Since the data elements for each is different I would keep them in separate tables. Then you use the jockey/trainer/owner fields in the horses tables to join each when needed.

Another suggestion on design would be to "borrow" from others. You can look at commerical software packages and base your design of tables on theirs. Rich has done lots of work and his design might be a good place to look.

That's cool, I was leaning that way anyhow.

I've been looking around for DB's to get ideas on the tables, and Rich's is one that I have checked out briefly, need to go back and look closer now that I've asked some questions and know a little more about the subject.

raybo
02-02-2007, 06:42 PM
Raybo, are you planning to use pay or free data? sjk is probably right: if you use pay data you should be careful not to distribute it. You can actually split the data and code into separate DBs if it comes to that.

AFA the things you want to do with the DB the hardest ones will be the impact of super wagers and such. Just be careful with the results, especially when playing small pools.

I'll be using Bris data, .mcp or .drf files; I have used both over the past 10 or so years (ITF prior to that, and DRF print data before that, back in my manual data-entry days). I will probably do as you suggest.

AFA the supers go, I know what I'm doing, just want to explore some ideas I've been wondering about for some time. Believe me, I'm not one to rush to judgement. I handicapped races for several years before placing the first wager. I was in it in the beginning for the challenge, until I came to the realization that there are indeed ways to make a living at it.

This DB thing has been in and out of conscious thought for several years, but as I was doing well without it, I didn't really get serious about it until recently.

raybo
02-02-2007, 06:45 PM
What K9 refers to was my open-source contribution of some time ago. Hundreds downloaded it. I wonder if anyone really used it. They are posted as FREE and are without any support. If you are looking for SQL table constructs and other ideas to start a program, this may be your ticket. Here's the quote from a posting of some time ago.



FYI, the commercial version of HandicappingOS does not use BRIS data. We swapped that out and upgraded the program to take advantage of the much higher quality HDW data.

Thanks, Rich, for the info. I will definitely check it out. Appreciate it.

raybo
02-02-2007, 07:33 PM
What K9 refers to was my open-source contribution of some time ago. Hundreds downloaded it. I wonder if anyone really used it. They are posted as FREE and are without any support. If you are looking for SQL table constructs and other ideas to start a program, this may be your ticket. Here's the quote from a posting of some time ago.



FYI, the commercial version of HandicappingOS does not use BRIS data. We swapped that out and upgraded the program to take advantage of the much higher quality HDW data.

Wow, if no one used it, I'm not surprised. Looks like Greek to me. I can make out the field names at least, that might help in my table constructs. I must say, it looks very impressive, too bad I don't know what it's doing, lol.

Maybe after a while some of it will start making sense to me.

Thanks Rich!

richrosa
02-02-2007, 08:31 PM
It's written in PHP, with SQL targeted for a MySQL database. PHP is not hard, but it is a full-blown programming language.

tupper
02-03-2007, 03:38 AM
I'm hoping to make it open-source (free to use).
Any questions, ask away.
When will you need testers?:)

raybo
02-03-2007, 04:14 AM
It is pretty straightforward how to structure the key datasets which would include a table for race-specific information(date,track,race#,times, conds,etc) and a table for horse-specific info (date,track,race#,name,running lines,jockey,trainer, etc).


It seems to me the payoff information is something of a red herring. It is not clear how to db the information in a way that is really usable. Unless you have a clear idea how you plan to use it, I would not hold up the whole project while you think about it.

sjk,

It appears that you are suggesting putting results data in the Race table, not in a "Results" table?

tupper
02-03-2007, 05:42 AM
I have made available a BRIS DRF file parser written in PHP. You can get it for free at https://sourceforge.net/projects/handicappingos. You probably could apply lots of the code in my DRF parser to use against their comma-delimited chart files.
This code imports Bris DRFs into a MySQL database?

My HandicappingOS tools are released as GPL. They are a bit raw and are part of a commercial project that will be released soon, but they are a really good start.
You'll be selling a finished version of HandicappingOS that runs on local machines?

FYI, the commercial version of HandicappingOS does not use BRIS data. We swapped that out and upgraded the program to take advantage of the much higher quality HDW data.
Are the HDW files better in just speed/pace figures? A lot of the information in Bris files (and other PP formats) is fairly straightforward: Stewball won by a nose; his trainer wins 25% of races; the track was fast, etc.

From the phpBRIS/README file:
Requirements:
PHP version 4 or better
MySQL version 3.21 or better
ADODB - a database abstraction layer - http://php.weblogs.net
I have: PHP 4.3.10; MySQL 14.12 Distrib 5.0.19.
I downloaded ADODB, just need to compile and install it.

INSTALL INSTRUCTIONS (sparse for now)
- install the requirements
- install the database file
- modify index.php for your database configuration
- modify config.inc for your file paths
- bring up your browser and set it to your default URL path
- choose which BRIS file to import
- you should be off to the races!!!
Is the database file phpBRIS/sql/bris.sql? If so, where does one install it? I run Slackware/Vector Linux.

By "modify the index.php," do you mean enter the appropriate value for $db_hostname, $db_user, $db_password and $db_database? I am not sure of what to enter.

I am not sure what I should enter for the paths in config.inc -- one of the examples is a URL -- I have never understood the location of "http://localhost/".

I am not sure where to place the phpBRIS directory.

What is my default URL path? Is it "http://localhost/"?

Thanks!
-Dale

RobinFromIreland
02-03-2007, 06:11 AM
This code imports Bris DRFs into a MySQL database?

Yes.

From the phpBRIS/README file:

I have: PHP 4.3.10; MySQL 14.12 Distrib 5.0.19.
I downloaded ADODB, just need to compile and install it.


I'd also recommended getting the MySQL GUI tools (administrator, query browser etc)


Is the database file phpBRIS/sql/bris.sql? If so, where does one install it? I run Slackware/Vector Linux.

Yes, that is the file to create the tables. If you have downloaded MySQL Query Browser you can simply go Open Script, choose that file and kick it off. Otherwise just use the mysql command line - that's easy enough too; just read the manual.

By "modify the index.php," do you mean enter the appropriate value for $db_hostname, $db_user, $db_password and $db_database? I am not sure of what to enter.

Before you do anything (I'd highly recommend MySQL Administrator for this), create a new schema (database) called handicappingos. Then either use the root user or create a new user (don't forget to give it privileges on the schema too) - either way you need to know the username and password you use to access the DB. When you are changing index.php, hostname = the address of the machine that hosts the DB (it's you: localhost means "this computer" - it's just a moniker for the 127.0.0.1 IP address - so just use "localhost"). Database is "handicappingos", and the user and password are whatever you set when creating the DB.

I am not sure what I should enter for the paths in config.inc -- one of the examples is a URL -- I have never understood the location of "http://localhost/".

See above. http://localhost is just that IP address and port 80 (the default, so no need to specify it). You will need a web server to run all this code in, as this is a web-based application (even though everything runs locally). So I would (highly) recommend using Apache HTTP Server. Set that up according to instructions. Configure PHP to work with Apache (look this up too - I'm on Windows so it's a little different for me). There are loads of tutorials for setting up Apache, PHP and MySQL on a Linux system (so common is the combination that it's referred to as LAMP). You need to put the adodb package in a location that the web server can access too - I just put it in the same folder as the rest of the code (see below).

I am not sure where to place the phpBRIS directory.

What is my default URL path? Is it "http://localhost/"?

Thanks!
-Dale

Yes, default path should be that.

Wherever you install Apache, there will be an htdocs folder; that is the default location for your web files. Put the phpBris directory here and put the adodb folder inside that too. So inside the phpBris folder should be:

adodb, brisfiles, includes, sql folders
license and readme
and index.php, parse_bris.php and getfiles.php

Finally, in config.inc, change UPLOAD_DIR to the absolute path of the directory that contains the brisfiles.

When you open up your browser and go to "http://localhost/phpBRIS/index.php" - you should see some sql and the 3 .DRF files ready to be processed.

If it's blank, check that your UPLOAD_DIR config setting is correct. If there are errors, something probably went wrong in the setup above.

sjk
02-03-2007, 08:39 AM
sjk,

It appears that you are suggesting putting results data in the Race table, not in a "Results" table?

The "race" data that I collect (many of the fields listed above) has a simple stucture because things like conditions, times, are always there. This table will be used all the time to derive things such as variants, pars etc.

The payoff information will not come into your future handicapping. It will only be used if you have a specific study in mind. The number of payoff items and the format of the winning combination(s) is so varied that I would keep it apart from the race information.

I have chosen to ignore the payoff information. If I did try to collect it, I would probably collect it as a single string for each race and try to figure out how to parse it for the studies I needed to do. Some of the elements are more easily dealt with than others, and there is no value in figuring out the hard ones if this is something that will go unused.

Of course it is possible that your data supplier has already parsed the payoffs for you in which case you will want to preserve whatever structure he has given it.
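sjk's "single string per race" idea could work roughly like this. The string format below is entirely invented for illustration (a real feed would dictate its own delimiters); the point is that the raw text is cheap to store and only gets parsed when a study needs it:

```python
# Stash the raw payoff text per race, parse on demand.
raw = "EX;4-7;54.60|TRI;4-7-2;301.80|SUP;4-7-2-9;1,245.00"

def parse_payoffs(raw):
    """Split 'wager;combo;pay' chunks into a dict keyed by wager type."""
    out = {}
    for chunk in raw.split("|"):
        wager, combo, pay = chunk.split(";")
        out[wager] = (combo, float(pay.replace(",", "")))
    return out

print(parse_payoffs(raw)["SUP"])  # ('4-7-2-9', 1245.0)
```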

raybo
02-03-2007, 08:45 AM
This code imports Bris DRFs into a MySQL database?


You'll be selling a finished version of HandicappingOS that runs on local machines?


Are the HDW files better in just speed/pace figures? A lot of the information in Bris files (and other PP formats) is fairly straightforward: Stewball won by a nose; his trainer wins 25% of races; the track was fast, etc.

From the phpBRIS/README file:

I have: PHP 4.3.10; MySQL 14.12 Distrib 5.0.19.
I downloaded ADODB, just need to compile and install it.


Is the database file phpBRIS/sql/bris.sql? If so, where does one install it? I run Slackware/Vector Linux.

By "modify the index.php," do you mean enter the appropriate value for $db_hostname, $db_user, $db_password and $db_database? I am not sure of what to enter.

I am not sure what I should enter for the paths in config.inc -- one of the examples is a URL -- I have never understood the location of "http://localhost/".

I am not sure where to place the phpBRIS directory.

What is my default URL path? Is it "http://localhost/"?

Thanks!
-Dale


LOL, I run into this stuff every time I try to get instructions from my brother-in-law, an electrical engineer with NASA experience. People like that have a very hard time thinking "down" to technical levels below their own.

raybo
02-03-2007, 08:49 AM
LOL, I run into this stuff every time I try to get instructions from my brother-in-law, an electrical engineer with NASA experience. People like that have a very hard time thinking "down" to technical levels below their own.

This is a main reason for starting this thread, and my intention to make it public to any one interested, and my intention to make it understandable to anyone capable of opening Excel and Access. Most 4 year olds can do that now.

raybo
02-03-2007, 09:26 AM
The "race" data that I collect (many of the fields listed above) has a simple stucture because things like conditions, times, are always there. This table will be used all the time to derive things such as variants, pars etc.

The payoff information will not come into your future handicapping. It will only be used if you have a specific study in mind. The number of payoff items and the format of the winning combination(s) is so varied that I would keep it apart from the race information.

I have chosen to ignore the payoff information. If I did try to collect it, I would probably collect it as a single string for each race and try to figure out how to parse it for the studies I needed to do. Some of the elements are more easily dealt with than others, and there is no value in figuring out the hard ones if this is something that will go unused.

Of course it is possible that your data supplier has already parsed the payoffs for you in which case you will want to preserve whatever structure he has given it.


So basically, you have only 2 tables in your DB? If so, what kind of difficulties do you run into when querying the DB? Can you extract any combination of data subjects that, in your opinion, would ever come up in any handicapper's "wish list"? If so, what would be the purpose of going to all the extra work of creating several more tables and data separations for importation into the DB?

For those who don't know, what exactly does a "parser" do? Is Infotran a parser? It takes a Bris comma-delimited data file and assigns each field of data to a specific order in Excel by means of a "control" file that the user creates, although I think there is a control file that comes with the download that will extract the running lines for each horse, I had to modify it to get the "race" info and "workout" info, also.

I agree that the "payoff" is secondary to the main project and probably needs to reside in a separate table, simply because of its infrequent/specific use.

The Bris .xrd data files for "results" include the payoff info, pools, etc. in a comma-delimited format like the race data. I made a separate control file for Infotran to get this data into my Excel spreadsheet.

richrosa
02-03-2007, 09:32 AM
Thanks Robin,

You answered Tupper's questions perfectly.


You'll be selling a finished version of HandicappingOS that runs on local machines?

I do sell a finished version of HandicappingOS, which is the first full-blown handicapping program available via the web. I hear DRF is going to follow me and release Formulator this way too. You can find it at www.handicappingos.com

Are the HDW files better in just speed/pace figures? A lot of the information in Bris files (and other PP formats) is fairly straightforward: Stewball won by a nose; his trainer wins 25% of races; the track was fast, etc.

Not to get into a separate thread, as it's well documented on this board, but HDW data is superior in many ways, not just pace and speed figures. The added value also comes from things such as running style designations, break rank projections, a better power number, superior track-to-track adjustments, a better variant, and overall much more error-free, complete data than is possible with BRIS or TSN data.

If you have any more questions on the open-source stuff, I'm sure Robin or I will get them answered. It does assume a certain amount of expertise using PHP and MySQL on either Windows or, especially, Linux.

sjk
02-03-2007, 10:00 AM
Raybo,

I have about 250 tables in the database but all of the information as it initially comes in from the charts goes into the two tables discussed above.

All of the other tables are derived from that information or have been created to help with the use/analysis of that information.

raybo
02-03-2007, 10:18 AM
Raybo,

I have about 250 tables in the database but all of the information as it initially comes in from the charts goes into the two tables discussed above.

All of the other tables are derived from that information or have been created to help with the use/analysis of that information.

I see, 250 tables, huh? Hmmmm.

How are you getting the data into the DB?

sjk
02-03-2007, 10:26 AM
I subscribe to the HDW data through HSH and import comma-delimited files into the database.

That is just what I happen to be doing now. For 10+ years I parsed the Equibase charts until they made that impossible.

Once you decide where you want to source your data, importing it into the db will probably be relatively easy.

raybo
02-03-2007, 10:43 AM
I subscribe to the HDW data through HSH and import comma-delimited files into the database.

That is just what I happen to be doing now. For 10+ years I parsed the Equibase charts until they made that impossible.

Once you decide where you want to source your data, importing it into the db will probably be relatively easy.

It appears that in order to import .drf or .mcp or .xrd files by way of the Access import function those files must be put in a different .xxx format. If so, I will still have to use an external "parser?" like Infotran. Or could I just change the file type to one that the DB will import?

sjk
02-03-2007, 11:08 AM
I am not familiar with those file types. Probably going to need to translate them into delimited text files.

BillW
02-03-2007, 12:18 PM
I am not familiar with those file types. Probably going to need to translate them into delimited text files.

Those are already .csv files (I forget whether they are zipped or not?). The suffix isn't significant in this case. :) BTW raybo, I believe Infotran simply splits .csv files up into smaller .csv files. I believe Tom uses/has used it - ask him to make sure.

Bill

raybo
02-03-2007, 12:50 PM
Those are already .csv files (I forget whether they are zipped or not?). The suffix isn't significant in this case. :) BTW raybo, I believe Infotran simply splits .csv files up into smaller .csv files. I believe Tom uses/has used it - ask him to make sure.

Bill

All Bris programmer's data files are comma-delimited files with a specific suffix depending on which file you download, .drf are the $1 file, .mcp are for the MultiCap software but can be used in any way a comma-delimited file can be used, .xrd are the exotic results files. They are of the same format as TSN comma-delimited files.

All Infotran does is rearrange the fields in the comma-delimited files so they are in the order you want them in for entry into specific cell locations in Excel. The result is a .txt file that can be imported by Excel. I have tried importing the raw Bris file into Access, but Access, when browsing for files, only offers suffixes specific to Access and won't show the .drf, .mcp, .xrd, etc. files. Would changing the .drf or .mcp suffix to .xls or one of the Access suffixes work?

raybo
02-03-2007, 01:03 PM
I asked earlier, what is a parser? I still don't know.

richrosa
02-03-2007, 01:06 PM
A parser takes data structured in one format and re-structures it into another format. In my offering, the program takes BRIS comma-delimited files and "parses" them into my database format.
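To make that concrete for anyone following along, here is a minimal sketch in Python of what "parsing" means in this sense. This is not Infotran or richrosa's program; the field names, positions, and types are invented for illustration (real BRIS files have hundreds of positional fields):

```python
# A parser in miniature: take one comma-delimited record and
# re-structure it into named, typed fields.
# The positions, names, and types below are hypothetical.
FIELD_MAP = [("track", str), ("race_no", int), ("horse", str), ("odds", float)]

def parse_record(line):
    """Split a comma-delimited record and map each field to a name and a type."""
    raw = line.strip().split(",")
    return {name: cast(value) for (name, cast), value in zip(FIELD_MAP, raw)}

record = parse_record("SAR,7,Stewball,4.50")
```

The record that started life as one flat line of text can now be queried by name, which is the whole point of re-structuring it before it goes into a database.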

raybo
02-03-2007, 01:50 PM
A parser takes data structured in one format and re-structures it into another format. In my offering, the program takes BRIS comma-delimited files and "parses" them into my database format.

By doing what? Rearranging the data into correct order and removing unwanted data? That's what Infotran does.

K9Pup
02-03-2007, 01:55 PM
All Infotran does is rearrange the fields in the comma-delimited files so they are in the order you want them in for entry into specific cell locations in Excel. The result is a .txt file that can be imported by Excel. I have tried importing the raw Bris file into Access, but Access, when browsing for files, only offers suffixes specific to Access and won't show the .drf, .mcp, .xrd, etc. files. Would changing the .drf or .mcp suffix to .xls or one of the Access suffixes work?

Yeah, if the file is truly comma-separated then change the suffix to CSV and Access will be able to find it. But you will still have the problem of the 255-column limit.
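The usual workaround for that limit, as the thread goes on to discuss, is to split each wide record into several narrower column chunks, each small enough for one table. A rough Python sketch of the idea (illustrative only, not taken from any program mentioned here):

```python
MAX_COLS = 255  # Access's per-table field limit

def split_wide_rows(rows, max_cols=MAX_COLS):
    """Split wide CSV rows into column chunks, each narrow enough
    to import as one Access table."""
    width = len(rows[0])
    return [
        [row[start:start + max_cols] for row in rows]
        for start in range(0, width, max_cols)
    ]

# A toy 600-column record splits into chunks of 255, 255, and 90 columns.
rows = [[str(i) for i in range(600)]]
chunks = split_wide_rows(rows)
```

Each chunk would then be imported as its own table, keyed back together by whatever race/horse identifier the data carries.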

richrosa
02-03-2007, 01:57 PM
I don't know Infotran. The program I have contributed "parses" the data into multiple database tables in a normalized format. It does all the heavy lifting for you to get a good import into a real RDBMS. My contribution does not take flat-file data and transform it into more flat-file data. It does a lot more than that.

raybo
02-03-2007, 03:40 PM
Yeah, if the file is truly comma-separated then change the suffix to CSV and Access will be able to find it. But you will still have the problem of the 255-column limit.

Yeah, you're right, I get "Type Conversion" errors in many, many fields. Well, it looks like I'm going to have to write several control files for Infotran to separate the data into tables with fewer than 256 fields and put them into Excel before importing into Access. I wanted to get away from using Infotran, but it appears that is not going to work. I sure wish I knew some other way to do this, some way that I can understand. I have had a couple of guys send me stuff that supposedly does this, but I can't make heads or tails of any of it. You would think that with all the people here who use Bris or TSN data files and also use Access, someone would have solved this problem by now.

Tom
02-03-2007, 04:22 PM
All Bris programmer's data files are comma-delimited files with a specific suffix depending on which file you download, .drf are the $1 file, .mcp are for the MultiCap software but can be used in any way a comma-delimited file can be used, .xrd are the exotic results files. They are of the same format as TSN comma-delimited files.

All Infotran does is rearrange the fields in the comma-delimited files so they are in the order you want them in for entry into specific cell locations in Excel. The result is a .txt file that can be imported by Excel. I have tried importing the raw Bris file into Access, but Access, when browsing for files, only offers suffixes specific to Access and won't show the .drf, .mcp, .xrd, etc. files. Would changing the .drf or .mcp suffix to .xls or one of the Access suffixes work?

I name the output files from Infotran *.txt, then Access will import them no problem.

K9Pup
02-03-2007, 04:26 PM
Yeah, you're right, I get "Type Conversion" errors in many, many fields. Well, it looks like I'm going to have to write several control files for Infotran to separate the data into tables with fewer than 256 fields and put them into Excel before importing into Access. I wanted to get away from using Infotran, but it appears that is not going to work. I sure wish I knew some other way to do this, some way that I can understand. I have had a couple of guys send me stuff that supposedly does this, but I can't make heads or tails of any of it. You would think that with all the people here who use Bris or TSN data files and also use Access, someone would have solved this problem by now.

I don't think the type conversion errors are because of the number of columns; it must be something else.

The files can be imported with some code. Several months ago I started on a project to import data from PCS-type CSV files. It isn't really finished, but it could give you a start if you are interested. The code uses a function I found that breaks out fields from a CSV record. If you're interested, let me know. I might even try to add some comments to the code so you have some idea what it is trying to do.

raybo
02-03-2007, 06:17 PM
I don't think the type conversion errors are because of the number of columns; it must be something else.

The files can be imported with some code. Several months ago I started on a project to import data from PCS-type CSV files. It isn't really finished, but it could give you a start if you are interested. The code uses a function I found that breaks out fields from a CSV record. If you're interested, let me know. I might even try to add some comments to the code so you have some idea what it is trying to do.

K9Pup,

I figure it's because, when it was being imported, it was row upon row of data of different types in the same columns. There was no way to specify data types while importing the whole file, so I didn't specify any. Someone else here said they dumped the whole file into Access and then broke it up into separate tables within Access; that was what I was hoping to be able to do.

At this point I'd welcome what you have worked up. Are you using VB to write the code for it? I'm a little familiar with that language, having written some Excel macros and having studied the book on VB 6.0.

Send it if you don't mind. My yahoo email icon under my
name on this post should get it to me.

Thanks

raybo
02-04-2007, 10:51 AM
I don't think the type conversion errors are because of the number of columns; it must be something else.

The files can be imported with some code. Several months ago I started on a project to import data from PCS-type CSV files. It isn't really finished, but it could give you a start if you are interested. The code uses a function I found that breaks out fields from a CSV record. If you're interested, let me know. I might even try to add some comments to the code so you have some idea what it is trying to do.

K9Pup,

Thanks for the send. Is the data entry box for the path to the data files? I haven't tried putting that in there yet before asking the question.

BTW, after a long phone conversation with hcap concerning how he tackled this problem, 1400 fields, I think his approach is valid even though it involves sending the data to Excel first before importing to Access in multi-table form, to address the 256-field limitation. I'm going to do some work with it and try to get it to do both: prepare the data for importation to Access and, at the same time, get the data into my own handicapping spreadsheet, rather than using the DOS program "Infotran". His method loads a card in about 7 seconds or so, both race data and results data, or can be used as a "daily", without results, for handicapping. He spoke of your prowess with VB code and was very complimentary. I think your approach is a much cleaner, more polished, and perhaps more efficient method, and I look forward to investigating it further.

hcap impressed me also, having explained in depth much of his method. He is certainly more knowledgeable of advanced Excel techniques than I am, and I thought I was pretty good. Most of my prowess in Excel is in writing formulas and automation via macros, both recorded/looped, and thrashing about finding code for the things not recordable. Also, I pride myself on the formatting/visual clarity and definition of my various handicapping blocks.

That said, I appreciate your participation in this effort, for myself and any others who may profit from this project. I know this will be a long process but there are many out there who are willing to put in the time if they know that there is an understandable way to develop a viable DB on their own.

Thanks again!!

K9Pup
02-04-2007, 11:50 AM
K9Pup,

Thanks for the send. Is the data entry box for the path to the data files? I haven't tried putting that in there yet before asking the question.

Yeah. Just specify the full path and file name. You can change the code to default to a certain folder, etc.


BTW, after a long phone conversation with hcap concerning how he tackled this problem, 1400 fields, I think his approach is valid even though it involves sending the data to Excel first before importing to Access in multi-table form, to address the 256-field limitation.


Yeah his method is very valid. He uses code to break out the CSV records.


I'm going to do some work with it and try to get it to do both: prepare the data for importation to Access and, at the same time, get the data into my own handicapping spreadsheet, rather than using the DOS program "Infotran". His method loads a card in about 7 seconds or so, both race data and results data, or can be used as a "daily", without results, for handicapping. He spoke of your prowess with VB code and was very complimentary. I think your approach is a much cleaner, more polished, and perhaps more efficient method, and I look forward to investigating it further.

hcap impressed me also, having explained in depth much of his method. He is certainly more knowledgeable of advanced Excel techniques than I am, and I thought I was pretty good. Most of my prowess in Excel is in writing formulas and automation via macros, both recorded/looped, and thrashing about finding code for the things not recordable. Also, I pride myself on the formatting/visual clarity and definition of my various handicapping blocks.

Yeah hcap definitely knows what he is doing. As with just about any programming project this one has many solutions. The "best" one is the one that work and is easiest for you to understand and maintain.


That said, I appreciate your participation in this effort, for myself and any others who may profit from this project. I know this will be a long process but there are many out there who are willing to put in the time if they know that there is an understandable way to develop a viable DB on their own.

Thanks again!!

No problem!! Good luck with your project!!

raybo
02-04-2007, 12:38 PM
Yeah. Just specify the full path and file name. You can change the code to default to a certain folder, etc.

Well, darnit. I can't get it to work. I made a folder in C: and named it K9Pup, then put the data file you sent with the program in it. Then went to the data entry box and entered: C:\K9Pup, then clicked "Read Program File", but got an error "file not found". ??

raybo
02-04-2007, 12:42 PM
Yeah. Just specify the full path and file name. You can change the code to default to a certain folder, etc.

Well, darnit. I can't get it to work. I made a folder in C: and named it K9Pup, then put the data file you sent with the program in it. Then went to the data entry box and entered: C:\K9Pup, then clicked "Read Program File", but got an error "file not found". ??

Oh, I just saw that I must enter the file name too, not just the folder. Duh!

raybo
02-04-2007, 12:45 PM

Oh, I just saw that I must enter the file name too, not just the folder. Duh!

Entered : c:\k9pup\sar0813.pcs and still didn't work.

K9Pup
02-04-2007, 12:46 PM

Entered : c:\k9pup\sar0813.pcs and still didn't work.

No error messages?????

raybo
02-04-2007, 12:51 PM

No error messages?????

Yes, same error. Oops, forgot to put DB on the end of the folder, hold on.

raybo
02-04-2007, 12:56 PM

Yes, same error. Oops, forgot to put DB on the end of the folder, hold on.

It's "reading program" now, how long should it take to read the file?

K9Pup
02-04-2007, 01:15 PM

It's "reading program" now, how long should it take to read the file?

5 or 6 seconds.

raybo
02-04-2007, 02:12 PM

5 or 6 seconds.

It's locking up. Hour glass the whole time. Must be something wrong somewhere.

K9Pup
02-04-2007, 03:06 PM

It's locking up. Hour glass the whole time. Must be something wrong somewhere.


You may have to kill Access from Windows task manager. Then if you know how to get into the module editor, go to ReadRoutines module. Click on Debug on the menu bar and select Compile. See if you get errors there.

raybo
02-04-2007, 03:38 PM


You may have to kill Access from Windows task manager. Then if you know how to get into the module editor, go to ReadRoutines module. Click on Debug on the menu bar and select Compile. See if you get errors there.

I think I did it right: shut it down, opened it back up, selected modules, opened ReadRoutines, clicked Debug/Compile horses. Nothing appeared to happen when I did the compile; I'm assuming nothing was supposed to, that you can see anyway. Tried to read the program again; same thing, it just hangs up and Access becomes non-responsive with the hourglass showing constantly as it's "Reading Program".

K9Pup
02-04-2007, 05:20 PM

I think I did it right: shut it down, opened it back up, selected modules, opened ReadRoutines, clicked Debug/Compile horses. Nothing appeared to happen when I did the compile; I'm assuming nothing was supposed to, that you can see anyway. Tried to read the program again; same thing, it just hangs up and Access becomes non-responsive with the hourglass showing constantly as it's "Reading Program".

The nothing happening tells me there weren't any compile errors.

Open the module again. In the code window find the line that says:
InputFileNo = FreeFile()
click in the gray bar to the left of this line. You should see a solid circle appear. This is setting a breakpoint there. Leave this open, go back to the Access screen. Open the form and try to read the program. You SHOULD see it pop up the code window again, with the above line selected. Use the F8 key to single step through the code until it hangs again. Tell me where it is.

raybo
02-04-2007, 05:21 PM

The nothing happening tells me there weren't any compile errors.

Open the module again. In the code window find the line that says:
InputFileNo = FreeFile()
click in the gray bar to the left of this line. You should see a solid circle appear. This is setting a breakpoint there. Leave this open, go back to the Access screen. Open the form and try to read the program. You SHOULD see it pop up the code window again, with the above line selected. Use the F8 key to single step through the code until it hangs again. Tell me where it is.

Ok, be back in a few.

raybo
02-04-2007, 05:40 PM

Ok, be back in a few. It stepped through the rest of that sub and stepped into the next one:

Public Function SplitWithQualifiers(ByVal SourceText As String, _
ByVal TextDelimiter As String, ByVal TextQualifier As String, _
Optional ByVal TreatMultipleDelimitersAsOne As Boolean) As String()
Dim strTemp() As String, strRes() As String, i As Long, J As Long, _
A As String, B As String, blnStart As Boolean

and loops in this:

For i = 0 To UBound(strTemp)
J = InStr(1, strTemp(i), TextQualifier, vbTextCompare)
If J Then
A = Replace(strTemp(i), TextQualifier, "")
Select Case strTemp(i)
Case TextQualifier & A & TextQualifier ' "xxx"
B = B & A & vbCrLf
blnStart = False
Case TextQualifier & A ' "xxx
B = B & A & TextDelimiter
blnStart = True
Case A ' xxx
B = B & A & TextDelimiter
blnStart = False
Case A & TextQualifier ' xxx"
B = B & A & vbCrLf
blnStart = False
End Select
Else
If blnStart Then
B = B & strTemp(i) & TextDelimiter
Else
B = B & strTemp(i) & vbCrLf
End If
End If
Next i
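For comparison, the hand-rolled loop above is reassembling fields whose text qualifiers (quotes) contain the delimiter itself. A Python sketch of the same technique using the standard csv module, not a translation of K9Pup's code, just the equivalent behavior:

```python
import csv
import io

def split_with_qualifiers(line, delimiter=",", qualifier='"'):
    """Split one delimited record, keeping qualified (quoted) fields intact
    even when they contain the delimiter -- what the VBA loop does by hand."""
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quotechar=qualifier)
    return next(reader)

# 'Smith, John' survives as one field despite the embedded comma.
fields = split_with_qualifiers('SAR,"Smith, John",7')
```

This is why a naive split on commas falls apart on chart data: any quoted field with an embedded comma gets broken in two unless the qualifiers are honored.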

K9Pup
02-04-2007, 05:49 PM
It stepped through the rest of that sub and stepped into the next one:

Public Function SplitWithQualifiers(ByVal SourceText As String, _
ByVal TextDelimiter As String, ByVal TextQualifier As String, _
Optional ByVal TreatMultipleDelimitersAsOne As Boolean) As String()
Dim strTemp() As String, strRes() As String, i As Long, J As Long, _
A As String, B As String, blnStart As Boolean

and loops in this:

For i = 0 To UBound(strTemp)
J = InStr(1, strTemp(i), TextQualifier, vbTextCompare)
If J Then
A = Replace(strTemp(i), TextQualifier, "")
Select Case strTemp(i)
Case TextQualifier & A & TextQualifier ' "xxx"
B = B & A & vbCrLf
blnStart = False
Case TextQualifier & A ' "xxx
B = B & A & TextDelimiter
blnStart = True
Case A ' xxx
B = B & A & TextDelimiter
blnStart = False
Case A & TextQualifier ' xxx"
B = B & A & vbCrLf
blnStart = False
End Select
Else
If blnStart Then
B = B & strTemp(i) & TextDelimiter
Else
B = B & strTemp(i) & vbCrLf
End If
End If
Next i

Ok, let's move the breakpoint. Click on the previous breakpoint to remove it. Then place one at the RaceSetupRST.AddNew line. That is where it starts adding records.

If you want we can continue this using the message function of the forum???? So others don't have to read all these messages???

raybo
02-04-2007, 05:52 PM

Ok, let's move the breakpoint. Click on the previous breakpoint to remove it. Then place one at the RaceSetupRST.AddNew line. That is where it starts adding records.

If you want we can continue this using the message function of the forum???? So others don't have to read all these messages???

Ok, sounds like the right thing to do.

BillW
02-04-2007, 05:53 PM

Ok, let's move the breakpoint. Click on the previous breakpoint to remove it. Then place one at the RaceSetupRST.AddNew line. That is where it starts adding records.

If you want we can continue this using the message function of the forum???? So others don't have to read all these messages???

War Room may facilitate things too :)

raybo
02-05-2007, 12:03 PM
Just an update for those of you who are interested in the DB project.

After conversing with a few people concerning the best way to get CSV files into Access, I have decided to use a program that was sent to me by K9Pup to create the tables and put the data into them. It will require some changes to incorporate all the data from the csv files, which is my goal: getting all data into the DB so the user has the full range of choices concerning types of data he/she wants to explore.

Also, on a side note, I will be using an Excel program supplied by hcap, at least temporarily, for getting my handicapping data into Excel, rather than using the antiquated "Infotran" program. So, as you can see, I have 2 projects going at the same time. I am a multi-tasker BTW.;)

raybo
02-05-2007, 12:38 PM
oops, double post, my bad.

Gone :)

JackS
02-05-2007, 02:57 PM
IMO, beyond any DB, which should remain pristine with numbers as accurate as can be produced, handicappers should also be able to get heavily into subjectivity.
This will require a second program which can incorporate jockey, trainer, PPs, and anything else you may be interested in.
I think many people will turn this "subjective" program into an "objective" point of view by using published percentages to gain a final output.
An example is a horse you like that is going to be ridden by a 5% jockey and comes from a 5% stable. These percentages are for the masses and may not accurately reflect the true chances of this horse running well or winning.
In this example, it would be reasonable to expect that jockey/trainer stats would improve 2-3 fold if this horse, jockey, and trainer were to run in every race. Of course, high-percentage jockeys and trainers can at times be viewed just the opposite: a 20+ pct j/t combo with a poor record below 6f would have to be reduced to reflect actual chances. How you would do this would be up to the individual, and hopefully based on something "objective".
Incorporation of these subjective ideas into a third program using the original DB is also possible, and maybe advisable, but should never be used to corrupt our original program. Numbers generated here would not resemble a database and would be a rating method only.
For anyone doing this, the output of each of the three programs would have to be viewed separately and decisions would still have to be made.
The goal here is to produce data and numbers that are not available to everyone else.
One word of caution: this may work satisfactorily for the experienced handicapper; for the rest of us, we may need to improve our basic knowledge beyond basic to get the same results.

raybo
02-06-2007, 06:02 AM
IMO, beyond any DB, which should remain pristine with numbers as accurate as can be produced, handicappers should also be able to get heavily into subjectivity.
This will require a second program which can incorporate jockey, trainer, PPs, and anything else you may be interested in.
I think many people will turn this "subjective" program into an "objective" point of view by using published percentages to gain a final output.
An example is a horse you like that is going to be ridden by a 5% jockey and comes from a 5% stable. These percentages are for the masses and may not accurately reflect the true chances of this horse running well or winning.
In this example, it would be reasonable to expect that jockey/trainer stats would improve 2-3 fold if this horse, jockey, and trainer were to run in every race. Of course, high-percentage jockeys and trainers can at times be viewed just the opposite: a 20+ pct j/t combo with a poor record below 6f would have to be reduced to reflect actual chances. How you would do this would be up to the individual, and hopefully based on something "objective".
Incorporation of these subjective ideas into a third program using the original DB is also possible, and maybe advisable, but should never be used to corrupt our original program. Numbers generated here would not resemble a database and would be a rating method only.
For anyone doing this, the output of each of the three programs would have to be viewed separately and decisions would still have to be made.
The goal here is to produce data and numbers that are not available to everyone else.
One word of caution: this may work satisfactorily for the experienced handicapper; for the rest of us, we may need to improve our basic knowledge beyond basic to get the same results.

The user can decide if a second or third program is required for his/her needs. Some good DB people incorporate all you're talking about in a single DB. Others, including me, may use a second program for formulating other things, such as you suggest. My reasoning for using a second program, at least for the time being, is simply that I have well over 10 years of progressive work in my Excel handicapping program. To replicate that in Access or any other DB would not be a matter of weeks or months, but years. The only reason I can see for a 3rd program, 2nd DB, would be for reasons of overflow capacity concerning the original DB.

I agree that the original data should remain "pristine" somewhere in the DB, but, in my opinion, based upon some study of relational database design, a myriad of things can be done within that DB without "corrupting" the original data.

Again, my thoughts behind this project reflect several years spent looking for a better way of testing new and existing methods. Excel has been my choice of platform for developing what I use for handicapping, while others out there use other programs or pen and paper/calculator. It really doesn't matter how one has developed their current handicapping abilities, or lack thereof; a viable testing method that incorporates thousands of races HAS to be of value for examining and adjusting what one is doing or will do in the future.

I have read many, many posts, in this forum alone, that concerned the posters' wish to have a database of races for investigative analysis.

This is my goal: a basic DB, ready to import Bris or TSN data into (and any other csv-formatted data, with a little modification in the code for fields, etc.), and ready to start running some basic, modifiable queries. Hopefully the user can "hit the ground running", gaining some answers to questions they already have, and, as they become more proficient with its usage, begin to explore more complex, style-specific options. I'm not trying to develop another "HTR" or "HSH", nor anything proprietary or style-specific, just the platform, or template if you wish, that is horse-racing specific and ready to use in whatever manner the user wants. Whatever they get out of it, over and above that, will be largely up to the user's imagination and learning curve.

K9Pup
02-06-2007, 08:25 AM
The user can decide if a second or third program is required for his/her needs. Some good DB people incorporate all you're talking about in a single DB. Others, including me, may use a second program for formulating other things, such as you suggest. My reasoning for using a second program, at least for the time being, is simply that I have well over 10 years of progressive work in my Excel handicapping program. To replicate that in Access or any other DB would not be a matter of weeks or months, but years. The only reason I can see for a 3rd program, 2nd DB, would be for reasons of overflow capacity concerning the original DB.


You should use the tools that do the "job" for you!! Using Excel along with Access is a great combination. I would suggest that you might want to add an additional table to your DB at some point where you can easily store the results of your handicapping process. If your current Excel process creates handicapping "factors" those can be stored in the DB for future research.

I see no need for additional DBs, just additional tables within the original database.



I agree that the original data should remain "pristine" somewhere in the DB, but, in my opinion, based upon some study of relational database design, a myriad of things can be done within that DB without "corrupting" the original data.


I think in most cases the semi-raw data should be stored in DATA tables. Handicapping factors that are calculated from those data should be stored in separate tables. And I would probably save the RAW data files also outside the DB. Then you can always reprocess them if needed.
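The split K9Pup describes, semi-raw data in DATA tables and calculated factors in separate tables, can be sketched roughly like this. Access itself can't be shown here, so this is a Python/sqlite3 illustration, and every table and column name is invented for the example:

```python
import sqlite3

# Hypothetical sketch: raw imported data lives in one table, computed
# handicapping factors in another, keyed back to the raw row. The raw
# row is never modified, so it can always be re-used.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE HorseData (
    HorseID INTEGER PRIMARY KEY,
    HorseName TEXT,
    LastSpeedFig INTEGER)""")
cur.execute("""CREATE TABLE Factors (
    HorseID INTEGER REFERENCES HorseData(HorseID),
    FactorName TEXT,
    FactorValue REAL)""")
cur.execute("INSERT INTO HorseData VALUES (1, 'Fast Fred', 92)")
# A factor calculated from the raw data goes in its own table,
# leaving the original row untouched.
cur.execute("INSERT INTO Factors VALUES (1, 'AdjSpeed', 92 + 5)")
row = cur.execute("""SELECT h.HorseName, f.FactorValue
                     FROM HorseData h JOIN Factors f
                     ON h.HorseID = f.HorseID""").fetchone()
print(row)  # ('Fast Fred', 97.0)
```

The same separation works in Access: if a factor formula changes later, only the factor table is rebuilt; the data tables stay pristine.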

raybo
02-06-2007, 09:30 PM
You should use the tools that do the "job" for you!! Using Excel along with Access is a great combination. I would suggest that you might want to add an additional table to your DB at some point where you can easily store the results of your handicapping process. If your current Excel process creates handicapping "factors" those can be stored in the DB for future research.

I see no need for additional DBs, just additional tables within the original database.




I think in most cases the semi-raw data should be stored in DATA tables. Handicapping factors that are calculated from those data should be stored in separate tables. And I would probably save the RAW data files also outside the DB. Then you can always reprocess them if needed.

I agree. I plan to use Excel as my handicapping platform and Access as an investigative tool and storage medium. I have already decided to add my own numbers in another table for investigation against the database of races.

Concerning the raw data, that's why I asked you the question about showing all the fields somewhere in your program, so that I could use that to create a complete race file. I realize that this might cause a problem regarding the 256-fields-per-table limit. I haven't really checked to see how many actual fields there are in a complete data file yet, but I assume there will be at least 6 tables that would contain the data in field order: 1-256, 257-512, 513-768, etc., up to about 1400 or so. Storing the data in CSV format outside the DB, as I now do, would be the easiest method, I believe. I will probably store my own numbers outside the DB also, as well as in the DB tables for research purposes.
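The splitting raybo describes, one wide record broken into consecutive field groups (1-256, 257-512, ...), is mechanical. A tiny Python sketch, using an invented 10-field record and a chunk size of 4 so the output stays short:

```python
import csv
import io

# Hypothetical one-line Bris-style CSV record. Real files run to ~1400
# fields; here 10 fields and a chunk size of 4 illustrate the idea.
row = next(csv.reader(io.StringIO("f1,f2,f3,f4,f5,f6,f7,f8,f9,f10")))

def chunk(fields, size):
    """Split one wide record into consecutive groups:
    fields 1..size, size+1..2*size, and so on."""
    return [fields[i:i + size] for i in range(0, len(fields), size)]

groups = chunk(row, 4)
print(groups)
# [['f1', 'f2', 'f3', 'f4'], ['f5', 'f6', 'f7', 'f8'], ['f9', 'f10']]
```

Each group would then feed one of the 6 or so tables; with `chunk(row, 256)` a ~1400-field record yields 6 groups.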

raybo
02-06-2007, 10:08 PM
K9Pup,

BTW, I was getting an error concerning one of the fields being "zero length"; I had to go into the field listing and change the properties for that particular field to allow for there sometimes being no data. "Program post position" was the field, I think.

I succumbed and broke out my copies of "Designing Relational Database Systems" from Microsoft Press and "SQL: A Beginner's Guide" by Forrest Houlette, Ph.D. They're making more sense now than the first time I studied them several years ago.

I'm ready to start modifying the code to add additional fields and tables. Yippee!!

Is there a way to include a combo box with a drop-down list for the data files that lets the user select the race card he/she wants to import from the folder? That way, one could leave all the race cards in that same folder and not have to type in a particular card every time, or use the Auto Read Programs button and read the whole folder of cards into the DB.

raybo
02-07-2007, 12:16 AM
Ok, back to jockey/trainer/owner data. My thought was to have a separate table for this info, including the key trainer stats. Should the jockey, trainer, and owner names be in the HorseTable, along with their meet, current year and previous year, starts, wins, places, shows and ROI. Or should I just put jockey, trainer, owner names in the Horse table and put all the other stats for them in the JockeyTrainerTable? This is about the only question concerning tables that I have not answered. I'm leaning toward putting their names in HorseTable and everything else in JockeyTrainerTable.

K9Pup
02-07-2007, 08:15 AM
Concerning the raw data, that's why I asked you the question about showing all the fields somewhere in your program, so that I could use that to create a complete race file. I realize that this might cause a problem regarding the 256-fields-per-table limit. I haven't really checked to see how many actual fields there are in a complete data file yet, but I assume there will be at least 6 tables that would contain the data in field order: 1-256, 257-512, 513-768, etc., up to about 1400 or so. Storing the data in CSV format outside the DB, as I now do, would be the easiest method, I believe. I will probably store my own numbers outside the DB also, as well as in the DB tables for research purposes.

So you would want to store the "raw" data in the DB?? Currently the process breaks out each field from the CSV records, but that doesn't have to be the case. Where the program reads a CSV line you could store the entire line in a table if you want, or any parts of it. But just saving the raw input files would serve the same purpose, I think.
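Storing the whole line as one text field, as K9Pup mentions, is about as simple as it sounds. A hedged Python/sqlite3 sketch (table name and sample line are invented):

```python
import sqlite3

# Sketch: instead of breaking out every field, park the entire raw CSV
# line in a single text column so it can be re-split later if needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RawLines (LineNo INTEGER PRIMARY KEY, Line TEXT)")

line = "CD,20070211,5,Fast Fred,..."  # invented sample record
conn.execute("INSERT INTO RawLines (Line) VALUES (?)", (line,))

# The original record comes back intact and can be re-parsed any time.
stored = conn.execute("SELECT Line FROM RawLines").fetchone()[0]
print(stored.split(",")[3])  # Fast Fred
```

As K9Pup says, keeping the raw files on disk gets you the same reprocessing ability without growing the DB.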

K9Pup
02-07-2007, 08:21 AM
K9Pup,

BTW, I was getting an error concerning one of the fields being "zero length", had to go into the field listing and change the properties for that particular field to allow for there sometimes being no data there, "program post position" was the field, I think.


Is it valid to have no program post position? You may need to specify zero-length fields, default values, etc. for several of the fields. Another thing you can do is some data verification in the import process. Some things can be corrected programmatically. Or you might just add the data and then use a query to list records that may have invalid data.
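The import-then-query verification idea above can be sketched in a few lines. Again this is Python/sqlite3 standing in for Access, with invented names and data:

```python
import sqlite3

# Sketch: load everything first, then run a query that lists the rows
# whose fields look invalid, such as an empty program post position.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Horses (HorseName TEXT, ProgramPost TEXT)")
conn.executemany("INSERT INTO Horses VALUES (?, ?)",
                 [("Fast Fred", "5"), ("No Post Nag", "")])

# The verification query flags records with a missing post position.
bad = conn.execute(
    "SELECT HorseName FROM Horses "
    "WHERE ProgramPost IS NULL OR ProgramPost = ''").fetchall()
print(bad)  # [('No Post Nag',)]
```

In Access the same check would be a saved query the user runs after each import.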


Is there a way to include a combo box with a drop-down list for the data files that lets the user select the race card he/she wants to import from the folder? That way, one could leave all the race cards in that same folder and not have to type in a particular card every time, or use the Auto Read Programs button and read the whole folder of cards into the DB.

Yeah you can add it to the form. You want to do it or me?
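The mechanics behind that combo box are just "list the data files in a fixed folder and let the user pick one." A small Python sketch of the listing part (the .DRF extension and file names are assumptions for the example; the real form would be built in Access/VBA):

```python
from pathlib import Path
import tempfile

# Set up a throwaway folder with a couple of pretend race-card files
# plus one unrelated file, to show the filtering.
folder = Path(tempfile.mkdtemp())
for name in ("CD0211.DRF", "AQU0211.DRF", "notes.txt"):
    (folder / name).touch()

# Only race-card files populate the drop-down list; everything else
# in the folder is ignored.
cards = sorted(p.name for p in folder.glob("*.DRF"))
print(cards)  # ['AQU0211.DRF', 'CD0211.DRF']
```

Binding a list like `cards` to a combo box is then a form-design detail; the user never has to type a file name.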

raybo
02-07-2007, 09:37 PM
So you would want to store the "raw" data in the DB?? Currently the process breaks out each field from the CSV records. But that doesn't have to be. Where the program read a CSV line you could store the entire line in a table if you want. Or any parts of it. But just saving the raw input files would serve the same purpose, I think.

No, I think the CSV files saved outside the DB is the way to go. However, producing tables that contain all fields in the CSV would be one way of importing the complete file into Excel sheets, which some people, including myself, will want to use in conjunction with the DB. At present, another program is required to get the data into Excel; if we can incorporate both into your DB program then we kill 2 birds with one stone. We get the data into the DB for the tables, and at the same time we create tables for all the data that could be exported to Excel sheets in the same field order as the data file format. The user would then only need to reference these sheet cells in their Excel work. I use Infotran currently to get the data into my Excel program, for example, which requires writing a control file for Infotran. The problem is that you can't get all the fields in this way, to my knowledge, because of Excel's 256-column limit. There may be a better way of accomplishing this, I don't know. I was just investigating this problem while looking at your program for the DB and thought that it might be possible to do both tasks at the same time, via your program.

raybo
02-07-2007, 10:00 PM
Is it valid to have no program post position? You may need to specify zero-length fields, default values, etc. for several of the fields. Another thing you can do is some data verification in the import process. Some things can be corrected programmatically. Or you might just add the data and then use a query to list records that may have invalid data.



Yeah you can add it to the form. You want to do it or me?

The post position is not necessarily the saddle-cloth number, and, yes, there are times when the program post position is not available. The card I used had this situation in it and was producing the error. The data file format info from Bris states which fields may be unavailable, so specifying which of these fields may contain no values is the solution, IMO. Missing values for fields that are not specified happen occasionally and should be flagged in some manner, to prevent mistakes in handicapping/research due to bad data from the supplier.

Concerning the combo list box idea, let me tinker with it and if I can't get it going I'll let you know.

I think the Auto Read Program is important too, as some users will like to have the capability to import several cards into the DB at the same time. If you find your old code for this function let me know, I think it would be a good addition to the program.

ps: If I am becoming a pain in the butt, at anytime, please let me know.

raybo
02-08-2007, 02:33 AM
I have written the proposed tables in Wordpad for Bris and TSN comma delimited files including MultiCaps and ProCaps with all the field names, data type, max characters, and comments. I will proof-read them and then post them for those who are interested. Comments/suggestions are welcome. Keep in mind, these were designed to contain ALL the data available from the CSV files for each of the 5 categories: Race, Horse, Jockey/Trainer, Workouts, and Past Performance Running Lines. I will probably need some help with the keys before actually creating the tables in the DB, so, any help would be appreciated. I've studied the theory of candidate keys, primary keys, and foreign keys but have never created tables before.

The Results table will be next. I will write it for Bris and TSN Exotic Results data file.

K9Pup
02-08-2007, 07:28 AM
No, I think the CSV files saved outside the DB is the way to go. However, producing tables that contain all fields in the CSV would be one way of importing the complete file into Excel sheets, which some people, including myself, will want to use in conjunction with the DB. At present, another program is required to get the data into Excel; if we can incorporate both into your DB program then we kill 2 birds with one stone. We get the data into the DB for the tables, and at the same time we create tables for all the data that could be exported to Excel sheets in the same field order as the data file format. The user would then only need to reference these sheet cells in their Excel work. I use Infotran currently to get the data into my Excel program, for example, which requires writing a control file for Infotran. The problem is that you can't get all the fields in this way, to my knowledge, because of Excel's 256-column limit. There may be a better way of accomplishing this, I don't know. I was just investigating this problem while looking at your program for the DB and thought that it might be possible to do both tasks at the same time, via your program.

I would probably suggest you just continue designing the tables as you already have. Make tables that group the data logically as it is in the raw CSV file. You could even store the un-normalized data similar to the race1, race2, etc. format. Then those tables can all be linked in a query; the user can then select which pieces they really need in their Excel. Run the query and paste the results to Excel.
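The linked-query-to-Excel flow K9Pup outlines looks roughly like this. A Python/sqlite3 sketch with invented table and column names; in Access this would be a saved query whose result is pasted or exported to the spreadsheet:

```python
import sqlite3

# Sketch: data stays grouped in tables the way it arrived; one query
# links them and pulls only the columns the Excel user actually needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Race (RaceID INTEGER PRIMARY KEY, "
             "Track TEXT, RaceNo INTEGER)")
conn.execute("CREATE TABLE Horse (RaceID INTEGER, "
             "HorseName TEXT, MLOdds TEXT)")
conn.execute("INSERT INTO Race VALUES (1, 'CD', 5)")
conn.execute("INSERT INTO Horse VALUES (1, 'Fast Fred', '5-2')")

# Select just the pieces needed; the rows can be written out as CSV
# and opened in Excel, in whatever column order the user picks.
rows = conn.execute("""SELECT r.Track, r.RaceNo, h.HorseName
                       FROM Race r JOIN Horse h
                       ON r.RaceID = h.RaceID""").fetchall()
print(rows)  # [('CD', 5, 'Fast Fred')]
```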

K9Pup
02-08-2007, 07:33 AM
Concerning the combo list box idea, let me tinker with it and if I can't get it going I'll let you know.


No problem. I actually found some code (that's how I do my best work) that uses the standard Excel open-file dialog box. If you get stumped or just want the code, I can send it to you and you can figure out where to add it and how to use it.


I think the Auto Read Program is important too, as some users will like to have the capability to import several cards into the DB at the same time. If you find your old code for this function let me know, I think it would be a good addition to the program.

I have it. I will send it to you, again you can try to figure out where to use it.



ps: If I am becoming a pain in the butt, at anytime, please let me know.

LOL. No problem. I find by helping other people do these things that I learn myself.

raybo
02-08-2007, 07:45 PM
I would probably suggest you just continue designing the tables as you already have. Make tables that group the data logically as it is in the raw CSV file. You could even store the un-normalized data similar to the race1, race2, etc. format. Then those tables can all be linked in a query; the user can then select which pieces they really need in their Excel. Run the query and paste the results to Excel.

I think reading all those theory books in the past might be more of a hindrance than a help. They all seem to want to drill into your head that if you don't create a perfect database model prior to creating the schema, then the DB won't work right. So, I have been engrossing myself in all the technicalities of "proper database design," setting up the proper relationships so that when you actually query the DB you get the results you want and they are accurate. Then I read a post from someone, forget who, that says it's OK sometimes to de-normalize. Sheeeesh!!! No wonder people become application programmers instead of database programmers.

I'm in the middle of conceptual data modeling and trying to get it right before moving on to the schema, the physical layout of the tables. I don't want to rush this phase unduly, feeling that some extra time spent now will save me gobs of headaches down the road. But, after toiling over the relationships and how the entities are going to be represented so that the results of queries can be used in other queries, etc., I've about decided to just create the tables and see if the thing works. I think this is basically what you are advising.

raybo
02-08-2007, 07:59 PM
No problem. I actually found some code (that's how I do my best work) that uses the standard Excel open-file dialog box. If you get stumped or just want the code, I can send it to you and you can figure out where to add it and how to use it.


I have it. I will send it to you, again you can try to figure out where to use it.




LOL. No problem. I find by helping other people do these things that I learn myself.

That's great! Yeah, send me the stuff and I'll see if I can get it all working. It seems the longer I think about this thing the more ideas I get for bells and whistles, and I like messing with things like that, the "neat stuff". This project may look nothing like the DB you sent me by the time I'm done messing with it, lol.

I certainly appreciate all the answers you've given me, and hcap, too. Between the 2 of you I'm seeing things that are possible and achievable, even for a non-DB guy like me. I'm enjoying the heck out of most of this, no, not the theory so much as the coding and the possibilities. I've been a technician, designer and craftsman most of my life and I really get a kick out of designing and building things; granted this isn't made of wood, which was my specialty, but it's still "building".

K9Pup
02-09-2007, 08:12 AM
I think reading all those theory books in the past might be more of a hindrance than a help. They all seem to want to drill into your head that if you don't create a perfect database model prior to creating the schema, then the DB won't work right. So, I have been engrossing myself in all the technicalities of "proper database design," setting up the proper relationships so that when you actually query the DB you get the results you want and they are accurate. Then I read a post from someone, forget who, that says it's OK sometimes to de-normalize. Sheeeesh!!! No wonder people become application programmers instead of database programmers.

I'm in the middle of conceptual data modeling and trying to get it right before moving on to the schema, the physical layout of the tables. I don't want to rush this phase unduly, feeling that some extra time spent now will save me gobs of headaches down the road. But, after toiling over the relationships and how the entities are going to be represented so that the results of queries can be used in other queries, etc., I've about decided to just create the tables and see if the thing works. I think this is basically what you are advising.

There is nothing wrong with spending some time to design things the way they should be. But at some point you just have to jump in and do it. What is the worst thing that can happen? Six months down the road you might find your design doesn't work and you have to do it all over again. Heck, I can guarantee that if you stick with it long enough you will probably redo it at some point no matter what.

I think in your case you intend to use the DB for two purposes. One is to store the data in a form that is "relational DB friendly". The other purpose is to store the un-normalized data in a similar form as the raw files, so that they can basically be re-created for other purposes. In the first case you might want to spend the time to normalize and design. In the second case you want to leave the data pretty much the way it comes to you.

K9Pup
02-09-2007, 08:16 AM
I'm enjoying the heck out of most of this, no, not the theory so much as the coding and the possibilities. I've been a technician, designer and craftsman most of my life and I really get a kick out of designing and building things, granted this isn't made of wood which was my specialty, but it's still "building".

I've found that guys that can visualize something and actually build it pick up programming very quickly. Logical thinking is a big part of being a computer programmer.

raybo
02-10-2007, 09:45 AM
I've found that guys that can visualize something and actually build it pick up programming very quickly. Logical thinking is a big part of being a computer programmer.

I think that is probably correct. All the programming material I have read says that you must know what you want to end up with first, before starting. So, if you can visualize the result, you can better understand the steps required to get there.

raybo
02-10-2007, 11:00 AM
If I can get this "attachment" thing to work I'll post the field lists for the 5 DB tables I have so far.

tupper
02-10-2007, 08:28 PM
I have written the proposed tables in Wordpad for Bris and TSN comma delimited files including MultiCaps and ProCaps with all the field names, data type, max characters, and comments.

Wow! Thanks!

You did that in Wordpad? Does Wordpad have block selection? If not, that's a lot of work.

I was planning to do the same in the Kate text editor, using block selection on the Bris Single DRF datastructure page. I figured it would take about 2-3 hours. I was planning to keep it all in one table -- the original, single DRF format, and just let everyone break it up as they please. Can one easily recombine your tables to make one big file that matches the datastructure of the single DRF, MultiCaps and Procaps?


Attached Files: RaceTable.doc (17.3 KB, 9 views), HorseTable.doc (20.1 KB, 5 views), PPsTable.doc (37.9 KB, 4 views), Workouts.doc (19.4 KB, 3 views), JkyTrnOwnTable.doc (23.8 KB, 8 views)

Thanks! Unfortunately, I have been unable to download these -- not sure why. Can you post them on a free file hosting site, like filewind.com or bestsharing.com, and then post the download link(s) here?

-Dale

raybo
02-10-2007, 10:43 PM
Wow! Thanks!

You did that in Wordpad? Does Wordpad have block selection? If not, that's a lot of work.

I was planning to do the same in the Kate text editor, using block selection on the Bris Single DRF datastructure page. I figured it would take about 2-3 hours. I was planning to keep it all in one table -- the original, single DRF format, and just let everyone break it up as they please. Can one easily recombine your tables to make one big file that matches the datastructure of the single DRF, MultiCaps and Procaps?


Thanks! Unfortunately, I have been unable to download these -- not sure why. Can you post them on a free file hosting site, like filewind.com or bestsharing.com, and then post the download link(s) here?

-Dale

The 5 files contain all the fields in the MultiCap/Procaps single file format data file. There are some common fields, Today's date, track, race#,etc., for ID purposes. I suppose you could copy/paste them all into one document. Remember, these are only text documents showing the fields that will go into DB tables.

I don't know what you mean by not being able to download them. If you can view them then just paste them into a text editor or print them out.

raybo
02-10-2007, 10:57 PM
Wow! Thanks!

You did that in Wordpad? Does Wordpad have block selection? If not, that's a lot of work.

I was planning to do the same in the Kate text editor, using block selection on the Bris Single DRF datastructure page. I figured it would take about 2-3 hours. I was planning to keep it all in one table -- the original, single DRF format, and just let everyone break it up as they please. Can one easily recombine your tables to make one big file that matches the datastructure of the single DRF, MultiCaps and Procaps?


Thanks! Unfortunately, I have been unable to download these -- not sure why. Can you post them on a free file hosting site, like filewind.com or bestsharing.com, and then post the download link(s) here?

-Dale

If you can't download them because of the .doc extension I can change it to .txt if that would help.

tupper
02-11-2007, 03:27 AM
The 5 files contain all the fields in the MultiCap/Procaps single file format data file. There are some common fields, Today's date, track, race#, etc., for ID purposes. I suppose you could copy/paste them all into one document. Remember, these are only text documents showing the fields that will go into DB tables.

I haven't decided whether or not to use the standard DRF or the fancy MultiCap/Procap files. Somebody here posted that the MultiCap files include the Bris race ratings and class ratings, which could save a calculation or two. It would be interesting to get opinions here on the quality of these race/class ratings.

If you retained the Bris structure within each table, it should be easy to paste these files together to get the single version (barring any limitations of the text editor one uses).


I don't know what you mean by not being able to download them. If you can view them then just paste them into a text editor or print them out.

If you can't download them because of the .doc extension I can change it to .txt if that would help.

When I click on the link, my Firefox browser switches to a blank page. I think that some sort of authentication is required to download files linked in this forum. I can login with cookies and javascript activated, but I still get a blank page. I doubt that changing the file extensions will help, but I guess it's worth a try.

One of the many free, Internet file hosts will undoubtedly work.

Thanks,
-Dale

BillW
02-11-2007, 03:36 AM
When I click on the link, my Firefox browser switches to a blank page. I think that some sort of authentication is required to download files linked in this forum. I can login with cookies and javascript activated, but I still get a blank page. I doubt that changing the file extensions will help, but I guess it's worth a try.

One of the many free, Internet file hosts will undoubtedly work.

Thanks,
-Dale
It's a suffix issue. When I click on them, FF brings the document up in OOWrite as it would any MS Word .doc file. If you don't have MS Office or OpenOffice installed, you will probably get the blank page. The extension change will probably help. Another fix would be to change the way FF handles .doc files to "download" instead of displaying in the browser.

Bill

raybo
02-11-2007, 08:26 AM
I haven't decided whether or not to use the standard DRF or the fancy MultiCap/Procap files. Somebody here posted that the MultiCap files include the Bris race ratings and class ratings, which could save a calculation or two. It would be interesting to get opinions here on the quality of these race/class ratings.

If you retained the Bris structure within each table, it should be easy to paste these files together to get the single version (barring any limitations of the text editor one uses).


When I click on the link, my Firefox browser switches to a blank page. I think that some sort of authentication is required to download files linked in this forum. I can login with cookies and javascript activated, but I still get a blank page. I doubt that changing the file extensions will help, but I guess it's worth a try.

One of the many free, Internet file hosts will undoubtedly work.

Thanks,
-Dale

I don't have a problem with downloading them through Firefox. I get the dialogue box asking if I want to save to disk or open with ---, etc. Have you tried right clicking the files and saving to your desktop, then opening them from there with a text editor?

raybo
02-11-2007, 08:58 AM
I haven't decided whether or not to use the standard DRF or the fancy MultiCap/Procap files. Somebody here posted that the MultiCap files include the Bris race ratings and class ratings, which could save a calculation or two. It would be interesting to get opinions here on the quality of these race/class ratings.

If you retained the Bris structure within each table, it should be easy to paste these files together to get the single version (barring any limitations of the text editor one uses).


When I created the docs I copied and pasted fields from the original Bris .MCP data file structure listing to create the 5 categories of data. If you combined them into one document you would have to remove the duplicated fields and put them back in field order, if that's the format you want. Or just go to the data file structure, copy the whole list into your text editor, and do with it as you wish. I did a lot of editing to change the field names to names that would be more compatible with the future DB tables, i.e., PPDate and PPRace# (in the PPTable) vs. Today's Date and Today's Race# (in the RaceTable). The purpose of these documents was to set the data that would be included in the 5 categories of data prior to creating the actual DB tables.

The MultiCaps/ProCaps files do contain RR and CR. I don't personally use these 2 pieces of data but included them for those who wish to use them. I suggest that you, too, include them even if you don't currently use them, as you might want to in the future. Also, I will be adding the reserved fields to the actual tables in the DB for future expansion by Bris/TSN. This will save you some time if and when they become populated with data. I have already changed my PPTable doc file to include the Bris "Pace Shape" fields that were recently added by Bris (fields 696-705 and 756-765).

As far as the value of RR (Race Rating) and CR (Class Rating), I haven't found anything meaningful in trying to use them. I'm sure some of the DB gurus here could answer that question for you, concerning any impact value they might have, after running them against their DBs.

The RR includes all horses in the race, so the value of RR becomes skewed by the low horses in the field. The CR is related to the RR, so the same type of inaccuracy would exist for it, too. A better class number, IMO, would just concentrate on the pace of the leaders at the calls and the winner's final figure. Including the whole field in these figures is almost laughable; most had no influence on how the race was run, as they neither set the pace nor pressed it. Why include them in the class figure?

sjk
02-11-2007, 10:32 AM
Raybo,

Your database as proposed is going to get very big in a hurry. There are size limitations with Access that are going to come into play.

If you DB each of 10 PP lines per starter, each PP line will potentially be represented 10 times in the database. The info about the past races' 1-2-3 finishers will occur up to 10 times for each starter, or possibly over 100 times in all.

It is usually recommended that a database be designed where each piece of information occurs in only one place. This is not just to save space but to avoid the possibility of conflicts.

You propose to store a massive amount of trainer/jockey/owner information which could arguably be calculated from the PP information. How do you plan to use this? Unless it is frequently used you may not want to store this.

raybo
02-11-2007, 10:46 AM
Raybo,

Your database as proposed is going to get very big in a hurry. There are size limitations with Access that are going to come into play.

If you DB each of 10 PP lines per starter, each PP line will potentially be represented 10 times in the database. The info about the past races' 1-2-3 finishers will occur up to 10 times for each starter, or possibly over 100 times in all.

It is usually recommended that a database be designed where each piece of information occurs in only one place. This is not just to save space but to avoid the possibility of conflicts.

You propose to store a massive amount of trainer/jockey/owner information which could arguably be calculated from the PP information. How do you plan to use this? Unless it is frequently used you may not want to store this.

Could you explain the bold paragraph? I don't understand how that could happen.

The TJO data is only included in case someone wants to utilize it. I personally don't use anything but the basic meet/previous-year data for jockey/trainer starts and win/place/show.

sjk
02-11-2007, 10:58 AM
Suppose you have a race with a 12 horse field. For each of the next 10 times those 12 horses run you will add a line in the PPsTable that will relate to their performance in this race.

The info as to who ran 1-2-3 in the race will be identical in each of the 120 records.
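The fix for the duplication sjk describes is the standard relational one: store the race result once and have every PP line reference it by key. A hedged Python/sqlite3 sketch with invented names:

```python
import sqlite3

# Sketch: rather than repeating "who ran 1-2-3" in every one of the
# 120 PP records, the result is stored once in a race-results table
# and each PP line carries only a key pointing at it.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RaceResult (
    RaceKey TEXT PRIMARY KEY,
    Winner TEXT, Place TEXT, Show TEXT)""")
conn.execute("""CREATE TABLE PPLine (
    HorseName TEXT,
    RaceKey TEXT REFERENCES RaceResult(RaceKey))""")

conn.execute("INSERT INTO RaceResult VALUES "
             "('CD-20070211-5', 'Horse A', 'Horse B', 'Horse C')")
# All 12 starters reference the same result row instead of copying it.
conn.executemany("INSERT INTO PPLine VALUES (?, 'CD-20070211-5')",
                 [(f"Horse{i}",) for i in range(12)])

n = conn.execute("SELECT COUNT(*) FROM RaceResult").fetchone()[0]
print(n)  # 1 -- the 1-2-3 finishers are stored exactly once
```

This is the "each piece of information occurs in only one place" rule: less space, and no chance of two copies of a result disagreeing.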

raybo
02-11-2007, 11:00 AM
Suppose you have a race with a 12 horse field. For each of the next 10 times those 12 horses run you will add a line in the PPsTable that will relate to their performance in this race.

The info as to who ran 1-2-3 in the race will be identical in each of the 120 records.

Ok, I understand now. What would be your solution for handling this?

sjk
02-11-2007, 11:09 AM
Take some time to think about what information you will need and how you plan to use it. There is information in your Bris data that you will probably not want to add to the database since it duplicates what you already have.

Figure out what these are and leave them out.

If there is no foreseeable use for some of the information, or if it is easily derived from information that you will have, leave it out.

raybo
02-11-2007, 11:17 AM
Take some time to think about what information you will need and how you plan to use it. There is information in your Bris data that you will probably not want to add to the database since it duplicates what you already have.

Figure out what these are and leave them out.

If there is no foreseeable use for some of the information, or if it is easily derived from information that you will have, leave it out.

Are you suggesting that I only DB my final handicapping data for each horse, the running line I use, or a combination of data compiled into 1 set of data per horse?

What about the user that wants to analyze his/her methods for selecting a representative running line or, if they don't use a running line, their method of grading the horses?

I plan to not include data that I don't intend to use in my personal handicapping method. But, others may want all the data available for future examination.

sjk
02-11-2007, 11:34 AM
I would not leave out any running lines but would not duplicate them when they again appear in your Bris data.

The purpose of the Bris info is to give a user who has no database all of the information he might need to do his handicapping. Not all of the information in the Bris files needs to be added to a database because some of it is already there and some of it can be computed from what is already there.

Information that is intended to be used visually is probably going to be organized differently from a database since duplication of information helps the viewer but not the database.

I think I am getting repetitive here and hope you see what I am trying to say.

K9Pup
02-11-2007, 11:38 AM
In the case of the PPsTable and WorkoutTable you probably want to create a unique key for each that is defined as Horse and RaceDate (not today's race, but the date of the past race/workout). Set it to unique = yes and the DB will only allow one copy of that past line in the table.

For these tables you might drop the "today's information". Or store it as information to track where and when the data got into the DB. When processing you could include these past races in a query by specifying that the date be < today's race date. So if you run the query for a race today it will include workouts/lines up to today; run the query for a race last week and it will only include races up to that date.
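(In SQL terms, K9Pup's composite unique key and date-restricted query might look like the sketch below. This uses SQLite through Python's sqlite3 module rather than Access, and the table layout, horse name, and field names are illustrative, not the actual Bris file layout.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Illustrative PPsTable: the UNIQUE constraint on (HorseName, RaceDate)
# plays the role of the unique index in Access, so a past line can only
# be stored once per horse per date.
cur.execute("""
    CREATE TABLE PPsTable (
        HorseName TEXT NOT NULL,
        RaceDate  TEXT NOT NULL,   -- ISO yyyy-mm-dd
        FinishPos INTEGER,
        UNIQUE (HorseName, RaceDate)
    )
""")

rows = [("Seabiscuit", "2007-01-05", 1),
        ("Seabiscuit", "2007-02-02", 3)]
cur.executemany("INSERT INTO PPsTable VALUES (?, ?, ?)", rows)

# A duplicate past line is rejected by the key.
try:
    cur.execute("INSERT INTO PPsTable VALUES ('Seabiscuit', '2007-01-05', 1)")
except sqlite3.IntegrityError:
    print("duplicate blocked")

# Date-restricted query: only lines strictly before today's race date.
cur.execute("SELECT COUNT(*) FROM PPsTable WHERE HorseName = ? AND RaceDate < ?",
            ("Seabiscuit", "2007-02-01"))
print(cur.fetchone()[0])   # only the 2007-01-05 line qualifies
```

The same date filter run "as of" last week would simply use last week's date as the parameter, exactly as described above.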

raybo
02-11-2007, 11:56 AM
I would not leave out any running lines but would not duplicate them when they again appear in your Bris data.

The purpose of the Bris info is to give a user who has no database all of the information he might need to do his handicapping. Not all of the information in the Bris files needs to be added to a database because some of it is already there and some of it can be computed from what is already there.

Information that is intended to be used visually is probably going to be organized differently from a database since duplication of information helps the viewer but not the database.

I think I am getting repetitive here and hope you see what I am trying to say.

sjk,

Will K9Pup's suggestion about the unique key solve the repetition you're speaking of? It appears it does.

By computing from existing data, can you give me an example?

raybo
02-11-2007, 12:01 PM
In the case of the PPsTable and WorkoutTable you probably want to create a unique key for each that is defined as Horse and RaceDate (not today's race, but the date of the past race/workout). Set it to unique = yes and the DB will only allow one copy of that past line in the table.

For these tables you might drop the "today's information". Or store it as information to track where and when the data got into the DB. When processing you could include these past races in a query by specifying that the date be < today's race date. So if you run the query for a race today it will include workouts/lines up to today; run the query for a race last week and it will only include races up to that date.

K9pup, will leaving out the Today's info in any way restrict the user's ability to accomplish whatever he/she desires? If not, I will certainly follow your suggestion; adding this data was an afterthought on my part, thinking that in order to join/query the DB these common fields would be needed. My lack of DB experience is showing, huh?

sjk
02-11-2007, 12:16 PM
sjk,

Will K9Pup's suggestion about the unique key solve the repetition you're speaking of? It appears it does.

By computing from existing data, can you give me an example?

The key should work. (Not for s-breds that can run more than once a day.)

An example of calculating data from the db is that you can make your own trainer/jockey stats once you have some history in the database.
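(As an illustration of sjk's point, trainer stats fall out of a simple GROUP BY query once history accumulates. A sketch using SQLite via Python's sqlite3; the trainer names, payoffs, and field names are made up for the example:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Minimal past-performance rows; the layout is illustrative only.
# Payoff is the $2 win payoff when the horse won, else 0.
cur.execute("CREATE TABLE PPsTable (Trainer TEXT, FinishPos INTEGER, Payoff REAL)")
cur.executemany("INSERT INTO PPsTable VALUES (?, ?, ?)", [
    ("Smith", 1, 6.40), ("Smith", 4, 0.0),
    ("Jones", 1, 3.20), ("Jones", 2, 0.0),
])

# Starts, wins, and ROI per $2 bet, computed from stored history
# instead of being carried as extra fields in the database.
cur.execute("""
    SELECT Trainer,
           COUNT(*)                                 AS starts,
           SUM(FinishPos = 1)                       AS wins,
           ROUND(SUM(Payoff) / (2.0 * COUNT(*)), 2) AS roi_per_2
    FROM PPsTable
    GROUP BY Trainer
    ORDER BY Trainer
""")
for row in cur.fetchall():
    print(row)
```

The same pattern extends to jockey or trainer/jockey-combination stats by changing the grouping column(s).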

raybo
02-11-2007, 12:37 PM
The key should work. (Not for s-breds that can run more than once a day.)

An example of calculating data from the db is that you can make your own trainer/jockey stats once you have some history in the database.

Are you saying that some horses run in a state bred race and a non-state bred race on the same card? I don't even look at horse names usually so I've never noticed this happening at the tracks I handicap. That would seem to be unwise, IMO.

So, leave the jockey/trainer table in for now? Oh heck, some handicappers focus their whole wagering program on jockey/trainer combinations; if they want to use the data space for that info, so be it. I will use only what I consider important in my own DB anyway. But knowing that some of the data can be calculated from existing DB data is something each user will have to weigh on their own. My thought at this point is to include everything and let the user add or subtract as needed for their particular style. If this DB project were just for myself, I would not have gone to the trouble of including in the proposed tables every piece of data that is in the raw csv file.

sjk
02-11-2007, 12:40 PM
S-bred= Standardbred

raybo
02-11-2007, 12:42 PM
S-bred= Standardbred

OK.

K9Pup
02-11-2007, 12:45 PM
K9pup, will leaving out the Today's info in any way restrict the user's ability to accomplish whatever he/she desires? If not, I will certainly follow your suggestion; adding this data was an afterthought on my part, thinking that in order to join/query the DB these common fields would be needed. My lack of DB experience is showing, huh?

I don't think it will be a problem. You can always get Today's info by joining the PP table with today's races.

In the case of S-bred horses you can add RaceNo to the key and that will allow horses to have multiple races for the same day.

raybo
02-11-2007, 12:57 PM
I don't think it will be a problem. You can always get Today's info by joining the PP table with today's races.

In the case of S-bred horses you can add RaceNo to the key and that will allow horses to have multiple races for the same day.

Gotcha.

Concerning the unique key, I will have to do a concatenation of HorseName, Date, and Race#, right?

sjk
02-11-2007, 12:59 PM
You can use multiple fields for the key.

raybo
02-11-2007, 01:02 PM
You can use multiple fields for the key.

Wouldn't it be better to have a field that contains the concatenation so I could set that field's "unique" property to "yes" to block duplicate records from being recorded in the DB?

sjk
02-11-2007, 01:09 PM
Use those fields as primary key and you will block duplication.

raybo
02-11-2007, 01:10 PM
Use those fields as primary key and you will block duplication.

Ok, cool!

raybo
02-11-2007, 02:25 PM
In the WorkoutsTable shouldn't the field names be WorkDate, WorkTime, WorkTrack, etc and then the data be under that as records (up to 12 records for each horse), rather than having a field for each WorkDate (12 fields), WorkTime (12 fields), etc.?

raybo
02-11-2007, 02:53 PM
In the WorkoutsTable shouldn't the field names be WorkDate, WorkTime, WorkTrack, etc and then the data be under that as records (up to 12 records for each horse), rather than having a field for each WorkDate (12 fields), WorkTime (12 fields), etc.?

I will have to do a "For-Next" loop to record these as records rather than fields, right?
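(A sketch of that loop in Python rather than VB: each repeated workout group in the wide record becomes its own row, and empty slots are skipped. The record contents, offsets, and group size here are invented stand-ins for the real Bris layout, which carries 12 workout groups:)

```python
# Hypothetical slice of one raw record: today's info fields first,
# then repeated workout groups of (date, time, track).
record = ["CD", "20070211", "5", "3", "Seabiscuit",
          "20070204", "47.2", "CD",
          "20070128", "35.8", "CD",
          "", "", ""]           # third workout slot is empty

WORKOUTS_START = 5   # assumed offset of the first workout group
GROUP_SIZE = 3       # fields per workout group in this sketch
MAX_WORKOUTS = 3     # the real file carries 12; 3 keeps this short

horse = record[4]
workout_rows = []
for i in range(MAX_WORKOUTS):          # the "For-Next" loop
    base = WORKOUTS_START + i * GROUP_SIZE
    work_date, work_time, work_track = record[base:base + GROUP_SIZE]
    if not work_date:                  # skip empty workout slots
        continue
    workout_rows.append((horse, work_date, work_time, work_track))

for row in workout_rows:
    print(row)
```

Each tuple in `workout_rows` would then be inserted as one record into the WorkoutsTable, giving up to 12 rows per horse instead of 12 sets of fields.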

sjk
02-11-2007, 04:05 PM
I have never used a for-next loop (or any VB) in Access for anything.

There is nothing wrong with the workout data. What is goofy is that you get it when the horse runs rather than when the horse works; so you ignore the fields relating to track, date, race #, post, and program. Then Horse Name and Date of Workout are the table keys and the others are additional fields.

raybo
02-11-2007, 04:09 PM
I have never used a for-next loop (or any VB) in Access for anything.

There is nothing wrong with the workout data. What is goofy is that you get it when the horse runs rather than when the horse works; so you ignore the fields relating to track, date, race #, post, and program. Then Horse Name and Date of Workout are the table keys and the others are additional fields.

Yeah, I had already removed the Today's data from that table and assigned primary keys to HorseName and WorkDate. So, just leave the 12 fields for each of the data types as fields?

K9Pup
02-11-2007, 04:11 PM
I will have to do a "For-Next" loop to record these as records rather than fields, right?

Yeah. You loop through 10 times writing each of the workout records, same with the PPs.

raybo
02-11-2007, 04:12 PM
I have never used a for-next loop (or any VB) in Access for anything.

There is nothing wrong with the workout data. What is goofy is that you get it when the horse runs rather than when the horse works; so you ignore the fields relating to track date race# post and program. Then Horse Name and Date of Workout are the table keys and the others are additional fields.

Is your reason for not using any VB in your DB a matter of processing speed?

sjk
02-11-2007, 04:13 PM
I am not seeing 12.

Horse Name
Date
Time
Track
Distance
Tr Cnd
Description
Main/Inner
# of Works
Rank

Only 10

sjk
02-11-2007, 04:15 PM
Never found anything I could not do with macros and the learning curve for VB was steeper.

raybo
02-11-2007, 04:15 PM
I am not seeing 12.

Horse Name
Date
Time
Track
Distance
Tr Cnd
Description
Main/Inner
# of Works
Rank

Only 10

Ahhhh! That was what I meant by just using those 10 fields rather than putting each of the 12 work dates, work times, etc., in their own fields as I have laid them out in the table.

K9Pup
02-11-2007, 04:15 PM
Yeah, I had already removed the Today's data from that table and assigned primary keys to HorseName and WorkDate. So, just leave the 12 fields for each of the data types as fields?

I think you need to define ONE key that includes both horse and date. To do that in design mode go to View, Indexes. Then ADD an index (call it HorseDate?). On the first line select the Horse field; on the second line (index name is blank) select Date. Then mark this as primary and unique. I think if you make Horse unique AND Date unique you will have problems.

Also you want to create 1 record for each workout in the raw file. So you DO want to loop through them and create the records.

raybo
02-11-2007, 04:17 PM
Never found anything I could not do with macros and the learning curve for VB was steeper.

Ok, I understand, thought it was a processing issue.

sjk
02-11-2007, 04:19 PM
I think you need to define ONE key that includes both horse and date. To do that in design mode go to View, Indexes. Then ADD an index (call it HorseDate?). On the first line select the Horse field; on the second line (index name is blank) select Date. Then mark this as primary and unique. I think if you make Horse unique AND Date unique you will have problems.

Also you want to create 1 record for each workout in the raw file. So you DO want to loop through them and create the records.

Do they really have all the workouts in one record? Still no need for looping. Easy enough to use a query to break the data apart.
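(sjk's query-only alternative to looping can be sketched as a union query that peels each repeated workout group out of a wide staging table into its own row. SQLite via Python here; the staging layout with only 2 of the 12 workout groups, and all field names, are assumptions for the example:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A "wide" staging table as imported: one row per horse with repeated
# workout columns (only 2 of the 12 groups shown).
cur.execute("""CREATE TABLE RawWorkouts
               (HorseName TEXT, Work1Date TEXT, Work1Time REAL,
                Work2Date TEXT, Work2Time REAL)""")
cur.execute("""INSERT INTO RawWorkouts
               VALUES ('Seabiscuit', '20070204', 47.2, '20070128', 35.8)""")

# The normalized destination table, one row per workout.
cur.execute("""CREATE TABLE WorkoutsTable
               (HorseName TEXT, WorkDate TEXT, WorkTime REAL,
                UNIQUE (HorseName, WorkDate))""")

# A union query breaks the data apart without any VB loop: one SELECT
# per repeated group, with empty slots filtered out.
cur.execute("""
    INSERT INTO WorkoutsTable
    SELECT HorseName, Work1Date, Work1Time FROM RawWorkouts
        WHERE Work1Date IS NOT NULL AND Work1Date <> ''
    UNION ALL
    SELECT HorseName, Work2Date, Work2Time FROM RawWorkouts
        WHERE Work2Date IS NOT NULL AND Work2Date <> ''
""")
cur.execute("SELECT COUNT(*) FROM WorkoutsTable")
print(cur.fetchone()[0])
```

The staging table is then just a temporary holding area, as sjk describes below; whether you loop or union is a matter of taste, since both produce the same normalized rows.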

K9Pup
02-11-2007, 04:20 PM
Ahhhh! That was what I meant by just using those 10 fields rather than putting each of the 12 work dates, work times, etc., in their own fields as I have laid them out in the table.

Yeah, your workout table should only have one date of workout, time, track, etc. You will create up to 12 workout records (rows) per horse. When you loop through, make sure you check that there is data for that piece of the workout segment.

K9Pup
02-11-2007, 04:23 PM
??

Unless I can't read, they are in individual records

You are looking at his Workouts.doc file? I think that needs to be changed. IMO the workout file should be:

1 Track CHARACTER XXX 3
2 Date CHARACTER XXXXXXXX 8
3 Race # NUMERIC 99 2
4 Post Position NUMERIC 99 2
43 Program Number (if available) CHARACTER XXX 3
45 Horse Name CHARACTER 25

102 Date of Workout DATE 99999999 8 CYMD
114 Time of Workout NUMERIC 9999.99 7 (seconds & hundredths)
126 Track of Workout CHARACTER 10
etc etc.

K9Pup
02-11-2007, 04:25 PM
Do they really have all the workouts in one record? Still no need for looping. Easy enough to use a query to break the data apart.

Sure you CAN, but that wouldn't be a normalized table. In the true relational model the workout and PP tables will have one row for each workout/past line.

raybo
02-11-2007, 04:28 PM
I think you need to define ONE key that includes both horse and date. To do that in design mode go to View, Indexes. The ADD and index (call it HorseDate?). First line select the Horse field, second line (index name is blank) select Date. Then mark this as primary and unique. I think if you make Horse unique AND Date unqiue you will have problems.

Also you want to create 1 record for each workout in the raw file. So you DO want to loop through them and create the records.

So there will be another field in the table called HorseDate or whatever I name the index?

sjk
02-11-2007, 04:30 PM
I would see that as a temporary table to help process the data and put it where it belongs.

No big deal to do it my way and I'm sure the same for yours.

raybo
02-11-2007, 04:35 PM
You are looking at his Workouts.doc file? I think that needs to be changed. IMO the workout file should be:

1 Track CHARACTER XXX 3
2 Date CHARACTER XXXXXXXX 8
3 Race # NUMERIC 99 2
4 Post Position NUMERIC 99 2
43 Program Number (if available) CHARACTER XXX 3
45 Horse Name CHARACTER 25

102 Date of Workout DATE 99999999 8 CYMD
114 Time of Workout NUMERIC 9999.99 7 (seconds & hundredths)
126 Track of Workout CHARACTER 10
etc etc.

If you notice the field numbers you will see ranges for some of the field names, this means that each of the fields in a particular range are different workouts, starting with the most recent and continuing to the most distant of the 12 possible workouts.

raybo
02-11-2007, 04:43 PM
Yeah your workout table should only have one date of workout,time, track. etc. You will create up to 12 workout records (rows) per horse. When you loop through make sure you check that there is data for that piece of the workout segment.

Ok, that's what I thought after rethinking the layout of the table to prevent "duplicate" fields (not really duplicates per se, but "repeating groups"; they should be records, not fields) and keep it in First Normal Form.

K9Pup
02-11-2007, 05:17 PM
So there will be another field in the table called HorseDate or whatever I name the index?

No, you don't need a field, just an INDEX. The INDEX is maintained internally and can be made up of various fields. Access combines them internally; you don't need to actually create a field.
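(The same idea in SQL terms: a composite unique index lives alongside the table, so no concatenated HorseDate column is ever added to the schema. SQLite via Python here, with illustrative table and field names:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE WorkoutsTable (HorseName TEXT, WorkDate TEXT, WorkTime REAL)")

# The index is a separate object made up of two existing fields;
# the table itself gains no extra column.
cur.execute("CREATE UNIQUE INDEX HorseWorkDate ON WorkoutsTable (HorseName, WorkDate)")

cur.execute("INSERT INTO WorkoutsTable VALUES ('Seabiscuit', '20070204', 47.2)")
try:
    cur.execute("INSERT INTO WorkoutsTable VALUES ('Seabiscuit', '20070204', 48.0)")
except sqlite3.IntegrityError:
    print("duplicate blocked by the index, not by any extra field")
```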

raybo
02-12-2007, 08:26 AM
I think you need to define ONE key that includes both horse and date. To do that in design mode go to View, Indexes. The ADD and index (call it HorseDate?). First line select the Horse field, second line (index name is blank) select Date. Then mark this as primary and unique. I think if you make Horse unique AND Date unqiue you will have problems.

Also you want to create 1 record for each workout in the raw file. So you DO want to loop through them and create the records.

Ok, it didn't work as you stated; let me see if I got this right. In design mode I clicked "View", then "Indexes", then when the Indexes window popped up I named the index in the first row "HorseName". Then in row 1 under "field name" I selected "HorseName", got the options in the lower left area of the window, clicked "Primary" and selected "no", then clicked "Unique" and selected "no". When I went to the 2nd row field name and selected "WorkDate" I didn't get the popup, in the lower left area of the window, to make it primary and unique.

raybo
02-12-2007, 08:48 AM
Ok, didn't work as you stated, let me see if I got this right. In design mode I clicked "View", then "Indexes, then when the Indexes window popped up I named the Index in the first row "HorseName", then in row 1 under "field name", I selected "HorseName", got the options in the lower left area of the window and clicked "Primary" and selected "no", clicked "Unique" and selected "no", when I went to 2nd row field name and selected "WorkDate" I didn't get the popup, in the lower left area of the window, to make it primary and unique.

Should I have made "HorseName" primary and unique? It won't let me make "WorkDate" primary or unique; it automatically made it a key.

K9Pup
02-12-2007, 08:52 AM
Ok, didn't work as you stated, let me see if I got this right. In design mode I clicked "View", then "Indexes, then when the Indexes window popped up I named the Index in the first row "HorseName", then in row 1 under "field name", I selected "HorseName", got the options in the lower left area of the window and clicked "Primary" and selected "no", clicked "Unique" and selected "no", when I went to 2nd row field name and selected "WorkDate" I didn't get the popup, in the lower left area of the window, to make it primary and unique.


I would name the index something like HorseWorkDate in column 1. Then in column 2 select HorseName. On the second row, leave the index name blank and in column 2 select WorkDate. That will give you an index called HorseWorkDate that is made up of HorseName and WorkDate. Then select row 1 and set it as unique and primary.

BillW
02-12-2007, 09:13 AM
Raybo,

A point regarding indexing any record by a horse's name: include the country of foaling with the horse's name in the key. It is relatively rare, but you will get horses with the same name from different countries in your database. I once came across a race in Ca. (HOL?) that had 2 horses with the same name entered in the same race :eek:.

Bill
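(BillW's point in SQL terms: widening the key with country of foaling lets two different horses that happen to share a name coexist in the table. SQLite via Python; horse names and field names are invented for the example:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Country is part of the unique key alongside name and date.
cur.execute("""CREATE TABLE PPsTable
               (HorseName TEXT, Country TEXT, RaceDate TEXT,
                UNIQUE (HorseName, Country, RaceDate))""")

cur.executemany("INSERT INTO PPsTable VALUES (?, ?, ?)", [
    ("Lucky Star", "USA", "2007-01-05"),
    ("Lucky Star", "IRE", "2007-01-05"),   # same name, different horse
])
cur.execute("SELECT COUNT(*) FROM PPsTable")
print(cur.fetchone()[0])   # both rows accepted; a name-plus-date key alone would collide
```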

raybo
02-22-2007, 08:30 PM
I'm still working on the DB, for those of you who are interested in the project. Finalizing the tables before testing them with some queries. I have a few days left of working with the tables, then I'll start running some data through it. I have 8 tables so far that will receive the CSV data, plus 2 other tables related to track pars and equalization charts which may or may not interest you, not sure myself. Haven't created a results table yet or a table for post-handicapped figures; I use Excel for my handicapping formulas and plan to import those figures into the DB for analysis purposes against "my" numbers. My numbers will not be included, of course, when I distribute the DB upon completion. That's the user's job. I will try to put together enough basic queries that the user will be able to modify them easily enough to create personalized queries of their own design.

Just wanted to let you know, those of you who have an interest in a horseracing DB, that I haven't abandoned the idea, just trying to get things sorted out. If anyone wants to see the tables I have so far, send me a message and I'll attach them so you can see, tentatively, what data will be in which tables.