PDA

View Full Version : Say You Wanted to Start Your Own Database


NormanTD
05-25-2005, 07:59 PM
Say you wanted to start your own database as much for the challenge as for the potential rewards that may go along with such an endevor. And, say you're going to use something like MS Access '97 because you still have the whole set of MS Office 97 but when you upgraded to 2000, Access was not included and say you're going to use TSN Procaps files (probably single file format) and say you could parse out whatever you wanted from the files.

How would be the best way to set up you Access Tables? Like, what do you include in each table and about how many tables do you think someone would need?

I don't want to reinvent the wheel here and I figure you guys have already configured the ultimate set of tables for handicapping reseach, so any help or suggestions are appreciated.

Brian Flewwelling
05-25-2005, 08:07 PM
"It's a long way to ..."

I know this because you refer to single-file format. The first step is to read about db's where you want a table for each 'entity' ... like Races, Runlines.

So the single file will have to be broken out! If you can get a multi-file format some of that may already be done for you.

I would be willing to chat about the basics but his is not the place to do that, so send me a pm.

Fleww

sjk
05-25-2005, 08:19 PM
You need two tables to hold the historical information. One is race specific with date, track, race, distance, surface, track cond, race conditions, times, payoffs etc. The other is horse specific with date, track, race, running lines, odds, weight, trainer, jockey, etc.

hurrikane
05-25-2005, 08:41 PM
you will need more than 2 tables. take the info, break it out into related 'buckets' or information.

that would be a start. you have at least 4 before anything happens,
Jockey,
Trainer,
Horse,
race.

that is at the very least the begining

DJofSD
05-25-2005, 09:49 PM
How to design a data base. A data base for "mining" thoroughbred racing data.

Seems like there's been other threads that have discussed this previously on this board.

How many tables? What's in each one? The honest answer is: it depends.

Are you designing a data base for specific approachs to handicapping such as trainer angles? Or just trying to "suck up" as much data as possible to peform general queries?

Are your imputs guarenteed to have absolutley 100% correct data or do you expect the occasional misspelled name (horse or jockey or trainer or....)? What about missing data?

This is just a round about way of saying there is not a single solution to the problem. How big the data base grows, how quick do the queries need to be, the ease of forming ad hoc queries, how easy is it to maintain the data and more will factor into how you define the tables and the relationships they represent.

DJofSD

hurrikane
05-26-2005, 07:16 AM
I might also suggest that if you have some compter/db skills do not use access.

YOu will find that once you have a few years of data you run out of room in access. a cheap(free) solution is MySql. Great db but it will have a little learning curve. Howevey, If you have to learn access you may as well learn a real db.

NormanTD
05-26-2005, 03:04 PM
First, thanks for the replies. And as for replies to replies:

Brian, a PM is on its way as soon as I finish this post.

SKJ and Hurrikane, this looks to be about where I would need to start but how would you link all of these tables together? Track, Date, Race Number?

DJofSD, I believe I'm leaning toward specific approachs to handicapping such as trainer angles.

Hurrikane, I don't have any access skills at this point but over the last year or so, I've collected 4 or 5 good books on Access 97 (for Dummies, Programming A97 for Dummies, Peter Norton and a Sams book) so I'm geared toward Access just from that perspective. If I had these sort of resourses available in MySql, I could just as easily go in that direction. I went to the MySql website and for the life of me, I could not figure out what to download :confused: What would you recommend?

I did search Handicapping Software for databases but did not come up with the sort of information I was looking for, thus, my original post. So I'm not trying to be a nag or a leach, I'm just another punter looking to get an edge somewhere.

Thanks again.

sjk
05-26-2005, 05:59 PM
NormanD,

In any query involving both tables you would join the fields as you suggested.

I would not agree with Hurricane as to running out of space with Access. I have been using it 12 years and have not run out of space yet. On the other hand the 2 gig space limitation is an ongiong nuisance.

Charlie Judge
05-26-2005, 07:23 PM
I started out in dos dBase, and now use ms Visual FoxPro. I am very happy with it. You can write old fashioned linear code or do object oriented programming as well.

In regards to tables.. I strongly suggest that you get your data in multi table formats from your vendor. It is much easier to work with than one long file which requires more parsing.

For individual race days, the data is usually provided in 3 files by the vendor:

1. race data.. class conditions distance surface etc.
2. horse data.. name, sire,trainer,jockey, stats
3. pp lines for each horse in each race.

Link by track-date-racenumber files 1 2 and 3
link by track-date-racenumber and horsename files 2 and 3

Rank each horse in each race by whatever criteria you choose.
Save the ranks in the horse table.

Save all your data.
Get your results from your vendor.
Re-rank the data after the scratches have been removed




Then link the times and conditions to the race file, drop the scratched horses from the horse file, and add in the position, beaten lengths and odds to the horse file.

Write programs to automatically download, process and store the data. Write programs to process the results automatically.

Now you have a real database!
Save all pp lines forever! (only one copy of each line for each horse.. most vendors give 10 pp lines per horse, so each time the horse runs back, you will get 9 duplicate lines to deal with).

Once you start building the database, you can then build research tables on trainers, jockeys,sires, etc.
Suggest you find a vendor which supplies data on a monthly subscription basis so you can download every track every day. That way you can rapidly build a real research database, and you will be amazed at what you can find in your data.

CAJ

BillW
05-26-2005, 07:32 PM
Link by track-date-racenumber files 1 2 and 3
link by track-date-racenumber and horsename files 2 and 3

CAJ

Don't forget Country of birth - it may only happen within the same race a few times in a lifetime, but it is easy to get 2 horses with the same name in a database.

Bill

sjk
05-26-2005, 07:36 PM
Bill or anyone,

Do you have an up to date list of duplicate names? I am afraid I am missing a few.

BillW
05-26-2005, 07:46 PM
I don't keep one - but I'll see what I can dig up.

sjk
05-26-2005, 07:51 PM
I appreciate any additions. Even one duplicate that I don't know about can save some bad bets.

BillW
05-26-2005, 08:01 PM
I appreciate any additions. Even one duplicate that I don't know about can save some bad bets.

There are two classes of dupes in this list - errors from Equibase i.e. two different states of foaling and real dupes - two different countries of foaling.
This is out of about 105,000 horses in my database. If you want me to mail you the list just PM me with your e-mail address.

Bill


A CASE OF CLASS
AFFIRMED ILLUSION
ALISE'S FIGHTER
ALWAYS FIRST
AMBITIOUS
ANJA
APPROACH
ARMSTRONG
AY CARAMBA
AZILLION
BLACK SILK
BLUFF
BOCELLI
BOREAL
BYZANTIUM
CAGNEY
CAMACHO
CANNED HEAT
CASSY'S STAR
CHARIOT
CHARMING BOY
CHIMING
CLASSMATE
CLIFDEN
CREATIVE DANCE
DAY OF RECKONING
DESERT DANCER
DONIZETTI
DOWN RIGHT CRAFTY
DREAM MACHINE
EASY APPROVAL
EMPRESS ANNA
EYES WIDE OPEN
FLAG DAY
FRANCE
FUMPH AROUND
GENERAL PLOT
GERONIMO
GODSEND
GOLDEN DRAGON
GOLDEN SANDS
GONE QUIET
GOUACHE
GREEK PRIDE
HAPPY AT LAST
HIGH SOCIETY
HOLY MACKEREL
HOMELAND
HOST
IGOR
I GOT YOUR NUMBER
INCITATUS
INDIAN PROSPECTOR
INTIMATELY
IQUITOS
IRGUNETTE
JAZZ BEAT
JOE BEAR
KANDAHAR
KARSAVINA
KING OF THE BLUES
KWAME
LAST MINUTE DETAIL
LICENSE TO RUN
LIFE
LOGGER
MACCHIATO
MAKE NO MISTAKE
MANIPULATOR
MARINE
MATLOCK
MAUMEE
MISS LUCIA
MISS VICTORY
MIU MIU
MONA LISA
MOUNTAIN TOP
MY DREAM
NATIONAL PARK
NITTANY EXPRESS
NORTHERN CANDY
NOT FOR ME
ORO DE ORO
OTHELLOS FELLOW
OUT OF MIND
PAGLIACCI
PAVILLON
PETRUS
PHAEDRA
PRINTEMPS
PUMA
RED BRIAR
REFINE
REFUSE TO BEND
RIGHT HERE
ROAD TO JUSTICE
RUTHLESS REBECCA
SANFRAN
SCHILLER
SHEBEEN
SHOCKING DUNN
SIR HOWARD
SNAP JUDGEMENT
SOLID GOLD
SONIA
STAGE CALL
STAR PARADE
STATEMENT
STRATEGY
STRATTON
STRATUS
SUPER BOWL
TAMBURELLO
TANDY LAKES
TANGLE
TAPATIO
TEMPEST
TORUN
TRIPLE GOLD
TYCOON
WATER SKI
WILD SPIRIT
WINE AT DAWN
WINNING FEVER

sjk
05-26-2005, 08:10 PM
Bill,

Thanks for the list. Have appended the new ones to my list. Better too many than too few since I check when entered to see if recent starts appear to be scrambled.

hurrikane
05-26-2005, 10:18 PM
If you want to use Access fine. I wasn't trying to bad mouth it. I have run out of room but my data is maybe much more involved. As you said te 2 gig space is an annoyance. I prefer to elimate annoyances from my life.

some comments on the above information.

keep things in 'buckets' of relavant data.
for instance, what does not belong in this table.

horse sire dam birth trainer jockey.

obviously trainer and jockey are not realted to horse.
if you put them in there then you will have the same horse in there several times with different jockies or trainers. That info is really race specific and should be in a table of race info with the horse_ID from the horse table, the Jockey_ID etc. and information specific to the race.

Creating rankings for your stats is great for mining speed but some problems to overcome.

1. if you plan on searching your db for spot plays and then at some point pulling out the plays every day and playing them you will have to keep 2 tables. One with rankings before scratches and one with results for after scratches. The rankings change from scratches. The results from you mining are misleading.

2. rankings can give you some rather misleading results. be sure to check your results very closely.

DarkDream
05-27-2005, 06:48 PM
I'm a developer, so I've had some experience with database design.

I have currently around 80,000 races or so in my MySQL database for all the major race tracks. I have all the races for 2003, tracks close to me for 2004 and all the races for 2005 so far.

To be perfectly honest, if you don't have any development skills I would not try to do it. Database design is difficult and unless you are a coder, parsing all of the data is no an easy task.

For example, with my database I did not try to capture all the information on exotic bets. Even so, I have a total of 32 tables. To help you get started if you really want, here are some of the main tables I created:

race
jockey
horse
track
track_day (race card)
trainer
distance
owner
entry

--DarkDream

sjk
05-27-2005, 06:57 PM
You never know if you have database development skills until you try. Plenty of money to be made by those who are willing to put in the effort.

DarkDream
05-27-2005, 07:46 PM
Not to contradict myself, one can learn and develop a database but unless you have a high-level of motivation and resources to do so, it is a difficult thing to do.

I was trying to point out that devloping a database with horse race data is no easy task. For example, it took me at least a year to get everything working correctly as I wanted it. Obviously I worked on it off and on because of a full time job, but it is no easy task.

The database design was not so much an issue but it was the parsing of all the data, and creating an object-relational bridge between the parsed data and the database.

If someone is really interested in doing so, I would recommend to read some database design books, a book on SQL, take a class on databases at your local community college and start out with developing a real simple database at first (maybe a small database with a couple of tables like horses and jockeys). From there you can expand and get better.

--DarkDream

hurrikane
05-27-2005, 08:01 PM
you make some good points. I'll just want to add that

you don't have to make it this complex. i know a whole lot of people that have access dbs that are nothing more than a big spreadsheet with 127 fields and hundreds of thousands of rows of racing.

You can query a lot of info from that format. It is just cumbersome, not rettilational, and hard to maintain. But you can do it very simply.

I would suggest getting a download service for your data. you really need to get every race every day to make valuable queries. The data is much tougher than the database is to handle

Start simple a few tables and build from there. There is nothing that says you have to know how to build a rocketship starting out..

PS....
If anyone tells you not to do something....tell them to kiss your ass.
no one ever succeeded at anything by NOT doing something. :D

BillW
05-27-2005, 08:24 PM
Another point for anyone learning how to build a database - keep the raw data, you will be changing your design a lot and will do a lot of re-importing.

DJofSD
05-27-2005, 10:02 PM
I think the feedback of DarkDream and BillW is good.

There's the theory and the practicum. A little theory goes a long way.

On the other hand, you'll be spending a lot of time on the implementation details. Learning the characteristics and limitations of your DBMS of choice. And as has already been pointed out, where you get your data, what it "looks" like, how to get it into the tables, etc., will take lots of time. Errors -- how to detect them and handle them. And more.

There are plenty of books and articles online to help with the learing curve. And likely, the framework you are going to use will have some kind of wizard or design tool to help you create your initial tables.

If you live in a town with a good technical bookstore or are near a major university with a data processing courses, you might find spending an hour or two looking through the selections to find a book to help with the process a worthwhile use of your time.

I would suggest the basic theory of relational data bases is fairly straightforward. A well written book will illustrate the concepts so they're clear and easy to understand. I would offer the following list as "core" concepts. Others can add to or substract from it. Learn about: tables (rows and columns), (primary) keys both simple and compound, normal forms up to and including 3rd NF, queries. Along the way, you'll get exposed to one-to-one, one-to-many and many-to-many relationships and deriving from those is relational integrity which has to do with being able to get to all the data in the tables and not breaking your queries.

Further down the road, you'll likely want to learn about 4th NF for time dependencies, more sophisiticated queries like outer joins, foreign keys and design methods such as Chen or others whose names escape me at the moment. (They'll use discriptive terms like "crows feet" or Chen notation. They're different ways to create diagrams generally referred to as entity-relationship diagrams. A great tool to layout the data into different tables and to show how the fields relate to one another. You can draw them by hand or spend some bucks to get a tool that will draw the pictues and a whole lot more -- but that's only if you're really seriously into the project.)

Good luck.

DJofSD