People with big databases


GameTheory
07-09-2002, 07:59 PM
This is addressed to those of you that manage your own database of horsey info.

How big is your database?

What db system and supplementary tools do you use to manage it?

Etc.


I'm building a whopper of a db and wondering what people have found is the best way to manage such a thing.


I know a number of you use MS Access. Anyone using anything else?

BillW
07-09-2002, 08:06 PM
Originally posted by GameTheory
This is addressed to those of you that manage your own database of horsey info. How big is your database? What db system and supplementary tools do you use to manage it?

Using postgresql here ... rock solid (DB is not huge though, just over 20,000 races)

Supplementary tools are mostly of my own creation although standard SQL clients exist.

Bill

ranchwest
07-10-2002, 07:10 AM
I've only got about 5,500 races. I'm using dBase files and write programs in Clipper.

Tim
07-10-2002, 03:37 PM
GT,

I've got approx 93,000 charts, 820,000 running lines and 1,400,000 workouts. I have an Access XP frontend with a SQL Server 7 backend. I have had to redesign the data schema and batch update processes a number of times over the years because of performance issues. In fact, I'm in the middle of yet another redesign. Your chartgrabber and parser are great tools. I'm ecstatic about the variety and volume of data that's now available to me. That's the good news. The bad news is that the additional data is causing the DSS side of the system to perform poorly.

Tim

GameTheory
07-10-2002, 04:07 PM
I'm on a Windows ME system, which unfortunately leaves out postgresql. After checking out my (free) options, I am going with MySQL for now. I'm going to have nearly *2 million* running lines when I'm finished building it (I've got about 1.4 mil in there now). And of course it will grow all the time. Performance is tolerable -- I'm not sure what to compare it with.

One limitation of the current version of MySQL is that you can't do "subselects" in your SQL statements, which can make it a pain to get to what you want. Are all you db guys de facto programmers as well?
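The subselect limitation usually has a workaround: most subselects can be rewritten as JOINs, which the MySQL of that era could run. A minimal sketch using Python's sqlite3 as a stand-in (table and column names here are invented, not from anyone's actual db):

```python
import sqlite3

# Toy racing tables (hypothetical names) standing in for a real db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE horses (horse_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lines  (horse_id INTEGER, finish INTEGER);
    INSERT INTO horses VALUES (1, 'Seabiscuit'), (2, 'Citation');
    INSERT INTO lines  VALUES (1, 1), (1, 3), (2, 5);
""")

# With a subselect (not available in MySQL 3.x): horses with a win.
sub = """SELECT name FROM horses
         WHERE horse_id IN (SELECT horse_id FROM lines WHERE finish = 1)"""

# Equivalent JOIN form that old MySQL could handle.
join = """SELECT DISTINCT h.name FROM horses h
          JOIN lines l ON l.horse_id = h.horse_id
          WHERE l.finish = 1"""

# Both queries return the same rows.
assert conn.execute(sub).fetchall() == conn.execute(join).fetchall()
print(conn.execute(join).fetchall())  # [('Seabiscuit',)]
```

The JOIN version is often faster anyway, since the optimizer sees the whole query at once.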

Does anybody "make their own pps" or are you strictly doing stats and general research? My goal is to be able to do both.

BillW
07-10-2002, 04:34 PM
GT,

Haven't used MySQL much, but the only real rap I have found is that it is not as compliant with the SQL standard as postgresql (I think it's mostly features missing, as opposed to being in conflict with the standard). Reliably running your database at the size target you expect (2M running lines) probably has more to do with how Windows handles it than with MySQL, so I can't comment there.

A few other points: SCSI makes a world of difference in performance on my system. The serial nature of IDE chokes under the high load that even a single db user presents, and I tend to find myself running multiple apps concurrently. Your mileage may vary though; I'm not familiar with performance issues of the disk subsystem on Windows.

I'm also not sure of MySQL's networking ability ... I use socket access with postgresql to run stuff on multiple machines and find it very handy (sometimes I go a bit overboard!). If this is an issue, you may want to look into it also.

Best of luck,

Bill

GameTheory
07-10-2002, 04:53 PM
BillW --

All of those things are definitely issues, but MySQL seemed the best choice even with the compromises. MySQL also is being improved all the time -- they will be supporting things like subselects probably within a year.

Right now it is just a single-user system. I think it does choke on the IDE, but MySQL is built for speed otherwise. I am surprised by how relatively small the db is -- about 800MB total at the moment. (Each table is a file, and the indexes are separate also.) I don't think Windows will have a problem till it is near 2GB.

Some of the other db systems I tried really expanded the data size quite a bit more.

If I get a second machine, I will run Linux on that one. For that reason I want to use a cross-platform system.


Basically, I am forced to write my own horse-racing specific client to access the db to be able to get at what I want easily. (MySQL has an API in a .dll so I can write programs that access it directly, which is nice.)
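A roll-your-own client of this kind usually amounts to a thin wrapper that turns common handicapping questions into canned queries. A minimal sketch, using Python's sqlite3 as a stand-in for the MySQL API and an invented two-table schema (GT's real client calls MySQL's C library through the .dll instead):

```python
import sqlite3

class RacingDB:
    """Sketch of a racing-specific client wrapper (hypothetical schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def running_lines(self, horse):
        # One purpose-built method per common question beats raw SQL
        # typed into a generic client every time.
        return self.conn.execute(
            "SELECT race_date, finish FROM lines "
            "JOIN horses USING (horse_id) WHERE name = ? "
            "ORDER BY race_date", (horse,)).fetchall()

db = RacingDB()
db.conn.executescript("""
    CREATE TABLE horses (horse_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lines (horse_id INTEGER, race_date TEXT, finish INTEGER);
    INSERT INTO horses VALUES (1, 'Citation');
    INSERT INTO lines VALUES (1, '2002-05-01', 2), (1, '2002-06-15', 1);
""")
print(db.running_lines('Citation'))  # [('2002-05-01', 2), ('2002-06-15', 1)]
```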

BillW
07-10-2002, 05:49 PM
Originally posted by GameTheory
All of those things are definitely issues, but MySQL seemed the best choice even with the compromises. ... Basically, I am forced to write my own horse-racing specific client to access the db to be able to get at what I want easily.

GT

I wasn't trying to talk you out of MySQL, just bringing up issues I came across as I went through the same thing as you are now. Sounds like you've done your homework though :)

Re. writing your own app ... the only way to go. I have found the libraries very well written and easy to use (I have C, Python, and even some Perl running on this db). I have no reason to believe you won't find the same thing with MySQL. ... have fun

Bill

rrbauer
07-11-2002, 12:42 AM
60,000 horses with all of their races and workouts for the past 2.5 years (purged annually to remove the 3rd year back). Horse info is kept in one table. Running info, workout info, sire info, dam info each in their own tables and appropriately keyed. Also, trip notes are in a separate table keyed to horse and race date. All home grown from programs that I wrote with data updates from BRIS and DRF downloads. Started building the stuff in 1991 and just keep enhancing as I get new ideas or when better data comes along. Just checked the directory that all the data is in and it's at 47mb in size. (Remember your first 40mb disk drive that you could "never" fill up?)

Increased processor speeds and increased disk capacities just keep bailing me out!
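The one-table-per-entity layout described above, with everything keyed back to a horse number, might look roughly like the following in SQL DDL. This is a guess at the shape, not rrbauer's actual field names, and it uses SQLite as a stand-in for his 1990-era package:

```python
import sqlite3

# Sketch of the keyed-table layout: horse info in one table, running
# lines / workouts / trip notes in their own tables, joined by horse_no.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE horses   (horse_no INTEGER PRIMARY KEY, name TEXT,
                           sire_no INTEGER, dam_no INTEGER);
    CREATE TABLE starts   (horse_no INTEGER, race_date TEXT, finish INTEGER);
    CREATE TABLE workouts (horse_no INTEGER, work_date TEXT, dist TEXT);
    -- trip notes keyed to horse and race date, as described in the post
    CREATE TABLE trips    (horse_no INTEGER, race_date TEXT, note TEXT);
    CREATE INDEX idx_starts ON starts (horse_no, race_date);
    CREATE INDEX idx_works  ON workouts (horse_no, work_date);
""")
```

Keying every satellite table to the horse number keeps each table narrow, which is part of why the whole thing fits in tens of megabytes.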

BillW
07-11-2002, 12:58 AM
Richard,

Your DB mgr is homegrown also?

GameTheory
07-11-2002, 01:32 AM
Originally posted by rrbauer
Just checked the directory that all the data is in and it's at 47mb in size.

47mb for 60,000 horses? That seems pretty low.

rrbauer
07-11-2002, 12:06 PM
Game Theory wrote:

47mb for 60,000 horses? That seems pretty low.

Based on? Everything is compressed when it's written. The largest table is workouts which is currently 22.8 mb and "starts" is 15 mb. Some of the horses no longer have current activity as far as works/starts and could be removed. I use a 5-digit field for horse number and I'm up in the 83,000 range for new numbers. When that crosses the 90,000 threshold I plan to do a wholesale purge on the horse file, renumber and reindex everything.
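Compress-on-write is one plausible reason a 60,000-horse db stays under 50mb: fixed-width racing records are highly repetitive and squeeze down well. A sketch with zlib (an assumption for illustration; the actual 1990 package likely used its own scheme):

```python
import zlib

# 1,000 made-up fixed-width workout records; repetitive text like this
# compresses dramatically, keeping on-disk size small.
records = b"".join(
    b"%05d 2002071%d 5F FT 1:00.%d B\n" % (i % 90000, i % 10, i % 5)
    for i in range(1000))

packed = zlib.compress(records)
assert zlib.decompress(packed) == records   # lossless round trip
print(len(records), len(packed))            # packed is far smaller
```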

BTW... When Racing Times was around, their db system was all on FoxBase (now FoxPro, I think) and they sold that to DRF, and DRF used it for some time after the acquisition because it was superior to what DRF had. Don't know what DRF does now.

BillW wrote:
Your DB mgr is homegrown also?

The I/O routines were part of a package (circa 1990). All of the program logic, etc., is homegrown and in turn I access the I/O routines when I want to read/write/delete/update. Everything runs in a DOS session.

NNMan
07-12-2002, 12:08 AM
We're running Unisys Mapper for Windows. DB is up to 115 Mb on back-up. Not sure if that includes compression, but I think it does. I would have to do a query to get the total # of races, horses, etc. All overlying code is Mapper-based language. No slowdown experienced yet with this DB.

Cheers...

GameTheory
07-12-2002, 01:11 AM
Mapper?

Isn't that the one they wrote about in HorsePlayer mag a long time ago? (First year of the mag.) As I remember, they started advertising in the magazine after the database guy wrote an article about how great it was.


Richard --

Based on the fact that I'm using 800MB for roughly 118K horses and 3 years of data (about half of that is index files), and every other system I tried made it even bigger! The data size (not including index files) in MySQL is roughly the same as it was in plain-text raw format before I imported it all. Now, I included everything there is to include in the data, except for workouts because I don't have those.

If I were only using 100mb, I'm sure I could go much faster...