PDA

View Full Version : How much can a DB calculate?


2low
10-13-2011, 03:09 PM
I've been working off and on on a "black box" of sorts, and it's showing promise. The problem is I programmed the entire thing in excel using VBA, and it is obviously starting to feel the piano on its back.

My process uses a ton of calculations using many variables (multi-layered if/than statements, etc.) None of the calculations are mathematically complex - just addition, subtraction, multiplication,division within long formulas. No square roots or anything like that.

I need to move to a proper database, and I'm thinking MySQL. As a complete novice, I'm wondering what the database would be able to do calculations wise. Would I likely have to use excel for calculations? If so, how easy is it for excel and MySQL to communicate? Is it better do do the calculations using a programming language without the need to export to excel?

Sorry - hard question to phrase correctly when I don't know what I'm talking about:lol: I have absolutely no programming background other than the VBA disaster I created, and no database experience to speak of.

sjk
10-13-2011, 03:20 PM
I use Access and it can do any calculation I need or could imagine. You can build queries and macros that do innumerable calculations. My program is black box; good luck with yours.

2low
10-13-2011, 03:28 PM
I have an old version of access too (2003 maybe?) I could also use that.

sjk
10-13-2011, 03:33 PM
I am still running Access 2000 even though I use the newer versions for work applications. There are risks in upgrading and I never saw a reason to do it.

DJofSD
10-13-2011, 03:52 PM
If the problem with the current data base is one where it is getting slower and slower as the data base grows, then I would say you probably need to redesign the data base.

2low
10-13-2011, 03:55 PM
If the problem with the current data base is one where it is getting slower and slower as the data base grows, then I would say you probably need to redesign the data base.

It's currently excel. Slow is only one of the problems.

guckers
10-13-2011, 04:52 PM
I'm currently building my db in MSSQL, you can get a free version here (http://msdn.microsoft.com/en-us/library/ms365247.aspx).

I plan on building a data driven custom .net app. I have 10 years of db and programming experience, so its not for the faint of heart.

Mysql is another great and free solution, I would suggest you hit the learning curve of mysql before doing it in Access.

Red Knave
10-14-2011, 10:29 AM
Abcess, um I mean, Access and also the free version of MSSQL have database size limitations. They may not be material to you but it's something to be aware of.
I agree with guckers that you focus on MySQL or any of the other free SQL products out there rather than Access/Excel. I don't think you will have any problems with doing the math using their built-in languages.
Also, any SQL that has an ODBC driver will allow you to get at it with Excel if that 's what you really want to do.

sjk
10-14-2011, 11:02 AM
The size limitation requires attention but I have never found it to be a deal-breaker. I have all the races back to 1/1/03 in my main database now (along with hundreds of working tables) and it runs about 1.4 gb (the limit is 2gb in the version I am using).

2low
10-14-2011, 11:11 AM
OK, so it's learning curve with MySQL vs, something somewhat familiar but limited with Access. I'm inclined to go with MySQL just to learn something new.

Is the MySQL Workbench what people generally use for a front end? I can't for the life of me figure out how to even import data files with that thing.

DJofSD
10-14-2011, 11:23 AM
It appears importing is done via their CSV Engine: http://dev.mysql.com/doc/refman/5.1/en/csv-storage-engine.html

dukeofperl
10-14-2011, 01:32 PM
I've been working off and on on a "black box" of sorts, and it's showing promise. The problem is I programmed the entire thing in excel using VBA, and it is obviously starting to feel the piano on its back.

Whenever you're manipulating data that grows in size (over time) you will eventually run into the the piano-on-the-back situation. When that happens, it's usually all downhill from there ... feather ->piano -> elephant -> event horizon -> singularity.

Although you can't make the data-growth go away, you can control it and control how your program processes the data. The best approach is to use data structures, not in-line procedural syntax that simply does one step then moves on to the next step and so forth. Data structures are seriously powerful and can turn a slow and inefficient program into a dynamo.

Not being an expert on VBA, it's my understanding that VBA offers at least some opportunity to build data structures either directly built into the program ( arrays and collections) or available as add-on classes.

Your other option (and your best IMO) would be to use a data structure-centric language such as C, C++, Perl (my favorite), Python, Ruby et al that can offer real data structure relief (arrays, arrays of arrays, hashes, hashes of hashes, arrays of hashes, hashes of arrays, slices, linked lists, binary trees, stacks, queues) to do the heavy lifting and then have VBA source the results. Using any of these languages, properly coded, will show demonstrative improvement in runtime results. And they, for the most part clean up after themselves, hence no memory leaks. And they're all free.

My process uses a ton of calculations using many variables (multi-layered if/than statements, etc.)

I suspect this is where your problem is ... multiple if-thens (or any nested conditionals) can quickly add baggage to a program.

I need to move to a proper database, and I'm thinking MySQL. As a complete novice, I'm wondering what the database would be able to do calculations wise. Would I likely have to use excel for calculations? If so, how easy is it for excel and MySQL to communicate? Is it better do do the calculations using a programming language without the need to export to excel?

MySQL would be my choice ... hands down. My next choice (also hands down) would be to drive MySQL with one of the languages I already mentioned (Perl, Ruby, Python) using the DBI (databae interface) and DBD (database driver) that is appropriate for whichever language you choose. And I would apply this to both the front-end and back-end.

This is fairly easy programming as the classes do most of the work -- all your program needs to do is call the correct method in the class with the correct arguments (correct syntax helps :D) and things happen -- no fuss, no muss.

Sorry - hard question to phrase correctly when I don't know what I'm talking about:lol: I have absolutely no programming background other than the VBA disaster I created, and no database experience to speak of.

All of these techniques, are relatively easy to accomplish. Given you already know one language (VBA -- self-taught???) says you can learn a little bit (or a whole lot) about an additional language. Especially since the Internet offers free 101 classes in nearly everything a person needs to know these days -- complete with examples, including data structures.

One more point and another personal, hands-down choice ... use Linux.

Just my 2 cents -- please excuse the bandwidth.

JustRalph
10-14-2011, 02:18 PM
there is handicapping software available that might be a ton easier to learn and do even more than you want........

2low
10-14-2011, 02:19 PM
Wow - thanks for all the info everybody. I think I'll give MySQL an honest attempt. It appears it will be quite a learning curve at first, but that's half the original point of my black box anyway - to pick up some programming skill.

dukeofperl - thanks for all that. Yes, my VBA is self taught. I'm sure a real programmer would puke if they saw what I did:D, but the goal was to push a button and have excel handicap every race card in the country, give me a wager upload sheet, and record yesterday's results. It does that. It just takes about 5 minutes per track and gives me goofy error messages way more often than I'd like. Time to move on. I like math and enjoyed building my program, so I'm hoping I'll be able to learn what I need fairly quickly.

dukeofperl
10-14-2011, 03:18 PM
... I'm sure a real programmer would puke if they saw what I did:D...

Been there done that so I can fully appreciate how you feel.

As a morale booster, keep in mind, the definition of a good program is one that works and you get it done before your boss fires you.

No more -- no less.

Best of luck with your programming experiences and should you choose to do any of your program in Perl and get stuck ... pm me.

michiken
10-14-2011, 04:10 PM
I use mysql and php on linux............. I like the price (free).

If you have the time to read the mysql manual, it would be worth while to use. Since it is command line operated (although some front ends exist), I feel that it gives you more freedom than Access i,e. fields can be longer than 255 chars.

Here is some code that may help you get an idea on how to import to mysql using php... it also creates a mysql table if it does not exist.



<?php
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++
// Terminval Velocity Handicapping Software for Linux
// File: _tv-autoimport.php date: 11-12-05
// Usage: This is a sub module for _tv-handicap.php used by TermV.
// This module will open the DRF file and import the race data into a MYSQL Database.
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++

// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++
// Create the racelog database table to hold the general race information if it does not exist.
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++
$_sql = "select * from $_racelog";
$_result = mysql_query($_sql);
if (empty($_result)) {
$_sql = "create table $_racelog ( ";
$_sql .= "_id int(5) not null primary key auto_increment, ";
$_sql .= "_track varchar(22) not null, ";
$_sql .= "_date date not null, ";
$_sql .= "_race int(2) not null, ";
$_sql .= "_dist varchar(4) not null, ";
$_sql .= "_surf varchar(4) not null, ";
$_sql .= "_cond text not null, ";
$_sql .= "_wagers text not null );";
mysql_query($_sql) or die('TermV was not able to able to create a racelog table.');
}

// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++
// Create an individual DRF table for each racing day. This table wild hold all the horses pacelines.
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++
$_sql = "select * from $_drftable";
$_result = mysql_query($_sql);
if (empty($_result)) {
$_sql = "create table $_drftable ( ";
$_sql .= "_rid int(5) not null primary key auto_increment, ";
$_sql .= "_hide varchar(1) not null, ";
$_sql .= "_track varchar(12) not null, ";
$_sql .= "_date date not null, ";
$_sql .= "_race int(2) not null, ";
$_sql .= "_dist decimal(4,2) not null, ";
$_sql .= "_pgno varchar(12) not null, ";
$_sql .= "_post varchar(12) not null, ";
$_sql .= "_odds varchar(12) not null, ";
$_sql .= "_pline int(4) not null, ";
$_sql .= "_horse text not null, ";
$_sql .= "_esp varchar(12) not null, ";
$_sql .= "_power int(6) not null, ";
$_sql .= "_ldate date not null, ";
$_sql .= "_ltrack varchar(12) not null, ";
$_sql .= "_ldist decimal(2,2) not null, ";
$_sql .= "_lsurf varchar(12) not null, ";
$_sql .= "_lcond varchar(12) not null, ";
$_sql .= "_finish int (2) not null, ";
$_sql .= "_fig02 int(4) not null, ";
$_sql .= "_fig04 int(4) not null, ";
$_sql .= "_fig06 int(4) not null, ";
$_sql .= "_fig08 int(4) not null, ";
$_sql .= "_fig10 int(4) not null, ";
$_sql .= "_figat int(4) not null, ";
$_sql .= "_figap int(4) not null, ";
$_sql .= "_figsp int(4) not null, ";
$_sql .= "_figtt int(4) not null, ";
$_sql .= "_figlp int(4) not null, ";
$_sql .= "_figtv int(4) not null, ";
$_sql .= "_figen int(4) not null, ";
$_sql .= "_figff int(4) not null );";
mysql_query($_sql) or die("TermV was not able to able to create a the new table for $_drftable");
} else {
$_sql = "delete from $_drftable;";
mysql_query($_sql);
$_sql = "delete from $_racelog where _track = '$_drftable';";
mysql_query($_sql);
}

// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++
// Import the Race Data from the DRF File.
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++
$_import_racedata = file("$_drfexists");
// Loop through the file and extract desired info
$_racecount = 1;
$_row = str_replace('\"','',$_row);
foreach ($_import_racedata as $_row) {
$_field = explode(',',$_row);
$_count = count($_field);
if ($_count > 1435) {
$_total = strlen($_row);
for($_x = 0; $_x <= $_total; $_x++ ) {
$_before = substr($_row,$_x,1);
$_after = substr($_row,$_x+1,1);
if(ereg(',',$_after) and ereg('^([a-zA-Z])$',$_before)) {
$_row = substr_replace($_row,'', $_x+1, 1);
}
}
}
$_field = explode(',',$_row);
$_count = count($_field);
//Check and Eliminate Scratched Horses.
$_entry = str_replace('"','',$_field[4]);
if ($_entry <> 'S') {
// ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++
// Get Track and Race Info
// ++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++
$_track = strtolower(str_replace('"','',($_field[0]))); // Get Racetrack Abbreviation
$_date = str_replace('"','',$_field[1]); // Get Race date
$_race = str_replace('"','',$_field[2]); // Get Race Number
$_dist = str_replace('"','',$_field[5]); // Get Distance of Race
$_dist = abs(round((($_dist*3)/660),2)); // Convert Distance in Yards to Furlongs
$_surf = str_replace('"','',$_field[6]); // Get Todays Running Surface
$_cond = ucwords(strtolower(str_replace('"','',$_field[15]))); // Get Race Conditions
$_len = strlen($_cond);
$_cond = addslashes(rtrim(substr($_cond,10,$_len)));
$_repl = array('Con','Cun','Cum','Bun','Bum','Bon');
foreach($_repl as $_var) {
$_cond = str_replace($_var,'',$_cond);
}
// Get Wager Types
$_wagers = addslashes(ucwords(strtolower(str_replace('"','',$_field[239] . ' ' . $_field[240] . ' ' . $_field[241] . ' ' . $_field[242] . ' ' . $_field[243] . ' ' . $_field[244]))));
// Insert general race info into the racelog table
if ($_race >= $_racecount) {
$_sql = "insert into $_racelog set _track = '$_drftable', _date = '$_date', _race = $_race, _dist = '$_dist', ";
$_sql .= "_surf = '$_surf', _cond = '$_cond', _wagers = '$_wagers'";
mysql_query($_sql) or die("TermV was not able to able to insert the general race data into the $_drftable table.");
$_racecount++;
}

// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++
// Get Horse General Info
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++
$_horse = addslashes(ucwords(strtolower(str_replace('"','',$_field[44])))); // Get Horse Name
$_pgno = str_replace('"','',$_field[42]); // Get Program Number
$_odds = str_replace('.00','',$_field[43]); // Get Morining Line Odds
$_odds = str_replace('.0','',$_odds);
$_odds = str_replace('.50','.5',$_odds);
$_post = str_replace('"','',$_field[3]); // Get Post Position
$_esp = rtrim(str_replace('"','',$_field[209])); // Get Running Style
$_power = str_replace('"','',$_field[250]); // Get Prime Power Rating

//++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++ +
// Get Pacelines
// ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++
for ($_i=0; $_i<=$_maxpacelines; $_i++) {
$_ldate = str_replace('"',"",$_field[254+$_i]);
if (!empty($_ldate)) {
$_pline = $_i;
$_ltrack = ucwords(strtolower(str_replace('"',"",$_field[274+$_i])));
$_lcond = strtolower(str_replace('"',"",$_field[304+$_i]));
$_ldist = str_replace('"',"",$_field[314+$_i]);
$_ldist = str_replace('-','',round(( ($_ldist * 3) / 660 ),2));
$_lsurf = strtolower(str_replace('"',"",$_field[324+$_i]));
$_fig02 = str_replace('"',"",$_field[764+$_i]);
$_fig04 = str_replace('"',"",$_field[774+$_i]);
$_fig06 = str_replace('"',"",$_field[784+$_i]);
$_fig08 = str_replace('"',"",$_field[794+$_i]);
$_fig10 = str_replace('"',"",$_field[804+$_i]);
$_figlp = str_replace('"',"",$_field[814+$_i]);
$_figsp = str_replace('"',"",$_field[844+$_i]);
// Insert data into database
$_sql = "insert into $_drftable set ";
$_sql .= "_hide = '1', ";
$_sql .= "_track = '$_track', ";
$_sql .= "_date = '$_date', ";
$_sql .= "_race = '$_race', ";
$_sql .= "_dist = '$_dist', ";
$_sql .= "_pgno = '$_pgno', ";
$_sql .= "_post = '$_post', ";
$_sql .= "_horse = '$_horse', ";
$_sql .= "_odds = '$_odds', ";
$_sql .= "_pline = '$_pline', ";
$_sql .= "_esp = '$_esp', ";
$_sql .= "_power = '$_power', ";
$_sql .= "_ldate = '$_ldate', ";
$_sql .= "_ltrack= '$_ltrack', ";
$_sql .= "_ldist = '$_ldist', ";
$_sql .= "_lsurf = '$_lsurf', ";
$_sql .= "_lcond = '$_lcond', ";
$_sql .= "_fig02 = '$_fig02', ";
$_sql .= "_fig04 = '$_fig04', ";
$_sql .= "_fig06 = '$_fig06', ";
$_sql .= "_fig08 = '$_fig08', ";
$_sql .= "_fig10 = '$_fig10', ";
$_sql .= "_figsp = '$_figsp', ";
$_sql .= "_figlp = '$_figlp'";
mysql_query($_sql) or die("TermV was not able to able to insert the DRF data into the $_drftable table.");
}
}
} // End If
} // End Foreach

DeltaLover
10-14-2011, 07:28 PM
So, you store the entire DRF in the db?
In my case (using SQLServer as my main db engine and mongo for some minor experiments) I found it better to just keep them in text format and populate my object model directly from there..

I am interested to see how mysql behaves as the drf files universe expands... (both from performance and sizing scopes of view)...

sjk
10-15-2011, 07:04 AM
Using Access I currently have about half a million races and 3 1/2 million race line records in my working database and performance has not been an issue. As was pointed out before the size issue comes into play more than any performance issue.

All of this is hugely dependent on your structure. It is important to break things into pieces; if you don't you will immediately run into things that do not perform in any reasonable time.

I would think the SQL type database programs would do as well or better.

DJofSD
10-15-2011, 09:10 AM
All of this is hugely dependent on your structure. It is important to break things into pieces; if you don't you will immediately run into things that do not perform in any reasonable time.

I would think the SQL type database programs would do as well or better.
A couple of comments in order to clarify things. Some will know this already.

First, these data bases are relational data bases and the tool used to query the data base is Structured Query Language (SQL).

The structure is defined by the components used to create the relational data base. The components are called tables. You can think of a table as a collection of data elements called columns. Each instance of a collection of data items in those columns is called a row.

How well your data base performs and how easy it is to ask the question to get the answer depends on how you define the tables and create the relations between the tables. The important concept to understand is called normal form. Your data base should be made up of tables that are in 3rd normal form.

Those are fundamental aspects of relational data bases. Tools exist to help design data bases and are typically using the entity-relation (ER) model.

sjk
10-15-2011, 09:32 AM
I will add some more about data structure for those few who might be interested.

In my database there is a table with results of past races and another which shows the running line of each horse that ran in those races. Very nearly everything else in the db comes from that information but it is useful to have many additional tables storing information that is calculated from the two basic tables, such as a table of track variants or par times.

Consider the question: Which of the horses running in a certain race today had a better finish in his last race than in the one before?

If you try to answer this directly from the two basic tables you will get bogged down but if you have the right additional tables available it becomes a snap.

guckers
10-15-2011, 12:31 PM
Great comments on data structure as this is the foundation of your system and you will either live or die by it. Another approach to increasing the performance of your database (whether it is in Access, MySQL, etc..) is to create aggregate tables that have done all of the heavy processing in advance. For example, running a query that is counting the total wins of each horse can take a long time to run if you have millions of horses. Instead, do a job that performs this for you every time you update your data and have the results dumped in to an aggregate table which you can directly query.

Native Texan III
10-15-2011, 05:57 PM
Some other suggestions:

1. As a free database try SQLite - it holds everything in fast memory and does all the calculations you need without disk read and writes. Very fast and simple.

2. Rent some space on a commercial webserver and put your database on that. They have MySQL at the server end and can do all the calculation there - just sending you the database query answers.

3. MySQL have a utility (MySQL Migration Toolkit) to import Access and Excel etc directly. Download is on Oracle's website as stand alone download. You can always convert your Excel files into comma separated variable (CSV) files though and import those one at a time into any database.

vegasone
10-16-2011, 05:03 PM
If you are familiar with VBA and want try MYSQL you should(may) be able to use their free Visual Basic Express with an interface to MYSQL or MSSQL. I still use an older version of VB as a front end to MYSQL or MSSQL or ACCESS. Write all your programs in VB if you are more familiar with that.