PDA

View Full Version : reading the comma delimited drf file


formula_2002
04-18-2005, 08:14 AM
One of the programs I run is similar to visual basic.
I ‘m writing a program that reads the drf comma delimited file.
There are over 1400 fields in the cd file. I only need to read a few of the fields.

The problem is, the program has to read and count every field to get to the fields I want.

That process takes a few seconds for each horse.
Any way to speed up the process?


Thanks
Joe M
(forever trying!! :bang: )

DJofSD
04-18-2005, 09:40 AM
Quit VB.

DJofSD

Jeff P
04-18-2005, 12:05 PM
The problem isn't VB. I use it to read all 1435 fields for each horse- and do so for every horse in a typical data file- and it happens pretty fast- just under 2.5 seconds. That's every horse in the file. What you are describing is very likely caused by excessive recursion or multiple nested loops. If you start nesting your loops by placing them inside of each other- you can slow down any programming environment/platform to a crawl- C, Java, SQL, Access, Excel, VB.NET, and... yes VB- it doesn't matter. A good programming practice to remember is to avoid nested loops whenever possible.


.

Steve 'StatMan'
04-18-2005, 12:18 PM
Comma delimited DRF files. 1435 fields? You must be using something other than DRF Formulator. The Formulator Exports break out into I think 7 separate files - if the data you're looking for is on just 1 or 2 of those 7 files, you could just read those specific files. Of course, recalling your studies Joe, I think you're probably looking at tens of thousands of races or even race cards - even 2.5 seconds a race card times 10,000 race cards is a lot of time.

If you have some basic data that you always use, you might want to make a stripped down version of the file to just the fields you use most often, and read those smaller files for your frequent runs. That way, each run only needs to read a few fields per horse instead of 1435. Might be worth a shot for very large amounts of data.

cj
04-18-2005, 01:20 PM
I think the DRF file Joe is referring to is the BRIS file. There is no way it should take more than a couple seconds to read in the entire file on a fairly new machine. How long does it take my program to run on your computer Joe? Needless to say, its reading in every field and doing a lot more, and it only takes maybe two seconds on my computer.

rrbauer
04-18-2005, 03:53 PM
One of the programs I run is similar to visual basic.
I ‘m writing a program that reads the drf comma delimited file.
There are over 1400 fields in the cd file. I only need to read a few of the fields.

The problem is, the program has to read and count every field to get to the fields I want.

That process takes a few seconds for each horse.
Any way to speed up the process?


Thanks
Joe M
(forever trying!! :bang: )

You're writing the program? In what language? Is your process doing an extract of selected data from the file that you will use in other programs? I have been using the BRIS multi-record DRF data files for years. (When you unzip the download file you get four (4) files with .DRF .DR2 .DR3 and .DR4 filenames.) My program reads the files and creates an extract file (formatted) that I use for other processing. It takes maybe 5 seconds per track to open, read, extract and write the data I need. I have a similar program that does the same thing with data from a couple of the DRF Formulator files from their export process. Again, lickety-split and it's done. These are compiled, .EXE programs that run in a MS-DOS task. The only thing complex about the read statement is that it has to be able to match-up the data stream with a format expression that tells it which delimited fields to read and their data type and which fields to skip over. Even if you were bringing in the entire record and then using a subroutine to parse it to get the specific data items you wanted that should be nothing in a pentium-class machine.

So if you can redo your problem statement to better define the nature of the file you're working with (e.g. BRIS DRF Single-record or Multi-record) and what data fields you're trying to get to, and what programming tool you're using maybe we can help you out.

PS. If you're trying this in Java LOL!!

formula_2002
04-18-2005, 04:04 PM
I think the DRF file Joe is referring to is the BRIS file. There is no way it should take more than a couple seconds to read in the entire file on a fairly new machine. How long does it take my program to run on your computer Joe?

Instantaneously.

I'm sure the problem is with me. I'm using win task and it may not be suited for what I'm doing.

cj
04-18-2005, 04:07 PM
What fields do you want? Make a list, I can do it pretty quick. Of course, I haven't gotten around to adding BRIS Prime Power to your program either, so maybe not that quick! ;)

richrosa
04-18-2005, 08:01 PM
i use PHP to parse and write the DRF/BRIS data to a MySQL database. The whole thing takes 2 maybe 3 seconds at best. I parse the entire file storing all the data, as you'll never know when you'll need it.

formula_2002
04-18-2005, 08:53 PM
i use PHP to parse and write the DRF/BRIS data to a MySQL database. The whole thing takes 2 maybe 3 seconds at best. I parse the entire file storing all the data, as you'll never know when you'll need it.

Thanks to everyone for your thoughts.
The problem with the software(perhaps the programmer), is the program must count every “,”. There are 1435 ‘,” in a line. Apparently, I can only count by “looping”.

In a 8 horse field for a 9 race card, that’s 72 lines, or about 3 minuets per card.

I’ll develop the program further and see if I can improve upon it.

I can always run it at night. I don’t need it for handicapping, just for analysis.

PaceAdvantage
04-19-2005, 01:41 AM
I'm using win task and it may not be suited for what I'm doing.

I think you've answered your own question!

hurrikane
04-19-2005, 07:30 AM
I'm with richrosa on this one

if using vb I'd just ram it all into an array and pull what I needed when I needed it. Much better than trying to do it piece-meal.

Do it in a module(even better, com/dll) and you can call it any time you want to pull in data without having to rewrite code.

hcap
04-19-2005, 08:37 AM
Joe,

If your interested in using Excel, I have a set up that imports full TSN cards (.eg), and corresponding result files (.xrd). Yes you can import over 256 fields. You must put Data files in one directory and results files in another. Still experimenting. Tested 30 cards. Importing data files and results files for all 30 took about 1 minute. Then pulling out @ 20 identifying fields, and @ 20 complex calculated fields took about .1 second per horse. There were about 2500 horse lines with 1400+fields. But not all 1400 fields were used in the calculated fields. Even though they are all stored. in 1400+ cells as excel reads each horse. Using more will slow excel down. It makes a table to then analyse further. If you want to fool around with it let me know.

BTW in vba there is a split function that allows you to pull out selective fields. No looping required. But when I set it up in Excel even after dumping it into an temp array was still slower than what I eventually wound up using. The brute force approach. Evidentally if you use built in objects coded in I think C, it is much faster. Takes a larger workbook though.

Light
04-19-2005, 10:56 AM
If anyone here has questions about Excel,go to:

http://excelforum.com

Similar format to PA.Many experts there at VBA. Solved several problems I had with my Excel horseracing program. For VBA use the "Excel programming" forum. If you present the code you've written,someone will show you a better,more efficient one.

Zaf
04-19-2005, 10:59 AM
Thanks Light :)

ZAFONIC

formula_2002
04-19-2005, 11:30 AM
Joe,

If your interested in using Excel, I have a set up that imports full TSN cards (.eg), and corresponding result files (.xrd). Yes you can import over 256 fields.

No that is interesting.. can you expand a bit futher on it.

thanks

Joe M

formula_2002
04-19-2005, 11:44 AM
would this do the job?
Borland C++ Builder 6 Personal

cj
04-19-2005, 02:06 PM
If you go the C++ route, I'll send you the source code I have written.

richrosa
04-19-2005, 02:15 PM
I have made this offer before, with no takers. I'd be willing to open source my PHP classes for importing BRIS into MySQL database which will give you the ability to import files so that you can write your own handicapping program on top of it. I would do it through SorceForge and treat it like any other open source project.

formula_2002
04-19-2005, 02:27 PM
No that is interesting.. can you expand a bit futher on it.

thanks

Joe M

that should have read "NOW THAT IS INTERESTING...."

formula_2002
04-19-2005, 02:29 PM
If you go the C++ route, I'll send you the source code I have written.


CJ, I DONT FIND C++ route on the Borland web site.
Whose product is it ?.

cj
04-19-2005, 04:04 PM
You can find the Borland C++ compiler free on their site. I'm still using one I bought over 6 years ago when you had to pay, I think it costs me $100. It's been worth every penny. I haven't tried the free version, so I can't really help you out there.

http://www.borland.com/products/downloads/download_cbuilder.html#

Here are a bunch more on this page, I may have to check them out myself.

http://www.thefreecountry.com/compilers/cpp.shtml

Speed Figure
04-19-2005, 05:10 PM
Which one do you download?

cj
04-19-2005, 05:12 PM
For Borland, it would be the compiler. It looks like you have to register first. I haven't looked at the others yet, will do tomorrow.

hcap
04-19-2005, 05:59 PM
Joe,

You can't use normal importing, will only do 256 fields and may not even tell you there are more fields.

Use Workbooks.Open Filename instead. This will open the entire file putting all fields for each horse in one cell. I believe the number of charaters allowed in one cell is around 32,000. Number of charcaters for one horse record in TSN is usually 6000+

Then copy one line at a time to another cell. Or you can use a vba statement to let the target cell = the data cell. What I do is to set up rows of formulas to find the location of each comma in this "giant cell" Believe it or not 6 rows of 256 formulas.

Like in cell A10 =SEARCH(",",($C$1))where C1 is the target cell.
Then in cell B10 =SEARCH(",",($C$1),A10+1). And in sequence across the page, in cell C10 =SEARCH(",",($C$1),B10+1).

=SEARCH(",",($C$1)) =SEARCH(",",($C$1),A10+1) =SEARCH(",",($C$1),B10+1)
=SEARCH(",",($C$1),IV10+1) =SEARCH(",",($C$1),A11+1) =SEARCH(",",($C$1),B11+1)
=SEARCH(",",($C$1),IV11+1) =SEARCH(",",($C$1),A12+1) =SEARCH(",",($C$1),B12+1)
=SEARCH(",",($C$1),IV12+1) =SEARCH(",",($C$1),A13+1) =SEARCH(",",($C$1),B13+1)
=SEARCH(",",($C$1),IV13+1) =SEARCH(",",($C$1),A14+1) =SEARCH(",",($C$1),B14+1)
=SEARCH(",",($C$1),IV14+1) =SEARCH(",",($C$1),A15+1) =SEARCH(",",($C$1),B15+1)


These formulas continue to column IV

6 rows x 256 columns is over 1500 formulas. Each determining where each succesive comma is located. Then I contruct 6 cells, one under the other...
A37 is the max number in first row of formulas
B37 is max in second row of formulas and so forth. =+A38-A37 is how many fields in second row

Col A ColB
=MAX(10:10) =+A38-A37
=MAX(11:11) =+A39-A38
=MAX(12:12) =+A40-A39
=MAX(13:13) =+A41-A40
=MAX(14:14) =+A42-A41
=MAX(15:15)

These are the new cells each having no more than 256 fields

=MID(($C$1),1,A37-1)
=MID(($C$1),A37+1,B37-1)
=MID(($C$1),A38+1,B38-1)
=MID(($C$1),A39+1,B39-1)
=MID(($C$1),A40+1,B40-1)
=MID(($C$1),A41+1,B41-1)

I convert these six cells to values then I use TextToColumns to parse each of the six cells simultaneously.

The 1400 formulas calculate almost instantaneously. The parsing is super fast. So by using the Excel "built ins", things go much faster than vba.
I am using an athlon 1500 with 256M of ram. Excel 2000.

Each horse is read in about 1/10 of a second. I could speed it up some.All fields are available.
I realize Excel is not the best to do this, but allows all sort of complex calcs after the data is in. And been using Excel for years.
Should work with any csv file from any data provider. I don't believe renaming is necessary. But fields will be different. Still experimenting.

Que
04-20-2005, 07:11 AM
formula_2000,

I'm not sure this will help, but use the Split command in visual basic instead of using a loop to seperate the fields or to count the number "," on a line. For example,

Dim vArray as variant
vArray = Split(mydata, ",")
numberofelements=ubound(vArray)

Then if you need information from fields 4, 8, 13, etc. use the following code, while remembering that the returned array is zero-based, i.e. the first field is vArray(0):

myfileld1 = vArray(3)
myfileld2 = vArray(7)
myfileld3 = vArray(12)
etc....

Que.

cj
04-20-2005, 07:14 AM
You have to make sure you don't count commas inside the quotations marks. BRIS is good about not using those, they use ;s, but Formulator is not. They also don't mind throwing an occasional ,, inside the quotes in the owner or trip comments field.

formula_2002
04-21-2005, 07:04 AM
Joe,

Then copy one line at a time to another cell. Or you can use a vba statement to let the target cell = the data cell. different.

Thanks for the time hcap, I'll look into it, but the above sounds very time intensive.

formula_2002
04-21-2005, 07:09 AM
formula_2000,

I'm not sure this will help, but use the Split command in visual basic instead of using a loop to seperate the fields or to count the number "," on a line. For example,


Que.

My software, "WinTask" does not have that command.
I think I have a way speeding things up by reading all of the consecutive fields I don't require as a single array.

Thanks for your time

Joe M

formula_2002
04-21-2005, 07:15 AM
For Borland, it would be the compiler. It looks like you have to register first. I haven't looked at the others yet, will do tomorrow.


Cj as you know, go no further. Thanks again for all the information you sent along.

By the way, when I logged into Borland, I was alread registered. I guess it goes back to the time I first bought their Dbase, too many years ago.. It goes as far back as my Prodigy days

Joe M

hcap
04-21-2005, 07:27 AM
Joe,

I can send you what I already have. Uses TSN data files and and TSN xrd result files. Cleaning it up some now. Should work with bris single file format, but would require remapping field locations.

"Then copy one line at a time to another cell. Or you can use a vba statement to let the target cell = the data cell", is not very time intensive. The most time intensive part of the setup is locating comma positions.All in all takes about 1/10 sec per horse. Since you know excel, you should be able to modify it any way you want.