Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board


Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board (http://www.paceadvantage.com/forum/index.php)
-   General Handicapping Discussion (http://www.paceadvantage.com/forum/forumdisplay.php?f=2)
-   -   Automating Scratches (http://www.paceadvantage.com/forum/showthread.php?t=159722)

CBYRacer 08-07-2020 05:00 PM

Automating Scratches
 
Does anyone know of a way to pull scratches programmatically from Equibase? It appears that they use Captchas to prevent bots from scraping. Is there any way around this, or a different website to use for this?

Speed Figure 08-07-2020 05:09 PM

I simply download this file multiple times per day. https://www.equibase.com/static/late...techanges.html

CBYRacer 08-07-2020 05:52 PM

Quote:

Originally Posted by Speed Figure (Post 2640022)
I simply download this file multiple times per day. https://www.equibase.com/static/late...techanges.html

How do you do it though? Programmatically or manually? I get hit with a Captcha screen if I try to scrape it.

Speed Figure 08-07-2020 05:59 PM

I manually download it & my software does all the scratches. Takes about 10 seconds.

Jeff P 08-07-2020 06:17 PM

I've been parsing the xml file linked to separately in the bottom center of the page that Speed Figure linked to.

My firsthand experience this year (and yes I'm willing to cut Equibase some slack given the current environment) is that accuracy and quality control have suffered a bit compared to prior years.

If you are going to parse Equibase scratches and changes info - be prepared to do some sanity checking and special handling.

For example, right now as I type this, dirt course track condition for Monmouth is listed as Sloppy. But track condition for Monmouth R6 (which was taken off the turf) is missing from the XML.

EDIT - and now that I took the time to mention track condition for off turf races - I can clearly see that someone at Monmouth has now added an entry to the XML to note track condition for R6 at Monmouth as Sloppy.

Moving on to Penn National... right now as I type this, dirt course track condition is listed as Sloppy. But track conditions for races 2, 3, and 4 (which were taken off the turf) are missing from the xml.

Missing track condition for off turf races happens several times a week at some track somewhere - and (Imo) is one of those areas that requires special handling.

At the same time there ARE tracks that make the effort to get correct track condition into the xml for races that are taken off the turf - at a rate bordering on 100% of the time.
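
A sanity check for the off-turf case could look something like the sketch below. It assumes the changes have already been parsed into simple records. The field names (track, race, kind, text) and the "course_change" / "track_condition" labels are hypothetical stand-ins, not Equibase's actual schema, and the sketch treats every course change as an off-turf move.

```python
from collections import defaultdict

# Hypothetical, already-parsed change records. Field names and "kind"
# labels are illustrative assumptions, not Equibase's real schema.
changes = [
    {"track": "MTH", "race": 6, "kind": "course_change", "text": "Turf to Dirt"},
    {"track": "PEN", "race": 2, "kind": "course_change", "text": "Turf to Dirt"},
    {"track": "PEN", "race": 2, "kind": "track_condition", "text": "Sloppy"},
]


def off_turf_missing_condition(records):
    """Return sorted (track, race) pairs that have a course change
    but no race-level track condition entry."""
    kinds = defaultdict(set)
    for rec in records:
        kinds[(rec["track"], rec["race"])].add(rec["kind"])
    return sorted(
        key
        for key, seen in kinds.items()
        if "course_change" in seen and "track_condition" not in seen
    )


for track, race in off_turf_missing_condition(changes):
    print(f"{track} R{race}: off the turf, no track condition listed")
```

Races flagged this way could then fall back to the track-level dirt condition or be held for a manual look.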

Another area that (Imo) requires special handling is rider changes.

A lot of times rider changes in the latter legs of multi-race exotics seem to show up in the xml as an afterthought (well after leg 1 of a multi-race exotic has already gone off.)

Yet a high percentage of the time you can pick up the same rider changes from track video before leg 1 of that same multi-race exotic goes off.

That said --

I think having constantly updated scratches and changes info available in one place (the Equibase xml) is great.

Kudos to Equibase for making it happen.



-jp


classhandicapper 08-07-2020 06:52 PM

Quote:

Originally Posted by Speed Figure (Post 2640040)
I manually download it & my software does all the scratches. Takes about 10 seconds.

Same here.

CBYRacer 08-08-2020 10:48 AM

Thanks, guys. I found the RSS feed link at the bottom of the page that you guys were referencing. Really appreciate your help!

Tom 08-08-2020 05:34 PM

How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?

Speed Figure 08-08-2020 06:39 PM

My software is programmed to read the file and do all the scratches. I simply have to delete it once it’s done and download it again later in the day to get updated scratches from any upcoming races.
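
For anyone re-downloading during the day, one way to pick out only what changed since the last download is to compare the two sets of scratches. This is a minimal sketch, not Speed Figure's software; it assumes each scratch has already been reduced to a (track, race, horse) tuple, and the horse names are made up.

```python
def new_scratches(earlier, later):
    """Return scratches present in the later download but not the earlier one."""
    return sorted(set(later) - set(earlier))


# Made-up data: (track id, race number, horse name).
morning = {("CNL", 3, "EXAMPLE HORSE A")}
afternoon = {("CNL", 3, "EXAMPLE HORSE A"), ("CNL", 7, "EXAMPLE HORSE B")}

print(new_scratches(morning, afternoon))
```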

headhawg 08-08-2020 06:49 PM

Quote:

Originally Posted by Tom (Post 2640347)
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?

You could try XML Notepad. It's old, but it works on my Win7 'puter. It may work for you.

XML Notepad
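
If installing a viewer is not an option, a few lines of Python can re-indent the file so it reads top to bottom in any text editor. This is only a sketch: the sample string is a made-up fragment that borrows the tag names quoted elsewhere in this thread, and a real file would be read from disk instead.

```python
from xml.dom import minidom

# Made-up one-line fragment; a downloaded file would be read with open().
raw = (
    "<late_changes><race_date>08/09/2020</race_date>"
    '<track country="USA" id="CNL" track_name="COLONIAL DOWNS"></track>'
    "</late_changes>"
)

# toprettyxml() puts each node on its own line, indented by depth.
pretty = minidom.parseString(raw).toprettyxml(indent="  ")
print(pretty)
```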

headhawg 08-08-2020 07:05 PM

Quote:

Originally Posted by CBYRacer (Post 2640225)
Thanks, guys. I found the RSS feed link at the bottom of the page that you guys were referencing. Really appreciate your help!

If you don't want to go the RSS route, you could use my HDST program. Just click the "Download Scratch File" button and it will save a file named scratches.xml. HDST

The download is post #45.

classhandicapper 08-09-2020 09:35 AM

Quote:

Originally Posted by Tom (Post 2640347)
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?

I open it in Excel and then import the Excel file into my database.

Jeff P 08-09-2020 01:07 PM

Quote:

Originally Posted by Tom (Post 2640347)
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?

I started parsing the xml in 2009 when it first came out. Back then I was using the (then) latest version of Microsoft's xml parser.

I quickly discovered Microsoft's xml parser was using way too much RAM - approx 1.2 gigabytes of the 4.0 gigabytes that my then 2007 machine had.

To me, this seemed ridiculous given that at end of day after all changes had been added the xml files themselves generally averaged about 200 kb in size... and that early in the day when only east coast tracks were running an xml file might only be 70 kb in size.

So I wrote my own xml parser. (Of course it helps that I have a background as a developer.)

That said, there are many different xml parsers out there and (Imo) just about all of them can get the job done.

In its simplest form, xml is a standardized markup format designed to deliver data wrapped inside of tags. (An opening tag, its closing tag, and everything between them make up a node.)

The tags or nodes in the file follow a certain order.

At the very top of the document you'll find a tag that looks like this: "<late_changes>" (without the quotes.)

At the very bottom of the document you'll find a closing tag that looks like this: "</late_changes>" (without the quotes.)

Everything between the two tags (the late_changes node) contains data for late changes.

The next row in today's xml file looks like this: "<race_date>08/09/2020</race_date>" (without the quotes.)

As you might intuitively guess, the string text between the "<race_date>" and "</race_date>" tags or the race_date node (without the quotes) is where you'll find the date for all of the changes data found in the current file.
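
With an off-the-shelf parser, reading the top-level node and the date takes only a few lines. The sketch below uses Python's standard library on a made-up two-tag sample, and it assumes the date is written month/day/year.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up sample containing only the tags described above.
sample = """<late_changes>
<race_date>08/09/2020</race_date>
</late_changes>"""

root = ET.fromstring(sample)
print(root.tag)  # late_changes

# Assumption: the date string is MM/DD/YYYY.
race_date = datetime.strptime(root.findtext("race_date"), "%m/%d/%Y").date()
print(race_date)  # 2020-08-09
```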

The xml parser that I wrote simply scans the file and reads string text contained between predefined tags (or data contained in predefined nodes.)
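
To illustrate the idea (this is not the actual parser described above, just a minimal sketch of the same approach), a scanner that pulls the text between a known pair of tags needs nothing more than string searches. The function name text_between is made up, and the sketch only handles tags without attributes.

```python
def text_between(doc, tag, start=0):
    """Return (text, next_pos) for the first <tag>...</tag> at or after
    start, or (None, -1) when no complete pair is found."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    i = doc.find(open_tag, start)
    if i == -1:
        return None, -1
    i += len(open_tag)
    j = doc.find(close_tag, i)
    if j == -1:
        return None, -1
    return doc[i:j], j + len(close_tag)


doc = "<late_changes><race_date>08/09/2020</race_date></late_changes>"
value, next_pos = text_between(doc, "race_date")
print(value)  # 08/09/2020
```

Because it never builds a tree of the whole document, a scanner like this keeps memory use close to the size of the file itself.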

Later in the day a track employee or someone working for Equibase will add similar rows for both Albuquerque and Arlington Park... but right now as I type this, the next row in the file looks like this: "<track country="USA" id="CNL" track_name="COLONIAL DOWNS">"

This is the opening track tag for Colonial Downs.

Several rows further down in the file you'll find a closing track tag for Colonial Downs that looks like this: "</track>" (without the quotes.)

Everything between each opening track tag and closing track tag (or within each track node) contains changes data for that track:

Course changes, distance changes, track condition changes, temp rail changes, scratches, rider overweights, horse weights, rider changes, and reported first time geldings, etc.
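
Walking the track nodes with a standard parser could look like the sketch below. The track attributes (country, id, track_name) are the ones quoted above. The tags inside each track node are not spelled out in this thread, so the sketch simply lists whatever child nodes it finds instead of assuming their names.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the track node is left empty on purpose.
sample = """<late_changes>
<race_date>08/09/2020</race_date>
<track country="USA" id="CNL" track_name="COLONIAL DOWNS">
</track>
</late_changes>"""

root = ET.fromstring(sample)

tracks = []
for track in root.iter("track"):
    tracks.append((track.get("id"), track.get("track_name"), track.get("country")))
    # List each child generically: tag name, attributes, and text.
    for child in track:
        print(child.tag, child.attrib, (child.text or "").strip())

print(tracks)
```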

The xml parser that I wrote simply scans the predefined nodes in the file one track at a time and one race at a time - reading the data contained between each pair of predefined tags or within each node, and writes the data read from each node to a database - where it can then be used to generate a changes report and/or used for live play.
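
The same one-track-at-a-time flow can be sketched with Python's standard library, using a streaming parser so that only one track node is held in memory at a time. This is not the parser described above. The race and scratch tags in the sample, the horse name, and the table layout are all hypothetical; only late_changes, race_date, and the track attributes come from this thread.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample. <race>, <scratch>, and their contents are hypothetical.
SAMPLE = b"""<late_changes>
<race_date>08/09/2020</race_date>
<track country="USA" id="CNL" track_name="COLONIAL DOWNS">
<race number="3"><scratch>EXAMPLE HORSE</scratch></race>
</track>
</late_changes>"""


def load_changes(source, conn):
    """Stream the file and store every text-bearing node under each track."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes "
        "(race_date TEXT, track_id TEXT, tag TEXT, value TEXT)"
    )
    race_date = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "race_date":
            race_date = elem.text
        elif elem.tag == "track":
            rows = [
                (race_date, elem.get("id"), node.tag, node.text.strip())
                for node in elem.iter()
                if node is not elem and node.text and node.text.strip()
            ]
            conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?)", rows)
            elem.clear()  # release the finished track before reading the next
    conn.commit()


conn = sqlite3.connect(":memory:")
load_changes(io.BytesIO(SAMPLE), conn)
rows = conn.execute("SELECT * FROM changes").fetchall()
print(rows)
```

A real loader would also carry the race number along with each row; it is left out here because the actual race-level tags are not shown in this thread.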

I hope I managed to type most of that out in a way that makes sense,


-jp


classhandicapper 08-09-2020 03:33 PM

Quote:

I started parsing the xml in 2009 when it first came out. Back then I was using the (then) latest version of Microsoft's xml parser.

I don't understand why you went through all this.

I just download the XML file, open it in Excel, it gets translated and puts each field into neat columns with headings, and then I import it into my database. It literally takes me a minute. I do it once late morning when the east coast tracks I might play come in and then once later in the day if I am going to play west coast tracks.
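
For readers without Excel, a similar columns-with-headings result can be approximated by flattening the XML to CSV. The sketch below is not how Excel maps the file; the column names are made up, and the race and scratch tags in the sample are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample. <race>, <scratch>, and their contents are hypothetical.
SAMPLE = """<late_changes>
<race_date>08/09/2020</race_date>
<track country="USA" id="CNL" track_name="COLONIAL DOWNS">
<race number="3"><scratch>EXAMPLE HORSE</scratch></race>
</track>
</late_changes>"""


def xml_to_csv(xml_text):
    """Write one CSV row per text-bearing node found under each track."""
    root = ET.fromstring(xml_text)
    race_date = root.findtext("race_date")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["race_date", "track_id", "track_name", "field", "value"])
    for track in root.iter("track"):
        for node in track.iter():
            text = (node.text or "").strip()
            if node is not track and text:
                writer.writerow(
                    [race_date, track.get("id"), track.get("track_name"), node.tag, text]
                )
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE)
print(csv_text)
```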

Where I've had huge difficulty is with the Timeform API. No matter what I did I couldn't translate the XML in a similar fashion and I'm way too lazy at this point to learn how to do it or write a parser. So I threw in the towel.

I find text files way easier to work with anyway.

headhawg 08-09-2020 04:13 PM

Jeff may have needed to use it programmatically. He couldn't tell his users to use Excel, save the file, and then import into JCapper. If it was for personal use, sure, use Excel. But coders like to code.


All times are GMT -4. The time now is 10:10 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2024, vBulletin Solutions, Inc.
Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved
