Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 08-07-2020, 05:00 PM   #1
CBYRacer
Registered User
 
Join Date: Jun 2020
Posts: 178
Automating Scratches

Does anyone know of a way to pull scratches programmatically from Equibase? It appears that they use CAPTCHAs to prevent bots from scraping. Is there any way around this, or a different website to use for this?
Old 08-07-2020, 05:09 PM   #2
Speed Figure
DJ M.Walk
 
Join Date: Aug 2002
Location: Compton, CA!
Posts: 2,066
I simply download this file multiple times per day. https://www.equibase.com/static/late...techanges.html
Old 08-07-2020, 05:52 PM   #3
CBYRacer
Registered User
 
Join Date: Jun 2020
Posts: 178
Quote:
Originally Posted by Speed Figure
I simply download this file multiple times per day. https://www.equibase.com/static/late...techanges.html
How do you do it, though? Programmatically or manually? I get hit with a CAPTCHA screen if I try to scrape it.
Old 08-07-2020, 05:59 PM   #4
Speed Figure
DJ M.Walk
 
Join Date: Aug 2002
Location: Compton, CA!
Posts: 2,066
I manually download it & my software does all the scratches. Takes about 10 seconds.
Old 08-07-2020, 06:17 PM   #5
Jeff P
Registered User
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,257
I've been parsing the xml file, which is linked separately at the bottom center of the page that Speed Figure linked to.

My firsthand experience this year (and yes I'm willing to cut Equibase some slack given the current environment) is that accuracy and quality control have suffered a bit compared to prior years.

If you are going to parse Equibase scratches and changes info - be prepared to do some sanity checking and special handling.

For example, right now as I type this, dirt course track condition for Monmouth is listed as Sloppy. But track condition for Monmouth R6 (which was taken off the turf) is missing from the XML.

EDIT - and now that I took the time to mention track condition for off turf races - I can clearly see that someone at Monmouth has now added an entry to the XML to note track condition for R6 at Monmouth as Sloppy.

Moving on to Penn National... right now as I type this, dirt course track condition is listed as Sloppy. But track condition for races 2, 3, and 4 (which were taken off the turf) is missing from the xml.

Missing track condition for off turf races happens several times a week at some track somewhere - and (Imo) is one of those areas that requires special handling.

At the same time there ARE tracks that make the effort to get correct track condition into the xml for races that are taken off the turf - at a rate bordering on 100% of the time.
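One way to approach the off turf case is a sanity check that runs after parsing. Here is a minimal sketch in Python, assuming the changes have already been read into plain dicts. The field names (off_turf, track_condition) and the policy of falling back to the track's dirt course condition are assumptions made for illustration, not anything defined by the Equibase xml:

```python
def fill_missing_conditions(races, dirt_condition):
    """Flag off turf races with no track condition and fall back to the
    track's current dirt course condition (an assumed policy).

    races: list of dicts with hypothetical keys
           'race' (int), 'off_turf' (bool), 'track_condition' (str or None)
    Returns the race numbers that needed the fallback.
    """
    patched = []
    for race in races:
        if race["off_turf"] and not race.get("track_condition"):
            race["track_condition"] = dirt_condition
            race["condition_inferred"] = True  # mark it so it can be reviewed
            patched.append(race["race"])
    return patched


# Example modeled on the Penn National situation described above.
penn = [
    {"race": 1, "off_turf": False, "track_condition": "Sloppy"},
    {"race": 2, "off_turf": True, "track_condition": None},
    {"race": 3, "off_turf": True, "track_condition": None},
    {"race": 4, "off_turf": True, "track_condition": None},
]
needs_review = fill_missing_conditions(penn, dirt_condition="Sloppy")
print(needs_review)  # [2, 3, 4]
```

Marking the inferred values (rather than silently filling them in) makes it possible to overwrite them later if the track does add the real entry to the xml.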

Another area that (Imo) requires special handling is rider changes.

A lot of times rider changes in the latter legs of multi-race exotics seem to show up in the xml as an afterthought (well after leg 1 of a multi-race exotic has already gone off.)

Yet a high percentage of the time you can pick up the same rider changes from track video before leg 1 of that same multi-race exotic goes off.

That said --

I think having constantly updated scratches and changes info available in one place (the Equibase xml) is great.

Kudos to Equibase for making it happen.



-jp

.
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com

Last edited by Jeff P; 08-07-2020 at 06:32 PM.
Old 08-07-2020, 06:52 PM   #6
classhandicapper
Registered User
 
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,523
Quote:
Originally Posted by Speed Figure
I manually download it & my software does all the scratches. Takes about 10 seconds.
Same here.
__________________
"Unlearning is the highest form of learning"
Old 08-08-2020, 10:48 AM   #7
CBYRacer
Registered User
 
Join Date: Jun 2020
Posts: 178
Thanks, guys. I found the RSS feed link at the bottom of the page that you guys were referencing. Really appreciate your help!
Old 08-08-2020, 05:34 PM   #8
Tom
The Voice of Reason!
 
Join Date: Mar 2001
Location: Canandaigua, New York
Posts: 112,446
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?
__________________
Who does the Racing Form Detective like in this one?
Old 08-08-2020, 06:39 PM   #9
Speed Figure
DJ M.Walk
 
Join Date: Aug 2002
Location: Compton, CA!
Posts: 2,066
My software is programmed to read the file and do all the scratches. I simply have to delete it once it’s done and download it again later in the day to get updated scratches from any upcoming races.
Old 08-08-2020, 06:49 PM   #10
headhawg
crusty old guy
 
Join Date: Aug 2003
Location: Snarkytown USA
Posts: 3,909
Quote:
Originally Posted by Tom
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?
You could try XML Notepad. It's old, but it works on my Win7 'puter. It may work for you.

XML Notepad
__________________
"Don't believe everything that you read on the Internet." -- Abraham Lincoln
Old 08-08-2020, 07:05 PM   #11
headhawg
crusty old guy
 
Join Date: Aug 2003
Location: Snarkytown USA
Posts: 3,909
Quote:
Originally Posted by CBYRacer
Thanks, guys. I found the RSS feed link at the bottom of the page that you guys were referencing. Really appreciate your help!
If you don't want to go the RSS route, you could use my HDST program. Just click the "Download Scratch File" button and it will save a file named scratches.xml. HDST

The download is post #45.
__________________
"Don't believe everything that you read on the Internet." -- Abraham Lincoln
Old 08-09-2020, 09:35 AM   #12
classhandicapper
Registered User
 
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,523
Quote:
Originally Posted by Tom
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?
I open it in Excel and then import the Excel file into my database.
__________________
"Unlearning is the highest form of learning"
Old 08-09-2020, 01:07 PM   #13
Jeff P
Registered User
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,257
Quote:
Originally Posted by Tom
How do you guys open the XML file so it is readable?
I tried several things Google suggested, none worked.
Opening with a browser shows all the non-text elements.

Maybe my browsers are all too old?
I started parsing the xml in 2009 when it first came out. Back then I was using the (then) latest version of Microsoft's xml parser.

I quickly discovered Microsoft's xml parser was using way too much RAM - approx 1.2 gigabytes of the 4.0 gigabytes that my then 2007 machine had.

To me, this seemed ridiculous given that at end of day after all changes had been added the xml files themselves generally averaged about 200 kb in size... and that early in the day when only east coast tracks were running an xml file might only be 70 kb in size.

So I wrote my own xml parser. (Of course it helps that I have a background as a developer.)

That said, there are many different xml parsers out there and (Imo) just about all of them can get the job done.

In its simplest form xml is a standardized markup format designed to deliver data wrapped inside of tags (a matching pair of tags plus whatever sits between them is also called a node.)

The tags or nodes in the file follow a certain order.

At the very top of the document you'll find a tag that looks like this: "<late_changes>" (without the quotes.)

At the very bottom of the document you'll find a closing tag that looks like this: "</late_changes>" (without the quotes.)

Everything between the two tags (the late_changes node) contains data for late changes.

The next row in today's xml file looks like this: "<race_date>08/09/2009</race_date>" (without the quotes.)

As you might intuitively guess, the string text between the "<race_date>" and "</race_date>" tags (in other words, inside the race_date node) is where you'll find the date for all of the changes data found in the current file.

The xml parser that I wrote simply scans the file and reads string text contained between predefined tags (or data contained in predefined nodes.)
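To illustrate the idea of reading the text between a pair of predefined tags, here is a minimal sketch in Python. It is not the parser described above, just the same general technique reduced to a few lines. The function name read_between is made up for this example, and it only handles plain tags without attributes (so it would work for race_date but not for a tag that carries attributes):

```python
def read_between(text, tag, start=0):
    """Return (value, next_pos) for the first <tag>...</tag> found at or
    after start, or (None, -1) if the tag pair is not found."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"
    begin = text.find(open_tag, start)
    if begin == -1:
        return None, -1
    begin += len(open_tag)
    end = text.find(close_tag, begin)
    if end == -1:
        return None, -1
    return text[begin:end].strip(), end + len(close_tag)


# The first rows of the file as quoted above.
sample = "<late_changes>\n<race_date>08/09/2009</race_date>\n</late_changes>"
race_date, _ = read_between(sample, "race_date")
print(race_date)  # 08/09/2009
```

The returned position lets the caller keep scanning forward from where the last node ended, which is how a scanner like this can work through a file without loading it into a full document tree.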

Later in the day a track employee or someone working for Equibase will add similar rows for both Albuquerque and Arlington Park... but right now as I type this, the next row in the file looks like this: "<track country="USA" id="CNL" track_name="COLONIAL DOWNS">"

This is the opening track tag for Colonial Downs.

Several rows further down in the file you'll find a closing track tag for Colonial Downs that looks like this: "</track>" (without the quotes.)

Everything between each opening track tag and closing track tag (or within each track node) contains changes data for that track:

Course changes, distance changes, track condition changes, temp rail changes, scratches, rider overweights, horse weights, rider changes, and reported first time geldings, etc.

The xml parser that I wrote simply scans the predefined nodes in the file one track at a time and one race at a time - reading the data contained between each pair of predefined tags or within each node, and writes the data read from each node to a database - where it can then be used to generate a changes report and/or used for live play.
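For anyone who wants to try the same track by track, race by race flow without writing a parser from scratch, here is a rough sketch using Python's standard library (xml.etree.ElementTree and sqlite3) in place of a hand-written parser. The late_changes, race_date, and track rows match what is quoted above. The race and change elements, their attributes, the horse names, and the table layout are hypothetical placeholders, since the actual child nodes are not spelled out here:

```python
import sqlite3
import xml.etree.ElementTree as ET

# <late_changes>, <race_date> and <track ...> match the rows quoted above.
# <race> and <change> are hypothetical stand-ins for the real child nodes.
SAMPLE = """<late_changes>
<race_date>08/09/2009</race_date>
<track country="USA" id="CNL" track_name="COLONIAL DOWNS">
  <race number="1">
    <change type="scratch">Example Horse</change>
  </race>
  <race number="2">
    <change type="rider_change">Another Horse</change>
  </race>
</track>
</late_changes>"""


def load_changes(xml_text, conn):
    """Walk the file one track and one race at a time, one row per change."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes "
        "(race_date TEXT, track_id TEXT, race TEXT, kind TEXT, detail TEXT)"
    )
    root = ET.fromstring(xml_text)
    race_date = root.findtext("race_date")
    for track in root.iter("track"):
        for race in track.iter("race"):
            for change in race.iter("change"):
                conn.execute(
                    "INSERT INTO changes VALUES (?, ?, ?, ?, ?)",
                    (race_date, track.get("id"), race.get("number"),
                     change.get("type"), (change.text or "").strip()),
                )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load_changes(SAMPLE, conn)
rows = conn.execute(
    "SELECT track_id, race, kind FROM changes ORDER BY race"
).fetchall()
print(rows)
```

Once the rows are in a table, a changes report is just a query, and the same table can be refreshed each time a newer copy of the file is downloaded.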

I hope I managed to type most of that out in a way that makes sense,


-jp

.
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com

Last edited by Jeff P; 08-09-2020 at 01:22 PM.
Old 08-09-2020, 03:33 PM   #14
classhandicapper
Registered User
 
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,523
Quote:
I started parsing the xml in 2009 when it first came out. Back then I was using the (then) latest version of Microsoft's xml parser.
I don't understand why you went through all this.

I just download the XML file and open it in Excel, which translates it and puts each field into neat columns with headings, and then I import it into my database. It literally takes me a minute. I do it once late morning when the east coast tracks I might play come in and then once later in the day if I am going to play west coast tracks.

Where I've had huge difficulty is with the Timeform API. No matter what I did I couldn't translate the XML in a similar fashion and I'm way too lazy at this point to learn how to do it or write a parser. So I threw in the towel.

I find text files way easier to work with anyway.
__________________
"Unlearning is the highest form of learning"
Old 08-09-2020, 04:13 PM   #15
headhawg
crusty old guy
 
Join Date: Aug 2003
Location: Snarkytown USA
Posts: 3,909
Jeff may have needed to use it programmatically. He couldn't tell his users to use Excel, save the file, and then import into JCapper. If it was for personal use, sure, use Excel. But coders like to code.
__________________
"Don't believe everything that you read on the Internet." -- Abraham Lincoln