How to code an algorithm that creates a RegEx expression if provided with two identical elements on a website?

bernard

How difficult would it be to code an algorithm that can generate its own regular expression when given two equivalent elements from a website? It's for scraping prices on websites.
 

Normally it's not all that hard, but it depends on how complex the regex is. For scraping prices it's typically not that difficult, because most stores are laid out in a sane, predictable fashion. Honestly, regex might not even be the best choice. There are lots of different ways to scrape a website: XPath can work really well, or you run into things like goquery, which help a ton because you can just use CSS selectors like in jQuery.
 

Hey man, I know how to scrape a little, what I need is something else. There's a plugin I used to have that would be able to figure out regex expressions just from selecting 2 prices from 2 unique products. Do I make sense? Like it would ask for product url 1 and price 1, then product url 2 and price 2, and then you'd be able to scrape prices without writing any regex yourself. So there had to be some kind of smart detection going on behind the scenes.
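For what it's worth, the kind of detection described here could be sketched roughly as follows. This is only a guess at the general idea, not how that plugin actually works: take the text immediately before and after each known price, keep the part the two pages have in common, and use it as the anchor around a capture group. The function name `infer_price_regex`, the `context` parameter, and the assumption that prices are plain digits with `.` or `,` separators are all made up for illustration.

```python
import re

def infer_price_regex(html1, price1, html2, price2, context=40):
    """Guess a regex for the price from two (page, known price) examples.

    Assumes each price string appears in its page and that the first
    occurrence is the one we care about (hypothetical simplification).
    """
    i1, i2 = html1.find(price1), html2.find(price2)
    if i1 < 0 or i2 < 0:
        raise ValueError("known price not found in page source")

    # Text just before and just after each known price.
    before1 = html1[max(0, i1 - context):i1]
    before2 = html2[max(0, i2 - context):i2]
    after1 = html1[i1 + len(price1):i1 + len(price1) + context]
    after2 = html2[i2 + len(price2):i2 + len(price2) + context]

    # Longest common suffix of the two "before" strings.
    n = 0
    while n < min(len(before1), len(before2)) and before1[-1 - n] == before2[-1 - n]:
        n += 1
    prefix = before1[len(before1) - n:]

    # Longest common prefix of the two "after" strings.
    m = 0
    while m < min(len(after1), len(after2)) and after1[m] == after2[m]:
        m += 1
    suffix = after1[:m]

    if not prefix and not suffix:
        raise ValueError("no shared markup around the two prices")

    # Assumption: a price is a digit followed by digits, dots, or commas.
    return re.escape(prefix) + r"([0-9][0-9.,]*)" + re.escape(suffix)


page1 = '<div class="product"><h1>Red Mug</h1><span class="price">$19.99</span></div>'
page2 = '<div class="product"><h1>Blue Teapot</h1><span class="price">$104.50</span></div>'
pattern = infer_price_regex(page1, "19.99", page2, "104.50")

page3 = '<div class="product"><h1>Green Bowl</h1><span class="price">$7.25</span></div>'
match = re.search(pattern, page3)
print(match.group(1) if match else None)
```

Two examples are the minimum needed to tell the shared markup apart from the product-specific text. On a real store you'd probably want more samples and some handling for prices that appear more than once on the page.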
 
Hmm, that's interesting. I've never seen anything like that, at least not for regex.

However, something along those lines: Chrome and Firefox both have dev tools. Right-click the page and open your dev tools, and from the Inspector tab (Firefox) or the Elements tab (Chrome) you can right-click any element and copy its XPath. From there it can be pretty simple to just use XPath to scrape. My guess is that's probably what the plugin was doing, but you never know.
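As a rough illustration of the XPath route, here is a minimal sketch using only Python's standard library. The helper name `price_from_xpath` and the sample markup are made up. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so an absolute path copied straight from dev tools would have to be rewritten as a relative one like the path below. For real, messy pages a lenient parser such as lxml is the more usual choice.

```python
import xml.etree.ElementTree as ET

def price_from_xpath(markup, xpath):
    """Return the text of the first element matching xpath, or None."""
    root = ET.fromstring(markup)  # requires well-formed markup
    node = root.find(xpath)
    if node is None or not node.text:
        return None
    return node.text.strip()


# Hypothetical, well-formed sample page.
sample = (
    '<html><body><div id="product">'
    '<span class="price"> $19.99 </span>'
    '</div></body></html>'
)

print(price_from_xpath(sample, ".//span[@class='price']"))
```

The same path should work on every product page of a store as long as the layout is consistent, which is why no per-page regex is needed.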
 
I don't know, it probably wasn't regex then :smile: It did and does work, though. Do you want me to send you a link to the plugin?
 