WinBatch® Technical Support Forum

All Things WinBatch => WebBatch => Topic started by: wolla on March 11, 2019, 03:46:50 AM

Title: mail scraper /crawler / spider
Post by: wolla on March 11, 2019, 03:46:50 AM
Hello from Germany,

Is there a way to write code for crawling email addresses from Google?

Thank you for any reply
Wolfgang
Title: Re: mail scraper /crawler / spider
Post by: td on March 11, 2019, 07:15:24 AM
WebBatch is a CGI scripting language for Web servers, so I am not sure how that connects to crawling email addresses. It is also unclear, to me at least, what you even mean by crawling email addresses. So perhaps a more detailed explanation of what exactly you are wishing to do is in order.
Title: Re: mail scraper /crawler / spider
Post by: snowsnowsnow on March 11, 2019, 07:28:17 AM
Whatever it is, it doesn't sound good.

But I guess, a man's got to eat.

P.S.  In the past, when people posted wanting to do something obviously bad, they would get a talking to and then the thread would die.  Nowadays, that doesn't seem to happen.
Title: Re: mail scraper /crawler / spider
Post by: wolla on March 11, 2019, 08:13:25 AM
Quote from: td on March 11, 2019, 07:15:24 AM
WebBatch is a CGI scripting language for Web servers, so I am not sure how that connects to crawling email addresses. It is also unclear, to me at least, what you even mean by crawling email addresses. So perhaps a more detailed explanation of what exactly you are wishing to do is in order.

For marketing activities I want to create a batch script that reads email addresses from web pages.
I want to open a search engine such as Google in a browser, enter a keyword, and find the mailto: address on each page listed in the results.

- The script must open the Google results in code
- find the www addresses of the companies
- open each URL and find the mailto: address in the page source
- store the address in a file and move to the next entry

Any idea?

Thanks much for all feedback
Wolfgang
Title: Re: mail scraper /crawler / spider
Post by: td on March 11, 2019, 01:35:28 PM
As previously mentioned, WebBatch is a CGI scripting language for Web servers, so it is a tool for creating Web content and not for scraping Web content. WinBatch, on the other hand, can be used to scan (and scrape) Webpages for specific content. There are legitimate reasons for performing Web scraping and there are multiple examples of doing this in the Tech Database. That said, many if not most Websites disguise email addresses to prevent exactly the kind of activity you are proposing. The reason for this should be obvious.
Title: Re: mail scraper /crawler / spider
Post by: td on March 11, 2019, 01:40:26 PM
Quote from: snowsnowsnow on March 11, 2019, 07:28:17 AM
Whatever it is, it doesn't sound good.

But I guess, a man's got to eat.

P.S.  In the past, when people posted wanting to do something obviously bad, they would get a talking to and then the thread would die.  Nowadays, that doesn't seem to happen.

Actually, it does still happen.  It is just that we prefer not to rush to judgment.
Title: Re: mail scraper /crawler / spider
Post by: stanl on March 11, 2019, 02:44:14 PM
Quote from: td on March 11, 2019, 01:40:26 PM
It is just that we prefer not to rush to judgment.


Could probably work with straight WB web scraping. OP needs to provide keyword(s) as an example so one could see what is returned and how a mailto address is set up (in an href attribute or something similar).
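
For reference, a mailto address usually sits in the href attribute of an anchor tag, e.g. <a href="mailto:info@example.com">. Below is a minimal sketch of what "finding the mailto" means, written in Python rather than WinBatch, and assuming the page source is already in a string. The class name MailtoCollector, the sample HTML, and the example.com addresses are all made up for illustration. It does no searching or downloading, only parsing.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Hypothetical helper: collects addresses from <a href="mailto:..."> links."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the "mailto:" scheme and any "?subject=..." part,
                # then decode percent-escapes such as %40 for "@".
                raw = value[len("mailto:"):].split("?", 1)[0]
                address = unquote(raw).strip()
                if address and address not in self.addresses:
                    self.addresses.append(address)


# Made-up sample page; the last line is a "disguised" address in plain text.
sample_html = """
<p>Contact: <a href="mailto:info@example.com?subject=Hello">Write to us</a></p>
<p><a href="https://example.com/about">About</a></p>
<p><a href="MAILTO:sales%40example.com">Sales</a></p>
<p>Disguised: info [at] example [dot] com</p>
"""

collector = MailtoCollector()
collector.feed(sample_html)
print(collector.addresses)
```

Note that the disguised address on the last line of the sample is not picked up, which is the kind of obfuscation td mentioned.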
Title: Re: mail scraper /crawler / spider
Post by: wolla on March 12, 2019, 04:08:07 AM
Thanks all for the replies.
In the end, web scraping is allowed....
Regards
Wolfgang