mail scraper / crawler / spider

Started by wolla, March 11, 2019, 03:46:50 AM


wolla

Hello from Germany,

is there a way to write code for crawling email addresses from Google?

Thank you for your reply
Wolfgang

td

WebBatch is a CGI scripting language for Web servers, so I am not sure how that connects to crawling email addresses. It is also unclear, to me at least, what you even mean by crawling email addresses. So perhaps a more detailed explanation of what exactly you are wishing to do is in order.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

snowsnowsnow

Whatever it is, it doesn't sound good.

But I guess, a man's got to eat.

P.S.  In the past, when people posted wanting to do something obviously bad, they would get a talking to and then the thread would die.  Nowadays, that doesn't seem to happen.

wolla

Quote from: td on March 11, 2019, 07:15:24 AM
WebBatch is a CGI scripting language for Web servers, so I am not sure how that connects to crawling email addresses. It is also unclear, to me at least, what you even mean by crawling email addresses. So perhaps a more detailed explanation of what exactly you are wishing to do is in order.

For marketing activities I want to create a batch script that reads email addresses from web pages.
I want to open a search engine like Google, enter a keyword, and find the mailto: address on each page listed in the results.

- The script must open the Google results programmatically
- find the website addresses (URLs) of the companies
- open each URL and find the mailto: address in the page source
- store the address in a file and move on to the next entry

Any ideas?

Thanks very much for all feedback
Wolfgang

td

As previously mentioned, WebBatch is a CGI scripting language for Web servers, so it is a tool for creating Web content and not for scraping Web content. WinBatch, on the other hand, can be used to scan (and scrape) Webpages for specific content. There are legitimate reasons for performing Web scraping, and there are multiple examples of doing this in the Tech Database. That said, many if not most Websites disguise email addresses to prevent exactly the kind of activity you are proposing. The reason for this should be obvious.

td

Quote from: snowsnowsnow on March 11, 2019, 07:28:17 AM
Whatever it is, it doesn't sound good.

But I guess, a man's got to eat.

P.S.  In the past, when people posted wanting to do something obviously bad, they would get a talking to and then the thread would die.  Nowadays, that doesn't seem to happen.

Actually, it does still happen. It is just that we prefer not to rush to judgment.

stanl

Quote from: td on March 11, 2019, 01:40:26 PM
It is just that we prefer not to rush to judgment.


This could probably work with straight WB web scraping. The OP needs to provide example keyword(s) so one could see what is returned and how a mailto address is set up (in an href attribute or something similar).
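
To illustrate how a mailto address typically sits in a page, here is a minimal sketch in Python rather than WinBatch (a swapped-in language, since no WinBatch code appears in this thread). It assumes the address is published as a plain href="mailto:..." link; the names MailtoCollector and extract_mailto and the example.com addresses are hypothetical placeholders. It only parses an HTML string that is already in hand and does not fetch anything.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # The URI scheme is case-insensitive, so compare in lowercase.
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any ?subject=... query, then decode %40 etc.
                raw = value[len("mailto:"):].split("?", 1)[0]
                address = unquote(raw).strip()
                if address and address not in self.addresses:
                    self.addresses.append(address)


def extract_mailto(html):
    """Return the unique mailto addresses found in an HTML string, in page order."""
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.addresses


# Placeholder markup; a real page may look quite different.
sample = '''
<p>Contact: <a href="mailto:info@example.com?subject=Hello">mail us</a></p>
<p><a href="MAILTO:sales%40example.com">sales</a> or <a href="/contact">form</a></p>
'''
print(extract_mailto(sample))
```

A sketch like this only sees addresses written as literal mailto links. Pages that disguise the address, for instance by assembling it with JavaScript or writing it as "name at domain", would return nothing here, which is why seeing a real result page first matters.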

wolla

Thanks all for the replies.
In the end, web scraping is allowed...
Regards
Wolfgang