Checking for existence of a string in a webpage

Started by hdsouza, November 17, 2014, 09:12:31 AM


hdsouza

I am able to get a handle to the web page, but when I try to write the HTML contents to a file, either I get an error or no text file is generated.

This happens with any YouTube URL, for example http://www.youtube.com/watch?v=oZRh6J9ezfw
My objective is to check for a string on the YouTube page.
What am I doing wrong? Please help.
Thanks



FileDelete("c:\temp\page_check.txt")
GoSub GetMSIE
If !found
   Message("Page check", "No Internet Explorer window showing a YouTube page was found.")
   Exit
EndIf
TimeDelay(1)
; get the HTML content to file
FilePut("c:\temp\page_check.txt", msie.document.GetElementsByTagName("HTML").item(0).outerHTML)
Exit

:GetMSIE
   ; sets msie to the first Internet Explorer window showing a YouTube page; found reports success
   found = @FALSE
   Shell = ObjectOpen("Shell.Application")
   For x = 0 To Shell.Windows.count-1
      ErrorMode(@OFF)   ; some shell windows cannot be queried
      swi = Shell.Windows.item(x)
      If !swi
         ErrorMode(@CANCEL)
         Continue
      EndIf
      swif = swi.fullname
      If !StrIndexNC(swif, "iexplore.exe", 1, @FWDSCAN)
         ErrorMode(@CANCEL)
         Continue
      EndIf

      msie = swi
      URL = msie.LocationURL
      ; restore normal error handling on every path, including the Break below
      ErrorMode(@CANCEL)
      If StrIndexNC(URL, "youtube", 1, @FWDSCAN)
         found = @TRUE
         Break
      EndIf
   Next
Return


td

Have you tried 'innerHTML' instead of 'outerHTML'? You should also make sure you have a valid object before you try to get its innerHTML.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

hdsouza

Quote from: td on November 17, 2014, 01:27:08 PM
Have you tried 'innerHTML' instead of 'outerHTML'? You should also make sure you have a valid object before you try to get its innerHTML.

I tried innerHTML and that did not work either.
Is there another way to get the contents of a page like https://www.youtube.com/watch?v=oZRh6J9ezfw, or how would you go about doing it?

JTaylor

I believe it is something related to IE security. I can't pull the information either. If I put YouTube in my Trusted Sites list, it still fails at the default security level, but if I change the level to Low it works, after first prompting me to allow it to open the site. Not sure this is helpful, other than to say I don't think your problem is your code... although in my various attempts I ended up using something slightly different, so I can't vouch for that statement with certainty.

Jim

hdsouza

Thanks, Jim. Good to know I am not the only one seeing this.

I have a roundabout way to check for the existence of a string, using:

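found = @FALSE
ForEach node In msie.document.body.all
   ; check for the existence of a string on a specific node ("search text" is a placeholder)
   If StrIndexNC(node.innertext, "search text", 1, @FWDSCAN)
      found = @TRUE
      Break
   EndIf
Next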


...although this is a slow process, and I was hoping there was an easier way to extract the HTML, as we can for other web pages.
Not sure if WinBatch support has any ideas.

td

Quote from: hdsouza on November 18, 2014, 09:23:11 AM
Not sure if WinBatch support has any ideas

We have plenty of ideas...

hdsouza

Quote from: td on November 18, 2014, 10:13:13 AM
Quote from: hdsouza on November 18, 2014, 09:23:11 AM
Not sure if WinBatch support has any ideas

We have plenty of ideas...

TD, not sure, but why are you being cryptic? If you have a suggestion, please go ahead and make it. Saying "We have plenty of ideas" does not fix a problem. Also, this is not the first time you have been cryptic or sarcastic.

Also, don't let the newbie status fool you. I have been using WinBatch for several years now, and I am sure you see a newbie status and think the person is dumb! After the new forum was introduced, all the posts I had made in the earlier system disappeared from my name. That was about a year back, wasn't it? So I guess my status changed after that.

Is Marty still around? He was great at understanding problems and providing solutions.

td

Just trying to be humorous...  As in, you didn't say good ideas.

I am sorry to say that Marty is no longer around. He was very good at guessing problems and coming up with solutions. We can't all be Marty, and we certainly don't have the resources to devote the time Marty could to coaxing out the necessary details.

There are many ways to solve your problem, but which one works for you depends to a great degree on information you have not provided. For example, you could try to limit your search to only the nodes that could possibly contain the text you are looking for, or you could dump the page's contents to a string variable, binary buffer or file using the WinInet extender. Whether any of these would work for you is unknown because details are missing. If they might, there are examples in the Tech Database that can be adapted to a specific case.

The first step to solving a problem is to define it, and recognizing the important details connected to a problem is often the quickest way to a solution.
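For the "dump the page's contents to a string" idea, here is a minimal sketch under stated assumptions. It is not the WinInet extender approach td mentions: it swaps in the Windows WinHttp COM object (`WinHttp.WinHttpRequest.5.1`) to fetch the page, then searches the raw HTML with StrIndexNC. `searchText` and its value are placeholders. A direct HTTP request returns only the HTML the server sends, so text that the page's scripts add later in the browser may be missing, and this approach was not tried in this thread.

```winbatch
; Sketch only: uses the WinHttp COM object, not the WinInet extender
url = "https://www.youtube.com/watch?v=oZRh6J9ezfw"
searchText = "search text"   ; placeholder: the string you are looking for

http = ObjectCreate("WinHttp.WinHttpRequest.5.1")
http.Open("GET", url)        ; Async defaults to false, so Send() waits for the response
http.Send()
page = http.ResponseText     ; raw HTML as sent by the server

; case-insensitive search of the downloaded HTML
If StrIndexNC(page, searchText, 1, @FWDSCAN)
   Message("Page check", StrCat("Found: ", searchText))
Else
   Message("Page check", StrCat("Not found: ", searchText))
EndIf
```

Because nothing here goes through Internet Explorer, the IE security-zone behavior Jim describes should not apply, though the returned HTML may differ from what the browser displays.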

td

Another idea of dubious merit is to search the string returned by

str = msie.Document.Body.innerHtml()
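A minimal, self-contained sketch of that idea follows. It walks the Shell.Application windows collection (as the GetMSIE subroutine does), reads `Document.Body.innerHtml()` from the first window whose URL contains "youtube", and searches it case-insensitively with StrIndexNC. `PageHasText`, `searchText`, and the "search text" value are hypothetical placeholders, and the script assumes IE's security settings allow access to the page's document.

```winbatch
; Hypothetical helper: @TRUE if searchText occurs anywhere in html (case-insensitive)
#DefineFunction PageHasText(html, searchText)
   Return (StrIndexNC(html, searchText, 1, @FWDSCAN) != 0)
#EndFunction

searchText = "search text"   ; placeholder: the string you are looking for
checked = @FALSE             ; set once a YouTube window has been examined
found = @FALSE

Shell = ObjectCreate("Shell.Application")
ForEach win In Shell.Windows
   url = ""
   ErrorMode(@OFF)           ; some shell windows do not expose LocationURL
   url = win.LocationURL
   ErrorMode(@CANCEL)
   If StrIndexNC(url, "youtube", 1, @FWDSCAN)
      str = win.Document.Body.innerHtml()   ; the call that worked in this thread
      found = PageHasText(str, searchText)
      checked = @TRUE
      Break
   EndIf
Next

If !checked
   Message("Page check", "No window showing a YouTube page was found.")
Else
   If found
      Message("Page check", StrCat("Found: ", searchText))
   Else
      Message("Page check", StrCat("Not found: ", searchText))
   EndIf
EndIf
```

A single StrIndexNC call over the whole body string avoids the per-node loop, which is why it is much faster than iterating `document.body.all`.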

hdsouza

Quote from: td on November 18, 2014, 02:42:20 PM
Another idea of dubious merit is to search the string returned by

str = msie.Document.Body.innerHtml()

Worked great. Thanks, TD!