Update to sample script for listing links from web page?

Started by stevengraff, October 22, 2013, 01:14:30 PM


stevengraff

This: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+WebPage~Link~Lister.txt

is a script that's supposed to list all of a web page's links, I think. Anyway, I can't get it to run; maybe I just don't know how. When I run it in debug, I get two steps into the first UDF, then it hangs:

window1=cWndByWndSpec("IEFrame","IEXPLORE",4,40965,9999,40961,0)

and all I can do is kill it.

Is there an update to this script? Or another simple way to list a page's links?

I'm trying this on a Windows Server 2008 R2 machine in a Terminal Server session, if that makes a difference.

Deana

That code was written back in 2004. cWndByWndSpec is processor-intensive and can take quite a while to time out. The code will need to be updated for modern Windows platforms and browsers.
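As an illustration of a lighter-weight way to find browser windows than scanning window classes with cWndByWndSpec, here is a minimal sketch (not part of the original sample) that asks the Shell.Application COM object for its Windows collection and lists each window's LocationURL. That collection holds both Internet Explorer and File Explorer windows; entries that don't expose LocationURL are simply skipped. Variable names are illustrative only.

```winbatch
; Sketch: list the URL of every open shell window (IE and File Explorer)
; without scanning window classes the way cWndByWndSpec does.
objShell = ObjectCreate("Shell.Application")
objWindows = objShell.Windows()
strList = ""
For x = 0 To objWindows.Count - 1
   objWindow = objWindows.Item(x)
   strUrl = ""
   ErrorMode(@OFF)          ; some entries may not expose LocationURL
   strUrl = objWindow.LocationURL
   ErrorMode(@CANCEL)
   If strUrl != "" Then strList = StrCat(strList, strUrl, @CRLF)
Next
Message("Open shell windows", strList)
```

Because it only walks the shell's own window list, this returns almost immediately instead of waiting for a window-spec search to time out.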
Deana F.
Technical Support
Wilson WindowWare Inc.

Deana

Here is some revised (untested) code:


Code (winbatch)

#DefineFunction udfIEPageLoadWait(objIE)
    ; Wait until the browser reports READYSTATE_COMPLETE
    While !(objIE.readyState == "complete" || objIE.readyState == 4 )
        TimeDelay(0.1)
    EndWhile
    ; Then wait until the document itself has finished loading
    While !(objIE.document.readyState == "complete" || objIE.document.readyState == 4 )
        TimeDelay(0.1)
    EndWhile
    Return
#EndFunction

#DefineFunction udfGetCurrentURL(objIE)
    If !udfIsObject(objIE)
        Pause("udfGetCurrentURL","Not a Valid Object")
        Exit
    EndIf
    url = objIE.LocationURL
    Return url
#EndFunction

#DefineFunction udfIsObject(obj)
   Return(VarType(obj)>=1024)
#EndFunction


#DefineFunction udfListLinks(objBrowserDoc)
    udfLinks = objBrowserDoc.Links
    udfLinkList = StrCat("Number: ",udfLinks.Length)
    udfnumberofLinks = udfLinks.Length - 1
    For x = 0 To udfnumberofLinks
        udfLink = objBrowserDoc.Links(x)
        ; Add the href only if it is not already in the list
        If ItemLocate(udfLink.href,StrReplace(udfLinkList,@CRLF," ")," ")==0 Then udfLinkList = StrCat(udfLinkList,@CRLF,udfLink.href)
    Next
    Return udfLinkList
#EndFunction

#DefineFunction udfListAnchors(objBrowserDoc)
    udfanchors = objBrowserDoc.anchors
    udfAnchorList = StrCat("Number: ",udfanchors.length)
    udfnumberofAnchors = udfanchors.length - 1
    For x = 0 To udfnumberofAnchors
        udfanchor = objBrowserDoc.anchors(x)
        If ItemLocate(udfanchor.name,StrReplace(udfAnchorList,@CRLF," ")," ")==0 Then udfAnchorList = StrCat(udfAnchorList,@CRLF,udfanchor.name)
    Next
    Return udfAnchorList
#EndFunction

#DefineFunction udfListImages(objBrowserDoc)
    udfImages = objBrowserDoc.Images
    udfImageList = StrCat("Number: ",udfImages.Length)
    udfnumberofImages = udfImages.Length - 1
    For x = 0 To udfnumberofImages
        udfImage = objBrowserDoc.Images(x)
        udfAltText = udfImage.alt
        If udfAltText == "" Then udfAltText = "  "
        udfSource = udfImage.src
        ;udfSize = udfImage.size
        If ItemLocate(udfSource,StrReplace(udfImageList,@CRLF," ")," ")==0 Then udfImageList = StrCat(udfImageList,@CRLF,udfSource," --- '",udfAltText,"'")
    Next
    Return udfImageList
#EndFunction

#DefineFunction udfURLPageBody(objBrowserDoc)
    udfBody = objBrowserDoc.body
    udfTextOnly = udfBody.innertext   ; page text without markup
    udfPageHTML = udfBody.innerhtml   ; page markup
    udfContents = StrCat("<><><> Content <><><>",@CRLF,udfTextOnly,@CRLF,"<><><> HTML <><><>",@CRLF,udfPageHTML)
    Return udfContents
#EndFunction



objIE = ObjectCreate("InternetExplorer.Application")
objIE.visible = @True
objIE.navigate('http://www.google.com')
udfIEPageLoadWait(objIE)
url = udfGetCurrentURL(objIE)
Pause("Current URL",url)

objBrowserDoc = objIE.Document

ListLinks=udfListLinks(objBrowserDoc)
Message("ListLinks",ListLinks)

ListAnchors=udfListAnchors(objBrowserDoc)
Message("ListAnchors",ListAnchors)

ListImages=udfListImages(objBrowserDoc)
Message("ListImages",ListImages)

URLBody=udfURLPageBody(objBrowserDoc)
Message("URLBody",URLBody)
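Both wait loops in udfIEPageLoadWait poll forever, so a page that never reports "complete" will hang the script much like the original cWndByWndSpec call did. Here is a minimal sketch of a bounded variant; udfIEPageLoadWaitTimeout is a hypothetical name, the 0.1-second polling interval is carried over from the code above, and it additionally checks the browser's Busy property before looking at the document.

```winbatch
; Hypothetical helper: like udfIEPageLoadWait, but gives up after nTimeoutSecs.
; Returns 1 when the page is complete, 0 on timeout.
#DefineFunction udfIEPageLoadWaitTimeout( objIE, nTimeoutSecs )
   nMaxLoops = nTimeoutSecs * 10          ; one loop per 0.1 second
   nLoops = 0
   ; Wait for the browser itself (READYSTATE_COMPLETE = 4)
   While objIE.Busy || objIE.ReadyState != 4
      If nLoops >= nMaxLoops Then Return 0
      TimeDelay(0.1)
      nLoops = nLoops + 1
   EndWhile
   ; Then wait for the document object to report "complete"
   While objIE.document.readyState != "complete"
      If nLoops >= nMaxLoops Then Return 0
      TimeDelay(0.1)
      nLoops = nLoops + 1
   EndWhile
   Return 1
#EndFunction
```

A caller could then write `If !udfIEPageLoadWaitTimeout(objIE, 30) Then Exit` in place of the plain udfIEPageLoadWait call, so a stuck page ends the script instead of hanging it.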




stevengraff


Deana

Here is another swipe at it...


Code (winbatch)
;***************************************************************************
;**         
;**          Web Page Scraper
;**
;** Purpose: Extract information from a webpage
;** Inputs:  url
;** Outputs: Messages containing data
;**
;** Deana Falk
;** Revisions: 2013.10.22 Initial Release
;**
;**
;***************************************************************************


#DefineFunction udfIEPageLoadWait( objIE )
   If !udfIsObject(objIE)
      Pause('udfIEPageLoadWait','Not a Valid Object')
      Exit
   EndIf
   While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
      TimeDelay(0.1)
   EndWhile
   While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
      TimeDelay(0.1)
   EndWhile
   Return 1
#EndFunction

#DefineFunction udfIECreate( strUrl )
   objIE = ObjectCreate( 'InternetExplorer.Application')
   If !udfIsObject(objIE)
      Pause('udfIECreate','Not a Valid Object')
      Exit
   EndIf
   objIE.visible = @True
   objIE.navigate( strUrl )
   udfIEPageLoadWait( objIE )
   Return objIE
#EndFunction

#DefineFunction udfIEAttach( strMode, strString )
   ; strMode
   ;   title strString is the title of the page you are trying to access
   ;   url   strString is the url of the page you are trying to access
   ;   text  strString is some text of the page you are trying to access
   ;   html  strString is some html of the page you are trying to access
   strMode = StrLower(strMode)
   objShell = ObjectCreate('Shell.Application')
   objShellWindows = objShell.Windows() ; collection of all ShellWindows (IE and File Explorer)
   ;ForEach objWindow In objShellWindows
   For x = 0 To objShellWindows.Count-1
      objWindow = objShellWindows.Item(x)
      ; Check that the window object is a valid browser; if not, skip it
      bIsBrowser = @True
      ; Is .type a valid property?
      ret = 0   ; reset so a value left over from a previous window is not reused
      ErrorMode(@OFF)
      ret = objWindow.type
      ErrorMode(@CANCEL)
      If ret == 0 Then bIsBrowser = @False
      ; Does the object have a .document with a .title property?
      If bIsBrowser
         ret = 0
         ErrorMode(@OFF)
         ret = objWindow.document.title
         ErrorMode(@CANCEL)
         If ret == 0 Then bIsBrowser = @False
      EndIf
      If bIsBrowser
         Switch @True
            Case strMode == 'title'
               If StrIndex( objWindow.document.title, strString, 1, @FWDSCAN ) > 0
                  Return objWindow
               EndIf
               Break
            Case strMode == 'url'
               If StrIndex( objWindow.LocationURL, strString, 1, @FWDSCAN ) > 0
                  Return objWindow
               EndIf
               Break
            Case strMode == 'text'
               If StrIndex( objWindow.document.body.innerText, strString, 1, @FWDSCAN ) > 0
                  Return objWindow
               EndIf
               Break
            Case strMode == 'html'
               If StrIndex( objWindow.document.body.innerHTML, strString, 1, @FWDSCAN ) > 0
                  Return objWindow
               EndIf
               Break
            Case @True ; no other case matched: invalid mode
               Pause('udfIEAttach','Invalid Mode Specified')
               Exit
         EndSwitch
      EndIf
   Next
   Return 0
#EndFunction

#DefineFunction udfGetURL( objIE )
   If !udfIsObject( objIE )
      Pause('udfGetURL','Not a Valid Object')
      Exit
   EndIf
   strUrl = objIE.LocationURL
   Return strUrl
#EndFunction

#DefineFunction udfIsObject( obj )
   Return(VarType(obj)>=1024)
#EndFunction


#DefineFunction udfListLinks( objIE )
   If !udfIsObject(objIE)
      Pause('udfListLinks','Not a Valid Object')
      Exit
   EndIf
   objBrowserDoc = objIE.Document
   objLinks = objBrowserDoc.Links
   strLinkList = ''
   numberofLinks = objLinks.Length - 1
   For x = 0 To numberofLinks
      objLink = objBrowserDoc.Links(x)   ; a single link; objLinks stays the collection
      If strLinkList == '' Then strLinkList = objLink.href
      Else strLinkList = strLinkList : @TAB : objLink.href
   Next
   Return strLinkList
#EndFunction

#DefineFunction udfListAnchors( objIE )
   If !udfIsObject( objIE )
      Pause('udfListAnchors','Not a Valid Object')
      Exit
   EndIf
   objBrowserDoc = objIE.Document
   objAnchors = objBrowserDoc.anchors
   strAnchorList = ''
   numberofAnchors = objAnchors.length - 1
   For x = 0 To numberofAnchors
      objAnchor = objBrowserDoc.anchors(x)
      If strAnchorList == '' Then strAnchorList = objAnchor.name
      Else strAnchorList = strAnchorList : @TAB : objAnchor.name
   Next
   Return strAnchorList
#EndFunction

#DefineFunction udfListImages( objIE )
   If !udfIsObject( objIE )
      Pause('udfListImages','Not a Valid Object')
      Exit
   EndIf
   objBrowserDoc = objIE.Document
   objImages = objBrowserDoc.Images
   strImageList = ''
   numberofImages = objImages.Length - 1
   For x = 0 To numberofImages
      objImage = objBrowserDoc.Images(x)
      strAltText = objImage.alt
      If strAltText == '' Then strAltText = '  '
      strSource = objImage.src
      ;nSize = objImage.size
      If strImageList == '' Then strImageList = strSource : ' --- ' : strAltText
      Else strImageList = strImageList : @TAB : strSource : ' --- ' : strAltText
   Next
   Return strImageList
#EndFunction

#DefineFunction udfGetBody(objIE, nOption)
   If !udfIsObject( objIE )
      Pause( 'udfGetBody', 'Not a Valid Object' )
      Exit
   EndIf
   objBrowserDoc = objIE.Document
   objBody = objBrowserDoc.Body
   Switch nOption
      Case 0
         strContents = objBody.innertext
         Break
      Case 1
         strContents = objBody.innerhtml
         Break
      Case nOption
         Pause( 'udfGetBody', 'Invalid Option' )
         Return 0
         Break
   EndSwitch
   Return strContents
#EndFunction


strUrl = 'http://www.winbatch.com/'
objIE = udfIECreate( strUrl )
if objIE == 0
   Pause('udfIECreate','Unable to create browser')
   Exit
Endif

; Attach to existing browser with this url
;objIE = udfIEAttach('url', 'http://www.winbatch.com/')
;if objIE == 0
;   Pause('udfIEAttach','Unable to locate browser using this mode')
;   Exit
;Endif

url = udfGetURL( objIE )
pause( 'Current Url', url )

ListLinks = udfListLinks( objIE )
AskItemList( 'ListLinks', ListLinks, @tab, @unsorted, @single )

ListAnchors = udfListAnchors( objIE )
AskItemList( 'ListAnchors', ListAnchors, @tab, @unsorted, @single )

ListImages = udfListImages( objIE )
AskItemList( 'ListImages', ListImages, @tab, @unsorted, @single )

URLBodyTxt = udfGetBody( objIE, 0 ) ;inner text
Pause( 'Body Inner Text', URLBodyTxt )

URLBodyHTML = udfGetBody( objIE, 1 ) ;inner html
Pause( 'Body Inner HTML', URLBodyHTML )
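Unlike the first version, the revised udfListLinks, udfListAnchors and udfListImages no longer skip duplicates with ItemLocate, so a page that links to the same URL several times will show it several times in AskItemList. If that matters, a small helper such as the hypothetical udfTabListAddUnique below could replace the If/Else concatenation inside each loop. This is a sketch, not part of the posted script.

```winbatch
; Hypothetical helper: append strItem to a tab-delimited list
; only if the list does not already contain it.
#DefineFunction udfTabListAddUnique( strList, strItem )
   If strList == '' Then Return strItem
   If ItemLocate( strItem, strList, @TAB ) > 0 Then Return strList
   Return StrCat( strList, @TAB, strItem )
#EndFunction

; Example: the duplicate URL is added only once
strDemo = ''
strDemo = udfTabListAddUnique( strDemo, 'http://www.winbatch.com/' )
strDemo = udfTabListAddUnique( strDemo, 'http://www.winbatch.com/' )
strDemo = udfTabListAddUnique( strDemo, 'http://techsupt.winbatch.com/' )
AskItemList( 'Unique links', strDemo, @TAB, @UNSORTED, @SINGLE )
```

ItemLocate returns 0 when the item is not in the list, which is what the helper relies on to decide whether to append.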





stevengraff

Is there, somewhere, a bible or ubertutorial for programmatic web page reading and manipulation?

Is this still the one?:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+Tutorials+Working~With~Web~Pages.txt

Or has it been superseded?

Deana

Quote from: stevengraff on October 23, 2013, 04:54:36 AM
Is there, somewhere, a bible or ubertutorial for programmatic web page reading and manipulation?

Is this still the one?:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+Tutorials+Working~With~Web~Pages.txt

Or has it been superseded?

That tutorial covers the various methods for interacting with web pages. The code above uses the MSIE COM interface. It might be nice to put together a tutorial someday that focuses on this method alone.

stevengraff

What's a good method for learning more about the MSIE COM interface? Is there some additional reference documentation somewhere at Microsoft's web site?



stevengraff

Thanks Jim.

Btw, maybe it's just me, but a wealth of information seems to have gone missing. I searched both the new forum and the old one for variations on words like "submit", "form" and "web", looking for articles on how to programmatically submit a web form, and found nothing. I could swear there used to be examples of this, using both the GET and POST methods.

JTaylor

I think there is a good example in the WinInet Extender for submitting a form if you need to use GET or POST. Someone also recently posted a way to "click" a button if you want to automate a web page to do the submission. Here is some clicking code for easy reference. It uses the event method when the browser supports createEvent and falls back to a regular click otherwise.



Code (winbatch)
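   ; BROWSER_OBJECT is your InternetExplorer.Application object and
   ; ID_ATTRIBUTE is the id of the element to click
   objEvent = ''
   ErrorMode(@OFF)              ; createEvent is not available in older IE versions
   objEvent = BROWSER_OBJECT.document.createEvent("HTMLEvents")
   ErrorMode(@CANCEL)
   i = BROWSER_OBJECT.document.GetElementById("ID_ATTRIBUTE")
   If VarType(objEvent) < 1024  ; no event object: fall back to a plain click
      i.click()
   Else
      objEvent.initEvent("click", @True, @True)
      i.dispatchEvent(objEvent)
   EndIf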
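For pages where there is no convenient button to click, the MSIE COM interface can also fill in a field and submit its form directly through the DOM. The sketch below is an illustration under assumptions rather than a tested recipe: udfFillAndSubmit is a hypothetical name, the field is located by its id, and the form is found through the input element's own form property. Note that calling a form's submit method this way does not fire the page's onsubmit handler, so pages that rely on script validation may behave differently than they do with a real click.

```winbatch
; Sketch: fill in a text field, then submit the form that contains it,
; through the MSIE COM interface. udfFillAndSubmit is a hypothetical name.
#DefineFunction udfFillAndSubmit( objIE, strFieldId, strValue )
   objDoc = objIE.Document
   objField = objDoc.GetElementById( strFieldId )
   objField.Value = strValue        ; set the text box contents
   objForm = objField.form          ; the form element that owns this field
   objForm.Submit()                 ; uses the form's own method (GET or POST)
   Return 1
#EndFunction

; Usage (URL and field id are hypothetical placeholders):
;   objIE = udfIECreate( 'http://www.example.com/search.html' )
;   udfFillAndSubmit( objIE, 'q', 'WinBatch' )
```

Because the form's own method attribute decides whether a GET or POST is sent, this route doesn't require building the request by hand.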



Jim



stevengraff