This script:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+WebPage~Link~Lister.txt
is supposed to list all of a web page's links, I think. Anyway, I can't get it to run; maybe I just don't know how. When I run it in debug mode, I get two steps into the first UDF, then it hangs on this line:
window1=cWndByWndSpec("IEFrame","IEXPLORE",4,40965,9999,40961,0)
and all I can do is kill it.
Is there an update to this script, or another simple way to list a page's links?
I'm trying this on a Windows Server 2008 R2 machine in a terminal server session, if that makes a difference.
That code was written back in 2004. cWndByWndSpec is processor intensive, and it can take quite a while to time out. The code needs to be updated for modern Windows platforms and browsers.
Here is some undebugged revised code:
#DefineFunction udfIEPageLoadWait(objIE)
   While !(objIE.readyState == "complete" || objIE.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   While !(objIE.document.readyState == "complete" || objIE.document.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   Return
#EndFunction
#DefineFunction udfGetCurrentURL(objIE)
   If !udfIsObject(objIE)
      ; Report the function that actually detected the problem
      Pause("udfGetCurrentURL","Not a Valid Object")
      exit
   EndIf
   url = objIE.LocationURL
   return url
#EndFunction
#DefineFunction udfIsObject(obj)
   Return(VarType(obj)>=1024)
#EndFunction
#DefineFunction udfListLinks(objBrowserDoc)
   udfLinks = objBrowserDoc.Links
   udfLinkList=Strcat("Number: ",udfLinks.Length)
   udfnumberofLinks = udfLinks.Length - 1
   for x = 0 to udfnumberofLinks
      ; Use a separate variable for the single link so the collection is not overwritten
      udfLink = objBrowserDoc.Links(x)
      ; Only add hrefs that are not already in the list
      If itemlocate(udfLink.href,StrReplace(udfLinkList,@CRLF," ")," ")==0 then udfLinkList=strCat(udfLinkList,@CRLF,udfLink.href)
   next
   Return udfLinkList
#EndFunction
#DefineFunction udfListAnchors(objBrowserDoc)
   udfanchors = objBrowserDoc.anchors
   udfAnchorList=Strcat("Number: ",udfanchors.length)
   udfnumberofAnchors = udfanchors.length - 1
   for x = 0 to udfnumberofAnchors
      udfanchor = objBrowserDoc.anchors(x)
      If itemlocate(udfanchor.name,StrReplace(udfAnchorList,@CRLF," ")," ")==0 then udfAnchorList=strCat(udfAnchorList,@CRLF,udfanchor.name)
   next
   Return udfAnchorList
#EndFunction
#DefineFunction udfListImages(objBrowserDoc)
   udfImages = objBrowserDoc.Images
   udfImageList=Strcat("Number: ",udfImages.Length)
   udfnumberofImages = udfImages.Length - 1
   for x = 0 to udfnumberofImages
      udfImage = objBrowserDoc.Images(x)
      udfAltText=udfImage.alt
      if udfAltText=="" then udfAltText=" "
      udfSource=udfImage.src
      ;udfSize=udfImage.size
      ;breakpoint
      If itemlocate(udfSource,StrReplace(udfImageList,@CRLF," ")," ")==0 then udfImageList=strCat(udfImageList,@CRLF,udfSource," --- '",udfAltText,"'")
   next
   Return udfImageList
#EndFunction
#DefineFunction udfURLPageBody(objBrowserDoc)
   ;debug(1)
   udfBody = objBrowserDoc.body
   ; Collect both the visible text and the raw HTML of the page body
   udfTextOnly=udfBody.innertext
   udfPageHTML=udfBody.innerhtml
   udfContents=StrCat("<><><> Content <><><>",@CRLF,udfTextOnly,@CRLF,"<><><> HTML <><><>",@CRLF,udfPageHTML)
   Return udfContents
#EndFunction
objIE = ObjectCreate("InternetExplorer.Application")
objIE.visible = @True
objIE.navigate('http://www.google.com')
udfIEPageLoadWait(objIE)
url = udfGetCurrentURL(objIE)
pause(0,url)
objBrowserDoc = objIE.Document
ListLinks=udfListLinks(objBrowserDoc)
Message("ListLinks",ListLinks)
ListAnchors=udfListAnchors(objBrowserDoc)
Message("ListAnchors",ListAnchors)
ListImages=udfListImages(objBrowserDoc)
Message("ListImages",ListImages)
URLBody=udfURLPageBody(objBrowserDoc)
Message("URLBody",URLBody)
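Since the original complaint was a script that hangs with nothing to do but kill it, one possible refinement is to give the page-load wait a timeout. This is only an untested sketch: the name udfIEPageLoadWaitTimeout and the nMaxSecs parameter are made up for illustration, and it assumes the browser object reports a numeric readyState of 4 and the document reports "complete" once loading is done.

```winbatch
#DefineFunction udfIEPageLoadWaitTimeout( objIE, nMaxSecs )
   ; Hypothetical variant of udfIEPageLoadWait.
   ; Returns @TRUE once the page has loaded, @FALSE if nMaxSecs elapse first.
   strStart = TimeYmdHms()
   While @TRUE
      ; Only touch the document once the browser itself reports complete (4)
      If objIE.readyState == 4
         If objIE.document.readyState == "complete" Then Return @TRUE
      EndIf
      ; Give up when the elapsed time reaches the limit
      If TimeDiffSecs( TimeYmdHms(), strStart ) >= nMaxSecs Then Return @FALSE
      TimeDelay( 0.1 )
   EndWhile
   Return @FALSE
#EndFunction

objIE = ObjectCreate("InternetExplorer.Application")
objIE.visible = @TRUE
objIE.navigate("http://www.google.com")
If !udfIEPageLoadWaitTimeout( objIE, 30 )
   Pause("udfIEPageLoadWaitTimeout","Page did not finish loading within 30 seconds")
   Exit
EndIf
```

The 30 second limit is arbitrary; a slow terminal server session may need more.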
Well... that's pretty awesome... thanks!
Here is another swipe at it...
;***************************************************************************
;**
;** Web Page Scraper
;**
;** Purpose: Extract information from a webpage
;** Inputs: url
;** Outputs: Messages containing data
;**
;** Deana Falk
;** Revisions: 2013.10.22 Initial Release
;**
;**
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
   If !udfIsObject(objIE)
      Pause('udfIEPageLoadWait','Not a Valid Object')
      exit
   EndIf
   While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   Return 1
#EndFunction
#DefineFunction udfIECreate( strUrl )
   objIE = ObjectCreate( 'InternetExplorer.Application')
   If !udfIsObject(objIE)
      Pause('udfIECreate','Not a Valid Object')
      exit
   EndIf
   objIE.visible = @True
   objIE.navigate( strUrl )
   udfIEPageLoadWait( objIE )
   Return objIE
#EndFunction
#DefineFunction udfIEAttach( strMode, strString )
   ; strMode
   ;   title   strString is the title of the page you are trying to access
   ;   url     strString is the url of the page you are trying to access
   ;   text    strString is some text of the page you are trying to access
   ;   html    strString is some html of the page you are trying to access
   strMode = StrLower(strMode)
   objShell = ObjectCreate('Shell.Application')
   objShellWindows = objShell.Windows() ; collection of all ShellWindows (IE and File Explorer)
   ;ForEach objWindow In objShellWindows
   For x = 0 To objShellWindows.count-1
      objWindow = objShellWindows.Item(x)
      ; Check window object is a valid browser, if not, skip it
      bIsBrowser = @True
      ; Check conditions to verify that the object is a browser
      If bIsBrowser
         ErrorMode(@off)
         ret = objWindow.type ; Is .type a valid property?
         ErrorMode(@cancel)
         if ret == 0 then bIsBrowser = @False
      EndIf
      If bIsBrowser
         ErrorMode(@off)
         ret = objWindow.document.title ; Does object have a .document and .title property?
         ErrorMode(@cancel)
         if ret == 0 then bIsBrowser = @False
      EndIf
      If bIsBrowser
         Switch @True
            Case strMode =='title'
               ; Search for strString within the page title, consistent with the other modes
               If StrIndex( objWindow.document.title, strString, 1, @Fwdscan ) > 0
                  Return objWindow
               EndIf
               break
            Case strMode =='url'
               If Strindex(objWindow.LocationURL, strString, 1 , @Fwdscan) > 0
                  Return objWindow
               EndIf
               break
            Case strMode =='text'
               If StrIndex(objWindow.document.body.innerText, strString, 1 ,@Fwdscan) > 0
                  Return objWindow
               EndIf
               break
            Case strMode =='html'
               ; StrIndex needs the start position and scan direction as well
               If StrIndex(objWindow.document.body.innerHTML, strString, 1, @Fwdscan) > 0
                  Return objWindow
               EndIf
               break
            Case @True ; Invalid Mode (default case for Switch @True)
               Pause('udfIEAttach','Invalid Mode Specified')
               Exit
         EndSwitch
      EndIf
   Next
   Return 0
#EndFunction
#DefineFunction udfGetURL( objIE )
   If !udfIsObject( objIE )
      ; Report the function that actually detected the problem
      Pause('udfGetURL','Not a Valid Object')
      exit
   EndIf
   strUrl = objIE.LocationURL
   return strUrl
#EndFunction
#DefineFunction udfIsObject( obj )
   Return(VarType(obj)>=1024)
#EndFunction
#DefineFunction udfListLinks( objIE )
   If !udfIsObject(objIE)
      Pause('udfListLinks','Not a Valid Object')
      exit
   EndIf
   objBrowserDoc = objIE.Document
   objLinks = objBrowserDoc.Links
   strLinkList = ''
   numberofLinks = objLinks.Length - 1
   for x = 0 to numberofLinks
      ; Use a separate variable for the single link so the collection is not overwritten
      objLink = objBrowserDoc.Links(x)
      ; Build a tab-delimited list of hrefs
      If strLinkList == '' then strLinkList = objLink.href
      Else strLinkList = strLinkList : @TAB : objLink.href
   next
   Return strLinkList
#EndFunction
#DefineFunction udfListAnchors( objIE )
   If !udfIsObject( objIE )
      Pause('udfListAnchors','Not a Valid Object')
      exit
   EndIf
   objBrowserDoc = objIE.Document
   objAnchors = objBrowserDoc.anchors
   strAnchorList = ''
   numberofAnchors = objAnchors.length - 1
   for x = 0 to numberofAnchors
      objAnchor = objBrowserDoc.anchors(x)
      If strAnchorList == '' then strAnchorList = objAnchor.name
      Else strAnchorList = strAnchorList : @TAB : objAnchor.name
   next
   Return strAnchorList
#EndFunction
#DefineFunction udfListImages( objIE )
   If !udfIsObject( objIE )
      Pause('udfListImages','Not a Valid Object')
      exit
   EndIf
   objBrowserDoc = objIE.Document
   objImages = objBrowserDoc.Images
   strImageList = ''
   numberofImages = objImages.Length - 1
   for x = 0 to numberofImages
      objImage = objBrowserDoc.Images(x)
      strAltText = objImage.alt
      if strAltText=='' then strAltText=' '
      strSource = objImage.src
      ;nSize=objImage.size
      If strImageList == '' then strImageList = strSource : ' --- ' : strAltText
      Else strImageList = strImageList : @TAB : strSource : ' --- ' : strAltText
   next
   Return strImageList
#EndFunction
#DefineFunction udfGetBody(objIE, nOption)
   If !udfIsObject( objIE )
      Pause( 'udfGetBody', 'Not a Valid Object' )
      exit
   EndIf
   objBrowserDoc = objIE.Document
   objBody = objBrowserDoc.Body
   Switch nOption
      Case 0
         strContents = objBody.innertext
         break
      Case 1
         strContents = objBody.innerhtml
         break
      case nOption
         Pause( 'udfGetBody', 'Invalid Option' )
         Return 0
         break
   EndSwitch
   Return strContents
#EndFunction
strUrl = 'http://www.winbatch.com/'
objIE = udfIECreate( strUrl )
if objIE == 0
   Pause('udfIECreate','Unable to create browser')
   Exit
Endif
; Attach to existing browser with this url
;objIE = udfIEAttach('url', 'http://www.winbatch.com/')
;if objIE == 0
; Pause('udfIEAttach','Unable to locate browser using this mode')
; Exit
;Endif
url = udfGetURL( objIE )
pause( 'Current Url', url )
ListLinks = udfListLinks( objIE )
AskItemList( 'ListLinks', ListLinks, @tab, @unsorted, @single )
ListAnchors = udfListAnchors( objIE )
AskItemList( 'ListAnchors', ListAnchors, @tab, @unsorted, @single )
ListImages = udfListImages( objIE )
AskItemList( 'ListImages', ListImages, @tab, @unsorted, @single )
URLBodyTxt = udfGetBody( objIE, 0 ) ;inner text
Pause( 'Body Inner Text', URLBodyTxt )
URLBodyHTML = udfGetBody( objIE, 1 ) ;inner html
Pause( 'Body Inner HTML', URLBodyHTML )
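Unlike the first version, the revised list functions do not skip duplicate entries. If duplicates are unwanted, one option is to filter the tab-delimited list afterwards. Here is a minimal, untested sketch; the helper name udfItemListUnique is hypothetical, and it relies only on the standard WIL item-list functions (ItemCount, ItemExtract, ItemLocate, ItemInsert).

```winbatch
#DefineFunction udfItemListUnique( strList, strDelim )
   ; Hypothetical helper: returns strList with repeated items removed,
   ; keeping the first occurrence of each item in its original order.
   strUnique = ''
   nCount = ItemCount( strList, strDelim )
   For n = 1 To nCount
      strItem = ItemExtract( n, strList, strDelim )
      ; ItemLocate returns 0 when the item is not yet in the list
      If ItemLocate( strItem, strUnique, strDelim ) == 0
         strUnique = ItemInsert( strItem, -1, strUnique, strDelim ) ; -1 appends
      EndIf
   Next
   Return strUnique
#EndFunction

ListLinks = udfItemListUnique( udfListLinks( objIE ), @TAB )
AskItemList( 'Unique Links', ListLinks, @TAB, @unsorted, @single )
```

This compares whole items, so two links that differ only by a trailing slash or a fragment still count as different.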
Is there, somewhere, a bible or ubertutorial for programmatic web page reading and manipulation?
Is this still the one?:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+Tutorials+Working~With~Web~Pages.txt
Or has it been superseded?
Quote from: stevengraff on October 23, 2013, 04:54:36 AM
Is there, somewhere, a bible or ubertutorial for programmatic web page reading and manipulation?
Is this still the one?:
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+Tutorials+Working~With~Web~Pages.txt
Or has it been superseded?
That specific tutorial covers all the various methods for interacting with web pages. The code above uses the MSIE COM interface. It might be nice to one day put together a tutorial that focuses on only this method.
What's a good method for learning more about the MSIE COM interface? Is there some additional reference documentation somewhere at Microsoft's web site?
Here are a couple of useful links.
http://msdn.microsoft.com/en-us/library/aa752085(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms535862(v=vs.85).aspx
Jim
Thanks Jim.
Btw, maybe it's just me, but it seems like a wealth of information has somehow gone missing. I searched both the new forum and the old one for variations on words like "submit", "form", and "web", looking for articles on how to programmatically submit a web form, and found nothing. I could swear there used to be examples of this, using both GET and POST methods.
I think there is a good example in the WinInet Extender for submitting a form if you need to use GET or POST. Also, someone recently posted a way to "click" a button if you want to automate the web page itself to do the submission. Here is some clicking code for easy reference. It handles the regular click as well as the event method.
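; BROWSER_OBJECT is your browser object; ID_ATTRIBUTE is the id of the element to click
objEvent = BROWSER_OBJECT.document.createEvent("HTMLEvents")
i = BROWSER_OBJECT.document.GetElementById("ID_ATTRIBUTE")
If objEvent == ""
   ; No event object available: fall back to the regular click method
   i.click()
Else
   ; Create and dispatch a bubbling, cancelable click event
   objEvent.initEvent("click", @true, @true)
   i.dispatchEvent(objEvent)
EndIf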
Jim
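For the form side of the question, the same MSIE COM approach can in principle fill in a field and submit the form directly through the DOM, without clicking anything. This is an untested sketch under assumptions: objIE is a browser object with a fully loaded page (as returned by udfIECreate above), and the element ids 'q' and 'searchform' are hypothetical placeholders for whatever ids the target page really uses.

```winbatch
; Hypothetical ids: 'q' is a text input, 'searchform' is the form containing it
objDoc = objIE.document
objInput = objDoc.getElementById( 'q' )
objInput.value = 'winbatch'
objForm = objDoc.getElementById( 'searchform' )
; submit() posts the form using the method (GET or POST) declared in the page's HTML
objForm.submit()
udfIEPageLoadWait( objIE )
```

Calling submit() from script does not fire the form's onsubmit handler, so pages that validate or rewrite fields in that handler may still need the click approach shown above.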
Thanks again... that's just what I'm looking for.