Webpage Automation: Search Craigslist.
;***************************************************************************
;**
;** Search Craigslist for an item.
;**
;** Purpose: locate and display craigslist result in IE browser
;** Inputs: starting url, query settings
;** Outputs: displays results in IE browser
;** TO DO: Optionally add code to store the results to a file
;** Revisions: Deana Falk 2013.10.11
;***************************************************************************
;>search for:
;in:<select id="cAbb" name="catAbbreviation">
;<option value="ccc">all community
;<option value="eee">all event
;<option value="sss">all for sale / wanted
;<option disabled value="">--
;<option value="art"> art & crafts
;<option value="pts"> auto parts
;<option value="bab"> baby & kid stuff
;<option value="bar"> barter
;<option value="bik"> bicycles
;<option value="boa"> boats
;<option value="bks"> books
;<option value="bfs"> business
;<option value="car"> cars & trucks
;<option value="emd"> cds / dvds / vhs
;<option value="clo"> clothing
;<option value="clt"> collectibles
;<option value="sys"> computers & tech
;<option value="ele"> electronics
;<option value="grd"> farm & garden
;<option value="zip"> free stuff
;<option value="fur"> furniture
;<option value="tag"> games & toys
;<option value="gms"> garage sales
;<option value="for"> general
;<option value="hsh"> household
;<option value="wan"> items wanted
;<option value="jwl"> jewelry
;<option value="mat"> materials
;<option value="mcy" selected> motorcycles/scooters
;<option value="msg"> musical instruments
;<option value="pho"> photo/video
;<option value="rvs"> recreational vehicles
;<option value="spo"> sporting goods
;<option value="tix"> tickets
;<option value="tls"> tools
;<option disabled value="">
;<option value="ggg">all gigs
;<option value="hhh">all housing
;<option value="jjj">all jobs
;<option value="ppp">all personals
;<option value="res">all resume
;<option value="bbb">all services offered</select>
;
Browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
;url = 'http://seattle.craigslist.org/mcy/' ;Motorcycle
;url = 'http://seattle.craigslist.org/spo/' ;Sporting goods
url = 'http://seattle.craigslist.org/sss/'
query = "WinBatch Rocks"
browser.navigate(url)
While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1 ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0 ; only titles
; Unfortunately .Click method doesn't work on IE 9 or newer.
; However this method seems to work:
; Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+IE9~Click~or~Focus~Methods~Fail.txt
objBtn = form.GetElementsByTagName("INPUT").Item(1)
objEvent = browser.document.createEvent("HTMLEvents")
objEvent.initEvent("click", @TRUE, @TRUE)
objBtn.dispatchEvent(objEvent)
While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile
Exit
Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
Deana, you use this format:
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1 ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0 ; only titles
I tried this, which seemed to work as well (using the item's name instead of number):
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item("query").Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item("minAsk").Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item("maxAsk").Value = "500" ; max price
The number, instead of string, in the Item() part... is it just ordered sequentially by how the INPUTs appear on the page?
Steven,
Yes. The HTMLCollection returned by GetElementsByTagName is an interface representing a generic collection of elements (in document order). When dealing with HTMLCollections there are different methods and properties for traversing the list.
Properties
- HTMLCollection.length - Read only. The number of items in the collection.
Methods
- HTMLCollection.Item(index) - Returns the specific node at the given zero-based index into the list. Returns null if the index is out of range.
- HTMLCollection.namedItem(name) - Returns the specific node whose ID or name matches the string specified. Matching by name is only done as a last resort, only in HTML, and only if the referenced element supports the name attribute.
The documentation (http://msdn.microsoft.com/en-us/library/ie/dd347034(v=vs.85).aspx) for the Item method states:
Quote: Note: This method indexes collections by the name or id property of the object; this is a known standards-compliance issue. For interoperability with other browsers, do not reference an object by name using this method.
Which is why I prefer to use the index rather than the name....
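If you want to match by name but still walk the collection by index, something like the following may work. This is only a sketch: it assumes the search form is the first form on the page (index 0) and that the inputs are named query, minAsk and maxAsk as in the post above. The live page may use different names.

```winbatch
; Sketch only: form index 0 and the input names are assumptions taken from this thread.
browser = ObjectCreate("InternetExplorer.Application")
browser.Visible = @TRUE
browser.Navigate("http://seattle.craigslist.org/sss/")
While browser.ReadyState <> 4
   TimeDelay(0.5)
EndWhile

form = browser.Document.Forms.Item(0)
inputs = form.GetElementsByTagName("INPUT")
count = inputs.Length

; Walk the collection by zero-based index and compare each element's name
For x = 0 To count-1
   objInput = inputs.Item(x)
   If objInput == 0 Then Continue   ; skip anything that did not come back as an object
   name = objInput.Name
   If name == "query" Then objInput.Value = "WinBatch Rocks"
   If name == "minAsk" Then objInput.Value = "1"
   If name == "maxAsk" Then objInput.Value = "500"
Next
```

The loop stops at count-1 because Item() is zero-based, so the script keeps working if the page inserts or reorders inputs, as long as the names stay the same.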
The line:
form = doc.forms.Item(0)
determines which form, if more than one exists on the page (in this case, zero), is utilized. In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?
Quote from: stevengraff on October 24, 2013, 09:56:45 AM
The F12 developer tools are a suite of on-demand tools built into Internet Explorer. If your version does not include them, you can get them here: http://www.microsoft.com/en-us/download/details.aspx?id=18359.
Open the webpage and press F12. Either search for the keyword '<form' or select the Find Menu | Select element by click. Then count the number of forms listed before the form of interest.
Or you can write a script to list all the forms on the page. You're in luck: I already have some code that should help:
;***************************************************************************
;**
;** Form Explorer
;** Discover forms on a webpage using an index
;**
;** Developer: Deana Falk 2013.10.24
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
   ; Wait for webpage to load
   While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   Return 1
#EndFunction
;url = "http://www.yahoo.com"
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
; Initialize MSIE object
oIE = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)
; Wait for webpage to load
udfIEPageLoadWait( oIE )
;***************************************************************************
; Get forms collection
;***************************************************************************
; Get document object
oDoc = oIE.Document
; Forms Collection
oForms = oDoc.Forms
; OR
; getElementsByTagName Method
;oForms = oDoc.getElementsByTagName("FORM")
formlist = "" ; Initialize variable
if oDoc != 0 ; Check if you have a valid document handle
   count = oForms.Length
   ; Loop through the collection of forms using index
   For index = 0 to count-1
      oForm = oForms.Item(index)
      If ObjectTypeGet(oForm)=="EMPTY" then continue
      formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   Next
Else
   Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
Endif
formlist = StrTrim(formlist) ; remove leading tab
if formlist != ""
   formindex = Int(ItemExtract(1,AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single ),"|"))
   oForm = oForms.Item(formindex)
   Pause('Form Object Handle ', oForm)
Else
   Pause('Notice','There are no forms on this page!')
EndIf
; Quit
oIE.Quit
; Close open COM/OLE handles
oDoc = 0
oIE = 0
exit
I have been working on a tutorial that I plan to post to the tech database. It still needs some work but here is a Webpage Explorer I created:
;***************************************************************************
;** Webpage Element Explorer
;**
;** Purpose: Discovering form elements on a webpage
;** Inputs: ignorehidden, inputonly
;** Outputs: Highlights each form element using a red solid border
;** and Display attribute information in a Box
;** Reference:
;**
;**
;** Developer: Deana Falk 2013.10.24
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
   ; Wait for webpage to load
   While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
      Timedelay(0.1)
   EndWhile
   Return 1
#EndFunction
; Modify these values to change how the script operates
ignorehidden = @TRUE ; change to @FALSE to query hidden form elements
inputonly = @FALSE ; change to @TRUE to query only INPUT form elements.
boxcoords = '750,0,1000,200' ; coordinates of the display box.
boxclr = '255,0,0' ; box background color
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
; Display form element data in a box
title = 'Discovering form elements on a webpage'
BoxesUp(boxcoords, @NORMAL)
BoxDrawRect(1,'0,0,1000,1000',2)
BoxColor(1,boxclr,0)
BoxCaption(1, title)
WindowOnTop(title, 1)
; Initialize MSIE object
oIE = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)
; Wait for webpage to load
udfIEPageLoadWait( oIE )
;***************************************************************************
; Get Collection of Forms
;***************************************************************************
; Get document object
oDoc = oIE.Document
; Forms Collection
oForms = oDoc.Forms
; OR
; getElementsByTagName Method
;oForms = oDoc.getElementsByTagName("FORM")
formlist = "" ; Initialize variable
If oDoc != 0 ; Check if you have a valid document handle
   ; Loop through the collection of forms using index
   count = oForms.Length
   For index = 0 to count-1
      oForm = oForms.Item(index)
      If ObjectTypeGet(oForm)=="EMPTY" then continue
      formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   Next
   ; ALTERNATE WAY: Loop through the collection of forms
   ;index = 0
   ;ForEach oForm in oForms
   ;   If ObjectTypeGet(oForm)=="EMPTY" then continue
   ;   formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   ;   index = index+1
   ;Next
Else
   Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
EndIf
formlist = StrTrim(formlist) ; remove leading tab
formindex = Int(ItemExtract(1,AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single ),"|"))
If formindex == "" then formindex = 0 ;Confirm user selected form name
oForm = oForms.Item(formindex)
;***************************************************************************
; Get Collection of Elements in the Form
;***************************************************************************
If oForm != 0 ; Check if you have a valid form handle
   cElements = oForm.Elements
   elementindex = 0
   BoxButtonDraw(1, 1, 'Next Element', '250,750,650,900')
   ForEach oElement in cElements
      ; Get element attributes
      ; Alternative option: oElement.getAttribute("attributename")
      tag = oElement.nodeName ;The value of tagName is the same as that of nodeName.
      type = StrUpper(oElement.Type)
      id = oElement.Id
      name = oElement.Name
      value = oElement.Value
      ; Check if user wants to ignore HIDDEN form elements
      If ignorehidden
         If StrUpper(type) == 'HIDDEN' then continue
      EndIf
      ; Check if user wants to see non INPUT form elements
      If inputonly
         If StrUpper(tag) != 'INPUT' then continue
      EndIf
      ; Highlight each element on the form
      oElement.style.border = "1mm solid red"
      ; Set focus on element
      If !oElement.disabled
         ErrorMode(@off)
         oElement.Focus()
         ErrorMode(@cancel)
      EndIf
      ; Display results
      BoxDrawText(1, "0,0,1000,1000",'Element Index: ' : elementindex : @lf :'Tag: ' : tag : @lf : 'Type: ' : type : @lf :'Id: ' : id : @lf :'Name: ' : name : @lf :'Value: ': value, @TRUE, 0 )
      While !BoxButtonStat(1, 1)
         TimeDelay(0.5)
      Endwhile
      ; Remove element highlight
      oElement.style.border = ""
      ; Increment element counter
      elementindex = elementindex+1
   Next
Else
   Pause('Notice','Unable to obtain a handle to a form on this webpage. Check URL and formnumber.')
EndIf
:CleanUp
; Quit
oIE.Quit
; Close open COM/OLE handles
oElement = 0
oDoc = 0
oIE = 0
exit
For your information: I was unsuccessful trying to click on one webpage with HTMLEvents, but input.click(1) just worked. Hope it helps others.
InputCollection = f.document.GetElementsByTagName("a")
ForEach Input In InputCollection
   name=Input.attributes.class.value
   if strindexnc( name, "xxxxxxxxxx", 1, @Fwdscan ) >0
      ;input.click ;does not work on IE11
      input.click(1) ; Works on IE11
      f=WaitForMSIE(f)
      break
   Endif
Next
Quote from: markgolay on March 06, 2014, 02:59:45 AM
Interesting. How did you come up with that solution? The documentation for the Click method indicates that it takes no parameters: http://msdn.microsoft.com/en-us/library/ie/ms536363(v=vs.85).aspx
I was using your HTMLEvents approach, which worked on all my buttons across 2 or 3 webpages, except, typically, the final button on a webpage, where it would not work. I was getting very frustrated, so I hunted the web and found someone saying they had tried it and it worked; I think it was on a Perl forum. I would still like to understand why HTMLEvents did not work on my last button, but I am afraid that is beyond my knowledge. I just wanted it to work.
Quote from: markgolay on March 10, 2014, 05:26:43 AM
I modified the code in the tech database. It now searches for input elements by name instead of number. Give this code a try: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
An old thread, and maybe this doesn't help anyone, but I changed my code to use:
doc.GetElementById("buttonID")
and that was the key to getting my browser sessions (IE10) to click successfully.
Rich
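For anyone who wants to try the same approach, here is a rough sketch. The id "buttonID" is just the placeholder from the post above and the URL is the sample form page used earlier in this thread, so neither is expected to match a real button. The == 0 test follows the null-check pattern used elsewhere in this thread.

```winbatch
; Sketch only: "buttonID" is a placeholder id, not a real element on this page.
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
browser = ObjectCreate("InternetExplorer.Application")
browser.Visible = @TRUE
browser.Navigate(url)
While browser.ReadyState <> 4
   TimeDelay(0.5)
EndWhile

doc = browser.Document
objBtn = doc.GetElementById("buttonID")   ; look the button up directly by its id
If objBtn == 0
   Pause('Notice', 'No element with that id was found on this page.')
Else
   objBtn.Click()
   ; Wait for whatever page the click leads to
   While browser.ReadyState <> 4
      TimeDelay(0.5)
   EndWhile
EndIf
```

If Click() does not fire on a particular IE version, the createEvent / dispatchEvent technique shown at the top of this thread can be applied to the same objBtn object instead.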
Quote from: Deana on March 10, 2014, 08:52:08 AM
I modified the code in the tech database. It now searches for input elements by name instead of number. Give this code a try: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
Deana, could you please update your sample again. It is not finding any of the named objects and the "count = form.GetElementsByTagName("INPUT").length" is returning a 1. I want to use your sample to play around with and learn.
Thank you.
Quote from: deming on July 29, 2014, 07:50:04 AM
Make sure you are using the latest code posted here: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
If you continue to have an issue running the latest code, please post the DebugTrace output. Simply add DebugTrace(@on,"trace.txt") to the beginning of the script and inside any UDF, run it until the error or completion, then inspect the resulting trace file for clues as to the problem. Feel free to post the trace file here (removing any private info) if you need further assistance.
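A small sketch of where the DebugTrace calls go. The UDF name udfAddOne and its body are made up purely for illustration; only the placement of the DebugTrace lines matters.

```winbatch
; Sketch: the UDF below is hypothetical and exists only to show the placement.
DebugTrace(@ON, "trace.txt")   ; first line of the main script

#DefineFunction udfAddOne( n )
   DebugTrace(@ON, "trace.txt")   ; repeated inside the UDF so its lines are traced too
   Return n + 1
#EndFunction

result = udfAddOne( 41 )
Message("Result", result)
Exit
```

After the script ends, trace.txt should list each executed line followed by the value it produced, in the same layout as the trace posted in this thread.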
I looked over the trace.txt (see below) but am stumped and would appreciate your expertise. Looks like the "name = objInput.Name" is returning a null.
************************************************************
*** Debug Initialized ***
==============================
Wed 7/30/2014 11:42:39 AM
WinBatch 32 2013C
WIL DLL 6.13cmc
D:\Users\Documents\WinBatch\Search Craigslist.wbt
Windows platform: NT, version: 6.1, build: 7601 (Service Pack 1)
ErrorMode: @CANCEL
Valid Code Signature: Yes
UAC Manifest Settings: level="highestAvailable" uiAccess="true"
UAC Elevation Level: Standard User or Disabled
==============================
Browser = ObjectCreate("InternetExplorer.Application")
(2730) VALUE INT/COMOBJ => 41544140
browser.visible = @TRUE
(2761) VALUE VARIANT_BOOL => -1
url = 'http://seattle.craigslist.org/sss/'
(2761) VALUE STRING => "http://seattle.craigslist.org/sss/"
query = "WinBatch Rocks"
(2761) VALUE STRING => "WinBatch Rocks"
browser.navigate(url)
(2808) VALUE VARIANT_EMPTY =>
While browser.readystate <> 4
(3011) END OPERATOR
doc = browser.document
(3042) VALUE INT/COMOBJ => 41545004
form = doc.forms.Item(0)
(3058) VALUE INT/COMOBJ => 41844180
count = form.GetElementsByTagName("INPUT").length
(3058) VALUE VARIANT_I4 => 1
For x = 0 To count
(3058) FOR TRUE==>0
objInput = form.GetElementsByTagName("INPUT").Item(x)
(3058) VALUE INT/COMOBJ => 41844684
If objInput == 0 Then Continue
(3058) ==>FALSE=> (skipped)
name = objInput.Name
(3073) VALUE VARIANT_BSTR =>
id = objInput.Id
(3073) VALUE VARIANT_BSTR =>
If name == "query" Then form.GetElementsByTagName("INPUT").Item(x).Value = query
(3073) ==>FALSE=> (skipped)
If name == "minAsk" Then form.GetElementsByTagName("INPUT").Item(x).Value = "1"
(3073) ==>FALSE=> (skipped)
If name == "maxAsk" Then form.GetElementsByTagName("INPUT").Item(x).Value = "500"
(3073) ==>FALSE=> (skipped)
If name == "hasPic" Then form.GetElementsByTagName("INPUT").Item(x).checked = @TRUE
(3073) ==>FALSE=> (skipped)
If id == "searchbtn" Then objBtn = objInput
(3073) ==>FALSE=> (skipped)
Next
(3073) END OPERATOR
To count
(3073) FOR TRUE==>1
objInput = form.GetElementsByTagName("INPUT").Item(x)
(3073) VALUE VARIANT_DISPATCH => 0
If objInput == 0 Then Continue
(3089) END OPERATOR
To count
(3089) END OPERATOR
Pause(0,0)
(4992) VALUE INT => 1
If IsDefined(objBtn)
(4992) ELSE DO==>TRUE
Pause('Notice','No search button found')
(5850) VALUE INT => 1
EndIf
(5850) END OPERATOR
While browser.readystate <> 4
(5850) END OPERATOR
Exit
(5850) VALUE INT => 0
--- Normal termination ---
;;;END OF JOB;;;
The suspect line is :
count = form.GetElementsByTagName("INPUT").length
(3058) VALUE VARIANT_I4 => 1
This indicates that the form only contains one input element. Not sure why at this point.
I wonder if adding a short time delay after the Navigate might help? This may allow the page to fully load before querying the input elements.
Use this UDF to wait for the page to load:
#DefineFunction udfIEPageLoadWait( objIE )
   ; Wait for webpage to load
   While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
      TimeDelay(0.1)
   EndWhile
   While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
      TimeDelay(0.1)
   EndWhile
   Return 1
#EndFunction
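As a rough sketch of how the UDF could be used to check the input count, the following waits for the page and then reports how many INPUT elements each form holds. It assumes udfIEPageLoadWait is already defined earlier in the same script, and the variable names are only illustrative.

```winbatch
; Sketch only: assumes udfIEPageLoadWait (shown above) is defined earlier in this script.
browser = ObjectCreate("InternetExplorer.Application")
browser.Visible = @TRUE
browser.Navigate("http://seattle.craigslist.org/sss/")
udfIEPageLoadWait( browser )   ; waits on both the browser and the document

; Report how many INPUT elements each form contains
oForms = browser.Document.Forms
count = oForms.Length
report = ""
For i = 0 To count-1
   oForm = oForms.Item(i)
   If oForm == 0 Then Continue
   n = oForm.GetElementsByTagName("INPUT").Length
   report = report : "Form " : i : ": " : n : " INPUT element(s)" : @CRLF
Next
Message("Inputs per form", report)
```

If form 0 still reports a single INPUT after the wait, the search fields probably live in a different form, or outside any form, on the current version of the page.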
What version of Internet Explorer do you have installed on your system?
Here is the most current revision: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
Thank you Deana! Your latest update works perfectly. My IE is Version 11.0.9600.17207, Update 11.0.10
Thank you for taking the time to solve this. Your samples are a great tool for learning.
:D