Webpage Automation - Search Craigslist
Deana:
Webpage Automation: Search Craigslist.
--- Code: Winbatch ---
;***************************************************************************
;**
;** Search Craigslist for an item.
;**
;** Purpose: locate and display Craigslist search results in the IE browser
;** Inputs: starting url, query settings
;** Outputs: displays results in the IE browser
;** TO DO: Optionally add code to store the results to a file
;** Revisions: Deana Falk 2013.10.11
;***************************************************************************
;>search for:
;in:<select id="cAbb" name="catAbbreviation">
;<option value="ccc">all community<option value="eee">all event<option value="sss">all for sale / wanted<option disabled value="">--<option value="art"> art & crafts
;<option value="pts"> auto parts
;<option value="bab"> baby & kid stuff
;<option value="bar"> barter
;<option value="bik"> bicycles
;<option value="boa"> boats
;<option value="bks"> books
;<option value="bfs"> business
;<option value="car"> cars & trucks
;<option value="emd"> cds / dvds / vhs
;<option value="clo"> clothing
;<option value="clt"> collectibles
;<option value="sys"> computers & tech
;<option value="ele"> electronics
;<option value="grd"> farm & garden
;<option value="zip"> free stuff
;<option value="fur"> furniture
;<option value="tag"> games & toys
;<option value="gms"> garage sales
;<option value="for"> general
;<option value="hsh"> household
;<option value="wan"> items wanted
;<option value="jwl"> jewelry
;<option value="mat"> materials
;<option value="mcy" selected> motorcycles/scooters
;<option value="msg"> musical instruments
;<option value="pho"> photo/video
;<option value="rvs"> recreational vehicles
;<option value="spo"> sporting goods
;<option value="tix"> tickets
;<option value="tls"> tools
;<option disabled value="">
;<option value="ggg">all gigs
;<option value="hhh">all housing
;<option value="jjj">all jobs
;<option value="ppp">all personals
;<option value="res">all resume
;<option value="bbb">all services offered</select>
;
Browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
;url = 'http://seattle.craigslist.org/mcy/' ;Motorcycle
;url = 'http://seattle.craigslist.org/spo/' ;Sporting goods
url = 'http://seattle.craigslist.org/sss/'
query = "WinBatch Rocks"
browser.navigate(url)
While browser.readystate <> 4
TimeDelay(0.5)
EndWhile
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1 ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0 ; only titles
; Unfortunately, the .Click method doesn't work in IE 9 or newer.
; However, dispatching a click event like this seems to work:
; Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+IE9~Click~or~Focus~Methods~Fail.txt
objBtn = form.GetElementsByTagName("INPUT").Item(1)
objEvent = browser.document.createEvent("HTMLEvents")
objEvent.initEvent("click", @TRUE, @TRUE)
objBtn.dispatchEvent(objEvent)
While browser.readystate <> 4
TimeDelay(0.5)
EndWhile
Exit
--- End code ---
Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
stevengraff:
Deana, you use this format:
--- Code: ---
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1 ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0 ; only titles
--- End code ---
I tried this, which seemed to work as well (using the item's name instead of its number):
--- Code: ---
doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item("query").Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1 ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item("minAsk").Value = "1" ; min price
form.GetElementsByTagName("INPUT").Item("maxAsk").Value = "500" ; max price
--- End code ---
When a number rather than a string is passed to Item(), is it just the order in which the INPUTs appear on the page?
Deana:
Steven,
Yes. The HTMLCollection returned by GetElementsByTagName is an interface representing a generic collection of elements (in document order). When dealing with HTMLCollections there are different methods and properties for traversing the list.
Properties
* HTMLCollection.length - Read only. The number of items in the collection.
Methods
* HTMLCollection.Item(index) - Returns the specific node at the given zero-based index into the list. Returns null if the index is out of range.
* HTMLCollection.namedItem(name) - Returns the specific node whose ID or name matches the string specified. Matching by name is only done as a last resort, only in HTML, and only if the referenced element supports the name attribute.
The documentation (http://msdn.microsoft.com/en-us/library/ie/dd347034(v=vs.85).aspx) for the Item method states:
--- Quote ---
Note: This method indexes collections by the name or id property of the object; this is a known standards-compliance issue. For interoperability with other browsers, do not reference an object by name using this method.
--- End quote ---
This is why I prefer to use the index rather than the name.
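Walking the collection with its length property and a zero-based Item(index), as described above, also lets you read the indexes off the page instead of guessing them. The following is a minimal sketch, assuming the page loads in IE and that its first form (doc.forms.Item(0)) is the one of interest; the sample form URL from later in this thread is used only as a stand-in for the Craigslist page.

--- Code: Winbatch ---
; Sketch: list every INPUT element in the first form with its
; zero-based index, type and name, so the Item(n) numbers used in
; the Craigslist script can be read off instead of guessed.
; The URL below is only a stand-in page.
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
browser = ObjectCreate("InternetExplorer.Application")
browser.Visible = @TRUE
browser.Navigate(url)
; Wait for the page to finish loading (readyState 4 = complete)
While browser.ReadyState <> 4
   TimeDelay(0.5)
EndWhile
doc = browser.Document
inputlist = ""
If doc.forms.Length > 0
   form = doc.forms.Item(0)
   inputs = form.GetElementsByTagName("INPUT")
   ; length = number of items; Item(i) is zero-based, in document order
   For i = 0 To inputs.Length - 1
      objInput = inputs.Item(i)
      inputlist = inputlist : i : "|" : objInput.type : "|" : objInput.name : @CRLF
   Next
   Message("INPUT elements in form 0 (index|type|name)", inputlist)
Else
   Message("Notice", "There are no forms on this page!")
EndIf
; Release COM handles and close the browser
objInput = 0
inputs = 0
form = 0
doc = 0
browser.Quit()
browser = 0
--- End code ---

Many INPUTs have no name attribute, so an empty name column is normal; the index column is what you pass to Item().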
stevengraff:
The line:
form = doc.forms.Item(0)
determines which form, if more than one exists on the page (in this case, zero), is utilized. In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?
Deana:
--- Quote from: stevengraff on October 24, 2013, 09:56:45 am ---
The line:
form = doc.forms.Item(0)
determines which form, if more than one exists on the page (in this case, zero), is utilized. In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?
--- End quote ---
The F12 developer tools are a suite of on-demand tools built into Internet Explorer 8 and later. If your version of IE doesn't include them, you can get them here: http://www.microsoft.com/en-us/download/details.aspx?id=18359.
Open the webpage and press F12. Either search the HTML for the keyword '<form', or choose Find | Select element by click and click on the form. Then count the number of forms listed before the form of interest; because the index is zero-based, that count is the form's index.
Alternatively, you can write a script to list all the forms on the page. You're in luck: I already have some code that should help:
--- Code: Winbatch ---;***************************************************************************
;**
;** Form Explorer
;** Discover forms on a webpage using an index
;**
;** Developer: Deana Falk 2013.10.24
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
; Wait for webpage to load
While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
Timedelay(0.1)
EndWhile
While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
Timedelay(0.1)
EndWhile
Return 1
#EndFunction
;url = "http://www.yahoo.com"
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
; Initialize MSIE object
oIE = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)
; Wait for webpage to load
udfIEPageLoadWait( oIE )
;***************************************************************************
; Get forms collection
;***************************************************************************
; Get document object
oDoc = oIE.Document
formlist = "" ; Initialize variable
if oDoc != 0 ; Check that you have a valid document handle before using it
; Forms Collection
oForms = oDoc.Forms
; OR
; getElementsByTagName Method
;oForms = oDoc.getElementsByTagName("FORM")
count = oForms.Length
; Loop through the collection of forms using a zero-based index
For index = 0 to count-1
oForm = oForms.Item(index)
If ObjectTypeGet(oForm)=="EMPTY" then continue
formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
Next
Else
Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
Endif
formlist = StrTrim(formlist) ; remove leading tab
if formlist != ""
formindex = Int(ItemExtract(1,AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single ),"|"))
oForm = oForms.Item(formindex)
Pause('Form Object Handle ', oForm)
Else
Pause('Notice','There are no forms on this page!')
EndIf
; Quit
oIE.Quit
; Close open COM/OLE handles
oDoc = 0
oIE = 0
exit
--- End code ---
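If you already know a form's id or name (for example from the F12 tools), you can also look up its index directly instead of counting. The following is a minimal sketch, assuming the same forms collection and zero-based Item(index) used above; udfFindFormIndex is a hypothetical helper name (not a built-in WinBatch function), and "myform" is a placeholder for whatever id or name your page actually uses.

--- Code: Winbatch ---
; Hypothetical helper: return the zero-based index of the first form
; whose id or name equals target, or -1 if no form matches.
#DefineFunction udfFindFormIndex(objDoc, target)
   result = -1
   objForms = objDoc.Forms
   For index = 0 To objForms.Length - 1
      objForm = objForms.Item(index)
      If objForm.id == target || objForm.name == target
         result = index
         Break
      EndIf
   Next
   ; Release COM handles before returning
   objForm = 0
   objForms = 0
   Return result
#EndFunction

url = "http://techsupt.winbatch.com/techsupt/sampleform.html"
oIE = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE
oIE.Navigate(url)
; Wait for the page to finish loading (readyState 4 = complete)
While oIE.ReadyState <> 4
   TimeDelay(0.5)
EndWhile
oDoc = oIE.Document
; "myform" is a placeholder: use the id or name shown in the F12 tools
formindex = udfFindFormIndex(oDoc, "myform")
If formindex < 0
   Message("Notice", "No form with that id or name was found.")
Else
   Message("Form found", "Form index: " : formindex)
EndIf
; Close IE and release COM handles
oDoc = 0
oIE.Quit()
oIE = 0
--- End code ---

The returned index can then be passed to doc.forms.Item() exactly as in the Craigslist script. Keep in mind the caveat quoted above: the index depends on page layout, so if the site adds or removes a form, a lookup by id or name may hold up better than a hard-coded number.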