Webpage Automation - Search Craigslist

Deana:
Webpage Automation: Search Craigslist.


--- Code: Winbatch ---;***************************************************************************
;**
;**        Search Craigslist for an item.
;**
;** Purpose: locate and display craigslist result in IE browser
;** Inputs:  starting url, query settings
;** Outputs: displays results in IE browser
;** TO DO: Optionally add code to store the results to a file
;** Revisions: Deana Falk 2013.10.11
;***************************************************************************

;>search for:
;in:<select id="cAbb" name="catAbbreviation">
;<option value="ccc">all community<option value="eee">all event<option value="sss">all for sale / wanted<option disabled value="">--<option value="art">   art &amp; crafts
;<option value="pts">   auto parts
;<option value="bab">   baby &amp; kid stuff
;<option value="bar">   barter
;<option value="bik">   bicycles
;<option value="boa">   boats
;<option value="bks">   books
;<option value="bfs">   business
;<option value="car">   cars &amp; trucks
;<option value="emd">   cds / dvds / vhs
;<option value="clo">   clothing
;<option value="clt">   collectibles
;<option value="sys">   computers &amp; tech
;<option value="ele">   electronics
;<option value="grd">   farm &amp; garden
;<option value="zip">   free stuff
;<option value="fur">   furniture
;<option value="tag">   games &amp; toys
;<option value="gms">   garage sales
;<option value="for">   general
;<option value="hsh">   household
;<option value="wan">   items wanted
;<option value="jwl">   jewelry
;<option value="mat">   materials
;<option value="mcy" selected>   motorcycles/scooters
;<option value="msg">   musical instruments
;<option value="pho">   photo/video
;<option value="rvs">   recreational vehicles
;<option value="spo">   sporting goods
;<option value="tix">   tickets
;<option value="tls">   tools
;<option disabled value="">
;<option value="ggg">all gigs
;<option value="hhh">all housing
;<option value="jjj">all jobs
;<option value="ppp">all personals
;<option value="res">all resume
;<option value="bbb">all services offered</select>
;
Browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
;url  = 'http://seattle.craigslist.org/mcy/' ;Motorcycle
;url  = 'http://seattle.craigslist.org/spo/' ;Sporting goods
url = 'http://seattle.craigslist.org/sss/'
query = "WinBatch Rocks"

browser.navigate(url)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1     ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0     ; only titles

; Unfortunately .Click method doesn't work on IE 9 or newer.
; However this method seems to work:
; Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+IE9~Click~or~Focus~Methods~Fail.txt
objBtn = form.GetElementsByTagName("INPUT").Item(1)
objEvent = browser.document.createEvent("HTMLEvents")
objEvent.initEvent("click", @TRUE, @TRUE)
objBtn.dispatchEvent(objEvent)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

Exit

--- End code ---

Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
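
The TO DO in the script header (storing the results to a file) could be approached along these lines. This is only a sketch: it collects every link on the loaded page rather than assuming anything about Craigslist's result markup, and the output file name craigslist_links.txt is a made-up example.

--- Code: Winbatch ---; Sketch only: save the text and URL of every link on a page to a file.
; Assumption: 'craigslist_links.txt' is a hypothetical output file name,
; written to the current directory.
url     = 'http://seattle.craigslist.org/sss/'
outfile = "craigslist_links.txt"

browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
browser.navigate(url)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

; Every A element on the page, in document order
links = browser.document.GetElementsByTagName("A")

output = ""
For i = 0 To links.length - 1
   link = links.Item(i)
   ; One line per link: visible text, a tab, then the target URL
   output = output : link.innerText : @TAB : link.href : @CRLF
Next

FilePut(outfile, output)

browser.Quit
link = 0
links = 0
browser = 0
Exit

--- End code ---

In the search script itself, the same loop could run after the final wait loop so that the links come from the results page instead of the starting page.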

stevengraff:
Deana, you use this format:


--- Code: ---doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1     ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0     ; only titles

--- End code ---

I tried this, which seemed to work as well (using the item's name instead of number):


--- Code: ---doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item("query").Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item("minAsk").Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item("maxAsk").Value = "500" ; max price

--- End code ---

The number, instead of string, in the Item() part... is it just ordered sequentially by how the INPUTs appear on the page?

Deana:
Steven,
Yes. The HTMLCollection returned by GetElementsByTagName is an interface representing a generic collection of elements (in document order). When dealing with HTMLCollections there are different methods and properties for traversing the list.

Properties

* HTMLCollection.length (read-only) - The number of items in the collection.

Methods

* HTMLCollection.Item(index) - Returns the specific node at the given zero-based index into the list. Returns null if the index is out of range.
* HTMLCollection.namedItem(name) - Returns the specific node whose ID or name matches the string specified. Matching by name is only done as a last resort, only in HTML, and only if the referenced element supports the name attribute.

The documentation (http://msdn.microsoft.com/en-us/library/ie/dd347034(v=vs.85).aspx) for the Item method states:

--- Quote ---Note  This method indexes collections by the name or id property of the object; this is a known standards-compliance issue. For interoperability with other browsers, do not reference an object by name using this method.
--- End quote ---
Which is why I prefer to use the index rather than the name....
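
To see which index belongs to which field, the INPUT collection can be listed with the length property and Item(index) described above. The following is a minimal sketch, not part of the original script; it reuses the Craigslist URL from the first post and assumes the search form is still form 0 on that page.

--- Code: Winbatch ---; Sketch only: list each INPUT of the first form with its zero-based index.
; Assumption: the form of interest is forms.Item(0), as in the script above.
url = 'http://seattle.craigslist.org/sss/'

browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
browser.navigate(url)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

doc    = browser.document
form   = doc.forms.Item(0)
inputs = form.GetElementsByTagName("INPUT")

inputlist = ""
For i = 0 To inputs.length - 1
   el = inputs.Item(i)
   ; One entry per INPUT: index|name|type, in document order
   inputlist = inputlist : @TAB : i : "|" : el.name : "|" : el.type
Next
inputlist = StrTrim(inputlist) ; remove leading tab

If inputlist != "" Then Message("index|name|type", StrReplace(inputlist, @TAB, @CRLF))

browser.Quit
el = 0
inputs = 0
form = 0
doc = 0
browser = 0
Exit

--- End code ---

The index shown for each entry is the number to pass to Item() when setting that field's value.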

stevengraff:
The line:

form = doc.forms.Item(0)

determines which form, if more than one exists on the page (in this case, zero), is utilized. In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?

Deana:

--- Quote from: stevengraff on October 24, 2013, 09:56:45 am ---The line:

form = doc.forms.Item(0)

determines which form, if more than one exists on the page (in this case, zero), is utilized. In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?

--- End quote ---

The F12 developer tools are a suite of on-demand tools built into Internet Explorer. If your version does not include them, you can get them here: http://www.microsoft.com/en-us/download/details.aspx?id=18359.

Open the webpage and press F12. Either search for the keyword '<form' or choose Find | Select element by click. Then count the number of forms listed before the form of interest; that count is the form's zero-based index.

Or you can write a script to list all the forms on the page. You're in luck: I already have some code that should help:


--- Code: Winbatch ---;***************************************************************************
;**
;**        Form Explorer
;**        Discover forms on a webpage using an index
;**
;** Developer: Deana Falk 2013.10.24
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
    ; Wait for webpage to load
    While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
       Timedelay(0.1)            
    EndWhile
    While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
       Timedelay(0.1)
    EndWhile
    Return 1
#EndFunction

;url = "http://www.yahoo.com"
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"

; Initialize MSIE object
oIE   = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)

; Wait for webpage to load
udfIEPageLoadWait( oIE )

;***************************************************************************
; Get forms collection
;***************************************************************************
; Get document object
oDoc = oIE.Document

; Forms Collection  
oForms = oDoc.Forms
; OR
; getElementsByTagName Method
;oForms = oDoc.getElementsByTagName("FORM")

formlist = "" ; Initialize variable
If oDoc != 0 ; Check if you have a valid document handle
   count = oForms.Length
   ; Loop through the collection of forms using index
   For index = 0 To count-1
      oForm = oForms.Item(index)
      If ObjectTypeGet(oForm)=="EMPTY" Then Continue
      formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   Next
Else
   Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
EndIf
formlist = StrTrim(formlist) ; remove leading tab
if formlist != ""
   formindex = Int(ItemExtract(1,AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single ),"|"))
   oForm = oForms.Item(formindex)
   Pause('Form Object Handle ', oForm)
Else
   Pause('Notice','There are no forms on this page!')
EndIf

; Quit
oIE.Quit

; Close open COM/OLE handles
oDoc = 0
oIE = 0
Exit

--- End code ---
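
When the Form Explorer shows that the form of interest has an id or a name, the index can also be looked up in code instead of being counted by hand. The following is a rough sketch under stated assumptions: udfFindFormIndex is a hypothetical helper, not part of WinBatch, and "myform" is a placeholder for whatever id or name the explorer reports.

--- Code: Winbatch ---; Sketch only: look up a form's zero-based index by its id or name.
; Assumptions: udfFindFormIndex is a hypothetical helper, and "myform"
; is a placeholder for an id or name reported by the Form Explorer.
#DefineFunction udfFindFormIndex( oDoc, target )
   oForms = oDoc.Forms
   For index = 0 To oForms.Length - 1
      oForm = oForms.Item(index)
      ; StriCmp returns 0 when the two strings match, ignoring case
      If StriCmp(oForm.id, target) == 0 || StriCmp(oForm.name, target) == 0 Then Return index
   Next
   Return -1 ; no form matched
#EndFunction

url = "http://techsupt.winbatch.com/techsupt/sampleform.html"

oIE = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE
oIE.Navigate(url)

While oIE.readyState <> 4
   TimeDelay(0.1)
EndWhile

formindex = udfFindFormIndex( oIE.Document, "myform" )
If formindex == -1
   Pause('Notice', 'No form with that id or name was found.')
Else
   Pause('Form index', formindex)
EndIf

oIE.Quit
oIE = 0
Exit

--- End code ---

Forms that have neither an id nor a name cannot be found this way, so the index from the Form Explorer remains the fallback.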
