Webpage Automation - Search Craigslist

Started by Deana, October 11, 2013, 08:15:24 AM


Deana

Webpage Automation: Search Craigslist.

Code (winbatch)

;***************************************************************************
;**
;**        Search Craigslist for an item.
;**
;** Purpose: locate and display Craigslist results in the IE browser
;** Inputs:  starting url, query settings
;** Outputs: displays results in IE browser
;** TO DO: Optionally add code to store the results to a file
;** Revisions: Deana Falk 2013.10.11
;***************************************************************************

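; Craigslist category codes, copied from the site's search form.
; The option value is the category path used in the url below (e.g. /sss/ or /mcy/).
;<select id="cAbb" name="catAbbreviation">
;<option value="ccc">all community
;<option value="eee">all event
;<option value="sss">all for sale / wanted
;<option disabled value="">--
;<option value="art">   art &amp; crafts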
;<option value="pts">   auto parts
;<option value="bab">   baby &amp; kid stuff
;<option value="bar">   barter
;<option value="bik">   bicycles
;<option value="boa">   boats
;<option value="bks">   books
;<option value="bfs">   business
;<option value="car">   cars &amp; trucks
;<option value="emd">   cds / dvds / vhs
;<option value="clo">   clothing
;<option value="clt">   collectibles
;<option value="sys">   computers &amp; tech
;<option value="ele">   electronics
;<option value="grd">   farm &amp; garden
;<option value="zip">   free stuff
;<option value="fur">   furniture
;<option value="tag">   games &amp; toys
;<option value="gms">   garage sales
;<option value="for">   general
;<option value="hsh">   household
;<option value="wan">   items wanted
;<option value="jwl">   jewelry
;<option value="mat">   materials
;<option value="mcy" selected>   motorcycles/scooters
;<option value="msg">   musical instruments
;<option value="pho">   photo/video
;<option value="rvs">   recreational vehicles
;<option value="spo">   sporting goods
;<option value="tix">   tickets
;<option value="tls">   tools
;<option disabled value="">
;<option value="ggg">all gigs
;<option value="hhh">all housing
;<option value="jjj">all jobs
;<option value="ppp">all personals
;<option value="res">all resume
;<option value="bbb">all services offered</select>
;
Browser = ObjectCreate("InternetExplorer.Application")
browser.visible = @TRUE
;url  = 'http://seattle.craigslist.org/mcy/' ;Motorcycle
;url  = 'http://seattle.craigslist.org/spo/' ;Sporting goods
url = 'http://seattle.craigslist.org/sss/'
query = "WinBatch Rocks"

browser.navigate(url)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1     ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0     ; only titles

; Unfortunately, the .Click method doesn't work on IE 9 or newer.
; However, this method seems to work:
; Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE+IE9~Click~or~Focus~Methods~Fail.txt
objBtn = form.GetElementsByTagName("INPUT").Item(1)
objEvent = browser.document.createEvent("HTMLEvents")
objEvent.initEvent("click", @TRUE, @TRUE)
objBtn.dispatchEvent(objEvent)

While browser.readystate <> 4
   TimeDelay(0.5)
EndWhile

Exit
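
The header's TO DO mentions storing the results in a file. Below is a minimal sketch of one way to do that, placed just before the final Exit. It assumes the results page has finished loading into browser, and because this thread does not describe the markup of Craigslist's result listings, it simply dumps every link (text and href) on the page. The output file name craigslist_links.txt in the TEMP folder is a hypothetical choice.

```winbatch
; Sketch only: dump every link on the loaded results page to a text file.
; Assumes "browser" is the IE object created above and the page has finished loading.
; The output path is a hypothetical example (TEMP folder + craigslist_links.txt).
outfile = Environment("TEMP") : "\craigslist_links.txt"
cLinks = browser.document.GetElementsByTagName("A")
linkcount = cLinks.length
hFile = FileOpen(outfile, "WRITE")
For index = 0 To linkcount - 1
   oLink = cLinks.Item(index)
   If oLink == 0 Then Continue        ; skip empty slots, as the other samples do
   FileWrite(hFile, oLink.innerText : @TAB : oLink.href)
Next
FileClose(hFile)

; Release the COM handles used above
oLink = 0
cLinks = 0
```

Filtering the links (for example, keeping only those whose href points at a posting) would need the actual result-page markup, which you can inspect with the F12 developer tools.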



Reference: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt
Deana F.
Technical Support
Wilson WindowWare Inc.

stevengraff

Deana, you use this format:

doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item(0).Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item(3).Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item(4).Value = "500" ; max price
form.GetElementsByTagName("INPUT").Item(5).Value = 1     ; has pic
form.GetElementsByTagName("INPUT").Item(6).Value = 0     ; only titles


I tried this, which seemed to work as well (using the item's name instead of number):

doc = browser.document
form = doc.forms.Item(0)
form.GetElementsByTagName("INPUT").Item("query").Value = query
form.GetElementsByTagName("INPUT").Item(2).Value = 1     ; zoomtoposting ( leave as is )
form.GetElementsByTagName("INPUT").Item("minAsk").Value = "1"   ; min price
form.GetElementsByTagName("INPUT").Item("maxAsk").Value = "500" ; max price


About the number (instead of a string) in the Item() part: is it just ordered sequentially by how the INPUTs appear on the page?

Deana

Steven,
Yes. GetElementsByTagName returns an HTMLCollection, an interface representing a generic collection of elements in document order, so Item(0) is the first INPUT in the form, Item(1) the second, and so on. An HTMLCollection provides the following property and methods for traversing the list.

Properties

  • HTMLCollection.length - Read only. The number of items in the collection.


Methods

  • HTMLCollection.Item(index) - Returns the specific node at the given zero-based index into the list. Returns null if the index is out of range.
  • HTMLCollection.namedItem(name) - Returns the specific node whose ID or name matches the string specified. Matching by name is only done as a last resort, only in HTML, and only if the referenced element supports the name attribute.
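
To see which zero-based index belongs to which field, a short loop using length and Item(index) can help. This is a sketch that assumes form was obtained as in the original script (form = doc.forms.Item(0)); it lists each INPUT's index, type, name and id in document order.

```winbatch
; Sketch: list each INPUT in the form with its zero-based index.
; Assumes "form" was set earlier, e.g. form = doc.forms.Item(0)
cInputs = form.GetElementsByTagName("INPUT")
inputcount = cInputs.length
inputlist = ""
For index = 0 To inputcount - 1
   oInput = cInputs.Item(index)
   If oInput == 0 Then Continue
   inputlist = inputlist:@TAB:index:"|":oInput.type:"|":oInput.name:"|":oInput.id
Next
inputlist = StrTrim(inputlist) ; remove the leading tab
AskItemlist("INPUT elements in document order (index|type|name|id)", inputlist, @TAB, @UNSORTED, @SINGLE)

; Release the COM handles
oInput = 0
cInputs = 0
```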

The documentation (http://msdn.microsoft.com/en-us/library/ie/dd347034(v=vs.85).aspx) for the Item method states:

Quote: "Note: This method indexes collections by the name or id property of the object; this is a known standards-compliance issue. For interoperability with other browsers, do not reference an object by name using this method."

This is why I prefer to use the index rather than the name.
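
One way to still find an input by name without calling Item("name") is to walk the collection by index and compare each element's name property yourself, which is similar to what the updated sample traced later in this thread does. The UDF name udfFindInputByName is hypothetical; it returns 0 when nothing matches, and the example field names come from the posts above.

```winbatch
#DefineFunction udfFindInputByName( objForm, inputname )
   ; Hypothetical helper: return the first INPUT in objForm whose name
   ; matches inputname (case-insensitive), or 0 if there is no match.
   ; Uses only length and Item(index), so it avoids Item("name").
   oFound = 0
   cInputs = objForm.GetElementsByTagName("INPUT")
   count = cInputs.length
   For index = 0 To count - 1
      oInput = cInputs.Item(index)
      If oInput == 0 Then Continue
      If StriCmp(oInput.name, inputname) == 0
         oFound = oInput
         Break
      EndIf
   Next
   Return oFound
#EndFunction

; Example use with the field names from the earlier posts
oQuery = udfFindInputByName(form, "query")
If oQuery != 0 Then oQuery.Value = "WinBatch Rocks"
oMin = udfFindInputByName(form, "minAsk")
If oMin != 0 Then oMin.Value = "1"
```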

stevengraff

The line:

form = doc.forms.Item(0)

determines which form is used when more than one exists on the page (in this case, form zero). In my current project the form number turned out to be 13, which I found by guessing wrong 13 times :). Is there a more direct way to determine the form's index number?

Deana

Quote from: stevengraff on October 24, 2013, 09:56:45 AM

The F12 developer tools are a suite of on-demand tools built into Internet Explorer. If your version doesn't include them, you can get them here: http://www.microsoft.com/en-us/download/details.aspx?id=18359.

Open the webpage and press F12. Either search for the keyword '<form' or select Find menu | Select element by click. Then count the number of forms listed before the form of interest; that count is the form's zero-based index.

Or you can write a script to list all the forms on the page. You're in luck: I already have some code that should help:

Code (winbatch)

;***************************************************************************
;**                 
;**         Form Explorer
;** Discover forms on a webpage using an index
;**
;** Developer: Deana Falk 2013.10.24                 
;***************************************************************************
#DefineFunction udfIEPageLoadWait( objIE )
    ; Wait for webpage to load
    While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
       Timedelay(0.1)           
    EndWhile
    While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
       Timedelay(0.1)
    EndWhile
    Return 1
#EndFunction

;url = "http://www.yahoo.com"
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"

; Initialize MSIE object
oIE   = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)

; Wait for webpage to load
udfIEPageLoadWait( oIE )

;***************************************************************************
; Get forms collection
;***************************************************************************
; Get document object
oDoc = oIE.Document

formlist = "" ; Initialize variable
If oDoc != 0 ; Check if you have a valid document handle
   ; Forms Collection (read only after the document handle is known to be valid)
   oForms = oDoc.Forms
   ; OR
   ; getElementsByTagName Method
   ;oForms = oDoc.getElementsByTagName("FORM")

   count = oForms.Length
   ; Loop through the collection of forms using index
   For index = 0 To count-1
      oForm = oForms.Item(index)
      If ObjectTypeGet(oForm)=="EMPTY" Then Continue
      formlist = formlist:@TAB:index:"|":oForm.id:"|":oForm.name
   Next
Else
   Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
EndIf
formlist = StrTrim(formlist) ; remove leading tab
if formlist != ""
   formindex = Int(ItemExtract(1,AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single ),"|"))
   oForm = oForms.Item(formindex)
   Pause('Form Object Handle ', oForm)
Else
   Pause('Notice','There are no forms on this page!')
EndIf

; Quit
oIE.Quit

; Close open COM/OLE handles
oForm = 0
oForms = 0
oDoc = 0
oIE = 0
Exit

Deana

I have been working on a tutorial that I plan to post to the tech database. It still needs some work, but here is a Webpage Element Explorer I created:

Code (winbatch)

;***************************************************************************
;**        Webpage Element Explorer
;**
;** Purpose: Discovering form elements on a webpage
;** Inputs: ignorehidden, inputonly
;** Outputs: Highlights each form element using a solid red border
;**          and displays attribute information in a box
;** Reference:
;**       
;**
;** Developer: Deana Falk 2013.10.24
;***************************************************************************

#DefineFunction udfIEPageLoadWait( objIE )
    ; Wait for webpage to load
    While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
       Timedelay(0.1)           
    EndWhile
    While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
       Timedelay(0.1)
    EndWhile
    Return 1
#EndFunction

; Modify these values to change how the script operates
ignorehidden = @TRUE       ; change to @FALSE to query hidden form elements
inputonly = @FALSE         ; change to @TRUE to query only INPUT form elements.
boxcoords =  '750,0,1000,200' ; coordinates of the display box.
boxclr =  '255,0,0'  ; box background color
url = "http://techsupt.winbatch.com/techsupt/sampleform.html"


; Display form element data in a box
title = 'Discovering form elements on a webpage'
BoxesUp(boxcoords, @NORMAL)
BoxDrawRect(1,'0,0,1000,1000',2)
BoxColor(1,boxclr,0)
BoxCaption(1, title)
WindowOnTop(title, 1)


; Initialize MSIE object
oIE   = ObjectCreate("InternetExplorer.Application")
oIE.Visible = @TRUE ; Change to @FALSE to hide the process from the user
oIE.Navigate(url)

; Wait for webpage to load
udfIEPageLoadWait( oIE )

;***************************************************************************
; Get Collection of Forms
;***************************************************************************
; Get document object
oDoc = oIE.Document

; Forms Collection 
oForms = oDoc.Forms
; OR
; getElementsByTagName Method
;oForms = oDoc.getElementsByTagName("FORM")

formlist = "" ; Initialize variable
If oDoc != 0 ; Check if you have a valid document handle
   ; Loop through the collection of forms using index
   count = oForms.Length
   For index = 0 to count-1
         oForm = oForms.Item(index)
         If ObjectTypeGet(oForm)=="EMPTY" then continue
         formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   Next   
   ; ALTERNATE WAY: Loop through the collection of forms
   ;index = 0
   ;ForEach oForm in oForms
   ;   If ObjectTypeGet(oForm)=="EMPTY" then continue 
   ;   formlist = formlist:@tab:index:"|":oForm.id:"|":oForm.name
   ;   index = index+1
   ;Next
Else
   Pause('Notice','Unable to obtain a handle to the document object on this webpage. Check URL.')
EndIf
formlist = StrTrim(formlist) ; remove leading tab
selection = AskItemlist("Form Index on a webpage", formlist, @tab, @unsorted, @single )
formindex = ItemExtract(1, selection, "|")
If formindex == "" then formindex = 0 ; Default to the first form if no form was selected
oForm = oForms.Item(Int(formindex))

;***************************************************************************
; Get Collection of Elements in the Form
;***************************************************************************
If oForm != 0 ; Check if you have a valid form handle
   cElements = oForm.Elements
   elementindex = 0
   BoxButtonDraw(1, 1, 'Next Element', '250,750,650,900')
   ForEach oElement in cElements
      ; Get element attributes
      ; Alternative option: oElement.getAttribute("attributename")
      tag = oElement.nodeName ;The value of tagName is the same as that of nodeName.
      type = StrUpper(oElement.Type)
      id = oElement.Id
      name = oElement.Name
      value = oElement.Value   

      ; Check if user wants to ignore HIDDEN form elements
      If ignorehidden
          If StrUpper(type) == 'HIDDEN' then continue
      EndIf

      ; Check if user wants to see non INPUT form elements
      If inputonly
         If StrUpper(tag) != 'INPUT' then continue
      EndIf

      ; Highlight each element on the form
      oElement.style.border = "1mm solid red"
     
      ; Set focus on element
      If !oElement.disabled
         ErrorMode(@off)
         oElement.Focus()
         ErrorMode(@cancel)
      EndIf
     
      ; Display results
      BoxDrawText(1, "0,0,1000,1000",'Element Index: ' : elementindex : @lf :'Tag: ' : tag : @lf : 'Type: ' : type : @lf :'Id: ' : id : @lf :'Name: ' : name : @lf :'Value: ': value, @TRUE, 0 )
      While !BoxButtonStat(1, 1)
         TimeDelay(0.5)
      Endwhile   

      ; Remove element highlight
      oElement.style.border = ""

      ; Increment element counter
      elementindex = elementindex+1
   Next
Else
   Pause('Notice','Unable to obtain a handle to a form on this webpage. Check URL and formnumber.')
EndIf

:CleanUp

; Quit
oIE.Quit

; Close open COM/OLE handles
oElement = 0
cElements = 0
oForm = 0
oForms = 0
oDoc = 0
oIE = 0
Exit



markgolay

For your information, I had been unsuccessfully trying to click an element on a webpage with HTMLEvents, but this just worked, e.g. input.click(1). Hope it helps others.

InputCollection = f.document.GetElementsByTagName("a")
ForEach Input In InputCollection
   name=Input.attributes.class.value
   
   if strindexnc( name, "xxxxxxxxxx", 1, @Fwdscan ) >0
      ;input.click ;does not work on IE11
      input.click(1) ; Works on IE11
      f=WaitForMSIE(f)
      break
   Endif
Next

Deana

Quote from: markgolay on March 06, 2014, 02:59:45 AM

Interesting. How did you come up with that solution? The documentation for the Click method indicates NO PARAMETERS: http://msdn.microsoft.com/en-us/library/ie/ms536363(v=vs.85).aspx

markgolay

I was using your HTMLEvents approach, which worked on all my buttons across two or three webpages, except the last, final button on a webpage. Typical! It would not work, and I was getting very frustrated, so I went hunting on the web and found someone saying they had tried it and it worked; I think it was on a Perl forum. I would still like to understand why HTMLEvents did not work on my last button, but I'm afraid it's above my knowledge. I just wanted it to work.

Deana

Quote from: markgolay on March 10, 2014, 05:26:43 AM

I modified the code in the tech database. It now searches for input elements by name instead of number. Give this code a try: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/tsleft.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt


rgouette

An old thread, and maybe this doesn't help anyone, but I changed my code to use:

doc.GetElementById("buttonID")

and that was the key to getting my browser sessions (IE10) to click successfully.
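
For reference, here is a sketch of that approach combined with the event-dispatch click from the first post. It assumes doc = browser.document as in the original script and uses "searchbtn", the button id the updated sample checks for in the trace further down this thread; substitute the id of your own button, which you can find with the F12 developer tools.

```winbatch
; Sketch: look the button up by id, then fire a click event on it.
; "searchbtn" is an example id; use the id of your own button.
objBtn = doc.getElementById("searchbtn")
If objBtn != 0
   objEvent = doc.createEvent("HTMLEvents")
   objEvent.initEvent("click", @TRUE, @TRUE)
   objBtn.dispatchEvent(objEvent)
   objEvent = 0
Else
   Pause("Notice", "No element with id 'searchbtn' was found")
EndIf
objBtn = 0
```

Looking the element up by id avoids depending on its position in the INPUT collection, which can shift whenever the page layout changes.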

Rich

deming

Quote from: Deana on March 10, 2014, 08:52:08 AM

Deana, could you please update your sample again? It is not finding any of the named objects, and "count = form.GetElementsByTagName("INPUT").length" is returning 1. I want to use your sample to play around with and learn.

Thank you.


Deana

Quote from: deming on July 29, 2014, 07:50:04 AM

Make sure you are using the latest code posted here: http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+WinBatch/OLE~COM~ADO~CDO~ADSI~LDAP/OLE~with~MSIE/User~Samples+Search~Craigslist.txt

If you continue to have an issue running the latest code, please post the DebugTrace output. Simply add DebugTrace(@on,"trace.txt") to the beginning of the script and inside any UDF, run it until the error or completion, then inspect the resulting trace file for clues as to the problem. Feel free to post the trace file here (removing any private info) if you need further assistance.

deming

I looked over the trace.txt (see below) but am stumped and would appreciate your expertise. It looks like "name = objInput.Name" is returning an empty value.


************************************************************

*** Debug Initialized ***

==============================
Wed 7/30/2014 11:42:39 AM
WinBatch 32 2013C
WIL DLL 6.13cmc
D:\Users\Documents\WinBatch\Search Craigslist.wbt
Windows platform: NT, version: 6.1, build: 7601 (Service Pack 1)
ErrorMode: @CANCEL
Valid Code Signature: Yes
UAC Manifest Settings: level="highestAvailable" uiAccess="true"
UAC Elevation Level: Standard User or Disabled
==============================

Browser = ObjectCreate("InternetExplorer.Application")
(2730) VALUE INT/COMOBJ => 41544140

browser.visible = @TRUE
(2761) VALUE VARIANT_BOOL => -1

url = 'http://seattle.craigslist.org/sss/'
(2761) VALUE STRING => "http://seattle.craigslist.org/sss/"

query = "WinBatch Rocks"
(2761) VALUE STRING => "WinBatch Rocks"

browser.navigate(url)
(2808) VALUE VARIANT_EMPTY =>

While browser.readystate <> 4
(3011) END OPERATOR

doc = browser.document
(3042) VALUE INT/COMOBJ => 41545004

form = doc.forms.Item(0)
(3058) VALUE INT/COMOBJ => 41844180

count = form.GetElementsByTagName("INPUT").length
(3058) VALUE VARIANT_I4 => 1

For x = 0 To count
(3058) FOR TRUE==>0

objInput = form.GetElementsByTagName("INPUT").Item(x)
(3058) VALUE INT/COMOBJ => 41844684

If objInput == 0 Then Continue
(3058) ==>FALSE=> (skipped)

name = objInput.Name
(3073) VALUE VARIANT_BSTR =>

id =   objInput.Id
(3073) VALUE VARIANT_BSTR =>

If name == "query" Then form.GetElementsByTagName("INPUT").Item(x).Value = query
(3073) ==>FALSE=> (skipped)

If name == "minAsk" Then form.GetElementsByTagName("INPUT").Item(x).Value = "1"
(3073) ==>FALSE=> (skipped)

If name == "maxAsk" Then form.GetElementsByTagName("INPUT").Item(x).Value = "500"
(3073) ==>FALSE=> (skipped)

If name == "hasPic" Then form.GetElementsByTagName("INPUT").Item(x).checked = @TRUE
(3073) ==>FALSE=> (skipped)

If id == "searchbtn" Then objBtn = objInput
(3073) ==>FALSE=> (skipped)

Next
(3073) END OPERATOR

To count
(3073) FOR TRUE==>1

objInput = form.GetElementsByTagName("INPUT").Item(x)
(3073) VALUE VARIANT_DISPATCH => 0

If objInput == 0 Then Continue
(3089) END OPERATOR

To count
(3089) END OPERATOR

Pause(0,0)
(4992) VALUE INT => 1

If IsDefined(objBtn)
(4992) ELSE DO==>TRUE

Pause('Notice','No search button found')
(5850) VALUE INT => 1

EndIf
(5850) END OPERATOR

While browser.readystate <> 4
(5850) END OPERATOR

Exit
(5850) VALUE INT => 0

--- Normal termination ---

;;;END OF JOB;;;

Deana

The suspect line is:
Code (winbatch)
count = form.GetElementsByTagName("INPUT").length
(3058) VALUE VARIANT_I4 => 1


This indicates that the form only contains one input element. Not sure why at this point.
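
One thing worth ruling out is that doc.forms.Item(0) is no longer the search form on the current version of the page. The sketch below scans every form and keeps the first one containing an INPUT named "query" (the name used in the earlier posts); it assumes doc = browser.document. It also loops from 0 to count - 1, because a collection of count items has zero-based indexes that stop at count - 1 (the For x = 0 To count loop in the trace makes one extra pass, which its objInput == 0 check absorbs).

```winbatch
; Sketch: find the form that actually holds the search box instead of
; assuming it is forms.Item(0). Assumes doc = browser.document.
oSearchForm = 0
cForms = doc.forms
formcount = cForms.length
For findex = 0 To formcount - 1
   oForm = cForms.Item(findex)
   If oForm == 0 Then Continue
   cInputs = oForm.GetElementsByTagName("INPUT")
   inputcount = cInputs.length
   For iindex = 0 To inputcount - 1
      oInput = cInputs.Item(iindex)
      If oInput == 0 Then Continue
      If StriCmp(oInput.name, "query") == 0
         oSearchForm = oForm
         Break
      EndIf
   Next
   If oSearchForm != 0 Then Break
Next

If oSearchForm != 0
   Message("Search form found", "Form index: " : findex : @LF : "INPUT elements: " : inputcount)
   form = oSearchForm
Else
   Pause("Notice", "No form with an INPUT named 'query' was found")
EndIf
```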

I wonder whether adding a short time delay after the Navigate might help. This may allow the page to fully load before querying the input elements.

Use this UDF to wait for the page to load:
Code (winbatch)
#DefineFunction udfIEPageLoadWait( objIE )
    ; Wait for webpage to load
    While !(objIE.readyState == 'complete' || objIE.readyState == 4 )
       TimeDelay(0.1)
    EndWhile
    While !(objIE.document.readyState == 'complete' || objIE.document.readyState == 4 )
       TimeDelay(0.1)
    EndWhile
    Return 1
#EndFunction
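
One caveat with this UDF is that it waits forever if the page never reports complete. A variant that gives up after a given number of seconds might look like the sketch below; the name udfIEPageLoadWaitTimeout and the maxseconds parameter are hypothetical, and the 0.1-second polling matches the UDF above.

```winbatch
#DefineFunction udfIEPageLoadWaitTimeout( objIE, maxseconds )
   ; Hypothetical variant of udfIEPageLoadWait with a time limit.
   ; Polls every 0.1 seconds; returns @TRUE if the page reported complete,
   ; @FALSE if roughly maxseconds elapsed first.
   maxloops = maxseconds * 10
   loops = 0
   ; Wait for the browser object
   While loops < maxloops
      If objIE.readyState == 'complete' || objIE.readyState == 4 Then Break
      TimeDelay(0.1)
      loops = loops + 1
   EndWhile
   ; Then wait for the document, sharing the same time budget
   While loops < maxloops
      If objIE.document.readyState == 'complete' || objIE.document.readyState == 4 Then Break
      TimeDelay(0.1)
      loops = loops + 1
   EndWhile
   Return (loops < maxloops)
#EndFunction

; Example use
browser.navigate(url)
If !udfIEPageLoadWaitTimeout(browser, 30) Then Pause("Notice", "The page did not finish loading within about 30 seconds")
```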


What version of Internet Explorer do you have installed on your system? (The trace reports Windows NT 6.1, build 7601.)


Deana

Deana F.
Technical Support
Wilson WindowWare Inc.

deming

Thank you Deana! Your latest update works perfectly.  My IE is Version 11.0.9600.17207, Update 11.0.10

Thank you for taking the time to solve this. Your samples are a great tool for learning.

:D