WinBatch® Technical Support Forum

Topic started by: hdsouza on December 19, 2019, 12:16:38 PM

Title: Downloading a webpage
Post by: hdsouza on December 19, 2019, 12:16:38 PM
Hi,

I am trying to download this webpage https://www.nextseed.com/offerings as I want to monitor the offerings on this page.
When I save the page to a file I see garbled characters in the file.

Please help. Thanks.

Code (winbatch)

Var_url = "https://www.nextseed.com/offerings"
File_HostConnect = "c:\temp\File_HostConnect.txt"

; Split the URL into the host name and the path on that host
URL_Temp = strreplace (Var_url, "https://", "")
URL_Temp = strreplace (URL_Temp, "http://", "")
Url_Main = strtrim(Itemextract(1, URL_Temp, "/" ))
Url_Remain = strtrim(strreplace(URL_Temp, "%Url_Main%/", ""))

tophandle=iBegin(0,"","")
connecthandle=iHostConnect(tophandle, Url_Main, @HTTP, "", "")
datahandle=iHttpInit(connecthandle, "GET", Url_Remain, "",0)
If datahandle==0 then
   ; Request could not be created: record the error, clean up and stop
   URL_Error = iGetLastError()
   iClose(connecthandle)
   iClose(tophandle)
   Page_Error = 1
   Display(1, "Error..", "Problem creating the request")
   Exit
EndIf

rslt=iHttpOpen(datahandle, "", 0, 0)
If rslt=="ERROR" || rslt!=200 then
   ; Request failed or the server did not return 200 OK
   URL_Error = rslt
   iClose(datahandle)
   iClose(connecthandle)
   iClose(tophandle)
   Display(1, "Error..", "Problem opening webpage")
   Exit
EndIf
; Save the response body to File_HostConnect
iReadData(datahandle, File_HostConnect)
iClose(datahandle)
iClose(connecthandle)
iClose(tophandle)

Title: Re: Downloading a webpage
Post by: td on December 19, 2019, 02:23:12 PM
I believe we have gone down this rabbit hole before. The page is generated by JavaScript that executes inside the browser. Unfortunately, WinInet is not a browser and therefore does not have a JavaScript engine to execute the code that creates the page. You can't use the WinInet extender to download a page that doesn't exist yet.
Title: Re: Downloading a webpage
Post by: hdsouza on December 19, 2019, 03:21:31 PM
Maybe someone else had the same problem with nextseed.com; I have only just started with it.
So can I use something other than WinInet? Any options?
Thanks, TD
Title: Re: Downloading a webpage
Post by: JTaylor on December 19, 2019, 03:36:55 PM
Automate Internet Explorer. There should be examples in the Tech Database. Depending on what is happening on the page, you may need to check for the existence of certain tags, IDs, etc. before grabbing the page content. You may also be able to make the embedded browser object work, but I am not certain it will if the page is heavy on JS.

Jim
Title: Re: Downloading a webpage
Post by: hdsouza on December 19, 2019, 03:45:11 PM
Hey Jim, the webpage is being downloaded as junk characters.
Title: Re: Downloading a webpage
Post by: JTaylor on December 19, 2019, 04:34:51 PM
It does this using IE? I had only seen WinInet mentioned.

Jim
Title: Re: Downloading a webpage
Post by: hdsouza on December 19, 2019, 04:41:40 PM
Yes, I am using the script from my first post. What options exist?
Title: Re: Downloading a webpage
Post by: td on December 19, 2019, 04:45:18 PM
Thanks, Jim. I should have mentioned the COM Automation approach in my first response in this topic. Here is a link to an old but still informative article about the different options available for web scraping:

https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+Tutorials+Working~With~Web~Pages.txt
Title: Re: Downloading a webpage
Post by: JTaylor on December 19, 2019, 04:46:55 PM
As mentioned, automating IE is another option. Again, you have to find some way to test whether the page is complete. Sorry if I was unclear.

See if this gets you headed in a useful direction. Sorry for the sloppy code; I just quickly chopped it out of another script.

Code (winbatch)

XMLURL = "https://www.nextseed.com/offerings"
; Start a visible Internet Explorer window
ybrowser = ObjectOpen("InternetExplorer.Application")
ybrowser.addressbar = @TRUE
ybrowser.statusbar = @TRUE
ybrowser.menubar = @FALSE
ybrowser.toolbar = @TRUE
ybrowser.visible = 1
ybrowser.Height = 600
ybrowser.Width = 800

ybrowser.navigate(XMLURL)
TimeDelay(1.5)

; Manual wait: click OK only after the page has finished rendering
Message("HEY", "Wait until page is loaded.")
; Grab the rendered (post-JavaScript) body markup
txt = ybrowser.document.getElementsByTagName("body").Item(0).OuterHTML
ClipPut(txt)
Message("HEY", txt)
ybrowser.quit
ybrowser = 0
Exit

Title: Re: Downloading a webpage
Post by: stanl on December 20, 2019, 02:45:19 AM
Here is a potentially useful subroutine I use to make sure a URL has finished loading. I use oIE as my browser object.
Code (WINBATCH)

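; Waits for the browser object oIE (set by the caller) to finish loading.
; n = maximum number of polls; msg = title for the progress display.
; The progress text also uses a page variable, assumed to be set by the caller.
#DefineSubroutine ieready(n,msg)
   IntControl(73,1,0,0,0)        ; on error, jump to :WBERRORHANDLER
   t = 0
   If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
   ; Wait for the browser to report READYSTATE_COMPLETE (4)
   While oIE.busy || oIE.readystate != 4
      t = t+1
      display(3,msg,"Attempt %t% on Page %page%")   ; also pauses ~3 seconds
      ; Press Enter on any window whose title contains "Security"
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile

   ; Then wait for the document itself to report "complete"
   t=0
   While oIE.Document.readystate != "complete"
      t = t+1
      display(3,"Document is...",oIE.Document.readystate)
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   Return(1)

   :WBERRORHANDLER
   Return(0)
#EndSubroutine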
Title: Re: Downloading a webpage
Post by: td on December 20, 2019, 07:15:25 AM
There might be a Tech Database article in this, but in the interim, here is a crudely glued-together version of the two scripts.

Code (winbatch)
#DefineSubroutine ieready(n,msg)
   IntControl(73,1,0,0,0)
   
   t = 0 
   If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
   While oIE.busy || oIE.readystate != 4
      t = t+1
      display(3,msg,"Attempt %t% of %n%")
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   
   
   t=0
   While oIE.Document.readystate != "complete"
      t = t+1
      display(3,"Document is...",oIE.Document.readystate)
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
   
      If t>n Then Return(0)
   EndWhile
   Return(1)
   
   
   :WBERRORHANDLER
   Return(0)
   
#EndSubroutine


XMLURL = "https://www.nextseed.com/offerings"
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE

oIE.navigate(XMLURL)

if IeReady(10, '')
   txt = oIE.document.getElementsByTagName("body").Item(0).OuterHTML
   clipput(txt)
   message("Web Page Body",txt)
endif
oIE.quit
oIE = 0
Exit


Thanks for the contributions.
Title: Re: Downloading a webpage
Post by: JTaylor on December 20, 2019, 08:29:26 AM
Don't take this as any kind of disagreement with what Stan or Tony posted; that approach is what I use as well, and I probably should have included it. But there have been instances where I have had to create a loop that checks for the existence of an ID or other element to know the page has loaded before proceeding. I only mention this so that, if your script still seems not to get all the data, you can try that without having to come back and ask.


Jim
Title: Re: Downloading a webpage
Post by: td on December 20, 2019, 08:38:59 AM
It shouldn't be too difficult to add an ID parameter, and a check for that ID, to Stan's IeReady subroutine.
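A minimal sketch along those lines, untested against nextseed.com. Instead of a bare ID it takes a CSS selector (so "#some-id" covers the ID case) and polls document.querySelectorAll, which returns an empty list rather than a null object when nothing matches. The name ieWaitFor, the placeholder selector "#your-element-id", and the 30-poll limit are hypothetical. It also assumes IE renders the page in IE8-or-later standards mode, where querySelectorAll exists, and, like IeReady, relies on a #DefineSubroutine sharing the caller's oIE variable.

```winbatch
; Hypothetical helper: poll until an element matching a CSS selector exists.
; Returns 1 when found, 0 on timeout or error. oIE comes from the caller.
#DefineSubroutine ieWaitFor(selector, maxTries)
   IntControl(73, 1, 0, 0, 0)           ; on error, jump to :WBERRORHANDLER
   tries = 0
   While tries < maxTries
      tries = tries + 1
      ; Only query the document once the browser reports READYSTATE_COMPLETE (4)
      If !oIE.busy && oIE.readystate == 4
         ; Assumes IE8+ standards mode; no match gives a list of length 0
         If oIE.Document.querySelectorAll(selector).length > 0 Then Return(1)
      EndIf
      TimeDelay(1)                       ; wait one second between polls
   EndWhile
   Return(0)                             ; selector never matched

   :WBERRORHANDLER
   Return(0)
#EndSubroutine

XMLURL = "https://www.nextseed.com/offerings"
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE
oIE.navigate(XMLURL)

; "#your-element-id" is a placeholder: use something the finished page contains
If ieWaitFor("#your-element-id", 30)
   txt = oIE.document.getElementsByTagName("body").Item(0).OuterHTML
   ClipPut(txt)
   Message("Web Page Body", txt)
Else
   Message("Timeout", "The element never appeared")
EndIf
oIE.quit
oIE = 0
Exit
```

Using a selector rather than getElementById is a deliberate swap: it avoids having to test for a null COM object in WinBatch, at the cost of the standards-mode assumption above.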
Title: Re: Downloading a webpage
Post by: hdsouza on December 20, 2019, 09:57:31 AM
Thanks, Jim, and all.
Yes, the outerHTML approach did the trick. No garbled characters now.
I will just code around launching IE and doing it that way.
I may have to open each offering in IE, instead of using WinInet, to extract the details.

Thanks again

Code (winbatch)

; msie: the InternetExplorer.Application object; File_PageCheck: the output file path
FilePut(File_PageCheck, msie.document.GetElementsByTagName("HTML").item(0).outerHTML)

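One hypothetical first step toward opening each offering is to collect the link targets from the rendered page. This sketch assumes (unverified) that offering detail pages have URLs containing "/offerings/", writes them to an example path c:\temp\offering_links.txt, and uses a crude busy/readystate wait without IeReady's security-prompt handling. On a JavaScript-heavy page the links may still be missing when readystate reaches 4, so an element check like the one Jim describes may also be needed.

```winbatch
; Sketch: list links on the rendered offerings page that look like offering pages.
; Assumptions (not verified): detail pages contain "/offerings/" in their URL;
; the output path below is only an example.
XMLURL = "https://www.nextseed.com/offerings"
File_Links = "c:\temp\offering_links.txt"

msie = ObjectOpen("InternetExplorer.Application")
msie.visible = @FALSE
msie.navigate(XMLURL)

; Crude wait: poll once a second, for at most 30 seconds
tries = 0
While (msie.busy || msie.readystate != 4) && tries < 30
   tries = tries + 1
   TimeDelay(1)
EndWhile

found = ""
links = msie.document.getElementsByTagName("a")
count = links.length
i = 0
While i < count
   href = links.item(i).href            ; IE returns the resolved absolute URL
   If StrIndex(href, "/offerings/", 0, @FWDSCAN) > 0 Then found = found : href : @CRLF
   i = i + 1
EndWhile
FilePut(File_Links, found)              ; one URL per line

links = 0
msie.quit
msie = 0
Exit
```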
Title: Re: Downloading a webpage
Post by: td on December 20, 2019, 10:15:50 AM
This is the week of the never-ending forum topics, it would appear. To correct a statement I made in my original post to the topic: the OP's site sends its pages gzip-compressed. That is why the saved page appears as "garbage" to the OP instead of a collection of references to off-site JavaScript libraries. The solution still remains the same.
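A quick way to confirm this is to look at the first two bytes of the file that iReadData wrote: a gzip stream always begins with the bytes 0x1F 0x8B (decimal 31 and 139, per RFC 1952), whereas an HTML page typically begins with text such as "<!DOCTYPE" or "<html". A minimal sketch, assuming the c:\temp\File_HostConnect.txt path from the first post:

```winbatch
; Sketch: does the file written by iReadData start with the gzip magic bytes?
File_HostConnect = "c:\temp\File_HostConnect.txt"   ; path from the first post

If !FileExist(File_HostConnect)
   Message("gzip check", "File not found")
   Exit
EndIf
size = FileSize(File_HostConnect)
If size < 2
   Message("gzip check", "File too small to test")
   Exit
EndIf

buf = BinaryAlloc(size)
BinaryRead(buf, File_HostConnect)       ; load the whole file into the buffer
b1 = BinaryPeek(buf, 0)                 ; first byte (offsets are zero-based)
b2 = BinaryPeek(buf, 1)                 ; second byte
BinaryFree(buf)

If b1 == 31 && b2 == 139                ; 0x1F 0x8B
   Message("gzip check", "gzip-compressed: the server sent compressed data")
Else
   Message("gzip check", "Not gzip: first bytes are %b1% %b2%")
EndIf
Exit
```

This only diagnoses the "garbage"; it does not help with the JavaScript problem, so automating IE is still the way to get the rendered page.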
Title: Re: Downloading a webpage
Post by: snowsnowsnow on December 20, 2019, 01:48:28 PM
Quote from: td on December 19, 2019, 02:23:12 PM
I believe we have gone down this rabbit hole before.  The page is generated when a javascript script is executed inside the browser.   Unfortunately, WinInet is not a browser and therefore does not have a javascript engine to execute the code that creates the page.  You can't use the Wininet extender to download a page that doesn't exist yet.

Do we have an ETA on when that will be fixed?
Title: Re: Downloading a webpage
Post by: JTaylor on December 20, 2019, 02:31:17 PM
Don't hold your breath... I did notice that the "(Almost) Psychic Support" header was removed (or I am blind), so I'm guessing they now lack the psychic resources to produce information that doesn't exist, if they can no longer claim to answer questions before they are asked. Perhaps this will be resolved with the release of the Quantum Extender. It is still uncertain what that will allow. It either will or it won't, or maybe both at the same time???

Jim
Title: Re: Downloading a webpage
Post by: td on December 20, 2019, 02:47:52 PM
Happy Holidays.
Title: Re: Downloading a webpage
Post by: stanl on December 21, 2019, 02:27:03 AM
[Christmas greeting image]