Author Topic: Downloading a webpage  (Read 673 times)

hdsouza

  • Full Member
  • ***
  • Posts: 162
Downloading a webpage
« on: December 19, 2019, 12:16:38 pm »
Hi,

I am trying to download this webpage https://www.nextseed.com/offerings as I want to monitor the offerings on this page.
When I save the page to a file I see garbled characters in the file.

Please Help. Thanks

Code: Winbatch
Var_url = "https://www.nextseed.com/offerings"
File_HostConnect = "c:\temp\File_HostConnect.txt"

URL_Temp = strreplace (Var_url, "https://", "")
URL_Temp = strreplace (URL_Temp, "http://", "")
Url_Main =  strtrim(Itemextract(1, URL_Temp, "/" ))
Url_Remain = strtrim(strreplace(URL_Temp, "%Url_Main%/", ""))

tophandle=iBegin(0,"","")
connecthandle=iHostConnect(tophandle, Url_Main, @HTTP, "", "")
datahandle=iHttpInit(connecthandle, "GET", Url_Remain, "",0)
If datahandle==0 then
   err=iGetLastError()
   URL_Error = err
   iClose(tophandle)
   Page_Error = 1
   ;Timedelay(1)
EndIf

rslt=iHttpOpen(datahandle, "", 0, 0)
If rslt=="ERROR" || rslt!=200 then
   URL_Error = rslt
   iClose(datahandle)
   iClose(connecthandle)
   iClose(tophandle)
   Display(1, "Error..", "Problem opening webpage")
EndIf
iReadData(datahandle, File_HostConnect)
iClose(datahandle)
iClose(connecthandle)
iClose(tophandle)

 

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #1 on: December 19, 2019, 02:23:12 pm »
I believe we have gone down this rabbit hole before.  The page is generated when a javascript script is executed inside the browser.   Unfortunately, WinInet is not a browser and therefore does not have a javascript engine to execute the code that creates the page.  You can't use the Wininet extender to download a page that doesn't exist yet.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

hdsouza

  • Full Member
  • ***
  • Posts: 162
Re: Downloading a webpage
« Reply #2 on: December 19, 2019, 03:21:31 pm »
Maybe someone had the same problem with nextseed.com, although I just started with it.
So can I use something other than WinInet? Any options?
Thanks TD

JTaylor

  • Pundit
  • *****
  • Posts: 1362
    • Data & Stuff Inc.
Re: Downloading a webpage
« Reply #3 on: December 19, 2019, 03:36:55 pm »
Automate Internet Explorer. There should be examples in the tech database. Depending on what is happening on the page, you may need to check for the existence of certain tags, IDs, etc. before grabbing the page content. You may also be able to make the embedded browser object work, but I'm not certain it will if the page is heavy on JS.

Jim

hdsouza

  • Full Member
  • ***
  • Posts: 162
Re: Downloading a webpage
« Reply #4 on: December 19, 2019, 03:45:11 pm »
Hey Jim, the webpage is being downloaded as junk characters.

JTaylor

  • Pundit
  • *****
  • Posts: 1362
    • Data & Stuff Inc.
Re: Downloading a webpage
« Reply #5 on: December 19, 2019, 04:34:51 pm »
It does this using IE?   Had only seen WinInet mentioned.

Jim

hdsouza

  • Full Member
  • ***
  • Posts: 162
Re: Downloading a webpage
« Reply #6 on: December 19, 2019, 04:41:40 pm »
Yes I am using the script as in my first post. What options exist?

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #7 on: December 19, 2019, 04:45:18 pm »
Thanks, Jim.  I should have mentioned the COM Automation approach in my first response in this topic.   Here is a link to an old but still informative article about different options available to perform Web scraping.

https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+Tutorials+Working~With~Web~Pages.txt

JTaylor

  • Pundit
  • *****
  • Posts: 1362
    • Data & Stuff Inc.
Re: Downloading a webpage
« Reply #8 on: December 19, 2019, 04:46:55 pm »
As mentioned, automating IE is another option.   Again, you have to find some way to test to know the page is complete.    Sorry if I was unclear.   

See if this gets you headed in a useful direction.    Sorry for the sloppy code...just quickly chopped out some code from another script...

Code: Winbatch

  XMLURL = "https://www.nextseed.com/offerings"
  ybrowser = ObjectOpen("InternetExplorer.Application")
  ybrowser.addressbar = @TRUE
  ybrowser.statusbar = @TRUE
  ybrowser.menubar = @FALSE
  ybrowser.toolbar = @TRUE
  ybrowser.visible = 1
  ybrowser.Height =  600
  ybrowser.Width  =  800

  ybrowser.navigate(XMLURL)
  TimeDelay(1.5)

 
message("HEY","Wait until page is loaded.")
txt = ybrowser.document.getElementsByTagName("body").Item(0).OuterHTML
clipput(txt)
message("HEY",txt)
ybrowser.quit
ybrowser = 0
Exit

 

stanl

  • Pundit
  • *****
  • Posts: 1183
Re: Downloading a webpage
« Reply #9 on: December 20, 2019, 02:45:19 am »
Here is a potentially useful subroutine I use to make sure the URL is loaded. It expects the browser object to be named oIE.
Code: Winbatch

#DefineSubroutine ieready(n,msg)
   IntControl(73,1,0,0,0)
   t = 0
   If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
   While oIE.busy || oIE.readystate != 4
      t = t+1
      display(3,msg,"Attempt %t% on Page %page%")
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile

   t=0
   While oIE.Document.readystate != "complete"
      t = t+1
      display(3,"Document is...",oIE.Document.readystate)
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   Return(1)

   :WBERRORHANDLER
   Return(0)
#EndSubroutine
 

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #10 on: December 20, 2019, 07:15:25 am »
There might be a Tech Database article in this, but in the interim here is a very crudely glued-together version of the two scripts.

Code: Winbatch
#DefineSubroutine ieready(n,msg)
   IntControl(73,1,0,0,0)
   
   t = 0  
   If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
   While oIE.busy || oIE.readystate != 4
      t = t+1
      display(3,msg,"Attempt %t% on Page %page%")
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   
   
   t=0
   While oIE.Document.readystate != "complete"
      t = t+1
      display(3,"Document is...",oIE.Document.readystate)
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
   
      If t>n Then Return(0)
   EndWhile
   Return(1)
   
   
   :WBERRORHANDLER
   Return(0)
   
#EndSubroutine


XMLURL = "https://www.nextseed.com/offerings"
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE

oIE.navigate(XMLURL)

if IeReady(10, '')
   txt = oIE.document.getElementsByTagName("body").Item(0).OuterHTML
   clipput(txt)
   message("Web Page Body",txt)
endif
oIE.quit
oIE = 0
Exit
 

Thanks for the contributions.

JTaylor

  • Pundit
  • *****
  • Posts: 1362
    • Data & Stuff Inc.
Re: Downloading a webpage
« Reply #11 on: December 20, 2019, 08:29:26 am »
Don't take this as any type of disagreement with what Stan or Tony posted. That approach is what I use as well, and I probably should have included it. But there have been instances where I had to create a loop and check for the existence of an ID or other element to know the page had loaded before proceeding. I only mention this so that, if your script still seems not to get all the data, you can try it without having to come back and ask again.


Jim

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #12 on: December 20, 2019, 08:38:59 am »
It shouldn't be too difficult to add an additional ID parameter and check to Stan's IeReady subroutine.
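
A minimal sketch of that idea, written as a separate helper rather than as a change to Stan's subroutine. WaitForMarker, its parameters, and the marker text in the usage comment are hypothetical names picked for illustration. It assumes IeReady has already returned 1, so the document object exists, and it simply polls the body HTML for a substring (an element ID, for instance) that only shows up once the script-generated content is in place.

```winbatch
; Sketch only: WaitForMarker is a hypothetical helper, not a WinBatch built-in.
#DefineFunction WaitForMarker(oBrowser, marker, maxTries)
   IntControl(73, 1, 0, 0, 0)   ; on any error, jump to :WBERRORHANDLER
   For attempt = 1 To maxTries
      ; Re-read the body on every pass because the page script keeps changing it.
      html = oBrowser.document.body.innerHTML
      If StrIndexNC(html, marker, 1, @FWDSCAN) > 0 Then Return(1)
      TimeDelay(0.5)
   Next
   Return(0)                    ; marker never appeared within maxTries passes

   :WBERRORHANDLER
   Return(0)                    ; a COM error is treated the same as "not found"
#EndFunction

; Usage, after oIE.navigate(XMLURL). "offering" is a placeholder marker;
; pick text that appears only in the fully rendered page.
; If IeReady(10, "")
;    If WaitForMarker(oIE, "offering", 20) Then txt = oIE.document.body.outerHTML
; EndIf
```

Searching the HTML text instead of calling getElementById avoids having to test a COM call for an empty result, at the cost of a less precise match.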

hdsouza

  • Full Member
  • ***
  • Posts: 162
Re: Downloading a webpage
« Reply #13 on: December 20, 2019, 09:57:31 am »
Thanks Jim, all.
Yes, the "outerhtml" did the trick. No garbled characters now.
I will just code around it by launching IE and doing it that way.
I may have to launch each offering in IE to extract the details, instead of Wininet.

Thanks again

Code: Winbatch
FilePut((File_PageCheck), msie.document.GetElementsByTagName("HTML").item(0).outerHTML)
 

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #14 on: December 20, 2019, 10:15:50 am »
This is the week of the never-ending forum topics it would appear.  To correct a statement I made in my original post to the topic, it should be mentioned that the OP's site is compressed using gzip. That is why it appears as "garbage" to the OP instead of a collection of references to off-site javascript libraries.  The solution still remains the same.
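
One way to check this on a saved download is to look at the first two bytes of the file, since a gzip stream starts with the bytes 31 and 139 (hex 1F 8B). The following is only a sketch: IsGzipFile is a hypothetical helper name, the path is the one from the OP's script, and it assumes the file exists and is small enough to read into memory.

```winbatch
; Sketch only: IsGzipFile is a hypothetical helper, not a WinBatch built-in.
#DefineFunction IsGzipFile(fileName)
   size = FileSize(fileName)
   If size < 2 Then Return(@FALSE)   ; too short to hold the signature
   buf = BinaryAlloc(size)           ; buffer must be able to hold the whole file
   BinaryRead(buf, fileName)
   b0 = BinaryPeek(buf, 0)           ; first byte of the file
   b1 = BinaryPeek(buf, 1)           ; second byte of the file
   BinaryFree(buf)
   If b0 == 31 && b1 == 139 Then Return(@TRUE)
   Return(@FALSE)
#EndFunction

If IsGzipFile("c:\temp\File_HostConnect.txt")
   Message("Download check", "File starts with the gzip signature.")
Else
   Message("Download check", "File does not look gzip-compressed.")
EndIf
```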

snowsnowsnow

  • Sr. Member
  • ****
  • Posts: 317
Re: Downloading a webpage
« Reply #15 on: December 20, 2019, 01:48:28 pm »
Quote from: td on December 19, 2019, 02:23:12 pm
I believe we have gone down this rabbit hole before.  The page is generated when a javascript script is executed inside the browser.   Unfortunately, WinInet is not a browser and therefore does not have a javascript engine to execute the code that creates the page.  You can't use the Wininet extender to download a page that doesn't exist yet.

Do we have an ETA on when that will be fixed?

JTaylor

  • Pundit
  • *****
  • Posts: 1362
    • Data & Stuff Inc.
Re: Downloading a webpage
« Reply #16 on: December 20, 2019, 02:31:17 pm »
Don't hold your breath...I did notice that the "(Almost) Psychic Support" header was removed (or I am blind) so guessing they lack the psychic resources now to produce information that doesn't exist if they can no longer claim to answer questions before they are asked.   Perhaps this will be resolved with the release of the Quantum Extender.   It is still uncertain what that will allow.   It either will or it won't or maybe both at the same time???

Jim

td

  • Tech Support
  • *****
  • Posts: 3464
    • WinBatch
Re: Downloading a webpage
« Reply #17 on: December 20, 2019, 02:47:52 pm »
Happy Holidays.

stanl

  • Pundit
  • *****
  • Posts: 1183
Re: Downloading a webpage
« Reply #18 on: December 21, 2019, 02:27:03 am »