Hi,
I am trying to download this webpage https://www.nextseed.com/offerings as I want to monitor the offerings on this page.
When I save the page to a file I see garbled characters in the file.
Please help. Thanks
Var_url = "https://www.nextseed.com/offerings"
File_HostConnect = "c:\temp\File_HostConnect.txt"
; Split the URL into host name and path
URL_Temp = strreplace (Var_url, "https://", "")
URL_Temp = strreplace (URL_Temp, "http://", "")
Url_Main = strtrim(Itemextract(1, URL_Temp, "/" ))
Url_Remain = strtrim(strreplace(URL_Temp, "%Url_Main%/", ""))
; Open a WinInet session, connect to the host and prepare the GET request
tophandle=iBegin(0,"","")
connecthandle=iHostConnect(tophandle, Url_Main, @HTTP, "", "")
datahandle=iHttpInit(connecthandle, "GET", Url_Remain, "",0)
If datahandle==0 then
   err=iGetLastError()
   URL_Error = err
   iClose(connecthandle)
   iClose(tophandle)
   Page_Error = 1
   ;Timedelay(1)
   Exit   ; stop here: there is no request handle to read from
EndIf
; Send the request and check the HTTP status code
rslt=iHttpOpen(datahandle, "", 0, 0)
If rslt=="ERROR" || rslt!=200 then
   URL_Error = rslt
   iClose(datahandle)
   iClose(connecthandle)
   iClose(tophandle)
   Display(1, "Error..", "Problem opening webpage")
   Exit   ; stop here: the handles are already closed
EndIf
; Save the response body to the file, then release the handles
iReadData(datahandle, File_HostConnect)
iClose(datahandle)
iClose(connecthandle)
iClose(tophandle)
I believe we have gone down this rabbit hole before. The page is generated when JavaScript code is executed inside the browser. Unfortunately, WinInet is not a browser and therefore does not have a JavaScript engine to execute the code that creates the page. You can't use the WinInet extender to download a page that doesn't exist yet.
Maybe someone has had the same problem with nextseed.com; I have only just started working with it.
So can I use something other than WinInet? Any options?
Thanks TD
Automate Internet Explorer. There should be examples in the Tech Database. Depending on what is happening on the page, you may need to check for the existence of certain tags, IDs, etc. before grabbing the page content. You may also be able to make the embedded browser object work, but I'm not certain it will if the page is heavy on JS.
Jim
Hey Jim, the web page is being downloaded as junk characters.
It does this using IE? I had only seen WinInet mentioned.
Jim
Yes, I am using the script from my first post. What other options exist?
Thanks, Jim. I should have mentioned the COM Automation approach in my first response in this topic. Here is a link to an old but still informative article about different options available to perform Web scraping.
https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+Tutorials+Working~With~Web~Pages.txt
As mentioned, automating IE is another option. Again, you have to find some way to test whether the page is complete. Sorry if I was unclear.
See if this gets you headed in a useful direction. Sorry for the sloppy code; I just quickly chopped it out of another script.
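XMLURL = "https://www.nextseed.com/offerings"
; Start a visible instance of Internet Explorer
ybrowser = ObjectOpen("InternetExplorer.Application")
ybrowser.addressbar = @TRUE
ybrowser.statusbar = @TRUE
ybrowser.menubar = @FALSE
ybrowser.toolbar = @TRUE
ybrowser.visible = 1
ybrowser.Height = 600
ybrowser.Width = 800
ybrowser.navigate(XMLURL)
TimeDelay(1.5)
; Manual wait: dismiss this message only after the page has finished rendering
message("HEY","Wait until page is loaded.")
; Grab the rendered <body> (after the JavaScript has run), copy it to the clipboard and show it
txt = ybrowser.document.getElementsByTagName("body").Item(0).OuterHTML
clipput(txt)
message("HEY",txt)
; Close IE and release the object
ybrowser.quit
ybrowser = 0
Exit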
Here is a potentially useful subroutine I use to make sure a URL has loaded. It uses oIE as the browser object.
#DefineSubroutine ieready(n,msg)
   IntControl(73,1,0,0,0)   ; on any COM error, jump to :WBERRORHANDLER
   t = 0
   If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
   ; Wait for the browser to finish navigating (readystate 4 = complete)
   While oIE.busy || oIE.readystate != 4
      t = t+1
      ; The 3-second display doubles as the delay; page is a variable from the calling script
      display(3,msg,"Attempt %t% on Page %page%")
      ; Dismiss any security prompt by pressing Enter
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   ; Then wait for the document itself to report "complete"
   t=0
   While oIE.Document.readystate != "complete"
      t = t+1
      display(3,"Document is...",oIE.Document.readystate)
      If WinExist("~Security")
         SendkeysTo("~Security","~")
         TimeDelay(.5)
      Endif
      If t>n Then Return(0)
   EndWhile
   Return(1)
   :WBERRORHANDLER
   Return(0)
#EndSubroutine
There might be a Tech Database article in this, but in the interim here is a crudely glued-together version of the two scripts.
#DefineSubroutine ieready(n,msg)
IntControl(73,1,0,0,0)
t = 0
If msg=="" Then msg="Loading... Please Be patient... WebSite may be busy..."
While oIE.busy || oIE.readystate != 4
t = t+1
display(3,msg,"Attempt %t% of %n%")
If WinExist("~Security")
SendkeysTo("~Security","~")
TimeDelay(.5)
Endif
If t>n Then Return(0)
EndWhile
t=0
While oIE.Document.readystate != "complete"
t = t+1
display(3,"Document is...",oIE.Document.readystate)
If WinExist("~Security")
SendkeysTo("~Security","~")
TimeDelay(.5)
Endif
If t>n Then Return(0)
EndWhile
Return(1)
:WBERRORHANDLER
Return(0)
#EndSubroutine
XMLURL = "https://www.nextseed.com/offerings"
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE
oIE.navigate(XMLURL)
; IeReady returns 1 once both the browser and the document report complete, 0 on timeout or error
if IeReady(10, '')
   txt = oIE.document.getElementsByTagName("body").Item(0).OuterHTML
   clipput(txt)
   message("Web Page Body",txt)
endif
oIE.quit
oIE = 0
Exit
Thanks for the contributions.
Don't take this as any disagreement with what Stan or Tony posted; that approach is what I use as well, and I probably should have included it. But there have been instances where I have had to create a loop that checks for the existence of an ID or other element to know the page has loaded before proceeding. I mention this only so that, if your script still doesn't seem to get all the data, you can try it without having to come back and ask first.
Jim
It shouldn't be too difficult to add an additional ID parameter and check to Stan's IeReady subroutine.
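Along the lines Jim and Tony describe, here is a rough sketch of a wait loop that checks the page content before grabbing it. Instead of an element ID it looks for a piece of text in the rendered body, which avoids having to test for a missing element object. `WaitForMarker`, the marker string and the output path are all hypothetical; pick text that only shows up once the JavaScript has built the offerings list (not something already in the page title), and treat the timing as a guess.

```winbatch
; Hypothetical helper: poll until the rendered page contains MarkerText.
; Returns 1 if the text appears within MaxTries one-second attempts, else 0.
#DefineFunction WaitForMarker(oBrowser, MarkerText, MaxTries)
   For attempt = 1 To MaxTries
      ; Only read the DOM once the browser reports it has finished loading
      If oBrowser.readystate == 4 && !oBrowser.busy Then
         Html = oBrowser.document.body.innerHTML
         If StrIndex(Html, MarkerText, 0, @FWDSCAN) > 0 Then Return(1)
      EndIf
      TimeDelay(1)
   Next
   Return(0)
#EndFunction

; Placeholder marker: replace with text that only the JavaScript-built list contains
Marker = "text that only appears in the rendered list"
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE
oIE.navigate("https://www.nextseed.com/offerings")
If WaitForMarker(oIE, Marker, 30) Then
   ; Save the complete document, not just the body
   FilePut("c:\temp\offerings.html", oIE.document.getElementsByTagName("HTML").item(0).outerHTML)
Else
   Message("Timeout", "Marker text did not appear after 30 attempts")
EndIf
oIE.quit
oIE = 0
```

The same idea could be folded into IeReady as an extra parameter; keeping it separate just means IeReady can stay unchanged.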
Thanks Jim, and all.
Yes, the OuterHTML approach did the trick. No garbled characters now.
I will code around launching IE and do it that way.
I may have to open each offering in IE, instead of using WinInet, to extract the details.
Thanks again
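If you do end up visiting each offering, one rough way to get the list of detail pages is to read the links out of the rendered list page. This is a sketch only: the `/offerings/` filter assumes the detail URLs contain that path, the fixed five-second pause is a crude stand-in for a proper readiness check, and the output file name is made up. Check the real hrefs before relying on it.

```winbatch
oIE = ObjectOpen("InternetExplorer.Application")
oIE.visible = @FALSE
oIE.navigate("https://www.nextseed.com/offerings")
; Wait up to about 30 seconds for navigation to finish (readystate 4 = complete)
t = 0
While (oIE.busy || oIE.readystate != 4) && t < 30
   TimeDelay(1)
   t = t + 1
EndWhile
TimeDelay(5)   ; crude extra wait for the JavaScript to build the list

; Collect every link whose address looks like an offering detail page (assumed pattern)
Links = oIE.document.getElementsByTagName("a")
Count = Links.length
OfferList = ""
i = 0
While i < Count
   Href = Links.item(i).href
   If StrIndex(Href, "/offerings/", 0, @FWDSCAN) > 0 Then OfferList = StrCat(OfferList, Href, @CRLF)
   i = i + 1
EndWhile
; One URL per line; duplicates are possible if a card links to the same page twice
FilePut("c:\temp\offering_links.txt", OfferList)
oIE.quit
oIE = 0
```

Each saved URL could then be passed to `navigate` in turn, with the same kind of readiness check, before reading the detail page's HTML.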
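; Save the full page (including <head>) instead of just <body>; msie is the IE object, File_PageCheck the output path
FilePut(File_PageCheck, msie.document.GetElementsByTagName("HTML").item(0).outerHTML)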
It would appear this is the week of never-ending forum topics. To correct a statement I made in my original post in this topic: the OP's site is compressed using gzip. That is why the download appears as "garbage" to the OP instead of a collection of references to off-site JavaScript libraries. The solution remains the same.
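A quick way to confirm this on the WinInet side is to look at the first two bytes of the file iReadData wrote: gzip data always starts with the bytes 0x1F 0x8B (31 and 139). The function below is a hypothetical helper built on WinBatch's binary buffer functions; it only diagnoses the problem and does not decompress anything. The file path is the one from the original script.

```winbatch
; Hypothetical helper: returns @TRUE if FileName starts with the gzip signature 0x1F 0x8B
#DefineFunction IsGzipFile(FileName)
   If !FileExist(FileName) Then Return(@FALSE)
   Size = FileSize(FileName)
   If Size < 2 Then Return(@FALSE)
   Buf = BinaryAlloc(Size)
   BinaryRead(Buf, FileName)
   ; Compare the first two bytes with the gzip ID1/ID2 values
   Result = (BinaryPeek(Buf, 0) == 31) && (BinaryPeek(Buf, 1) == 139)
   BinaryFree(Buf)
   Return(Result)
#EndFunction

If IsGzipFile("c:\temp\File_HostConnect.txt") Then Message("Check", "The saved response is gzip-compressed, not HTML text")
```

Even after decompression the file would only contain the page shell and script references, which is why automating IE is still the way to get the rendered offerings.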
Quote from: td on December 19, 2019, 02:23:12 PM
I believe we have gone down this rabbit hole before. The page is generated when JavaScript code is executed inside the browser. Unfortunately, WinInet is not a browser and therefore does not have a JavaScript engine to execute the code that creates the page. You can't use the WinInet extender to download a page that doesn't exist yet.
Do we have an ETA on when that will be fixed?
Don't hold your breath... I did notice that the "(Almost) Psychic Support" header was removed (or I am blind), so I'm guessing they now lack the psychic resources to produce information that doesn't exist, if they can no longer claim to answer questions before they are asked. Perhaps this will be resolved with the release of the Quantum Extender. It is still uncertain what that will allow. It either will or it won't, or maybe both at the same time???
Jim
Happy Holidays.