Topic: Extract Webpage

hdsouza

  • Full Member
  • ***
  • Posts: 147
Extract Webpage
« on: April 11, 2019, 05:09:51 pm »
I have used this script numerous times on other sites and always received the correct result, so I am not sure what I am doing wrong.
I want to download this webpage: http://www.tipranks.com/stocks/pdm/price-target
But the contents of File_Hostconnect.txt are vastly different from what appears on the page.

Code: Winbatch
; Load the WinInet extender.
AddExtender("WWINT44I.DLL")
File_Hostconnect = "c:\temp\File_Hostconnect.txt"

Url_main = "www.tipranks.com"
Url_sub  = "stocks/pdm/price-target"
; Open a session, connect to the host, and issue a GET request for the page.
tophandle=iBegin(0,"","")
connecthandle=iHostConnect(tophandle, Url_main, @HTTP, "", "")
datahandle=iHttpInit(connecthandle, "GET", Url_sub, "",0)
rslt=iHttpOpen(datahandle, "", 0, 0)
; Save the response body to the file, then release the handles.
iReadData(datahandle, File_Hostconnect)
iClose(datahandle)
iClose(connecthandle)
iClose(tophandle)
 

stanl

  • Pundit
  • *****
  • Posts: 875
Re: Extract Webpage
« Reply #1 on: April 12, 2019, 03:05:10 am »
Your script does return the page source, but that source is primarily links to JavaScript files. Getting anything useful out of it probably requires some skill in Ajax.

td

  • Tech Support
  • *****
  • Posts: 2814
    • WinBatch
Re: Extract Webpage
« Reply #2 on: April 12, 2019, 06:40:14 am »
A lot of references to the React javascript library. 
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

stanl

Re: Extract Webpage
« Reply #3 on: April 12, 2019, 08:53:13 am »
Quote from: td on April 12, 2019, 06:40:14 am
A lot of references to the React javascript library.

and some babel-polyfill ???

stanl

Re: Extract Webpage
« Reply #4 on: April 12, 2019, 09:16:20 am »
really interesting... try this
Code: Winbatch

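; Send any runtime error to the :WBERRORHANDLER label below.
IntControl(73,1,0,0,0)
url = 'http://www.tipranks.com/stocks/pdm/price-target'
; Load the .NET System assembly and download the raw page source with WebClient.
ObjectClrOption("useany","System")
oWEB = ObjectClrNew('System.Net.WebClient')
content = oWEB.DownloadString(url)
oWEB=0
FilePut("c:\temp\content.txt",content)
Exit


:WBERRORHANDLER
; Release the object and show the error text.
oWEB=0
Pause("oops, error",wberrortextstring:@CRLF:wberroradditionalinfo)
Exit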
 

hdsouza

Re: Extract Webpage
« Reply #5 on: April 12, 2019, 10:44:55 am »
Thanks Stan. I ran your script with WinBatch version 2018B. It returned "405: method not allowed".

I also tried the following in WinBatch:
-- opened IE
-- navigated to 'http://www.tipranks.com/stocks/pdm/price-target'
-- FilePut((File_PageCheck), msie.document.GetElementsByTagName("HTML").item(0).outerHTML)
The contents of File_PageCheck correctly had "Moderate Buy", so I would have assumed that iHostConnect would return the same contents too.
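
For reference, the three steps above might look like the following in WinBatch. This is a minimal sketch, not the script that was actually run: the output path, the extra fixed wait for the page's scripts, and the url variable are assumptions added for illustration. Only msie, File_PageCheck, and the outerHTML line come from the post.

```winbatch
; Assumed output path; the post does not show how File_PageCheck was set.
File_PageCheck = "c:\temp\File_PageCheck.txt"
url = "http://www.tipranks.com/stocks/pdm/price-target"

; Start Internet Explorer through COM and load the page.
msie = ObjectCreate("InternetExplorer.Application")
msie.Visible = @TRUE
msie.Navigate(url)

; Wait until the browser reports the document as loaded (readyState 4 = complete).
While msie.Busy || msie.ReadyState != 4
   TimeDelay(1)
EndWhile

; Assumption: a fixed pause to give the page's scripts time to fill in the content.
TimeDelay(5)

; Save the HTML as the browser currently holds it, i.e. after scripts have run.
html = msie.document.GetElementsByTagName("HTML").item(0).outerHTML
FilePut(File_PageCheck, html)

msie.Quit()
msie = 0
```

The fixed pause is the weakest part of the sketch. A slow connection may need a longer delay, and polling the document for the expected text would be more robust.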


td

Re: Extract Webpage
« Reply #6 on: April 12, 2019, 01:03:43 pm »
Quote from: stanl on April 12, 2019, 08:53:13 am
Quote from: td on April 12, 2019, 06:40:14 am
A lot of references to the React javascript library.

and some babel-polyfill ???

I suppose it works better than a Babel Fish stuck in your ear in this case...

td

Re: Extract Webpage
« Reply #7 on: April 12, 2019, 02:37:51 pm »
Quote from: hdsouza on April 12, 2019, 10:44:55 am
Thanks Stan. I ran your script with WinBatch version 2018B. It returned "405: method not allowed".

I also tried the following in WinBatch:
-- opened IE
-- navigated to 'http://www.tipranks.com/stocks/pdm/price-target'
-- FilePut((File_PageCheck), msie.document.GetElementsByTagName("HTML").item(0).outerHTML)
The contents of File_PageCheck correctly had "Moderate Buy", so I would have assumed that iHostConnect would return the same contents too.

Stan, correct me if I am wrong, but I believe your point was that the page generates its contents by running JavaScript in the client browser. The MSIE COM object has somewhat limited browser scripting capabilities, but the WinInet extender is not a browser at all. It has no JavaScript engine to generate the output, so it cannot return the text as you see it in a Web browser.
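
The explanation above can be checked against the file the WinInet script wrote. A minimal sketch, assuming the file from the first post still exists at the same path and that "Moderate Buy" (the string hdsouza found in the IE output) is the text of interest:

```winbatch
; Path taken from the first post; assumed to still hold the WinInet download.
File_Hostconnect = "c:\temp\File_Hostconnect.txt"
raw = FileGet(File_Hostconnect)

; StrIndexNc does a case-insensitive search and returns 0 when the text is absent.
If StrIndexNc(raw, "Moderate Buy", 1, @FWDSCAN) == 0
   Message("WinInet output", "Rating text is not in the raw page source.")
Else
   Message("WinInet output", "Rating text found in the raw page source.")
EndIf
```

If the text is missing here but present in the file saved from IE, that is consistent with the content being generated by script after the page loads.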

stanl

Re: Extract Webpage
« Reply #8 on: April 14, 2019, 01:35:15 pm »
Bottom line: it is an interesting site. I had some time and tried a Plan B by connecting from Excel, but got the same 405 error that the .NET script returned. The 405 made for great reading. I thought about Selenium, but then read a thread about Selenium and React JavaScript, which was more interesting reading. My best guess is that for 'free' the site is look but don't touch, and maybe they are a little more lenient with the paid stuff.

hdsouza

Re: Extract Webpage
« Reply #9 on: April 14, 2019, 05:19:43 pm »
Thanks Stan and td.
I appreciate the thought process and ideas.