HtmlAgilityPack Pt. 2

Started by JTaylor, April 23, 2020, 06:44:04 PM

JTaylor

Since the other thread got cluttered, and this is a new version, I thought I would start a new one. As far as I know, this fixes the COM interface error when using CLR. Nothing has to be registered. I made notes throughout the test script for the methods I added and changed. I only changed things in three files from the original project. If there are other things you think need changing, let me know...or just change them yourself. Keep in mind you cannot use generics, especially for enumerated items. You can download it, replace the files in the Shared folder, compile it, and you should be good to go. I only used the "HtmlAgilityPack.Net45" project along with Shared; I removed all the others. Hope this makes sense and is useful.

I also included the DLL for those who just want the finished product. Use at your own risk  :)

Stan, if you figure out that the COM stuff is just me, let me know. Not sure what it would be, but you know me.

Jim

stanl

This just opened a potential can of worms. Your DLL was 165 KB; mine was 134 KB. Using mine as the appbase with your test script fails because it cannot create HTMLNode. But using yours as the appbase now fails with my script on LoadHTML().

JTaylor

Odd. I did add a number of things, but surely not that much. I also rewrote a lot using non-generics; guess that could have impacted things. Also, HtmlNode won't work on yours because it has to be made "public". What is the message on the load fail?

Jim

stanl

Quote from: JTaylor on April 24, 2020, 06:06:58 AM
Odd.    What is the message on the load fail?
Jim


"CLR Type Not Found." - I am able to get to Node elements without an HTMLNode object. AgilityPack is nice; so are simple one-liners calling PS from WB CLR (Invoke-RestMethod or Invoke-WebRequest), where everything is an object and you can Select-Object. The holy grail, of course, would be to return PS results back to WB without having to output to a file.

JTaylor

It is case sensitive, so if you used "LoadHTML()", it will need to be "LoadHtml()".

Guessing you know this, but if you change "private partial class HtmlNode" to "public partial class HtmlNode", loading HtmlNode should work. Too ignorant to know if this is a bad idea, but it seems to work  :)

Jim

JTaylor

This thing is driving me crazy. Everything works on my test file, but once I pull a page from a site I need to use, it chokes on the initial parsing to create the document  >:(

Jim

JTaylor

It is the <script> tags.  If I remove those it parses with no problem.   That seems odd...

Jim

JTaylor

Maybe it is just a malformed page. I get the same error if I leave off the end of a script tag in my test file.

Jim

JTaylor

FINALLY figured it out. The page has a "<" inside a <script> tag. So, if you can't load a page and are wondering why, you might want to check for that possibility.
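
For anyone who wants to check a page before handing it to the parser, here is a rough diagnostic sketch. It is written in Python purely for illustration, since nothing in this thread is tied to it. It is not how HAP parses anything, and the function name find_script_problems is made up for this example. It only flags the two cases mentioned in this thread: a script tag that is never closed, and a "<" inside a script body.

```python
import re

# Matches an opening <script ...> tag, case-insensitively.
SCRIPT_OPEN = re.compile(r"<script\b[^>]*>", re.IGNORECASE)
# Matches a closing </script> tag.
SCRIPT_CLOSE = re.compile(r"</script\s*>", re.IGNORECASE)


def find_script_problems(html):
    """Return (line_number, description) pairs for suspicious script blocks.

    Hypothetical helper: a plain text scan, not a real HTML parser.
    """
    problems = []
    pos = 0
    while True:
        opening = SCRIPT_OPEN.search(html, pos)
        if opening is None:
            break
        # 1-based line number of the opening tag, for easier lookup.
        line = html.count("\n", 0, opening.start()) + 1
        closing = SCRIPT_CLOSE.search(html, opening.end())
        if closing is None:
            problems.append((line, "script tag is never closed"))
            break
        body = html[opening.end():closing.start()]
        if "<" in body:
            problems.append((line, "script body contains '<'"))
        pos = closing.end()
    return problems


if __name__ == "__main__":
    sample = "<html>\n<script>if (a < b) { go(); }</script>\n</html>"
    for line, message in find_script_problems(sample):
        print(f"line {line}: {message}")
```

A "<" in a script is often perfectly legal JavaScript (a comparison, for instance), so a hit here only points at a place worth looking, not at a definite error in the page.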

Guess I should report that to the HAP folks.   

Jim