Need more powerful version of httpStripHTML

Started by stevengraff, February 05, 2014, 07:45:45 AM


stevengraff

I don't seem to get great results with the httpStripHTML function. Is anyone using anything better, or something to supplement it?


Input:


notes = "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE>



<META name=GENERATOR content="MSHTML 10.00.9200.16736"></HEAD>

<BODY leftMargin=1 rightMargin=1 topMargin=1><FONT size=2 face="Segoe UI">

<DIV>

<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Segoe UI','sans-serif'; mso-fareast-font-family: 'Times New Roman'">Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></SPAN></P></DIV></FONT></BODY></HTML>"

====================
Output:

notes = "



P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}



Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve"



OK... so it does a lot, but I still need more: the contents of the STYLE block and the &nbsp; entities are left behind. I'm reluctant to start micromanaging every tag this function doesn't clean up... but maybe that's the only way?


td

Using the .NET "Regex" class (in System.Text.RegularExpressions) is probably one of the most common approaches. If your version of WinBatch lacks CLR support, you could always use the "VBScript.RegExp" object instead. There is an example here:

http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/techsupt.web

The article contains a dead link to an example that uses the WinBatch "BinaryTag" functions, but those functions are still an option.
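
For illustration only, here is a minimal sketch of the regular-expression idea, written in Python rather than WinBatch. The function name strip_html and the patterns are assumptions of this sketch, not taken from the linked article, and the patterns may need adjusting for the VBScript.RegExp or .NET engines.

```python
import html
import re


def strip_html(markup: str) -> str:
    """Hypothetical helper: reduce an HTML fragment to plain text."""
    # Remove STYLE and SCRIPT blocks together with their contents,
    # so CSS rules do not survive as stray text.
    text = re.sub(r"(?is)<(style|script)\b[^>]*>.*?</\1\s*>", "", markup)
    # Remove HTML comments, which may themselves contain '>'.
    text = re.sub(r"(?s)<!--.*?-->", "", text)
    # Remove every remaining tag, including DOCTYPE and <?xml ... /> items.
    text = re.sub(r"(?s)<[^>]*>", "", text)
    # Decode entities (&nbsp; becomes a no-break space), then collapse
    # all runs of whitespace to a single ordinary space.
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = (
        "<HTML><HEAD><STYLE type=text/css> P { margin: 0 auto;}</STYLE></HEAD>"
        "<BODY><P class=MsoNormal>all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES"
        "<o:p></o:p></P></BODY></HTML>"
    )
    print(strip_html(sample))
```

Regex stripping like this is fragile (a '>' inside an attribute value will confuse it), so treat it as a starting point rather than a complete solution.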
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

....IFICantBYTE

This may be a double post, but perhaps use object.innerText or object.textContent?
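
To show what a parser-based extraction looks like (the same general idea as reading innerText from a parsed document, though not the browser DOM itself), here is a rough sketch in Python using the standard html.parser module. The class name TextExtractor and the function visible_text are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping STYLE and SCRIPT."""

    SKIP = {"style", "script"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self.skip_depth:
            self.parts.append(data)


def visible_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # split() with no arguments also splits on no-break spaces.
    return " ".join("".join(parser.parts).split())
```

Letting a parser find the tag boundaries avoids most of the edge cases that trip up pattern matching, at the cost of a little more code.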
Regards,
....IFICantBYTE

Nothing sucks more than that moment during an argument when you realize you're wrong. :)

td

That is one of the techniques offered as an example in the previously mentioned tech article.