I'm not getting great results with the httpStripHTML function. Is anyone using something better, or something to supplement it?
Input:
notes = "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE>
<META name=GENERATOR content="MSHTML 10.00.9200.16736"></HEAD>
<BODY leftMargin=1 rightMargin=1 topMargin=1><FONT size=2 face="Segoe UI">
<DIV>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Segoe UI','sans-serif'; mso-fareast-font-family: 'Times New Roman'">Guys - final poll of GC for practice tonight - all NO's so far. If you are a YES and haven't responded, please TEXT back immediately. Thanks. Steve<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></SPAN></P></DIV></FONT></BODY></HTML>"
====================
Output:
notes = "
P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}
Guys - final poll of GC for practice tonight - all NO's so far. If you are a YES and haven't responded, please TEXT back immediately. Thanks. Steve"
OK, so it does a lot, but I still need more. As the output shows, the contents of the STYLE block come through as text. I'm reluctant to start special-casing every tag this function doesn't clean up, but maybe that's the only way?
Using the dotNet "RegularExpressions" namespace (its Regex class) is probably one of the most common approaches. If your version of WinBatch lacks CLR support, you can use the "VBScript.RegExp" COM object instead. There is an example here:
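As a rough illustration of the regular-expression approach, here is a minimal sketch using Python's standard `re` module rather than WinBatch. The .NET and VBScript.RegExp versions are in the article. `strip_html` is a hypothetical helper name. The sketch works in four passes:

1. Remove whole STYLE and SCRIPT elements, including their contents. This is the part httpStripHTML missed in the output above.
2. Remove comments and declarations such as `<!DOCTYPE ...>` and `<?xml:namespace ... />`.
3. Remove any remaining tags.
4. Decode entities and collapse whitespace.

It assumes reasonably tidy markup. Regexes are a cleanup tool, not a full HTML parser.

```python
import html
import re

# Elements whose content is not readable text: drop the tags AND what is between them.
_BLOCKS = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Comments, <!DOCTYPE ...> declarations and <?...> processing instructions.
_MARKUP_DECLS = re.compile(r"<!--.*?-->|<![^>]*>|<\?[^>]*>", re.DOTALL)
# Any remaining start or end tag, e.g. <P class=MsoNormal> or </o:p>.
_TAGS = re.compile(r"</?[A-Za-z][^>]*>")
_WHITESPACE = re.compile(r"\s+")


def strip_html(markup: str) -> str:
    """Return the readable text of an HTML fragment (rough regex-based sketch)."""
    text = _BLOCKS.sub(" ", markup)
    text = _MARKUP_DECLS.sub(" ", text)
    text = _TAGS.sub(" ", text)      # a space keeps words in adjacent blocks apart
    text = html.unescape(text)       # &amp; -> &, &nbsp; -> non-breaking space
    return _WHITESPACE.sub(" ", text).strip()
```

Applied to the `notes` string from the question, this should return just Steve's message without the CSS rule in front of it. Replacing each tag with a space rather than nothing keeps text from neighbouring paragraphs apart. The cost is that a word split by an inline tag (e.g. `Ste<B>ve</B>`) comes out as two words.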
http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/techsupt.web
The article contains a dead link to an example that uses the WinBatch "BinaryTag" functions, but those functions are still an option.
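A tag-scanning approach like the BinaryTag one comes down to three steps:

1. Find a start delimiter.
2. Find the next end delimiter.
3. Replace everything from the start through the end.

The sketch below shows that idea in Python with plain `str.find`, so no regex engine is involved. It is an assumption about the general technique, not a translation of the BinaryTag functions. `remove_between` and `strip_by_delimiters` are hypothetical names. Matching is case-sensitive, so the STYLE/SCRIPT delimiters must use the same casing as the HTML (upper-case in the sample above). Entities such as `&amp;` are left undecoded.

```python
def remove_between(text: str, start: str, end: str, replacement: str = "") -> str:
    """Replace every start...end span (delimiters included), scanning left to right."""
    out = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        j = text.find(end, i + len(start))
        if j == -1:
            break  # unterminated span: keep the rest of the text unchanged
        out.append(text[pos:i])
        out.append(replacement)
        pos = j + len(end)
    out.append(text[pos:])
    return "".join(out)


def strip_by_delimiters(markup: str) -> str:
    """Hypothetical two-pass cleanup: drop STYLE/SCRIPT blocks, then every <...> tag."""
    # Case-sensitive: these delimiters assume upper-case tag names, as in the sample.
    for start, end in (("<STYLE", "</STYLE>"), ("<SCRIPT", "</SCRIPT>")):
        markup = remove_between(markup, start, end, " ")
    markup = remove_between(markup, "<", ">", " ")
    return " ".join(markup.split())  # collapse runs of whitespace to single spaces
```

Blocks must be removed before generic tags. Otherwise only the `<STYLE>` tags would disappear and the CSS rule between them would survive, which is exactly the problem in the original output.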
This may be a double post, but perhaps use object.innerText or object.textContent?
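The appeal of innerText/textContent is that a real HTML parser decides what is markup and what is text. Outside a browser or COM DOM, a rough Python analogue is the standard library's `html.parser.HTMLParser`: keep only character data and skip anything inside STYLE or SCRIPT. `TextExtractor` and `html_to_text` are hypothetical names. This is not the innerText property itself, which also applies rendering rules (such as line breaks between block elements) that this sketch ignores.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, ignoring anything inside <style> or <script>."""

    SKIP = {"style", "script"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. inside text
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:  # HTMLParser passes tag names in lower case
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the pieces, then collapse all whitespace runs to single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.text()
```

The DOCTYPE and the `<?xml:namespace ... />` instruction reach the parser's declaration and processing-instruction handlers rather than `handle_data`, so they never appear in the output. Nothing has to be special-cased per tag.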
That is one of the techniques offered as an example in the previously mentioned tech article.