WinBatch® Technical Support Forum

Topic started by: stevengraff on February 05, 2014, 07:45:45 AM

Title: Need more powerful version of httpStripHTML
Post by: stevengraff on February 05, 2014, 07:45:45 AM
I don't seem to get great results with the httpStripHTML function. Is anyone using anything better, or something to supplement it?


Input:


notes = "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE>



<META name=GENERATOR content="MSHTML 10.00.9200.16736"></HEAD>

<BODY leftMargin=1 rightMargin=1 topMargin=1><FONT size=2 face="Segoe UI">

<DIV>

<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Segoe UI','sans-serif'; mso-fareast-font-family: 'Times New Roman'">Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></SPAN></P></DIV></FONT></BODY></HTML>"

====================
Output:

notes = "



P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}



Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve"



OK... so it does a lot, but I still need more. The contents of the STYLE block survive, and the &nbsp; entities are left undecoded. I'm reluctant to start micromanaging every tag and entity this function doesn't clean up, but maybe that's the only way?

Title: Re: Need more powerful version of httpStripHTML
Post by: td on February 05, 2014, 09:54:15 AM
Using the .NET "System.Text.RegularExpressions" classes (Regex in particular) is probably one of the most common approaches. If your version of WinBatch lacks CLR support, you can always use the "VBScript.RegExp" COM object instead. There is an example here:

http://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/techsupt.web

The article contains a dead link to an example that uses the WinBatch "BinaryTag" functions, but those functions are still an option.
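The regular-expression approach can be sketched as follows. The sketch is in Python rather than WinBatch so that it runs standalone. The helper name "strip_html" and all of the patterns are my own assumptions, not the code from the tech article. The sketch removes STYLE and SCRIPT elements together with their contents, which covers the CSS rule that httpStripHTML left behind. It then drops comments and any remaining tags, and decodes entities such as &nbsp;.

```python
import html
import re

# <style>/<script> elements: their contents are not visible text, so the whole
# element is removed, not just the tags. \1 requires the matching close tag.
_HIDDEN_BLOCKS = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>",
                            re.IGNORECASE | re.DOTALL)
# HTML comments may contain '>', so they get their own pattern.
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
# Any other markup: tags, <!DOCTYPE ...>, <?xml:namespace ... />, <o:p>, ...
_TAGS = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def strip_html(markup: str) -> str:
    """Hypothetical helper: reduce an HTML fragment to plain text with regexes."""
    text = _HIDDEN_BLOCKS.sub(" ", markup)
    text = _COMMENTS.sub(" ", text)
    text = _TAGS.sub(" ", text)      # a space keeps <P>a</P><P>b</P> from gluing into "ab"
    text = html.unescape(text)       # &nbsp; -> '\xa0', &amp; -> '&', &#39; -> "'"
    text = text.replace("\xa0", " ")
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(strip_html("<P>Thanks.&nbsp; Steve<o:p></o:p></P>"))  # prints: Thanks. Steve
```

The patterns should port to .NET's Regex or "VBScript.RegExp" with small changes. For example, VBScript.RegExp has no single-line (DOTALL) option, so [\s\S] is commonly used where this sketch relies on "." matching newlines. Regexes can still be fooled by a ">" inside a quoted attribute value, which is one reason to prefer a real parser when the input is unpredictable.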

Title: Re: Need more powerful version of httpStripHTML
Post by: ....IFICantBYTE on February 05, 2014, 05:34:08 PM
Double posting, I think, but perhaps load the HTML into a DOM document and use object.innerText or object.textContent?
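innerText and textContent work because a parser, not a pattern, decides what counts as text. The following is a hedged illustration of the same idea outside a browser. It is a Python sketch that uses the standard library's html.parser.HTMLParser instead of an HTML DOM, and the class and function names are hypothetical. It skips STYLE and SCRIPT contents, so it behaves more like innerText than textContent, because textContent includes those contents. Unlike innerText, it does not insert line breaks for block elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <style>/<script> bodies."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # parser decodes &nbsp;, &amp;, ...
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):       # tag names arrive lower-cased
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def text_content(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()                               # flush any buffered trailing text
    text = "".join(parser.chunks)
    # str.split() treats '\xa0' (a decoded &nbsp;) as whitespace, so this also
    # collapses non-breaking spaces into single ordinary spaces.
    return " ".join(text.split())
```

Because the parser decodes entities itself, no separate &nbsp; handling is needed beyond collapsing whitespace.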

Title: Re: Need more powerful version of httpStripHTML
Post by: td on February 05, 2014, 09:45:10 PM
That is one of the techniques offered as an example in the previously mentioned tech article.