Need a more powerful version of httpStripHTML

Started by stevengraff, February 05, 2014, 12:10:05 PM


stevengraff

I don't seem to get great results with the httpStripHTML function. Is anyone using anything better, or something to supplement it?


Input:


notes = "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<STYLE type=text/css> P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}</STYLE>



<META name=GENERATOR content="MSHTML 10.00.9200.16736"></HEAD>

<BODY leftMargin=1 rightMargin=1 topMargin=1><FONT size=2 face="Segoe UI">

<DIV>

<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Segoe UI','sans-serif'; mso-fareast-font-family: 'Times New Roman'">Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></SPAN></P></DIV></FONT></BODY></HTML>"

====================
Output:

notes = "



P, UL, OL, DL, DIR, MENU, PRE { margin: 0 auto;}



Guys - final poll of GC for practice tonight&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES and haven't responded, please TEXT back immediately.&nbsp; Thanks.&nbsp; Steve"



OK... so it does a lot, but still, I need more. I'm reluctant to start micromanaging every tag this function doesn't clean up... but maybe that's the only way?


JTaylor

You mean it hasn't improved since this morning?   ;)

Jim

....IFICantBYTE

Use IE's COM interface and object.innerText?
(There is also object.textContent in later versions, I think, but I believe it returns the CSS content as well. Worth looking up on Google.)
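
To illustrate the difference between plain tag stripping and an innerText-style extraction, here is a rough sketch in Python rather than WinBatch (the class and function names, TextExtractor and strip_html, are made up for this example). It uses the standard library's HTML parser, drops whatever sits inside STYLE and SCRIPT elements, and decodes entities such as &nbsp;. It is only an approximation of the idea, not how httpStripHTML or IE's innerText actually work.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <style>/<script> bodies."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; in text data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <STYLE> matches "style".
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # &nbsp; decodes to U+00A0; turn it into an ordinary space.
    text = "".join(parser.parts).replace("\u00a0", " ")
    # Collapse runs of whitespace (including newlines) to single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = (
        "<HTML><HEAD><STYLE type=text/css> P { margin: 0 auto;}</STYLE></HEAD>"
        "<BODY><P>Guys&nbsp;- all NO's so far.&nbsp;&nbsp;If you&nbsp;are a YES</P>"
        "</BODY></HTML>"
    )
    print(strip_html(sample))
```

Note that collapsing whitespace is a choice made for this sketch: it also turns the double spaces after sentences into single ones, which may or may not be what you want for a notes field.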
Regards,
....IFICantBYTE

Nothing sucks more than that moment during an argument when you realize you're wrong. :)

stevengraff

Quote from: JTaylor on February 05, 2014, 12:13:10 PM
You mean it hasn't improved since this morning?   ;)

Jim

Hope springs eternal? :)

I actually was vaguely aware of double-punching something, but I thought it was all in the morning, 2 minutes apart. I think it's safe to delete this one.

JTaylor

Sorry...sometimes I just can't resist  :)

Jim