Here is the first draft with the HTML parsing in place. It seems solid, but I haven't integrated it into any of my projects yet, so I might change my mind. Thought I would throw it out there and see if it sticks. Also, I just remembered I haven't tested Unicode yet.
Also, dmParseFile() isn't working for HTML yet. You will have to use dmParse() for the moment.
http://www.jtdata.com/anonymous/DomData.zip
Jim
This contains the rest of the HTML stuff and some nice updates to the JSON portion. I have also folded my wbSundry Extender (no function name changes) into this project, as a number of its functions will be very useful here as well. As a result I removed the "ToArray" functions, since that conversion can be done from the CSV data using one of the Sundry Extender functions.
http://www.jtdata.com/anonymous/DomData.zip
Many thanks to Stan for his testing, as it helped me fine-tune some things as well as think of some things I might not have otherwise.
Now to integrate the HTML stuff in a real project and see what I have missed.
Jim
New functions have been added, ParseFile() for HTML now works, and unless I hear differently this is getting close to being ready for production. If this stuff interests you, please give it a try now. It is easier to tweak things at this stage, as I can still live with some breaking changes.
http://www.jtdata.com/anonymous/DomData.zip
Jim
Had to rework a lot of the attribute stuff, so if you are trying out the HTML parsing you will want to update. Funny how everything works so well in testing, but when you try it in real life it doesn't.
http://www.jtdata.com/anonymous/DomData.zip
Jim
In case anyone finds it useful, the function below makes node-based text retrieval easier. Example:
title = Get_NText(dmhGetElementsByAttribute(0,"class","product_biblio_title","h1",@FALSE,1),"IT")
Jim
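#DefineFunction Get_NText(node,dtype)
   ; dtype: "OH" = outer HTML, "IH" = inner HTML, "IT" = inner text
   ; Send run-time errors to :WBERRORHANDLER (GOSUB style) instead of halting
   IntControl(73, 2, 1, 0, 0)
   err = ""
   txt = ""   ; default, so an unrecognized dtype returns an empty string
   If dtype == "OH" Then
      txt = dmhGetOuterHTML(node)
   EndIf
   If dtype == "IH" Then
      txt = dmhGetInnerHTML(node)
   EndIf
   If dtype == "IT" Then
      txt = dmhGetInnerText(node)
   EndIf
   ; The handler sets err to "ERRTXT" when one of the calls above fails
   If err == "ERRTXT" Then
      Return ""
   Else
      Return txt
   EndIf
   Return   ; safety net so execution never falls through into the handler

:WBERRORHANDLER
   lasterr = wberrorarray[0]       ; error number
   handlerline = wberrorarray[1]   ; text of the line that failed
   textstring = wberrorarray[5]    ; error message text
   linenumber = wberrorarray[8]    ; line number of the failure
   errstr = StrCat("Number: ",lasterr,@LF,"String: ",textstring,@LF,"Line (",linenumber,"): '",handlerline,"'")
   err = "ERRTXT"
   ; Uncomment to see the details while debugging:
   ; Message("Error Information",errstr)
   Return
#EndFunction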
If you are making use of this Extender and you haven't updated lately, there have been a number of changes. I just added a dmUrlParamMap() function. It takes a URL and creates a map string from the parameters and their values. If you often extract href attributes and parse values out of them, you will appreciate this function. It is separate from the parsing, so you need not parse a document to use it.
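For anyone wondering what pulling parameters out of a URL involves, here is a rough sketch in plain WIL. It is not the extender's dmUrlParamMap(); that function's actual signature and map string format are not shown in this thread. The UDF name udfUrlParams, the example URL, and the tab-delimited "name=value" output are all made up for illustration.

```winbatch
; Hypothetical helper for illustration only; NOT the extender's dmUrlParamMap().
; Returns a tab-delimited list of "name=value" items taken from the query string.
#DefineFunction udfUrlParams(url)
   plist = ""
   ; Nothing to do when the URL has no "?" section
   If ItemCount(url, "?") < 2 Then Return plist
   query = ItemExtract(2, url, "?")
   If query == "" Then Return plist
   ; Drop any #fragment that follows the parameters
   query = ItemExtract(1, query, "#")
   n = ItemCount(query, "&")
   For i = 1 To n
      pair = ItemExtract(i, query, "&")
      If pair == "" Then Continue
      pname = ItemExtract(1, pair, "=")
      pvalue = ""
      ; Simplification: a value that itself contains "=" is cut at the second "="
      If ItemCount(pair, "=") > 1 Then pvalue = ItemExtract(2, pair, "=")
      plist = ItemInsert(StrCat(pname, "=", pvalue), -1, plist, @TAB)
   Next
   Return plist
#EndFunction

; Example URL is invented
plist = udfUrlParams("http://www.example.com/item.php?id=42&cat=books")
Message("Parameters", StrReplace(plist, @TAB, @CRLF))
```

The sketch does no URL decoding (%20 and the like are left as-is), which is the kind of detail a dedicated function would be expected to handle for you.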
Also, the dmStore() functions have been VERY useful in one of my projects.
So far I REALLY like the HTML parsing. I have been making heavy use of it in real projects, so I think I have worked out most of the kinks, as well as adding things like the above to make data extraction easier. I have used the XML parser a bit, so I am fairly comfortable with its status. I know Stan has been hitting the JSON parser and, as a result, I have tweaked a few things on that front as well. I plan to give it a few more days to see if anything else pops up and will then declare it production ready.
http://www.jtdata.com/anonymous/domdata.zip
Jim