wbDOMData Extender (Beta 3/3)

Started by JTaylor, January 26, 2021, 01:58:00 PM


JTaylor

Here is the first draft of the HTML parsing. It seems solid, but I haven't integrated it into any of my projects yet, so I might change my mind. Thought I would throw it out there and see if it sticks. I also just remembered that I haven't tested Unicode yet.

Also, dmParseFile() isn't working for HTML yet.  You will have to use dmParse() for the moment.


         http://www.jtdata.com/anonymous/DomData.zip


Jim

JTaylor

This contains the rest of the HTML stuff and some nice updates to the JSON portion. I have also folded my wbSundry Extender (no function name changes) into this project, since a number of its functions will be very useful here as well. As a result, I removed the "ToArray" functions, because that conversion can be done from the CSV data using one of the Sundry Extender functions.

          http://www.jtdata.com/anonymous/DomData.zip

Many thanks to Stan for his testing, as it helped me fine-tune some things and think of others I might not have otherwise.

Now to integrate the HTML stuff in a real project and see what I have missed.

Jim

JTaylor

New functions have been added, and dmParseFile() now works for HTML. Unless I hear differently, this is getting close to being ready for production. If this stuff interests you, please give it a try now. It is easier to tweak things at this stage, while I can still live with some breaking changes.

        http://www.jtdata.com/anonymous/DomData.zip

Jim

JTaylor

I had to rework a lot of the attribute handling, so if you are trying out the HTML parsing you will want to update. Funny how everything works so well in testing, but then doesn't work when you try it in real life.

   http://www.jtdata.com/anonymous/DomData.zip

Jim

JTaylor

In case anyone finds it useful, below is a helper function that makes node-based text retrieval easier. Example:


  title  = Get_NText(dmhGetElementsByAttribute(0,"class","product_biblio_title","h1",@FALSE,1),"IT")

Jim


Code (winbatch)


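#DefineFunction Get_NText(node,dtype)
  ; Returns the HTML or text of a node, or "" if the retrieval fails.
  ; dtype: "OH" = outer HTML, "IH" = inner HTML, "IT" = inner text
  IntControl(73, 2, 1, 0, 0)   ; mode 2: on error, GoSub the :WBERRORHANDLER label
  err = ""
  txt = ""                     ; default result, also covers an unrecognized dtype

  If dtype == "OH" Then
    txt = dmhGetOuterHTML(node)
  EndIf

  If dtype == "IH" Then
    txt = dmhGetInnerHTML(node)
  EndIf

  If dtype == "IT" Then
    txt = dmhGetInnerText(node)
  EndIf

  If err == "ERRTXT" Then
    Return ""
  Else
    Return txt
  EndIf
Return

:WBERRORHANDLER
  ; Collect the error details; un-comment the Message line to display them.
  lasterr = wberrorarray[0]
  handlerline = wberrorarray[1]
  textstring = wberrorarray[5]
  linenumber = wberrorarray[8]
  errstr = StrCat("Number: ",lasterr,@LF,"String: ",textstring,@LF,"Line (",linenumber,"): '",handlerline,"'")
  err = "ERRTXT"               ; flag the failure so the function returns ""
; Message("Error Information",errstr)
Return

#EndFunction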


JTaylor

If you are using this Extender and haven't updated lately, note that there have been a number of changes. I just added a dmUrlParamMap() function. It takes a URL and creates a map string from its parameters and their values. If you often extract href attributes and parse values out of them, you will appreciate this function. It is independent of the parsing functions, so you do not need to parse a document to use it.

Also, the dmStore() functions have been VERY useful in one of my projects.

So far I REALLY like the HTML parsing. I have been making heavy use of it in real projects, so I think I have worked out most of the kinks, and I have added things like the above to make data extraction easier. I have used the XML parser a bit, so I am fairly comfortable with its status. I know Stan has been hitting the JSON parser, and as a result I have tweaked a few things on that front as well. I plan to give it a few more days to see if anything else pops up, and will then declare it production ready.


          http://www.jtdata.com/anonymous/domdata.zip

Jim