fun with xml: semi-final

Started by spl, March 21, 2025, 03:36:15 AM


spl

Below is the current version of an xml parser for files whose structure is not known in advance. I attached a zip with several small xml files I used in testing, as well as a screenshot of the output for the bookstore xml file. The script focuses on parsing attributes and innerText, but it also includes a map of nodeTypes. The csv output includes a parent-node column, which is useful for grouping data like the bookstore into a table in Excel. However, if you parse the included Inventory xml, a similar transfer can be problematic: although the xml is well-formed and the output is consistent, linking the privilege nodes to the name attribute is not intuitive. It is also worth noting that the script does a fair job with .xsd schema files.
;Winbatch 2025A - iterate/parse xml nodelist
;now uses loop based on XDoc.SelectNodes("//*")
;Stan Littlefield (who to blame)
;3/21/2025  [updated]
;
;Purpose: extract attributes/node innerText from xml
;         files w/out prior knowledge of structure.
;=====================================================
gosub udfs
IntControl(73,1,0,0,0)
;construct map for possible node type description
nodeTypes = $"0=None
1=ELEMENT_NODE
2=ATTRIBUTE_NODE
3=TEXT_NODE
4=CDATA_SECTION_NODE
5=ENTITY_REFERENCE_NODE
6=ENTITY_NODE
7=PROCESSING_INSTRUCTION_NODE
8=COMMENT_NODE
9=DOCUMENT_NODE
10=DOCUMENT_TYPE_NODE
11=DOCUMENT_FRAGMENT_NODE
12=NOTATION_NODE
13=WHITESPACE
14=SIGNIFICANTWHITESPACE
15=ENDELEMENT
16=ENDENTITY
17=XMLDECLARATION$"
nodeTypes= MapCreate(nodeTypes,'=',@lf)

;select file to process
types="XML Files|*.xml;*.xsd"
file=AskFilename("Select XML", dirscript(), types, "", 101)
if !fileexist(file) then Terminate(@TRUE, "Exiting", "File Not Found:":file)
;create DomDocument Object
XDoc = CreateObject("Msxml2.DOMDocument.6.0")  ;or just Msxml2.DOMDocument
XDoc.async = @False 
XDoc.validateOnParse = @False
XDoc.Load(file)
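;optional check (a sketch, not part of the original script):
;MSXML's Load does not raise an error on malformed xml, it only
;fills in parseError, so inspect that before continuing.
;perr is a hypothetical variable name used only for this check.
perr = XDoc.parseError
If perr.errorCode != 0
   Terminate(@TRUE, "XML Load Failed", perr.reason:@LF:"Line: ":perr.line)
EndIf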
;initialize output variable - comma separated with 4 columns
output="Parent,Item,Value,Type":@CRLF

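;check if file begins with xml declaration
;(MSXML exposes it as a processing instruction, nodeType 7)
;use the nodeTypes map to include the description of nodes
dtype = XDoc.ChildNodes.item(0)
if dtype.NodeType == 7   ;compare the numeric type, not its map description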
   ;======== uncomment only if needed
   ;output := dtype.BaseName:",":"null":",":nodeTypes[dtype.NodeType]:",":"Node":@CRLF
   parent =  dtype.parentNode.nodeName
   output := Get_Attributes(dtype,parent)
endif

;select/process all nodes from root node
nodes = XDoc.SelectNodes("//*")
for i=0 to nodes.length -1
   basename = nodes.item(i).BaseName
   ;======== uncomment only if needed
   ;output := basename:",":"null":",":nodeTypes[nodes.item(i).NodeType]:",":"Node":@CRLF
   parent =  nodes.item(i).parentNode.nodeName
   parse =  parent: "," :basename:",": nodes.item(i).Text:",":"Text"
   if ! (strindex(parse,",,",0,@fwdscan) || nodes.item(i).ChildNodes.Length >1)
      output := parse:@CRLF 
   endif
   output := Get_Attributes(nodes.item(i),parent)
Next
Message(file,output)
Exit

:WBERRORHANDLER
XDoc = 0
geterror()
Terminate(@TRUE,"Error Encountered",errmsg)
;=====================================================

:udfs
#DefineSubRoutine geterror()
   wberroradditionalinfo = wberrorarray[6]
   lasterr = wberrorarray[0]
   handlerline = wberrorarray[1]
   textstring = wberrorarray[5]
   linenumber = wberrorarray[8]
   errmsg = "Error: ":lasterr:@LF:textstring:@LF:"Line (":linenumber:")":@LF:wberroradditionalinfo
   Return(errmsg)
#EndSubRoutine

#DefineFunction Get_Attributes(node,parent)
IntControl(73,1,0,0,0)
retval = ""
atts = node.attributes           
length = atts.Length
If length > 0
   For i=0 to length-1
       att = atts.Item(i)
       name = att.Name
       value = att.Value
       parse =  parent:",":name:",":value:",Attribute":@CRLF
       retval := parse
   Next
EndIf
Return retval
:WBERRORHANDLER
XDoc = 0
geterror()
Terminate(@TRUE,"Error Encountered",errmsg)
#EndFunction

Return
;=====================================================
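If the result should land in a file that Excel can open rather than in a message box, the Message line could be swapped for something like the sketch below. It assumes the csv should be written next to the selected xml file. The name outfile is made up for illustration, and values that themselves contain commas would still need quoting before Excel splits them correctly.

```
;sketch only: save the csv text beside the source xml file
;outfile is a hypothetical variable, not part of the script above
outfile = FilePath(file):FileRoot(file):".csv"
FilePut(outfile, output)
Message("Saved", outfile)
```

FilePath returns the drive and directory with a trailing backslash and FileRoot returns the file name without its extension, so bookstore.xml would produce bookstore.csv in the same folder.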
Stan - formerly stanl [ex-Pundit]