Removing Duplicates with a WIL Map

Started by td, May 02, 2019, 02:51:44 PM


td

WinBatch Studio's menu file "wspopup.mnu" contains a list of every WIL, WinBatch, WinBatch Studio, and Extender function name along with each function's parameters and a brief description of the function. The file is parsed as part of the WinBatch release build process to create input for other downstream release creation processes. It is also used to verify the integrity of the help system and the WinBatch syntax coloring files. Unfortunately, "wspopup.mnu" contains many duplicate entries, and duplicates can interfere with the downstream processing and verification.

The current production script removes duplicates the old standard way, by sorting with the ArraySort function. Given the recent introduction of WIL maps, I thought it might be useful to illustrate an efficient use of a WIL map to remove duplicate data from a file and create a new file in a different format.

Unfortunately, the output file's format requirements do not permit the use of the MapFilePutCsv function or the script would have been even more efficient.

Code (winbatch)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Name: FunctionDump
;;
;; Purpose: Creates a function list file from the latest
;;          wspopup.mnu file.  The file is a simple text file
;;          containing three lines per function.  The first line is
;;          the name of the WIL, WinBatch or extender function.  The
;;          second line is the parentheses-surrounded parameters for the
;;          function.  The third is a brief description of the function.
;;
;; Parameters:  Output file name and path.
;;
;; Return: Array with the number of function names found and
;;         the number of functions added to the output file.
;;         [0] = total function names found including duplicates.
;;         [1] = total names written to file.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#DefineFunction FunctionDump(_strOutFile)
   
   BoxOpen("Building Function File",_strOutFile)

   hWspopup = 0
   hFuncDmp = 0

   strWbs = RegQueryStr(@REGMACHINE, 'SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\WinBatch Studio.exe', 64)
   WsPopup = FilePath(strWbs):'wspopup.mnu'
   if !FileExist(WsPopup)
      Pause("Build Function File Error",WsPopup:" not found!")
      exit
   endif
   
   ; Plan 9 from outerspace
   strOutDir = FilePath(_strOutFile)
   if !DirExist(strOutDir) then DirMake(strOutDir)
   if FileExist(_strOutFile) then FileDelete(_strOutFile)
   hWspopup = FileOpen(WsPopup, "READ")
   hFuncDmp =  FileOpen(_strOutFile, "WRITE")
   strLineIn = ""
   mapFuncs = MapCreate()
   nWithDups = 0                                                   
   while 1
      strLineIn = FileRead(hWspopup)
      if strLineIn == "*EOF*" then break

      ; Parse each function entry in wspopup and add it to a map
      ; as a key/value pair.
      if StrIndex(strLineIn, ";^", 0, @Fwdscan ) && ItemCount( strLineIn, "^" ) > 2
         strLineIn = StrReplace(strLineIn, @Tab, "")
         strName = strTrim(ItemExtract(2,strLineIn,"^"))
         if StrLen(strName)>0 && StrIndex(strName, ".", 0, @Fwdscan)==0
            ; Function name is the key and the parameters and description are combined
            ; to make the map value.
            mapFuncs[strName] =  ItemExtract(3,strLineIn,"^"):@tab:ItemExtract(4,strLineIn,"^")           
            nWithDups += 1   
         endif
      endif
   endwhile

   ; Dump to a file with the requirement that each function has 3 lines with
   ;   Line 1 - function name
   ;   Line 2 - function parameters
   ;   Line 3 - function description
   ; If we didn't want the information on three separate
   ; lines we could have used MapFilePutCsv instead of the
   ; following loop.
   foreach strName in  mapFuncs
     FileWrite( hFuncDmp, strName )
     lRest = mapFuncs[strName]
     FileWrite( hFuncDmp, ItemExtract(1,lRest,@Tab ) )
     FileWrite( hFuncDmp, ItemExtract(2,lRest,@Tab ) )
   next
     
:cleanup
   if hFuncDmp then FileClose(hFuncDmp)
   if hWspopup then FileClose(hWspopup)
   BoxShut()

   ; Create the two-element return array before assigning to it.
   aRet = ArrDimension(2)
   aRet[0] = nWithDups
   aRet[1] = ArrInfo(mapFuncs, 1)
   return aRet
#EndFunction

;; Test
aCnts = FunctionDump('C:\temp\Functions.txt')

;; Report
Message('FunctionDump Test', aCnts[1]:' Functions dumped':@lf:' and ': aCnts[0]-aCnts[1]:' Dups Removed')
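
The duplicate removal in the script above comes from one property of maps: assigning to a key that already exists replaces its value instead of adding a second entry. A minimal sketch of just that step follows. The sample list and the names lNames, mapSeen, and nTotal are made up for illustration and are not part of the production script.

```winbatch
; Sketch only: five sample names, three of them distinct.
lNames  = "StrLen,StrTrim,StrLen,ItemCount,StrTrim"
mapSeen = MapCreate()
nTotal  = ItemCount(lNames, ",")

for i = 1 to nTotal
   ; Assigning to an existing key overwrites it, so each
   ; distinct name ends up in the map exactly once.
   strName = ItemExtract(i, lNames, ",")
   mapSeen[strName] = 1
next

; ArrInfo with request 1 is used here the same way the script
; above uses it: to get the number of entries in the map.
Message("Dedup sketch", ArrInfo(mapSeen, 1):" unique names out of ":nTotal)
```

The stored value (1) is a placeholder; the script above stores the parameters and description there instead.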
 
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

kdmoyers

Have I mentioned this week how much I love the new maps?
Oh, I see I have. OK, I'll wait until next week to say it again.
(wink)
-Kirby
The mind is everything; What you think, you become.

td

WIL maps are new functionality, so there is room for growth. There are a few ideas floating around about what enhancements might be made to that functionality.

One idea is to make it possible for the user to directly set the size of a map's hash table.  Currently, the hash table size is set to a default when the map is created without data or created dynamically. The WIL interpreter uses an algorithm to set the size when MapCreate is called with data or MapFileGetCsv is called to create a preloaded map.  The hash table size doesn't affect the number of items you can store in the map but it does affect the balance between performance and memory usage.

If you have any ideas for enhancements that you think would have broad applicability and practicality, let us know.

kdmoyers

Well, since you asked, a cool thing would be some sort of hybrid array: indexed by words on one axis and numbers on the other. This effect could be constructed, of course, but look how intuitive and clean this syntax is:
Code (winbatch)
    for idx = 1 to NumRecs
      Total = Total + Arr["merchandise",idx]
    next idx

Maybe have some handy support functions for read/write CSV files and converting ADO result sets.

I dunno, maybe not worth the effort.  just an idea.
-K

stanl

Quote from: kdmoyers on May 07, 2019, 05:13:18 AM
Well, since you asked, a cool thing would be some sort of hybrid array: indexed by words on one axis and numbers on the other. 

Funny, I was just thinking of that as I just upgraded to look at maps. I'm in a situation now where I get file extracts of up to 2 GB delimited with ^ [which is not great for pre-parsing with PowerShell as it gives regex warnings]. What I would want is to parse the files by

[column name]:position   

which would be nice as a map index. Of course, I claim total infancy as per understanding maps. And yes, I know there are other ways.
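
A rough sketch of that column-name-to-position idea using the map syntax already shown in this thread, assuming the first line of the extract is a ^-delimited header. The header, the data line, and the names strHeader, strLine, and mapPos are invented for illustration.

```winbatch
; Sketch only: sample header and data line, both made up.
strHeader = "acct^name^balance"
strLine   = "1001^Smith^250"

; Build a map of column name -> 1-based item position.
mapPos = MapCreate()
nCols  = ItemCount(strHeader, "^")
for i = 1 to nCols
   strCol = ItemExtract(i, strHeader, "^")
   mapPos[strCol] = i
next

; Look up a field by column name instead of by position.
nPos       = mapPos["balance"]
strBalance = ItemExtract(nPos, strLine, "^")
Message("Column lookup", "balance = ":strBalance)
```

For a large extract, the map would be built once from the header, and only the lookup lines would run per data line.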

td

It certainly is an interesting proposal. I can see its usefulness. The idea would be almost equivalent to allowing regular WIL arrays to be embedded in WIL maps as values. Of course, the difference would be the language syntax used to access individual elements: a['key', 0] vs. a['key'][0].

It would be surprisingly tricky to implement but it is worth further consideration.

kdmoyers

I think I was imagining something simpler: just a 2D array, but with named
columns. That is, there's only ONE map for the whole array, and its values
are just column numbers (small integers).

The first time a new column name is seen, it is assigned the next higher
column number, starting at zero.

xarr = MapArrDimension("",100) ; named columns, 100 rows

xarr["fred",0] = 57  ; new name, equiv to xarr[0,0] = 57
xarr["sally",6] = 23 ; new name, equiv to xarr[1,6] = 23
xarr["fred",4] = 45  ; old name, equiv to xarr[0,4] = 45

Basically it's just a regular 2d array with a hidden auxiliary map that
takes care of naming the columns.
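
Something close to this can be put together today with a regular 2D array and a visible (rather than hidden) map of column names. The sketch below is only that, a sketch: MapArrDimension does not exist, and the sample cell list and the names lCells, mapCols, and nCols are made up for illustration. It assumes MapKeyExist(map, key) reports whether a key is already present.

```winbatch
; Sketch only: sample cells in the form name,row,value.
lCells  = "fred,0,57|sally,6,23|fred,4,45"
mapCols = MapCreate()              ; column name -> column number
xarr    = ArrDimension(10, 100)    ; room for 10 columns, 100 rows
nCols   = 0                        ; next unused column number

nCells = ItemCount(lCells, "|")
for i = 1 to nCells
   strCell = ItemExtract(i, lCells, "|")
   strName = ItemExtract(1, strCell, ",")
   nRow    = ItemExtract(2, strCell, ",") + 0   ; adding 0 forces a number
   nValue  = ItemExtract(3, strCell, ",") + 0

   ; The first time a name is seen it gets the next column
   ; number, starting at zero.
   if !MapKeyExist(mapCols, strName)
      mapCols[strName] = nCols
      nCols = nCols + 1
   endif

   nCol = mapCols[strName]
   xarr[nCol, nRow] = nValue
next

; Read a cell back by column name.
nCol = mapCols["fred"]
Message("Named columns", "fred, row 4 = ":xarr[nCol, 4])
```

What the proposal adds over this is the syntax: the name-to-number lookup would happen inside the subscript instead of in a separate statement.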

Something like MapFileGetCsvArray() would create the array automatically,
interpreting the first CSV line as column names.

Something like MapGetAdoArray() would create the array automatically,
interpreting field names from an ADO result set as column names.

I might be describing the same thing you said!
I'm in water over my head here. (smile)
-K

JTaylor

...and if you did all this, especially the GetADO() option, then (assuming I understand) allow it to be applied to a ReportView control so that the column names are automatically applied to the header row.

Jim


Quote from: kdmoyers on May 09, 2019, 05:20:10 AM
I think I was imagining something simpler: just a 2D array, but with named
columns. That is, there's only ONE map for the whole array, and its values
are just column numbers (small integers)..................

-K