Hello!
I'm trying to parse a text list and extract only the unique items in that list. Right now it's approximately 6,000 items and growing.
Example:
I have a list similar to the following.
Cat
Dog
Cat
Cat
Dog
Bird
Cat
Bird
Giraffe
Dog
Bird
Dog
Lion
Lion
Cat
Giraffe
Bird
Bird
And I'd like to generate a concise list of unique items in that list like this.
Cat
Dog
Bird
Giraffe
Lion
Thanks in advance for your pointers!
The simplest solution would be to place all the items into a WIL Map. You don't need to test for the existence of the string key before placing a string into the map. This works because all WIL Map keys must be unique.
Another solution would be to follow the traditional approach to deduping a list: place your strings into an array and then sort the array. Once you have the sorted array, step through it, adding elements to another array (or whatever containing data structure you choose) and looking for cases where the current and previous array elements are the same. When you find a match, skip placing the current element into the second array.
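For what it's worth, here is a minimal sketch of the sort-then-skip-duplicates approach. It is not the exact method described above: it uses a space-delimited item list with ItemSort rather than two arrays, and the variable names and sample data are made up for illustration. The comparison uses StriCmp, so it ignores case; swap in StrCmp if case matters for your data.

```winbatch
; Sketch only: dedupe a space-delimited item list by sorting it first.
; Variable names and sample data are illustrative assumptions.
list   = "Cat Dog Cat Cat Dog Bird Cat Bird Giraffe Dog Bird Dog Lion Lion Cat Giraffe Bird Bird"
sorted = ItemSort(list, " ")        ; after sorting, duplicates sit next to each other
count  = ItemCount(sorted, " ")
unique = ""
prev   = ""
For i = 1 To count
   item = ItemExtract(i, sorted, " ")
   ; keep the item only when it differs from the previous one
   If i == 1 || StriCmp(item, prev) != 0
      If unique == ""
         unique = item
      Else
         unique = unique : @LF : item
      EndIf
   EndIf
   prev = item
Next
Message("Unique (sorted)", unique)
```

A side effect of this approach is that the result comes out sorted rather than in first-seen order.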
Once again my rudimentary skills sent me down a rabbit hole and I ended up painted into a corner. I sincerely appreciate you pointing me to some exits!
Thank you Tony!
If you want to jump into WB's CLR, there is System.Collections.Generic.HashSet, which can take your input and remove duplicates.
Example of using map to find unique values
; list of words
list = "a b c d e b ff r e d f b c b s d e s a s d"
list = list : " " ; last word gets a space
list = strreplace(list," ","=") ; switch space to better delimiter
; all the work is done here
map = mapcreate(list,",","=") ; make it into a map
; make the map keys into a string for presentation
lyst = ""
foreach this in map
lyst = lyst : @lf : this
next
;present
message("unique",lyst)
Quote from: kdmoyers on June 10, 2020, 05:11:09 AM
Example of using map to find unique values
This is so simple and elegant!
Thank you!
The map code is preferable.... but for giggles [ I couldn't get a HashSet to work, but Hashtable does]
oHash = ObjectClrNew("System.Collections.Hashtable")
list = "a b c d e b ff r e d f b c b s d e s a s d"
n= ItemCount(list," ")
newlist = ""
For i = 1 To n ; loop through n so the last item is not skipped
key=ItemExtract(i,list," ")
If ! oHash.ContainsKey(key)
oHash.Add(key,",")
newlist=newlist:key:@LF
Endif
Next
nCnt = oHash.Count
Message("Original Count:":n,"HashCount:":nCnt)
Message("New List",newlist)
oHash=0
Exit
Quote from: stanl on June 11, 2020, 03:52:52 AM
The map code is preferable.... but for giggles [ I couldn't get a HashSet to work, but Hashtable does]
You guys are ROCK STARS!
When you mentioned the CLR options I looked them up in the WIL Consolidated help file. When I saw all the .NET information (which is totally foreign to me) I realized that stuff is so far above my pay grade that I didn't explore it any further.
Your (also simple and elegant) example makes it look like it might be doable by a novice like me.
Thank you Stan!
For grins, I added the following code to the first script to get the list sorted. There might be a better way, but for now it works for me, and it was not something I had ever tried previously.
; -----------------------------------------
i = 0
myArray = arrDimension(100)
foreach this in map
;message("This contains",this)
myArray[i] = this
i = i +1
next
; Sort the array using an ascending intuitive sort
;ArraySort( myArray, @ASCENDING|@StringSort, 0, 0, ArrInfo(myArray,1) -1 )
ArraySort( myArray,@ASCENDING,0,0,i-1)
strList = ArrayItemize( myArray, @LF )
message("Sorted List",strList)
Just a random thought but perhaps we need to add a function that places all the keys in a map into either an array or item list. Something like:
list_or_array = MapKeysGet(map, return_type[,delimiter (optional)]) where "return_type" is set to 1 to indicate a list and 2 to indicate an array function return type.
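Until something like that exists, a small UDF can stand in for the "list" flavor of the idea. This is only a sketch: udfMapKeysToList is a hypothetical name, MapKeysGet itself is just a suggestion in this thread, and I'm assuming a map can be passed to a UDF the same way an array can. The loop is the same ForEach over the map used in the earlier example.

```winbatch
; Hypothetical helper, not a built-in: collect the keys of a WIL map
; into a delimited item list.
#DefineFunction udfMapKeysToList(map, delim)
   keys = ""
   ForEach key In map
      If keys == ""
         keys = key
      Else
         keys = keys : delim : key
      EndIf
   Next
   Return keys
#EndFunction

; Usage with made-up data
map = MapCreate()
map["Cat"] = 1
map["Dog"] = 1
map["Cat"] = 1      ; assigning to an existing key does not add a second entry
Message("Keys", udfMapKeysToList(map, @LF))
```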
Tony,
I often use a couple of functions like that with JavaScript's Maps: map.values() and map.keys(). I find them useful. I'd never say "no" to a new feature like that!
Quote from: td on June 11, 2020, 01:09:18 PM
Just a random thought but perhaps we need to add a function that places all the keys in a map into either an array or item list. Something like:
list_or_array = MapKeysGet(map, return_type[,delimiter (optional)]) where "return_type" is set to 1 to indicate a list and 2 to indicate an array function return type.
This is where I'm heading. My raw file is tab delimited in the format "Date - Columns of other info - Names - Columns of other info".
Steps:
- Extract date (for filtering)
- Extract "unique" name
- Sort names (thanks KeithW) and insert into an "ItemList" for use in a "DropListBox"
So your "random thought" appears to be an exact solution for my current project!
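Those steps might look roughly like the sketch below. Everything specific in it is an assumption: the sample rows are invented, the name is taken from column 3 only because the real layout isn't shown, the date filter is left out, and the final list is tab delimited on the guess that this is what the DropListBox wants.

```winbatch
; Sketch only. Sample rows and the column position of the name are
; assumptions; adjust nameCol to match the real file.
nameCol = 3
data = "2020-06-01" : @TAB : "x" : @TAB : "Dog" : @TAB : "y" : @LF
data = data : "2020-06-02" : @TAB : "x" : @TAB : "Cat" : @TAB : "y" : @LF
data = data : "2020-06-03" : @TAB : "x" : @TAB : "Dog" : @TAB : "y"

; Collect the names as map keys so duplicates collapse on their own
map  = MapCreate()
rows = ItemCount(data, @LF)
For r = 1 To rows
   line = ItemExtract(r, data, @LF)
   name = ItemExtract(nameCol, line, @TAB)
   If name != ""
      map[name] = 1
   EndIf
Next

; Turn the keys into a tab-delimited item list and sort it
names = ""
ForEach key In map
   If names == ""
      names = key
   Else
      names = names : @TAB : key
   EndIf
Next
names = ItemSort(names, @TAB)
Message("Sorted unique names", StrReplace(names, @TAB, @LF))
```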
Just to throw in another option...
Recently, there was a request for an example on how to generate a UUID/GUID value, and using the .Net CLR easily provides that functionality via a static class method on the System.Guid class. However, WinBatch cannot readily access static class methods from the class type itself and requires an instance of the class to be created so that the method can be invoked as if it were an instance method instead of a static method.
Also, there was another discussion regarding WinBatch having difficulty creating instances of generic classes, such as "System.Collections.Generic.SortedSet<>". Although WinBatch's ObjectClrNew() function doesn't have a graceful way to utilize generics, PowerShell can easily do this, and WinBatch can consume PowerShell.
In the event that you actually want to use a specific generic collection class instead of the more general object collections or the built-in WIL Map data object, the following proof-of-concept code snippet demonstrates how to use WinBatch to invoke PowerShell to create an instance of a generic collection class [SortedSet<string>], populate it with some values, and then enumerate through the values. This could easily be wrapped up in a UDF for convenient usage.
ObjectClrOption("UseAny", "System.Management.Automation")
ObjectClrOption("UseAny", "System.Collections")
oAutoPs = ObjectClrNew("System.Management.Automation.PowerShell")
oPowerShell = oAutoPs.Create()
oPowerShell.AddCommand("New-Object")
oPowerShell.AddArgument("System.Collections.Generic.SortedSet[System.String]")
oAsync = oPowerShell.BeginInvoke()
oPsCollection = oPowerShell.EndInvoke(oAsync)
oSortedSet = oPsCollection.Item(0).BaseObject()
oSortedSet.Add("C")
oSortedSet.Add("B")
oSortedSet.Add("A")
Pause( 'oSortedSet', 'Count = ' : oSortedSet.Count )
items = ""
foreach item in oSortedSet
if (items == "")
items = item
else
items = items : " " : item
endif
next
Pause( 'oSortedSet', 'Items = ' : items )
Quote from: ChuckC on June 12, 2020, 02:31:43 PM
Although WinBatch's ObjectClrNew() function doesn't have a graceful way to utilize generics, PowerShell can easily do this, and WinBatch can consume PowerShell.
Learned that the hard way :-X
oHash = ObjectClrNew("System.Collections.Hashtable") ; works fine
oHash = ObjectClrNew("System.Collections.Generic.HashSet") ; gives CLR Type error
but in Powershell
$HashSetTest = [System.Collections.Generic.HashSet[string]]::new()
$FileExtList = (Get-ChildItem -LiteralPath $env:TEMP -File).Extension
$FileExtList.Where({$_}).ForEach({[void]$HashSetTest.Add($_)})
$HashSetTest.GetType()
'=' * 40
$HashSetTest.Count
'=' * 40
$HashSetTest
'=' * 40
My how times have changed. During the course of this topic, no one bothered to mention the old reliable "scripting.dictionary" COM Automation object as a way to create a list of unique items.
Quote from: td on June 15, 2020, 07:54:23 AM
My how times have changed. During the course of this topic, no one bothered to mention the old reliable "scripting.dictionary" COM Automation object as a way to create a list of unique items.
Ha ha! You're right, and that is what I used to use. I guess Map just knocked that right outa my head. (laugh)
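For anyone who hasn't seen the dictionary approach, here is a minimal sketch modeled on the Hashtable example earlier in the topic. The sample list and variable names are made up, and the code has not been tuned for large lists.

```winbatch
; Sketch only: unique items via the Scripting.Dictionary COM object.
list  = "Cat Dog Cat Cat Dog Bird Lion Lion"
oDict = ObjectCreate("Scripting.Dictionary")
; The dictionary compares keys case-sensitively by default.
; Uncomment the next line, before adding any keys, to ignore case.
;oDict.CompareMode = 1
count   = ItemCount(list, " ")
newlist = ""
For i = 1 To count
   item = ItemExtract(i, list, " ")
   If ! oDict.Exists(item)
      oDict.Add(item, "")
      newlist = newlist : item : @LF
   EndIf
Next
Message("Original Count: " : count, "Unique Count: " : oDict.Count)
Message("New List", newlist)
oDict = 0
```

Unlike the sorted approach, this keeps the items in the order they were first seen.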