Extract unique items

Started by jmburton2001, June 08, 2020, 08:00:55 AM


jmburton2001

Hello!

I'm trying to parse a text list and extract only the unique items in that list. Right now it's approximately 6,000 items and growing.

Example:

I have a list similar to the following.

Cat
Dog
Cat
Cat
Dog
Bird
Cat
Bird
Giraffe
Dog
Bird
Dog
Lion
Lion
Cat
Giraffe
Bird
Bird

And I'd like to generate a concise list of unique items in that list like this.

Cat
Dog
Bird
Giraffe
Lion

Thanks in advance for your pointers!

td

The simplest solution would be to place all the items into a WIL Map. You don't need to test for the existence of a string key before placing the string into the map. This works because all WIL Map keys must be unique.

Another solution would be to follow the traditional approach to deduping a list: place your strings into an array and then sort the array. Once you have the sorted array, step through it, comparing each element with the previous one and adding elements to another array (or whatever containing data structure you choose). When the current and previous elements are the same, skip placing the current element into the second array.
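
The sort-and-skip approach might look something like the following in WinBatch. This is only a sketch: the sample list and variable names are made up for illustration, and it collects the results into an @LF-delimited string rather than a second array.

```winbatch
; Sketch of sort-then-skip deduping; names and sample data are illustrative only
list  = "Cat Dog Cat Cat Dog Bird Cat Bird Giraffe Dog Bird Dog Lion Lion Cat Giraffe Bird Bird"
words = Arrayize(list, " ")          ; item list -> one-dimensional array
ArraySort(words, @ASCENDING)         ; duplicates are now adjacent

unique = ""
last = ArrInfo(words, 1) - 1         ; index of the last element
For i = 0 To last
   If i == 0
      unique = words[i]              ; the first element is always kept
   Else
      If words[i] != words[i-1]      ; keep only when it differs from the previous element
         unique = unique : @LF : words[i]
      EndIf
   EndIf
Next

Message("Unique items", unique)
```

Note that the result comes out in sorted order, not in the order the items first appeared.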
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

jmburton2001

Once again my rudimentary skills sent me down a rabbit hole and I ended up painted into a corner. I sincerely appreciate you pointing me to some exits!

Thank you Tony!

stanl

If you want to jump into WB's CLR support, there is System.Collections.Generic.HashSet, which can take your input and remove duplicates.

kdmoyers

Example of using map to find unique values
Code (winbatch)
; list of words
list = "a b c d e b ff r e d f b c b s d e s a s d"

list = list : " "               ; last word gets a space
list = strreplace(list," ","=") ; switch space to better delimiter

; all the work is done here
map = mapcreate(list,",","=")   ; make it into a map

; make the map keys into a string for presentation
lyst = ""
foreach this in map
  lyst = lyst : @lf : this
next

;present
message("unique",lyst)
The mind is everything; What you think, you become.

jmburton2001

Quote from: kdmoyers on June 10, 2020, 05:11:09 AM
Example of using map to find unique values

This is so simple and elegant!
Thank you!

stanl

The map code is preferable.... but for giggles [ I couldn't get a HashSet to work, but Hashtable does]
Code (winbatch)
oHash = ObjectClrNew("System.Collections.Hashtable")
list = "a b c d e b ff r e d f b c b s d e s a s d"
n= ItemCount(list," ")
newlist = ""
For i = 1 To n
   key=ItemExtract(i,list," ")
   If ! oHash.ContainsKey(key)
      oHash.Add(key,",")
      newlist=newlist:key:@LF
   Endif
Next
nCnt = oHash.Count                               
Message("Original Count:":n,"HashCount:":nCnt)
Message("New List",newlist)
oHash=0
Exit

jmburton2001

Quote from: stanl on June 11, 2020, 03:52:52 AM
The map code is preferable.... but for giggles [ I couldn't get a HashSet to work, but Hashtable does]

You guys are ROCK STARS!

When you mentioned the CLR options I looked them up in the WIL Consolidated help file. When I saw all the .NET information (which is totally foreign to me) I realized that stuff is so far above my pay grade that I didn't explore it any further.

Your (also simple and elegant) example makes it look like it might be doable by a novice like me.

Thank you Stan!

KeithW

For grins, I added the following code to the first script to get the list sorted...
There might be a better way, but for now it works for me, and it was not something
I had ever tried previously.




; -----------------------------------------
i = 0
myArray = arrDimension(100)   ; fixed size: assumes the map holds at most 100 keys
foreach this in map
   ;message("This contains",this)
   myArray[i] = this
   i = i +1
next

; Sort the array using an ascending intuitive sort
;ArraySort( myArray, @ASCENDING|@StringSort, 0, 0, ArrInfo(myArray,1) -1 )
ArraySort( myArray,@ASCENDING,0,0,i-1)
strList = ArrayItemize( myArray, @LF )
message("Sorted List",strList)


td

Just a random thought, but perhaps we need to add a function that places all the keys in a map into either an array or an item list. Something like:

list_or_array = MapKeysGet(map, return_type[,delimiter (optional)]) where "return_type" is set to 1 to return an item list or 2 to return an array.
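
In the meantime, the idea can be approximated with a small user-defined function. MapKeysToList below is a hypothetical helper written for this sketch, not a built-in WIL function, and it only covers the item-list case.

```winbatch
; Hypothetical helper (not a built-in): return a map's keys as a delimited item list
#DefineFunction MapKeysToList(map, delim)
   keys  = ""
   count = 0
   ForEach key In map
      If count == 0
         keys = key
      Else
         keys = keys : delim : key
      EndIf
      count = count + 1
   Next
   Return keys
#EndFunction

; Usage: assigning to an existing key does not create a second entry
animals = MapCreate()
animals["Cat"] = 1
animals["Dog"] = 1
animals["Cat"] = 1
Message("Keys", MapKeysToList(animals, @LF))
```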

mhall

Tony,

I often use a couple of functions like that with JavaScript's Maps: map.values() and map.keys(). They come in handy, and I'd never say "no" to a new feature like that!

jmburton2001

Quote from: td on June 11, 2020, 01:09:18 PM
Just a random thought but perhaps we need to add a function that places all the keys in a map into either an array or item list.  Something like:

list_or_array = MapKeysGet(map, return_type[,delimiter (optional)]) where "return_type" is set to 1 to indicate a list and 2 to indicate an array function return type.

This is where I'm heading. My raw file is tab delimited in the format "Date - Columns of other info - Names - Columns of other info"

Steps:

  • Extract date (for filtering)
  • Extract "unique" name
  • Sort names (thanks KeithW) and insert into an "ItemList" for use in a "DropListBox"

So your "random thought" appears to be an exact solution for my current project!
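
Those steps could be sketched roughly as follows. The file path and the column number are assumptions made up for the example, and the date filtering step is left out.

```winbatch
; Sketch only: the file name and the column number are assumptions
file    = "C:\data\rawlist.txt"     ; hypothetical path to the tab-delimited file
nameCol = 3                          ; hypothetical 1-based column holding the name

names  = MapCreate()
handle = FileOpen(file, "READ")
While @TRUE
   line = FileRead(handle)
   If line == "*EOF*" Then Break
   name = ItemExtract(nameCol, line, @TAB)
   If name != "" Then names[name] = 1     ; repeated names land on the same key
EndWhile
FileClose(handle)

; Collect the keys into a tab-delimited item list, then sort it
nameList = ""
ForEach name In names
   nameList = ItemInsert(name, -1, nameList, @TAB)
Next
nameList = ItemSort(nameList, @TAB)
```

The resulting nameList is a sorted, tab-delimited item list of the kind a dialog's DropListBox control can be fed.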

ChuckC

Just to throw in another option...

Recently, there was a request for an example on how to generate a UUID/GUID value, and using the .Net CLR easily provides that functionality via a static class method on the System.Guid class.  However, WinBatch cannot readily access static class methods from the class type itself and requires an instance of the class to be created so that the method can be invoked as if it were an instance method instead of a static method.

Also, there was another discussion regarding WinBatch having difficulty creating instances of generic classes, such as "System.Collections.Generic.SortedSet<>".  Although WinBatch's ObjectClrNew() function doesn't have a graceful way to utilize generics, PowerShell can easily do this, and WinBatch can consume PowerShell.

In the event that you actually want to use a specific generic collection class instead of the more general object collections or the built-in WIL Map data object, the following proof-of-concept code snippet demonstrates how to use WinBatch to invoke PowerShell to create an instance of a generic collection class [SortedSet<string>], populate it with some values, and then enumerate through the values.  This could easily be wrapped up in a UDF for convenient usage.


ObjectClrOption("UseAny", "System.Management.Automation")
ObjectClrOption("UseAny", "System.Collections")

oAutoPs = ObjectClrNew("System.Management.Automation.PowerShell")
oPowerShell = oAutoPs.Create()
oPowerShell.AddCommand("New-Object")
oPowerShell.AddArgument("System.Collections.Generic.SortedSet[System.String]")

oAsync = oPowerShell.BeginInvoke()
oPsCollection = oPowerShell.EndInvoke(oAsync)
oSortedSet = oPsCollection.Item(0).BaseObject()

oSortedSet.Add("C")
oSortedSet.Add("B")
oSortedSet.Add("A")

Pause( 'oSortedSet', 'Count = ' : oSortedSet.Count )

items = ""
foreach item in oSortedSet
   if (items == "")
      items = item
   else
      items = items : " " : item
   endif
next

Pause( 'oSortedSet', 'Items = ' : items )

stanl

Quote from: ChuckC on June 12, 2020, 02:31:43 PM
Although WinBatch's ObjectClrNew() function doesn't have a graceful way to utilize generics, PowerShell can easily do this, and WinBatch can consume PowerShell.



Learned that the hard way :-X
oHash = ObjectClrNew("System.Collections.Hashtable")   ; works fine
oHash = ObjectClrNew("System.Collections.Generic.HashSet") ; gives CLR Type error


but in Powershell


$HashSetTest = [System.Collections.Generic.HashSet[string]]::new()
$FileExtList = (Get-ChildItem -LiteralPath $env:TEMP -File).Extension


$FileExtList.Where({$_}).ForEach({[void]$HashSetTest.Add($_)})


$HashSetTest.GetType()
'=' * 40
$HashSetTest.Count
'=' * 40
$HashSetTest
'=' * 40

td

My, how times have changed. During the course of this topic, no one bothered to mention the old reliable "scripting.dictionary" COM Automation object as a way to create a list of unique items.
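
For comparison, a dictionary-based version might look like the sketch below. It mirrors the Hashtable example earlier in the topic; the sample list and variable names are illustrative only.

```winbatch
; Sketch using the Scripting.Dictionary COM object; sample data is illustrative
list = "a b c d e b ff r e d f b c b s d e s a s d"
dict = ObjectCreate("Scripting.Dictionary")

unique = ""
n = ItemCount(list, " ")
For i = 1 To n
   word = ItemExtract(i, list, " ")
   If !dict.Exists(word)              ; first time this word has been seen
      dict.Add(word, 1)
      unique = unique : word : @LF
   EndIf
Next

Message("Unique count: " : dict.Count, unique)
dict = 0                              ; release the COM object
```

Here the unique items keep the order in which they first appear in the list.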

kdmoyers

Quote from: td on June 15, 2020, 07:54:23 AM
My, how times have changed. During the course of this topic, no one bothered to mention the old reliable "scripting.dictionary" COM Automation object as a way to create a list of unique items.
Ha ha! You're right, and that is what I used to use.  I guess Map just knocked that right outta my head. (laugh)