Array2Map2Array for duplicates

Started by spl, March 01, 2025, 06:33:00 AM


spl

There is probably a better method to remove duplicates from an array without involving map keys, but the code below works. I have two questions:
  • I tried using MapKeysGet(unduped,1) to convert the keys straight to an array without having to rebuild it from a list, but I get an error.
  • Is there a way to get the final array in the same order as the original, with the duplicates removed?

I would love to see MapAdd(), MapRemove(), and MapSort().
;Winbatch 2025A - using map to remove dupes from array
planets = 'Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto'
arr = Arrayize( planets, ',' )
duplicates= ArrayItemize(arr,@lf)
Message('Duplicate Planets', duplicates) 
count = ArrInfo (arr, 1) -1
unduped = MapCreate()

for i=0 to count
   unduped[arr[i]] = i
Next

distinctplanets = MapKeysGet(unduped,2,",")
arr = Arrayize( distinctplanets, ',' )
unduped = ArrayItemize(arr,@lf)
Message('Distinct Planets', unduped) 
Exit
Stan - formerly stanl [ex-Pundit]

td

WIL maps are an unordered associative container. By definition, they do not preserve any order. I tried your script using MapKeysGet(unduped,1) and did not get an error.

Here is an example of one of many approaches to preserving order.
;Winbatch 2025A - using map to remove dupes from array
planets = 'Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto'
arr = Arrayize( planets, ',' )
duplicates= ArrayItemize(arr,@lf)
;Message('Duplicate Planets', duplicates)
count = ArrInfo (arr, 1) -1
unduped = MapCreate()

for i=0 to count
   unduped[arr[i]] = i
Next

; The order is preserved in the map element values.
foreach planet in unduped
   distinctplanets[unduped[planet]] = planet
next

unduped = ArrayItemize(distinctplanets,@lf)
Message('Distinct Planets', unduped) 
exit
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

spl

Quote from: td on March 01, 2025, 07:49:37 AM
WIL maps are an unordered associative container. By definition, they do not preserve any order. I tried your script using MapKeysGet(unduped,1) and did not get an error.


Thanks. I suppose I should have asked about preserving the 'natural order', i.e. the order of the original array with the duplicates removed: in this case Mercury,Mars,Venus,Earth,Jupiter,Saturn,Uranus,Neptune,andPluto, with the duplicates of Mars and Earth removed (even though Mars is out of place if the original array was meant to follow distance from the sun). Even your code leaves them out of place after de-duping. It is not an issue or worth arguing about, unless the array held integers and the result was expected to be ordered, in which case you would still want to sort the keys. I'll see how placing the planets, duplicates included, into a data table and applying SELECT DISTINCT... works.
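The data-table idea can be sketched with Python's built-in sqlite3 module standing in for the data table (an assumption; spl did not say which engine). One caution: plain SELECT DISTINCT removes duplicates, but SQL does not guarantee row order without an ORDER BY. This sketch therefore groups by name and orders each name by the first rowid it received, which gives first-occurrence order.

```python
import sqlite3

# Same data as the WIL script, split on commas like Arrayize(planets, ',')
planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE planets (name TEXT)")
# Rows inserted in sequence get increasing rowids, recording the original array order
con.executemany("INSERT INTO planets (name) VALUES (?)", [(p,) for p in planets])

# GROUP BY collapses duplicates; ORDER BY MIN(rowid) sorts each name by its first position
rows = con.execute(
    "SELECT name FROM planets GROUP BY name ORDER BY MIN(rowid)"
).fetchall()
con.close()

distinct = [name for (name,) in rows]
print("\n".join(distinct))
```

Without the ORDER BY clause the duplicates would still be removed, but the output order would be up to the engine.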

Again, thanks for replying.

[EDIT]
Change unduped[arr[i]] = i to unduped[arr[i]] = 1 ;or @True

My code is unchanged, yours only finds Saturn [at least when I tested].

td

You are correct. The example I posted does not preserve natural order, as it makes no decision about which duplicate to use. It just uses the index of the last duplicate of a set of duplicates. If you reverse the order of the loop that loads the WIL map to iterate from high to low, you get the first duplicate instead of the last duplicate. But that does not create the "natural" order.
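The effect of loop direction can be seen in a small Python sketch (Python rather than WIL, with a dict standing in for the WIL map; load_index_map and place_by_index are hypothetical helpers). Each key keeps whichever index was written to it last, so a low-to-high loop stores the last occurrence and a high-to-low loop stores the first. Writing the keys back at those indices leaves empty slots where the other duplicates were.

```python
def load_index_map(items, high_to_low=False):
    """Map each distinct item to an array index, as the WIL loading loop does.

    Later writes overwrite earlier ones, so loading low-to-high keeps each
    item's LAST index and loading high-to-low keeps its FIRST index.
    """
    positions = {}
    order = range(len(items) - 1, -1, -1) if high_to_low else range(len(items))
    for i in order:
        positions[items[i]] = i
    return positions


def place_by_index(items, positions):
    """Write each key into a result array at its stored index (None marks an empty slot)."""
    result = [None] * len(items)
    for key, i in positions.items():
        result[i] = key
    return result


planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")
last = place_by_index(planets, load_index_map(planets))
first = place_by_index(planets, load_index_map(planets, high_to_low=True))
print(last)
print(first)
```

In last, Mars and Earth land where their final duplicates were (indices 6 and 10). In first, skipping the None slots leaves the elements in first-occurrence order.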

I am unsure what you mean by "My code is unchanged, yours only finds Saturn [at least when I tested]." The whole point of using the array index as the value is to preserve the order.

td

As an exercise.

planets = ArrDimension(12,2)

planets[0, 0] = 'Mercury'
planets[0, 1] = 35
planets[1, 0] = 'Mars'
planets[1, 1] = 142
planets[2, 0] = 'Venus'
planets[2, 1] = 67 
planets[3, 0] = 'Earth'
planets[3, 1] = 93
planets[4, 0] = 'Mars'
planets[4, 1] = 142
planets[5, 0] = 'Jupiter'
planets[5, 1] = 484
planets[6, 0] = 'Mars'
planets[6, 1] = 142
planets[7, 0] = 'Saturn'
planets[7, 1] = 889
planets[8, 0] = 'Uranus'
planets[8, 1] = 1790
planets[9, 0] = 'Neptune'
planets[9, 1] = 2800
planets[10, 0] = 'Earth'
planets[10, 1] = 93
planets[11, 0] = 'Pluto'
planets[11, 1] = 3670

ArraySort(planets, @ASCENDING, 1)

; Dedup in order 
nmax = ArrInfo(Planets, 1) - 1
index = 0
dedupped[index] = Planets[0,0]
for i = 1 to nmax
   if StriCmp(dedupped[index], Planets[i,0]) == 0 then continue
   index += 1
   dedupped[index] = Planets[i,0]
next

orderedplanets = ArrayItemize(dedupped,@lf)
Message('Distinct Planets', orderedplanets) 
 
exit
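The same sort-then-dedupe idea as a short Python sketch (Python rather than WIL; (name, distance) tuples stand in for the two-column array). Once the rows are sorted on the second column, duplicates sit next to each other, so one pass comparing each name with the last one kept is enough. Here casefold() plays the role of StriCmp's case-insensitive comparison.

```python
# (name, distance) rows, mirroring the 12x2 WIL array
planets = [
    ("Mercury", 35), ("Mars", 142), ("Venus", 67), ("Earth", 93),
    ("Mars", 142), ("Jupiter", 484), ("Mars", 142), ("Saturn", 889),
    ("Uranus", 1790), ("Neptune", 2800), ("Earth", 93), ("Pluto", 3670),
]

# Sort ascending on the second column, like ArraySort(planets, @ASCENDING, 1)
rows = sorted(planets, key=lambda row: row[1])

deduped = []
for name, _distance in rows:
    # Duplicates are now adjacent: skip a name equal (ignoring case) to the last one kept
    if deduped and name.casefold() == deduped[-1].casefold():
        continue
    deduped.append(name)

print("\n".join(deduped))
```

This only works when equal names always share the same sort value, as they do here, and the result comes out in sorted order rather than the original array order.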

spl

Quote from: td on March 01, 2025, 02:13:22 PM
I am unsure what you mean by "My code is unchanged, yours only finds Saturn [at least when I tested]." The whole point of using the array index as the value is to preserve the order.

First, thank you for the 'exercise'. As for your response above: if you run the code below, the final message should show only Saturn. It merely substitutes 1 for i in the loop that creates the unduped map, and since 1 is a value and not a key, duplicate values should be allowed in a map. So I am confused about why the foreach loop that follows does not cover all the unduped planets.
;Winbatch 2025A - using map to remove dupes from array
planets = 'Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto'
arr = Arrayize( planets, ',' )
duplicates= ArrayItemize(arr,@lf)
;Message('Duplicate Planets', duplicates)
count = ArrInfo (arr, 1) -1
unduped = MapCreate()

for i=0 to count
   unduped[arr[i]] = 1
Next

; The order is preserved in the map element values.
foreach planet in unduped
   distinctplanets[unduped[planet]] = planet
next

unduped = ArrayItemize(distinctplanets,@lf)
Message('Distinct Planets', unduped)
exit

For fun, I tried the approach below, assuming I could give the map a natural order matching the initial array. I assumed that building the map one planet at a time, first checking whether the planet already existed, would make the keys come out in that order from MapKeysGet(). Now I think Array2Array rather than Array2Map is the better solution for removing duplicates while preserving natural order.
;Winbatch 2025A - using map to remove dupes from array
planets = 'Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto'
arr = Arrayize( planets, ',' )
duplicates= ArrayItemize(arr,@lf)
Message('Duplicate Planets', duplicates) 
count = ArrInfo (arr, 1) -1
unduped = MapCreate()

for i=0 to count
   key = arr[i]
   display(1,'Element %i%',key) ;indicate array natural order
   If !MapKeyExist(unduped,key)
      unduped[key] = 1
   Endif
Next

distinctplanets = MapKeysGet(unduped,2,",")
arr = Arrayize( distinctplanets, ',' )
unduped = ArrayItemize(arr,@lf)
Message('Distinct Planets', unduped) 
Exit
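An Array2Array approach, sketched in Python rather than WIL (array_to_array_unique is a hypothetical name), skips the map entirely: walk the input and append an element only if it is not already in the output. The linear scan of the output makes it roughly O(n²), which is harmless for a dozen planets but slows down on large arrays.

```python
def array_to_array_unique(items):
    """Return items with duplicates removed, keeping first-occurrence order.

    No map or set is used: each element is checked against the output
    built so far, so the cost grows with n times the number of distinct items.
    """
    result = []
    for item in items:
        if item not in result:  # linear scan of the output array
            result.append(item)
    return result


planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")
print("\n".join(array_to_array_unique(planets)))
```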

td

This loop produces a 2-element array with the first element being undefined:

; The order is preserved in the map element values.
foreach planet in unduped
  distinctplanets[unduped[planet]] = planet
next

It is equivalent to repeatedly executing the line "distinctplanets[1] = planet", because "unduped[planet]" always contains the value 1.
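The collision is easy to reproduce in a Python sketch, with a dict standing in for the WIL map and another dict standing in for the sparse result array. Since every value is 1, each pass overwrites slot 1 and slot 0 is never written, so the survivor is simply the last key visited. Python dicts iterate in insertion order, so here it is andPluto; the WIL map in spl's test presumably visited Saturn last.

```python
planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")

unduped = {}
for planet in planets:
    unduped[planet] = 1  # every key gets the same value

distinctplanets = {}  # sparse "array": index -> element
for planet in unduped:
    # unduped[planet] is always 1, so every pass overwrites slot 1
    distinctplanets[unduped[planet]] = planet

print(distinctplanets)
```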

Your second example uses a map as a form of index, a common approach I use regularly with unsorted data. It has the advantage of not producing empty elements in the final result array. I am less sure what you mean by "natural order." As far as I can tell, it does not preserve the first-occurrence order of the array's elements.

If you assume the first occurrence of an element is in the order you wish, simply load the map from the array's high index to low index.
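A Python sketch of that suggestion (Python rather than WIL; first_occurrence_order is a hypothetical name): load the map from the highest index down so each key keeps the index of its first occurrence, then order the keys by that index. Sorting by the stored index, instead of writing into a result array at that index, also avoids the empty elements.

```python
def first_occurrence_order(items):
    """Distinct items ordered by where each first appears."""
    first_index = {}
    # High-to-low load: the earliest occurrence is written last, so it wins
    for i in range(len(items) - 1, -1, -1):
        first_index[items[i]] = i
    # Order the distinct keys by their first index; no gaps in the result
    return sorted(first_index, key=first_index.get)


planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")
print("\n".join(first_occurrence_order(planets)))
```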

spl

Quote from: td on March 03, 2025, 07:56:39 AM
If you assume the first occurrence of an element is in the order you wish, simply load the map from the array's high index to low index.

Yep. Time for a UniqueArray(inputArray) [returns unique elements] UDF.

[EDIT]
As for my probable misuse of 'natural order': I meant removing duplicates while keeping the elements in the original array order, based on the first position of each duplicated element.
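A UniqueArray along those lines, sketched in Python rather than as a WIL UDF (unique_array and its ignore_case flag are hypothetical): a set records what has already been seen, and the output keeps each element at its first position. The ignore_case option mirrors the StriCmp comparison used earlier in the thread.

```python
def unique_array(input_array, ignore_case=False):
    """Return the unique elements of input_array in first-occurrence order."""
    seen = set()
    result = []
    for item in input_array:
        key = item.casefold() if ignore_case else item
        if key not in seen:  # constant-time membership test instead of scanning result
            seen.add(key)
            result.append(item)  # keep the spelling of the first occurrence
    return result


planets = "Mercury,Mars,Venus,Earth,Mars,Jupiter,Mars,Saturn,Uranus,Neptune,Earth,andPluto".split(",")
print(unique_array(planets))
print(unique_array(["Mars", "mars", "Earth"], ignore_case=True))
```

The set makes each lookup cheap, so this scales better than scanning the output array, at the cost of holding a second copy of the keys.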