Hello everyone,
I have an output.txt file like below. I would like to find all the duplicate server names in the first column and add up all the numbers in the second column (columns are separated by a comma delimiter) onto a single line (using WinBatch). For example, I need to look through the output.txt file and find all the "server1" lines, of which there are 3; then I combine all "server1" lines into 1 line and add up all the numbers from the second column (79+79+79=237), and it should become:
server1,237,VSP,01/27/25,,1262,
Same process for all other servers
Does anyone have a sample, or can someone give me a hint?
Thanks in advance!
KLE2
output.txt
***********************
server1,79,VSP,01/27/25,,1262,
server1,79,VSP,01/27/25,,1263,
server1,79,VSP,01/27/25,,1264,
server2,100,VSP,01/27/25,,1265,
server6,100,VSP,01/27/25,,1266,
server2,100,VSP,01/27/25,,1268,
server5,430,VSP,01/27/25,,1294,
server2,100,VSP,01/27/25,,1269,
server2,100,VSP,01/27/25,,1276,
server2,380,VSP,01/27/25,,1277,
server2,380,VSP,01/27/25,,1278,
server2,380,VSP,01/27/25,,1279,
server3,80,VSP,01/27/25,afps,1291,
server2,430,VSP,01/27/25,,1292,
server2,430,VSP,01/27/25,,1293,
server6,430,VSP,01/27/25,,1294,
server6,430,VSP,01/27/25,,1295,
server2,430,VSP,01/27/25,,1296,
server2,430,VSP,01/27/25,,1297,
It depends on whether you want to preserve the other fields on each line to some extent. Either way, a WIL Map would be a straightforward solution. It would involve the ArrayFileGetCSV and MapKeyExist functions along with creating a map from the WIL array returned by ArrayFileGetCSV.
It may not address your problem directly and the script makes a lot of assumptions, but it may be enough to get you started.
aWithDups = ArrayFileGetCsv('Output.txt', 0)
aSums = MapCreate()
nMax = ArrInfo(aWithDups, 1) - 1
for i = 0 to nMax
   if MapKeyExist(aSums, aWithDups[i,0])     ; server name already seen: add to its total
      aSums[aWithDups[i,0]] += aWithDups[i,1]
   else                                      ; first occurrence: start its total
      aSums[aWithDups[i,0]] = aWithDups[i,1]
   endif
next
MapFilePutCsv('Summed.txt', aSums)
Message('Example Test', FileGet('Summed.txt'))
Thanks for the quick response, TD!
Yes, I only want to combine the duplicated names and add up the second column. The other columns can be anything; I am not worried much about them.
I got an error "Uninitialized variable, undefined function..." for the aSums = MapCreate() line. Is there any extender I need to add?
Thanks,
kle2
Maybe that message means your version does not know the MapCreate function.
What version of WinBatch are you running?
MapCreate came in at version 2019A, I think.
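If you're not sure, I think something like this will tell you (I believe Version() reports the running WinBatch version, but I'm going from memory):

Message("WinBatch version", Version())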
Quote from: kdmoyers on January 28, 2025, 03:38:30 AM
Maybe that message means your version does not know the MapCreate function.
What version of WinBatch are you running?
MapCreate came in at version 2019A, I think.
Oh...that makes sense. I am using WinBatch version 2016A. I will have to find another way :(
Thanks kdmoyers!
Quote from: kle2 on January 28, 2025, 04:46:56 AM
Oh...that makes sense. I am using WinBatch version 2016A. I will have to find another way :(
Probably a file-read / aggregate into fabricated recordset / then output with getstring(). Not as convenient as Tony's suggestion, but very workable, and not complicated to understand. I can cook up a simple script if you wish.
First, I want to clarify exactly what the desired output is. Here is the output of a script I wrote in another language (AWK, not WinBatch). Is this what you want, given the data above:
server1,237,VSP,01/27/25,,1262,
server2,3260,VSP,01/27/25,,1265,
server3,80,VSP,01/27/25,afps,1291,
server5,430,VSP,01/27/25,,1294,
server6,960,VSP,01/27/25,,1266,
Note that the data in the 3rd through fifth fields is essentially ignored, and what I did was to preserve the first line for each server name, just replacing the 2nd field with the total.
Assuming that is correct, re-writing it in WB should be straightforward, but note that, as I've mentioned a few times in the past, WB is not an ideal language for doing normal text/data processing tasks. But it can be done.
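Purely as an illustration (untested, and the file names are just placeholders), here is roughly what that line-by-line rewrite might look like in WB using only the plain Item* list functions, so it shouldn't need Map support at all:

; Untested sketch: sum field 2 per server name, keep the rest of the first line seen.
; Assumes the Item functions preserve the empty fields and that "|" never occurs in the data.
names  = ""    ; tab-delimited list of server names seen so far
firsts = ""    ; parallel list of the first line kept for each server ("|" delimited)
totals = ""    ; parallel list of running totals
hIn = FileOpen("output.txt", "READ")
While @TRUE
   line = FileRead(hIn)
   If line == "*EOF*" Then Break
   svr = ItemExtract(1, line, ",")
   num = ItemExtract(2, line, ",")
   pos = ItemLocate(svr, names, @TAB)
   If pos == 0                        ; first time this server is seen
      names  = ItemInsert(svr,  ItemCount(names, @TAB),  names,  @TAB)
      firsts = ItemInsert(line, ItemCount(firsts, "|"),  firsts, "|")
      totals = ItemInsert(num,  ItemCount(totals, @TAB), totals, @TAB)
   Else                               ; already seen: add field 2 to its running total
      totals = ItemReplace(ItemExtract(pos, totals, @TAB) + num, pos, totals, @TAB)
   EndIf
EndWhile
FileClose(hIn)
hOut = FileOpen("summed.txt", "WRITE")
For i = 1 To ItemCount(names, @TAB)
   ; put the total back into field 2 of the first line kept for this server
   FileWrite(hOut, ItemReplace(ItemExtract(i, totals, @TAB), 2, ItemExtract(i, firsts, "|"), ","))
Next
FileClose(hOut)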
Here is another approach that, while more resource-intensive, might work with the OP's version of WinBatch.
; The tried and true algorithm for deduping data...
aWithDups = ArrayFileGetCsv('Output.txt', 0)
nRows = ArrInfo(aWithDups, 1)
ArraySort( aWithDups, @ASCENDING, 0)
aSums = ArrDimension(nRows, 2)
j = 0
aSums[j,0] = aWithDups[0,0]
aSums[j,1] = aWithDups[0,1]
nMax = nRows - 1
for i = 1 to nMax
   if StriCmp(aWithDups[i-1, 0], aWithDups[i, 0])   ; nonzero means a new server name
      j += 1
      aSums[j,0] = aWithDups[i,0]
      aSums[j,1] = aWithDups[i,1]
   else                                             ; same server: accumulate column 2
      aSums[j,1] += aWithDups[i,1]
   endif
next
ArrayRedim(aSums, j+1, -1)
ArrayFilePutCsv('Summed.txt', aSums)
Message('Example Test', FileGet('Summed.txt'))
exit
Snow++ made a good point, as there were no duplicate lines in the original post. So, assuming you want all columns from the first instance of each server, with the column-2 value of every subsequent line for that server aggregated, the code below will do that, although it is very tweakable. It assumes your input is a file that can be read. The fabricated recordset is based on the number of items per the delimiter used (in this case a comma). You can be more creative with the field names. I suggested an output file but chose to display GetString() instead, as you might prefer a file output with trimmed fields. Also, there is a bit of variable substitution but, as I reflected in another thread, no harm no foul...
input = DirScript():"servers.txt"
output = DirScript():"serversaggregate.txt"
If !FileExist(input) Then Terminate(@TRUE, "Input File Not Found", input)
; Fabricated ADO recordset: one field per comma-delimited column
rs = CreateObject("ADODB.Recordset")
rs.CursorType = 1                     ; adOpenKeyset
rs.LockType = 3                       ; adLockOptimistic
rs.Fields.Append("f1",129,10,32768)   ; server name (129 = adChar)
rs.Fields.Append("f2",3,10)           ; column to aggregate (3 = adInteger)
rs.Fields.Append("f3",129,20,32)
rs.Fields.Append("f4",129,20,32)
rs.Fields.Append("f5",129,20,32)
rs.Fields.Append("f6",129,20,32)
rs.Fields.Append("f7",129,20,32)
rs.Open()
h = FileOpen(input, "READ")
While @True
   line = FileRead(h)
   If line == "*EOF*" Then Break
   svr = ItemExtract(1,line,",")
   rs.Find("f1='%svr%'")
   if rs.eof()                        ; first time this server is seen: add a new record
      rs.Addnew()
      for i = 1 to 7
         rs.Fields("f%i%").Value = ItemExtract(i,line,",")
      next
   else                               ; existing server: add column 2 to its total
      rs.Fields("f2").Value = rs.Fields("f2").Value + ItemExtract(2,line,",")
   endif
   rs.update()
   rs.MoveFirst()
Endwhile
FileClose(h)
Message("Aggregated Server", rs.Getstring(2,,",",))
rs.close()
rs = 0
exit
In AWK, I used AWK's "associative arrays" - which are arrays with string indexes. The server name is the index of the array(s); this gets you the "de-dupe" functionality, as there can only be one array element for each index (i.e., server name).
If I were doing this (in WB), I'd use the DataSaver library (written a million years ago by Marty), as that implements associative arrays in WB.
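Just to illustrate the idea (using the newer Map syntax from Tony's example rather than DataSaver's own calls, which I don't have in front of me), the index is the server name, so the totals collapse to one entry per server:

totals = MapCreate()                          ; needs 2019A or newer, as noted above
totals["server1"] = 79
totals["server1"] = totals["server1"] + 79    ; still just one "server1" entry, now 158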
Quote from: spl on January 28, 2025, 06:00:06 AM
Quote from: kle2 on January 28, 2025, 04:46:56 AM
Oh...that makes sense. I am using WinBatch version 2016A. I will have to find another way :(
Probably a file-read / aggregate into fabricated recordset / then output with getstring(). Not as convenient as Tony's suggestion, but very workable, and not complicated to understand. I can cook up a simple script if you wish.
Thanks for the offer, spl. Yes, please give me a script.
kle2
Quote from: snowsnowsnow on January 28, 2025, 06:44:12 AM
First, I want to clarify exactly what the desired output is. Here is the output of a script I wrote in another language (AWK, not WinBatch). Is this what you want, given the data above:
server1,237,VSP,01/27/25,,1262,
server2,3260,VSP,01/27/25,,1265,
server3,80,VSP,01/27/25,afps,1291,
server5,430,VSP,01/27/25,,1294,
server6,960,VSP,01/27/25,,1266,
Note that the data in the 3rd through fifth fields is essentially ignored, and what I did was to preserve the first line for each server name, just replacing the 2nd field with the total.
Assuming that is correct, re-writing it in WB should be straightforward, but note that, as I've mentioned a few times in the past, WB is not an ideal language for doing normal text/data processing tasks. But it can be done.
I can try anything as long as it works. Yes, please give me the script. Thanks, kle2
Quote from: td on January 28, 2025, 09:19:04 AM
Here is another approach that, while more resource-intensive, might work with the OP's version of WinBatch.
Wow! This works very well for my purpose. Thanks td!
One more question: what if I want to put the rest of the fields of the first row in the result? For example, the result for server1 would be:
server1,237,VSP,01/27/25,,1262,
How do I do that?
Thanks
kle2
Well, I'll be. WB now supports += (as in C)? Who knew?
Quote from: kle2 on January 28, 2025, 11:38:54 AM
Wow! This works very well for my purpose. Thanks td!
One more question: what if I want to put the rest of the fields of the first row in the result? For example, the result for server1 would be:
server1,237,VSP,01/27/25,,1262,
How do I do that?
Thanks
kle2
Just create the aSums array with the same number of columns as the aWithDups array. You then create a loop to copy the remaining columns from aWithDups to aSums whenever you find a new server name in the first column.
; The tried and true algorithm for deduping data...
aWithDups = ArrayFileGetCsv('Output.txt', 0)
nRows = ArrInfo(aWithDups, 1)
nCols = ArrInfo(aWithDups, 2)
ArraySort( aWithDups, @ASCENDING, 0)
aSums = ArrDimension(nRows, nCols)
nMaxRow = nRows - 1
nMaxCol = nCols - 1
j = 0
for k = 0 to nMaxCol                  ; copy every column of the very first row
   aSums[j,k] = aWithDups[0,k]
next
for i = 1 to nMaxRow
   if StriCmp(aWithDups[i-1, 0], aWithDups[i, 0])   ; nonzero means a new server name
      j += 1
      for k = 0 to nMaxCol            ; copy all columns of the first row for this server
         aSums[j,k] = aWithDups[i,k]
      next
   else                               ; same server: accumulate column 2
      aSums[j,1] += aWithDups[i,1]
   endif
next
ArrayRedim(aSums, j+1, -1)
ArrayFilePutCsv('Summed.txt', aSums)
Message('Example Test', FileGet('Summed.txt'))
exit
Quote from: td on January 28, 2025, 01:58:35 PM
Perfect! Thank you again, TD!