Size on Disk

Started by IJRobson, March 13, 2015, 07:06:00 AM

IJRobson

Windows "File Properties" reports both File Size and Size on Disk.

Is there a method to report the "Size on Disk" instead of the File Size?

Thanks

td

Give the following a try:

Code (winbatch):
strDisk = "C:"
strFile = strDisk:"\Logs\vss.log"

; Allocation unit (cluster) size in bytes: the product of the two DiskInfo values.
nSectorSize = DiskInfo(strDisk, 1) * DiskInfo(strDisk, 2)
nFileSize = FileSize(strFile)
if nFileSize mod nSectorSize then  nSizeOnDisk = (nFileSize/nSectorSize + 1) * nSectorSize
else nSizeOnDisk = nFileSize

Message("Size on Disk", strFile:@CRLF:nSizeOnDisk/1024:" KB")
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

IJRobson

Thanks for the code and yes this provides the theoretical size on the disk.

In my case I have a number of files where the Size on Disk is much smaller than the reported file size. I need to detect these files and remove them. I can check File Properties to see which files are truncated, but there are a lot of them, and it would be easier if WinBatch could help.

So what I am after is the "Size on Disk" value as reported within File Properties.

Thanks

td

The only way I am aware of that a file's disk size can be significantly less than its reported file size is if the file is on a drive that supports compression and the file is compressed or sparse. In that case you can detect significant differences in size by adapting something like the following to your specific needs:

Code (winbatch):
strDisk = "C:"
strFile = strDisk:"\Logs\vss.log"

; Note that the compressed or sparse size is not the same as the size on disk.
; However, the size on disk can be computed from the compressed size.
#DefineFunction CompOrSparseSize(_File)
   hSizeHigh = BinaryAlloc(4)
   nSizeLow  = DllCall("kernel32.dll",long:"GetCompressedFileSizeA", lpstr:_File, lpbinary:hSizeHigh)
   nSizeHigh = BinaryPeek4(hSizeHigh, 0)
   BinaryFree(hSizeHigh)
   return (nSizeHigh * 2.0**32) + nSizeLow
#EndFunction

nCmSpSize = CompOrSparseSize(strFile)
nFileSize = FileSizeEx(strFile)

if nFileSize > nCmSpSize then strText = "File is sparse or compressed."
else strText = "File is not sparse or compressed."

Message("Sparse or compressed file test", strText)


Assuming that by 'File Properties' you mean the size information on the 'General' tab of the Windows shell properties context menu, I am not aware of the shell providing a mechanism for obtaining the 'Size on Disk' attribute.

td

I should add that you can also get strange file size results when a file is still open for writing, but that doesn't sound pertinent in this case.

IJRobson

I tried the code and it worked. It gave me a solution to what I was trying to do, though not in the way I expected.

The problem files are downloaded by a third-party application, and sometimes a file does not fully download. The file size is set to the size of the file to be downloaded when the download starts, but the contents actually saved to the HDD are sometimes smaller because the download did not finish. Hence my need to locate and delete these files so that they are downloaded again.

Your code example returns the file size for files that downloaded successfully, or 0 if the file did not fully download. As I only need to identify these files, this works fine for me: I just delete any file for which the returned value is 0.
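
The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the folder C:\Downloads and the variable names are hypothetical, the helper is the CompOrSparseSize function posted earlier in the thread, and the script only lists suspect files rather than deleting them.

```winbatch
; Helper from earlier in the thread: size reported by GetCompressedFileSize.
#DefineFunction CompOrSparseSize(_File)
   hSizeHigh = BinaryAlloc(4)
   nSizeLow  = DllCall("kernel32.dll", long:"GetCompressedFileSizeA", lpstr:_File, lpbinary:hSizeHigh)
   nSizeHigh = BinaryPeek4(hSizeHigh, 0)
   BinaryFree(hSizeHigh)
   return (nSizeHigh * 2.0**32) + nSizeLow
#EndFunction

strDir = "C:\Downloads"            ; hypothetical folder, adjust as needed
DirChange(strDir)
lstFiles = FileItemize("*.*")      ; tab-delimited list of file names
nCount   = ItemCount(lstFiles, @TAB)
lstBad   = ""

For i = 1 To nCount
   strFile = strDir:"\":ItemExtract(i, lstFiles, @TAB)
   ; Flag files that claim a non-zero size but report 0 bytes stored.
   If (FileSizeEx(strFile) > 0) && (CompOrSparseSize(strFile) == 0)
      lstBad = lstBad:strFile:@CRLF
   EndIf
Next

Message("Suspect downloads", lstBad)
```

Reviewing the list before deleting anything is the safer choice, since a returned value of 0 was only observed to indicate an incomplete download for the files discussed in this topic.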

It would be nice to find a way to determine the "Size on Disk" value as returned by Windows, but I have a workaround.

Thanks

td

Again assuming that 'by Windows' you mean the Windows Shell (Explorer.exe), the shell could be using one of several methods to cook up the number you view in the Properties General tab. The number may involve counting the allocation sizes of multiple alternate data streams associated with the file, or perhaps it includes file data stored in the extra space of the master file table. There are other possibilities but, obviously, a file's size on disk is not as simple a metric as it sounds.

ChuckC

I'd be interested in seeing an example of this situation, as it seems to run counter to how file sizes are tracked.

The actual size of a file is the size, in bytes, of the default (unnamed) data stream, and represents the greatest offset from the beginning of the file at which data has been written. It is possible for a program performing a file download to know in advance how big the file will be, and thus allocate all of the space necessary to contain the file's data, and then fail to complete the download. In that scenario, the file may contain some portion of uninitialized data beyond the portion that was downloaded, but the overall allocation remains the same, and the size on disk should be accurate.

The size on disk, however, can actually have one of several meanings.  For a non-sparse/uncompressed file, this is the total number of allocation units required to contain the file's data, multiplied by the size of an allocation unit.  By default, the allocation unit, or cluster size, is 4KB.  So, a file that contains only a single byte of data has a size on disk of 4KB on most NTFS volumes that were formatted using the default settings.

In the case of compressed files, the size on disk is the compressed size of the file.

In the case of sparse files, the size on disk represents actual allocation units that have been used to store data in the portions of the sparse file that have data mapped in them.
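
For the non-sparse, uncompressed case, the arithmetic described above can be sketched as a small WinBatch function. ClusterRoundUp is a hypothetical name, not a built-in, and the sketch assumes both arguments are integers and that the cluster size is already known (4096 bytes is used here only because it is the default mentioned above).

```winbatch
; Hypothetical helper: round a byte count up to a whole number of clusters.
#DefineFunction ClusterRoundUp(nBytes, nClusterSize)
   ; Number of clusters needed to hold nBytes (0 bytes needs 0 clusters).
   nClusters = (nBytes + nClusterSize - 1) / nClusterSize
   return nClusters * nClusterSize
#EndFunction

; A file holding a single byte occupies one full 4 KB cluster.
Message("Size on disk for 1 byte", ClusterRoundUp(1, 4096):" bytes")
```

This is the same rounding the first script in this topic performs with mod; it does not account for compressed or sparse files.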



td

The file size formula you mention is represented by the first script posted above, because WinBatch uses the seek method to determine file size. MSFT provided the GetCompressedFileSize function because other Win32 API techniques do not accurately report the size for sparse files or files on disks with compression turned on (according to MSFT engineers).

The OP apparently wanted the size information as presented by the Windows shell, and therein lies the rub.

ChuckC

By saying it uses the "seek method", you mean it's opening it for read access, doing a seek to a location that is at offset zero from the end of the file, and then a "tell" operation to find out the absolute offset from the beginning?

If so... perhaps what is needed, then, is some DllCall usage of FindFirstFile() to get back a WIN32_FIND_DATA structure for the file in question, as the size value that the Windows Explorer shell reports is embedded in that structure as a pair of high and low parts.

td

The FindFirstFile method is precisely the reason given by MSFT developers for providing the GetCompressedFileSize function. The issue isn't how WinBatch is determining file size. As mentioned earlier in this topic, the issue is that file size can mean multiple things.

ChuckC

I agree, the unqualified term "file size" is not specific enough to indicate which particular value is desired. However, if we're using the data shown on the "General" tab of a file's properties page in Windows Explorer, then the "Size" value is obtained from FindFirstFile(), and the "Size on disk" value may be obtained via GetCompressedFileSize(). The attribute bits for "compressed" and "sparse" need to be consulted to know for certain whether the size on disk should be computed from the number of allocation clusters used to contain the uncompressed/non-sparse data, or whether it should be obtained from GetCompressedFileSize(), in which case it represents either the compressed size on disk or the actual amount of storage allocated to the segments of the file that contain data. In the latter two cases, it's always going to be a size value that is a multiple of the allocation cluster size. If GetCompressedFileSize() is called for a file that is neither compressed nor sparse, then it returns the same size in bytes as you'd get from FindFirstFile().
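
The selection logic described in this post could be sketched in WinBatch roughly as below, for illustration only. EstimateSizeOnDisk and its parameter names are hypothetical. The sketch assumes the caller supplies the file's attribute bitmask (for example the dwFileAttributes value that FindFirstFile returns), the plain file size, the value from GetCompressedFileSize, and the cluster size, all as integers. The two bit values are the Win32 constants FILE_ATTRIBUTE_SPARSE_FILE (0x200) and FILE_ATTRIBUTE_COMPRESSED (0x800).

```winbatch
; Hypothetical helper: pick a "size on disk" estimate from the attribute bits.
#DefineFunction EstimateSizeOnDisk(nAttrs, nFileSize, nCmSpSize, nClusterSize)
   nMask = 512 | 2048   ; FILE_ATTRIBUTE_SPARSE_FILE | FILE_ATTRIBUTE_COMPRESSED
   If (nAttrs & nMask) != 0
      ; Compressed or sparse: use the value from GetCompressedFileSize.
      return nCmSpSize
   EndIf
   ; Otherwise round the file size up to a whole number of clusters.
   return ((nFileSize + nClusterSize - 1) / nClusterSize) * nClusterSize
#EndFunction

; Example: an ordinary 5000-byte file (attribute 32 = archive) on 4 KB clusters.
Message("Estimated size on disk", EstimateSizeOnDisk(32, 5000, 5000, 4096))
```

As the replies that follow point out, this estimate did not match the figure the shell showed for the OP's files, so it should be treated as an approximation rather than a reproduction of what Explorer displays.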


td

Again, the problem isn't WinBatch. The issue reported by the user is that the shell's Size on Disk is neither the size computed from the value reported by FindFirstFile nor the size taken directly, or computed, from GetCompressedFileSize. As the OP pointed out, GetCompressedFileSize returned 0 for the files in question, whereas the shell shows a value less than the file size but greater than 0. Also note that WinBatch and the shell report the same file size.

In other words, the assumption that GetCompressedFileSize will return the same value as FindFirstFile for files that are neither compressed nor sparse is, documentation notwithstanding, not necessarily correct. Nor is it safe to assume that the size on disk reported by the shell can be computed directly from either API.