Where's My Disk Space Gone?
Disk space is cheap, but it's not unlimited - and if you
manage a big archive file server, it's easy to lose track
of which files and directories are taking up a lot of room.
The Unix 'du' command is a handy utility that will tell
you how much disk space is consumed by files and subdirectories,
but its output is not presented in a particularly friendly
format.
Here's a simple little Perl script that massages the output
of 'du' into a report that highlights the biggest files and
directories in a file system:
#!/usr/bin/perl -w
#
# THRESHOLD = minimum number of kilobytes to report
$THRESHOLD = 100000;
$REPORT = "/tmp/hog.txt";
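open (OUTFILE,">$REPORT") or die "Can't write $REPORT: $!";
# du -ak lists every file and directory in KB; sort -rn puts the biggest first
open (DIRDATA,"du -ak . | sort -rn |") or die "Can't run du: $!";
while (<DIRDATA>) {
    chomp;
    # limit the split to two fields so names containing spaces stay intact
    ($size,$entry) = split ' ', $_, 2;
    if ($size < $THRESHOLD) { last; }
    printf OUTFILE "%10u KB\t%s\n", $size,$entry;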
}
close DIRDATA;
close OUTFILE;
The script is pretty straightforward; just go stand in the
directory you want to analyze and execute it, then read the
report in /tmp/hog.txt. The du command starts looking down
through the file system from wherever you're standing; the -a
option lists files as well as directories and -k reports sizes
in kilobytes. The sort -rn presents the results in descending
numeric order.
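To make the shape of that pipeline's output concrete, here is a minimal sketch of pulling one line apart. The sample lines are made up, and parse_du_line is a hypothetical helper rather than part of the script above; it assumes each line is a size in kilobytes, some whitespace, then the path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample of what "du -ak . | sort -rn" might emit:
# a size in kilobytes, a tab, then the path.
my @sample = (
    "204800\t.",
    "150000\t./backups/old images.tar",
    "4\t./notes.txt",
);

# parse_du_line is an illustrative helper, not part of the original script.
sub parse_du_line {
    my ($line) = @_;
    chomp $line;
    # Limit the split to two fields so spaces inside the path survive.
    my ($size, $entry) = split ' ', $line, 2;
    return ($size, $entry);
}

for my $line (@sample) {
    my ($size, $entry) = parse_du_line($line);
    print "$size KB -> $entry\n";
}
```

The third argument to split matters here: without it, a path such as "old images.tar" would be cut off at the first space.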
One of the inconvenient things about running du like
this is that it reports everything - you'll see the
big files and directories that are taking up huge chunks of
disk space, but you'll also see every one of the dinky little
one-kilobyte files that you don't really care about.
In order to cut the huge roar of data coming out of du
to a useful size, the Perl script uses the $THRESHOLD
variable to throw away any file/directory listing under a
certain size. In this example, I don't care about anything
under about 100 megabytes (100,000 kilobytes) - and that
should tell you something about the size of the disk arrays
I manage with this script.
I use the %10u numerical format to make the numbers
line up neatly and remind myself to think in kilobytes, and
voila - I've got a simple little report that tells
me where all my disk space has gone.
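As a rough illustration of how the threshold and the %10u format work together, the sketch below runs the same test and format string over a few made-up entries. The sizes, paths, and the format_entry helper are assumptions for the example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# format_entry is a hypothetical helper; the format string is the one
# used in the script: a 10-column unsigned number, " KB", a tab, the name.
sub format_entry {
    my ($size, $entry) = @_;
    return sprintf "%10u KB\t%s\n", $size, $entry;
}

# Made-up sizes in kilobytes, already in descending order.
my @rows = ([2500000, './archive'], [100000, './logs'], [99999, './tmp']);
my $threshold = 100000;

for my $row (@rows) {
    my ($size, $entry) = @$row;
    last if $size < $threshold;   # 99999 is below the cutoff, so stop here
    print format_entry($size, $entry);
}
```

Only the first two entries are printed, with their sizes right-aligned in a ten-character column.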
This is certainly not the most efficient report to run - du
actually lists every file and subdirectory from your starting
point down, and that entire result is fed through the sort
utility before it even gets to the Perl script. The script is
at least smart enough to bail out of the loop when the size
figure drops below $THRESHOLD, though - since the list
is in descending order, we know that subsequent entries are all
going to be less than $THRESHOLD as well.
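If sorting the full listing ever becomes the bottleneck, one possible variation is to filter first and sort only the entries that survive. The sketch below is an assumption-laden alternative, not the script described above: big_entries is a hypothetical helper that reads "size, whitespace, path" lines from any filehandle, and the demonstration uses an in-memory sample instead of a live file system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# big_entries is a hypothetical helper: it keeps only the lines at or
# above $threshold, then sorts just those, largest first.
sub big_entries {
    my ($fh, $threshold) = @_;
    my @keep;
    while (my $line = <$fh>) {
        chomp $line;
        my ($size, $entry) = split ' ', $line, 2;
        # Skip anything that does not look like "number path".
        next unless defined $entry && $size =~ /^\d+$/;
        push @keep, [$size, $entry] if $size >= $threshold;
    }
    return sort { $b->[0] <=> $a->[0] } @keep;
}

# Demonstration on an in-memory sample rather than real du output.
my $sample = "4\t./notes.txt\n150000\t./backups\n204800\t.\n";
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
printf "%10u KB\t%s\n", @$_ for big_entries($fh, 100000);
close $fh;

# Against a real tree, the filehandle could come from du directly:
# open(my $du, '-|', 'du', '-ak', '.') or die "Cannot run du: $!";
```

The trade-off is that the loop can no longer bail out early, since the input is unsorted; every line is read, but only the handful of large entries is ever sorted.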