
Where's My Disk Space Gone?

Disk space is cheap, but it's not unlimited - and if you manage a big archive file server, it's easy to lose track of which files and directories are taking up the most room.

The Unix 'du' command is a handy utility that will tell you how much disk space is consumed by files and subdirectories, but its output is not presented in a particularly friendly format.

Here's a simple little Perl script that massages the output of 'du' into a report that highlights the biggest files and directories in a file system:

#!/usr/bin/perl -w
#
# THRESHOLD = minimum number of kilobytes to report
$THRESHOLD = 100000;
$REPORT = "/tmp/hog.txt";

open (OUTFILE,">$REPORT") or die "Can't write $REPORT: $!";
open (DIRDATA,"du -ak . | sort -rn |") or die "Can't run du: $!";
while (<DIRDATA>)
{
    chomp;
    # Limit the split to 2 fields so paths containing spaces stay intact
    ($size,$entry) = split ' ', $_, 2;
    if ($size < $THRESHOLD) { last; }
    printf OUTFILE "%10u KB\t%s\n", $size,$entry;
}
close DIRDATA;
close OUTFILE;

The script is pretty straightforward; just go stand in the directory you want to analyze and execute it, then read the report it leaves in /tmp/hog.txt. The du command starts looking down through the file system from wherever you're standing; the -a flag makes it list individual files as well as directories, and -k makes it report sizes in kilobytes. The sort -rn then puts the results in descending numerical order.
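To make that concrete, here is a small sketch of how one line of the sorted du output breaks into a size and a path. The sample lines are made up for illustration, and parse_du_line is a hypothetical helper name, not part of the script above. It passes a limit of 2 to split so that a path containing spaces stays in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line of `du -ak` output into (size, path).
# du prints the size in kilobytes, then whitespace, then the path.
sub parse_du_line {
    my ($line) = @_;
    chomp $line;
    # A limit of 2 keeps any spaces inside the path intact.
    my ($size, $entry) = split ' ', $line, 2;
    return ($size, $entry);
}

# Made-up sample of what `du -ak . | sort -rn` might print.
my @sample = (
    "2500000\t.\n",
    "1800000\t./archive\n",
    "1750000\t./archive/old backups.tar\n",
);

for my $line (@sample) {
    my ($size, $entry) = parse_du_line($line);
    print "$size => $entry\n";
}
```

Note that the first entry is "." itself: du reports the running total for the starting directory, so it always sorts to the top.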

One of the inconvenient things about running du like this is that it reports everything - you'll see the big files and directories that are taking up huge chunks of disk space, but you'll also see every one of the dinky little one-kilobyte files that you don't really care about.

In order to cut the huge roar of data coming out of du to a useful size, the Perl script uses the $THRESHOLD variable to throw away any file/directory listing under a certain size. In this example, I don't care about anything under 100 megabytes - and that should tell you something about the size of the disk arrays I manage with this script. I use the %10u numerical format to make the numbers line up neatly and remind myself to think in kilobytes, and voila - I've got a simple little report that tells me where all my disk space has gone.
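As a quick illustration of what the %10u format does, the sketch below formats a few made-up sizes. format_entry is a hypothetical helper that simply wraps the same format string the script uses; the sizes and paths are invented to show the alignment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the script's format string.
sub format_entry {
    my ($size, $entry) = @_;
    # %10u right-justifies the unsigned size in a 10-character column,
    # so sizes of different lengths line up on the right.
    return sprintf("%10u KB\t%s\n", $size, $entry);
}

# Made-up sizes in kilobytes, all at or above the 100000 KB threshold.
print format_entry(2500000, ".");
print format_entry(180000,  "./archive");
print format_entry(100000,  "./logs");
```

Ten columns leaves room for sizes up to 9,999,999,999 KB before the alignment starts to drift, which is plenty for most file systems.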

This is certainly not the most efficient report to run - du actually lists every file and subdirectory from your starting point down, and that entire result is fed through the sort utility before it even gets to the Perl script. The script is at least smart enough to bail out of the loop when the size figure drops below $THRESHOLD, though - since the list is in descending order, we know that subsequent entries are all going to be less than $THRESHOLD as well.
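The early exit depends only on the input already being sorted largest-first. Here is a minimal sketch with made-up data: entries_over is a hypothetical helper that walks a sorted list of [size, path] pairs, stops at the first one below the threshold, and counts how many entries it had to examine along the way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect entries at or above $threshold from a list
# that is already sorted by size, largest first.
sub entries_over {
    my ($threshold, @sorted) = @_;
    my @kept;
    my $examined = 0;
    for my $pair (@sorted) {
        $examined++;
        my ($size, $entry) = @$pair;
        # Everything after this point is smaller still, so stop looking.
        last if $size < $threshold;
        push @kept, $pair;
    }
    return (\@kept, $examined);
}

# Made-up entries, in the descending order that sort -rn would produce.
my @sorted = (
    [2500000, "."],
    [1800000, "./archive"],
    [90000,   "./logs"],
    [12,      "./notes.txt"],
    [4,       "./README"],
);

my ($kept, $examined) = entries_over(100000, @sorted);
printf "kept %d of %d entries after examining %d\n",
    scalar @$kept, scalar @sorted, $examined;
```

With this sample the loop looks at only three of the five entries. The saving is on the Perl side only; du and sort have still done all of their work by the time the first line arrives.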

