
I ran out of space on my hard drive today. Serves me right for trying to get along with a minimalistic 20 GB /home partition.

Instead of trying to install or use a graphical tool to see what was using up the most space, I decided I needed a quick script to get a picture of what is going on in /home.

Here is a quick overview of what I did…

Now for those well versed in bash-fu lurking in the wings to tear my pitiful mash-up of commands to bits with a few deft taps of the keyboard – here’s looking at you Jaco – this is the result of typing away at bash while sitting behind the steering wheel in the parking lot waiting to pick up my brother after work.

Getting the commands right

To get a list of all the files in my /home partition I used:

du -ah /home/quintin

This gives me du (disk usage) a (all) h (human readable output) of /home/quintin

Now that will give you a long list of all the files and directories in your (in this case my) home folder, in the order that the command reads the partition. Very difficult to sort through.

Did you say “sort?”

I expand my command to sort the output, hopefully from smallest to largest, like so:

du -ah /home/quintin | sort

Notice the “|”? No, it is not some geeky monolith sign – it is known as a “pipe.” This basically takes the output from the first command (du -ah /home/quintin) and pipes (feeds) it into the next command (sort).

Now this is all fine and dandy, but the output for the average /home directory is very long, so reading through it in a terminal window is impossible: you will not be able to scroll all the way back.

So the best idea is to output the results of the commands to a text file.

du -ah /home/quintin | sort >diskusage.txt

You can basically call the file whatever you want, just remember what it was.

After running this, you will have a nice text file to read through. But wait: sort, erm, sorts the output character by character from the start of each line, so 10K, 11M and 12G end up right behind each other in the file. Not a sensible way to work with the file, and a pain to read through.
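To see the problem in miniature, here is a small sketch with made-up sizes. It also tries sort -h, which in GNU sort compares the numbers together with their K, M and G suffixes. I am assuming GNU coreutils here; other versions of sort may not have -h.

```shell
# Sample sizes like the ones du -h prints (made-up values for illustration).
sizes='10K
11M
12G
2K'

# Plain sort compares text character by character, so 2K lands after 12G.
plain_sorted=$(printf '%s\n' "$sizes" | sort)

# sort -h (GNU coreutils) compares the numbers together with their K/M/G suffixes.
human_sorted=$(printf '%s\n' "$sizes" | sort -h)

printf 'plain sort:\n%s\n\nsort -h:\n%s\n' "$plain_sorted" "$human_sorted"
```

If your sort has -h, swapping it into the du command above would give a listing that really does run from smallest to largest.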

awk swoops in

awk is a nice language to parse text. It is really useful if you want to pull certain information out of lines of text.

In the above example you will see that the text, although sorted by the first character of each line, has the kilobyte, megabyte and gigabyte size files all mixed together. Since there are yonks of kilobyte size files you will be more interested in finding out what are the larger files lurking in your system.

In this case a very basic usage of awk will allow you to make some sense of the output, but awk needs a helper, and this is where grep steps in.

Take the output of the file you created in step one of your script and apply the following commands to it:

First awk:

awk '{print $1}' diskusage.txt

This means the following: awk (the text parser) '{print $1}' (print to the terminal the first group of characters on each line, where groups are separated by spaces or tabs; to split on a different character, add -F followed by that character, like -F@) and then you specify the file to read, in this case the file created with the first set of commands.
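A quick way to get a feel for the fields is to feed awk a single line by hand. This is only a sketch: the du-style line and the e-mail address are invented for the example.

```shell
# A made-up line in the same shape as du -ah output: size, tab, path.
line=$(printf '1.2G\t/home/quintin/.evolution')

# Default separator (spaces and tabs): field 1 is the size, field 2 the path.
size=$(printf '%s\n' "$line" | awk '{print $1}')
path=$(printf '%s\n' "$line" | awk '{print $2}')

# With -F the separator becomes whatever you name, here the @ sign.
domain=$(printf 'someone@example.org\n' | awk -F@ '{print $2}')

echo "$size"
echo "$path"
echo "$domain"
```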

Now pipe (we did this above as well) the output to grep:

awk '{print $1}' diskusage.txt | grep G >gigs.txt

We pipe | the output to grep (this searches for specified characters in an output or a file [more on this below] and displays only the lines with the character or phrase in it), and then we redirect > the result to a new file called gigs.txt

gigs.txt now holds only the sizes of the largest files, without the file paths. A file full of 1.2G, 2.4G etc. is useless to us, so we need to find the lines in our original file with those values and output them so that we can see where the large files and directories are.

grep has its moment of glory

grep is useful in that we can tell it to look for the content of one file (every new line is used as something to look for) in another file, in our case we do it thusly:

grep -f gigs.txt diskusage.txt

By now the picture of what is happening should become pretty clear to you. In essence we are using grep to look for lines with the content of gigs.txt in the file diskusage.txt and this time we are not sending the output to a text file.

In your terminal you should see something like this:

1.1G    /home/quintin/.VirtualBox/HardDisks/Win95.vdi
1.2G    /home/quintin/.evolution
1.2G    /home/quintin/.evolution/mail
1.2G    /home/quintin/.evolution/mail/local
12G    /home/quintin/.VirtualBox
12G    /home/quintin/.VirtualBox/HardDisks
15G    /home/quintin
1.5G    /home/quintin/.VirtualBox/HardDisks/KarmicKoala.vdi
2.4G    /home/quintin/.VirtualBox/HardDisks/UbuntuNetbook.vdi
6.1G    /home/quintin/.VirtualBox/HardDisks/PC-BSD.vdi

And I have found my culprit! My Evolution mailbox is clearly huge, and the largest files are the ones I use as my virtual machine hard drives.
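A possible shortcut, offered as a sketch rather than a correction: awk can test the first field itself, so the size filter and the path lookup happen in one pass without gigs.txt. It also sidesteps a quirk of grep -f, which matches each pattern anywhere on the line, so a path that happens to contain something like 12G would be listed too. The file contents below are invented sample data.

```shell
# Build a small stand-in for diskusage.txt in a temporary directory.
# The sizes and paths are invented for the demonstration.
workdir=$(mktemp -d)
printf '1.2G\t/home/quintin/.evolution\n'      >  "$workdir/diskusage.txt"
printf '300K\t/home/quintin/Gnome-notes.txt\n' >> "$workdir/diskusage.txt"
printf '45M\t/home/quintin/Music\n'            >> "$workdir/diskusage.txt"
printf '12G\t/home/quintin/.VirtualBox\n'      >> "$workdir/diskusage.txt"

# Keep only the lines whose first field (the size) ends in G.
# The G in "Gnome-notes.txt" is ignored because it is not in field 1.
large=$(awk '$1 ~ /G$/' "$workdir/diskusage.txt")
echo "$large"

# Clean up the temporary directory.
rm -r "$workdir"
```

Only the .evolution and .VirtualBox lines should come out the other end.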

But wait, there’s more!

At the beginning of this blog I mentioned putting this in a script. This is very easy to do, and requires using a text editor and one more command in the terminal before you can run all this at the same time and read the results at your leisure in a text file.

First off open a text file and on the first line enter this:

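#!/bin/bash -x

This is the first line that your script will see, and the #! at the start essentially tells the system what shell to use to run the commands in it. The -x option makes bash print each command as it runs, which is handy for seeing what the script is doing.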

Now enter your commands one after the other, each on a new line in the text file, like this:

du -ah /home/quintin | sort >diskusage.txt
awk '{print $1}' diskusage.txt | grep G >gigs.txt
grep -f gigs.txt diskusage.txt >largefiles.txt

The eagle-eyed among you will notice that I added an extra > after the last grep command in order to store the output of my command in a file for later viewing.

Once you are done you will have something like this:

#!/bin/bash -x

du -ah /home/quintin | sort >diskusage.txt
awk '{print $1}' diskusage.txt | grep G >gigs.txt
grep -f gigs.txt diskusage.txt >largefiles.txt

Save this file as something with a .sh extension, like filesizes.sh

(.sh is the general norm for shell scripts.)

And now for the final flourish

Now you need to make your script executable, so that it will run as a program. This is easily achieved by running the following command:

chmod +x filesizes.sh

Now your file is an executable. If you double-click on it a dialogue will pop up asking if you want to display its contents, run it in a terminal, or just run it.

You can also run the file from the command line by using ./ like so:

./filesizes.sh

And there you go. You will now get a file called largefiles.txt in the same directory that the script was running in where you will have a list of all your gig+ sized files. If you want to have a list of all the meg+ sized files just substitute all the G’s in the commands that make up your script with M’s and you will get the appropriate output.

It is in fact possible to have both in your script and output them to separate files with the same script. Since there are many ways to achieve this I’ll leave you to figure that out with a little tinkering. Oh, and remember to replace all the ‘quintin’s in my commands with your username or the name of the home directory that you want to search.
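For the impatient, here is one of the many ways to get both lists from a single script, as a sketch only. keep_unit and the largefiles-gigs.txt and largefiles-megs.txt file names are made up for this example, sort -h assumes GNU sort, and the script scans the directory given as its first argument (or the current directory if you give none).

```shell
#!/bin/bash
# keep_unit is a made-up helper: it reads du -ah style lines on standard input
# and prints only those whose size field ends in the given unit letter.
keep_unit() {
    awk -v unit="$1" '$1 ~ (unit "$")'
}

# Run du once, save the listing, then filter it twice.
# "${1:-.}" means: first script argument, or the current directory if none is given.
du -ah "${1:-.}" | sort -h >diskusage.txt
keep_unit G <diskusage.txt >largefiles-gigs.txt
keep_unit M <diskusage.txt >largefiles-megs.txt
```

Saved as, say, filesizes.sh and made executable, it could be run as ./filesizes.sh /home/quintin to scan that directory.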
