5.27.2009

Fun with Find

Recently I needed to clean up a bunch of temp files that a web application was creating but not properly removing. I turned to find as a stopgap while the real problem was being fixed. In my situation I knew that no file should survive longer than two hours in the directory where the files were being created. So, using find, I check the ctime on the files and remove them if they are more than 120 minutes old. The ctime is the inode change time, which for a temp file written once and then left alone is effectively when it was written. Using find's -cmin option with +120 gives us files that have not been changed in the last 120 minutes. A negative number (-120) would give us only files that have been changed in the last 120 minutes. Since I only want to delete files and keep directories, I also use the -type f option. If I simply run

shell> find /path/to/files -cmin +120 -type f

This gives me a list of all files below the path I've given find that have not been changed in the last 120 minutes. If I wanted to make sure find didn't list files in subdirectories, I would add the option -maxdepth 1. GNU find expects it right after the path, before tests like -cmin, and prints a warning otherwise.
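The +/- semantics and -maxdepth are easy to check on a throwaway directory. The sketch below is illustrative only: the scratch directory comes from mktemp and the file names are made up. touch cannot back-date a file's ctime, so every file here has just been changed. As a result, -cmin -120 should match all three files, -cmin +120 should match none, and -maxdepth 1 should skip the file in the subdirectory.

```shell
#!/bin/bash
# Scratch directory for the demo (hypothetical stand-in for the real tmp path);
# it is removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Two fresh files at the top level and one in a subdirectory.
mkdir "$demo/sub"
touch "$demo/a.tmp" "$demo/b.tmp" "$demo/sub/c.tmp"

# Files changed within the last 120 minutes: all three are brand new.
recent=$(find "$demo" -cmin -120 -type f | wc -l)

# Files NOT changed in the last 120 minutes: none of them qualify yet.
old=$(find "$demo" -cmin +120 -type f | wc -l)

# -maxdepth 1 goes right after the path and stops find descending into sub/.
shallow=$(find "$demo" -maxdepth 1 -cmin -120 -type f | wc -l)

echo "recent=$recent old=$old shallow=$shallow"
```

Once files in the real directory are older than two hours, the +120 form is the one that starts returning them.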

So now we have a list of the files we want to delete, and the -exec option will let us delete them. This does much the same job as piping the output of find to xargs, but it needs no second program. Find also hands each path to the command as a single argument, so file names containing spaces are not split apart the way plain xargs would split them. With the -exec option, the command we supply is executed once for each thing find finds: find 25 items and your command is executed 25 times. Since we want each file we've found to be part of the command, we use '{}' as a placeholder, and find replaces it with the file's path. We also need to tell find where the command we are supplying ends. You use a ';' for that, but since ';' has its own meaning in bash (it separates commands), it needs to be escaped. So our option to remove all the files we've found without confirmation should look like

-exec rm -f '{}' \;
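Here is a hedged sketch of that -exec form run against a scratch directory instead of the real tmp path. The directory and file names are hypothetical, and the -cmin test is left out so that every file qualifies. It also shows that a name containing spaces is removed correctly, and that -type f leaves the directories in place.

```shell
#!/bin/bash
# Throwaway directory standing in for the real tmp path (illustrative only).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

mkdir "$demo/sub"
touch "$demo/a.tmp" "$demo/sub/b.tmp" "$demo/name with spaces.tmp"

# rm runs once per file found; '{}' is replaced by that file's path,
# and the escaped \; marks the end of the command.
find "$demo" -type f -exec rm -f '{}' \;

# Every regular file is gone, but -type f left both directories alone.
files_left=$(find "$demo" -type f | wc -l)
dirs_left=$(find "$demo" -type d | wc -l)
echo "files_left=$files_left dirs_left=$dirs_left"
```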

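If we used '+' instead of '\;', find would append as many of our file names as fit onto a single command line, much as xargs does, so the command is executed far fewer times. The '+' needs no escaping, but '{}' must come immediately before it. For a command like rm, which happily accepts many file names at once, this is more efficient.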
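The difference between the two terminators shows up if rm is swapped for echo, which prints one line per invocation. In this sketch (the scratch directory and file names are illustrative), three files give three lines with \; but a single line with +. For the cleanup itself, the batched form would be -exec rm -f '{}' +.

```shell
#!/bin/bash
# Scratch directory with three files (hypothetical names).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.tmp" "$demo/b.tmp" "$demo/c.tmp"

# With \; echo is started once per file: one output line per file.
per_file=$(find "$demo" -type f -exec echo '{}' \; | wc -l)

# With + the names are appended to one echo command line (as many as fit),
# so three short names produce a single line.
batched=$(find "$demo" -type f -exec echo '{}' + | wc -l)

echo "per_file=$per_file batched=$batched"
```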

Now I needed to run this command every hour, so I created a little bash script to start the process. I set its nice level high (19) so its priority would be low, and set up a little checking to make sure we don't start the job again if it hasn't finished since the last time it was started.

#!/bin/bash

# http://greg-techblog.blogspot.com
TMPPATH=/var/www/html/tmp/
PIDFILE=/var/run/cleantmp.pid

# Check if cleantmp.cron is already running
if [ ! -f "$PIDFILE" ]; then
    # Delete all files in tmp that are older than two hours,
    # running find at the lowest priority (nice 19)
    nice -n 19 find "$TMPPATH" -nowarn -cmin +120 -type f -exec rm -f '{}' \; &
    PID=$!                    # Get the pid of the find process
    echo "$PID" > "$PIDFILE"  # create a pidfile
    wait "$PID"               # wait for find to finish
    rm -f "$PIDFILE"          # remove pidfile
    exit 0
fi

# cleantmp is already running: report it on stderr and exit non-zero
echo "cleantmp.cron already running" >&2
cat "$PIDFILE" >&2            # print pid of find process
exit 1
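One weakness of the pidfile check is stale pidfiles. If the script is killed before it reaches rm -f, or the machine goes down mid-run and /var/run survives the reboot, the leftover pidfile blocks every later run. Below is a minimal sketch of a stricter test, assuming the job runs as root. The helper name pid_alive is hypothetical; it treats the job as running only if the pid in the file still belongs to a live process. kill -0 sends no signal and only checks that the process exists and can be signalled, so it reports failure for another user's process. A recycled PID could still fool it.

```shell
#!/bin/bash
# Hypothetical helper (assumes the cron job runs as root): succeeds only if
# the pidfile exists AND the pid stored in it belongs to a live process.
pid_alive() {
    local pidfile=$1 pid
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 delivers no signal; it only checks the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Demo with a scratch pidfile rather than /var/run/cleantmp.pid.
pidfile=$(mktemp)
trap 'rm -f "$pidfile"' EXIT

echo "$$" > "$pidfile"               # this shell: definitely alive
pid_alive "$pidfile" && live=yes || live=no

sleep 0 & dead=$!; wait "$dead"      # a process that has already exited
echo "$dead" > "$pidfile"
pid_alive "$pidfile" && stale=yes || stale=no

rm -f "$pidfile"                     # no pidfile at all
pid_alive "$pidfile" && missing=yes || missing=no

echo "live=$live stale=$stale missing=$missing"
```

In the cron script, the test for the pidfile's existence could then be replaced with ! pid_alive followed by the pidfile path.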
