Periodic Tarballs: Find and Awk
I have a gang of directories. There’s a new directory each day! Each directory stores about 20k files. Over time, the performance of the directory hierarchy degrades, and it’s best for me to tar up old stuff.
In Linux-land, there’s always more than one way of doing things… even if it’s not the right way. Today, we’ll explore find piped to awk piped to the shell.
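find . -mindepth 1 -maxdepth 1 -type d -mtime +60 | sort | awk -F"/" '{print "tar czf " $2 ".tar.gz --remove-files ./" $2 "; rmdir " $2 }' | sh
Basically, find all sub-directories of the current directory that were last modified more than 60 days ago (-mtime +60). Don’t go recursive (-maxdepth 1), and don’t match "." itself (-mindepth 1): if the current directory were old enough to match, awk would get an empty name and emit a tar czf .tar.gz --remove-files ./ command. GNU find also wants -maxdepth and -mindepth before tests like -type, or it prints a warning. Sort for sanity. Extract the directory name (lots of ways to do this). Generate a shell command tar czf {{dirname}}.tar.gz --remove-files ./{{dirname}}; rmdir {{dirname}} to build the tarball, delete the archived files, and remove the empty directory when done. Finally, we just tell sh to execute it all… one directory at a time. Bam!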
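Before piping generated commands into sh, it helps to run the pipeline once without the trailing | sh and eyeball what it would do. Here is a minimal sketch that does exactly that in a throwaway directory from mktemp. The directory names (day1, day2, today) are hypothetical, and the old ones are backdated with touch -t so -mtime +60 matches them. It assumes GNU tar for --remove-files; on tar versions where --remove-files also deletes the emptied directory, the trailing rmdir may just complain that the directory is already gone.

```shell
#!/bin/sh
# Sketch: exercise the find | sort | awk | sh pipeline in a scratch directory.
# Directory names below are hypothetical; assumes GNU tar (--remove-files).
set -u

work=$(mktemp -d)
cd "$work"

# Two "old" day directories and one fresh one, each holding a file.
for d in day1 day2 today; do
  mkdir "$d"
  echo "sample data" > "$d/file.txt"
done
# Backdate the old ones AFTER writing into them (POSIX touch -t format).
touch -t 202001010000 day1 day2

# Dry run: only print the generated commands. Expected:
#   tar czf day1.tar.gz --remove-files ./day1; rmdir day1
#   tar czf day2.tar.gz --remove-files ./day2; rmdir day2
find . -mindepth 1 -maxdepth 1 -type d -mtime +60 | sort |
  awk -F"/" '{print "tar czf " $2 ".tar.gz --remove-files ./" $2 "; rmdir " $2 }'

# Real run: same pipeline, now handed to sh.
find . -mindepth 1 -maxdepth 1 -type d -mtime +60 | sort |
  awk -F"/" '{print "tar czf " $2 ".tar.gz --remove-files ./" $2 "; rmdir " $2 }' | sh

ls "$work"
```

Because sort must read all of find’s output before emitting anything, find is finished before sh creates the first tarball, so the new .tar.gz files can’t sneak into the list.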
Summing a list of numbers
awk '{sum += $0} END {print sum}'
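A quick way to convince yourself the one-liner works is to feed it a few numbers, one per line. The sketch below also shows one edge case: if no lines arrive, sum is never assigned and print sum outputs an empty line, while print sum + 0 forces a numeric 0. (The + 0 variant is an optional tweak, not part of the original one-liner.)

```shell
#!/bin/sh
# Sketch: the summing one-liner on a small, made-up list of numbers.
printf '10\n20\n12\n' | awk '{sum += $0} END {print sum}'    # prints 42

# Empty input: "print sum" would print a blank line; "+ 0" prints 0 instead.
printf '' | awk '{sum += $0} END {print sum + 0}'           # prints 0
```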
Example:
% find . -type f -exec wc -l {} \; | awk '{print $1}' | awk '{sum += $0} END {print sum}'
What’s going on? I want to recursively count the lines in every file under the current directory. Why? Sub-directories => namespaces, and I want to know how many lines of code exist in the entire project, namespaces and all. wc runs once per file found and reports its counts, the next stage pulls out the line count, and awk adds them all up.
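To sanity-check a recursive line count, here is a minimal sketch that builds a tiny project tree (all directory and file names are hypothetical) and counts its lines two ways. It asks wc for line counts only (-l) and lets awk’s default whitespace splitting grab the first field ($1), which sidesteps the different column padding of GNU and BSD wc. The -exec ... \; form runs wc once per file, so wc never appends a "total" line that would be double-counted; the cross-check concatenates everything with cat and counts once.

```shell
#!/bin/sh
# Sketch: hypothetical project tree with known line counts (3 + 2 + 4 = 9).
proj=$(mktemp -d)
mkdir -p "$proj/app/models" "$proj/lib"
printf 'a\nb\nc\n'      > "$proj/app/main.src"          # 3 lines
printf 'one\ntwo\n'     > "$proj/app/models/user.src"   # 2 lines
printf 'x\ny\nz\nw\n'   > "$proj/lib/util.src"          # 4 lines

cd "$proj"

# One wc per file; $1 is the line count; awk accumulates the total.
per_file=$(find . -type f -exec wc -l {} \; | awk '{sum += $1} END {print sum}')

# Cross-check: stream every file through a single wc -l.
all_at_once=$(find . -type f -exec cat {} + | wc -l)

echo "per-file sum: $per_file"
echo "cat | wc -l:  $all_at_once"
```

Both approaches count newline characters, so a last line without a trailing newline is skipped either way, and the two numbers should agree.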