Little Bash Tools

When we talk about building software, one of our goals is to write software components that do one thing well. We can then combine these components to build more powerful programs. Any Ruby conference you go to will have talks on these ideas, as will most blogs. I’m not going to go into detail about why small, composable components are a good idea; that has been covered many times before. Instead, I want to show some examples using an underutilized environment, the UNIX shell.

For example, while working on converting this blog to Jekyll, I wanted to edit all of the posts from 2007 in the order they appeared in the archive (reverse chronologically). I knew that I could list the files in the order I wanted by simply running ls -r _posts/2007*. Due to the flexibility of UNIX, I could use this little command as a part of a command to start vim:

$ vim $(ls -r _posts/2007*)
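
If the $(...) syntax is new to you: the shell runs the inner command first and pastes its output into the outer command line. Here is a minimal sketch of that, assuming a throwaway directory with made-up file names (they are not my real posts) and using echo in place of vim so that the resulting command line is printed rather than run:

```shell
# Hypothetical sample posts in a temporary directory (file names are made up).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/_posts"
touch "$demo_dir/_posts/2007-01-05-first.md" \
      "$demo_dir/_posts/2007-06-20-second.md" \
      "$demo_dir/_posts/2008-02-11-third.md"

# echo shows the command line that $(...) expands to instead of opening vim.
# The subshell keeps the cd from affecting the rest of the session.
preview=$(cd "$demo_dir" && echo vim $(ls -r _posts/2007*))
echo "$preview"
```

The 2008 post is left out by the glob, and the two 2007 posts appear newest first. One caveat: because the shell splits the substituted output on whitespace, this approach misbehaves with file names that contain spaces.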

Another example

Not all little scripts need to be just one line. I was inspired to write this post after building a little exercise timer. As part of my physical therapy, I had to do a simple exercise circuit that consisted of four rounds of five different exercises. I was supposed to do each exercise for one minute before moving on. The first time around, I used my stopwatch to time the exercises. It was a pain! I couldn’t easily see if I was done and I kept dropping it while doing crunches. To make life easier, I wrote a little script that just made a sound every minute. Here it is:

# say is the macOS text-to-speech command
while true; do
  say "change"
  sleep 60
done

That’s all it took to get my computer to tell me when to move to the next exercise. It’s a great example of why I love being a programmer.
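
The loop above runs until you interrupt it. As a sketch of one possible extension, here is a version that stops by itself once the circuit is over. The function names exercise_timer and announce are my own inventions for this illustration, and the fallback to plain printing assumes you may be on a machine without say:

```shell
# announce MESSAGE
# Hypothetical helper: always prints the message, and also speaks it
# when the macOS say command is available.
announce() {
  echo "$1"
  if command -v say >/dev/null 2>&1; then
    say "$1"
  fi
}

# exercise_timer ROUNDS EXERCISES SECONDS
# Hypothetical helper: announces "change" once per exercise slot,
# waits SECONDS after each one, then announces "done".
exercise_timer() {
  rounds=$1
  exercises=$2
  interval=$3
  total=$((rounds * exercises))
  i=1
  while [ "$i" -le "$total" ]; do
    announce "change"
    sleep "$interval"
    i=$((i + 1))
  done
  announce "done"
}
```

For the circuit described above, you would call it as exercise_timer 4 5 60.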

Explaining something you may have seen

If you’ve ever been responsible for managing a server, you’ve probably seen a command like:

ps -ef | grep apache | grep -v grep | awk '{print $2}' | xargs kill -9

(If you’ve had the luck not to manage a server, that’s the command you would use to kill all running apache processes.) Have you thought about how it works? This is the perfect example of the composability of simple components.

Let’s look at what we would do if we wanted to kill all apache processes manually:

  • Run ps -ef to get a list of all processes
  • Find the ones named apache and then write down their process id numbers
  • Run “kill -9 pid1 pid2 etc.” to kill the processes.
  • Repeat to make sure you didn’t miss any

This type of manual process is meant to be automated. The trick is understanding how UNIX works. Many of the classic UNIX tools were designed around processing text. Each command gets an input stream that it reads from and an output stream that it writes to. To make individual commands more powerful, UNIX includes the concept of a pipe, written |. By combining commands with the pipe symbol, you feed the output of one command into the input of the next.
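
To see a pipe in isolation, here is a small sketch that has nothing to do with processes. The fruit names are just sample data: printf writes three lines, sort reads them and writes them back in order, and head keeps only the first one.

```shell
# Three small commands joined by pipes: print, sort, keep the first line.
first=$(printf 'cherry\napple\nbanana\n' | sort | head -n 1)
echo "$first"   # prints: apple
```

None of the three commands knows anything about the other two. Each one just reads text and writes text.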

While this concept is simple, it allows tools to be built to do just one small thing, and do it very well. For example, the grep command is made to match patterns in text. In the above example, we could use it to find the list of only apache processes by running:

  ps -ef | grep apache

This command takes the output of ps -ef, a list of all running processes, and makes it the input to the grep command. The grep command then only outputs lines that match the pattern “apache”.
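
If you want to try this without a server at hand, you can feed grep some canned text instead. The sketch below assumes a made-up process list (the sample_ps function and every line in it are invented for illustration, loosely shaped like ps -ef output):

```shell
# Hypothetical stand-in for ps -ef; the processes listed are made up.
sample_ps() {
  cat <<'EOF'
UID        PID  PPID  C STIME TTY          TIME CMD
root       101     1  0 09:00 ?        00:00:01 /usr/sbin/apache2 -k start
www-data   102   101  0 09:00 ?        00:00:00 /usr/sbin/apache2 -k start
root       205     1  0 09:01 ?        00:00:00 /usr/sbin/sshd
alice      310   300  0 09:05 pts/0    00:00:00 grep apache
EOF
}

# Keep only the lines that contain the text "apache".
matches=$(sample_ps | grep apache)
echo "$matches"
```

The header line and the sshd line are dropped. Three lines remain: the two apache2 processes and the line for grep itself.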

We can continue to iteratively add to our command, doing more of our manual steps as additional parts of the pipeline. Because the output of all of these commands is just text, we can inspect it piece by piece to make sure we’re doing the right thing. For example, if we ran the above command, we might notice that the grep apache command itself was included in the list of processes. We can remove that line by adding another step to the pipeline that eliminates it. For example:

  ps -ef | grep apache | grep -v grep
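
The -v flag inverts the match, so grep -v grep outputs every line that does not contain “grep”. Here is a small sketch on two made-up lines (the process details are invented sample data):

```shell
# Two hypothetical lines: one real-looking apache process and the grep itself.
filtered=$(printf '%s\n' \
  'root   101   1  0 09:00 ?      00:00:01 /usr/sbin/apache2 -k start' \
  'alice  310 300  0 09:05 pts/0  00:00:00 grep apache' \
  | grep apache | grep -v grep)
echo "$filtered"
```

Only the apache2 line survives. Keep in mind that this filter is blunt: it would also drop any genuine process whose line happens to contain the text “grep”.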

Once we have that done, we might want to isolate just the process ids so that we can kill them. We could do that a couple of ways. My preferred way is to use the awk command to print the value of the second field. For example:

  ps -ef | grep apache | grep -v grep | awk '{print $2}'
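
By default awk splits each input line into fields on runs of spaces and tabs, and $2 refers to the second field, which is the PID column in ps -ef output. A minimal sketch on a single made-up line:

```shell
# Hypothetical ps -ef style line; awk prints its second whitespace-separated field.
pid=$(echo 'root       101     1  0 09:00 ?        00:00:01 /usr/sbin/apache2 -k start' \
  | awk '{print $2}')
echo "$pid"   # prints: 101
```

The uneven spacing between the columns does not matter, because awk treats any run of whitespace as a single separator.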

Now that we’ve got the list of process ids, our last step is to run the kill command on them. The xargs command will do exactly that. It reads items from its input stream, separated by spaces or newlines, and adds them to the end of a command line. So if xargs sees param1\nparam2 as input and you ran xargs echo, it would build and run the command line echo param1 param2. This is exactly what we want to do with the kill command. We’ll put it all together to get:

  ps -ef | grep apache | grep -v grep | awk '{print $2}' | xargs kill -9

The result will be that all apache processes are killed.
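
Since kill -9 is not something you can undo, it can be worth doing a dry run first. One way, sketched here with two made-up process ids, is to put echo in front of kill so that xargs prints the command line it would have run:

```shell
# Hypothetical PIDs on separate lines, as the awk step would produce them.
# echo turns the kill into a harmless preview of the final command line.
dry_run=$(printf '101\n102\n' | xargs echo kill -9)
echo "$dry_run"   # prints: kill -9 101 102
```

Once the printed command looks right, removing the echo runs it for real.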

Of course, the trick here is that you need to know that all of these little commands exist. If you didn’t know about grep, or awk, this task would likely be much more difficult. The same is true for any programming environment. In general, you will be a much more productive programmer if you know what tools exist in your environment and learn to use them. While there is a cost to learning new tools, it has always paid off for me.