Communication Breakdown
While the focus of our blog is to put out information about planning, managing, and deploying both applications and infrastructure, we also spend a lot of time thinking about how to make work generally more enjoyable and productive. This post is about one of those thoughts. It may be considered a call-to-arms of sorts. ;)
Email Sucks

Email was once a technical wonder, enabling near-instant communication regardless of distance. It’s still capable of that, but we no longer use it for that purpose, and even if we did, we would have to sift through piles and piles of junk before we actually got to meaningful conversation … and that is just the user side of the equation.
The administration side of the email equation is a near-endless game of catch-up for security patches, SPAM filtering, and generally making the decades-old technology scheme continue working in a more modern era. To give an idea, Canvas does not provide email hosting simply because there are providers that can do a much better job of it than we can.
So, What Do We Do?
We don’t know the answer to the communication problem right now. Services like Facebook go a long way toward this end, but only for Facebook users. Google put a lot of really good ideas into Wave, but those ideas were only exposed, again, to Wave users. Something more ubiquitous is needed.
Paul Graham recently listed “replace email” as one of the items on his list of frighteningly ambitious startup ideas, indicating that we tend to use email as more of a to-do list nowadays, so we should replace it with an actual to-do list protocol. We’re not sure how we feel about the pure to-do list as an email replacement, but some of the usage ideas that he presents are fairly intriguing.
Let’s Talk
As we said above, we don’t really know the answer right now. We would love to start a conversation about how to solve the problem, though, so here goes.
Here are some of the traits that we would love to see in an email successor:
Strict Open Standard - The successor should define an open standard that implementors, service providers, and users can leverage. This standard should be maintained and followed in a strict manner. For example, part of this adherence might dictate that a non-conforming server is not allowed to communicate with a properly-conforming server.
Designed For Real Communication - Email was originally intended to be a general communication system, and it is not much of a stretch to say that it was intended to replace postal mail. With that in mind, its successor should be designed for communication. It should do that one thing, and it should do it well.
Designed To Defeat SPAM - When email was originally conceived, the notion that the technology would be adopted in the large and that people might be jerks probably was not given much consideration. The successor technology can learn from our past mistakes.
We would love to hear others’ thoughts on this, so please don’t be shy.
Canvas needs a babysitter!
We built Canvas to better Plan, Deploy, and Manage cloud infrastructure. Many hosting companies provide wonderful support and quick responses – when you ask for help. But too few provide proactive dedication. So we started our own hosting company.
Canvas has grown considerably over the past few years; now it’s time to grow up. Elevator Up, our parent company, has provided the resources to launch and reach profitability. But there’s plenty of opportunity to tackle outside of the umbrella, so we’re looking for help.
We need a first role – someone who is responsible for the day to day success of the company. We need someone to focus on continued growth, sales and marketing plans, and new hires. We don’t need job titles – just passion, experience, and cultural fit. In short, we need someone to lead the company to the next level.
If you, or someone you know, may be that person to lead us forward, say hello. We’d love to talk.
Note: If you need a job description - don’t apply
Twitter Releases MySQL Mods
So, yesterday, Twitter broke some awesome news over on the official Twitter Engineering blog. In short, over the years, they have made a lot of custom changes to MySQL that they use internally, and these changes help so much that the team has generalized them and released their own MySQL fork on GitHub.
Why a fork? Basically, they have made enough changes that it makes more sense to post the fully-modified project so that people can use it while the MySQL project proper works to include the patches into the upstream project.
This is all pretty exciting news, and we can’t wait to play with some quasi-official builds. Also, if this somehow becomes a new project base altogether, we’re going to go ahead and throw this idea out there for the project logo (courtesy of DragonDream):

Summer Camp Roundup - Save the Servers
So far, most of our Summer Camps have followed a theme: they have had something to do with processes. Today’s talk is meant to tie them all together, as we’re going to talk a bit about taming a misbehaving server.
Unfortunately, there are several ways in which a server can go haywire, and we can’t really cover them all. On the bright side, we are going to cover the particular set of circumstances that we most often see in the wild. When that happens, this is the high-level view of what we do to handle the problem.
My Server Is Blowing Up!!!

What does that scenario look like? It tends to exhibit these symptoms:
- Not Serving Requests - Web requests to the sites on the server either time out or serve some other noteworthy error.
- Barely Responsive - If you are able to log in remotely, it probably takes a rather long time for your session to start, and subsequent commands probably take a while to produce results.
- Throwing Alerts - If using a comprehensive monitoring system, you are being alerted pretty much constantly about high load, low memory, failing services, and so on.
Those symptoms can be caused by a lot of problems, but in our experience, they tend to be present due to runaway processes consuming all of the resources on the server. Perhaps one of the sites has experienced a sudden surge in traffic or is being crawled by an overly-aggressive web crawler. Either way, we need to investigate the cause of the issue so that we can keep it from happening in the future, but there is a pretty good chance that we will be really ineffective investigators with the server dying in a fire. We need to handle that first.
Let’s Get Started
First things first, we need to find out what is soaking the system resources, so we will use ps aux to get an idea as to what is going on. To do this most effectively, we would usually filter the results, but with a bunch of runaway processes, we will be able to see a pattern in the processes as the output flies by.
We will pretend for a moment that the PHP setup on the server is one that spawns PHP processes (like CGI or SuPHP) and that a database-heavy PHP application is being slammed with requests. Our ps command will show a whole lot of php processes. Some of them may be waiting for a database connection, some of them may be associated with requests that have already timed out, and so on. Either way, those are the processes that we want to kill, and we want them dead immediately.
So, here is what we know:
- ps aux outputs the pid of a process as column 2.
- There are so many live processes that we are really low on memory, possibly using swap (slow, disk-based) memory.
- We want to kill off the PHP processes.
Let’s Tame This Thing
We know what we need to do, and we have the tools to do it.
ps aux | grep php | awk '{print $2}' | xargs kill -KILL
There are a few new utilities in that command, and we will discuss them at length eventually, but here is how we are using them today:
- grep php - This filters its input and prints out only those lines containing “php”
- awk '{print $2}' - This prints out the second whitespace-delimited column of each input line
- xargs kill -KILL - This collects its input items and passes them as arguments to the kill -KILL command
So, the command that we have constructed weeds out the non-php processes from the ps output, then gets the PIDs of the php processes, then kills them all by sending them the KILL signal.
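A dry run is a cheap safety check before pointing kill -KILL at a busy server. The sketch below is an illustration rather than part of the original one-liner: it runs the same grep and awk stages over a canned sample of ps aux output (every process and PID in it is made up), and the function names sample_ps_output and php_pids are hypothetical. It also adds a grep -v grep stage, because on a live system the grep php process itself usually shows up in the ps listing and matches its own pattern.

```shell
# Canned stand-in for `ps aux` output; every process and PID here is made up.
sample_ps_output() {
  cat <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1201 12.0  3.1 250000 64000 ?        R    10:01   0:07 php-cgi
www-data  1202 11.5  3.0 250000 63000 ?        D    10:01   0:06 php-cgi
mysql      900  2.0  9.0 900000 180000 ?       Sl   Jan01  99:00 mysqld
root      1300  0.0  0.1  12000  2000 pts/0    S+   10:02   0:00 grep php
EOF
}

# Same filtering as the one-liner, minus the kill, and minus grep's own line.
php_pids() {
  grep php | grep -v grep | awk '{print $2}'
}

# Print the PIDs that would be handed to kill.
sample_ps_output | php_pids
```

On a live server, the idea would be to swap sample_ps_output for ps aux, look over the list, and only then append the xargs kill -KILL stage.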
At this point, we should be able to investigate what it is that is causing so many php processes to be spawned so we can prevent it from happening in the future.
Credit Where It Is Due
The “computer on fire” image above was found on a Nerdy Nerds post about online photo backups.
Deployment Fabrication
Not long after we finished up our first series of articles about application deployment, @_chaselee wondered something aloud on the Twitters:

Well, Had You?
No, not really.
At the time, the only thing we really knew about Fabric was that it is a Python-based utility that a lot of our Pythonista friends seem to really like. In the name of fairness, we did a bit of digging and fiddling to come up with a minimal deployment scheme like that in our other examples.
In short, Fabric is less a deployment tool and more something along the lines of Capistrano … it is a powerful roll-your-own-remote-control tool. Unlike Capistrano, however, it doesn’t really provide a drop-in solution for application deployments.
So We Rolled Our Own

Following the Fabric Tutorial, we ended up with an exceptionally basic deployment fabfile. It didn’t really seem like enough, though, so we expanded on the idea a bit and ended up with a fabfile that acts a good deal like whiskey_disk, albeit at a very limited capacity. You can grab the current release version here.
It is important to note that we have done a bit of testing on that file, but we are also REALLY rusty with Python, so you will probably want to spot check it before actually using it.
Er, How DO I Use It?
That’s pretty easy, actually. We like the way that whiskey_disk works, so we emulated its behavior at a very basic level. The first thing that you’ll need to do is set up a deployment target. That is done in the fabfile in the CONFIG dictionary:
# Configure your deployment targets here
CONFIG['staging'] = {
    'ssh_user': 'static',
    'domain': 'example.com',
    'deploy_to': '/home/example/staging.example.com',
    'repository': 'git@supersourcecontrolteamgo.com:example.git',
    'branch': 'staging',
    'post_deploy_script': 'script/build',
    'post_setup_script': 'script/build'
}
You might notice that we support an “ssh_user” option as well, as just setting it as part of the domain name will not really work with Fabric (so far as we can tell).
Once that is set up, you need to use the setup task to set up your deployment target. In keeping with our example, here is how you would set up the staging target:
fab target:staging setup
Then, to deploy to staging, you’d do this:
fab target:staging deploy
Well, What Do You Think?
All in all, Fabric is not at all bad. Much like the other deployment tools that we’ve covered so far, using it is as easy or hard as one decides to make it. For example, as mentioned above, a good deal of moss has grown on our Python hacking abilities, but we were able to come up with a fabfile that seems to do exactly what we want it to do.
That said, as always, the best course of action will be to take it for a test drive yourself.
Summer Camp - kill
As the name would suggest, kill terminates a process by sending it a termination signal. By default, kill will send the TERM signal, which requests that the target process terminate itself gracefully. This is the default for good reason … killing a process outright can cause several sorts of badness, and the nature of the badness is dependent on the process being killed. To use kill, you need to pass it a PID (process ID), and you can get that from either ps or top.
All in all, there are around 32 signals that kill knows about, but there are only two that can really be considered universal, as the rest of them are interpreted differently by different processes. We’ve already talked about the TERM signal, and the other universal signal is KILL. When a process is killed with the KILL signal, it is shot in the head and dies immediately. In short, if the process does not die when killed with KILL, the only way to get rid of it is to reboot the system (or wait).
Since we don’t know what the PID is going to be, it seems best to leave the copy-paste example out this time. Here is what the session looks like:

Notice that when we use the KILL signal, the process is immediately listed as killed on the screen. When we use TERM, however, the sleep process is killed gracefully, and it is still running when our prompt returns (because the sleep utility, which is not really being covered, has a minimum resolution of 1 second).
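The difference between the two signals also shows up in the exit status that the shell records for the dead process. Here is a minimal sketch, assuming a POSIX-style shell such as bash, where a process ended by a signal is reported as 128 plus the signal number. The variable names are our own, and sleep 60 just stands in for any long-running process.

```shell
# Start a throwaway background process and end it with TERM (kill's default).
sleep 60 &
term_pid=$!
kill "$term_pid"
term_status=0
wait "$term_pid" || term_status=$?

# Start another one and end it with KILL.
sleep 60 &
kill_pid=$!
kill -KILL "$kill_pid"
kill_status=0
wait "$kill_pid" || kill_status=$?

echo "TERM exit status: $term_status"   # 128 + 15 (SIGTERM)
echo "KILL exit status: $kill_status"   # 128 + 9 (SIGKILL)
```

Because the PIDs come from $! rather than being typed in, this version can be pasted as-is.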
Summer Camp - top
The top command shows a Table of Processes, and it provides more or less the same information as ps. The difference, though, is that top keeps running, refreshing the process table at a regular interval, and you can pick and choose things like process sorting, the columns shown, et cetera. Granted, you can do all of that with ps as well, but top is a bit easier to deal with. Also, top provides a bit of general system information (system uptime stats, system load, memory usage, et cetera), as it basically just acts as a front-end for several other commands. Let’s give it a shot.

That is roughly the same information that you get from ps aux, but that brick of information at the top (heh) is pretty handy, as it will let you know how the system resources are looking in addition to the following process list. Another handy feature that we use all the time is the ability to sort processes by their memory usage (the default on most systems is to sort by current CPU usage, which is pretty reasonable). To sort by descending memory usage, enter an upper-case M.

As you can see, processes using the most memory are now listed first, and that is pretty handy to have in front of you when a server is having trouble keeping up.
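When an interactive top session is too sluggish to use, the same memory-first ordering can be approximated with ps and sort. This is a sketch under assumptions, not a description of what top does internally: by_memory is a hypothetical helper, it relies on %MEM being the fourth column of ps aux output, and the sample rows are made up so that the result is predictable.

```shell
# Canned `ps aux`-style rows; every value here is made up.
sample_ps_output() {
  cat <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1201 12.0  3.1 250000 64000 ?        R    10:01   0:07 php-cgi
root       700  0.0  0.1  15000  3000 ?        Ss   Jan01   0:02 sshd
mysql      900  2.0  9.0 900000 180000 ?       Sl   Jan01  99:00 mysqld
www-data  1202 11.5  3.0 250000 63000 ?        D    10:01   0:06 php-cgi
EOF
}

# Drop the header line, then sort numerically on column 4 (%MEM), largest first.
by_memory() {
  tail -n +2 | LC_ALL=C sort -k4,4nr
}

# Show PID, %MEM, and command for each row, biggest memory user first.
sample_ps_output | by_memory | awk '{print $2, $4, $11}'
```

On a real system, ps aux would take the place of sample_ps_output, and piping the result through head keeps the list short.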
Summer Camp - ps
What’s this Summer Camp noise? Check out our Summer Camp Introduction.
The ps utility shows you processes that are currently running, and you can add flags to it in order to, for example, show you specific process contexts. Here is the output with no flags:

The default behavior for ps is to show a fairly condensed view of the processes running in the current session. Here’s what the output means:
- The “PID” column shows the unique process identifier associated with the process.
- The “TTY” (TeleTYpe) column shows the terminal device, if any, on which the process is running
- The “TIME” column is a little confusing … it shows the accumulated CPU time used over the lifetime of the process (more on this later)
- The “CMD” column in this view shows the condensed, common name of the process. This is usually the name of the command that was run.
So, that’s fine and all, but it really isn’t very helpful to sorta know what commands we have run since starting our session (well, it can be useful, but not in this context, and definitely not with so little command information). Let’s dig a little more.
When we use ps, we want a lot of information (memory usage, processes other than those that we ran, process state, et cetera), so we need to add some flags. Since we want to know about processes that aren’t necessarily running on a TTY (like web server processes), we’ll add the ‘x’ flag. Since we want to know about all processes running, we’ll add the ‘a’ flag. Since we want to know just about everything about a process, we’ll add the ‘u’ flag for “user-oriented output.”
So, here’s our command
#: ps aux
And here is our output

Wow, that was quite a bit of output (it scrolls WAY beyond the size of that image), but the gist is this (unless specified again, the column meanings are the same as above):
- Every process with a PID is shown
- The “USER” column shows the user that is running a given process.
- The “%CPU” column shows the average CPU utilization for the process (the accumulated CPU time divided by the amount of time that the process has been running).
- The “%MEM” column shows the current percentage of the total memory resources on the system that the process is using.
- The “VSZ” column shows the virtual memory size of the process (everything the process has mapped, whether it currently sits in physical memory or not, expressed in KB).
- The “RSS” column shows the resident set size of the process (actual physical memory, expressed in KB).
- The “STAT” column shows the process state (more on this later)
- The “START” column shows the time that the process started, with slightly different output depending on how long the process has been running (the processes in the above output have all been running for around 6 months)
- The “CMD” column shows either the full command line used to spawn the process, the kernel module name that the process represents, or, if the process is a tricky little devil, it will show a custom output string. For example, our thin processes look like this: “thin server (127.0.0.1:9100)”
Summer Camp - Processes
A process is a unique instance of any program running on a computer. Your login shell is a process, as is any command that you execute through it. Even though the shell is still running and most commands run through it terminate almost as soon as you hit the enter key, they are both processes nonetheless.
Each process on a system uses some amount of memory, some amount of CPU resources, and possibly causes disk IO operations to happen.
In addition to the above characteristics, every process running on a system has a process state. As one might guess, this is an indication as to the health of the process. Here are the more important states that one will see in a process list:
- R: The process is “running,” meaning that it is currently doing real, actual work of some sort.
- S: The process is “sleeping,” which means that it is waiting for something to happen (input, termination, what have you). Processes are a lot like cats … under normal circumstances, they are either eating your resources or sleeping.
- D: The process is in a deep sleep. This is a little more troublesome than regular sleep, because it is actively trying to wake up, possibly from a bad dream, but something is keeping that from happening (high disk IO levels are a pretty common reason).
- T: The process is stopped. This can be for any number of reasons, but it is most typically because it has been put on hold by either the user or another process. This one is not really much to worry about unless your system is running low on resources.
- Z: The process is dead, has become a zombie, and is only interested in eating the brain meat (a slot in the process table) in your system. No kidding, these are really called “zombie” processes. Basically, the process has terminated, but it is being kept from its eternal slumber, usually because its parent process has not yet collected its exit status. To summarize the problem that zombies cause, our good friend Dan Ryan once suggested that these should be renamed “Batman processes,” because “they have no parents and cannot be killed.” We can only think of one thing worse.
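To get a quick feel for how many processes are in each state, the first letter of the STAT column can be tallied. This is a minimal sketch, assuming STAT is the eighth column as in typical ps aux output. count_states is a hypothetical helper, and the sample rows are made up.

```shell
# Canned `ps aux`-style rows; every value here is made up.
sample_ps_output() {
  cat <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1201 12.0  3.1 250000 64000 ?        R    10:01   0:07 php-cgi
www-data  1202 11.5  3.0 250000 63000 ?        D    10:01   0:06 php-cgi
mysql      900  2.0  9.0 900000 180000 ?       Sl   Jan01  99:00 mysqld
root       700  0.0  0.1  15000  3000 ?        Ss   Jan01   0:02 sshd
www-data  1400  0.0  0.0      0     0 ?        Z    10:03   0:00 [php-cgi] <defunct>
EOF
}

# Skip the header, keep the first character of column 8, and count each letter.
count_states() {
  awk 'NR > 1 { count[substr($8, 1, 1)]++ }
       END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Print one line per state letter with the number of processes in that state.
sample_ps_output | count_states
```

Only the first letter is counted because ps may append extra flag characters to the state, as in the Sl and Ss rows of the sample.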

Summer Camp - echo
echo is a small utility that echoes whatever you say to it. For example, take a look at this:
echo I am a pretty princess.
Here is the output

It usually isn’t very helpful to remind oneself of one’s standing as a pretty princess. The power of echo, though, is that it will echo anything at all handed to it. So, if one echoes a variable reference, one gets the contents of that variable:

That, however, is really all that echo does. There are flags that you can hand to the command (for example, you can add “-n” as a flag to keep the newline from being printed at the end of the output), but as far as core functionality goes, that is echo, and that is pretty much all we will use it for.
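The behaviors described above can be tried with a short session like the following sketch. The variable name greeting is our own invention. The -n flag is the one mentioned above, and it behaves this way in bash's built-in echo, though not every shell's echo accepts it.

```shell
# Plain text is echoed back as-is.
echo I am a pretty princess.

# A variable reference is expanded by the shell before echo ever sees it.
greeting="I am still a pretty princess."
echo "$greeting"

# With -n, no trailing newline is printed, so the next echo
# continues on the same line.
echo -n "No newline here, "
echo "so this lands on the same line."
```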