Friday, November 11, 2011

Why You'll Never Be Able to Type Fast Enough

Composing a lengthy text message in my head takes all of half a second. Typing out that message on the screen of my smartphone takes me twenty to thirty seconds. Before I had a smartphone, I might have taken thirty to forty seconds to tap out a message with predictive texting on a normal twelve-key cell phone keypad. The entire difference between the time it takes me to form a thought and the time it takes me to type it in is waste attributable to a failure of technology. While input technology is improving, a massive amount of friction exists between man and machine--and that's not just inefficient. It's annoying. I'll call this 'friction' a bandwidth asymmetry between humans and computers.



The bandwidth asymmetry exists because computers can convey information to humans much more quickly than humans can provide input to computers. For instance, a fast typist can type at 120 words per minute while a fast reader can read at 300 to 500 words per minute. Keep in mind that reading text is one of the slowest ways that a computer can output information. Video and sound both give us information faster than we could read it.
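To put a rough number on that asymmetry, here is a minimal Python sketch that divides the reading speeds by the typing speed quoted above. The function and variable names are my own, made up for illustration, and the figures are only the ballpark ones from this post, not measurements.

```python
def speed_ratio(faster: float, slower: float) -> float:
    """Return how many times faster one rate is than another."""
    if slower <= 0:
        raise ValueError("slower rate must be positive")
    return faster / slower


# Ballpark figures from the paragraph above (words per minute).
TYPING_WPM = 120           # a fast typist
READING_WPM = (300, 500)   # a fast reader, low and high estimates

# Ratio of output bandwidth (reading) to input bandwidth (typing).
low_ratio = speed_ratio(READING_WPM[0], TYPING_WPM)
high_ratio = speed_ratio(READING_WPM[1], TYPING_WPM)

print(f"Reading outpaces typing by {low_ratio:.1f}x to {high_ratio:.1f}x")
```

Even with text, the slowest output channel, the computer talks to us roughly two and a half to four times faster than a fast typist can talk back.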

One logical solution is to simply allow computers to sense video and sound through cameras and microphones. In certain instances, this works very well. You're not going to find a better way of creating a movie than using a digital video camera. Except for purely electronic music, recording live music with a microphone is still the method used to create digital songs.

Why don't we use cameras and microphones to eliminate our bandwidth asymmetry? For one, computers are dumb when it comes to sound and video. Upload a movie to YouTube, and the computer is no closer to knowing whether it's looking at a baby monkey riding on a pig or a Katy Perry music video. Video just isn't a data format that computers understand naturally. The same goes for digital sound. However, the outlook isn't as bleak as I've painted it.

Recently, several products have come to market based on voice recognition. Android comes with voice search built in. More often than I search, though, I use the voice recognition capabilities to dictate long text messages. Interestingly, the voice recognition does not take place locally on my phone. Rather, the sound signal is uploaded to Google's servers, which perform the recognition process using machine learning. As an incurable early adopter, I'm biased, but I believe that the voice recognition quality has reached a level such that texting by dictation is faster and more convenient than using the keyboard--especially for long texts.

The latest installment of Apple's iPhone, the 4S, introduced a digital personal assistant named Siri. Siri listens to an iPhone user's voice and attempts to perform tasks as instructed--tasks such as checking the weather and sending text messages. Siri was the flagship feature of this iteration of the iPhone, and all indications are that it has been a success. Speaking to a phone in public might garner you sideways looks for the time being, but advances like Siri bring the day when this is no longer the case ever closer.

While the interface between humans and computers has become much more streamlined since the days of punch card input and output, communicating with a computer is far from an optimized task. Advances in speech and image recognition have recently broken down barriers to this communication. For now, speech recognition may be the best input method we have, but in order to completely eliminate the bandwidth asymmetry, something even more revolutionary has to come on the scene.