The Pebble and the Avalanche

Current Revolutions in Business and Technology

by Dr. Moshe Yudkowsky,

author of The Pebble and The Avalanche: How Taking Things Apart Creates Revolutions


Mon, 2009-Jan-26, 14:03

Apple Upgrades Its Help Line

Apple just upgraded its 800 number to use more advanced technology. If you call 1-800-MY-APPLE (1-800-692-7753) from Illinois (and a few other states) your call will be answered by the new service.

Instead of the old "press one for..." service, the new service accepts any spoken request and routes your call. You can ask "Where's the closest store to zip code 60645?", ask for help with a drive problem, or ask to purchase a Mac computer. The application doesn't actually do troubleshooting or let you order something; it either gets you the address and number of a local store (and/or connects you to that store) or it routes you to the proper human agent. The accuracy is very good, considering the complexity of the requests.

In fact, in my opinion it's a little too accurate to be a purely automated system. I suspect that it uses a mix of human listeners and automated speech recognition. When it comes to customer service, though, I really don't mind if the application is better than expected.

Of course I fiddled with the system a little to see how well it works. As you can imagine I did find a few errors and gaps in the system.

Can you break the system too? Sure, if you want, but what's the point of breaking the system if you don't learn anything? We in the speech technology business have enough real problems without worrying about contrived ones.

When I did a trial of one of my first speech recognition systems, a tester wrote to say that the system didn't recognize his Southern accent. I already knew enough about non-expert testers to inquire, so I wrote back and asked if he really had a Southern accent. He didn't. He faked an accent and (of course) the speech technology failed.

Experts generate useful failures: failures that provide information about the system to either the expert or the people maintaining it. Trust me, Apple will have a sufficient number of real errors and doesn't need your help generating contrived ones.

Overall, however, the system worked as well as can be expected for the first few days in service. Any new speech technology system (even one that uses humans for part of the recognition, assuming it does) requires tuning and tweaks; but so far, so good. It'll be interesting to see how far Apple pushes this high-level speech technology, and I'd be interested to hear how this technology would work to automate common help calls.

Tue, 2008-Dec-09, 06:37

Voxeo and VoiceObjects Build on Top of Open Source Editor

In addition to my work on innovation, I also work on speech technology projects. One of the most complex programming languages I work with is the speech recognition programming language VoiceXML. For the purposes of this blog, the interesting thing about VoiceXML is that it incorporates many other programming languages: JavaScript for calculations, a specialized grammar language to describe what we expect a person to say so we can recognize his speech, and another language to describe how to read announcements to the person. On top of all this, I rarely bother to write in VoiceXML directly and instead use PHP or some other programming language to produce the VoiceXML pages.
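
To make that layering concrete, here is a minimal sketch of a program that produces a VoiceXML page. It is an illustration under stated assumptions, not code from any real project: the post mentions PHP, but the sketch uses Python, and the function name, form name, field name, and store names are all hypothetical. The generated page mixes the three embedded languages described above: prompt markup for the announcement, a grammar for the expected speech, and a JavaScript expression for a small calculation.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_store_locator_page(store_names):
    """Return a VoiceXML 2.0 page (as a string) that asks for a store name.

    All names used here (store_locator, store, greeting) are hypothetical.
    """
    # Grammar language: one <item> per phrase we expect the caller to say.
    # escape() keeps characters such as "&" from breaking the XML.
    grammar_items = "\n".join(
        f"            <item>{escape(name)}</item>" for name in store_names
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="{VXML_NS}">
  <form id="store_locator">
    <field name="store">
      <!-- Prompt markup: describes how the announcement is read aloud. -->
      <prompt>Which store would you like? <break time="300ms"/> Say the store name.</prompt>
      <!-- Grammar markup: describes what we expect the caller to say. -->
      <grammar version="1.0" root="stores" mode="voice" xml:lang="en-US">
        <rule id="stores">
          <one-of>
{grammar_items}
          </one-of>
        </rule>
      </grammar>
      <filled>
        <!-- JavaScript expression, evaluated by the VoiceXML interpreter. -->
        <var name="greeting" expr="'Connecting you to ' + store"/>
        <prompt><value expr="greeting"/></prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


if __name__ == "__main__":
    page = build_store_locator_page(["Lincoln Park", "Old Orchard"])
    # Parsing the result is a cheap check that the generated page is well-formed XML.
    ET.fromstring(page.encode("utf-8"))
    print(page)
```

Even in a toy like this, four languages are in play at once (Python, VoiceXML, the grammar markup, and JavaScript), which is the complexity the paragraph above describes. A real application would also need error handling for unrecognized speech, which the sketch omits.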

In other words, VoiceXML programming is complicated, as if speech technology weren't complicated enough on its own. One of the things that usually saves a developer's sanity when writing in other programming languages is a decent "development tool" that helps make the task of programming easier. The tool will often automate certain tasks, find errors in the program, and sometimes even help the developer remember all the various features of the programming language. I've had one situation, programming in a new (to me) language, where the right tool spelled the difference between success and disaster.

The name of that important tool, by the way, is Eclipse. Eclipse is an open-source project that has a modular structure. People build new tools on top of Eclipse to support new programming languages as well as new methods to program — for example, a new way to visualize the program you're writing. Over the years I've tried a few packages that purport to provide VoiceXML language programming via Eclipse, and never found one that was satisfactory. I embarked on a quest to find a decent tool — which led me to organize and run a public demonstration of different tools at a recent industry conference.

A company I often work with, Voxeo, announced the purchase of VoiceObjects today. VoiceObjects makes an Eclipse-based VoiceXML editor. This is a funny coincidence because I'm in the middle of writing up an assessment of that public demonstration of speech application tools and VoiceObjects is one of them; I'll reveal in advance of my article that I liked VoiceObjects.

I recommend a look at VoiceObjects' extensive documentation to get a better idea of what they offer. I particularly like the idea of generating project documentation from within the design tool. Even more importantly, the output of VoiceObjects' tool is "standard" VoiceXML, not a proprietary flavor, and the output interoperates with many VoiceXML platforms (not just Voxeo's, for example). This quote from Voxeo's press release is particularly important:

"Voxeo will continue to openly and actively support VoiceObjects' application deployment on multiple VoiceXML platforms including Aspect, Avaya, Genesys, Intervoice and Nortel."

In other words, Voxeo is smart enough (as usual) to realize that they should compete on cost and service instead of attempting to lock in their customers through the use of tools that generate output that only works on their system. I know that I prefer a tool that interoperates over one that does not.

Strategically, this fills out Voxeo's suite of tools. Their current design tool ("Evolution Designer"; I've never used it) is suitable for entry-level programming; VoiceObjects is suitable for high-level developers. Speech technology programming is difficult work, with only a (relative) handful of VoiceXML practitioners worldwide; from Voxeo's perspective, the more programmers, the more speech technology applications and the more business for Voxeo.


Tue, 2008-Dec-02, 08:32

Speech Technology, the Police, and the Subways

I've been wondering about the New York Police Department's plans to "monitor" mobile phone calls in "high-risk" areas; at least some New Yorkers offer a mildly enthusiastic endorsement of the idea.

Of course this is a tremendous invasion of privacy; once the door opens, you can expect the police to push this precedent to monitor calls in "high-drug-use" locations, near "vulnerable children," and so on and so forth. It's a process as inevitable as gravity. I am certain that open-source phones (such as the Google Android) will include encryption in the very near future if this initiative goes through. But more fundamentally, I find it hard to believe that it would be effective: I really have to wonder whether terrorists who use mobile phones speak "in the clear."

Regardless, I wonder most of all about the technology. What will the NYPD monitor? Will they monitor to see if anyone is calling known terrorists? Will they do traffic analysis — look for patterns that indicate terrorist activity? (I find it hard to believe that enough is known about these patterns, assuming they exist in the first place.) Or will they attempt to use speech technology?

If the NYPD does attempt to use speech technology, will they succeed? I am skeptical: mobile calls have inherently poor quality; there is lots of noise in the background; there are many accents; speech is rapid; and there is a huge amount of speech, which leads directly to a huge number of false positives. On the other hand, national intelligence agencies around the world have programs in place to solve these speech technology problems, and perhaps these agencies will share some of their solutions with local police forces.
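
The false-positive concern is a matter of simple arithmetic, which the following sketch works through. Every number in it is a made-up assumption chosen only to show the shape of the problem; none comes from the NYPD or from any real speech system, and the function name is hypothetical.

```python
def expected_alerts(total_calls, suspicious_calls, hit_rate, false_alarm_rate):
    """Estimate how many alerts a call-monitoring system would raise.

    hit_rate:         fraction of genuinely suspicious calls that get flagged
    false_alarm_rate: fraction of innocent calls that get flagged anyway
    Returns (true_alerts, false_alerts, precision), where precision is the
    share of all alerts that point at a genuinely suspicious call.
    """
    innocent_calls = total_calls - suspicious_calls
    true_alerts = suspicious_calls * hit_rate
    false_alerts = innocent_calls * false_alarm_rate
    total_alerts = true_alerts + false_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


if __name__ == "__main__":
    # Hypothetical inputs: one million monitored calls, one of which is
    # genuinely suspicious, and a recognizer that flags 90% of suspicious
    # calls but also wrongly flags 1% of innocent ones.
    true_alerts, false_alerts, precision = expected_alerts(
        total_calls=1_000_000,
        suspicious_calls=1,
        hit_rate=0.9,
        false_alarm_rate=0.01,
    )
    print(f"expected true alerts:  {true_alerts:.2f}")
    print(f"expected false alerts: {false_alerts:.2f}")
    print(f"share of alerts that are real: {precision:.6f}")
```

Under these assumed numbers the system raises roughly ten thousand false alerts for less than one real one. Noise, accents, and rapid speech would push the false alarm rate up, and the volume of calls multiplies whatever that rate turns out to be.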

Will New Yorkers stand for this invasion of privacy? Of course; they have already surrendered their right to privacy of their persons, as the NYPD can and does search anyone on the subway for no reason at all. Citizens will be offered the choice of further invasion of privacy vs. the "choice" of not being able to commute to work — no real choice at all.