Dana Massey | 11 Sep 2007 20:55
For every MMO, there is an army of specialist studios that make all the individual pieces work. While what these companies do is important, it is usually not terribly exciting to talk about. Yet, on rare occasions there are those small gems of companies that make some small part of the larger framework fun. Vivox does that for voice over IP (VOIP).

Boston-based Vivox develops VOIP solutions and focuses specifically on games. Most notably, they provide the VOIP services for CCP's space MMO EVE Online and Linden Lab's virtual world Second Life. Recently, they've also announced support for a new project from Pixel Mine and for "Poker Manager", a poker portal. They also hinted at more deals in the coming weeks.

Their association with Second Life has provided them with some impressive numbers. Currently, they see 10,000 new accounts registered for their service each day, on top of the 330,000 already using it. At this rate they hope to hit 1,000,000 by year's end. At peak hours, 15,000 people are using the service, and usage totals 14,000,000 minutes each day. This is just in Second Life. To put that in perspective, according to Vivox, that's 20% of Skype's total call volume just from one product.
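For a rough sense of that growth target, here is a back-of-the-envelope sketch in Python. It assumes the 10,000-a-day rate stays constant and counts from this article's date; the function name is made up for illustration, and none of this is Vivox's own projection.

```python
import math
from datetime import date, timedelta

def days_to_target(current, per_day, target):
    """Whole days needed to grow from current to target at a fixed daily rate."""
    return math.ceil((target - current) / per_day)

# Figures quoted in the article; a constant sign-up rate is an assumption.
days = days_to_target(current=330_000, per_day=10_000, target=1_000_000)
projected = date(2007, 9, 11) + timedelta(days=days)

print(days, projected.isoformat())
```

Under those assumptions the million mark would land comfortably before year's end, so the target looks plausible if sign-ups hold steady.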

In EVE, they just released a new version of their client. Unlike Second Life, users must pay a small premium to participate. The new add-on brings multi-channel support, which means players can listen to several conversations at once (up to six), then jump in and talk where appropriate. It sounded confusing to me, but not altogether strange for an EVE player. The new version also has a simplified UI and lower bandwidth requirements. Their next task is a hierarchy of chat so that, for example, fleet commanders can issue voice commands to several squadrons below them at the same time.

That stuff above is just the housekeeping. What's really cool about Vivox is their efforts to blur the line between VOIP and text chat in games. As a friend at the show pointed out, it's always neat to see one of those old science fiction moments come true. Vivox's booth had two such moments.

First, they're working on text-to-speech and speech-to-text so that players of games can co-exist seamlessly regardless of how they choose to personally interact with their game. They had no demo of speech-to-text, which would type out any words said into voice chat, but had a good example of text-to-speech.

They took a real clump of dialogue pulled randomly from EVE Online and let the scene play out. While all the voices still had a vague "Microsoft Sam" feel to their speech, they had come a long way. In the scene they showed, three players argued back and forth in distinct voices. Parts were still a bit stilted, but it was easily the best translation of text to audio I'd ever heard. One thing it showed, though, was that if this ever becomes mainstream, people are going to have to begin typing in full sentences. It was hilarious to hear "lol" and other internet words spoken aloud so casually, but they clearly don't fully translate.

One of the challenges of text-to-speech is inflection. Just like over instant messenger, sarcasm and other subtle intonations simply don't work. Vivox has an answer for that one too. They hope to use emoticons like smiley faces to apply the proper emotion to a phrase. If the person is happy, they just type a smiley and the voice says it with glee. Sad? A frown, and so on. The idea sounded quite theoretical, but seemed a clever and viable solution.
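To make the emoticon idea concrete, here is a minimal sketch in Python of how a chat line might be tagged with an emotion before it reaches a speech engine. Everything in it is an assumption for illustration: the mapping table, the emotion labels and the function name are hypothetical, and Vivox showed no such code.

```python
# Hypothetical emoticon table; the labels are illustrative, not Vivox's.
EMOTICON_TO_EMOTION = {
    ":)": "happy",
    ":-)": "happy",
    ":(": "sad",
    ":-(": "sad",
}

def tag_emotion(message, default="neutral"):
    """Strip known emoticons from a chat line and return (text, emotion)."""
    emotion = default
    words = []
    for token in message.split():
        if token in EMOTICON_TO_EMOTION:
            # If several emoticons appear, the last one wins.
            emotion = EMOTICON_TO_EMOTION[token]
        else:
            words.append(token)
    return " ".join(words), emotion

text, emotion = tag_emotion("nice shot :)")
print(text, emotion)
```

A speech engine would then read the cleaned text in whatever voice style matches the label, so the smiley itself is never spoken aloud.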

The second science fiction moment was their display of voice fonts. In a demonstration with a female member of the Vivox team, the same filters were applied to both our voices and, in real time, gave us distinct yet completely unrecognizable voices. They had three on display: a scratchy Orc, a booming Paladin and a squeaky Elf. All three were quite good, especially the Orc and the Paladin. The Elf had consumed a bit too much helium for my tastes, though.

What's scary is that they seem to be just a few steps away from the Star Trek-like ability to impersonate real people. In the meantime, though, this kind of tool applied to a fantasy MMORPG takes something that inherently breaks immersion, VOIP, and makes it into potentially one of the most immersive parts of the game. Now when a group mate gets into an argument with his mother, at least it will sound like a pair of Orcs arguing, rather than the 15-year-old boy it actually is. The first fantasy MMO to build in Vivox voice fonts and package their game with a quality headset is going to make a mint.

