VOIP Support in SVN

Let's examine two quick points:

Hardcore gamers like choices.
Hardcore gamers like VOIP.

Based on these theories, ioquake3 is adding VOIP support for the next release. This internal support will bring with it (entirely optional) support for Mumble positional VOIP audio. More nerd-speak after the break; the short and quick of it, however, is:

We’re going to have VOIP for mods/new games, and baseq3. This is a pretty radical departure from the initial goal of not changing anything in baseq3, and is probably the single largest (obvious) end-user benefit of using ioquake3.
The fact of the matter is that if you want to blame someone for allowing it to be included, you can blame me (Zachary, lead omnipresent overseer of ioquake and related entities). If, however, this makes you happy and you want to praise somebody, give either big ups OR big props to Ryan Gordon (lead intergalactic space nerd) and Ludwig (Herr Angst).
Please, do not spoil them with both ups AND props.

Ryan “icculus” Gordon started it off with this post to the mailing list:

I promised this to zakk like 18 years ago.
Here's a first shot at VoIP support. It's still pretty rough, but it's
just meant to be the groundwork. Patches are against svn revision #1345.
This requires patched builds to be useful, but remains network
compatible with legacy quake3 clients and servers. Clients and servers
both report in their info strings whether they support VoIP, and won't
send VoIP data to those not reporting support. If a stray VoIP packet
makes it to a legacy build, it might print an error to the console, but
should continue on anyhow.
Data is processed using the Speex narrowband codec, and should be
cross-platform. Bigendian and littleendian systems can speak to each
other, as can 32 and 64-bit platforms.
Bandwidth: VoIP data is broken up into 20 millisecond frames (this is a
Speex requirement), and we try to push up to 12 Speex frames in one UDP
packet (about a quarter of a second of audio)...we're using the
narrowband codec: 8000Hz sample rate. In practice, a client should send
about 2 kilobytes per second more when speaking, spread over about four
bursts per second, plus a few bytes of state information. For
comparison, this is less than the server sends when downloading files to
the client without an http redirect. The server needs to rebroadcast the
packet to all clients that should receive it (which may be less than the
total connected players), so servers should assume they'll need to push
(number of players speaking at once times number of people that should
hear it) * 2 kilobytes per second. It shouldn't be a problem for any
client or server on a broadband connection, although it may be painful
for dialup users (but then again, everything is. They can just disable
the cvar).
Clients can choose to ignore specific players, or everyone, which
instructs the server to stop sending ignored packets to the ignoring
client, to save bandwidth. Although unimplemented, there are hooks for
clients to speak directly to specific clients (a private message, just
teammates, etc). Also unimplemented, there are hooks for the server to
maintain a blacklist, so annoying chatters can play, but their VoIP
packets will be dropped without the server rebroadcasting them to other
clients.
The client requires an OpenAL with ALC_EXT_capture support (which is any
OpenAL 1.1 implementation...which is most of them, now)...all voice
packets are sent to all players that are listening. This isn't really a
design flaw so much as unimplemented code...the client needs a means to
decide who to send voice to, but that's largely a question of UI and
some global variables.
All my VoIP changes are wrapped in #if USE_VOIP, but it should be
harmless to build everywhere without the #ifdefs, since it doesn't break
network compat and you have to enable the feature with cvars.
This work involves patches to a few bits of the engine. The first adds
"streams" to the sound interface, so S_RawSamples() can specify which
stream the new sound data goes with. The usual uses of S_RawSamples()
use stream zero, then each player's voip data is stream 1 through n.
Even though the primary intention is to offload the management and
mixing of various VoIP streams to the audio layer, this could be useful
for other mods, perhaps, with a little more cleanup.
The rest of the patches are the VoIP-specific bits.
To use:
- Install libspeex from speex.org
- Patch your client. Build with USE_VOIP=1.
- Hook up a microphone.
- Add this to your startup script (change Q to your liking)
    bind q "+voiprecord"
- Start the client with these cvars:
    +set s_useOpenAL 1 +set voip 1
- Patch your server. Build with USE_VOIP=1.
- Start the server with this cvar:
    +set sv_voip 1
- Connect some patched clients to the patched server.
- While playing, hold down 'Q' and speak into your microphone.
- Server rebroadcasts your voice to all clients.
- Patched clients hear you. Hopefully.
cvar notes:
- s_alCapture 1 tells the audio layer to open an OpenAL capture device.
Without this set on sound startup, you'll never get bits from the
microphone.
- voip 1 enables VoIP support on the client. Without this set, we'll
just drop any incoming VoIP data and refuse to record audio data for
sending.
- cl_voipGainDuringCapture is the volume of audio coming out of your
speakers while you are recording sound for transmission. This is a
floating point value between 0.0 and 1.0, zero being silence and one
being no reduction in volume. This prevents audio feedback and echo and
such, but if you're listening in headphones that your mic won't pick up,
you don't need to turn down the gain. Default is 0.2 (20% of normal volume).
- cl_voipSend is set to 1 by the game when the player wants to record
and send VoIP packets. This is not a command line thing...the cvar is
toggled with a keybind, so you only record when explicitly holding down
a specified key.
- sv_voip 1 tells the server to accept and rebroadcast voip packets.
Without this, all VoIP packets sent to the server are dropped, so no one
hears anything from any client. This cvar will make the server report
voip support in the server browser query.
- "+voiprecord" is the action you should bind to a key to record.
- "voip ignore <playernum>" is a console command that tells the client
to drop any VoIP packets that arrive from a specific player number. It
will also inform the server of this with a reliable command, so it won't
even send the packets until further notice.
- "voip unignore <playernum>" is the opposite of "voip ignore".
- "voip muteall" is a console command that is the rough equivalent of
"voip ignore" for each player in the game, except when you unmute, your
previous ignores of specific players were not lost.
- "voip unmuteall" is the opposite of "voip muteall".
Some of this code is not as clean as I'd like, and there's a lot of
hardcoding and a few shortcuts. Cleaning those up would be wise. I would
recommend against committing this patch to ioq3's repository (or any
project based on ioq3) until there has been some review and improvements.
Other things to be done (or at least, things worth doing):
Client:
- Add UI that shows a volume meter while recording, based on the value
of clc.voipPower...this value changes every frame based on audio input,
so it can be used to show how well the game is "hearing" you. 1.0f is
loud, 0.0f is silent. Reuse the cgame lagometer code?
- Update server browser to note which servers are VoIP compatible (look
for voip=1 in the infoResponse packet).
- Decide if a VoIP packet is too old to be worth playing when it arrives,
and if so, just drop it. We handle sequencing and ordering, this is
just a question of latency.
- Fill in the audio layer's recording code for other platforms (or are
we only OpenAL now?).
- Fill in the mixer code for S_Base_RawSamples() where stream != 0 (or
are we only OpenAL now?).
- Clean up speex dependency (statically link it? Include it in the
project?).
- Team-only chat, groups, friends, etc...right now everything goes to
everyone by default, and while the protocol allows for culling the
recipient list of a given VoIP packet, there's no code or ui to actually
do the culling at the moment.
- Have a UI to speak only to the person currently in your crosshairs
("hey, turn around.")
Server:
- Allow users to be blacklisted by the admin, so if they send a voip
packet, you just drop it without rebroadcast.
- Do some sanity checking on the speex data before rebroadcast?
Both:
- Look at all my FIXMEs.
- Voice positioning...right now there's no in-world position for VoIP,
but it'd be interesting if you could only hear people near you (and the
server regulated this by choosing a cutoff distance and refusing to send
you packets from people that are past it), and the client spatialized it
so the voice came from the same place as the speaker in-world. UT2004
has this as an option, but beyond the cool factor or a specific mod that
requires that functionality, I don't see it as valuable.
As you can see, there's still a _lot_ to be done to make this robust,
and a lot of it depends on small UI mods. I just wanted to put down a
framework for others to build on here.
Opinions?
--ryan.
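
For the arithmetic-inclined, the bandwidth estimate in Ryan's post can be worked through in a few lines. This is only a rough sketch in Python, not anything from the patch: the constants are the figures quoted in the email (20 ms frames, up to 12 frames per packet, about 2 kilobytes per second per speaker), and the function names are made up for illustration.

```python
# Worked version of the bandwidth estimate in Ryan's post.
# All constants are the approximate figures quoted in the email;
# the function names are hypothetical and exist only for this sketch.

FRAME_MS = 20               # Speex frame length in milliseconds
MAX_FRAMES_PER_PACKET = 12  # most frames packed into one UDP packet
SPEAKER_KB_PER_SEC = 2.0    # rough extra upstream per speaking client


def packet_duration_ms(frames=MAX_FRAMES_PER_PACKET):
    """Milliseconds of audio carried by one VoIP packet."""
    return frames * FRAME_MS


def packets_per_second(frames=MAX_FRAMES_PER_PACKET):
    """Full packets sent per second of continuous speech."""
    return 1000 / packet_duration_ms(frames)


def server_upstream_kb_per_sec(speakers, listeners_per_speaker):
    """Rebroadcast cost: speakers * listeners * roughly 2 KB/s."""
    return speakers * listeners_per_speaker * SPEAKER_KB_PER_SEC


if __name__ == "__main__":
    print(packet_duration_ms())              # 240
    print(round(packets_per_second(), 2))    # 4.17
    print(server_upstream_kb_per_sec(2, 7))  # 28.0
```

So a full packet holds 240 ms of audio (the "about a quarter of a second"), which works out to the "about four bursts per second", and two people talking at once to seven listeners each would cost the server something like 28 kilobytes per second, assuming the 2 kilobyte figure holds.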

In the end, this became this bugzilla entry, commit 1347 and commit 1348. Eventually, it will reach some pre-built binaries and a Quake 3 server near you.
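
One detail from the cvar notes in Ryan's post that is easy to miss: "voip muteall" is not the same as ignoring every player one by one, because unmuting leaves your individual ignores in place. Here is a minimal sketch of that behavior in Python. The class and method names are hypothetical, and the actual client code may well track this differently.

```python
class VoipMuteState:
    """Hypothetical model of the "voip ignore" / "voip muteall" commands."""

    def __init__(self):
        self.ignored = set()   # player numbers ignored individually
        self.mute_all = False  # set by "voip muteall"

    def ignore(self, playernum):
        self.ignored.add(playernum)

    def unignore(self, playernum):
        self.ignored.discard(playernum)

    def muteall(self):
        self.mute_all = True

    def unmuteall(self):
        # Only the global flag is cleared; individual ignores survive.
        self.mute_all = False

    def should_play(self, playernum):
        """True if a VoIP packet from this player should be heard."""
        return not self.mute_all and playernum not in self.ignored


if __name__ == "__main__":
    state = VoipMuteState()
    state.ignore(3)
    state.muteall()
    print(state.should_play(5))  # False: everyone is muted
    state.unmuteall()
    print(state.should_play(5))  # True: player 5 was never ignored
    print(state.should_play(3))  # False: the earlier ignore still applies
```

Keeping the global mute as a separate flag, rather than adding every player to the ignore set, is what lets the earlier per-player ignores come back untouched after "voip unmuteall".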
