VOIP Support in SVN
Posted on May 30th, 2008 by TimeDoctor
Let's examine two quick points:
- Hardcore gamers like choices.
- Hardcore gamers like VOIP.
Based on these theories, ioquake3 is adding VOIP support for the next release. This internal support will also bring (entirely optional) support for Mumble positional VOIP audio. More nerd-speak after the break; the short and quick of it, however, is:
We’re going to have VOIP for mods/new games, and baseq3. This is a pretty radical departure from the initial goal of not changing anything in baseq3, and is probably the single largest (obvious) end-user benefit for using ioquake3.
The fact of the matter is that if you want to blame someone for allowing it to be included, you can blame me (Zachary, lead omnipresent overseer of ioquake and related entities). If, however, this makes you happy and you want to praise somebody, give either big ups OR big props to Ryan Gordon (lead intergalactic space nerd) and Ludwig (Herr Angst).
Please, do not spoil them with both ups AND props.
Ryan “icculus” Gordon started it off with this post to the mailing list:
I promised this to zakk like 18 years ago. Here's a first shot at VoIP support. It's still pretty rough, but it's just meant to be the groundwork. Patches are against svn revision #1345.
This requires patched builds to be useful, but remains network compatible with legacy quake3 clients and servers. Clients and servers both report in their info strings whether they support VoIP, and won't send VoIP data to those not reporting support. If a stray VoIP packet makes it to a legacy build, it might print an error to the console, but should continue on anyhow.
Data is processed using the Speex narrowband codec, and should be cross-platform. Bigendian and littleendian systems can speak to each other, as can 32 and 64-bit platforms.
Bandwidth: VoIP data is broken up into 20 millisecond frames (this is a Speex requirement), and we try to push up to 12 Speex frames in one UDP packet (about a quarter of a second of audio)...we're using the narrowband codec: 8000Hz sample rate. In practice, a client should send about 2 kilobytes per second more when speaking, spread over about four bursts per second, plus a few bytes of state information. For comparison, this is less than the server sends when downloading files to the client without an http redirect. The server needs to rebroadcast the packet to all clients that should receive it (which may be less than the total connected players), so servers should assume they'll need to push (number of players speaking at once times number of people that should hear it) * 2 kilobytes per second. It shouldn't be a problem for any client or server on a broadband connection, although it may be painful for dialup users (but then again, everything is. They can just disable the cvar).
Clients can choose to ignore specific players, or everyone, which instructs the server to stop sending ignored packets to the ignoring client, to save bandwidth.
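
The bandwidth figures above reduce to simple arithmetic. The following is a rough sketch in Python, not engine code: the constant and function names are made up for illustration, and the 2 kilobytes per second figure is the approximate number quoted in the post rather than a measured value.

```python
# Figures quoted in the post. All names here are illustrative assumptions,
# not identifiers from the ioquake3 source.
FRAME_MS = 20               # length of one Speex frame, in milliseconds
MAX_FRAMES_PER_PACKET = 12  # frames packed into a single UDP packet
CLIENT_UPLOAD_KBPS = 2.0    # approximate kilobytes per second per speaker


def packet_audio_ms(frames=MAX_FRAMES_PER_PACKET):
    """Milliseconds of audio carried by one packet holding `frames` frames."""
    return frames * FRAME_MS


def packets_per_second(frames=MAX_FRAMES_PER_PACKET):
    """How many full packets a continuously speaking client sends per second."""
    return 1000 / packet_audio_ms(frames)


def server_upload_kbps(speakers, listeners_per_speaker):
    """Rough server upstream estimate, in kilobytes per second."""
    return speakers * listeners_per_speaker * CLIENT_UPLOAD_KBPS


if __name__ == "__main__":
    print(packet_audio_ms())         # 240 (about a quarter of a second)
    print(server_upload_kbps(3, 7))  # 42.0
```

With three players talking at once and seven listeners each, the estimate comes to roughly 42 kilobytes per second of extra server upstream.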
Although unimplemented, there are hooks for clients to speak directly to specific clients (a private message, just teammates, etc). Also unimplemented, there are hooks for the server to maintain a blacklist, so annoying chatters can play, but their VoIP packets will be dropped without the server rebroadcasting them to other clients.
The client requires an OpenAL with ALC_EXT_capture support (which is any OpenAL 1.1 implementation...which is most of them, now)...all voice packets are sent to all players that are listening. This isn't really a design flaw so much as unimplemented code...the client needs a means to decide who to send voice to, but that's largely a question of UI and some global variables.
All my VoIP changes are wrapped in #if USE_VOIP, but it should be harmless to build everywhere without the #ifdefs, since it doesn't break network compat and you have to enable the feature with cvars.
This work involves patches to a few bits of the engine. The first adds "streams" to the sound interface, so S_RawSamples() can specify which stream the new sound data goes with. The usual uses of S_RawSamples() use stream zero, then each player's voip data is stream 1 through n. Even though the primary intention is to offload the management and mixing of various VoIP streams to the audio layer, this could be useful for other mods, perhaps, with a little more cleanup. The rest of the patches are the VoIP-specific bits.
To use:
- Install libspeex from speex.org
- Patch your client. Build with USE_VOIP=1.
- Hook up a microphone.
- Add this to your startup script (change Q to your liking):
    bind q "+voiprecord"
- Start the client with these cvars:
    +set s_useOpenAL 1 +set voip 1
- Patch your server. Build with USE_VOIP=1.
- Start the server with this cvar:
    +set sv_voip 1
- Connect some patched clients to the patched server.
- While playing, hold down 'Q' and speak into your microphone.
- Server rebroadcasts your voice to all clients.
- Patched clients hear you. Hopefully.
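
The rebroadcast step can be pictured as a filter over the connected clients: send only to those that report VoIP support and have not ignored the speaker. The sketch below is a Python illustration under stated assumptions, not the patch's code. The `Client` class and `voip_recipients` function are hypothetical, and skipping the speaker's own client is an assumption the post does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical stand-in for a connected client (not an engine struct)."""
    num: int
    supports_voip: bool = True  # as reported in the client's info string
    ignored: set = field(default_factory=set)  # player numbers being ignored


def voip_recipients(speaker_num, clients):
    """Return the client numbers that should receive the speaker's packet."""
    return [
        c.num
        for c in clients
        if c.num != speaker_num            # assumption: no echo to the speaker
        and c.supports_voip                # legacy clients never get VoIP data
        and speaker_num not in c.ignored   # honor "voip ignore" requests
    ]


if __name__ == "__main__":
    clients = [
        Client(0),
        Client(1, supports_voip=False),  # legacy build
        Client(2, ignored={0}),          # has ignored player 0
        Client(3),
    ]
    print(voip_recipients(0, clients))   # [3]
```

Because the filtering happens before sending, ignored or unsupported recipients cost the server no bandwidth at all, which matches the bandwidth-saving intent described in the post.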
cvar notes:
- s_alCapture 1 tells the audio layer to open an OpenAL capture device. Without this set on sound startup, you'll never get bits from the microphone.
- voip 1 enables VoIP support on the client. Without this set, we'll just drop any incoming VoIP data and refuse to record audio data for sending.
- cl_voipGainDuringCapture is the volume of audio coming out of your speakers while you are recording sound for transmission. This is a floating point value between 0.0 and 1.0, zero being silence and one being no reduction in volume. This prevents audio feedback and echo and such, but if you're listening in headphones that your mic won't pick up, you don't need to turn down the gain. Default is 0.2 (20% of normal volume).
- cl_voipSend is set to 1 by the game when the player wants to record and send VoIP packets. This is not a command line thing...the cvar is toggled with a keybind, so you only record when explicitly holding down a specified key.
- sv_voip 1 tells the server to accept and rebroadcast voip packets. Without this, all VoIP packets sent to the server are dropped, so no one hears anything from any client. This cvar will make the server report voip support in the server browser query.
- "+voiprecord" is the action you should bind to a key to record.
- "voip ignore <playernum>" is a console command that tells the client to drop any VoIP packets that arrive from a specific player number. It will also inform the server of this with a reliable command, so it won't even send the packets until further notice.
- "voip unignore <playernum>" is the opposite of "voip ignore".
- "voip muteall" is a console command that is the rough equivalent of "voip ignore" for each player in the game, except when you unmute, your previous ignores of specific players were not lost.
- "voip unmuteall" is the opposite of "voip muteall".
Some of this code is not as clean as I'd like, and there's a lot of hardcoding and a few shortcuts. Cleaning those up would be wise.
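
The difference between "voip ignore" and "voip muteall" is easiest to see as two separate pieces of state: a per-player ignore set and a global mute flag. This is a minimal Python sketch of those semantics as described in the notes above. The `VoipMuteState` class and its method names are hypothetical and do not come from the patch.

```python
class VoipMuteState:
    """Hypothetical client-side model of the ignore and mute commands."""

    def __init__(self):
        self.ignored = set()   # player numbers ignored individually
        self.mute_all = False  # global mute, kept separate from the ignores

    def ignore(self, playernum):
        self.ignored.add(playernum)

    def unignore(self, playernum):
        self.ignored.discard(playernum)

    def muteall(self):
        self.mute_all = True

    def unmuteall(self):
        # Only the global flag is cleared, so per-player ignores survive.
        self.mute_all = False

    def should_drop(self, playernum):
        """True if a VoIP packet from this player should be discarded."""
        return self.mute_all or playernum in self.ignored


if __name__ == "__main__":
    state = VoipMuteState()
    state.ignore(3)
    state.muteall()
    print(state.should_drop(5))  # True
    state.unmuteall()
    print(state.should_drop(5))  # False
    print(state.should_drop(3))  # True
```

Keeping the flag and the set independent is what lets an unmute restore the earlier per-player ignores without any bookkeeping.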
I would recommend against committing this patch to ioq3's repository (or any project based on ioq3) until there has been some review and improvements.
Other things to be done (or at least, things worth doing):
Client:
- Add UI that shows a volume meter while recording, based on the value of clc.voipPower...this value changes every frame based on audio input, so it can be used to show how well the game is "hearing" you. 1.0f is loud, 0.0f is silent. Reuse the cgame lagometer code?
- Update server browser to note which servers are VoIP compatible (look for voip=1 in the infoResponse packet).
- Decide if a VoIP packet is too old to be worth playing when it arrives, and if so, just drop it. We handle sequencing and ordering, this is just a question of latency.
- Fill in the audio layer's recording code for other platforms (or are we only OpenAL now?).
- Fill in the mixer code for S_Base_RawSamples() where stream != 0 (or are we only OpenAL now?).
- Clean up speex dependency (statically link it? Include it in the project?).
- Team-only chat, groups, friends, etc...right now everything goes to everyone by default, and while the protocol allows for culling the recipient list of a given VoIP packet, there's no code or ui to actually do the culling at the moment.
- Have a UI to speak only to the person currently in your crosshairs ("hey, turn around.")
Server:
- Allow users to be blacklisted by the admin, so if they send a voip packet, you just drop it without rebroadcast.
- Do some sanity checking on the speex data before rebroadcast?
Both:
- Look at all my FIXMEs.
- Voice positioning...right now there's no in-world position for VoIP, but it'd be interesting if you could only hear people near you (and the server regulated this by choosing a cutoff distance and refusing to send you packets from people that are past it), and the client spatialized it so the voice came from the same place as the speaker in-world. UT2004 has this as an option, but beyond the cool factor or a specific mod that requires that functionality, I don't see it as valuable.
As you can see, there's still a _lot_ to be done to make this robust, and a lot of it depends on small UI mods. I just wanted to put down a framework for others to build on here.
Opinions?
--ryan.