A detailed, but practical take on transferring bits and networking architecture for an RTS.
Many years ago, I began work on an RTS-style game (which has since released) called The Maestros. Other than a tremendous amount of reading, I had no practical experience implementing an RTS, nor did the other members on my team. We had a university-level understanding of programming, especially for games, and the will to succeed. For the length of this article, I will assume the same from you, and we’ll follow a similar journey from and education on game network models (part 1) to the path my team chose (part 2), including the problems we faced, and what ended up working for us practically.
(Skip to Part 2 Below for our choice & implementation)
As you might guess, Unreal Engine (both 3 & 4) is structured around the core elements needed to build Unreal Tournament (though I’m sure their marketing department would contest me on that one). As a result, it makes a few assumptions in things like it’s default networking model. The Unreal Engine’s default network model is a client-server model that shares many core elements with the The TRIBES Engine Networking Model. This is still the white paper on writing networked multiplayer games, so if you’re making your first multiplayer game, I highly recommend reading it.
It works like this: players send commands to a central server who makes authoritative decisions on the state of the game and publishes it to the players at frequent intervals. It excels at providing up-to-the-millisecond info on all the players in the world through high-frequency (>=20/second) messages about everybody status, and some prediction to fill the gaps. The longer you wait between messages, either due to tick-rate or latency, the more likely it is that prediction is wrong. You might have moved left instead of continuing straight in that time, and then the server has to correct your client. This is where “rubber-banding” comes from.
Server Update Rate of 10/s on Maestros – note the hitching/rubber-banding
Server Update Rate of 30/s on Maestros – pretty smooth
This is why you don’t want to play Halo from New York with somebody in Hong Kong, for instance. Somebody is going to have a bad time.
Where the Unreal (or Tribes) model starts to break down is when the number of characters starts to get really large because the amount of data you need to send starts to exceed what a common player’s internet connection can reliably transfer. Because of this, many classic RTS games with hundreds of units like, Starcraft, Age of Empires, Warcraft, Total Annihilation, etc. take a different approach.
Instead of one authoritative server, these classic RTS games use a peer-to-peer model where every player sends their commands to every other player. In some ways, this is really cool – you don’t need any dedicated servers because the player’s computers make up the game. Unfortunately, player’s computers can vary widely in power and connection speed, and without a central authority, they all need to operate at the same pace. This means the player with the worst latency to any other player, dictates the game’s responsiveness for every other player.
In peer-to-peer, players will also be initiating connections directly to one another, something that home firewalls are specifically built to stop. In this situation, workarounds like port forwarding or NAT punchthrough become necessary. If you’ve ever played Starcraft: Brood War and you couldn’t join a friend’s lobby, so you made your own lobby, had them try to join, and then had that friend host again you were essentially performing a manual NAT punchthrough. These days, Age of Empires II HD uses a proxy to resolve this problem, but that also increases latency.
As you can see in the model above, only commands are ever sent over the wire, no game states. In this peer to peer model, the game state is maintained by simulating the game identically across every machine i.e. it must have determinism. This model for multiplayer games is known as Deterministic Lockstep, and is described beautifully in 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond (again, this is a little piece of internet gold, highly recommended).
In Deterministic Lockstep, players have to execute every other player’s commands at the same point in the simulation on every machine or else the simulations will start to diverge. This means every client has to wait until they have the commands from every other client in order to execute them – talk about input lag. Fortunately for Real Time Strategy games, issuing a command and then having a unit execute it tens or even a couple hundred milliseconds later doesn’t break the game feel, because it’s an indirect action. You aren’t your marine, and your brain makes sense of the fact that it’ll take a moment for him to react to your order. In Unreal Tournament however, you ARE your character, and it would feel completely broken if you took 200ms to react to you dodging left.
It turns out that determinism is quite an endeavour – on PC you’re dealing with effectively infinite hardware profiles and nondeterministic behavior shows up in the darnedest places – virtual machines for certain languages, differing compilers, it might even come up in floating point numbers. To make things worse as an independent UDK developer, we don’t have access to change the underlying engine code in Unreal 3. All in all, we ran too high a risk of running into non-deterministic behavior that we simply could not overcome.
This left us with the traditional Tribes/Unreal model with high bandwidth requirements. In an RTS with thousands of units moving at once, the per-player bandwidth requirements would have been too restrictive – we wouldn’t have been able to make the game. Fortunately, we wanted The Maestros to be something a little smaller, a lot scrappier, and much, much faster. The update frequency of the Unreal networking architecture would give us excellent responsiveness – one of our core design pillars. With a small enough unit cap, the bandwidth requirements wouldn’t hinder players either, as long as they had modern internet connections.
We also wanted the game to be accessible for new players, and the hassle of configuring routers and/or firewalls wasn’t something we wanted players to deal with. We were also going to need servers with pretty large outbound bandwidth, which many home internet connections wouldn’t be able to support. For these reasons, we settled on a dedicated server model, putting the onus on us to host game servers.
We didn’t take this decision lightly, though. We started with a few assumptions. One, our users would need a reliable 1 mbps internet connection to play The Maestros. This definitely doesn’t apply to every potential player, but after hearing about others coming to the same conclusions, we felt reassured that it was still worth building. How did we know this was good enough for such a bandwidth-intensive network model? First, we did some naive math. If the x, y, z of both position and velocity are stored as 4-byte floats, sending them 30 times a second gives you: 3 * 2 * 4 * 30 = 720 byte / second / unit. Ok, and we’ve got a megabit per second of data which is 1/8th of a megabyte = (1024 * 1024) / 8 = 131072 bytes. So our theoretical cap for moving units, for all players at a single point time should be 131072 / 720 = ~182 units. Now, this is little more than a gut-check number, given how much more complex things could be, but 150-200 units was just enough for us to make a compelling RTS.
As soon as we got the basics of our game up and running in-engine, we put our math to the test. Unreal performs many improvements on the naive model we described above, so we were able to get 200 units moving on the screen simultaneously with little effort, after aggressively pushing the maximum bandwidth per client upwards. We used the ‘net stat’ command in UDK to monitor our network usage. Here’s a snapshot of the stats with nearly 200 units moving at once.
In a 3v3 match, we put the cap for each player at ~20 units, leaving half again as many for neutral monsters that players would fight around the map.
We expected moving units to be our biggest bandwidth hog, but there is a whole category of issues outside of server -> client position updates that we had to solve. Fortunately, we had a lot more freedom to come up with answers. First question: how do we tell the server to start moving units in the first place?
In Unreal, there are two primary ways to send data between client & server: replicated variables, and Remote Procedure Calls (RPC).