AI War 2:Discussion Of Multiplayer Desyncs And Primary Keys
Breakdown of the Types Of Threads
- Anything that happens on the long-term planning thread is happening on the host only.
- So if you CHANGE the simulation directly from that thread in any way, that is an instant desync.
- "The simulation" means anything that is part of the core sim that is supposed to be the same on all the clients and the host.
- Anything that is on the other background threads, short-term planning or otherwise, IS the sim, and happens consistently for everyone.
- If you generate a GameCommand, it will be executed within about 400ms, for everyone, on a background sim-thread.
Therefore, when do you need to use a GameCommand, and how?
- So anything that the long-term planning thread wants to "touch" that is part of the sim needs to be sent via a GameCommand.
- And similarly, any time that any other background thread, or the main thread, generates a GameCommand based off of sim logic, it needs to wrap that so that it does it from the host only (otherwise you will have n copies of the GameCommand, where n is the total player count).
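The host-only wrapping described above can be sketched roughly like this. This is a toy model in Python, not the game's actual code: `Machine`, `CommandQueue`, `run_sim_logic`, and the command name are all hypothetical stand-ins, and the real game has its own flag for checking whether you are the host.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    """One computer in the multiplayer session (hypothetical model)."""
    name: str
    is_host: bool

@dataclass
class CommandQueue:
    """Stand-in for the shared queue of commands that everyone later executes."""
    commands: list = field(default_factory=list)

    def send(self, sender, command):
        self.commands.append((sender, command))

def run_sim_logic(machine, queue, host_only):
    """Sim logic that runs identically on every machine and decides to issue a command."""
    if host_only and not machine.is_host:
        return  # clients reach the same decision, but stay silent
    queue.send(machine.name, "SpawnReinforcements")  # hypothetical command name

def commands_issued(machines, host_only):
    """Run the same sim logic on every machine and collect what got sent."""
    queue = CommandQueue()
    for machine in machines:
        run_sim_logic(machine, queue, host_only)
    return queue.commands

machines = [Machine("host", True), Machine("client1", False), Machine("client2", False)]
unguarded = commands_issued(machines, host_only=False)
guarded = commands_issued(machines, host_only=True)
print(len(unguarded), len(guarded))
```

Without the guard, three machines produce three copies of the same command; with it, only the host's copy goes out.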
What Is Part Of The Sim And What Is Not?
- The sim is deterministic, which means that basically anything that is not predictable from the recent past is probably not sim.
- The two major categories of this are:
- Anything the player does, in terms of directly clicking something or hitting a hotkey. We don't know what buttons you will press.
- Anything that the "long term planning" threads of the factions do.
- These happen over the course of sometimes multiple seconds.
- These are specifically designed so that they run on the host only.
- These are basically meant to be the "consciousness" of those factions, or higher-order thoughts that take longer to complete than a few ms can provide.
How does that turn out in terms of data and threads?
- The main thread mostly is part of the sim.
- However, it is also responsible for things like drawing the UI state, and those are not part of the sim.
- For instance, if you click a button, that is a nonsim thing to do. Other machines have no idea you did that.
- If you click a button in the UI, or click a ship and issue an order in the main 3D world space, then those get turned into GameCommands or they didn't happen.
- Directly editing sim data based off a player click is an instant desync.
- There are a bunch of "short term" background threads that are part of the sim.
- Things like strength counting, forcefield calculations, and so on.
- These give a consistent, deterministic result. HOWEVER, you have no idea how many cores they are running on, which one finishes first, etc.
- So you can't do anything with them that requires order of operations to matter between them (aka, you can't look at the forcefield protection status from this cycle from the strength counting thread, you actually look at the data that was cached from one frame back).
- IF you issue a GameCommand from any of these threads (which sometimes you do need to do), you need to wrap that in code that says "only send the command from the game host."
- Otherwise you wind up with n copies of the GameCommand, where n is the number of players in multiplayer.
- We think of all of this sort of thing as either being "subconscious thinking" of the ships, very basic, or else it's just the simulation of prior orders.
- There are various "long term continuous" threads, which calculate super-expensive things on an ongoing basis.
- Targeting and decollision are two of these things. These things can take many seconds to run, and thus can't be part of the sim because we can't wait for that sort of thing before moving to the next sim step.
- So these things work in isolation on the host only, looking at the sim data as it exists over time, and then periodically dump some GameCommands in for everyone to execute soon.
- These are something we would consider basically like "subconscious thinking," but it's still nonsim because it's too expensive to make part of the sim. The game would collapse under its own weight trying to make truly good decisions for this within a 100ms window that it normally has for a sim step.
- There are then "long term intermittent" background threads, one per TYPE of faction.
- These are the "conscious thinking" for the faction, or other data that requires longer-term thought or more expensive calculations.
- Please note that while there can be multiples of a faction type, only one thread from that faction type is run at a time. Beyond that, many factions can be running at the same time.
- The end result of these threads is either GameCommands, or UI information.
- One example is the various notifications at the top of the screen. Those are entirely non-sim UI elements, but are too expensive to calculate on the main thread. So we calculate those here and then provide them to the local player to see.
- If the player clicks on them and that would cause a change to the sim, then the UI element would be responsible for generating a GameCommand.
- Another example is the concept of Fireteams. These exist solely in faction data, basically in the "conscious thought" of the faction.
- None of the sim code has any concept of a fireteam, although it can check things like fireteam IDs. But it probably shouldn't, since that would be inconsistent on clients other than the host.
- Fireteams actually mean something because after doing a LOT of calculations, the long-term intermittent thread from the faction winds up issuing GameCommands telling specific ships to do something.
- The "external data" of the faction keeps a lot of data about fireteams, and about various other non-sim stuff, but the sim is ignorant of all that. It's a one-way looking glass.
- So you can change fireteam data all day, and not affect the sim at all unless you issue GameCommands afterwards. That's true of pretty much most "external" data on factions or ships.
- Stage 1/2/3 method calls on the faction classes. (DoPerSecondLogic_Stage1Clearing_OnMainThreadAndPartOfSim, etc)
- These method calls are actually part of the sim, and happen for everybody.
- These give you a chance to extend the subconscious of the sim itself, and you can use external data on ships and globally to do so.
- This has the nice benefit of not requiring you to make GameCommands to have things change, but the downside that if you do too many expensive operations in here, the game simulation will slow down. In the "long term intermittent" threads, if you take 5 seconds to calculate some impressively dense data, then it's no slowdown at all. But here that would cause an enormous hang of the game and turn it unplayable.
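As a rough illustration of the "look at the data that was cached from one frame back" rule for the short-term background threads, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two toy rules, the function names, and the dictionary layout are invented, and the thread race is modelled simply as two different job orders.

```python
def compute_forcefield(ships):
    """Toy rule: the planet is protected if any forcefield generator is present."""
    return {"protected": any(s["kind"] == "forcefield" for s in ships)}

def compute_strength(ships, forcefield_snapshot):
    """Toy rule: protected ships count double. Reads only the snapshot passed in."""
    base = sum(s["strength"] for s in ships)
    return base * 2 if forcefield_snapshot["protected"] else base

def run_frame(ships, previous, job_order):
    """Run both jobs in the given order; strength only sees last frame's forcefield data."""
    current = {}
    for job in job_order:
        if job == "forcefield":
            current["forcefield"] = compute_forcefield(ships)
        elif job == "strength":
            current["strength"] = compute_strength(ships, previous["forcefield"])
    return current

ships = [{"kind": "frigate", "strength": 10}, {"kind": "forcefield", "strength": 5}]
frame0 = {"forcefield": {"protected": False}, "strength": 0}

# Same frame, two different finishing orders: the results match.
frame1_a = run_frame(ships, frame0, ["forcefield", "strength"])
frame1_b = run_frame(ships, frame0, ["strength", "forcefield"])
# The forcefield result only shows up in strength counting one frame later.
frame2 = run_frame(ships, frame1_a, ["strength", "forcefield"])
print(frame1_a == frame1_b, frame1_a["strength"], frame2["strength"])
```

The price is a one-frame lag on the forcefield information, in exchange for results that do not depend on which job finishes first.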
Q&A Items
The goal here is for other programmers or modders to be able to read some of the discussions that are had, and gain some understanding of this complicated topic from it. A more formal writeup would be great, and is something we will do, but these illustrate various use cases and their particular pitfalls.
Can I Add A Planet Without A GameCommand?
Badger: Oh, re: multiplayer and gamecommands, does the "Create new planet" mechanic need to go in a GameCommand rather than in the stage3 sim code?
Chris: Ah... technically not, although you may want to for multiplayer and single-player purposes. Well... actually maybe yes, even so, for some edge cases.
The "main thread" stages 1, 2, and 3 all happen on all of the clients and the host, so anything you do in there is going to be okay for multiplayer if it always happens the same way. So the simple answer is that creating a planet should be absolutely A-OK during this, and it should happen just fine on all clients.
BUT.
There's no guarantee which of the main thread stage 1s, 2s, and 3s will run in which order. You are guaranteed that all the stage 1s run, then all the stage 2s, then all the stage 3s. But if I have one core and you have 16, my computer will do them all much more slowly and in sequential-ish random-ish order, while yours will do 16ish of them at once in a race. If ANYTHING you do in those threads changes the immediate game state (create a planet, create a unit, alter health of a unit, whatever else) in a way that would affect the outcome of another thread, then you've introduced a desync. The massive disparity in cores will make it more obvious and frequent, but even if we both had 16 cores or 4 cores or 1 core, we'd have threads racing each other and executing in random order.
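To make the ordering problem concrete, here is a small Python sketch. It does not use real threads; it just runs two hypothetical stage 3 handlers in the two possible orders, which is what a race between them amounts to. The handlers and their rules are invented for illustration.

```python
def faction_a_stage3(state):
    """Hypothetical handler: faction A halves the unit's health."""
    state["health"] //= 2

def faction_b_stage3(state):
    """Hypothetical handler: faction B repairs the unit by 10."""
    state["health"] += 10

def run_stage(handlers, start_health):
    """Run the handlers in the order given and return the final health."""
    state = {"health": start_health}
    for handler in handlers:
        handler(state)
    return state["health"]

# Machine 1 happens to finish A before B; machine 2 finishes B before A.
machine_1 = run_stage([faction_a_stage3, faction_b_stage3], 100)
machine_2 = run_stage([faction_b_stage3, faction_a_stage3], 100)
print(machine_1, machine_2)  # 60 55
```

Both machines ran exactly the same code on exactly the same starting data, and they still disagree, because each handler changed state that the other one reads.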
The saving grace is that this game knows that stuff like that will happen, and it can recover from it without too much trouble. But the more things like that we have, the more the network bogs down with correction-code. Running correction-code for an entire planet is going to be potentially quite heavy. Running it for all the various units that might pathfind differently if the planet is added a few ms earlier or later is just going to compound things a bit. This is assuming that all the planets being added aren't in a cul-de-sac, but even if they are there might be some fireteams that want to go over there.
Which actually brings me to single-player oddities that might happen. If you add some planets in code like that, and then populate them via gamecommands (that will be a must to make sure primary key IDs from the units are all identical on the host and all clients in multiplayer), then what now structurally happens is that even in single-player you might have a gap of... let's say up to 400ms where these new planets are absolutely un-owned by anyone, and look empty. So a quick-thinking faction of some sort may start making a beeline for the new empty territory, all sorts of decisions might happen in that period, and then they look very foolish and get into a big scuffle as soon as the new occupants arrive 400ms later while the other faction is still committed to this new destination that is suddenly dangerous.
In short, putting in the planet and all the stuff to go on it as part of a single SET of gamecommands (aka issued during the same 100ms "turn") would be a really good idea. There are still likely to be some edge cases where bits populate 400ms too late if it's not literally all in one gamecommand in some fashion, but that is probably unavoidable. I guess the optimal thing from a sync standpoint would be one command that creates and populates the planet, though.
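A toy Python model of that gap, under some loud assumptions: the command names and tuple layout are made up, each loop iteration stands for one 100ms sim step, and the 4-step delay is just the "up to 400ms" figure from above turned into steps.

```python
def run_sim(commands_by_step, steps):
    """Execute commands step by step; record every step where a planet exists but is empty."""
    planets = {}  # planet name -> list of units on it
    empty_steps = []
    for step in range(steps):
        for kind, planet, units in commands_by_step.get(step, []):
            if kind == "create_planet":
                planets[planet] = []
            elif kind == "add_units":
                planets[planet].extend(units)
            elif kind == "create_and_populate":
                planets[planet] = list(units)
        # This is what any faction looking at the galaxy would see on this step.
        empty_steps.extend(step for units in planets.values() if not units)
    return empty_steps

guards = ["guard1", "guard2"]
# Planet created directly at step 0, occupants arrive by command 4 steps later.
split = {0: [("create_planet", "z", [])], 4: [("add_units", "z", guards)]}
# One command that does both at once.
combined = {0: [("create_and_populate", "z", guards)]}

split_gap = run_sim(split, steps=6)
combined_gap = run_sim(combined, steps=6)
print(split_gap, combined_gap)  # [0, 1, 2, 3] []
```

In the split version there are four steps where the planet looks free for the taking; in the combined version no faction ever sees it empty.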
Backing up to less common cases that are still problematic, let's suppose we have two factions adding new planets at the same time as some sort of coincidence. Right now only one faction can even do that, so we think of that as impossible. But via mods or future DLC or whatever we might have more factions that can do it all at once. If we did, and both factions are creating planets at once during the same sim frame, then you might wind up with differences between clients and hosts on THAT basis, too. Faction A creates 4 planets named b, c, e, and d, numbered 10, 11, 12, and 13. Faction N creates 1 planet named z, and numbered 14. Except because these are multiple threads, on another computer planet z might be numbered anywhere from 10-14, thus throwing off the numbering of all the other planets.
After such a severe mashup, any future gamecommands talking about "planet 14" are going to be incredibly out of sync, and lots of really hard-to-fix data is going to be super duper wrong. The sort of sync-fix code that I'm writing would never account for that sort of data being wrong, because it's going to assume that primary keys -- and by extension planet indexes and faction indexes -- are gospel. So instead of it getting fixed via sync, probably all of the ships will be corrected to match the host, but the planet ownership will be a bit off until you stop and reload the game. I mean, I was planning on having it re-sync planet ownership also, so that part would be fine... but there's all this custom backend code for factions where factions think they own this and that and keep track of "their" planets. There is no generalized way for sync code to see differences there or fix them, so that's the part that would persist in an incorrect way.
So! All of that is to say, anything that creates a new primary key ID, or something similar, without going through a gamecommand and getting a sanctioned ID from the host, is likely to create an irreversible desync that requires saving, loading, and reconnecting. A shot would not be a big deal because those are short-lived and the sync stuff would fix it up. Ships would also be found and fixed... but all that metadata if it is owned by another faction would be wrong for a very long time and very problematic potentially. Wormholes or planets or factions would actually be the worst cases, as other than broken mods or bad-acting dev code there's no way those could be wrong and thus no reason to check these things overly frequently.
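The planet-numbering example above (planets b, c, e, d, and z, numbered from 10) can be sketched in Python like this. The function names and the `CreatePlanet` command are hypothetical; the point is only the difference between each machine handing out its own IDs and the host handing them out once inside commands.

```python
def allocate_locally(creation_order, first_id=10):
    """Each machine numbers planets in whatever order its own threads finished."""
    return {name: first_id + i for i, name in enumerate(creation_order)}

def build_host_commands(host_order, first_id=10):
    """The host picks the IDs once and bakes them into the commands it sends."""
    return [("CreatePlanet", name, first_id + i) for i, name in enumerate(host_order)]

def apply_commands(commands):
    """Every machine applies the same commands, so it just uses the IDs it is given."""
    return {name: planet_id for _, name, planet_id in commands}

host_thread_order = ["b", "c", "e", "d", "z"]    # faction A finished first on the host
client_thread_order = ["z", "b", "c", "e", "d"]  # faction N won the race on the client

host_local = allocate_locally(host_thread_order)
client_local = allocate_locally(client_thread_order)

commands = build_host_commands(host_thread_order)
host_synced = apply_commands(commands)
client_synced = apply_commands(commands)

print(host_local["z"], client_local["z"])  # 14 10
print(host_synced == client_synced)        # True
```

With local numbering, "planet 14" means z on the host and d on the client. With host-issued IDs, the thread order on the client no longer matters.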
So... after a lot of thought, and contrary to my initial thoughts: the end answer is yes, we absolutely need to do this by gamecommand. It can be done by Stage3 if you want, but it needs to only be run on the host (there's a flag for that, or a flag for !client, either way), since we don't want 4 copies of the commands if there are 4 human players connected. BUT if you use Stage3 instead of long term planning, you will actually have a secondary problem with Rand(). If you use Rand() at all in a main thread that is supposed to be identical on all the machines, but some of those Rand() calls are inside hostonly or clientonly code, then the values that further calls to that Rand() return will all be inconsistent on that thread on that frame. All of that should be recoverable by the sync code, but I'd rather not spike the network with that happening.
If you need to use Rand() and only do so on the host, then you can either use Engine_Universal.PermanentRandom (normally a huge no-no, but since it is non-synchronous it actually is fine here), or you can create a new random to use temporarily for just this purpose (kind of a waste, honestly), or you can just use the long-term planning (that may be inconvenient for code organization, or too slow to come around since those are considered long-term actions that may have a few seconds between them). Suffice it to say, if you are in one of the main thread things (stage3 or otherwise) and you're bracketed into hostonly or clientonly code, just be sure not to use Context.Random or its variants or that's a desync.
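Here is a minimal Python sketch of why a host-only call on the synced random is a problem. It uses Python's `random.Random` purely as a stand-in: `shared` plays the role the text gives to Context.Random (same seed on every machine), and `private` plays the role of a non-synchronous random source. The seed value and the `run_frame` function are invented for illustration.

```python
import random

def run_frame(is_host, use_separate_rng):
    """Simulate one frame of main-thread logic on one machine."""
    shared = random.Random(1234)  # stand-in for the synced sim random, same seed everywhere
    private = random.Random()     # unsynced random, only safe for host-only decisions
    if is_host:
        rng = private if use_separate_rng else shared
        rng.randint(0, 99)  # host-only roll, e.g. deciding where a new planet goes
    # Sim logic that every machine runs afterwards using the shared random.
    return [shared.randint(0, 999999) for _ in range(3)]

client = run_frame(is_host=False, use_separate_rng=False)
host_bad = run_frame(is_host=True, use_separate_rng=False)
host_good = run_frame(is_host=True, use_separate_rng=True)

print(host_bad == client)   # host consumed an extra value, so its later rolls differ
print(host_good == client)  # host used its own random, shared sequence is untouched
```

The host-only roll itself is harmless. The damage comes from every later call on the shared random being shifted on the host relative to the clients.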
This is so lengthy and so full of the general thought process behind everything that I think I'll put this on the wiki. ;)