AI War 2:Discussion Of Multiplayer Desyncs And Primary Keys

From Arcen Wiki

Breakdown of the Types Of Threads

  • Anything that happens on the long-term planning thread is happening on the host only.
    • So if you CHANGE the simulation directly in any way, that is an instant desync.
    • "The simulation" means anything that is part of the core sim that is supposed to be the same on all the clients and the host.
  • Anything that happens on the other background threads, the ones that do short-term planning or otherwise ARE the sim, happens consistently for everyone.
  • If you generate a GameCommand, it will be executed within about 400ms, for everyone, on a background sim-thread.

Therefore, when do you need to use a GameCommand, and how?

  • So anything that the long-term planning thread wants to "touch" that is part of the sim needs to be sent via a GameCommand.
  • And similarly, any time that any other background thread, or the main thread, generates a GameCommand based off of sim logic, it needs to wrap that so that it does it from the host only (otherwise you will have n copies of the GameCommand, where n is the total player count).
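
The host-only rule can be sketched in a few lines. This is a language-neutral illustration written in Python, not the game's actual C# API: `Machine`, `queue_command`, `run_sim_logic`, and the command name are all hypothetical. It only shows why unwrapped sim logic produces one copy of the command per player.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """One computer in the multiplayer session (hypothetical model)."""
    name: str
    is_host: bool
    outbox: list = field(default_factory=list)  # commands this machine sends

    def queue_command(self, command: str) -> None:
        self.outbox.append(command)


def run_sim_logic(machine: Machine, host_only: bool) -> None:
    """Sim logic that runs identically on every machine and decides a command is needed."""
    if host_only and not machine.is_host:
        return  # clients stay silent; the host's copy reaches everyone anyway
    machine.queue_command("SpawnReinforcements")  # hypothetical command name


def total_commands(host_only: bool, player_count: int = 4) -> int:
    """Count how many copies of the command the whole session generates."""
    machines = [Machine(f"player{i}", is_host=(i == 0)) for i in range(player_count)]
    for m in machines:
        run_sim_logic(m, host_only)
    return sum(len(m.outbox) for m in machines)


print(total_commands(host_only=False))  # 4: one copy per player
print(total_commands(host_only=True))   # 1: only the host sends it
```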

What Is Part Of The Sim And What Is Not?

  • The sim is deterministic, which means that basically anything that is not predictable from the recent past is probably not sim.
    • The two major categories of this are:
      • Anything the player does, in terms of directly clicking something or hitting a hotkey. We don't know what buttons you will press.
      • Anything that the "long term planning" threads of the factions do.
        • These happen over the course of sometimes multiple seconds
        • These are specifically designed so that they run on the host only.
        • These are basically meant to be the "consciousness" of those factions, or higher-order thoughts that take longer to complete than a few ms can provide.

How does that turn out in terms of data and threads?

  • The main thread mostly is part of the sim.
    • However, it is also responsible for things like drawing the UI state, and those are not part of the sim.
    • For instance, if you click a button, that is a nonsim thing to do. Other machines have no idea you did that.
    • If you click a button in the UI, or click a ship and issue an order in the main 3D world space, then those get turned into GameCommands or they didn't happen.
    • Directly editing sim data based off a player click is an instant desync.
  • There are a bunch of "short term" background threads that are part of the sim.
    • Things like strength counting, forcefield calculations, and so on.
    • These give a consistent, deterministic result. HOWEVER, you have no idea how many cores they are running on, which one finishes first, etc.
      • So you can't do anything with them that requires order of operations to matter between them (aka, you can't look at the forcefield protection status from this cycle from the strength counting thread, you actually look at the data that was cached from one frame back).
    • IF you issue a GameCommand from any of these threads (which sometimes you do need to do), you need to wrap that in code that says "only send the command from the game host."
      • Otherwise you wind up with n copies of the GameCommand, where n is the number of players in multiplayer.
    • We think of all of this sort of thing as either being "subconscious thinking" of the ships, very basic, or else it's just the simulation of prior orders.
  • There are various "long term continuous" threads, which calculate super-expensive things on an ongoing basis.
    • Targeting and decollision are two of these things. These things can take many seconds to run, and thus can't be part of the sim because we can't wait for that sort of thing before moving to the next sim step.
    • So these things work in isolation on the host only, looking at the sim data as it exists over time, and then periodically dump some GameCommands in for everyone to execute soon.
    • These are something we would consider basically like "subconscious thinking," but it's still nonsim because it's too expensive to make part of the sim. The game would collapse under its own weight trying to make truly good decisions for this within a 100ms window that it normally has for a sim step.
  • There are then "long term intermittent" background threads, one per TYPE of faction.
    • These are the "conscious thinking" for the faction, or other data that requires longer-term thought or more expensive calculations.
    • Please note that while there can be multiples of a faction type, only one thread from that faction type is run at a time. Beyond that, many factions can be running at the same time.
    • The end result of these threads is either GameCommands, or UI information.
    • One example is the various notifications at the top of the screen. Those are entirely non-sim UI elements, but are too expensive to calculate on the main thread. So we calculate those here and then provide them to the local player to see.
      • If the player clicks on them and that would cause a change to the sim, then the UI element would be responsible for generating a GameCommand.
    • Another example is the concept of Fireteams. These exist solely in faction data, basically in the "conscious thought" of the faction.
      • None of the sim code has any concept of a fireteam, although it can check things like fireteam IDs. But it probably shouldn't, since that would be inconsistent on clients other than the host.
      • Fireteams actually mean something because after doing a LOT of calculations, the long-term intermittent thread from the faction winds up issuing GameCommands telling specific ships to do something.
      • The "external data" of the faction keeps a lot of data about fireteams, and about various other non-sim stuff, but the sim is ignorant of all that. It's a one-way looking glass.
      • So you can change fireteam data all day, and not affect the sim at all unless you issue GameCommands afterwards. That's true of most "external" data on factions or ships.
  • Stage 1/2/3 method calls on the faction classes. (DoPerSecondLogic_Stage1Clearing_OnMainThreadAndPartOfSim, etc)
    • These method calls are actually part of the sim, and happen for everybody.
    • These give you a chance to extend the subconscious of the sim itself, and you can use external data on ships and globally to do so.
    • This has the nice benefit of not requiring you to make GameCommands to have things change, but the downside that if you do too many expensive operations in here, the game simulation will slow down. In the "long term intermittent" threads, if you take 5 seconds to calculate some impressively dense data, then it's no slowdown at all. But here that would cause an enormous hang of the game and turn it unplayable.
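
The "look at the data cached from one frame back" rule for the short-term threads can be illustrated with a small sketch. It is written in Python rather than the game's C#, and the job names and formulas are invented for illustration. Each job reads only the previous cycle's results, so the order in which jobs finish cannot change the outcome.

```python
def run_sim_step(previous: dict, job_order: list) -> dict:
    """Run the short-term jobs in the given order; return this cycle's results.

    Each job may read `previous` (results cached from the last cycle) but never
    `current`, so the order in which jobs finish cannot change the outcome.
    """
    current = {}
    jobs = {
        # Forcefield job: hypothetical rule, protection drops by 1 per cycle.
        "forcefield": lambda prev: max(prev["forcefield"] - 1, 0),
        # Strength job: reads the forcefield value cached from one cycle back.
        "strength": lambda prev: 100 + 10 * prev["forcefield"],
    }
    for name in job_order:
        current[name] = jobs[name](previous)
    return current


start = {"forcefield": 3, "strength": 0}
a = run_sim_step(start, ["forcefield", "strength"])
b = run_sim_step(start, ["strength", "forcefield"])
print(a == b)  # True: same result whichever job finishes first
```

If the strength job read the forcefield value from `current` instead, its result would depend on whether the forcefield job had already finished, which is exactly the cross-thread ordering problem described for the short-term threads.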

Why Is This So Complicated?

So the game can be smart, but also still run. There are several types of things that we need to do to make this game happen:

  • Sometimes things just need to run locally to make the UI appear a certain way.
    • But sometimes those things are SLOW and we want them to run on a background thread.
  • For the core part of the simulation, we need it to run deterministically (or close) for about 400ms at a time with no input from anyone, so that multiplayer has a chance to get data to everyone (among other reasons that benefit singleplayer as well).
    • These pieces still require crunching a LOT of data, though, and people typically have more than one core in their computer. So some of this is broken out into multiple threads.
    • Some of this stuff is also modded-in logic, or faction-specific logic, so we need to run that on a faction object, but it's still part of the main simulation.
  • For the higher-order thinking or truly obscenely-expensive-to-calculate things, we need to not have a timer on us that is measured in ms.
    • This needs to be calculated in a long-form format and then "told to the sim" via GameCommands.
    • There is no point running this on the clients, because they would just be duplicating effort that already has to be inserted into the sim anyway. And they would likely come up with different results than the host, because different computers run at different speeds and these are not in sync with the main deterministic threads.
    • Calculating the targeting for thousands of ships against thousands of ships in a truly intelligent way takes time; sometimes only half a second, but even that is "slow" by our standards here.
  • Any input from players, like them clicking something that changes the sim (not clicking part of the UI or just changing their view) needs to be scheduled so that it can happen simultaneously for them and other players in the near future.
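
Scheduling player input for the near future can be sketched as a queue keyed by sim step. This is a Python illustration under stated assumptions, not the game's networking code: `CommandSchedule` is hypothetical, and treating the roughly 400ms delay as exactly four 100ms sim steps is a simplification.

```python
from collections import defaultdict

SIM_STEP_MS = 100       # one sim step, per the text
COMMAND_DELAY_MS = 400  # approximate delay before a command executes, per the text
DELAY_STEPS = COMMAND_DELAY_MS // SIM_STEP_MS  # assumption: a whole number of steps


class CommandSchedule:
    """Commands keyed by the sim step on which every machine executes them."""

    def __init__(self):
        self.by_step = defaultdict(list)

    def submit(self, current_step: int, command: str) -> int:
        """Queue a command for a future step and return that step number."""
        execute_step = current_step + DELAY_STEPS
        self.by_step[execute_step].append(command)
        return execute_step

    def due(self, step: int) -> list:
        """Remove and return the commands that must run on this step."""
        return self.by_step.pop(step, [])


schedule = CommandSchedule()
when = schedule.submit(current_step=10, command="MoveShips")
print(when)  # 14
for step in range(10, 15):
    print(step, schedule.due(step))  # empty lists until step 14
```

Because the command carries its target step, every machine has time to receive it before that step arrives and then applies it on the same step.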

How Often Do Threads Run?

  • The main thread runs... as frequently as it can, roughly.
    • This is what keeps the game responsive and feeling nice as you scroll around and ships move, etc.
    • It runs at some variable frequency, but basically this is what your framerate is.
    • If you turn on Vsync on a 60Hz screen, you are locking the main thread to running exactly 60 times per second. And you're enforcing the interval of that, too. Don't do that, it's hugely wasteful of your CPU and degrades performance.
    • If you set a target framerate of 80, it will try to maintain roughly 80 FPS on average, but allow properly for uneven-length frame times, which is good.
    • So, to put all this in context, we are HOPING that the main thread runs every 16.67ms on average at bare minimum, but in practice it can often run more like every 10ms if your GPU hardware is good and you're not looking at too much.
    • As you can see, this is hardly any time to do anything, so we can't have any complicated AI in here. We can't even really run the simulation in here!
  • The main simulation is actually run on a background thread, but we refer to that as the "main simulation thread," partly because once-upon-a-time it was run on the main thread, and it's possible to make a setting to have it run there again (but don't do that!).
    • The main simulation thread does almost nothing other than coordinate other simulation threads, and clean up some data (like ships that were marked as destroyed and now need to be removed). But that's still enough not to burden the actual main program thread with it.
  • The simulation itself runs at a constant 10fps, which means it runs 10 "simulation steps" every second, so once every 100ms.
    • It's worth noting that part of what the main thread (the real one -- the one that draws things) does is a whole lot of linear interpolation between the last position of a ship or shot and the new position, so that you get smooth movement of those objects and not a jerky 10fps slideshow while your view slides around smoothly at 60fps.
  • So, all of those "short term planning threads" are kicked off by the "main simulation thread" (which is separate from the true main thread) at the start of every sim step.
    • Aka, each of these runs once every 100ms. If any one of them takes substantially longer than maybe 50-60ms to run, then you're going to have a lowered game speed, but NOT a lower framerate.
      • That's why you can have a very nice framerate going on, but a game speed of 80% because of background processing.
      • Conversely, if your framerate tanks because your GPU can't handle all the things it has to draw, then your background simulation performance will be affected SOME (it has probably pegged one of your CPU cores, and the timing of "each short-term thread is started close to every 100ms" gets thrown out the window).
  • AFTER all of the "short term planning" threads have ALL finished for a sim step, then it's time for the "main background sim thread" to kick off the "do world simulation logic."
    • This is things like moving ships around, and calling all that logic on the factions stage 1/2/3 (DoPerSecondLogic_Stage1Clearing_OnMainThreadAndPartOfSim and similar).
    • So these things happen about once every 100ms as well, but definitely no more frequently than that. For 100% game speed, it should average out to 100ms, but the gaps between each frame are variable.
    • Bear in mind that these have to share the 100ms window with all the "short term planning threads," and they have to fit into the space after all those other threads are done.
      • So if the slowest "short term planning" thread takes 70ms, then all the rest of the sim processing has 30ms to finish up.
  • All of the "long term continuous" threads, like decollision checking and targeting, run... well, continuously, as you might expect.
    • Basically these things run for a while, which could be a few hundred ms or a few seconds, and then start over when they are done.
    • There are some limiters in there to keep them from running faster than the simulation could use the data, if the simulation is that small at the moment (few planets, early game, etc).
      • For instance, if it would be flooding you with GameCommands every 50ms, there's no point to that and it would just bog down your machine and/or the network, PLUS there would be no time for the prior results to even be reflected in the simulation, so it waits for an appropriate length of time to avoid doing that to you.
    • Beyond that sort of case, if we assume that these are running heavily -- slower than the simulation, which is why they are separate from the simulation anyhow -- then they wind up running for a few seconds, then immediately starting over and running for another few seconds.
    • Each of these threads is fairly independent of the others in terms of how frequently it runs -- aside from possible contention for CPU time on cores, one of these running more slowly or longer won't make others run more slowly or take longer.
    • These run even while the game is paused!
  • The "long term intermittent" threads are all related to factions, and run somewhat similarly to the continuous ones.
    • However, these don't just start up willy-nilly all the time like the others do. The main simulation background thread says "okay, run one cycle of all the faction threads," and then they all try to run.
    • They take as long as they take, and not quite all of them run at once. As noted elsewhere, any duplicate copies of the same TYPE of faction will run serially instead of in parallel.
    • Once all of these have finished running, which might take a few dozen ms or a few dozen seconds in a super heavy game, then they go into a rest state waiting for the next time they will be awoken.
    • In most cases, since these take a while to run, if the game is not paused they will probably kick off within a half second at most after they all finish up.
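
The timing figures above can be worked through with a little arithmetic. The helpers below are a Python sketch with hypothetical names, and the game-speed formula is an assumption (speed falls off in proportion to how far a sim step overruns 100ms), not the game's actual calculation. The interpolation helper shows the kind of linear blend the drawing thread does between two sim positions.

```python
SIM_STEP_MS = 100.0  # the sim targets one step every 100ms, per the text


def remaining_budget_ms(short_term_thread_ms: list) -> float:
    """Time left for world logic after the slowest short-term thread finishes."""
    return max(SIM_STEP_MS - max(short_term_thread_ms), 0.0)


def game_speed(step_duration_ms: float) -> float:
    """Fraction of full speed, assuming speed falls off as steps overrun 100ms."""
    return min(1.0, SIM_STEP_MS / step_duration_ms)


def interpolate(last_pos: float, new_pos: float, ms_since_step: float) -> float:
    """Linear interpolation between the last and new sim positions of an object."""
    t = min(max(ms_since_step / SIM_STEP_MS, 0.0), 1.0)
    return last_pos + (new_pos - last_pos) * t


print(remaining_budget_ms([20.0, 70.0, 45.0]))  # 30.0
print(game_speed(125.0))                         # 0.8
print(interpolate(0.0, 50.0, 50.0))              # 25.0
```

Under this model, a game speed of 80% corresponds to sim steps that take 125ms on average, regardless of how fast the drawing thread is running.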

Q&A Items

The goal here is for other programmers or modders to be able to read some of the discussions that are had, and gain some understanding of this complicated topic from it. A more formal writeup would be great, and is something we will do, but these illustrate various use cases and their particular pitfalls.

Can I Add A Planet Without A GameCommand?

Badger: Oh, re: multiplayer and gamecommands, does the "Create new planet" mechanic need to go in a GameCommand rather than in the stage3 sim code?

Chris: Ah... technically not, although you may want to for multiplayer and single-player purposes. Well... actually maybe yes, even so, for some edge cases.

The "main thread" stages 1, 2, and 3 all happen on all of the clients and the host, so anything you do in there is going to be okay for multiplayer if it always happens the same way. So the simple answer is that creating a planet should be absolutely A-OK during this, and it should happen just fine on all clients.

BUT.

There's no guarantee which of the main thread stage 1s, 2s, and 3s will run in which order. You are guaranteed that all the stage 1s run, then all the stage 2s, then all the stage 3s. But if I have one core and you have 16, my computer will do them all much more slowly and in sequential-ish random-ish order, while yours will do 16ish of them at once in a race. If ANYTHING you do in those threads changes the immediate game state (create a planet, create a unit, alter health of a unit, whatever else) in a way that would affect the outcome of another thread, then you've introduced a desync. The massive disparity in cores will make it more obvious and frequent, but even if we both had 16 cores or 4 cores or 1 core, we'd have threads racing each other and executing in random order.
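
To make the ordering problem concrete, here is a minimal Python sketch. The two functions stand in for the same-stage logic of two different factions; their names and numbers are invented. Both touch the same piece of game state, so the result depends on which one runs first.

```python
from itertools import permutations


def heal(state: dict) -> None:
    """Hypothetical faction A logic: heal by 30, capped at 100."""
    state["health"] = min(state["health"] + 30, 100)


def halve(state: dict) -> None:
    """Hypothetical faction B logic: cut health in half."""
    state["health"] = state["health"] // 2


def run_stage(order) -> int:
    """Apply the stage logic in the given order and return the final health."""
    state = {"health": 80}
    for step in order:
        step(state)
    return state["health"]


results = {tuple(f.__name__ for f in order): run_stage(order)
           for order in permutations([heal, halve])}
print(results)  # {('heal', 'halve'): 50, ('halve', 'heal'): 70}
```

Two machines that happen to run these in different orders now disagree about the unit's health, with no network involved at all.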

The saving grace is that this game knows that stuff like that will happen, and it can recover from it without too much trouble. But the more things like that we have, the more the network bogs down with correction-code. Running correction-code for an entire planet is going to be potentially quite heavy. Running it for all the various units that might pathfind differently if the planet is added a few ms earlier or later is just going to compound things a bit. This is assuming that all the planets being added aren't in a cul-de-sac, but even if they are there might be some fireteams that want to go over there.

Which actually brings me to single-player oddities that might happen. If you add some planets in code like that, and then populate them via gamecommands (that will be a must to make sure primary key IDs from the units are all identical on the host and all clients in multiplayer), then what now structurally happens is that even in single-player you might have a gap of... let's say up to 400ms where these new planets are absolutely un-owned by anyone, and look empty. So a quick-thinking faction of some sort may start making a beeline for the new empty territory, all sorts of decisions might happen in that period, and then they look very foolish and get into a big scuffle as soon as the new occupants arrive 400ms later while the other faction is still committed to this new destination that is suddenly dangerous.

In short, putting in the planet and all the stuff to go on it as part of a single SET of gamecommands (aka issued during the same 100ms "turn") would be a really good idea. There are still likely to be some edge cases where bits populate in 400ms too late if it's not literally all in one gamecommand in some fashion, but that is probably unavoidable. I guess the optimal thing from a sync standpoint would be one command that creates and populates the planet, though.

Backing up to less common cases that are still problematic, let's suppose we have two factions adding new planets at the same time as some sort of coincidence. Right now only one faction can even do that, so we think of that as impossible. But via mods or future DLC or whatever we might have more factions that can do it all at once. If we did, and both factions are creating planets at once during the same sim frame, then you might wind up with differences between clients and hosts on THAT basis, too. Faction A creates 4 planets named b, c, e, and d, numbered 10, 11, 12, and 13. Faction N creates 1 planet named z, and numbered 14. Except because these are multiple threads, on another computer planet z might be numbered anywhere from 10-14, thus throwing off the numbering of all the other planets.
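
The numbering example can be replayed in a few lines of Python. `assign_indexes` is a hypothetical stand-in for "give each new planet the next free index", and the two creation orders are just one possible pair of thread interleavings.

```python
def assign_indexes(creation_order: list, next_index: int = 10) -> dict:
    """Number planets sequentially in whatever order the threads created them."""
    return {name: next_index + i for i, name in enumerate(creation_order)}


# Faction A creates b, c, e, d; faction N creates z. The interleaving differs per machine.
host = assign_indexes(["b", "c", "e", "d", "z"])
client = assign_indexes(["b", "z", "c", "e", "d"])

print(host["z"], client["z"])  # 14 11
mismatched = sorted(name for name in host if host[name] != client[name])
print(mismatched)  # ['c', 'd', 'e', 'z']
```

One planet landing in a different slot shifts every planet created after it, so four of the five indexes disagree between the two machines.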

After such a severe mashup, any future gamecommands talking about "planet 14" are going to be incredibly out of sync, and lots of really hard-to-fix data is going to be super duper wrong. The sort of sync-fix code that I'm writing would never account for that sort of data being wrong, because it's going to assume that primary keys -- and by extension planet indexes and faction indexes -- are gospel. So instead of it getting fixed via sync, probably all of the ships will be corrected to match the host, but the planet ownership will be a bit off until you stop and reload the game. I mean, I was planning on having it re-sync planet ownership also, so that part would be fine... but there's all this custom backend code for factions where factions think they own this and that and keep track of "their" planets. There is no generalized way for sync code to see differences there or fix them, so that's the part that would persist in an incorrect way.

So! All of that is to say, anything that creates a new primary key ID, or something similar, without going through a gamecommand and getting a sanctioned ID from the host, is likely to create an irreversible desync that requires saving, loading, and reconnecting. A shot would be not a big deal because those are short-lived and the sync stuff would fix it up. Ships would also be found and fixed... but all that metadata if it is owned by another faction would be wrong for a very long time and very problematic potentially. Wormholes or planets or factions would actually be the worst cases, as other than broken mods or bad-acting dev code there's no way those could be wrong and thus no reason to check these things overly frequently.
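
One way to picture "getting a sanctioned ID from the host" is to have the ID travel inside the command itself. The Python sketch below is an illustration only: `Host`, `SimCopy`, and the command layout are hypothetical and are not how the game's GameCommand classes are actually structured.

```python
import itertools


class Host:
    """Only the host hands out primary keys (hypothetical model)."""

    def __init__(self, first_id: int = 10):
        self._ids = itertools.count(first_id)

    def make_create_command(self, planet_name: str) -> dict:
        # The ID travels inside the command, so no client ever picks its own.
        return {"type": "CreatePlanet", "name": planet_name, "id": next(self._ids)}


class SimCopy:
    """One machine's copy of the sim; it applies commands exactly as received."""

    def __init__(self):
        self.planets = {}

    def execute(self, command: dict) -> None:
        self.planets[command["id"]] = command["name"]


host = Host()
commands = [host.make_create_command(n) for n in ["b", "c", "z"]]
host_sim, client_sim = SimCopy(), SimCopy()
for cmd in commands:  # every machine executes the same commands
    host_sim.execute(cmd)
    client_sim.execute(cmd)
print(host_sim.planets == client_sim.planets)  # True
```

Because the key is decided once, on one machine, thread timing on the other machines has nothing left to disagree about.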

So... after a lot of thought, and contrary to my initial thoughts: the end answer is yes, we absolutely need to do this by gamecommand. It can be done by Stage3 if you want, but it needs to only be run on the host (there's a flag for that, or a flag for !client, either way), since we don't want 4 copies of the commands if there are 4 human players connected. BUT if you use Stage3 instead of long term planning, you will actually have a secondary problem with Rand(). If you use Rand() at all in a main thread that is supposed to be identical on all the machines, but some of those Rand() calls are inside hostonly or clientonly code, then the values that further calls to that Rand() return will all be inconsistent on that thread on that frame. All of that should be recoverable by the sync code, but I'd rather not spike the network with that happening.

If you need to use Rand() and only do so on the host, then you can either use Engine_Universal.PermanentRandom (normally a huge no-no, but since it is non-synchronous it actually is fine here), or you can create a new random to use temporarily for just this purpose (kind of a waste, honestly), or you can just use the long-term planning (that may be inconvenient for code organization, or too slow to come around since those are considered long-term actions that may have a few seconds between them). Suffice it to say, if you are in one of the main thread things (stage3 or otherwise) and you're bracketed into hostonly or clientonly code, just be sure not to use Context.Random or its variants or that's a desync.
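
The Rand() problem is easy to reproduce with any seeded generator. The sketch below uses Python's `random.Random` as a stand-in: `context_random` plays the role of the synced random that must match on every machine, and `host_random` plays the role of a separate, unsynced source. `run_frame` and its parameters are hypothetical, and nothing here is the game's real API.

```python
import random


def run_frame(is_host: bool, seed: int, separate_host_rng: bool) -> list:
    """Return the synced random values that later sim code sees on this machine."""
    context_random = random.Random(seed)  # stands in for the synced per-frame random
    host_random = random.Random()         # unsynced; stands in for a host-only source
    if is_host:
        rng = host_random if separate_host_rng else context_random
        rng.random()                      # host-only decision consumes one value
    # Sim code after the host-only block, identical on all machines.
    return [context_random.randint(0, 999) for _ in range(5)]


bad_host = run_frame(True, seed=7, separate_host_rng=False)
bad_client = run_frame(False, seed=7, separate_host_rng=False)
good_host = run_frame(True, seed=7, separate_host_rng=True)
good_client = run_frame(False, seed=7, separate_host_rng=True)

print(bad_host == bad_client)    # False: the host's stream is shifted
print(good_host == good_client)  # True
```

The host-only draw does no harm in itself. The damage comes from every later draw on the synced generator being shifted on the host and not on the clients.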

This is so lengthy and so full of the general thought process behind everything that I think I'll put this on the wiki. ;)