I wanted to write this post for a while. It describes a C++ technique to implement rollback in the context of multiplayer games that I feel is quite interesting and useful.
The tl;dr is: don't bother serializing individual objects, just rollback all the memory.
Rollback-based multiplayer
I've been working on a multiplayer version of PewPew, and for reasons that are outside of the scope of this post, I chose to implement multiplayer with deterministic lockstep and rollback.
The basic idea behind rollback-based multiplayer is that the inputs of players are replicated to all the players.
Whenever a player receives the inputs of another player, the state of the game is rolled back to the point where the input happened and fast-forwarded back to the present so that the state shown to a player takes into account the inputs of the other players. Because history is being re-computed, some events get undone.
For example, it's possible a player saw themselves taking a bonus, but after receiving the inputs of the other player, the other player ends up actually taking the bonus before. In practice this is not a big deal: we only rollback as much as the network latency require, which should be less than 100 milliseconds, meaning incorrect states are only shown for a short time. If this is still a problem, there are mitigation techniques such as delaying the visual or audio feedback of events.
My first attempt at implementing the rollback would serialize the state of every object in the game and deserialize them whenever rollback occurred. The code for serializing and deserializing every object was slow but worked well (
video of a demo).
Serializing is *the* challenge of rollback-based multiplayer
A little bit later in the development process, I started using a Lua interpreter to run the script that describes what should happen in a level.
The interpreter's state changes based on the input of the players and can trigger events. For instance, the script may count the number of enemies destroyed, and spawn a bonus once that number reaches a certain threshold.
Since the Lua interpreter is storing a piece of the Game State, whenever the Game State needs to be rollbacked, the state of the Lua interpreter needs to be rollbacked as well.
The problem is that a Lua interpreter is not easily rollback-able because there's no simple way to serialize the state of the interpreter (the interpreter's state has pointers to game objects).
I considered dropping the interpreter and implementing the gameplay logic in C++, but even then serializing the gameplay logic was going to painful and error-prone.
A solution: Rollback ALL the memory that contains the state of the game
A completely different approach is to treat the game code (including the interpreter) as a black box and only consider the memory that was allocated by the game code.
You can take a snapshot of the Game State by making a copy of all the memory that was allocated for it.
When you want to rollback to a past state, you copy the snapshot back to the location in memory where it was taken.
To achieve that you need to separate the code that updates the state from the other systems, including the rendering and the UI. However, once this separation is in place you don't have to worry about the serialization any more as it happens transparently.
To emphasize why this technique is great, I'll quote Ramón Franco on his experience working on Killer Instinct, a game which uses rollback for multiplayer (
source):
Also, any time a new element is added to the game, it has the possibility of affecting your game state and making it so rollbacks could cause a desync. There’s no getting around that. When Keits comes to me and says "hey, I want these projectiles to be random and sometimes four of them come out and they loop around in odd patterns," these are now elements I have to make sure can be rolled back, including adding new elements to the code. I can train programmers to make sure their variables are registered with our rollback system, but it’s unavoidably more work.
Well Ramón, there is getting around that. If you rollback all the memory, you don't have to think about making sure every individual variable gets properly serialized anymore!
Implementation details
Efficient snapshotting
In C++, intercepting the memory allocations can be done by overriding the
new operator and its variants. Tracking the allocations of the code to take snapshots is then simple, but taking a snapshot of all the allocations will be slow because the allocations may not be continuous, and could require as much as one memcpy
per allocation.
A much more efficient solution is to use a custom memory allocator that is designed for efficient snapshotting.
By using a custom memory allocator, you don't have to maintain the tracking of the allocated pieces yourself, you can just read the data structure maintained by your memory allocator. You can also design the memory allocator so that it focuses on having a continuous chunk of allocated memory to reduce the numbers of memcpys, and have the memcpys be done in a cache-friendly way.
Note that because the snapshot contains pointers that point to locations inside the snapshot, when the times comes for the snapshot to be restored it needs to be copied back to the same memory locations it was copied from. A simple solution is to have the allocator allocate memory from a fixed chunk of memory. This guarantees that we will always be able to copy back the snapshot's content to the memory location it was taken from, ensuring that the pointers it contains will be valid.
I call an allocator that allows taking snapshot and restoring them a SnapshotableAllocator.
For the record, my implementation of a SnapshotableAllocator in use in PewPew Live is available
here.
Coexisting with the rest of the code
Only the memory that needs to be rolled-back should be allocated by the SnapshotableAllocator. The rest of the memory (the textures, the 3D models, the audio, the UI, etc...) should be allocated by your usual non-snapshotting allocator.
The new operators must therefore be able to switch allocator at runtime. Moreover, the configuration of which allocator to use must be on a thread-by-thread basis.
In my implementation, a thread_local pointer stores which allocator to use. The new operators use that thread_local pointer to choose which allocator to use.
In order to tell the compiler that a section of the code needs to use a different allocator, I find that using RAII is convenient. When I want to wrap a section of the code with a different allocator, I create a ScopedAllocatorUsage object which changes the thread_local pointer in its constructor. When the ScopedAllocatorUsage goes out of scope, its destructor restores the thread_local pointer to the original value.
Example
The following example creates a string, takes a snapshot, changes the content of the string, uses the snapshot to rollback the content of the string, and finally destroys the string.
Notice how the initialization, updates, and destruction of the string are in blocks that start with ScopedAllocatorUsage.
SnapshotableAllocatorImpl allocator(8);
std::unique_ptr<std::string> s;
// Initialize the string with "hello"
{
ScopedAllocatorUsage sau(&allocator);
s = std::make_unique<std::string>();
*s = "hello";
}
// Take a snapshot
auto snapshot = allocator.TakeSnapshot();
// Change the value of the string to "world"
{
ScopedAllocatorUsage sau(&allocator);
*s = "world ";
}
// Verify the value of the string before and after the rollback.
assert(*s == "world ");
allocator.LoadMemoryFromSnapshot(*snapshot);
assert(*s == "hello");
// Free the memory
{
ScopedAllocatorUsage sau(&allocator);
s.reset();
}
Also notice how the SnapshotableAllocator is completely oblivious to what data structure it is taking snapshots of.
Memory usage
The game being deterministic, the game offers replays. Replays store the initial seed for a game and all the player inputs that occurred, making it possible to replay the game.
Those replays are used to verify the validity of scores, but they are also used by the players to watch and learn from the best players.
Those replay only allow you to simulate the game forward, but we can re-use the rollback system to provide rewind capabilities.
In PewPew Live, rewind is implemented by taking a snapshot anytime you've simulated an additional 0.25% of the replay. This still means that by the end of the replay, 400 snapshots will be kept in memory.
In my case, the snapshots take around 60KB, and an individual snapshot can be compressed down to 25KB with deflate, assuming the SnapshotableAllocator zeroes the memory before handing it out. If you don't plan on compressing individual snapshots, there's no point in zeroing the memory.
Delta-compressing two successive snapshots would likely yield far better compression ratios, but I haven't tried that.
Persisting the snapshots (disclaimer: I haven't actually tried this.)
The snapshots can be restored only as long as the SnapshotableAllocator has been alive, because the application needs to be able to copy the snapshots back to the location where they were taken.
This limitation can be worked around by having the SnapshotableAllocator use memory obtained at a fixed location with mmap.
For example, the following magic incantation allocates 128KiB at the address 0x800000000.
void* ptr = mmap((void*)0x800000000,
4096 * 32,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE,
-1, 0);
Using this trick, the snapshots can be persisted to disk, for instance, to persist the player's progress across launches of the game.
The snapshot could even be exchanged over the network, assuming the receiving side has the same endianness, the same pointer size, is running the same binary, and can mmap the same memory location.
In the general case, having the same representation of floating-point numbers on both sides is vital, but in my case not necessary as I'm not using floating-point numbers to store anything important in the game state.
I wouldn't be surprised if it was possible to exchange snapshots between arm64 and x64!
A side benefit of having an allocator distribute chunks of memory from a fixed location is that it makes the allocator deterministic, which can help make the application deterministic. Determinism is a big topic that I will one day elaborate on in a separate blog post.
Pitfalls
By far the biggest pitfall of this technique is not respecting the separation between the Game State and the rest of the game.
For instance, consider the case of an std::map that contains the achievements of the player that is instantiated at the launch of the application. If in the middle of a game the code that updates the Game State tries to insert something in that std::map, the game will crash.
That's because the std::map was allocated with the default allocator, and the game code will try to modify the std::map while the SnapshotableAllocator is active.
Obviously this example was not made up out of thin air, it actually happened to me.
Another manifestation of this bug happens if you declare and use static variables in a function that is called by the code that updates the Game State.
The static variable will be initialized with the SnapshotableAllocator, which means that the static variable is valid only during the lifetime of the SnapshotableAllocator.
This bug is more insidious for two reasons:
1/ Code that does not have any pointers to the "outside" may still create a crash.
2/ The crash won't happen immediately.
Fortunately those errors are detected by ASAN.
Conclusion
Even-though I stumbled upon this technique because of a black box that I couldn't easily serialize, it turned out to be extremely convenient. It is also possibly more efficient than regular serialization.
There are some pitfalls to deal with when setting up the game engine, but afterward, it is completely transparent. I don't have to think about the fact that a different allocator is used (except for static variables). Whatever state and logic I add, whether in C++ or Lua, it will be properly rolled back.
And while the technique was developed to implement multiplayer, it is also used to provide rewind capabilities for replay, and I am even considering using it to add time travel elements to the gameplay (
video demonstrating time travel).
Thanks for the allocator source. it saves me a lot of time.
ReplyDeleteBTW, a dynamic bucket that can allocate new slots on demand would be nice.
ReplyDelete