I've been looking for ways to reproduce/fix the runtime join desync in Clonk Rage and need your help. One cause was a bug in loading/saving savegames that occurred in some scenarios. But there is also at least one bug in the control/network part causing desyncs on runtime join. A year ago I managed to reliably reproduce a runtime join desync that occurred immediately after joining a game. It was caused by the control of the first control tick received by the joining client being executed twice. I tried to find some older Clonk binaries on the Internet and found out that CR 4.9.9.7 [312] did not desync on runtime join while 4.9.10.1 [318] did. I could not narrow down the version in which it first occurred any further, since I couldn't find any builds between [312] and [318].
It looked like the method C4GameControlNetwork::PackCompleteCtrl was involved in the desync, so I compared the disassembly of [312] and [318] and noticed that the method had changed. I reconstructed the [312] version of the method from its disassembly and used it to replace the current version. I have now been using this workaround for more than a year and it doesn't seem to have any side effects. However, this is not a real solution, and runtime join desyncs still occur, albeit less often than before the workaround. With the help of some friends I managed to record a desyncing game in which all engines involved used the workaround. Unfortunately, unlike the record of the PackCompleteCtrl desync, this record did not provide any meaningful information. It was also not possible to create debug records, since the desync is much less likely to happen when the game is running slowly.
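To make the double-execution symptom concrete: the invariant being violated is that every control tick runs exactly once, and a client joining from a savegame must skip every tick whose effects are already contained in that savegame. The C++ sketch below only illustrates that invariant. ControlTick, ControlQueue and the "savegame tick" parameter are hypothetical names, not the real C4GameControlNetwork code, and it assumes the join savegame can be tagged with the last control tick it contains.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the control collected for one control tick.
struct ControlTick {
    std::int32_t tick;              // control tick number
    std::vector<std::string> cmds;  // commands of all players for that tick
};

// Hypothetical queue enforcing "every control tick is executed exactly once".
class ControlQueue {
public:
    // savegameTick: last control tick whose effects are already contained
    // in the savegame the joining client loaded.
    explicit ControlQueue(std::int32_t savegameTick) : lastExecuted_(savegameTick) {
        assert(savegameTick >= 0);
    }

    // Queue incoming control. Ticks already covered by the savegame or
    // already executed are rejected, as are duplicates of queued ticks.
    bool add(const ControlTick& ctrl) {
        if (ctrl.tick <= lastExecuted_) return false;
        return pending_.emplace(ctrl.tick, ctrl).second;
    }

    // Execute the next tick if its control has arrived. Returns the number
    // of commands executed, or -1 if the client has to wait for control.
    int executeNext(std::vector<std::string>& executed) {
        auto it = pending_.find(lastExecuted_ + 1);
        if (it == pending_.end()) return -1;
        for (const std::string& cmd : it->second.cmds) executed.push_back(cmd);
        const int count = static_cast<int>(it->second.cmds.size());
        lastExecuted_ = it->first;
        pending_.erase(it);
        return count;
    }

    std::int32_t lastExecuted() const { return lastExecuted_; }

private:
    std::int32_t lastExecuted_;
    std::map<std::int32_t, ControlTick> pending_;  // received, not yet executed
};
```

With such a guard, a control packet for the tick already contained in the savegame is rejected instead of being executed a second time.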
So the best way forward is probably to compare the source code of a version without the runtime join desync bug with a version where it is present. As I already mentioned, [312] still works, while [318] contains the desync bug. The only change log entry that looks like it could have something to do with the desync is this one:
+ Async network mode {13874, 13879}
It appears to be the only change that affects the network code. If this is the cause, version [317] should still work and [318] is the first broken one. I already asked Sven2 if he could send me the relevant source code, but unfortunately the SVN server is no longer active. Since no one replied to my thread in the Clonkspot forum asking for the source code of CR versions before [318], I'm asking in the OC forum too: all of the old Clonk Rage developers are registered here, while on Clonkspot it is only Sven2.
TL;DR: I'm looking for the source code of Clonk Rage before build [318], preferably [312] or later.
I've sent an email to the last address I have from matthes, let's see if he still replies :-)
And yes, this is fundamentally about which queued control gets executed where (and when). It's all a bit hazy now, but I experimented quite a bit trying to get that right (which might not be reflected in the changelogs). I know I tried to make it more predictable by forcing joins onto control ticks. Unfortunately, this seemed to just exacerbate the problem, because now the probability of having control on the join tick was higher, and therefore we really had to get that right (which turned out to be tricky).
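The "force joins onto control ticks" idea can be sketched as follows, assuming (as a simplification) that control is executed on every frame that is a multiple of controlRate and that tick n belongs to frame n * controlRate. Aligning the join frame is the easy part; the tricky part is the second helper: the host writes the join savegame either before or after running that tick's control, and the joiner must start at exactly the matching tick. Both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: control tick n is executed on frame n * controlRate.

// First frame >= frame on which a control tick happens (hypothetical helper
// for scheduling a runtime join onto a control tick).
std::int32_t nextControlTickFrame(std::int32_t frame, std::int32_t controlRate) {
    assert(frame >= 0 && controlRate > 0);
    return ((frame + controlRate - 1) / controlRate) * controlRate;
}

// For a join scheduled on a control tick frame: the host writes the join
// savegame either before or after executing that tick's control. The
// joiner has to start at the first tick the savegame does NOT contain;
// otherwise that tick's control runs twice (or not at all).
std::int32_t firstTickForJoiner(std::int32_t joinFrame, std::int32_t controlRate,
                                bool savedAfterControl) {
    assert(controlRate > 0 && joinFrame % controlRate == 0);
    const std::int32_t tick = joinFrame / controlRate;
    return savedAfterControl ? tick + 1 : tick;
}
```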
Well, maybe you can at least identify something that's more stable than what we ended up with.
I thought about looking into that code anyway and building a more aggressive decentral mode, in which every client sends the complete control to every other client as soon as it has collected it (except for the control of that client).
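As I read the proposal, the difference from the normal decentral mode is that clients would not only send their own control but also relay everything they have collected for a tick. A rough C++ sketch with hypothetical types (ClientID, TickCollector, forwardComplete), assuming each recipient already has its own control, so that part is left out of the packet sent to it:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

using ClientID = std::int32_t;
// Hypothetical: serialized control of each client for one tick.
using ControlParts = std::map<ClientID, std::string>;

// Collects the control parts of one tick from all clients.
class TickCollector {
public:
    explicit TickCollector(std::set<ClientID> clients) : clients_(std::move(clients)) {}

    void addPart(ClientID from, const std::string& ctrl) {
        if (clients_.count(from)) parts_[from] = ctrl;  // ignore unknown clients
    }

    bool complete() const { return parts_.size() == clients_.size(); }

    // Packet for `peer`: everything collected except the peer's own control.
    ControlParts packetFor(ClientID peer) const {
        assert(complete());
        ControlParts out = parts_;
        out.erase(peer);
        return out;
    }

private:
    std::set<ClientID> clients_;
    ControlParts parts_;
};

// Once the tick is complete, forward it to every other client.
// Returns the packet that would be sent to each recipient.
std::map<ClientID, ControlParts> forwardComplete(const TickCollector& tc, ClientID self,
                                                 const std::set<ClientID>& clients) {
    std::map<ClientID, ControlParts> sent;
    if (!tc.complete()) return sent;  // nothing to forward yet
    for (ClientID peer : clients)
        if (peer != self) sent[peer] = tc.packetFor(peer);
    return sent;
}
```

Since a client may now receive the same tick from several peers, it has to discard duplicates, so the exactly-once rule for control ticks matters even more in this mode.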
At the moment, when I play games between Germany and the US with mostly German players, there is usually a number of people to whom I have a connection with a stable ping, and a small number of players in the US whose pings spike (200-1000 ms). When a "good" host hosts in central mode, the game runs smoothly, but with a higher PreSend for everyone. But that host has to be chosen carefully.
An improvement over that mode would be "automatic central", where each client notifies its least spiky connection(s) and subscribes to them to receive a copy of the control.
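One possible way to pick the "least spiky" connections, assuming each client keeps a short window of recent ping samples per peer. Ranking by the worst ping in the window is just one choice of spikiness measure (an assumption, not something the engine currently does); PeerStats and chooseControlSources are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of recent ping samples (ms) for one connection.
struct PeerStats {
    std::int32_t clientID;
    std::vector<int> pingsMs;  // recent samples
};

// Spikiness proxy (an assumption): the worst ping seen in the window.
// A stable 90 ms connection ranks above one that is usually 30 ms but
// spikes to 600 ms.
int worstPing(const PeerStats& p) {
    assert(!p.pingsMs.empty());
    return *std::max_element(p.pingsMs.begin(), p.pingsMs.end());
}

// Pick up to k peers with the least spiky connections; these would be
// asked to send this client a copy of the complete control.
std::vector<std::int32_t> chooseControlSources(std::vector<PeerStats> peers, std::size_t k) {
    std::stable_sort(peers.begin(), peers.end(),
                     [](const PeerStats& a, const PeerStats& b) {
                         return worstPing(a) < worstPing(b);
                     });
    std::vector<std::int32_t> chosen;
    for (std::size_t i = 0; i < peers.size() && i < k; ++i)
        chosen.push_back(peers[i].clientID);
    return chosen;
}
```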
Also, "central control" connections should disconnect the client2client connections. In Germany I had a weird connection that seemed to prioritize somehow: if I created many connections, the ping time of "old" connections would increase drastically (maybe a provider protection against filesharing clients?). That meant that if more than X (~8-10) players were in the round, the game would start lagging. It lagged even in central mode, because the host connection was the "oldest" and so got the ping increase first.
(This is experience from CR; we don't have bigger rounds of OC yet :( )
> (maybe a provider protection against filesharing clients?).
This is one of the few things which sickens me about Germany... (I needed to rant).
> I thought about looking into that code anyway and build a more aggressive decentral mode, in which everyone sends the complete control to every client as soon as it has collected it
Not sure I understand. What's the difference to "normal" decentral control mode?
> An improvement over that mode would be "automatic central", where each client notifies its least spikey connection(s) and subscribe with them to send a copy of the control.
Assuming the connection remains un-spikey when you start sending more data over it. As far as I'm concerned, the proper solution here would be to incorporate a measure of spikiness into the presend calculation.
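One way to fold spikiness into the PreSend calculation, sketched under the assumption that PreSend is expressed in frames and derived from recent ping samples: budget the mean ping plus k standard deviations, then convert that to frames. preSendFrames and the parameter k are hypothetical; the real engine calculation may look quite different.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical PreSend estimate in frames: mean ping plus k standard
// deviations, so that spiky connections get a larger safety margin.
int preSendFrames(const std::vector<double>& pingsMs, double frameMs, double k) {
    assert(!pingsMs.empty() && frameMs > 0.0 && k >= 0.0);
    const double n = static_cast<double>(pingsMs.size());
    double mean = 0.0;
    for (double p : pingsMs) mean += p;
    mean /= n;
    double var = 0.0;
    for (double p : pingsMs) var += (p - mean) * (p - mean);
    var /= n;  // population variance of the sample window
    const double budgetMs = mean + k * std::sqrt(var);
    return static_cast<int>(std::ceil(budgetMs / frameMs));
}
```

With an assumed frame time of 28 ms and k = 2, a steady 100 ms connection yields a PreSend of 4 frames, whereas a connection that averages the same 100 ms but occasionally spikes to 280 ms yields 11.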
> Also, "central control" connections should disconnect the client2client-connections. In Germany I had a weird connection that would somehow prioritize
That sounds weird indeed. Maybe this could just be because we send out all ping packets at the same time? If we overrun a router buffer, it will always drop the same packets. Global output throttling might help with that.
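Global output throttling could look like a single token bucket shared by all connections, so that a burst such as pinging every peer in the same frame is spread over several network ticks instead of hitting the router buffer at once. A minimal C++ sketch; GlobalThrottle, OutPacket and the per-tick rate are hypothetical, and real code would also need to prioritize control over other traffic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical outgoing packet: destination connection and size in bytes.
struct OutPacket {
    int connection;
    std::size_t bytes;
};

// One token bucket shared by *all* connections (global throttling).
class GlobalThrottle {
public:
    GlobalThrottle(std::size_t bytesPerTick, std::size_t burstBytes)
        : rate_(bytesPerTick), capacity_(burstBytes), tokens_(burstBytes) {
        assert(rate_ > 0 && capacity_ > 0);
    }

    void enqueue(const OutPacket& p) {
        assert(p.bytes <= capacity_);  // larger packets could never be sent
        queue_.push_back(p);
    }

    // Called once per network tick: refill the budget, then send queued
    // packets in FIFO order while the budget allows.
    std::vector<OutPacket> tick() {
        tokens_ = std::min(capacity_, tokens_ + rate_);
        std::vector<OutPacket> sent;
        while (!queue_.empty() && queue_.front().bytes <= tokens_) {
            tokens_ -= queue_.front().bytes;
            sent.push_back(queue_.front());
            queue_.pop_front();
        }
        return sent;
    }

private:
    std::size_t rate_;
    std::size_t capacity_;
    std::size_t tokens_;
    std::deque<OutPacket> queue_;
};
```

With 100 bytes per tick and a 200-byte burst, five 64-byte pings leave as 3, 1 and 1 packets over three ticks.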
> Didn't somebody (Guenther?) make a complete Git history when we switched over to Git? That would contain the information in question.
I have one, and Guenther probably has one too. Unfortunately I can't make it public, because it contains a bunch of build tools we almost certainly do not have permission to redistribute, as well as other stuff matthes hasn't released to the public (the 3ds Max Clonk model comes to mind, for example). I can probably extract the engine binaries, at least.
You do know of this project by kanibal?
Hi Sven.
Yes, I still have a backup of the repository. I just need to get around to uploading it somewhere.
Regards,
Matthes
So I hope reproducing the control/network desync will get easier as soon as the old source code is available.