OpenClonk Forum
Up Topic General / Help and Questions / Looking for source code of old CR versions
- - By Jan Date 2015-10-14 00:52 Edited 2015-10-14 00:55
Hi,

I've been looking for ways to reproduce/fix the runtime join desync in Clonk Rage and need your help. One cause was a bug in loading/saving savegames that occurred in some scenarios. But there is also at least one bug in the control/network part causing desyncs on runtime join. A year ago I managed to reliably reproduce a runtime join desync that occurred immediately after joining a game. It was caused by the controls of the first control tick the joining client received getting executed twice. I tried to find some older Clonk binaries on the Internet and found out that CR 4.9.9.7 [312] did not desync on runtime join while 4.9.10.1 [318] did. It was not possible to further narrow down the version where it occurred first since I couldn't find any versions between [312] and [318].
It looked like the method C4GameControlNetwork::PackCompleteCtrl was involved in the desync, so I compared the disassembly of [312] and [318] and noticed that the method had changed. I reconstructed the [312] version of the method from its disassembly and used it to replace the current version. I have been using this workaround for more than a year now and it doesn't seem to have any side effects. However, this is not a real solution and runtime join desyncs still occur, albeit not as often as before the workaround.
With the help of some friends I managed to record a desyncing game where all engines involved used the workaround. Unfortunately, the record did not provide any meaningful information, unlike the record of the PackCompleteCtrl desync. It was also not possible to create debug records, since the desync is much less likely to happen when the game is running slowly.
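
To illustrate the first bug described above (the control of the first control tick received by the joining client being executed twice), here is a minimal sketch of why that is enough to desync a lockstep game. It is a toy model, not the engine's code: ToyGame, JoinStaysInSync, the control values, and the duplicate guard are all hypothetical names and assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy model of a lockstep game; none of these names are
// Clonk's real classes.
struct ToyGame {
    int nextTick = 0;          // control tick expected next
    std::uint32_t state = 1;   // stands in for the complete game state

    // Deterministically apply the control of one tick.
    void Execute(std::uint32_t ctrl) {
        state = state * 31u + ctrl;
        ++nextTick;
    }
};

// Simulates a runtime join and returns true if the joining client ends up
// in the same state as the host.
bool JoinStaysInSync(bool deliverFirstTickTwice, bool guardDuplicates) {
    const std::vector<std::uint32_t> ctrl = {3, 1, 4, 1, 5};  // made-up control
    const int joinTick = 2;

    // The host runs alone until the join happens.
    ToyGame host;
    for (int t = 0; t < joinTick; ++t) host.Execute(ctrl[t]);

    // The joining client starts from a snapshot of the host's state.
    ToyGame client = host;

    for (int t = joinTick; t < static_cast<int>(ctrl.size()); ++t) {
        host.Execute(ctrl[t]);
        // Model the bug: the first tick after the join is delivered twice.
        const int deliveries = (deliverFirstTickTwice && t == joinTick) ? 2 : 1;
        for (int d = 0; d < deliveries; ++d) {
            // Assumed guard: only execute control for the expected tick.
            if (guardDuplicates && t != client.nextTick) continue;
            client.Execute(ctrl[t]);
        }
    }
    return host.state == client.state;
}
```

In this model, JoinStaysInSync(true, false) diverges from the host, while JoinStaysInSync(true, true) stays in sync because the duplicate delivery is skipped. Whether the real engine can be fixed this simply is not something the sketch can answer.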

So the best way to go is probably to compare the source code of a version without the runtime join desync bug with a version where it is present. As I already mentioned, [312] is still working, while [318] contains the desync bug. The only entry in the change log that looks like it could have something to do with the desync is + Async network mode {13874, 13879}, as it appears to be the only change that affects the network code. If this is the cause, version [317] should still work and [318] is the first broken one. I already asked Sven2 if he could send me the relevant source code, but unfortunately the SVN server is no longer active. No one replied to a thread in the Clonkspot forum where I asked for the source code of CR versions before [318], so I'm asking in the OC forum too: all of the old Clonk Rage developers are registered here, while on Clonkspot it is only Sven2.

TL;DR: I'm looking for the source code of Clonk Rage before build [318], preferably [312] or after.
Parent - - By Sven2 [us] Date 2015-10-14 01:13
Sorry, but it used to run on SVN where the history is only on the server. And that server is no longer available.

I've sent an email to the last address I have from matthes, let's see if he still replies :-)
Parent - By Jan Date 2015-10-14 19:26
Thanks, I tried to reach him too via the contact form on clonk.de
Parent - - By PeterW [gb] Date 2015-10-14 15:29 Edited 2015-10-14 15:34
Didn't somebody (Guenther?) make a complete Git history when we switched over to Git? That would contain the information in question.

And yes, this is fundamentally about what queue control gets executed where (and when). All a bit hazy now, but I was experimenting quite a bit trying to get that right (might not be reflected in the changelogs). I know I tried to make it more predictable by forcing joins on control ticks. Unfortunately, this seemed to just exacerbate the problem because now the probability of having control on the join tick was higher, and therefore we really had to get that right (which was tricky somehow).

Well, maybe you can at least identify something that's more stable than what we ended up with.
Parent - - By Sven2 [us] Date 2015-10-14 15:46
A few thoughts on the network code:

I thought about looking into that code anyway and building a more aggressive decentral mode, in which everyone sends the complete control to every client as soon as it has collected it (except for the control of that client).

At the moment, when I play Germany - US with mostly German players, there is usually a number of people to which I have a connection with a stable ping, and a small number of players with spikes to the US (200-1000ms). When a "good" host hosts in central mode, the game runs fluently, but with higher PreSend to everyone. But that host has to be carefully selected.

An improvement over that mode would be "automatic central", where each client notifies its least spikey connection(s) and subscribes with them to send a copy of the control.

Also, "central control" connections should disconnect the client2client-connections. In Germany I had a weird connection that would somehow prioritize: If I created many connections, the ping time of "old" connections would increase drastically (maybe a provider protection against filesharing clients?). That meant if more than X (~8-10) players were in the round, the game would start lagging. It would lag even in central mode because the host connection was the "oldest" so it got the ping increase first.

(This is experience from CR - we don't have bigger rounds on OC yet :( )
Parent - By Maikel Date 2015-10-14 16:08

> (maybe a provider protection against filesharing clients?).


This is one of the few things which sickens me about Germany... (I needed to rant).
Parent - By PeterW [gb] Date 2015-10-14 17:12

> I thought about looking into that code anyway and building a more aggressive decentral mode, in which everyone sends the complete control to every client as soon as it has collected it


Not sure I understand. What's the difference to "normal" decentral control mode?

> An improvement over that mode would be "automatic central", where each client notifies its least spikey connection(s) and subscribes with them to send a copy of the control.


Assuming the connection remains un-spikey when you start sending more data over it. As far as I'm concerned, the proper solution here would be to incorporate a measure of spikiness into the presend calculation.
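
One way to read "a measure of spikiness in the presend calculation" is to pad the average ping with a multiple of its standard deviation before converting it to ticks. The sketch below does that; it is an assumption about how such a calculation could look, not how the engine computes PreSend. PingStats, ComputePingStats, PreSendTicks, spikeWeight, and the "one-way latency is half the ping" simplification are all hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper type: summary of recent ping samples in milliseconds.
struct PingStats {
    double mean;
    double stddev;
};

// Mean and population standard deviation of the samples.
PingStats ComputePingStats(const std::vector<double>& samplesMs) {
    if (samplesMs.empty()) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : samplesMs) sum += s;
    const double mean = sum / static_cast<double>(samplesMs.size());
    double squares = 0.0;
    for (double s : samplesMs) squares += (s - mean) * (s - mean);
    return {mean, std::sqrt(squares / static_cast<double>(samplesMs.size()))};
}

// Number of ticks to send control ahead of time.
// Assumptions: one-way latency is half the round trip, and spikeWeight
// controls how strongly ping variation inflates the estimate.
int PreSendTicks(const std::vector<double>& samplesMs, double tickMs,
                 double spikeWeight) {
    const PingStats st = ComputePingStats(samplesMs);
    const double oneWayMs = (st.mean + spikeWeight * st.stddev) / 2.0;
    return static_cast<int>(std::ceil(oneWayMs / tickMs));
}
```

With spikeWeight set to 0 this falls back to a plain average-based estimate, so a connection with occasional spikes gets a noticeably larger presend only when the weight is raised.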

> Also, "central control" connections should disconnect the client2client-connections. In Germany I had a weird connection that would somehow prioritize


That sounds weird indeed. Maybe this could just be because we send out all ping packets at the same time? If we overrun a router buffer, it will always drop the same packets. Global output throttling might help with that.
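
As a rough illustration of what global output throttling could mean here, the following is a minimal token bucket sketch: all outgoing packets draw from one shared byte budget, so a burst of simultaneous ping packets gets spread out instead of hitting the router at once. This is a generic technique, not something taken from the engine; TokenBucket, TrySend, and the parameters are hypothetical.

```cpp
#include <algorithm>

// Hypothetical global send limiter shared by all connections.
class TokenBucket {
public:
    TokenBucket(double capacityBytes, double refillBytesPerSec)
        : capacity_(capacityBytes),
          rate_(refillBytesPerSec),
          tokens_(capacityBytes),
          lastSec_(0.0) {}

    // Returns true and consumes budget if the packet may be sent now.
    // Returns false if the caller should queue the packet and retry later.
    bool TrySend(double packetBytes, double nowSec) {
        Refill(nowSec);
        if (packetBytes > tokens_) return false;
        tokens_ -= packetBytes;
        return true;
    }

private:
    // Add the budget earned since the last call, capped at the capacity.
    void Refill(double nowSec) {
        if (nowSec > lastSec_) {
            tokens_ = std::min(capacity_, tokens_ + (nowSec - lastSec_) * rate_);
            lastSec_ = nowSec;
        }
    }

    double capacity_;  // maximum burst size in bytes
    double rate_;      // sustained rate in bytes per second
    double tokens_;    // currently available budget in bytes
    double lastSec_;   // time of the last refill in seconds
};
```

The time is passed in explicitly to keep the sketch deterministic; a real sender would read a monotonic clock and keep a queue for packets that were refused.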
Parent - By Zapper [de] Date 2015-10-14 18:52
I know it won't help much, but I would be really, really happy if anyone would be willing to do some network reliability improvements. The unstable or laggy network rounds are sadly one of the trademarks of Clonk - especially when you compare it to other modern games.
Parent - - By Isilkor Date 2015-10-16 11:29

> Didn't somebody (Guenther?) make a complete Git history when we switched over to Git? That would contain the information in question.


I have one, Guenther probably has one too. I can't make it public unfortunately because it contains a bunch of build tools we most certainly do not have the permission to redistribute, and also other stuff matthes hasn't released to the public - the 3dsMAX clonk model comes to mind, for example. Although I can probably pull out the engine binaries, at least.
Parent - - By Jan Date 2015-10-16 17:28
For now, it would already help me a lot if you could upload the [312] version of the files C4GameControl.cpp and C4GameControl.h. There is probably nothing secret in these files.
Parent - - By Isilkor Date 2015-10-16 20:31
Parent - By Jan Date 2015-10-16 21:00
Thanks!
Parent - - By Zapper [de] Date 2015-10-14 18:53
I might have copies of some old CR source on some backups. But as Sven noted we used SVN and thus probably lost our history.

You do know of this project by kanibal?
Parent - By Jan Date 2015-10-14 19:27
Yes, and I'm also a member of the Trello project.
Parent - - By Sven2 [us] Date 2015-10-16 01:35
From matthes:

Hi Sven.

Yes, I still have a backup of the repository. I just need to get around to uploading it somewhere.

Regards,
Matthes
Parent - - By Jan Date 2015-10-16 17:30
Thank you, that sounds good.
Parent - - By Sven2 [us] Date 2015-10-16 20:35
OK, I also got the repos. If you need a particular version, I can give you a source checkout (and remove the league+registry secrets).
Parent - - By Jan Date 2015-10-16 21:01
Could you send me the [312] source?
Parent - - By Sven2 [us] Date 2015-12-04 01:38
Sorry for the delay. I hope this is the correct revision: http://cognium.de/misc/CR312source.zip
Parent - By Jan Date 2015-12-06 07:47
Thanks! Yes, that's the correct version.
Parent - - By Clonkonaut [de] Date 2015-10-26 15:54
Have you gotten around to looking into this? Any fix yet? :)
Parent - By Jan Date 2015-10-29 18:42
I'm afraid not. A big problem is that there are also other (unknown) desync bugs besides the one in the control/network part. We already fixed a few but I'm certain we will find even more. For example, yesterday we fixed a desync that occurs when joining a round that is already over, caused by an excess GOAL object being created at the joining client.
So I hope reproducing the control/network desync will get easier as soon as the old source code is available.