OpenClonk Forum
Topic: Development / Scenario & Object Development / Floating-point math, redux
- - By Isilkor Date 2017-04-19 21:06 Edited 2017-04-19 21:10
(or: It's That Time Of the Year Again)

Our last foray into synchronous floating-point math (using SSE instructions) ended in failure, due to mingw emitting aligned loads where it should have emitted unaligned ones, and to some weird interactions in C4Movement. I'm considering reviving the project, but only for C4Aul (i.e. the script engine). This time, I've picked mpfr (for no particular reason), which supplies us with arbitrary-precision FP arithmetic. Since it depends on GMP*, we'd also get an arbitrarily sized integer math library (which we don't have to use if we can't find any place for it).

However, this now poses some language design questions:
1. What kinds of numbers do we want to support in script? We've considered several options (some more and some less seriously):
1a) int and float both exist and are completely segregated. Mixing floats and integers is not permitted; you explicitly have to cast them to a common type. This guarantees you always exactly know which type of number you're working on, but results in a lot of boilerplate casts.
1b) int and float both exist and are one-way implicitly convertible. Mixing floats and integers results in a float, which you have to explicitly convert back to integer if you need one.
1c) int and float both exist and are mostly interchangeable. Mixing floats and integers is fine and results in a float, which implicitly converts to integer where needed. May lead to some type confusion where you expect integer arithmetic to happen, but a float got introduced somewhere.
1d) int gets thrown out; all numbers are float. Would probably mean we'll have to move to 64-bit doubles in order to still be able to do whatever bit twiddling is currently going on. Might be slow because we'll be doing all math in software.
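To make the difference between the options concrete, here is a minimal Python sketch of how a single addition could behave under each policy. The function names are made up for illustration and nothing here is engine code. Options 1b and 1c behave identically for the addition itself; they only differ in whether the float result converts back to int implicitly, which this sketch does not model.

```python
from typing import Union

Number = Union[int, float]


def add_segregated(a: Number, b: Number) -> Number:
    """Option 1a: int and float never mix; the caller has to cast."""
    if type(a) is not type(b):
        raise TypeError("cannot mix int and float; cast explicitly")
    return a + b


def add_promoting(a: Number, b: Number) -> Number:
    """Options 1b/1c: a mixed operation silently yields a float."""
    return a + b


def add_all_float(a: Number, b: Number) -> float:
    """Option 1d: there is only one number type, and it is float."""
    return float(a) + float(b)
```

With 1a, add_segregated(1, 2.0) raises an error, while the other two quietly return 3.0.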

2. Which precision do we want to use? JavaScript uses 64-bit doubles; most games use 32-bit floats for speed. We won't be using the hardware FP units, so I don't know (yet) how big the difference is. Technically we could use even larger precisions, but I don't think that's anywhere near necessary.

3. How do we handle backwards compatibility? Having Sin() take a floating-point parameter and a precision parameter would obviously be undesirable. We could overload the function to take either a float parameter, or the old int/precision/scale parameter set, but that'd just be confusing. We could also just allow float parameters going forward, but that would invalidate all old code.

(If we do not pick 1d above: 4. Do we also want to support arbitrary-precision integers? The difference between int and big-int would be completely transparent to the user, they'd all handle exactly the same.)

I'd be interested in your thoughts.

* GMP is a pain to compile with MSVC, but I've added it to my dependency builder script if anyone wants to play with it
Parent - By Sven2 Date 2017-04-20 03:13
For 1:

My first choice would be to not have floats at all. I am generally a fan of integer math for its simplicity and predictability. I've always been able to write my stuff with fixed point precision. It's also much easier to debug in engine code.

My second choice, if conversion is done, would be 1d), i.e. throw out all integers and use a representation with at least 32bit precision. The reason is that I've sometimes encountered hard-to-solve bugs in scripting languages like Python (2), where floats and integers are mixed freely. You never really know whether the number you use is integer or float, so it's very easy to have obscure bugs where suddenly something is rounded because you happen to be doing math in integer. Obviously, we should add another operator to do integer division (like Python's '//') to make code conversion easier.

My third choice would be 1a), i.e. floats and ints exist, but all engine functions are still int and float is just used as an "enhanced feature" that people with specialized libraries can use. The only reason I like this is that it's really just like my first choice, because I could ignore the existence of floats ;-)

I also don't feel very strongly about this, so I'm happy to be convinced about the merits of 1b and 1c.

3) Backward compatibility is not really a problem, because the extra parameters to Sin, GetXDir, etc. just scale the value. They can just return or accept a float but still have the scaling parameter. I.e.:
old version: Log("%v, %v", obj->GetXDir(1), obj->GetXDir(100)) logs: "5, 503"
new version: Log("%v, %v", obj->GetXDir(1), obj->GetXDir(100)) logs: "5.031, 503.1"
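A rough Python sketch of that idea, assuming the getter simply multiplies the internal speed by the scaling parameter. The names get_xdir_old and get_xdir_new are hypothetical, a plain float stands in for the engine's internal representation, and truncation in the old version is an assumption.

```python
def get_xdir_old(speed: float, precision: int) -> int:
    """Old behaviour (assumed): scale, then drop the fractional part."""
    return int(speed * precision)


def get_xdir_new(speed: float, precision: int) -> float:
    """New behaviour (assumed): same scaling, but the fraction is kept."""
    return speed * precision


speed = 5.031  # example value matching the log output quoted above
old_values = (get_xdir_old(speed, 1), get_xdir_old(speed, 100))
new_values = (get_xdir_new(speed, 1), get_xdir_new(speed, 100))
```

Old call sites keep working because the scaling parameter still means the same thing; they just stop losing the fractional part.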

(We could also warn to make the parameter deprecated, but it wouldn't break anything to keep it).
Parent - - By Clonkonaut [de] Date 2017-04-20 10:09
1c) would be my favourite if it weren't for all the code updating. I'm not talking about the functions taking a precision (Sin, Cos, Tan, GetXDir, GetYDir, Angle come to mind, probably also ArcCos and ArcSin?) but about all uses of the / operator. I'd guess that many of these instances depend on integer math. The alternative for me would then be 1a).

What does % usually do in other languages? Implicit conversion to int?
Parent - By Isilkor Date 2017-04-23 15:13

> What does % usually do in other languages? Implicit conversion to int?

C: "The operands of the % operator shall have integer type." (Similar language for C++.)
Python: "The floor division and modulo operators are connected by the following identity: x == (x//y)*y + (x%y)" where // returns the result of the division, rounded down to the nearest integer.
ECMAScript: "the floating-point remainder r from a dividend n and a divisor d is defined by the mathematical relation r = n − (d × q) where q is an integer ... whose magnitude is as large as possible without exceeding the magnitude of the true mathematical quotient of n and d." (Similar language for Java, C#.)
Erlang: specifies that the inputs to the rem operator shall be Integer.
Elm: specifies "(%) : Int -> Int -> Int"
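The two remainder conventions quoted above can be compared directly in Python: the built-in % follows the floor-division identity, while math.fmod follows the truncating definition that ECMAScript, Java and C# describe. This is only an illustration of the two definitions, not a proposal for what C4Script should do.

```python
import math

x, y = -7, 3

# Python's %: result has the sign of the divisor (floor division).
floor_mod = x % y

# math.fmod: result has the sign of the dividend (truncated quotient),
# matching the ECMAScript-style definition quoted above.
trunc_mod = math.fmod(x, y)

# The identity from the Python documentation holds for the floored variant.
identity_holds = x == (x // y) * y + (x % y)

# Unlike C, Python (like ECMAScript) also accepts float operands for %.
float_mod = 7.5 % 2
```

For positive operands both conventions agree; they only diverge once a negative number is involved.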
Parent - By Zapper [de] Date 2017-04-20 11:20 Edited 2017-04-20 11:26
I think generally I would favor 1d (i.e. throwing out ints). That would lead to no additional learning overhead for new people and would keep confusion low. And I think the issues that you have when expecting int arithmetic but getting floating point are fewer than in the opposite case (which we have all gotten used to by now).

Buuut yeah, that might not be the most practical solution because we would likely have to change a lot of code.. Could we even do that automatically?

>2. Which precision do we want to use?

Since we are doing the math in software anyway, we might as well use a precision that allows us to convert from int to float losslessly.

PS: @Backwards compatibility
At the moment, we still break it all the time. See e.g. the musket. So I guess that should not be the thing that is holding us back. Especially if we can provide conversion scripts
Parent - By Maikel Date 2017-04-20 14:43
I'm sort of a fan of 1a): it allows people to write enhanced libraries or whatever they want, while not interfering much with current content, and it is the safest option in terms of not breaking things.
Parent - - By Marky [de] Date 2017-04-22 08:40
I'd rather continue using the integers in the engine as they are internally, but add a "smart" float option on top of it (see also Sven2's first reply). I understand that using integer + precision is not very intuitive, but on the other hand I need floats very very rarely. Currently I can think of the following:

* SetPosition(x, y, precision) -> because pixel size position steps look strange when zooming in. Even with precision, some objects/particles in e.g. Hazard look strange when positioned that way
* Sin/Cos(angle, radius, precision) -> here it is often necessary, and also annoying that you cannot use the precision for the radius, so you have to use a scaled radius and then use the results in SetPosition/SetSpeed/etc. (and I know, there are some functions for that in Math.c)

What about having a fixed maximum precision and having a custom float thingy that is actually a scaled integer internally? For example, if the default precision were 1000, then:
1 = 1000
1.000 = 1000
1.2 = 1200
1.001 = 1001
1.0001 = Engine error (or warning if the last digit is simply skipped)
Parent - - By Zapper [de] Date 2017-04-22 10:27

>What about having a fixed maximum precision and having a custom float thingy that is actually a scaled integer internally? For example, if the default precision were 1000, then:

...why? (that's called fixed point numbers btw)

>but on the other hand I need floats very very rarely.

You /need/ them every time you have to find a workaround - like a "precision" argument. Such a workaround is of course possible as well, but saying "we don't have use for higher precision" is somewhat misleading as we obviously do (i.e. we need the precision arguments).

I want to add to your list:
We also frequently use arbitrarily high numbers for different stuff just because we sometimes need to add/subtract smaller fractions (e.g. HP, Trans_Translate, physicals, action speeds...). Also scripters have to pay a lot of attention to the order of math expressions, which we are by now hopefully used to but which can lead to annoying bugs (i.e. damage / shield * speed != speed * damage / shield).
Also you have to pay a lot of attention to using the precision when setting velocities, which led to projectiles in ClonkRage traditionally only using a few discrete angles (because third-party developers usually don't care about the precision arguments).
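The ordering problem is easy to reproduce. The Python snippet below uses // for integer division (for positive operands it gives the same result as truncating division); the values for damage, shield and speed are made up.

```python
damage, shield, speed = 10, 4, 3

# Integer math: dividing first throws away the remainder before scaling.
int_divide_first = damage // shield * speed    # 10 // 4 = 2, then 2 * 3 = 6
int_multiply_first = speed * damage // shield  # 30 // 4 = 7

# Float math: both orderings agree for these values.
float_divide_first = damage / shield * speed   # 7.5
float_multiply_first = speed * damage / shield # 7.5
```

Floats do not make operation order irrelevant in general, but they remove this particular class of truncation bugs.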

I think we really can make good use of real numbers. But of course I also do see the work required if we were to make the transition clean :I
Parent - - By Marky [de] Date 2017-04-23 07:57
I think it was easy to misunderstand what I wanted to say. Instead of using fixed-point numbers everywhere (typing 1200 instead of 1.2), I wanted to have a parsing mechanism or type or whatever that converts a "float" to our fixed-point format internally: SetPosition(120.5, 100.001) would do the same as SetPosition(120500, 100001, 1000);
Then, instead of having variable precision values, the float value is always converted to a fixed precision, for example with 5 digits or whatever. You type 1.2 => it is understood as 120000 internally. You type 20 => it is understood as 2000000 internally.
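A minimal Python sketch of such a literal-to-fixed-point conversion, assuming a fixed number of decimal digits and an error when the literal has more digits than that. The name to_fixed and the default of 3 digits (precision 1000) are assumptions for illustration only.

```python
from decimal import Decimal

DEFAULT_DIGITS = 3  # assumed: precision 1000, as in SetPosition(..., 1000)


def to_fixed(literal: str, digits: int = DEFAULT_DIGITS) -> int:
    """Convert a decimal literal to a scaled integer with `digits` decimals.

    Raises ValueError if the literal needs more decimals than available.
    """
    scaled = Decimal(literal) * (10 ** digits)
    if scaled % 1 != 0:
        raise ValueError(f"{literal!r} needs more than {digits} decimal digits")
    return int(scaled)
```

Decimal is used instead of float so that a literal like 100.001 is scaled exactly rather than passing through binary floating point first. With digits=5, to_fixed("1.2", 5) gives 120000 and to_fixed("20", 5) gives 2000000, matching the examples in the post.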
Parent - - By Sven2 Date 2017-04-23 14:44
Set functions are not problematic at all. They would automatically convert to the internal representation (C4Fixed) from whatever representation you passed. If you pass the old precision parameter, it just divides by that parameter first. The parameter could be slowly phased out, but there would be no hurry because it doesn't break anything.

The difficult functions are get functions, because they would suddenly return a different type. Consider e.g.:
o1->SetPosition(o2->GetX(), o2->GetY())
You would probably want this to use the new float format because you would automatically gain some precision.

However, as soon as you let GetX() return a float, you have to assume that all kinds of computations that were done with integers in mind are now using floats. For example, consider this function:
// Returns whether the line from (x1, y1) to (x2, y2) overlaps with the line from (x3, y3) to (x4, y4).
// Whenever the two lines share a starting or ending point they are not considered to be overlapping.
global func IsLineOverlap(int x1, int y1, int x2, int y2, int x3, int y3, int x4, int y4)
{
  // Same starting or ending point is not overlapping.
  if ((x1 == x3 && y1 == y3) || (x1 == x4 && y1 == y4) || (x2 == x3 && y2 == y3) || (x2 == x4 && y2 == y4))
    return false;
  // Check if line from (x1, y1) to (x2, y2) crosses the line from (x3, y3) to (x4, y4).
  var d1x = x2 - x1, d1y = y2 - y1, d2x = x4 - x3, d2y = y4 - y3, d3x = x3 - x1, d3y = y3 - y1;
  var a = d1y * d3x - d1x * d3y;
  var b = d2y * d3x - d2x * d3y;
  var c = d2y * d1x - d2x * d1y;
  if (!c)
    return !a && Inside(x3, x1, x2) && Inside(y3, y1, y2); // lines are parallel
  return a * c >= 0 && !(a * a / (c * c + 1)) && b * c >= 0 && !(b * b / (c * c + 1));
}

Right now, it's only called with integers. What will happen when we switch to floats? Worse yet, because GetX()/GetY() returns float but most constants in script will be int, we might pass half of the parameters to this function as int and the other half as float!

I don't know (even though I wrote the logic) if it will break. Maybe it all just works apart from some minor fixes. Maybe we'll run into a lot of bugs. E.g. will the rope ladder and the hanging bridge still work?
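One spot that looks sensitive is the !(a * a / (c * c + 1)) test in the last line of the function, which relies on integer division truncating to 0 whenever a*a is at most c*c. The Python sketch below isolates just that sub-expression under the assumption that "/" would become true division once floats are involved; the thread does not settle that, so treat this as an illustration of the risk rather than a prediction. The helper names are hypothetical.

```python
def ratio_is_zero_int(a: int, c: int) -> bool:
    """Current idiom: integer division truncates to 0 when a*a <= c*c."""
    return not (a * a // (c * c + 1))


def ratio_is_zero_float(a: float, c: float) -> bool:
    """Same expression with true division: only 'zero' when a == 0."""
    return not (a * a / (c * c + 1))


def ratio_explicit(a: float, c: float) -> bool:
    """Type-independent rewrite: state the comparison directly."""
    return a * a <= c * c
```

For a = 3, c = 5 the integer version returns True while the float version returns False, so the function would report "no overlap" where it used to report an overlap. For integer inputs the explicit comparison agrees with the truncation idiom, and it keeps the same meaning for floats.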

I also don't think it's a bad idea to just switch everything to float and fix things as they break (but would postpone that after 8.0). Although the mix of integers and floats is something I would like to avoid if possible. Either by having a "stuff is mostly int and you have to use floats explicitly" or by having an "everything is float" policy.

A compromise may be to just implement floats as helper objects that you have to create explicitly (CreateFloat(int a, int b) = float(a)/b or CreateFloat(string s) = ParseFloat(s)) for the next release and then do the big conversion of engine functions later.
Parent - - By Marky [de] Date 2017-04-23 20:40
Switching everything to float would be OK, too, if we can ensure that there are no rounding errors. For example 1-1 or 1 - 10000 * 0.0001 should always be 0 and not some 3E-8 or so.
Parent - By Zapper [de] Date 2017-04-23 21:51
What is 1/3 then?
Parent - - By Caesar [de] Date 2017-04-24 00:08
That is not possible.
Parent - - By Marky [de] Date 2017-04-24 08:02
Yes, but it is also annoying when doing comparisons. Theoretically all <, >, <=, >=, == and != comparisons can fail if you switch to float. For example if I want to compare ContentsCount() <= x and the contents count is actually x, but returns (x + 1E-8).
Parent - - By Zapper [de] Date 2017-04-24 08:42
Yeah, well that's fortunately not going to happen because the contents count does not include a division and we're going to use a precision that allows us to represent all 32bit integers
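A short Python check of both halves of this exchange, using 64-bit doubles as a stand-in for whatever precision ends up being chosen: comparisons on integer-valued results stay exact, while results built from decimal fractions may need a tolerance.

```python
import math

# Integer-valued quantities (like a contents count) stay exact in a double.
count = float(7)
exact_ok = (count == 7) and (count <= 7) and (count + 1.0 - 1.0 == 7.0)

# Fractions that have no finite binary representation do pick up error.
total = 0.1 + 0.2
naive_equal = (total == 0.3)             # False
tolerant_equal = math.isclose(total, 0.3)  # True
```

So the worry applies to values that went through a division or a non-integer literal, not to counts.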
Parent - - By Maikel Date 2017-04-24 08:53
How do you actually represent integers with floating point numbers in programming languages?
Parent - By Isilkor Date 2017-04-24 10:33
32 bit (=single-precision) floating point numbers can store all integers between -2**24 and +2**24 without loss of precision; 64 bit (=double precision) floating point numbers can precisely store all integers between -2**53 and +2**53.
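These limits can be checked from Python: its float is a 64-bit double, and a round trip through struct's "f" format gives 32-bit rounding. The helper name to_float32 is made up for this sketch.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Single precision: exact up to 2**24, but 2**24 + 1 is not representable.
single_exact = to_float32(float(2**24)) == 2**24
single_inexact = to_float32(float(2**24 + 1)) != 2**24 + 1

# Double precision: exact up to 2**53, but 2**53 + 1 is not representable.
double_exact = float(2**53) == 2**53
double_inexact = float(2**53 + 1) != 2**53 + 1

# Consequence for script ints: the largest 32-bit int survives a double,
# but not a single.
int32_max = 2**31 - 1
int32_fits_double = float(int32_max) == int32_max
int32_fits_single = to_float32(float(int32_max)) == int32_max
```

This is why representing every 32-bit integer exactly requires more than single precision.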
Parent - By Isilkor Date 2017-04-23 15:59

> What about having a fixed maximum precision and having a custom float thingy that is actually a scaled integer internally?

What's the advantage of this over floats (or plain ints)? It seems to me like having some fake float that's actually just a parser hack to build integers is just going to end in confusion, for example when someone tries to multiply 1.2 (i.e., 1200) with 1.2 (1200) and ends up with 1440 (1 440 000) instead of 1.44 (1440).
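The multiplication pitfall in numbers, as a small Python sketch with an assumed scale of 1000 (the function names are illustrative): a plain integer multiply of two scaled values carries the scale twice, so a real fixed-point type has to divide it out once.

```python
SCALE = 1000  # assumed fixed precision: 1.2 is stored as 1200


def fixed_mul_naive(a: int, b: int) -> int:
    """What a parser-only hack gives you: the scale is applied twice."""
    return a * b


def fixed_mul(a: int, b: int) -> int:
    """Proper fixed-point multiply: remove one factor of SCALE again."""
    return a * b // SCALE


naive = fixed_mul_naive(1200, 1200)  # 1440000, which reads back as 1440.0
correct = fixed_mul(1200, 1200)      # 1440, which reads back as 1.44
```

Division has the mirror-image problem (the scale cancels out), which is why this needs a real type with its own operators rather than a literal syntax.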
Parent - - By Luchs [se] Date 2017-04-23 18:41
While we're discussing new types: a (2d) vector type would be very useful for working with positions and speed. Our current functions split x and y coordinates into separate parameters, but there's no convenient way to return multiple values.

As a vector type would result in differently-named functions (like GetPosition instead of GetX and GetY) anyways, this could make transition to floating numbers easier.
Parent - By Marky [de] Date 2017-04-23 20:41
We somewhat have this already on the script side with proplists and a vector library, but not in the engine code.
Parent - - By Fulgen [at] Date 2017-04-24 15:47
What would be the difference to an array (1D / 2D)?
Parent - By Zapper [de] Date 2017-04-24 17:37
Only that you could natively do [2, 3] + [1, 5] to get [3, 8]. Or 2 * [1, 5]

But yeah, no fundamental change. Imo it would also suffice to implement some vector helper functions in the engine if you are worried about speed (e.g. VectorAdd, VectorMult)...
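For comparison, a minimal sketch of such a vector value type in Python. The class name Vec2 and its fields are hypothetical and say nothing about how an engine-side type would look; the point is only the operator behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Immutable 2D vector with component-wise + and -, and scalar *."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __sub__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x - other.x, self.y - other.y)

    def __mul__(self, factor: float) -> "Vec2":
        return Vec2(self.x * factor, self.y * factor)

    # Allow the scalar on the left as well: 2 * Vec2(1, 5)
    __rmul__ = __mul__


total = Vec2(2, 3) + Vec2(1, 5)   # Vec2(x=3, y=8)
doubled = 2 * Vec2(1, 5)          # Vec2(x=2, y=10)
```

A plain list would not do this in Python either: [2, 3] + [1, 5] concatenates to [2, 3, 1, 5], which is the kind of ambiguity a dedicated type avoids.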
Parent - By Luchs [se] Date 2017-04-24 18:14
The vector would be a value type, would have a nicer way to access the two components, and at least the + and - operators would work on them. Positions are also often used in performance-critical code, so not having to allocate tons of arrays would be an improvement.

Additionally, a special vector type would provide type safety which would make the engine interface easier to use (arrays are a bit annoying in the C++ code).
Parent - By Caesar [de] Date 2017-04-24 00:05

>and some weird interactions in C4Movement,

I think I had that fixed. Still, the idea that precision might affect objects in different positions in the landscape differently scared me.

What I had implemented back then with floats was approx 1c, I think, and it didn't cause too many problems, even though I also had many functions like getting positions and rotations return floats. The only real bug I found was that you couldn't turn left on the boompack anymore because of a rounding error.
I also tried to have Sin(float) -> float, Sin(int, int, int) -> int, Sin(float, …) -> error, and similar, but these kinds of things get awkward quickly. I would not recommend that, but I don't have a much better idea either…

(I'd be a fan of 1a if c4script were statically typed. I don't really have a preference.)