Topic: Development / Developer's Corner / Lossy audio formats discussion
Parent - - By ala [de] Date 2015-12-07 12:37 Edited 2015-12-07 12:53
Please refrain from using .ogg or other compressed formats (e.g. .mp3) for sounds. They are usually in the kilobyte range, and it's just absolute overkill to save maybe 10 megabytes across all sounds combined purely for disk space.

Lossy formats really lose a lot of audio information, even if you may not be able to hear it on your speakers at first glance. You can hear the difference on good sound systems and with a trained ear, and everyone else may not be able to consciously detect the difference but will still feel it unconsciously.

Lossless formats on the other hand (.flac etc.) would be okay for compression. They use a good algorithm that doesn't destroy audible sound and would still save something like 50% of the space. Still, it would be overkill to implement them as a feature for the small amount of space it saves.

If you have the original .wav files for the .oggs, I would be in favor of removing the .oggs instead.

[Edit] For long sounds, for example the stereo rain, .ogg is a fine compromise between quality and space, since a long sound actually takes up some space. But of the sounds I have done, the stereo rain is the only one where this really made sense.
Parent - - By Isilkor Date 2015-12-07 19:14 Edited 2015-12-07 19:18
It sounds to me like you're claiming that it is impossible for a lossy codec to achieve transparency. Can you provide a link to a study (or a series of double-blind ABX tests, or comparable tests) that backs that claim, or is it just your personal opinion?
Parent - - By ala [de] Date 2015-12-07 20:11 Edited 2015-12-07 22:25
It was taught to me by several independent music teachers.

For example, one teacher taught it like this in an audio engineering lesson on a high-quality speaker system:

Create two files:
A: Take an uncompressed music track (we were on a Mac, so this was .aiff).
B: "Track A" compressed to a high-quality MP3.
Invert the phase of B.

Now most of us expected a very quiet signal, since that would be a wave cancellation (German: Phasenauslöschung). But the difference left by the compression is actually big enough that you can still hear most instruments of the song very clearly, even the voice. The result is the audible difference between uncompressed audio and MP3.

The next step is to record this to a new track, C.

Now you can listen to A and activate C, and thanks to the cancellation you again get a sort of live transition between MP3 and AIFF. I don't remember exactly, but maybe you can even fade track C in and out so you get a percentage of the difference.
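
If anyone wants to try the same null test outside a DAW, here is a minimal sketch in Python. The file names are placeholders, and it assumes ffmpeg, numpy and scipy are installed; it is not the exact classroom setup, just the same idea scripted offline.

```python
# Null test sketch: compress A to MP3, decode it back, invert B and mix with A.
import subprocess
import numpy as np
from scipy.io import wavfile

# A -> B: encode to a high-quality MP3, then decode back to PCM
subprocess.run(["ffmpeg", "-y", "-i", "original.wav", "-b:a", "320k", "b.mp3"],
               check=True)
subprocess.run(["ffmpeg", "-y", "-i", "b.mp3", "b_decoded.wav"], check=True)

rate_a, a = wavfile.read("original.wav")
rate_b, b = wavfile.read("b_decoded.wav")
assert rate_a == rate_b

# Work in float and trim to a common length; real MP3 encoders may also add
# a small delay at the start, so the two files might need aligning first.
a = a.astype(np.float64)
b = b.astype(np.float64)
n = min(len(a), len(b))

# "Invert the phase of B" and sum with A: the residual is the information
# the encoder altered or threw away.
difference = a[:n] - b[:n]
wavfile.write("difference.wav", rate_a,
              np.clip(difference, -32768, 32767).astype(np.int16))
```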

By the end of the lesson all students could clearly hear which file was which. Then again, all of them were trained musicians, so it may not be that simple for everyone.

He also ran the compression on a track multiple times in a row, and you might not have heard the difference immediately from one version to the next. But by the end of the process it was very audible that the quality had suffered enormously.

>It sounds to me like you're claiming that it is impossible for a lossy codec to achieve transparency.


I wouldn't say impossible, but the algorithms of those codecs are specifically tweaked to remove audio that seems superfluous in order to save space, masked frequencies for example. So by definition that is a very difficult task, even if Ogg might be twice as good as MP3. The ear is a very finely tuned organ, and the problem with sound design is also that you will hear the exact same sample thousands of times in the game. A professional digital drum kit will have at least 20 layers of sampled sounds per hit to create the illusion of a real instrument. If you use fewer, the ear will "know" that something is not right, even if you can't tell that the recording is not real.

Well, I'm an audiophile, obviously :) ... yes, this is far less important to some people. And the difference between lossy and uncompressed isn't really significant, but I still feel we don't need to voluntarily give that away just for a few megabytes. The complete sound library of all 500 sounds is now at 35 MB, by the way.
Parent - - By B_E [de] Date 2015-12-07 23:04
This applies to music, and I do agree. However, is this really relevant for small 1-2 second sounds, or even shorter sound loops?
Parent - By ala [de] Date 2015-12-07 23:51
I'd say it's even more critical there; you will hear those short sound snippets over and over again.
Parent - - By PeterW [gb] Date 2015-12-08 16:33 Edited 2015-12-08 16:37
The difference argument sounds pretty bogus. There is no reason to expect the compressed signal to be in phase with the uncompressed signal. After all, the human ear has no way of distinguishing phase, so it's likely the first thing that gets thrown away, right? As a result, whether a certain frequency gets boosted or suppressed in the "difference" audio should be fairly random. In particular, it should sound virtually the same no matter whether you add or subtract the two tracks.

Masked frequencies are more interesting - is this effect non-linear or delayed in some way?
Parent - - By ala [de] Date 2015-12-08 17:26 Edited 2015-12-08 17:37

>There is no reason to expect the compressed signal to be in phase with the uncompressed signal.


Uhm, of course there is. Latency compensation and keeping everything in sync is standard in most music software. Processes you use "offline" (not during playback) keep everything in sync and don't alter start points. For live processing (which takes some time) a sync mechanism is added that precalculates all processes and delays the track (usually by a few milliseconds) to match the speed of the slowest plugins. All signals and plugins can be monitored, and you can even sync several different software or hardware signals to a master.

>After all, the human ear has no way of distinguishing phase, so it's likely the first thing that gets thrown away, right?


The ear can't, but the eye and the machine can. For example, you could zoom in on the sine waves of your track down to the sample level (so a resolution of 44,100 samples per second) and simply line up the start of the first waves with the timeline coordinates.

Edit:

>Masked frequencies are more interesting - is this effect non-linear or delayed in some way?


I don't think it's delayed. To my knowledge, I would expect audio compression to work like this: the codec analyses the waveform and decides which frequency information should be removed or altered. This process is implemented with algorithms that try to take the hearing mechanisms of the ear into account. The ear is complex in this regard and masks signals of similar frequencies, so it hides some information (for example, if a bass and a bass drum are playing at the same time, the bass drum will not be heard as clearly). I was told such masked frequencies get removed in MP3.
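
As a toy illustration of that principle (emphatically not the real MP3 algorithm, just the "drop what's quiet relative to what's loud" idea in code), here is a hypothetical block "codec" that analyses a block in the frequency domain and discards components far below the loudest one:

```python
# Toy illustration of removing quiet/"masked" frequency components.
import numpy as np

def toy_compress_block(block, threshold_db=-40.0):
    """Zero out frequency bins more than `threshold_db` below the loudest bin."""
    spectrum = np.fft.rfft(block)
    magnitude = np.abs(spectrum)
    cutoff = magnitude.max() * 10 ** (threshold_db / 20.0)
    spectrum[magnitude < cutoff] = 0.0          # drop weak components
    return np.fft.irfft(spectrum, n=len(block))

# Example: a loud 100 Hz "bass" plus a much quieter 110 Hz tone in one block.
rate = 44100
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 100 * t) + 0.005 * np.sin(2 * np.pi * 110 * t)
reconstructed = toy_compress_block(signal)      # the quiet tone gets removed
```
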
Parent - - By PeterW [gb] Date 2015-12-08 23:11 Edited 2015-12-08 23:20
My point was about *compression*. The point of - say - MP3 is to get rid of all information that doesn't make an audible difference. Phase is likely one of them, as you said yourself. Therefore subtracting an MP3 track from a FLAC track is meaningless, because you can only count on frequencies cancelling out each other if you know that their phases are aligned.

A "fair" difference would likely involve doing a (wavelet?) Fourier transform and taking the difference in frequency space, ignoring phase.
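
A sketch of what that could look like, assuming `a` and `b` are two already-decoded, equal-length mono signals (hypothetical names) and using scipy's STFT:

```python
# Compare two renderings of the same track in frequency space, ignoring phase:
# take the short-time Fourier transform of each and subtract the magnitudes.
import numpy as np
from scipy.signal import stft

def magnitude_difference(a, b, rate):
    """Mean absolute difference of the magnitude spectrograms of a and b."""
    _, _, spec_a = stft(a, fs=rate, nperseg=2048)
    _, _, spec_b = stft(b, fs=rate, nperseg=2048)
    return np.mean(np.abs(np.abs(spec_a) - np.abs(spec_b)))
```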

> for example, if a bass and a bass drum are playing at the same time, the bass drum will not be heard as clearly.


Well, but is this linear? As in: The more we have in frequency X, the less we hear frequency Y? Because if that's the case, then it should roughly be that compressed(A) + compressed(B) = compressed(A + B). Hm. On the other hand: If - say - some frequency Z could cause us to hear *more* of frequency Y, then we could get audible differences.
Parent - - By ala [de] Date 2015-12-09 01:05

> Therefore subtracting an MP3 track from a FLAC track is meaningless, because you can only count on frequencies cancelling out each other if you know that their phases are aligned. Phase is likely one of them


But they are aligned? If the sine-wave length were altered by compression, that would speed up or slow down the song. Phase does make a huge audible difference, not on one track - but on several. Imagine having a set of several instruments lined up beneath each other for mixing. Phase is never altered unknowingly by any plugin. (You can alter it by hand or plugin if you want to).
Also, why should MP3 get rid of phase? That wouldn't save any information, as the phase is only the starting position of the sound wave, and therefore actually a time difference, a very small one.

>Well, but is this linear? As in: The more we have in frequency X, the less we hear frequency Y?


In music things are rarely linear (octaves are defined by a doubling of the frequency, and volume doubles every 6 dB), so I don't think so.

It's also not so simple:
A sound usually consists of a base frequency and its overtones: f + 2f + 3f etc. These resonate in a body, and the volume of each is altered by the resonance of some and the damping of others. This distribution gets picked up by the ear and gives each "instrument" its unique quality of sound.
So we would have two tone bodies, but there is also unaltered sound from the material and the surroundings (= noise).

Now - and this is just guessing - I assume that the partial distribution which our ear picks up gets disturbed by the second signal if their interval is close and their patterns overlap too much. Another important thing for sounds is the first impulse (the attack): the ear is finely tuned to picking up those first signals or hit points; the decay, the rest of the sound, is not so important. (There are even examples of fooling the brain by taking the attack of a trumpet and the decay of a flute, and the brain will not immediately notice the instrument change.)
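
To make that concrete, here is a tiny sketch that builds such a "tone body" from a base frequency and a few overtones, with made-up relative volumes standing in for resonance and damping, plus a simple attack/decay envelope:

```python
# Sketch: a tone as a base frequency plus overtones f, 2f, 3f, ...
import numpy as np

rate = 44100
t = np.arange(rate) / rate                      # one second of samples
f = 220.0                                       # base frequency
overtone_levels = [1.0, 0.5, 0.3, 0.15]         # hypothetical distribution

tone = sum(level * np.sin(2 * np.pi * f * (k + 1) * t)
           for k, level in enumerate(overtone_levels))

# 10 ms linear attack, then an exponential decay; the attack is what the
# ear keys on when identifying the instrument.
envelope = np.minimum(t / 0.01, 1.0) * np.exp(-3.0 * t)
tone *= envelope
```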

> On the other hand: If - say - some frequency Z could cause us to hear *more* of frequency Y, then we could get audible differences.


A room has that quality, since it reflects waves; some get absorbed and others amplified.
Parent - By PeterW [gb] Date 2015-12-09 12:56

> If the sine-wave length were altered by compression, that would speed up or slow down the song


If the sine-wave length was changed, that would change the frequency, not just the speed - and you would definitely be able to hear that. Phase is independent of both though.

> Phase does make a huge audible difference, not on one track - but on several.


I know. Two audio sources can sync up and give you interference. Probably happens quite a lot with synthesizers because of the narrow frequency patterns. My entire point is that there is no good reason to expect an MP3 and FLAC playback of the same track to do that, as it's designed for very broad frequency patterns (= entire mixed tracks).

> That wouldn't save any information as the phase is only the starting position of the sound wave, and therefore actually a time difference, a very small one.


Sound waves don't actually start at an exact position - after all, you need to analyse a period considerably larger than one wavelength to tell whether a given frequency is there or not. Therefore the start of a sound wave can never really be pinned down, similar to how you cannot pin down the position of a quantum object.

This applies to the processing in MP3 compression as well as in your ear. Neither will be able to resolve the "start" of a sound down to a level where it can determine phase. MP3 will likely encode the "start" with something around millisecond precision and leave choosing the phase to the decoder.
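
The standard way to quantify this trade-off is the Fourier/Gabor uncertainty relation, a textbook result added here only for reference: roughly, pinning a frequency down to within Δf requires observing on the order of 1/Δf seconds of signal.

```latex
% Time-frequency (Gabor) uncertainty relation, with \Delta t and \Delta f the
% standard deviations of the signal in time and frequency (f in Hz):
\[
  \Delta t \,\Delta f \;\ge\; \frac{1}{4\pi}
\]
% Rough example: localising a component to within 10 Hz needs on the order of
% 1/(10\,\mathrm{Hz}) = 0.1\,\mathrm{s} of signal, i.e. many full periods, so
% the "start" of the wave cannot be pinned down precisely.
```
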

> Gets disturbed by the second signal if their interval is close and their patterns overlap too much


Well, that sounds fairly "linear" in nature - if A suppresses the "patterns" of B, then there's no reason that A+C wouldn't also suppress them, right? So a hypothetical transparency-optimised compressing algorithm would be able to remove the "pattern" of B, and be sure that whatever C is, we wouldn't be able to tell the difference.

> Another important thing for sounds is the first impulse (the attack)


That sounds like a more likely candidate, as it's time-dependent. So basically, let's say A is the attack and B the decay. Now we just make C some noise that takes the "punch" out of A, and we presumably wouldn't be able to tell the effect of simplifying A+B, but could tell on A+B+C.
Parent - - By Isilkor Date 2015-12-08 21:36
I'm mostly asking because HydrogenAudio, a well-known audiophile forum, generally suggests that Vorbis q5 is transparent for most purposes (as in: even trained people can't distinguish it from the original in blind ABX tests with a probability better than random guessing). Obviously this doesn't include actually creating a difference signal from the compressed sound and the source.
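
For context, "better than random guessing" in an ABX test is usually judged with a simple binomial test against a 50% guessing rate. A small sketch with made-up trial counts, assuming scipy >= 1.7 for binomtest:

```python
# Evaluate an ABX listening test: under the null hypothesis the listener is
# guessing, so each trial is a 50/50 coin flip. A small p-value means the
# result is unlikely to be pure guessing. The trial numbers are hypothetical.
from scipy.stats import binomtest

correct, trials = 12, 16                     # hypothetical ABX session
result = binomtest(correct, trials, p=0.5, alternative="greater")
print(f"p-value: {result.pvalue:.3f}")       # ~0.038 for 12/16
```
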
Parent - - By ala [de] Date 2015-12-08 22:18
I would not claim that I can hear the difference between Vorbis and WAV, since I have never tried that. Hm... I will try and report back - I'll find someone to assist, sounds interesting. I usually convert with VLC player or foobar. Well, not knowing whether they have the newest version, though.

What I can say for sure, however, is that I hear a difference on music pieces I know well when I look them up on YouTube; they also use some sort of compression. Also, if you go to the real audiophiles, like sound engineers with 20 years of experience, they probably will hear it. I can tell you there are some people who hear a scary amount of detail. The audio engineering teacher we had could at least sometimes hear where the microphones in your room at home were standing and which microphones you used to record, and gave tips on how to improve that.
Parent - By Isilkor Date 2015-12-08 22:26
I believe there's an ABX plugin available for fb2k; I'd be interested in your results.
Parent - By Günther [de] Date 2015-12-21 21:02

> I would not claim that I can hear the difference between Vorbis and WAV


I'd guess that you can hear the difference if the bitrate is low enough, and can't hear the difference if it's high enough. The YouTube compressor is almost certainly configured to sacrifice audio quality to save bandwidth. Video takes so much more space than audio that it's a wonder there's any space left for audio at all. ;-) And then there's no guarantee that the uploader used a high-quality source to begin with.
Parent - - By Pyrit Date 2015-12-09 13:07
Cool, I tried it out, and it worked. The "difference" track was really quiet, though. I had to crank up the volume excessively high to be able to hear instruments.
Parent - - By PeterW [gb] Date 2015-12-09 23:23
Hm. The fact that it's really quiet kind of contradicts my point from above. Interesting :)
Parent - - By PeterW [gb] Date 2015-12-13 11:57
Tested a bit myself - phases actually seem to be remarkably stable. In fact, after fixing a random delay the MP3 encoder introduced, the signals were pretty much completely in phase. After also correcting for the fact that the MP3 was about 0.45 dB quieter, this reduced a test signal of 3 sine waves to complete silence. Doing the same for a music example produced some very warped, but recognizable, music.
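
A rough sketch of that alignment step before subtracting, assuming `original` and `decoded` are equal-rate mono numpy arrays (hypothetical names) - estimate the delay by cross-correlation, then fit a gain, then take the residual:

```python
# Align a decoded MP3 against the original before taking the difference.
# For long signals, scipy.signal.correlate (FFT-based) is much faster than
# np.correlate.
import numpy as np

def null_test(original, decoded):
    # 1) delay estimate: offset of the cross-correlation peak
    corr = np.correlate(decoded, original, mode="full")
    delay = int(np.argmax(corr)) - (len(original) - 1)
    if delay > 0:                       # decoded starts late: drop its padding
        decoded = decoded[delay:]
    elif delay < 0:                     # decoded starts early
        original = original[-delay:]
    n = min(len(original), len(decoded))
    original, decoded = original[:n], decoded[:n]

    # 2) least-squares gain: the scale g minimising |original - g * decoded|^2
    gain = np.dot(original, decoded) / np.dot(decoded, decoded)

    # 3) the residual is what the encoder changed
    return original - gain * decoded
```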

So compression seems to care more about preserving phase than about preserving loudness... Not sure why.
Parent - - By ala [de] Date 2015-12-13 12:21

>So compression seems to care more about preserving phase than about preserving loudness... Not sure why.


>Imagine having a set of several instruments lined up beneath each other for mixing. Phase is never altered unknowingly by any plugin. (You can alter it by hand or plugin if you want to).


Like I said before, intact phase is very important for every conceivable music application, and most applications therefore try to keep the phase intact.
In live processing, delays can occur (this can probably not be fully handled by the delay compensation I described above; the logical reason would be that several live applications in relation to each other would go out of sync).
Now that we talk about it, I remember: the occurrence of small delays in live applications is a known issue with equalizers, and there is a special term for improved plugins, "linear-phase EQ".

Preserving volume: the volume is slightly softer because the MP3 cut out frequencies, so the overall level decreases (0.45 dB is not audible, though). Preserving the volume is not possible, because in order to do that you'd have to alter all the other frequencies, and that would be unwanted; especially in orchestral music you would disturb the instrument balance.
Parent - By PeterW [gb] Date 2015-12-13 19:31 Edited 2015-12-13 19:35
Yeah, but your entire point was that MP3 *isn't* meant to be used with mixing, wasn't it? So the fact that phase is important in audio processing (which I don't dispute for a second) doesn't really allow us to conclude anything about MP3's behaviour.

And the theory behind my statement about loudness was that MP3 might be "rounding" it. 0.45 dB just feels like such a random factor, so I'd assume that it varies for different frequencies/loudness levels. This would also explain why you still hear the music faintly after subtraction...
Parent - - By Pyrit Date 2015-12-13 15:46
Well, MP3 takes out audio information that is not audible to humans, and apparently this information is mostly very quiet sounds that come right after louder ones.
Parent - By K-Pone [de] Date 2015-12-13 16:29
In comparison to OGG and other formats like WMA and FLAC, you could clearly hear what had been taken out by MP3 conversion, especially at 128 kbit/s and 44100 Hz in stereo. MP3 sounds a bit "water-like".