funsung: SoundStream (RVQ)

Wednesday, March 19, 2025

SoundStream (RVQ)

https://research.google/blog/soundstream-an-end-to-end-neural-audio-codec/

It is trained as an auto-encoder, so it can compress audio and then reconstruct it...

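Going one step further, it uses RVQ: the vector produced by the encoder is quantized over multiple layers, which shrinks it further and lowers the transmission bitrate. (It also reduces the size of the codebook that has to be looked up.)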
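To make the multi-layer idea concrete, here is a minimal sketch of residual vector quantization in plain Python. It is not SoundStream's implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the tiny hand-written 2-D codebooks are made up for illustration, whereas the real codec learns its codebooks during training and works on much larger vectors.

```python
def nearest(vec, codebook):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec layer by layer; each layer quantizes the residual left by the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        indices.append(idx)
        # What this layer could not represent is passed on to the next layer.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the chosen codeword of every layer that was transmitted."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first layer and a finer second layer.
codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)],
]

x = (0.8, 0.3)
indices = rvq_encode(x, codebooks)
print(indices)                              # [1, 2]
print(rvq_decode(indices, codebooks))       # [0.75, 0.25]
print(rvq_decode(indices[:1], codebooks))   # [1.0, 0.0]  (first layer only, coarser)
```

Only the indices need to be transmitted. Decoding with just the first index still gives a usable but coarser reconstruction, which is why dropping layers lowers the bitrate.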

In SoundStream, we address this issue by proposing a new residual vector quantizer (RVQ), consisting of several layers (up to 80 in our experiments). The first layer quantizes the code vectors with moderate resolution, and each of the following layers processes the residual error from the previous one. By splitting the quantization process in several layers, the codebook size can be reduced drastically. As an example, with 100 vectors per second at 3 kbps, and using 5 quantizer layers, the codebook size goes from 1 billion to 320. Moreover, we can easily increase or decrease the bitrate by adding or removing quantizer layers, respectively.
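The numbers in the quoted example can be checked with a little arithmetic. The sketch below assumes that the bit budget per vector is split evenly across the layers, which is my reading of the example rather than something the quote states; the function name `codebook_sizes` is made up for illustration.

```python
def codebook_sizes(bitrate_bps, vectors_per_second, num_layers):
    """Compare one flat codebook with an RVQ at the same bitrate.

    Assumes the bits per vector divide evenly among the layers.
    Returns (flat_codebook_entries, total_rvq_entries).
    """
    bits_per_vector = bitrate_bps // vectors_per_second   # 3000 / 100 = 30 bits
    flat_entries = 2 ** bits_per_vector                   # one codebook covering all 30 bits
    bits_per_layer = bits_per_vector // num_layers        # 30 / 5 = 6 bits
    entries_per_layer = 2 ** bits_per_layer               # 64 codewords per layer
    return flat_entries, num_layers * entries_per_layer


flat, rvq = codebook_sizes(3000, 100, 5)
print(flat)  # 1073741824, about 1 billion
print(rvq)   # 320
```

So a single quantizer would need 2^30 codewords, while five layers of 64 codewords each need only 5 × 64 = 320 in total. Under the same assumption, sending only the first 3 layers would use 18 bits per vector, i.e. 1.8 kbps.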
