Principles of Audio Time-Stretching and Pitch-Shifting, with an Analysis of the SoundTouch Code

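{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Overview","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Depending on the scenario, audio time-stretching and pitch-shifting fall into three applications: changing speed without changing pitch, changing pitch without changing speed, and changing both. Changing the speed of speech means stretching or compressing the signal in the time domain while its sampling rate, fundamental frequency, and formants stay unchanged. Changing the pitch of speech means lowering or raising its fundamental frequency, with the formants shifting accordingly, while the sampling rate stays unchanged. The typical application scenarios are outlined below.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Speed change without pitch change: the 2x and 0.5x playback modes in video players apply this principle. It is also used in VoIP calls to cope with network jitter. When the network is poor, the playback side pulls too little data from the network and the buffer runs low, so the buffered audio is played a little slower; conversely, when the buffer holds too much data, it is played a little faster. WebRTC's NetEQ module is a reference implementation of this. In WeChat voice calls on a badly congested network, you may notice the speech being deliberately slowed down to keep it continuous.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Pitch change without speed change: this is mainly used for sound effects, for example raising the pitch to turn a male voice into a female voice, or lowering it to turn a female voice into a male voice. Combined with other audio effects such as EQ, reverb, tremolo, and vibrato, pitch shifting can produce voice-changing effects such as the 'loli' and 'uncle' voices in QQ.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time-stretching and pitch-shifting are implemented in audio editors such as Cool Edit, Audition, and Audacity. The most commonly used open-source implementation is SoundTouch; another is Sonic. Both can be found on GitHub.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How Audio Time-Stretching Works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The classic reference on changing speed without changing pitch is 'A Review of Time-Scale Modification of Music Signals', which describes how many such algorithms are implemented.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"TSM(Time-Scale 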
Modification)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The classic approach to changing speed without changing pitch is TSM (Time-Scale Modification). Speech-synthesis methods can also change speed, by extracting the pitch information of the audio and an excitation model of the vocal tract. This article focuses on TSM.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The principle of TSM is simple. As anyone familiar with audio processing knows, consecutive frames are made to overlap so that features vary smoothly from frame to frame, which gives rise to analysis frames and synthesis frames; the overlap is usually set to 50%. If the analysis frames overlap by 50% but the synthesis frames are placed with 75% overlap, each frame advances the output by less than it advanced the input, so the output is shorter and playback is faster. A synthesis overlap smaller than the analysis overlap slows playback down instead.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The steps of the TSM algorithm are:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1c/1c60f31145e8c299a1fb67a8abd73874.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step1: 
Split the original signal into frames","attrs":{}}]}]}],"attrs":{}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/92/92c9adf5558547f875892057929e776c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step2: Relocate the extracted frames","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step3: Synthesize the final output from the relocated frames","attrs":{}}]}]}],"attrs":{}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9d/9de1d66edd1921d714de511c5ef29c36.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hs and Ha denote the synthesis and analysis hop sizes, that is, the spacing between successive frames when they are recombined and when they are extracted. Rate = Hs / Ha is the time-stretch factor: if Ha = Hs the speed is unchanged; if Hs < Ha playback speeds up; if Hs > Ha playback slows down.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"OLA(Overlap-and-Add)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OLA (Overlap-and-Add) is the simplest ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"time-domain method","attrs":{}},{"type":"text","text":" for changing audio speed, and it is the basis of the later time-domain algorithms (SOLA, SOLA-FS, TD-PSOLA, WSOLA).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows speech being sped up.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/8439f5e53b65185149cefa19a8428b68.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure, x and y denote the speech before and after processing. The key to the speed change is in panels (c) and (d): OLA simply copies the waveform of x(m+1) to the position of y(m+1) and adds it to y(m). The speech between x(m) and x(m+1) is skipped, which produces fast playback. This is the simplest algorithm, but its drawback is obvious: it ignores the continuity between x(m+1) and y(m). In other words, it ignores how similar the copied frame y(m+1) is to y'(m+1), the frame that would have followed at normal speed. The result is shown in the figure below: the phase is discontinuous, and the fundamental frequency is distorted where adjacent frames overlap.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/21da3cea8ff251ce7e8f336935b91a66.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The improved SOLA and WSOLA algorithms were developed to fix this break in the fundamental frequency. Since y'(m+1) is continuous with y(m), it is enough to find the frame y(m+1) that is most similar to y'(m+1) and use it in place of y'(m+1); the speech then sounds natural. Both SOLA and WSOLA rest on this idea. The difference is that SOLA fixes y(m+1) and searches for the y'(m+1) most similar to it, whereas WSOLA fixes y'(m+1) and searches for the y(m+1) most similar to it. Because SoundTouch uses WSOLA, this article concentrates on WSOLA; for the details of SOLA, see the review paper mentioned above.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"WSOLA(Waveform Similarity 
Overlap-Add)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows the steps of WSOLA.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d5/d512081994ccc9ee5052719bbcdc9da9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Panel (a) is the same as in OLA, where the replacement frame chosen is x(m+1). Panel (b) introduces a tolerance Delta(max): WSOLA searches within the Delta(max) window for the frame x'(m+1) that is most similar to the replacement frame. That is the principle of WSOLA. Common ways to measure similarity are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Cross-correlation (find the correlation peak; the method SoundTouch uses)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) AMDF, the average magnitude difference function (the method Sonic uses)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Other time-stretching methods exist as well, such as PV-TSM.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How Pitch Shifting Without Speed Change Works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most common way to change pitch is resampling. If speech sampled at 8 kHz is played back at 16 kHz, the pitch rises noticeably, but the speech also becomes twice as fast. Pitch shifting without speed change therefore resamples the signal first and then uses a time-stretching algorithm to restore the original duration. Resampling is simply decimation or interpolation of the data; linear interpolation is a commonly used method.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"How SoundTouch Works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
SoundTouch is an open-source library for changing audio speed and pitch. It changes pitch by resampling (raising or lowering the sampling rate) and changes speed with the WSOLA algorithm. The source tree is clear and simple, and it ships with Visual Studio project files. The main source files are:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/63880ef341acb5c8700cc83f7a5ce4c1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ranges of the speed and pitch settings, as given in the usage text, are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -tempo=n : Change sound tempo by n percents (n=-95..+5000 %)\\n\" (speed change without pitch change)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -pitch=n : Change sound pitch by n semitones (n=-60..+60 semitones)\\n\" (pitch change without speed change)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -rate=n : Change sound rate by n percents (n=-95..+5000 %)\\n\" (both pitch and speed change)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
The main SoundTouch API is defined in the SoundTouch class. SoundTouch contains the two main processing classes, RateTransposer and TDStretch, and is itself responsible only for passing data between them and controlling the key parameters. RateTransposer changes speed and pitch together, and TDStretch changes speed without changing pitch. SoundTouch inherits from FIFOProcessor; when rate <= 1 it sets TDStretch as the output stage, and when rate > 1 it sets RateTransposer as the output stage.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"TDStretch: Speed Change Without Pitch Change","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Speed change without pitch change is implemented with WSOLA. The main members of the class are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"class TDStretch : public FIFOProcessor{\n protected:\n int channels;\n int sampleReq; // samples that must be buffered before a frame can be processed\n int overlapLength; // overlap length used by WSOLA\n int seekLength; // range searched for the offset that correlates best with pMidBuffer\n int seekWindowLength; // length of one frame of data plus the overlap\n int sampleRate;\n int sequenceMs; // default 40 ms\n int seekWindowMs; // default 15 ms\n int overlapMs; // default 8 ms; the lengths are computed from the ms values\n\n float maxnormf;\n\n double tempo;\n double skipFract; // step to skip in the input for the speed change\n\n SAMPLETYPE *pMidBuffer; // intermediate data used for the correlation; its length is the overlap\n\n FIFOSampleBuffer outputBuffer;\n FIFOSampleBuffer 
inputBuffer;\n};\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"overlapLength, seekLength, and seekWindowLength are illustrated in the figure below.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d7ffbb4bcfb1c7acaa4f155dbbea7973.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take tempo = 2, overlapLength = 128, seekLength = 240, seekWindowLength = 640 as an example. SoundTouch processes the data as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first frame is handled first:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Copy seekWindowLength - 2 * overlapLength = 384 samples from inputBuffer to outputBuffer","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Copy overlapLength = 128 samples, starting at position 384 of inputBuffer, into pMidBuffer","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Advance inputBuffer to position skipFract = 648","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"where skipFract += nominalSkip. On the first frame, processSamples has already subtracted skip = (int)(tempo * overlapLength + 0.5 * seekLength + 0.5) = 376 from skipFract, so adding nominalSkip = 1024 gives 648.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nominalSkip = tempo * 
(seekWindowLength - overlapLength); // = 2 * (640 - 128) = 1024","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Compute the correlation between pMidBuffer and the seekLength samples of inputBuffer starting at position 648, and choose the offset with the highest correlation; here offset = 119","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Overlap-add the 128 samples starting at 648 + offset with the 128 samples in pMidBuffer and write the result to outputBuffer","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null},"content":[{"type":"text","text":"Copy seekWindowLength - 2 * overlapLength = 384 samples of inputBuffer, starting at 648 + offset + 128, to outputBuffer","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":7,"align":null,"origin":null},"content":[{"type":"text","text":"Repeat for the following frames","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The implementation is:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"c"},"content":[{"type":"text","text":"void TDStretch::processSamples()\n{\nint ovlSkip;\nint offset = 0;\nint temp;\nwhile ((int)inputBuffer.numSamples() >= sampleReq) // wait until sampleReq samples are buffered\n{\n if (isBeginning == false)\n {\n offset = seekBestOverlapPosition(inputBuffer.ptrBegin()); // find the position with the highest correlation\n\n overlap(outputBuffer.ptrEnd((uint)overlapLength), inputBuffer.ptrBegin(), (uint)offset); // overlap-add with pMidBuffer\n outputBuffer.putSamples((uint)overlapLength);\n offset += overlapLength;\n }\n else\n {\n isBeginning = false;\n int skip = (int)(tempo * overlapLength + 0.5 * 
seekLength + 0.5);\n\n #ifdef ST_SIMD_AVOID_UNALIGNED\n // in SIMD mode, round the skip amount to value corresponding to aligned memory address\n if (channels == 1)\n {\n skip &= -4;\n }\n else if (channels == 2)\n {\n skip &= -2;\n }\n #endif\n skipFract -= skip;\n assert(nominalSkip >= -skipFract);\n }\n\n\n if ((int)inputBuffer.numSamples() < (offset + seekWindowLength - overlapLength))\n {\n continue; // just in case, shouldn't really happen\n }\n\n // length of sequence\n temp = (seekWindowLength - 2 * overlapLength);\n outputBuffer.putSamples(inputBuffer.ptrBegin() + channels * offset, (uint)temp);\n assert((offset + temp + overlapLength) <= (int)inputBuffer.numSamples());\n // keep the tail of this frame in pMidBuffer for the next overlap-add\n memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp), \n channels * sizeof(SAMPLETYPE) * overlapLength);\n skipFract += nominalSkip; // real skip size\n ovlSkip = (int)skipFract; // rounded to integer skip\n skipFract -= ovlSkip; // maintain the fraction part, i.e. real vs. integer skip\n inputBuffer.receiveSamples((uint)ovlSkip); // skip input samples according to tempo; this is the key to the speed change\n}\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The RateTransposer Class: Changing Speed and Pitch Together","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RateTransposer changes speed and pitch together simply by resampling (raising or lowering the sampling rate). The class is declared as follows:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"class RateTransposer : \n public FIFOProcessor{\n protected:\n /// Anti-alias filter object\n AAFilter *pAAFilter; // anti-aliasing filter\n TransposerBase *pTransposer; // provides the resampling algorithm that changes speed and pitch\n FIFOSampleBuffer inputBuffer;\n\n /// Buffer for keeping samples between transposing & anti-alias filter\n FIFOSampleBuffer midBuffer;\n\n /// Output sample buffer\n FIFOSampleBuffer outputBuffer;\n\n bool bUseAAFilter;\n\n void processSamples(const SAMPLETYPE *src, \n uint 
numSamples);\n};","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its core function is processSamples:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"void RateTransposer::processSamples(const SAMPLETYPE *src, uint nSamples)\n{\n uint count;\n if (nSamples == 0) return;\n // accept the incoming samples\n inputBuffer.putSamples(src, nSamples);\n // process the data\n if (bUseAAFilter == false)\n {\n count = pTransposer->transpose(outputBuffer, inputBuffer);\n return;\n }\n assert(pAAFilter);\n if (pTransposer->rate < 1.0f) \n {\n // resample first\n pTransposer->transpose(midBuffer, inputBuffer); \n // then apply the anti-aliasing filter\n pAAFilter->evaluate(outputBuffer, midBuffer);\n } \n else \n {\n pAAFilter->evaluate(midBuffer, inputBuffer);\n pTransposer->transpose(outputBuffer, midBuffer);\n 
}\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The TransposerBase class acts as a factory: it selects among three resampling methods, which are in fact interpolation methods:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b0fd44ddd23db6c271dd243de3097cd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The principles of the interpolation algorithms are to be added.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Points to note about SoundTouch:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The amount of buffered data and the size of each skip are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nominalSkip = tempo * (seekWindowLength - overlapLength);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"sampleReq = max(intskip + overlapLength, seekWindowLength) + 
seekLength;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Depending on the platform, the correlation in the time-stretching stage can be computed by TDStretchSSE or TDStretchMMX.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Judging from its implementation, TDStretch has a fairly large algorithmic delay. To reduce the delay for real-time use, consider reducing sequenceMs and the amount of data buffered before processing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Closing Remarks","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This was a brief analysis of the SoundTouch implementation; the classes that handle its data input and output, such as FIFOProcessor, were not covered in detail. A later article will analyze the principles and implementation of the Sonic algorithm. The classic paper A Review of Time-Scale Modification of Music Signals is available at: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.mdpi.com/2076-3417/6/2/57","title":null,"type":null},"content":[{"type":"text","text":"https://www.mdpi.com/2076-3417/6/2/57","attrs":{}}]},{"type":"text","text":". If anything in this article is wrong, corrections and discussion are welcome.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"References","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Review of Time-Scale Modification of Music 
Signals: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.mdpi.com/2076-3417/6/2/57","title":null,"type":null},"content":[{"type":"text","text":"https://www.mdpi.com/2076-3417/6/2/57","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhihu, 變聲導論 (Introduction to Voice Changing): ","attrs":{}},{"type":"link","attrs":{"href":"https://zhuanlan.zhihu.com/p/110278983","title":null,"type":null},"content":[{"type":"text","text":"https://zhuanlan.zhihu.com/p/110278983","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: 基於WSOLA算法的語音時長調整研究 (A Study of Speech Duration Adjustment Based on the WSOLA Algorithm)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: An Overlap-Add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SoundTouch official website: ","attrs":{}},{"type":"link","attrs":{"href":"http://www.surina.net/soundtouch/","title":null,"type":null},"content":[{"type":"text","text":"http://www.surina.net/soundtouch/","attrs":{}}]}]}]}
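Supplementary sketch for the TDStretch walkthrough (tempo = 2, overlapLength = 128, seekLength = 240, seekWindowLength = 640). It only reproduces the arithmetic of the formulas quoted in the article so the numbers can be checked by hand. The struct StretchParams and the three helper functions are hypothetical names introduced for illustration and are not part of SoundTouch. The sketch also assumes that skipFract starts at 0 and that intskip is nominalSkip rounded to the nearest integer.

```cpp
#include <algorithm>

// Hypothetical parameter bundle for the walkthrough; not a SoundTouch type.
struct StretchParams {
    double tempo;
    int overlapLength;
    int seekLength;
    int seekWindowLength;
};

// nominalSkip = tempo * (seekWindowLength - overlapLength)
double nominalSkip(const StretchParams &p) {
    return p.tempo * (p.seekWindowLength - p.overlapLength);
}

// sampleReq = max(intskip + overlapLength, seekWindowLength) + seekLength,
// with intskip assumed to be nominalSkip rounded to the nearest integer.
int sampleReq(const StretchParams &p) {
    int intskip = static_cast<int>(nominalSkip(p) + 0.5);
    return std::max(intskip + p.overlapLength, p.seekWindowLength) + p.seekLength;
}

// Number of input samples skipped after the first frame.
// Mirrors the isBeginning branch of processSamples: the initial skip is
// subtracted from skipFract (assumed to start at 0) before nominalSkip is added.
int firstFrameSkip(const StretchParams &p) {
    double skipFract = 0.0;
    int skip = static_cast<int>(p.tempo * p.overlapLength + 0.5 * p.seekLength + 0.5);
    skipFract -= skip;
    skipFract += nominalSkip(p);
    return static_cast<int>(skipFract);
}
```

With StretchParams{2.0, 128, 240, 640}, nominalSkip is 1024, sampleReq is 1392, and firstFrameSkip is 648, which matches the position used in step 3 of the walkthrough.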