Principles of Audio Time Stretching and Pitch Shifting, with an Analysis of the SoundTouch Code

Original

2021-05-01 10:03

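{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Overview","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Depending on the scenario, audio tempo and pitch modification falls into three cases: changing tempo without changing pitch, changing pitch without changing tempo, and changing both. Changing the tempo of speech means stretching or compressing it in the time domain while its sampling rate, fundamental frequency and formants stay the same. Changing the pitch means lowering or raising the fundamental frequency, with the formants shifting accordingly, while the sampling rate stays the same. The typical applications are outlined below.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Tempo change without pitch change: the 2x and 0.5x playback modes found in video players apply this principle. It is also used in VoIP to cope with network jitter. When the network is poor, the receiver pulls less data and its buffer runs low, so the buffered data is played a little slower; conversely, when the buffer holds too much data, playback is sped up. The NetEQ module in WebRTC is a reference implementation. During WeChat voice calls you may have noticed that speech is deliberately slowed down when the network is very choppy, in order to keep the audio continuous.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) Pitch change without tempo change: this is mainly used for voice effects, raising the pitch to turn a male voice into a female voice, or lowering it to turn a female voice into a male voice. Combined with other audio effects such as EQ, reverb, tremolo and vibrato, pitch shifting can produce voice-changer effects such as the young-girl and middle-aged-man voices in QQ.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tempo and pitch modification is available in audio editors such as Cool Edit, Audition and Audacity. The most commonly used open-source implementation is SoundTouch; another is Sonic. Both can be found on GitHub.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How audio time stretching works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The classic reference on changing tempo without changing pitch is the paper A Review of Time-Scale Modification of Music Signals,","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"which surveys many such algorithms and how they are implemented.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"TSM(Time-Scale 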
Modification)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The classic family of algorithms for changing tempo without changing pitch is TSM (Time-Scale Modification). There are also speech-synthesis approaches that change tempo by extracting the pitch information of the audio and an excitation model of the vocal tract. This article focuses on TSM.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The idea behind TSM is simple. Audio is usually processed in frames, and to keep the features smooth from one frame to the next, consecutive frames overlap. This gives rise to analysis frames (how the input is split) and synthesis frames (how the output is assembled), typically with 50% overlap. A larger overlap means a smaller step between frames. If the analysis frames overlap by 50% while the synthesis frames overlap by 75%, the synthesis step is half the analysis step, so the output is half as long and plays back faster; swapping the two overlaps slows playback down.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The steps of the TSM algorithm are:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1c/1c60f31145e8c299a1fb67a8abd73874.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step1: 
Split the original signal into frames","attrs":{}}]}]}],"attrs":{}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/92/92c9adf5558547f875892057929e776c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step2: Reposition the frames","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"step3: Synthesize the final output from the repositioned frames","attrs":{}}]}]}],"attrs":{}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9d/9de1d66edd1921d714de511c5ef29c36.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ha and Hs are the analysis and synthesis hop sizes, that is, the step between successive frames when splitting and when reassembling. Rate = Hs / Ha is the ratio of output length to input length. If Ha = Hs, the speed is unchanged; when Hs is smaller than Ha, playback speeds up; when Hs is larger than Ha, playback slows down.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"OLA(Overlap-and-Add)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OLA (Overlap-and-Add) is the simplest ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"time-domain method","attrs":{}},{"type":"text","text":" among the time-stretching algorithms, and it is the basis of the later time-domain algorithms (SOLA, SOLA-FS, TD-PSOLA, WSOLA).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows speech being sped up.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/8439f5e53b65185149cefa19a8428b68.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure, x and y denote the speech before and after processing. The key to the speed change is in panels c and d: OLA bluntly copies the waveform at x(m+1) to y(m+1) and overlap-adds it with y(m). The speech between x(m) and x(m+1) is skipped, which gives fast playback. This is the simplest algorithm, but its drawback is obvious: it ignores the continuity between x(m+1) and y(m). In other words, it ignores how similar the copied frame y(m+1) is to y'(m+1), the frame that would have been played next without the speed-up. As the next figure shows, the result is a phase discontinuity, and the fundamental frequency is distorted in the overlap region of adjacent frames.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/21/21da3cea8ff251ce7e8f336935b91a66.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The improved SOLA and WSOLA algorithms were developed to fix this break in the fundamental frequency. y'(m+1) is continuous with y(m), so if we find the frame y(m+1) that is most similar to y'(m+1) and use it in place of y'(m+1), the speech sounds natural. Both SOLA and WSOLA rest on this idea. The difference is that SOLA fixes y(m+1) and searches for the y'(m+1) most similar to it, while WSOLA fixes y'(m+1) and searches for the y(m+1) most similar to it. Since SoundTouch uses WSOLA, this article concentrates on WSOLA; for the details of SOLA, see the review paper mentioned above.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"WSOLA（Waveform Similarity 
Overlap-Add）","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows the steps of WSOLA.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d5/d512081994ccc9ee5052719bbcdc9da9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Panel a works the same way as OLA, where the replacement frame is x(m+1). Panel b introduces a tolerance Delta(max): within this window, WSOLA searches for the frame x'(m+1) that is most similar to the replacement frame. This is the core idea of WSOLA. Common ways to measure similarity are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) Cross-correlation (search for the correlation peak; this is what SoundTouch uses)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) AMDF, the average magnitude difference function (this is what Sonic uses)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are other time-stretching methods as well, such as PV-TSM, which is based on the phase vocoder.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How pitch shifting without tempo change works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most common way to change pitch is resampling. If speech sampled at 8 kHz is played back at 16 kHz, the pitch is clearly higher, but the speech is also twice as fast. Pitch shifting without tempo change therefore first resamples the signal and then applies a time-stretching algorithm to restore the original speed. Resampling is simply decimation or interpolation of the data; linear interpolation is a commonly used method.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"How SoundTouch works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
SoundTouch is an open-source library for changing audio tempo and pitch. It changes pitch by resampling (up- or down-sampling) and changes tempo with the WSOLA algorithm. The source tree is clear and simple and includes Visual Studio project files. The main source files are:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/63880ef341acb5c8700cc83f7a5ce4c1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SoundTouch limits the range of each setting. Its usage text gives the ranges as:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -tempo=n : Change sound tempo by n percents (n=-95..+5000 %)\\n\" (tempo change, pitch unchanged)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -pitch=n : Change sound pitch by n semitones (n=-60..+60 semitones)\\n\" (pitch change, tempo unchanged)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"\" -rate=n : Change sound rate by n percents (n=-95..+5000 %)\\n\" (both pitch and tempo change)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
The main API of SoundTouch is defined in the SoundTouch class. It owns the two main processing classes, RateTransposer and TDStretch, and is itself only responsible for passing data between them and for controlling the key parameters. RateTransposer changes tempo and pitch together, while TDStretch changes tempo without changing pitch. SoundTouch inherits from FIFOProcessor; when rate <= 1 it uses TDStretch as its output stage, and when rate > 1 it uses RateTransposer as its output stage.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Tempo change without pitch change: TDStretch","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tempo change without pitch change is implemented with WSOLA. The main members of the class are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"class TDStretch : public FIFOProcessor{\n protected:\n int channels;\n int sampleReq; //number of samples that must be buffered before a processing round\n int overlapLength; //overlap length used by WSOLA, in samples\n int seekLength; //range searched for the offset that correlates best with pMidBuffer\n int seekWindowLength; //length of one frame plus the overlap\n int sampleRate;\n int sequenceMs; //default 40 ms\n int seekWindowMs; //default 15 ms\n int overlapMs; //default 8 ms; the sample lengths are computed from the ms values\n\n float maxnormf;\n\n double tempo;\n double skipFract; //skip step needed for the tempo change\n\n SAMPLETYPE *pMidBuffer; //holds overlapLength samples of intermediate data used for the correlation\n\n FIFOSampleBuffer outputBuffer;\n FIFOSampleBuffer 
inputBuffer;\n};\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"overlapLength, seekLength and seekWindowLength are illustrated in the figure below.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d7ffbb4bcfb1c7acaa4f155dbbea7973.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take tempo = 2, overlapLength = 128, seekLength = 240 and seekWindowLength = 640 as an example. SoundTouch processes the data as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first frame is handled first:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Copy seekWindowLength - 2 * overlapLength = 384 samples from inputBuffer to outputBuffer.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Copy overlapLength = 128 samples, starting at position 384 of inputBuffer, into pMidBuffer.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Advance inputBuffer to position skipFract = 648.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"where skipFract += nominalSkip. For the first frame skipFract starts at -376, because the code first subtracts skip = (int)(tempo * overlapLength + 0.5 * seekLength + 0.5) = 376, so -376 + 1024 = 648.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nominalSkip = tempo * 
(seekWindowLength - overlapLength);","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Correlate the seekLength candidate positions starting at position 648 of inputBuffer with pMidBuffer and pick the offset with the highest correlation; here offset = 119.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Overlap-add the 128 samples starting at 648 + offset with the 128 samples in pMidBuffer and write the result to outputBuffer.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null},"content":[{"type":"text","text":"Copy seekWindowLength - 2 * overlapLength = 384 samples, starting at position 648 + offset + 128 of inputBuffer, to outputBuffer.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":7,"align":null,"origin":null},"content":[{"type":"text","text":"Continue in the same way for the following frames.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The implementation is:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"c"},"content":[{"type":"text","text":"void TDStretch::processSamples()\n{\nint ovlSkip;\nint offset = 0;\nint temp;\n\nwhile ((int)inputBuffer.numSamples() >= sampleReq) //process only once sampleReq samples are buffered\n{\n if (isBeginning == false)\n {\n offset = seekBestOverlapPosition(inputBuffer.ptrBegin()); //find the position with the highest correlation\n\n overlap(outputBuffer.ptrEnd((uint)overlapLength), inputBuffer.ptrBegin(), (uint)offset); //cross-fade with pMidBuffer\n outputBuffer.putSamples((uint)overlapLength);\n offset += overlapLength;\n }\n else\n {\n isBeginning = false;\n int skip = (int)(tempo * overlapLength + 0.5 * 
seekLength + 0.5);\n\n #ifdef ST_SIMD_AVOID_UNALIGNED\n // in SIMD mode, round the skip amount to value corresponding to aligned memory address\n if (channels == 1)\n {\n skip &= -4;\n }\n else if (channels == 2)\n {\n skip &= -2;\n }\n #endif\n skipFract -= skip;\n assert(nominalSkip >= -skipFract);\n }\n\n\n if ((int)inputBuffer.numSamples() < (offset + seekWindowLength - overlapLength))\n {\n continue; // just in case, shouldn't really happen\n }\n\n // length of sequence\n temp = (seekWindowLength - 2 * overlapLength);\n outputBuffer.putSamples(inputBuffer.ptrBegin() + channels * offset, (uint)temp);\n assert((offset + temp + overlapLength) <= (int)inputBuffer.numSamples());\n memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp), \n channels * sizeof(SAMPLETYPE) * overlapLength);\n skipFract += nominalSkip; // real skip size\n ovlSkip = (int)skipFract; // rounded to integer skip\n skipFract -= ovlSkip; // maintain the fraction part, i.e. real vs. integer skip\n inputBuffer.receiveSamples((uint)ovlSkip); //drop the skipped samples; the skip depends on tempo and is the key to the tempo change\n}\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Tempo and pitch change: the RateTransposer class","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RateTransposer changes tempo and pitch together simply by up- or down-sampling. The class is declared as follows:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"class RateTransposer : \n public FIFOProcessor{\n protected:\n /// Anti-alias filter object\n AAFilter *pAAFilter; //anti-alias filter\n TransposerBase *pTransposer; //provides the resampling algorithm\n FIFOSampleBuffer inputBuffer;\n\n /// Buffer for keeping samples between transposing & anti-alias filter\n FIFOSampleBuffer midBuffer;\n\n /// Output sample buffer\n FIFOSampleBuffer outputBuffer;\n\n bool bUseAAFilter;\n\n void processSamples(const SAMPLETYPE *src, \n uint 
numSamples);\n};","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core function is processSamples:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"cpp"},"content":[{"type":"text","text":"void RateTransposer::processSamples(const SAMPLETYPE *src, uint nSamples)\n{\n uint count;\n if (nSamples == 0) return;\n // store the incoming samples\n inputBuffer.putSamples(src, nSamples);\n // without the anti-alias filter, just resample straight to the output\n if (bUseAAFilter == false)\n {\n count = pTransposer->transpose(outputBuffer, inputBuffer);\n return;\n }\n assert(pAAFilter);\n if (pTransposer->rate < 1.0f) \n {\n // rate below 1: resample first\n pTransposer->transpose(midBuffer, inputBuffer); \n // then apply the anti-alias filter\n pAAFilter->evaluate(outputBuffer, midBuffer);\n } \n else \n {\n // rate of 1 or above: filter first, then resample\n pAAFilter->evaluate(midBuffer, inputBuffer);\n pTransposer->transpose(outputBuffer, midBuffer);\n 
}\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The TransposerBase class acts as a factory: it provides a factory method that selects one of three resampling methods, which are in effect three interpolation methods:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b0fd44ddd23db6c271dd243de3097cd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The principles of the interpolation algorithms are to be added.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Points to note about SoundTouch:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The amount of buffered data and the size of each skip are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nominalSkip = tempo * (seekWindowLength - overlapLength);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"sampleReq = max(intskip + overlapLength, seekWindowLength) + 
seekLength;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Depending on the platform, the correlation in the tempo-change path can be computed with TDStretchSSE or TDStretchMMX.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Judging from its implementation, TDStretch has a fairly large algorithmic delay. To reduce latency for real-time use, consider lowering sequenceMs and the amount of data that is buffered before processing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Closing remarks","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This was a brief analysis of the SoundTouch implementation. The classes that handle SoundTouch data input and output, such as FIFOProcessor, were not covered in detail. A later article will analyze how the Sonic algorithm is implemented. The classic paper A Review of Time-Scale Modification of Music Signals is available at: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.mdpi.com/2076-3417/6/2/57","title":null,"type":null},"content":[{"type":"text","text":"https://www.mdpi.com/2076-3417/6/2/57","attrs":{}}]},{"type":"text","text":". If anything in this article is wrong, corrections and discussion are welcome.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"References","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Review of Time-Scale Modification of Music 
Signals: ","attrs":{}},{"type":"link","attrs":{"href":"https://www.mdpi.com/2076-3417/6/2/57","title":null,"type":null},"content":[{"type":"text","text":"https://www.mdpi.com/2076-3417/6/2/57","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhihu, introduction to voice changing (變聲導論): ","attrs":{}},{"type":"link","attrs":{"href":"https://zhuanlan.zhihu.com/p/110278983","title":null,"type":null},"content":[{"type":"text","text":"https://zhuanlan.zhihu.com/p/110278983","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: 基於WSOLA算法的語音時長調整研究","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: AN OVERLAP-ADD TECHNIQUE BASED ON WAVEFORM SIMILARITY (WSOLA) FOR HIGH QUALITY TIME-SCALE MODIFICATION OF SPEECH","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SoundTouch website: ","attrs":{}},{"type":"link","attrs":{"href":"http://www.surina.net/soundtouch/","title":null,"type":null},"content":[{"type":"text","text":"http://www.surina.net/soundtouch/","attrs":{}}]}]}]}
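
As a supplement to the TDStretch walkthrough in the article, the per-frame flow (copy a sequence, save the overlap tail, skip ahead by nominalSkip, search for the best-correlated offset, cross-fade) can be sketched in a few lines of standalone C++. This is only a minimal illustration under stated assumptions, not SoundTouch's actual code: it handles mono float samples in a single pass, omits the first-frame skip adjustment and the SIMD paths, and `StretchParams`, `seekBestOffset` and `timeStretch` are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; the field names mirror the TDStretch members
// discussed in the article, but this struct is not part of SoundTouch.
struct StretchParams {
    double tempo;          // 2.0 = twice as fast, 0.5 = half speed
    int overlapLength;     // samples cross-faded between two sequences
    int seekLength;        // number of candidate offsets that are searched
    int seekWindowLength;  // length of one processing sequence
};

// Return the offset in [0, seekLength) at which the input correlates best
// (normalized cross-correlation) with the saved overlap tail `mid`.
int seekBestOffset(const float* in, const std::vector<float>& mid, int seekLength) {
    const int n = static_cast<int>(mid.size());
    int best = 0;
    double bestCorr = -1e30;
    for (int off = 0; off < seekLength; ++off) {
        double corr = 0.0, norm = 0.0;
        for (int i = 0; i < n; ++i) {
            corr += in[off + i] * mid[i];
            norm += in[off + i] * in[off + i];
        }
        corr /= std::sqrt(norm < 1e-9 ? 1.0 : norm);
        if (corr > bestCorr) { bestCorr = corr; best = off; }
    }
    return best;
}

// Time-stretch a mono signal following the per-frame steps of the walkthrough.
std::vector<float> timeStretch(const std::vector<float>& x, const StretchParams& p) {
    assert(p.tempo > 0.0 && p.overlapLength > 0 && p.seekLength > 0);
    assert(p.seekWindowLength > 2 * p.overlapLength);
    const int seqLen = p.seekWindowLength - 2 * p.overlapLength;
    const double nominalSkip = p.tempo * (p.seekWindowLength - p.overlapLength);
    // Samples that must be available so that every read below stays in range.
    const std::size_t need = static_cast<std::size_t>(p.seekWindowLength + p.seekLength);

    std::vector<float> out;
    std::vector<float> mid(p.overlapLength);  // tail of the previous sequence
    bool first = true;
    double skipFract = 0.0;                   // fractional part of the skip
    std::size_t pos = 0;                      // read position in the input

    while (pos + need <= x.size()) {
        const float* in = x.data() + pos;
        int offset = 0;
        if (!first) {
            offset = seekBestOffset(in, mid, p.seekLength);
            // Linear cross-fade from the saved tail into the chosen position.
            for (int i = 0; i < p.overlapLength; ++i) {
                const float w = static_cast<float>(i) / p.overlapLength;
                out.push_back(mid[i] * (1.0f - w) + in[offset + i] * w);
            }
            offset += p.overlapLength;
        }
        first = false;
        // Copy the non-overlapping middle of the sequence to the output.
        out.insert(out.end(), in + offset, in + offset + seqLen);
        // Save the tail that will be cross-faded with the next sequence.
        std::copy(in + offset + seqLen, in + offset + seqLen + p.overlapLength, mid.begin());
        // Skip ahead; the integer part moves the read position, the fraction is kept.
        skipFract += nominalSkip;
        const int ovlSkip = static_cast<int>(skipFract);
        skipFract -= ovlSkip;
        pos += static_cast<std::size_t>(ovlSkip);
    }
    return out;
}
```

With the values used in the walkthrough (tempo = 2, overlapLength = 128, seekLength = 240, seekWindowLength = 640), one second of 44.1 kHz mono audio comes out at roughly half its original length, and at tempo = 0.5 at roughly twice its length. Normalizing the correlation by the energy of the candidate segment keeps loud passages from winning the search purely because of their amplitude.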
