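A spectrogram is obtained from the short-time Fourier transform (STFT) and is very useful for analyzing the time-frequency characteristics of a signal; in speech signal processing it is often called a "speech spectrogram".
Python has ready-made modules that convert a time-domain signal directly into a spectrogram, but using them does not help in understanding the underlying principle, and converting the horizontal and vertical axes to physical units is not very convenient either. In this post we draw a spectrogram by hand using only basic Python operations.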
Generate synthetic data
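On the keypad of an analog (touch-tone) telephone, every key press produces two sine waves. For example, pressing 1 produces a signal containing a 697 Hz and a 1209 Hz sine wave. "697 Hz" means the sine wave repeats its full cycle 697 times per second, and a signal made of two frequencies is simply the sum of the two sine waves.
Assume a sample rate of 4000 Hz, i.e., 4000 points are sampled per second. The first 3 s correspond to digit 1, the middle 2 s are silence, and the last 3 s correspond to digit 2. The code that generates the data is: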
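The keypad described above follows the standard DTMF (dual-tone multi-frequency) layout: each key is the sum of one low "row" tone and one high "column" tone. A minimal sketch of that lookup, assuming the standard 3 × 4 keypad; DTMF_FREQS and dtmf_tone are hypothetical names, not part of the original code:

```python
import numpy as np

# Standard DTMF keypad: key -> (row frequency, column frequency) in Hz
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def dtmf_tone(digit, sample_rate, seconds):
    """Hypothetical helper: sum of the two sine waves for one key press."""
    low, high = DTMF_FREQS[digit]
    t = np.arange(int(sample_rate * seconds)) / sample_rate  # sample times in seconds
    return np.sin(2 * np.pi * low * t) + np.sin(2 * np.pi * high * t)

tone1 = dtmf_tone("1", sample_rate=4000, seconds=3)
print(tone1.shape)  # (12000,)
```

Three seconds at 4000 Hz gives 12,000 samples, and because each sine has amplitude 1 the sum never exceeds 2 in absolute value.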
import numpy as np
import matplotlib.pyplot as plt
import warnings
# log10(0) in the silent part of the signal raises a RuntimeWarning; silence it
warnings.filterwarnings("ignore", category=RuntimeWarning)
def get_signal_Hz(Hz, sample_rate, length_ts_sec):
    ## sample times (in seconds): length_ts_sec seconds at sample_rate points per second
    t = np.arange(sample_rate * length_ts_sec) / sample_rate
    ## sine wave with frequency Hz
    return list(np.sin(2 * np.pi * Hz * t))
sample_rate = 4000
length_ts_sec = 3
## --------------------------------- ##
## 3 seconds of "digit 1" sound
## Pressing digit 1 button generates
## the sine waves at frequency
## 697Hz and 1209Hz.
## --------------------------------- ##
ts1 = np.array(get_signal_Hz(697, sample_rate, length_ts_sec))
ts1 += np.array(get_signal_Hz(1209, sample_rate, length_ts_sec))
ts1 = list(ts1)
## -------------------- ##
## 2 seconds of silence
## -------------------- ##
ts_silence = [0] * sample_rate * 2
## --------------------------------- ##
## 3 seconds of "digit 2" sound
## Pressing digit 2 button generates
## the sine waves at frequency
## 697Hz and 1336Hz.
## --------------------------------- ##
ts2 = np.array(get_signal_Hz(697, sample_rate, length_ts_sec))
ts2 += np.array(get_signal_Hz(1336, sample_rate, length_ts_sec))
ts2 = list(ts2)
## -------------------- ##
## Add up to 8 seconds
## -------------------- ##
ts = ts1 + ts_silence + ts2
total_ts_sec = len(ts) / sample_rate  # total length in seconds (8.0)
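Before moving to the frequency domain, the waveform itself can be plotted against time. A minimal sketch that rebuilds the same digit 1 / silence / digit 2 sequence as a NumPy array, assuming the 2 s silence described above; the helper tone is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

sample_rate = 4000

def tone(freqs, seconds):
    # hypothetical helper: sum of unit-amplitude sine waves sampled at sample_rate
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)

# digit 1 (3 s) + silence (2 s) + digit 2 (3 s)
ts = np.concatenate([tone([697, 1209], 3),
                     np.zeros(2 * sample_rate),
                     tone([697, 1336], 3)])
time_sec = np.arange(len(ts)) / sample_rate  # x-axis in seconds

plt.figure(figsize=(20, 3))
plt.plot(time_sec, ts, linewidth=0.5)
plt.xlabel("Time (sec)")
plt.ylabel("Amplitude")
plt.title("Waveform: digit 1, silence, digit 2")
plt.show()
```

Labelling the x-axis in seconds from the start avoids having to translate sample indices afterwards.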
Plot the generated sound signal in the frequency domain
We use the DFT to plot the spectrum of the signal in the frequency domain, as shown in the code below.
def get_xn(Xs, n):
    '''
    calculate the Fourier coefficient X_n of
    the Discrete Fourier Transform (DFT)
    '''
    L = len(Xs)
    ks = np.arange(0, L, 1)
    xn = np.sum(Xs * np.exp(((-1) * 1j * 2 * np.pi * ks * n) / L))
    return xn
def get_xns(ts):
    '''
    Compute the Fourier coefficients X_n only up to the Nyquist limit, n = 0, ..., L/2 - 1,
    and multiply their absolute values by 2
    to account for the symmetry of the Fourier coefficients above the Nyquist limit.
    '''
    mag = []
    L = len(ts)
    for n in range(int(L / 2)):  # Nyquist limit
        mag.append(np.abs(get_xn(ts, n)) * 2)
    return mag
mag = get_xns(ts)  # O(L^2): slow on the full signal
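Here "get_xns" computes the Fourier coefficients only up to the Nyquist limit. Because the coefficients above the Nyquist limit mirror those below it, the absolute value of each coefficient is doubled.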
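Because "get_xn" evaluates each coefficient with an O(L) sum, "get_xns" costs O(L²) and is slow on the full 8 s signal. A sketch that computes the same one-sided magnitudes with np.fft.rfft and checks them against a direct implementation of the DFT sum on a short random signal; dft_mags_direct and dft_mags_fft are hypothetical names:

```python
import numpy as np

def dft_mags_direct(x):
    """Same math as get_xns: 2*|X_n| for n = 0, ..., L/2 - 1."""
    L = len(x)
    ks = np.arange(L)
    return [2 * np.abs(np.sum(x * np.exp(-2j * np.pi * ks * n / L))) for n in range(L // 2)]

def dft_mags_fft(x):
    """FFT version: rfft returns X_0 .. X_{L/2}; drop the last bin to match get_xns."""
    L = len(x)
    return 2 * np.abs(np.fft.rfft(x))[: L // 2]

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
print(np.allclose(dft_mags_direct(x), dft_mags_fft(x)))  # True
```

The FFT gives identical numbers up to floating-point rounding in O(L log L), so the hand-written version is mainly useful for understanding the formula.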
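Note: in the original post, the coefficient used by "get_xns" was computed as: xn = np.sum(Xs*np.exp((1j*2*np.pi*ks*n)/L))/L
I believe this is not the forward DFT (the positive exponent together with the 1/L factor is the inverse-DFT convention), although it does not affect the analysis that follows: for a real-valued signal its magnitude equals the forward DFT magnitude divided by L.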
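This can be checked numerically. A sketch, assuming any real-valued input: the positive exponent turns the sum into the complex conjugate of the forward DFT coefficient, and the extra /L only rescales it, so |X_n| changes by the constant factor 1/L (a constant offset once the log is taken):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)   # any real-valued signal
L = len(x)
ks = np.arange(L)

for n in range(L // 2):
    forward = np.sum(x * np.exp(-1j * 2 * np.pi * ks * n / L))       # get_xn in this post
    original = np.sum(x * np.exp(1j * 2 * np.pi * ks * n / L)) / L   # formula from the original post
    # for real x the two sums are complex conjugates, so only the 1/L scale differs
    assert np.isclose(np.abs(original), np.abs(forward) / L)
print("magnitudes agree up to the factor 1/L")
```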
The corresponding waveform plot:
DFT on the entire dataset to visualize the signal in the frequency domain for all k = 0, …, L/2 − 1.
Visualize the spectrum of the signal:
# the number of points to label along xaxis
Nxlim = 10
plt.figure(figsize=(20,3))
plt.plot(mag)
plt.xlabel("Frequency (k)")
plt.title("One-sided frequency plot")  # only bins up to the Nyquist limit are plotted
plt.ylabel("|Fourier Coefficient|")
plt.show()
The corresponding spectrum:
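As derived in my earlier blog post (), the frequency corresponding to the Fourier coefficient at the k-th bin is:
$$f_k = \frac{k \, f_s}{N} \ \text{Hz}$$
where f_s is the sample rate and N is the number of points fed to the DFT (here N = len(ts)).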
Using this, we convert the x-axis of the spectrum to Hz, and peaks appear at 697 Hz, 1209 Hz and 1336 Hz.
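The same Hz conversion is available as np.fft.rfftfreq, which returns k·f_s/N for each bin. A sketch, assuming the 8 s signal (3 s digit 1, 2 s silence, 3 s digit 2) and using np.fft.rfft in place of the hand-written DFT for speed; tone and mag_at are hypothetical helpers:

```python
import numpy as np

sample_rate = 4000

def tone(freqs, seconds):
    # hypothetical helper: sum of unit-amplitude sine waves sampled at sample_rate
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)

ts = np.concatenate([tone([697, 1209], 3), np.zeros(2 * sample_rate), tone([697, 1336], 3)])
N = len(ts)

mag = 2 * np.abs(np.fft.rfft(ts))                # one-sided magnitudes
freq_Hz = np.fft.rfftfreq(N, d=1 / sample_rate)  # bin k -> k * sample_rate / N

def mag_at(f):
    # magnitude in the bin closest to frequency f (Hz)
    return mag[np.argmin(np.abs(freq_Hz - f))]

for f in [697, 1209, 1336, 1000]:
    print(f, round(mag_at(f)))
```

The three tone bins stand far above a bin away from any tone, such as 1000 Hz; the 697 Hz peak is about twice as tall because that tone is present in both digits.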
def get_Hz_scale_vec(ks, sample_rate, Npoints):
    # frequency of bin k is k * sample_rate / Npoints
    freq_Hz = ks * sample_rate / Npoints
    freq_Hz = [int(i) for i in freq_Hz]
    return freq_Hz
ks = np.linspace(0,len(mag),Nxlim)
ksHz = get_Hz_scale_vec(ks,sample_rate,len(ts))
plt.figure(figsize=(20,3))
plt.plot(mag)
plt.xticks(ks,ksHz)
plt.title("Frequency Domain")
plt.xlabel("Frequency (Hz)")
plt.ylabel("|Fourier Coefficient|")
plt.show()
The resulting plot:
Create Spectrogram
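Finally, on to today's main topic!
So far we have looked at the waveform and the spectrum of the signal, which show its time-domain and frequency-domain characteristics respectively. To analyze how the frequency content changes over time, we apply the DFT to short windows of the signal, i.e., the STFT.
Applying the STFT to a signal yields its spectrogram. Matplotlib already provides "matplotlib.pyplot.specgram" for computing spectrograms; here we write out the STFT computation step by step: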
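Written out, the windowed DFT computed by "create_spectrogram" is, for frame m with hop size H = L − noverlap (L = NFFT) and window w[n] (in this post simply w[n] = 1, a rectangular window):

```latex
$$
X_m[k] = \sum_{n=0}^{L-1} w[n]\, x[mH + n]\, e^{-i 2\pi k n / L},
\qquad k = 0, \dots, \tfrac{L}{2} - 1
$$
$$
S[k, m] = 10 \log_{10}\!\left( 2 \left| X_m[k] \right| \right)
$$
```

Row k of the spectrogram then corresponds to k·f_s/L Hz, and column m to a window starting at mH/f_s seconds.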
def create_spectrogram(ts, NFFT, noverlap=None):
    '''
    ts: original time series
    NFFT: the number of data points used in each block for the DFT
    noverlap: the number of points of overlap between blocks; defaults to NFFT/2
    '''
    if noverlap is None:
        noverlap = NFFT / 2
    noverlap = int(noverlap)
    starts = np.arange(0, len(ts), NFFT - noverlap, dtype=int)
    # remove any window with fewer than NFFT samples
    starts = starts[starts + NFFT <= len(ts)]
    xns = []
    for start in starts:
        # short-term discrete Fourier transform of one (rectangular) window
        ts_window = get_xns(ts[start:start + NFFT])
        xns.append(ts_window)
    specX = np.array(xns).T
    # log-scale the magnitudes (fully silent windows give log10(0) = -inf)
    spec = 10 * np.log10(specX)
    assert spec.shape[1] == len(starts)
    return starts, spec
L = 256
noverlap = 84
starts, spec = create_spectrogram(ts, L, noverlap=noverlap)
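"create_spectrogram" loops over windows and calls the O(L²) "get_xns" on each. A vectorized sketch using numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later) and np.fft.rfft should give the same starts and the same 128 rows per frame. The optional window argument, e.g. a Hann window from np.hanning, is an addition this post does not use, and stft_db is a hypothetical name:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stft_db(ts, NFFT, noverlap, window=None):
    """Vectorized STFT sketch; returns (starts, spec) shaped like create_spectrogram.

    window=None keeps the rectangular window used in this post;
    passing e.g. np.hanning(NFFT) tapers each frame (not the post's method).
    """
    ts = np.asarray(ts, dtype=float)
    hop = NFFT - noverlap
    frames = sliding_window_view(ts, NFFT)[::hop]   # (n_frames, NFFT), full windows only
    if window is not None:
        frames = frames * window
    mags = 2 * np.abs(np.fft.rfft(frames, axis=1))[:, : NFFT // 2]  # same bins as get_xns
    starts = np.arange(frames.shape[0]) * hop
    with np.errstate(divide="ignore"):               # log10(0) -> -inf in silent frames
        spec = 10 * np.log10(mags.T)
    return starts, spec

sample_rate = 4000
t = np.arange(3 * sample_rate) / sample_rate
ts = np.concatenate([np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t),
                     np.zeros(2 * sample_rate),
                     np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1336 * t)])
starts, spec = stft_db(ts, NFFT=256, noverlap=84, window=np.hanning(256))
print(spec.shape)  # (128, 185)
```

A tapering window such as Hann reduces the spectral leakage that the rectangular window spreads around each tone, at the cost of a somewhat wider main lobe.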
Plot the hand-made spectrogram
With the STFT done, we can plot the spectrogram by hand:
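def plot_spectrogram(spec, ks, sample_rate, L, starts, mappable=None):
    # ks is recomputed below for the y-axis; the argument is kept for compatibility
    plt.figure(figsize=(20, 8))
    plt_spec = plt.imshow(spec, origin='lower', aspect='auto')
    if mappable is not None:
        # reuse the color limits of an earlier spectrogram so colors are comparable
        plt_spec.set_clim(mappable.get_clim())
    ## create ylim
    Nyticks = 10
    ks = np.linspace(0, spec.shape[0], Nyticks)
    # row k is k * sample_rate / L Hz (L = window length, not len(ts))
    ksHz = get_Hz_scale_vec(ks, sample_rate, L)
    plt.yticks(ks, ksHz)
    plt.ylabel("Frequency (Hz)")
    ## create xlim
    Nxticks = 10
    ts_spec = np.linspace(0, spec.shape[1], Nxticks)
    # the last column starts at starts[-1] / sample_rate seconds
    ts_spec_sec = ["{:4.2f}".format(i) for i in np.linspace(0, starts[-1] / sample_rate, Nxticks)]
    plt.xticks(ts_spec, ts_spec_sec)
    plt.xlabel("Time (sec)")
    plt.title("Spectrogram L={} Spectrogram.shape={}".format(L, spec.shape))
    plt.colorbar(plt_spec, use_gridspec=True)
    plt.show()
    return plt_spec
plot_spectrogram(spec, ks, sample_rate, L, starts)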
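Instead of hand-placing tick labels, plt.imshow can map pixels directly to physical units through its extent argument. A sketch, assuming the same 8 s signal, L = 256 and noverlap = 84, with the rectangular-window STFT computed via np.fft.rfft for brevity: row k starts at k·f_s/L Hz and column m at m·H/f_s seconds, with H = L − noverlap.

```python
import numpy as np
import matplotlib.pyplot as plt

sample_rate, NFFT, hop = 4000, 256, 172   # hop = NFFT - noverlap with noverlap = 84
t = np.arange(3 * sample_rate) / sample_rate
ts = np.concatenate([np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t),
                     np.zeros(2 * sample_rate),
                     np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1336 * t)])

starts = np.arange(0, len(ts) - NFFT + 1, hop)   # full windows only
frames = np.stack([ts[s:s + NFFT] for s in starts])
with np.errstate(divide="ignore"):
    spec = 10 * np.log10(2 * np.abs(np.fft.rfft(frames, axis=1))[:, : NFFT // 2]).T

# extent = [left, right, bottom, top] in data units:
# each column is hop/sample_rate seconds wide with its left edge at the window start,
# each row is sample_rate/NFFT Hz tall with its lower edge at k * sample_rate / NFFT
extent = [starts[0] / sample_rate, (starts[-1] + hop) / sample_rate,
          0, (NFFT // 2) * sample_rate / NFFT]
plt.figure(figsize=(20, 8))
im = plt.imshow(spec, origin="lower", aspect="auto", extent=extent)
plt.xlabel("Time (sec)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(im)
plt.show()
```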
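In the resulting spectrogram shown below, we can clearly see that the first 3 s contain the 697 Hz and 1209 Hz tones, followed by 2 s of silence, and the last 3 s contain the 697 Hz and 1336 Hz tones.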
Frequency resolution vs time resolution
Finally, I want to discuss the "uncertainty principle" in spectrograms.
Uncertainty principle: We cannot arbitrarily narrow our focus both in time and in frequency. If we want higher time resolution, we need to give up frequency resolution, and vice versa.
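In the spectrogram above, the window size is L = 256 and the sample rate is f_s = 4000 Hz, so each window covers:
time resolution: $\Delta t = L / f_s = 256 / 4000 = 0.064$ seconds
and the frequency resolution is the reciprocal of the time resolution:
frequency resolution: $\Delta f = f_s / L = 4000 / 256 = 15.625$ Hz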
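The same arithmetic for every window size used in this post, as a small sketch (resolutions is a hypothetical helper): the time resolution is L/f_s and the frequency resolution is f_s/L, so their product is always 1.

```python
def resolutions(L, sample_rate):
    """Return (time resolution in s, frequency resolution in Hz) for a window of L samples."""
    return L / sample_rate, sample_rate / L

for L in [150, 200, 256, 400]:
    dt, df = resolutions(L, 4000)
    print(f"L={L:3d}  time resolution={dt:.4f} s  frequency resolution={df:.3f} Hz")
```

For the L = 256 used earlier this gives 0.064 s and 15.625 Hz; doubling the window halves the bin spacing but doubles the time span each column summarizes.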
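The figures below show the trade-off between frequency resolution and time resolution: when the spectrogram uses a larger window, the frequency information is sharper; conversely, with a smaller window (i.e., wider frequency bands), the time information is sharper.
Note: the original post titled this section "Wideband spectrogram vs narrowband spectrogram", but since signals themselves can be wideband or narrowband, that title is ambiguous, so I renamed it "Frequency resolution vs time resolution".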
plt_spec1 = None
for iL, (L, bandnm) in enumerate(zip([150, 200, 400], ["wideband", "middleband", "narrowband"])):
    print("{:20} time resolution={:4.2f}sec, frequency resolution={:4.2f}Hz".format(bandnm, L / sample_rate, sample_rate / L))
    starts, spec = create_spectrogram(ts, L, noverlap=1)
    plt_spec = plot_spectrogram(spec, ks, sample_rate, L, starts,
                                mappable=plt_spec1)
    if iL == 0:
        plt_spec1 = plt_spec
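The trade-off is already visible in the array shapes: a longer window gives more frequency rows but fewer time columns. A sketch of that bookkeeping, assuming the 8 s signal (32,000 samples) and counting only complete windows as "create_spectrogram" does; spectrogram_shape is a hypothetical helper:

```python
def spectrogram_shape(n_samples, NFFT, noverlap):
    """(frequency rows, time columns) for full windows only."""
    hop = NFFT - noverlap
    n_frames = (n_samples - NFFT) // hop + 1
    return NFFT // 2, n_frames

n_samples = 8 * 4000   # assumes 3 s digit 1 + 2 s silence + 3 s digit 2 at 4000 Hz
for L, name in zip([150, 200, 400], ["wideband", "middleband", "narrowband"]):
    print(name, spectrogram_shape(n_samples, L, noverlap=1))
```

With these numbers the wideband setting gives 75 × 214, the middleband 100 × 160 and the narrowband 200 × 80.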
Figures: spectrograms for the wideband (L=150), middleband (L=200) and narrowband (L=400) settings.