What Are Acoustic Landmarks, and What Do They Describe?

Source document: https://speechmrk.com/wp-content/uploads/2016/08/Landmark-Descriptions.pdf

In speech acoustics, landmarks are patterns that mark certain speech-production events. Speech acoustic landmarks come in two classes: peak and abrupt.

Peak: At present, the peak landmarks detected in SpeechMark® are vowel landmarks (VLMs) and frication landmarks. These are identified as instants in an utterance at which a maximum (or peak) of harmonic power or of fractal dimension, respectively, occurs; they may be considered the centers of the vowels or fricated intervals, respectively. When plotted with SpeechMark functions, they are drawn below the waveform and labeled with uppercase letters: V or F. Frication landmarks are described more fully elsewhere (e.g., “Frication Peak Landmarks” on the SpeechMark website) and are not discussed further here.¹
Abrupt: Abrupt or abrupt-consonantal landmarks (AC LMs, or simply LMs) have a more complex specification.
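
The peak-picking idea behind V and F landmarks can be sketched as follows. This is a minimal illustration, not SpeechMark's detector: it assumes a per-frame contour (harmonic power for V landmarks, or fractal dimension for F landmarks) is already available, and the `min_value` floor and the function name are hypothetical.

```python
from typing import List, Sequence


def peak_landmarks(contour: Sequence[float], frame_step_s: float,
                   min_value: float) -> List[float]:
    """Return times (s) of local maxima of a per-frame contour that reach min_value.

    Hypothetical sketch: `contour` stands in for harmonic power (V landmarks)
    or fractal dimension (F landmarks), one value per analysis frame.
    """
    times = []
    for i in range(1, len(contour) - 1):
        v = contour[i]
        # Strictly above the left neighbour and at least the right one, so a
        # flat-topped peak is reported once, at its first frame.
        if v > contour[i - 1] and v >= contour[i + 1] and v >= min_value:
            times.append(i * frame_step_s)
    return times
```

Each reported time would correspond to one candidate vowel (or frication) center; the floor keeps small ripples in the contour from producing landmarks.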

It is helpful first to distinguish laryngeal-source from vocal-tract events. We denote the former by “+g” (overall onset) or “-g” (overall offset), by “+p” (onset of periodicity) or “-p” (offset, likewise), or by “+j” (upward jump of fundamental frequency, F0) or “-j” (downward, likewise). The detailed rule for the critical +g is particularly complex. However, the central observation is easily stated: Vocal-tract excitation by the laryngeal source is characterized by well-developed voicing.
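
As a minimal sketch of this notation, the hypothetical helper below parses a laryngeal-source label such as “+g” or “–p” into a sign and an event kind, normalizing the en dash that sometimes appears in the text. The class and function names are illustrative and are not part of any SpeechMark API.

```python
from dataclasses import dataclass

# Laryngeal-source event kinds, as named in the text.
LARYNGEAL_KINDS = {
    "g": "overall voicing",
    "p": "periodicity",
    "j": "F0 jump",
}


@dataclass(frozen=True)
class LaryngealEvent:
    sign: str      # "+" = onset / upward jump, "-" = offset / downward jump
    kind: str      # "g", "p" or "j"
    time_s: float  # time of the event in seconds


def parse_laryngeal(label: str, time_s: float) -> LaryngealEvent:
    """Parse a label like '+g' or '–p' (en dash allowed) into an event."""
    label = label.replace("\u2013", "-")  # normalize en dash to hyphen
    if len(label) != 2 or label[0] not in ("+", "-") or label[1] not in LARYNGEAL_KINDS:
        raise ValueError(f"not a laryngeal landmark label: {label!r}")
    return LaryngealEvent(label[0], label[1], time_s)
```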

Voicing is considered well developed when there is evidence of sustained periodic excitation of at least minimal amplitude, as measured over intervals of several milliseconds.² In spectrogram terms, a narrow-band spectrogram shows clearly defined, smooth, approximately horizontal stripes, which reflect the harmonics of the excitation signal. The spacing between stripes defines the fundamental frequency. Apart from occasional jumps (+j or –j), this frequency must lie within a range specified by the user, by the client software, or by default. (The current defaults for human speech are a maximum F0 of 350 Hz and a minimum F0 of 1/5 of the maximum, i.e., 70 Hz; these values are typical of adults, especially females.) The boundaries of such a periodic interval are marked by +p and –p events.
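
The F0 constraint can be sketched in a few lines. The 350 Hz default and the 1/5 ratio come from the text; estimating F0 as the median spacing between harmonic stripe frequencies is an assumption made for illustration, not necessarily how SpeechMark measures it, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple

DEFAULT_MAX_F0_HZ = 350.0  # current default quoted in the text


def f0_range(max_f0_hz: float = DEFAULT_MAX_F0_HZ) -> Tuple[float, float]:
    """Allowed F0 range: the minimum is 1/5 of the maximum."""
    return max_f0_hz / 5.0, max_f0_hz


def f0_from_harmonics(harmonic_freqs_hz: Sequence[float]) -> float:
    """Estimate F0 as the median spacing between adjacent harmonic stripes (assumption)."""
    freqs = sorted(harmonic_freqs_hz)
    if len(freqs) < 2:
        raise ValueError("need at least two harmonics")
    return median([b - a for a, b in zip(freqs, freqs[1:])])


def f0_in_range(f0_hz: float, max_f0_hz: float = DEFAULT_MAX_F0_HZ) -> bool:
    lo, hi = f0_range(max_f0_hz)
    return lo <= f0_hz <= hi
```

For example, `f0_range()` gives (70.0, 350.0), and harmonics at 200, 400, 600 and 800 Hz give an estimate of 200 Hz, which lies within range.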

Additionally, voicing is considered present in a segment of the signal if that segment occurs shortly before a segment with well-developed voicing and has (a) similar power and (b) similar spectral slope to the well-voiced segment. Currently, “shortly before” means up to 50 ms. Such a segment reflects glottalization or other irregular laryngeal motion.
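
The backward extension of voicing might look like this. The 50 ms window is from the text; the text does not say how “similar” power and spectral slope must be, so the tolerances below (`power_tol_db`, `slope_tol_db_per_oct`) are hypothetical placeholders, as are the names. The sketch also assumes that “shortly before” means the candidate segment lies entirely within the 50 ms preceding the well-voiced onset.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEAD_S = 0.050  # "shortly before" = up to 50 ms (from the text)


@dataclass
class Segment:
    start_s: float
    end_s: float
    power_db: float          # average power of the segment
    slope_db_per_oct: float  # spectral slope of the segment


def extended_voicing_onset(candidate: Segment, voiced: Segment,
                           power_tol_db: float = 3.0,
                           slope_tol_db_per_oct: float = 2.0) -> Optional[float]:
    """Return the new (earlier) voicing onset if `candidate` counts as voiced,
    or None if it does not. The tolerance defaults are hypothetical."""
    # Timing: the candidate must lie within the 50 ms before the voiced onset.
    if candidate.end_s > voiced.start_s or candidate.start_s < voiced.start_s - MAX_LEAD_S:
        return None
    # (a) similar power
    if abs(candidate.power_db - voiced.power_db) > power_tol_db:
        return None
    # (b) similar spectral slope
    if abs(candidate.slope_db_per_oct - voiced.slope_db_per_oct) > slope_tol_db_per_oct:
        return None
    return candidate.start_s
```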

Both “g” and “p” LMs occur only in pairs. (Jumps do not.) So voicing, or periodic voicing, can be treated as an attribute of an entire segment of the signal, i.e., of the interval between +g and –g, or between +p and –p, respectively.
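
Because +g/–g and +p/–p events come in pairs, a list of labeled events can be turned into intervals with a simple scan. A minimal sketch, assuming events arrive as hypothetical `(time_s, label)` tuples; the function name is illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def pair_intervals(events: Sequence[Tuple[float, str]],
                   kind: str) -> List[Tuple[float, float]]:
    """Pair '+<kind>' / '-<kind>' events (kind 'g' or 'p') into (onset, offset)
    intervals. Events are (time_s, label) tuples; other labels are ignored."""
    intervals: List[Tuple[float, float]] = []
    open_at: Optional[float] = None
    # Sort by time only, so coincident events keep their input order.
    for t, label in sorted(events, key=lambda e: e[0]):
        if label == "+" + kind:
            if open_at is not None:
                raise ValueError(f"+{kind} at {t} s while an interval is open")
            open_at = t
        elif label == "-" + kind:
            if open_at is None:
                raise ValueError(f"-{kind} at {t} s without a matching +{kind}")
            intervals.append((open_at, t))
            open_at = None
    if open_at is not None:
        raise ValueError(f"+{kind} at {open_at} s is never closed")
    return intervals
```

Jump events (“+j”/“–j”) are simply skipped, since they do not come in pairs.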

Thus, a +g/-g interval must include at least one +p/-p subinterval. However, it may contain more than one, and it may contain both F0 jumps and intervals of irregular motion between +p/-p subintervals. Sometimes it may contain +p/-p intervals separated only by jumps, either upward or downward. Many voiced intervals start with periodicity, so for these intervals +g and +p coincide; likewise, intervals that end with periodicity have coincident –g and –p LMs.
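
The containment rules in this paragraph can be checked mechanically. This sketch takes interval lists such as those produced by pairing +g/–g and +p/–p events; it reads “subinterval” as meaning that each +p/–p interval lies inside some +g/–g interval, with shared endpoints allowed so that coincident +g/+p (or –g/–p) pass. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def voicing_structure_ok(g_intervals: Sequence[Interval],
                         p_intervals: Sequence[Interval]) -> bool:
    """Check that every +p/-p interval lies in a +g/-g interval and that every
    +g/-g interval contains at least one +p/-p interval."""

    def inside(p: Interval, g: Interval) -> bool:
        # Closed bounds: a +p coincident with +g (or -p with -g) is inside.
        return g[0] <= p[0] and p[1] <= g[1]

    every_p_nested = all(any(inside(p, g) for g in g_intervals) for p in p_intervals)
    every_g_has_p = all(any(inside(p, g) for p in p_intervals) for g in g_intervals)
    return every_p_nested and every_g_has_p
```

A voiced interval holding two periodic stretches separated by irregular motion, such as `voicing_structure_ok([(0.0, 0.5)], [(0.0, 0.2), (0.3, 0.5)])`, passes; a voiced interval with no periodic stretch fails.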

Informally, but very usefully, the remaining LMs are identified as instants at which the signal shows evidence of rapid change across multiple frequency ranges, on multiple time scales.

In each case, AC LMs are classified as onset (+) or offset (-) type. They are also classified as voiced or unvoiced, according to their location in a voiced segment (between +g and –g) or an unvoiced one.
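
Classifying an AC LM as voiced or unvoiced then reduces to an interval lookup. A minimal sketch; treating an LM that falls exactly on a +g or –g boundary as voiced is an assumption, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def voicing_of(t_s: float, g_intervals: Sequence[Tuple[float, float]]) -> str:
    """Return 'voiced' if time t_s lies between some +g and -g, else 'unvoiced'."""
    # Closed bounds: an LM exactly at a +g or -g boundary counts as voiced here.
    if any(on <= t_s <= off for on, off in g_intervals):
        return "voiced"
    return "unvoiced"
```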

Processing begins by computing the power in each of several frequency bands. At present, the SpeechMark system normally uses five bands, spanning 800 to 8000 Hz for adults or 1200 to 8000 Hz for infants. The instantaneous power is smoothed over two time scales, approximately 25 ms (“fine”) and 50 ms (“coarse”): coarse smoothing suppresses events that are too brief, while fine smoothing allows higher-precision placement.
A landmark is detected if power rises or falls by 6 dB simultaneously at both fine and coarse time scales, and in at least 3 of the 5 bands.
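
The per-band processing can be sketched as follows. The 25 ms and 50 ms scales and the 6 dB threshold are from the text; everything else is an assumption for illustration: a power contour in dB sampled every 1 ms, smoothing by a centered moving average, and measuring a “rise” or “fall” as the difference between the smoothed contour and its value one window-length earlier. SpeechMark's actual filters and rate-of-change measure may differ, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

FRAME_STEP_S = 0.001   # assumption: one power value (dB) per millisecond
FINE_WIN_S = 0.025     # ~25 ms "fine" scale (from the text)
COARSE_WIN_S = 0.050   # ~50 ms "coarse" scale (from the text)
THRESHOLD_DB = 6.0     # rise/fall threshold (from the text)


def smooth(power_db: Sequence[float], win_s: float,
           step_s: float = FRAME_STEP_S) -> List[float]:
    """Centered moving average over about win_s (shorter windows at the edges)."""
    half = max(1, round(win_s / step_s)) // 2
    n = len(power_db)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(power_db[lo:hi]) / (hi - lo))
    return out


def change_events(contour: Sequence[float], lag_s: float,
                  step_s: float = FRAME_STEP_S,
                  threshold_db: float = THRESHOLD_DB) -> List[Tuple[float, int]]:
    """Find runs where contour[i] - contour[i - lag] is >= +threshold (rise, +1)
    or <= -threshold (fall, -1); report one (time_s, sign) per run, at the
    frame of largest change, timed at the midpoint of the lag span."""
    lag = max(1, round(lag_s / step_s))
    events: List[Tuple[float, int]] = []
    best: Optional[Tuple[float, int, int]] = None  # (|change|, frame, sign) of current run
    for i in range(lag, len(contour)):
        d = contour[i] - contour[i - lag]
        s = 1 if d >= threshold_db else -1 if d <= -threshold_db else 0
        if best is not None and s != best[2]:  # the current run has ended
            events.append(((best[1] - lag / 2) * step_s, best[2]))
            best = None
        if s != 0 and (best is None or abs(d) > best[0]):
            best = (abs(d), i, s)
    if best is not None:
        events.append(((best[1] - lag / 2) * step_s, best[2]))
    return events
```

Each of the five band contours would be run through both scales. For a test signal with a 20 dB step at 0.1 s, the fine-scale pass yields a single rise event within a millisecond of 0.1 s.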

In practice, simultaneity is measured to a precision of 20 ms. That is, at least three bands must show 6-dB increases (or decreases) within 20 ms of each other in the coarsely smoothed power contours, at least three must show the same in the finely smoothed contours, and the coarse and fine increases (or decreases) must lie within 20 ms of each other.
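
The simultaneity test can be sketched on top of per-band event lists, given as a dict from band index to `(time_s, sign)` events. The three-band and 20 ms figures are from the text; the greedy grouping, the use of the mean fine-scale time as the landmark position, and the names are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Events = Dict[int, Sequence[Tuple[float, int]]]  # band index -> [(time_s, +1/-1)]
TOL_S = 0.020    # simultaneity precision (from the text)
MIN_BANDS = 3    # at least 3 of the 5 bands (from the text)


def band_clusters(events: Events, sign: int) -> List[float]:
    """Times (mean of members) of groups of same-signed events from at least
    MIN_BANDS distinct bands, all lying within TOL_S of each other."""
    pts = sorted((t, band) for band, evs in events.items() for t, s in evs if s == sign)
    clusters, i = [], 0
    while i < len(pts):
        j = i
        # Grow the window while every member stays within TOL_S of the first.
        while j + 1 < len(pts) and pts[j + 1][0] - pts[i][0] <= TOL_S:
            j += 1
        window = pts[i:j + 1]
        if len({band for _, band in window}) >= MIN_BANDS:
            clusters.append(sum(t for t, _ in window) / len(window))
            i = j + 1  # consume the whole group
        else:
            i += 1
    return clusters


def abrupt_landmarks(coarse: Events, fine: Events) -> List[Tuple[float, int]]:
    """Keep fine-scale groups that have a same-signed coarse group within TOL_S."""
    found = []
    for sign in (1, -1):
        coarse_times = band_clusters(coarse, sign)
        for t_fine in band_clusters(fine, sign):
            if any(abs(t_fine - t_c) <= TOL_S for t_c in coarse_times):
                found.append((t_fine, sign))  # placed at the finer time scale
    return sorted(found)
```

With three bands rising within a few milliseconds at both scales, plus a lone rise in a fourth band 70 ms later, only the first group yields a landmark.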

In the simplest case, power rises in all the bands on both time scales, defining a “+b” (unvoiced) or “+s” (voiced) LM. Likewise, power may fall in all the bands, defining a “-b” or “-s” LM, respectively. In practice, power often rises in only three or four frequency bands while staying nearly constant (within 6 dB) in the remaining ones, which still satisfies the three-band rule.

A more complicated case arises for fricative-like “f” (unvoiced) or “v” (voiced) onset and offset LMs. Here, the power rises at high frequencies and simultaneously falls at lower frequencies, defining a “+f” or “+v”. Or it may do the opposite, i.e., falling at high frequencies and rising at low ones: “-f” or “-v”, respectively.

Note that “b”/“s” LMs always take precedence over “f”/“v”. That is, if power rises in at least three bands, SpeechMark detects a “+b”/“+s”, and no “f”/“v” LM is detected even if power falls in the other bands. Likewise, if power falls in at least three bands, SpeechMark detects a “-b”/“-s”.
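
The b/s versus f/v decision, including the precedence rule, might be sketched as below. The input is a hypothetical per-band summary for one candidate instant, ordered from the lowest to the highest band: +1 for a rise of at least 6 dB, -1 for a fall of at least 6 dB, 0 for a change within 6 dB. Requiring at least three changed bands for f/v follows the general three-of-five rule; reading “high” and “low” as “every rising band lies above every falling band” (or the reverse) is an assumption.

```python
from typing import Optional, Sequence

MIN_BANDS = 3  # at least 3 of the 5 bands must change (from the text)


def classify_abrupt(changes: Sequence[int], voiced: bool) -> Optional[str]:
    """Return '+b', '-s', '+f', '-v', ... or None if no AC LM is detected."""
    up = [i for i, c in enumerate(changes) if c > 0]
    down = [i for i, c in enumerate(changes) if c < 0]
    # Precedence: a rise (or fall) in >= 3 bands is always b/s, whatever the others do.
    if len(up) >= MIN_BANDS:
        return "+" + ("s" if voiced else "b")
    if len(down) >= MIN_BANDS:
        return "-" + ("s" if voiced else "b")
    # Otherwise look for opposing movement: f/v needs both rises and falls.
    if not up or not down or len(up) + len(down) < MIN_BANDS:
        return None
    if max(down) < min(up):  # low bands fall, high bands rise
        return "+" + ("v" if voiced else "f")
    if max(up) < min(down):  # low bands rise, high bands fall
        return "-" + ("v" if voiced else "f")
    return None
```

For instance, `[1, 1, 1, -1, -1]` is classified as a “+b” (or “+s” when voiced) even though the top two bands fall, which is the precedence rule in action.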

Figure 1 shows an example of the abrupt LMs for one syllable of infant babble. In contrast to peak LMs, abrupt LMs are drawn by SpeechMark functions above the waveform and labeled with lowercase letters. SpeechMark groups these LMs into one syllabic cluster, covering exactly the segment from the initial +g to the final –g and (in this example) no further. The LMs are also grouped into an utterance, which does extend beyond this point.

Also notice that the narrow-band spectrogram shows the characteristic horizontal stripes of well-developed voicing. However, it also shows two abrupt changes of period, at 0.04 s and 0.26 s, as well as a loss of periodicity (–p) at 0.33 s. Finally, notice that an acoustic event at 0.17 s is (correctly) not detected as an LM, because it does not appear in enough spectral bands.
Figure 1. Example landmarks. One syllable of an infant babble is shown. The LMs are placed at instants of abrupt change of energy occurring simultaneously across multiple frequency ranges and at multiple time scales. (top) Waveform with smoothed amplitude envelope, landmarks (+g through –g, green vertical lines), and landmark grouping. Graphics show the interval of voicing (solid red line), grouping as a syllabic cluster (dashed light blue), and grouping as part of an utterance that continues beyond the window (dashed magenta). (bottom) Narrow-band spectrogram of the segment, with a dotted line through F0 and a dashed line through 10×F0. The spectrogram shows the harmonics (horizontal stripes). Periodicity is strong even at the start of voicing (0.01 s), so the +g LM coincides with the corresponding +p (not shown). Note that the event at 0.17 s affects too few spectral bands and therefore does not generate an LM. Also note the abrupt jumps in F0: jumps and periodicity events do not contribute to defining the syllabic cluster, so this example is considered a +g+s-s-s-g cluster.
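
Cluster labels like the one in the caption can be reproduced by collecting each stretch from +g to –g and skipping periodicity and jump events. This is a minimal sketch of the simple case in Figure 1 only (a cluster that ends exactly at –g); the function name is hypothetical. In the usage example, only the +g, jump and –p times come from the text; the other times and the jump directions are made up for illustration.

```python
from typing import List, Sequence, Tuple

IGNORED_KINDS = {"p", "j"}  # periodicity and jump events do not define clusters


def syllabic_clusters(landmarks: Sequence[Tuple[float, str]]) -> List[str]:
    """Group landmarks from each +g to the next -g into a cluster label string."""
    clusters: List[str] = []
    current = None  # labels of the cluster being built, or None outside voicing
    for _, label in sorted(landmarks, key=lambda lm: lm[0]):
        label = label.replace("\u2013", "-")  # normalize en dash
        if label[1:] in IGNORED_KINDS:
            continue
        if label == "+g":
            current = [label]
        elif current is not None:  # LMs outside +g/-g are not clustered here
            current.append(label)
            if label == "-g":
                clusters.append("".join(current))
                current = None
    return clusters


if __name__ == "__main__":
    events = [(0.01, "+g"), (0.01, "+p"), (0.04, "+j"), (0.06, "+s"),
              (0.15, "-s"), (0.26, "-j"), (0.30, "-s"), (0.33, "-p"), (0.36, "-g")]
    print(syllabic_clusters(events))  # ['+g+s-s-s-g']
```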

The following table summarizes the rules for the abrupt LMs.

Table. Rules to identify each type of AC LM. The symbols and mnemonics are not intended to identify underlying articulatory or phonetic events, only to suggest examples: syllabic, voiced frication, etc.