TIMIT dataset - The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

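Defense Advanced Research Projects Agency, DARPA: research agency of the U.S. Department of Defense
Advanced Research Projects Agency, ARPA: former name of DARPA
acoustic /əˈkuːstɪk/: adj. relating to sound or hearing; n. an acoustic (non-amplified) instrument
phonetic /fəˈnetɪk/: adj. relating to speech sounds or phonetics; spelled as pronounced; marking fine distinctions in pronunciation
continuous /kənˈtɪnjuəs/: adj. unbroken, ongoing, uninterrupted
speech /spiːtʃ/: n. a speech or address; talk; spoken language
corpus /ˈkɔːpəs/: n. a body of texts or recordings (e.g. a language corpus); a collection of writings; (finance) principal
Texas Instruments, TI: U.S. semiconductor company
National Institute of Standards and Technology, NIST: U.S. national standards agency
Massachusetts Institute of Technology, MIT: U.S. research university
SRI International: research institute, formerly Stanford Research Institute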

1. The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus

http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

TIMIT.zip - 440.21MB

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT) Training and Test Data

The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT has resulted from the joint efforts of several sites under sponsorship from the Defense Advanced Research Projects Agency - Information Science and Technology Office (DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has been maintained, verified, and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). This file contains a brief description of the TIMIT Speech Corpus. Additional information including the referenced material and some relevant reprints of articles may be found in the printed documentation which is also available from NTIS (NTIS# PB91-100354).

sponsorship /ˈspɒnsəʃɪp/: n. sponsorship, backing; the position of a guarantor; the role of a godparent
transcribe /trænˈskraɪb/: vt. to transcribe, write out, copy

1.1 Corpus Speaker Distribution

TIMIT contains a total of 6300 sentences, 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States. Table 1 shows the number of speakers for the 8 dialect regions, broken down by sex. The percentages are given in parentheses. A speaker’s dialect region is the geographical area of the U.S. where they lived during their childhood years. The geographical areas correspond with recognized dialect regions in the U.S. (Language Files, Ohio State University Linguistics Dept., 1982), with the exception of the Western region (dr7), in which dialect boundaries are not known with any confidence, and dialect region 8, where the speakers moved around a lot during their childhood.

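Ohio State University, OSU: U.S. university in Ohio
department /dɪˈpɑːtmənt/: n. department, division, (academic) department, office, bureau
linguistics /lɪŋˈɡwɪstɪks/: n. the scientific study of language
dialect /ˈdaɪəlekt/: n. a regional variety of a language; vernacular; cognate language; jargon; a person's idiolect; adj. dialectal
parenthesis /pəˈrenθəsɪs/: n. a parenthetical word or phrase; a round bracket (plural "parentheses", as used above); an interval or interlude
geographical /ˌdʒiːəˈɡræfɪkl/: adj. relating to geography
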
Table 1:  Dialect distribution of speakers

      Dialect
      Region(dr)    #Male    #Female    Total
      ----------  --------- ---------  ----------
         1         31 (63%)  18 (37%)   49 (8%)
         2         71 (70%)  31 (30%)  102 (16%)
         3         79 (77%)  23 (23%)  102 (16%)
         4         69 (69%)  31 (31%)  100 (16%)
         5         62 (63%)  36 (37%)   98 (16%)
         6         30 (65%)  16 (35%)   46 (7%)
         7         74 (74%)  26 (26%)  100 (16%)
         8         22 (67%)  11 (33%)   33 (5%)
       ------     --------- ---------  ----------
       Total      438 (70%) 192 (30%)  630 (100%)
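The percentages in Table 1 can be recomputed from the raw counts. The sketch below is a hypothetical helper (`TABLE1`, `pct` and `table1_rows` are illustrative names, not part of TIMIT), and it assumes round-half-up to whole percent, which is a guess at the rounding rule the original table used.

```python
# Speaker counts per dialect region, copied from Table 1: region -> (male, female).
TABLE1 = {
    1: (31, 18),
    2: (71, 31),
    3: (79, 23),
    4: (69, 31),
    5: (62, 36),
    6: (30, 16),
    7: (74, 26),
    8: (22, 11),
}


def pct(part: int, whole: int) -> int:
    """Integer percentage, rounded half up (an assumed rounding rule)."""
    return (200 * part + whole) // (2 * whole)


def table1_rows():
    """Yield (region, male, female, total, male %, female %, % of all speakers)."""
    all_speakers = sum(m + f for m, f in TABLE1.values())
    for region, (male, female) in TABLE1.items():
        total = male + female
        yield (region, male, female, total,
               pct(male, total), pct(female, total), pct(total, all_speakers))


if __name__ == "__main__":
    for row in table1_rows():
        print(row)
```

Recomputing this way gives 37% female for dr1 (18 of 49) and 77% male for dr3 (79 of 102); the overall split is 438 (70%) male and 192 (30%) female.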

The dialect regions are:
     dr1:  New England
     dr2:  Northern
     dr3:  North Midland
     dr4:  South Midland
     dr5:  Southern
     dr6:  New York City
     dr7:  Western
     dr8:  Army Brat (moved around)
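To attach these region names to files on disk, the sketch below assumes the directory layout `<usage>/<drN>/<sex><speaker>/<sentence>.<ext>` (for example `TRAIN/DR1/FCJF0/SA1.WAV`), which is how common TIMIT copies are organized; letter case differs between distributions, so paths are lower-cased first. `parse_timit_path` and `Utterance` are hypothetical names, and only forward-slash paths are handled.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Dialect region codes and names, as listed above.
DIALECT_REGIONS = {
    "dr1": "New England",
    "dr2": "Northern",
    "dr3": "North Midland",
    "dr4": "South Midland",
    "dr5": "Southern",
    "dr6": "New York City",
    "dr7": "Western",
    "dr8": "Army Brat (moved around)",
}


class Utterance(NamedTuple):
    usage: str      # "train" or "test"
    dialect: str    # region code, e.g. "dr1"
    region: str     # region name from DIALECT_REGIONS
    sex: str        # "m" or "f", the first letter of the speaker directory
    speaker: str    # speaker directory name, e.g. "fcjf0"
    sentence: str   # sentence ID, e.g. "sa1"
    file_type: str  # file extension, e.g. "wav"


def parse_timit_path(path: str) -> Utterance:
    """Split a forward-slash TIMIT path into its components (assumed layout)."""
    parts = PurePosixPath(path.lower()).parts[-4:]
    if len(parts) != 4:
        raise ValueError(f"not a TIMIT utterance path: {path!r}")
    usage, dialect, speaker, filename = parts
    sentence, _, file_type = filename.partition(".")
    return Utterance(usage, dialect, DIALECT_REGIONS[dialect],
                     speaker[0], speaker, sentence, file_type)


if __name__ == "__main__":
    print(parse_timit_path("TRAIN/DR1/FCJF0/SA1.WAV"))
```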

1.2 Corpus Text Material

The text material in the TIMIT prompts (found in the file “prompts.doc”) consists of 2 dialect “shibboleth” sentences designed at SRI, 450 phonetically-compact sentences designed at MIT, and 1890 phonetically-diverse sentences selected at TI. The dialect sentences (the SA sentences) were meant to expose the dialectal variants of the speakers and were read by all 630 speakers. The phonetically-compact sentences were designed to provide good coverage of pairs of phones, with extra occurrences of phonetic contexts thought to be either difficult or of particular interest. Each speaker read 5 of these sentences (the SX sentences) and each text was spoken by 7 different speakers. The phonetically-diverse sentences (the SI sentences) were selected from existing text sources - the Brown Corpus (Kučera and Francis, 1967) and the Playwrights Dialog (Hultzen, et al., 1964) - so as to add diversity in sentence types and phonetic contexts. The selection criteria maximized the variety of allophonic contexts found in the texts. Each speaker read 3 of these sentences, with each sentence being read only by a single speaker. Table 2 summarizes the speech material in TIMIT.

  • 2 dialect sentences (the SA sentences): the same 2 sentences are read by every one of the 630 speakers.
  • 5 phonetically-compact sentences (the SX sentences), chosen to cover as many phone pairs as possible. Each speaker read 5 of these sentences and each text was spoken by 7 different speakers.
  • 3 phonetically-diverse sentences (the SI sentences), chosen to add diversity in sentence types and phonetic contexts and to cover as many allophonic contexts as possible. Each speaker read 3 of these sentences, with each sentence being read only by a single speaker.
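Because TIMIT sentence IDs in file names start with the type prefix (SA, SX or SI), the per-speaker breakdown above can be checked mechanically. A minimal sketch; `sentence_type` and `has_expected_mix` are hypothetical helpers, and the example sentence IDs are made up for illustration.

```python
from collections import Counter
from collections.abc import Iterable

# Sentence-type prefix -> (description, sentences read by each speaker).
SENTENCE_TYPES = {
    "sa": ("dialect", 2),
    "sx": ("phonetically-compact", 5),
    "si": ("phonetically-diverse", 3),
}


def sentence_type(sentence_id: str) -> str:
    """Return the two-letter type prefix ("sa", "sx" or "si") of a sentence ID."""
    prefix = sentence_id[:2].lower()
    if prefix not in SENTENCE_TYPES:
        raise ValueError(f"unknown sentence type: {sentence_id!r}")
    return prefix


def has_expected_mix(sentence_ids: Iterable[str]) -> bool:
    """True if one speaker's IDs contain exactly 2 SA, 5 SX and 3 SI sentences."""
    counts = Counter(sentence_type(s) for s in sentence_ids)
    expected = {prefix: n for prefix, (_, n) in SENTENCE_TYPES.items()}
    return dict(counts) == expected


# Hypothetical sentence IDs for a single speaker.
example = ["SA1", "SA2", "SX3", "SX93", "SX183", "SX273", "SX363",
           "SI453", "SI1083", "SI1713"]
print(has_expected_mix(example))  # True
```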
Table 2:  TIMIT speech material
  Sentence Type   #Sentences   #Speakers/Sentence   Total   #Sentences/Speaker
  -------------   ----------   ------------------   -----   ------------------
  Dialect (SA)          2             630            1260           2
  Compact (SX)        450               7            3150           5
  Diverse (SI)       1890               1            1890           3
  -------------   ----------   ------------------   -----   ------------------
  Total              2342                            6300          10
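The totals in Table 2 follow from two ways of counting the same utterances: distinct texts times speakers per text, and speakers times sentences per speaker. This short check (the names are illustrative) confirms that both give the same per-type counts and 6300 overall.

```python
# Table 2: type -> (distinct texts, speakers per text, sentences per speaker).
TABLE2 = {
    "SA": (2, 630, 2),
    "SX": (450, 7, 5),
    "SI": (1890, 1, 3),
}
NUM_SPEAKERS = 630


def table2_totals() -> tuple[int, int]:
    """Return (distinct sentence texts, total utterances) after cross-checking each row."""
    for name, (texts, speakers_per_text, per_speaker) in TABLE2.items():
        by_text = texts * speakers_per_text
        by_speaker = NUM_SPEAKERS * per_speaker
        if by_text != by_speaker:
            raise ValueError(f"{name}: {by_text} != {by_speaker}")
    total_texts = sum(t for t, _, _ in TABLE2.values())
    total_utterances = sum(t * s for t, s, _ in TABLE2.values())
    return total_texts, total_utterances


print(table2_totals())  # (2342, 6300)
```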
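prompt /prɒmpt/: v. to prompt, encourage, bring about, provoke; (theatre) to cue an actor; adj. quick, immediate, punctual; (of goods) for immediate delivery; n. a reminder or cue; (computing) an on-screen prompt; a push; a payment deadline; adv. punctually
occurrence /əˈkʌrəns/: n. an instance of something happening or appearing; an event; a finding
allophonic /ˌæləʊˈfɒnɪk/: adj. relating to allophones (variant pronunciations of the same phoneme)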

1.3 Suggested Training/Test Subdivision

The speech material has been subdivided into portions for training and testing. The criteria for the subdivision are described in the file “testset.doc”. THIS SUBDIVISION HAS NO RELATION TO THE DATA DISTRIBUTED ON THE PROTOTYPE VERSION OF THE CD-ROM.

1.4 Core Test Set

The test data has a core portion containing 24 speakers, 2 male and 1 female from each dialect region. The core test speakers are shown in Table 3. Each speaker read a different set of SX sentences. Thus the core test material contains 192 sentences, 5 SX and 3 SI for each speaker, each having a distinct text prompt.

Table 3:  The core test set of 24 speakers

     Dialect        Male      Female
     -------       ------     ------
        1        DAB0, WBT0    ELC0    
        2        TAS1, WEW0    PAS0    
        3        JMP0, LNT0    PKT0    
        4        LLL0, TLS0    JLM0    
        5        BPM0, KLT0    NLP0    
        6        CMJ0, JDH0    MGD0    
        7        GRT0, NJM0    DHC0
        8        JLN0, PAM0    MLD0
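Table 3 lists speaker IDs without the sex letter. In the TIMIT directory layout a speaker directory is named with the ID prefixed by "m" or "f" (e.g. `MDAB0`, `FELC0`); treating that as an assumption about your copy, the core test speakers and the 192-sentence count can be derived as follows. `CORE_TEST` and `core_speaker_dirs` are illustrative names.

```python
# Table 3: dialect region -> (male speaker IDs, female speaker IDs).
CORE_TEST = {
    1: (["DAB0", "WBT0"], ["ELC0"]),
    2: (["TAS1", "WEW0"], ["PAS0"]),
    3: (["JMP0", "LNT0"], ["PKT0"]),
    4: (["LLL0", "TLS0"], ["JLM0"]),
    5: (["BPM0", "KLT0"], ["NLP0"]),
    6: (["CMJ0", "JDH0"], ["MGD0"]),
    7: (["GRT0", "NJM0"], ["DHC0"]),
    8: (["JLN0", "PAM0"], ["MLD0"]),
}
SX_PER_SPEAKER, SI_PER_SPEAKER = 5, 3


def core_speaker_dirs() -> list[tuple[str, str]]:
    """Return (dialect dir, speaker dir) pairs, e.g. ("dr1", "mdab0").

    Assumes speaker directories are named <sex letter><ID>; lower-cased here.
    """
    pairs = []
    for region, (males, females) in sorted(CORE_TEST.items()):
        for sex, ids in (("m", males), ("f", females)):
            pairs.extend((f"dr{region}", sex + sid.lower()) for sid in ids)
    return pairs


speakers = core_speaker_dirs()
print(len(speakers), len(speakers) * (SX_PER_SPEAKER + SI_PER_SPEAKER))  # 24 192
```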

1.5 Complete Test Set

A more extensive test set was obtained by including the sentences from all speakers that read any of the SX texts included in the core test set. In doing so, no sentence text appears in both the training and test sets. This complete test set contains a total of 168 speakers and 1344 utterances, accounting for about 27% of the total speech material (168 of the 630 speakers). The resulting dialect distribution of the 168 speaker test set is given in Table 4. The complete test material contains 624 distinct texts.
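The expansion from the core set to the complete test set is a simple set operation: collect the SX texts read by core speakers, then keep every speaker who read at least one of them. The sketch below also checks the claim that no sentence text is shared with the remaining speakers. The speaker-to-SX mapping here is made-up toy data; for real use it would have to be built from the corpus files (for example from the SX file names in each speaker directory), which is an assumption.

```python
def sx_union(sx_by_speaker: dict[str, set[str]], speakers) -> set[str]:
    """All SX texts read by the given speakers."""
    return set().union(*(sx_by_speaker[s] for s in speakers))


def complete_test_set(sx_by_speaker: dict[str, set[str]], core: set[str]) -> set[str]:
    """Every speaker who read at least one SX text that a core speaker read."""
    core_texts = sx_union(sx_by_speaker, core)
    return {s for s, texts in sx_by_speaker.items() if texts & core_texts}


def shared_texts(sx_by_speaker: dict[str, set[str]], test: set[str]) -> set[str]:
    """SX texts read both by test speakers and by the remaining (training) speakers."""
    train = [s for s in sx_by_speaker if s not in test]
    return sx_union(sx_by_speaker, test) & sx_union(sx_by_speaker, train)


# Hypothetical toy data: speaker -> SX texts read.
toy = {
    "spk_a": {"sx1", "sx2"},
    "spk_b": {"sx1", "sx2"},
    "spk_c": {"sx7", "sx8"},
    "spk_d": {"sx7", "sx8"},
}
test = complete_test_set(toy, core={"spk_a"})
print(sorted(test), shared_texts(toy, test))  # ['spk_a', 'spk_b'] set()
```

For TIMIT the text states that the shared set is empty; on other data a single expansion step could leave overlapping texts, which is why the check is kept as a separate function.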

subdivision /ˈsʌbdɪvɪʒ(ə)n; sʌbdɪˈvɪʒ(ə)n/: n. subdivision, a subsection; a plot of land divided off for sale

Table 4:  Dialect distribution for complete test set

      Dialect    #Male   #Female   Total
      -------    -----   -------   -----
        1           7        4       11
        2          18        8       26
        3          23        3       26
        4          16       16       32
        5          17       11       28
        6           8        3       11
        7          15        8       23
        8           8        3       11
      -----      -----   -------   ------
      Total       112       56      168
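The headline figures for the complete test set can be reproduced from Table 4. Two steps are inferences rather than stated facts: 1344 = 168 × 8 suggests that only the SX and SI sentences are counted (as in the core set), and 624 = 120 + 504 suggests that the complete-test speakers' SX texts are exactly the 120 texts read by the 24 core speakers. The names below are illustrative.

```python
# Table 4: dialect region -> (male speakers, female speakers).
TABLE4 = {
    1: (7, 4), 2: (18, 8), 3: (23, 3), 4: (16, 16),
    5: (17, 11), 6: (8, 3), 7: (15, 8), 8: (8, 3),
}
TOTAL_SPEAKERS = 630
CORE_SPEAKERS = 24
SX_PER_SPEAKER, SI_PER_SPEAKER = 5, 3


def complete_test_summary() -> dict:
    males = sum(m for m, _ in TABLE4.values())
    females = sum(f for _, f in TABLE4.values())
    speakers = males + females
    return {
        "males": males,
        "females": females,
        "speakers": speakers,
        # Assumes SA sentences are excluded: 8 utterances per speaker.
        "utterances": speakers * (SX_PER_SPEAKER + SI_PER_SPEAKER),
        "percent_of_speakers": round(100 * speakers / TOTAL_SPEAKERS, 1),
        # Assumes the SX texts are the core set's 24 x 5; every SI text is unique.
        "distinct_texts": CORE_SPEAKERS * SX_PER_SPEAKER + speakers * SI_PER_SPEAKER,
    }


print(complete_test_summary())
```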

References

Documentation for TIMIT
https://catalog.ldc.upenn.edu/docs/LDC93S1/
