Using the Google Speech Recognition Engine (Google Speech API)

wget -O "GoogleSpeechAPI.txt" --user-agent="Mozilla/5.0" --post-file=test.flac --header="Content-Type: audio/x-flac; rate=16000" "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1"

The result is as follows:

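{
    "status":0, /* result code; see the end of this article for details */
    "id":"c421dee91abe31d9b8457f2a80ebca91-1", /* recognition ID */
    "hypotheses": /* hypotheses, i.e. the results */
    [
        {
            "utterance":"下午好", /* the recognized utterance ("good afternoon") */
            "confidence":0.2507637 /* confidence, i.e. estimated accuracy */
        }
    ]
}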

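Note: the comments are explanations I added by hand; they are not part of the response.

The response is self-explanatory and can be used directly. It is encoded in UTF-8.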
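To illustrate how little work the response needs, here is a minimal sketch that pulls the first `utterance` and `confidence` out of the JSON text using only the C++ standard library. The names `SpeechResult`, `findValue` and `parseFirstHypothesis` are hypothetical, and this is a naive string search rather than a real JSON parser: it assumes the field layout shown above and does not handle escaped quotes inside the utterance.

```cpp
#include <cstdlib>
#include <string>

// First hypothesis extracted from a recognition response (hypothetical type).
struct SpeechResult {
    bool ok = false;          // true only if an "utterance" string was found
    std::string utterance;    // recognized text, UTF-8 bytes as received
    double confidence = 0.0;  // value of "confidence", 0.0 if absent
};

// Returns the index of the first character of the value that follows
// "key": in the text, or std::string::npos if the key is not present.
static std::string::size_type findValue(const std::string& json,
                                        const std::string& key) {
    const std::string quoted = "\"" + key + "\"";
    std::string::size_type pos = json.find(quoted);
    if (pos == std::string::npos) return pos;
    pos = json.find(':', pos + quoted.size());
    if (pos == std::string::npos) return pos;
    ++pos;
    while (pos < json.size() && (json[pos] == ' ' || json[pos] == '\t' ||
                                 json[pos] == '\r' || json[pos] == '\n')) {
        ++pos;
    }
    return pos;
}

// Naive extraction: assumes no escaped quotes inside the utterance.
SpeechResult parseFirstHypothesis(const std::string& json) {
    SpeechResult result;
    std::string::size_type pos = findValue(json, "utterance");
    if (pos == std::string::npos || pos >= json.size() || json[pos] != '"') {
        return result;  // no hypothesis in this response
    }
    const std::string::size_type end = json.find('"', pos + 1);
    if (end == std::string::npos) return result;
    result.utterance = json.substr(pos + 1, end - pos - 1);
    result.ok = true;

    pos = findValue(json, "confidence");
    if (pos != std::string::npos) {
        result.confidence = std::strtod(json.c_str() + pos, nullptr);
    }
    return result;
}
```

Calling `parseFirstHypothesis(body)` on the response body yields a result whose `ok` flag is false when the `hypotheses` array is empty. In a Qt program a proper JSON library would be the more robust choice.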

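As for the audio encoding, the test used FLAC at a 16 kHz sample rate. Other sample rates also worked in testing, but the rate in the header must match the actual data. (For experiments with other formats, see the end of this article.)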
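One way to guard against a mismatched rate is to read the sample rate out of the FLAC data before sending it. The sketch below relies on the layout in the FLAC format specification: a 4-byte `fLaC` marker, a 4-byte metadata block header, then the STREAMINFO block, in which the sample rate is a 20-bit field starting at byte 18 of the file. The function names `flacSampleRate` and `rateMatchesHeader` are hypothetical, and the check assumes a plain FLAC stream with no leading ID3 tag or container.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the sample rate stored in the STREAMINFO block of a FLAC stream,
// or 0 if the buffer does not start with "fLaC" followed by STREAMINFO.
std::uint32_t flacSampleRate(const std::vector<std::uint8_t>& data) {
    // 4-byte marker + 4-byte block header + 34-byte STREAMINFO body.
    const std::size_t kMinimumSize = 42;
    if (data.size() < kMinimumSize) return 0;
    if (data[0] != 'f' || data[1] != 'L' || data[2] != 'a' || data[3] != 'C') {
        return 0;
    }
    // The low 7 bits of the block header's first byte hold the block type;
    // type 0 is STREAMINFO.
    if ((data[4] & 0x7F) != 0) return 0;
    // 20-bit sample rate: all of bytes 18 and 19, plus the top 4 bits of byte 20.
    return (static_cast<std::uint32_t>(data[18]) << 12) |
           (static_cast<std::uint32_t>(data[19]) << 4) |
           (static_cast<std::uint32_t>(data[20]) >> 4);
}

// True when the rate declared in the Content-Type header equals the rate
// stored in the FLAC data itself.
bool rateMatchesHeader(const std::vector<std::uint8_t>& flac,
                       std::uint32_t declaredRate) {
    const std::uint32_t actual = flacSampleRate(flac);
    return actual != 0 && actual == declaredRate;
}
```

A caller could run `rateMatchesHeader(flacBytes, 16000)` before the POST and refuse to send when it returns false.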

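Summary:

1. Basic flow:

Step 1: Capture raw data from the audio input device.

Step 2: Wrap and encode the raw data.

Step 3: POST the encoded audio to the API address.

Step 4: Parse the JSON returned by the API to obtain the result.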
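As an illustration of the wrapping in step two, the sketch below puts the standard 44-byte RIFF/WAVE header in front of 16-bit PCM samples. The helper names (`appendLE`, `appendTag`, `wrapPcmAsWav`) are hypothetical. The article does not say whether the service expects this header or the bare samples when audio is sent as WAV, so treat that part as an assumption to verify.

```cpp
#include <cstdint>
#include <vector>

// Appends the lowest `bytes` bytes of `value` in little-endian order.
static void appendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Appends a four-character chunk tag such as "RIFF".
static void appendTag(std::vector<std::uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(tag[i]));
    }
}

// Wraps 16-bit PCM samples in a canonical 44-byte WAV header.
std::vector<std::uint8_t> wrapPcmAsWav(const std::vector<std::int16_t>& samples,
                                       std::uint32_t sampleRate,
                                       std::uint16_t channels = 1) {
    const std::uint32_t bytesPerSample = 2;
    const std::uint32_t dataSize =
        static_cast<std::uint32_t>(samples.size()) * bytesPerSample;

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + dataSize);
    appendTag(wav, "RIFF");
    appendLE(wav, 36 + dataSize, 4);                            // size of the rest of the file
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLE(wav, 16, 4);                                       // fmt chunk size
    appendLE(wav, 1, 2);                                        // format 1 = linear PCM
    appendLE(wav, channels, 2);
    appendLE(wav, sampleRate, 4);
    appendLE(wav, sampleRate * channels * bytesPerSample, 4);   // bytes per second
    appendLE(wav, channels * bytesPerSample, 2);                // block align
    appendLE(wav, 16, 2);                                       // bits per sample
    appendTag(wav, "data");
    appendLE(wav, dataSize, 4);
    for (std::int16_t sample : samples) {
        appendLE(wav, static_cast<std::uint16_t>(sample), 2);
    }
    return wav;
}
```

For example, `wrapPcmAsWav(samples, 16000)` produces a mono buffer whose declared rate is 16 kHz, matching the `rate=16000` used elsewhere in this article.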

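2. Request interface

Address: http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1

Request method: HTTP POST

Headers: Content-Type: audio/x-flac; rate=16000 (Note: the Content-Type depends on the encoding used; see the end of this article. rate is the audio sample rate.)

Request body: the encoded audio data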
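The address and header above differ between requests only in the language, the number of results and the audio format, so they can be assembled from parameters. This is a small sketch with hypothetical helper names (`buildRecognizeUrl`, `buildContentType`). It keeps `xjerr=1` and `client=chromium` fixed exactly as in the address above, and it does not URL-escape the language code.

```cpp
#include <string>

// Builds the recognize URL for a language code such as "zh-CN".
// The language code is inserted as-is, without URL escaping.
std::string buildRecognizeUrl(const std::string& lang, int maxResults) {
    return "http://www.google.com/speech-api/v1/recognize"
           "?xjerr=1&client=chromium&lang=" + lang +
           "&maxresults=" + std::to_string(maxResults);
}

// Builds the Content-Type header value, e.g. "audio/x-flac; rate=16000".
std::string buildContentType(const std::string& mimeType, int sampleRate) {
    return mimeType + "; rate=" + std::to_string(sampleRate);
}
```

With these, `buildRecognizeUrl("zh-CN", 1)` and `buildContentType("audio/x-flac", 16000)` reproduce the address and header listed above.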

3. Audio encoding formats:

FLAC, WAV, or SPEEX

Below is the request as I wrote it in Qt (C++):

void Protocol::Request_SPEECH(QByteArray & audioData)
{
    // Only start a request when no earlier reply is stored in Nt_SPEECH.
    if (!Nt_SPEECH)
    {
        QNetworkRequest request;
        QString speechAPI = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1";
        request.setUrl(QUrl(speechAPI));
        request.setRawHeader("User-Agent", "Mozilla/5.0");
        // rate must match the sample rate of the FLAC data in audioData.
        request.setRawHeader("Content-Type", "audio/x-flac; rate=16000");
        // POST the encoded audio; Read_SPEECH() handles the response.
        Nt_SPEECH = NetworkMGR.post(request, audioData);
        connect(Nt_SPEECH, SIGNAL(readyRead()), this, SLOT(Read_SPEECH()));
    }
}

The read function is not pasted here; for the details, see:

Protocol: http://pastebin.com/6G6wggfF

AudioInput:

speechInput.h: http://pastebin.com/qdMPeWZD

speechInput.cpp: http://pastebin.com/567B47qF

main:

mainwidget: http://pastebin.com/c8bk7zd2

While browsing the Chromium source, I also found something else useful:

Speech Input API Specification http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html

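So far, Google does not seem to have published this API. Its terms of use remain unclear, and requests require no authentication. It does work, though, and it is very convenient. For anyone writing non-commercial programs it is hard to beat, because its recognition rate is extremely high.

References: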

Chromium Repository http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

Accessing Google Speech API / Chrome 11 http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/

Appendix:

1. SpeechInputError interface: error codes

// This enumeration follows the values described here:
// http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html#speech-input-error
enum SpeechInputError {
    // There was no error.
    SPEECH_INPUT_ERROR_NONE = 0,
    // The user or a script aborted speech input.
    SPEECH_INPUT_ERROR_ABORTED,
    // There was an error with recording audio.
    SPEECH_INPUT_ERROR_AUDIO,
    // There was a network error.
    SPEECH_INPUT_ERROR_NETWORK,
    // No speech heard before timeout.
    SPEECH_INPUT_ERROR_NO_SPEECH,
    // Speech was heard, but could not be interpreted.
    SPEECH_INPUT_ERROR_NO_MATCH,
    // There was an error in the speech recognition grammar.
    SPEECH_INPUT_ERROR_BAD_GRAMMAR,
};
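Because only the first enumerator has an explicit value, the rest number upward from it, so the codes run from 0 to 6. A small lookup can then turn the `status` field of a response into a readable message. This is a sketch with a hypothetical function name (`describeStatus`), and the wording of the messages is mine, paraphrased from the comments in the enumeration.

```cpp
#include <string>

// Maps a numeric status to a short description. The table index equals
// the enumerator value (0 through 6) from the enumeration above.
std::string describeStatus(int status) {
    static const char* const kDescriptions[] = {
        "no error",                                // SPEECH_INPUT_ERROR_NONE        = 0
        "aborted by the user or a script",         // SPEECH_INPUT_ERROR_ABORTED     = 1
        "error while recording audio",             // SPEECH_INPUT_ERROR_AUDIO       = 2
        "network error",                           // SPEECH_INPUT_ERROR_NETWORK     = 3
        "no speech heard before timeout",          // SPEECH_INPUT_ERROR_NO_SPEECH   = 4
        "speech heard but not interpreted",        // SPEECH_INPUT_ERROR_NO_MATCH    = 5
        "error in the speech recognition grammar"  // SPEECH_INPUT_ERROR_BAD_GRAMMAR = 6
    };
    const int count =
        static_cast<int>(sizeof(kDescriptions) / sizeof(kDescriptions[0]));
    if (status < 0 || status >= count) {
        return "unknown status";
    }
    return kDescriptions[status];
}
```

A caller might log `describeStatus(status)` whenever the status is non-zero, and read the hypotheses only when it is 0.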

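2. Tests with several audio formats

A friend emailed me to say that FLAC is really inconvenient to work with and asked whether there was a better option, so I tried other encodings with the Google Speech API. The results:

1) WAV

Request header: Content-Type: audio/L16; rate=16000

Result: recognized successfully

2) MP3

Request header: Content-Type: audio/mpeg; rate=16000

Result: unrecognized encoding

Request header: Content-Type: audio/mpeg3; rate=16000

Result: unrecognized encoding

Request header: Content-Type: audio/x-mpeg; rate=16000

Result: unrecognized encoding

Request header: Content-Type: audio/x-mpeg-3; rate=16000

Result: unrecognized encoding

Request header: Content-Type: audio/mp3; rate=16000

Result: unrecognized encoding

3) PCM

Request header: Content-Type: audio/x-ogg-pcm; rate=16000

Result: unrecognized encoding

Request header: Content-Type: audio/pcm; rate=16000

Result: unrecognized encoding

4) SPEEX

Request header: Content-Type: audio/x-speex-with-header-byte; rate=16000

Result: recognized successfully

Request header: Content-Type: audio/speex; rate=16000

Result: recognized successfully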
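A client can use these results to reject an unsupported format before uploading anything. The sketch below records only the Content-Type values that were recognized in this article (audio/x-flac from the main example, plus audio/L16 and the two SPEEX types from the tests above). The function names are hypothetical, the comparison is case-sensitive, and the list reflects these tests only, not a published specification.

```cpp
#include <set>
#include <string>

// Strips parameters such as "; rate=16000" and trailing spaces,
// leaving only the media type.
std::string mimeTypeOf(const std::string& contentType) {
    std::string mime = contentType.substr(0, contentType.find(';'));
    while (!mime.empty() && mime.back() == ' ') {
        mime.pop_back();
    }
    return mime;
}

// True if the media type was recognized in the tests described in this article.
bool wasAcceptedInTests(const std::string& contentType) {
    static const std::set<std::string> accepted = {
        "audio/x-flac",
        "audio/L16",
        "audio/x-speex-with-header-byte",
        "audio/speex"
    };
    return accepted.count(mimeTypeOf(contentType)) > 0;
}
```

For instance, `wasAcceptedInTests("audio/mpeg; rate=16000")` is false, so an MP3 recording would need to be converted to one of the recognized formats first.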

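Since the recognition interface is not public, there is no way to know exactly which formats it supports. If you find another supported format, please leave a comment!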
