Recognizing audio files (WAV) with Python's SpeechRecognition library

Original

Cindy-W123

2020-06-19 19:42

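I have been trying out speech-to-text recently and picked up a few related things along the way, so I am writing them down here.

1. The Python SpeechRecognition library

SpeechRecognition is a third-party Python package (installed with pip, not part of the standard library) that puts one interface in front of several speech recognition engines and APIs. There are plenty of detailed write-ups online; this one is a good reference: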

https://blog.csdn.net/alice_tl/article/details/89684369

2. Usage

Install the SpeechRecognition library:
```
pip install speechrecognition
```

To use it, define one function per recognition backend:

import speech_recognition as sr

# A single Recognizer instance shared by every function below
r = sr.Recognizer()

# Call Google's Web Speech API
def google(audio):
	try:
		print("Google: ")
		return r.recognize_google(audio)
	except sr.UnknownValueError:
		print("Google Speech Recognition could not understand audio")
		return None
	except sr.RequestError as e:
		print("Could not request results from Google Speech Recognition service; {0}".format(e))
		return None

# Use Wit.ai
def wit(audio):
	# recognize speech using Wit.ai
	WIT_AI_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx"  # Wit.ai keys are 32-character uppercase alphanumeric strings
	try:
		#print("Wit.ai: ")
		return r.recognize_wit(audio, key=WIT_AI_KEY)
	except sr.UnknownValueError:
		print("Wit.ai could not understand audio")
		return None
	except sr.RequestError as e:
		print("Could not request results from Wit.ai service; {0}".format(e))
		return None

# Call Microsoft Bing Voice Recognition
# Note: recognize_bing targets Microsoft's older Bing Speech service; check that
# the library version you have installed still provides it.
def bing(audio):
	BING_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
	# recognize speech using Microsoft Bing Voice Recognition
	try:
		#print("Microsoft Bing Voice Recognition: ")
		return r.recognize_bing(audio, key=BING_KEY)
	except sr.UnknownValueError:
		print("Microsoft Bing Voice Recognition could not understand audio")
		return None
	except sr.RequestError as e:
		print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
		return None
	
# Query IBM
# Note: the username/password form below matches the library release this post
# was written against; later releases may expect different credentials.
def ibm(audio):
	# recognize speech using IBM Speech to Text
	IBM_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxx"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
	IBM_PASSWORD = "xxxxxxxxxxxxxxxxx"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
	try:
		#print("IBM Speech to Text: ")
		return r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, show_all=False)
	except sr.UnknownValueError:
		print("IBM Speech to Text could not understand audio")
		return None
	except sr.RequestError as e:
		print("Could not request results from IBM Speech to Text service; {0}".format(e))
		return None

# Use CMU Sphinx (runs offline)
def sphinx(audio):
	try:
		print("Sphinx: ")
		return r.recognize_sphinx(audio)
	except sr.UnknownValueError:
		print("Sphinx could not understand audio")
		return None
	except sr.RequestError as e:
		print("Sphinx error; {0}".format(e))
		return None

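Note that Sphinx is the only one of these that works offline, and it needs the PocketSphinx package installed (pip install pocketsphinx). All the others need a network connection. Google works without registration because the library falls back to a built-in default API key; the others need an API key or credentials from the service.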
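Since some backends are offline and others depend on the network, it can be handy to try several in order and keep the first result. Below is a minimal sketch of that idea, not something from the library itself: `first_transcript` is a hypothetical helper name, and it assumes each backend is a function like the ones above that takes the audio and returns a string on success. Anything that is None, blank, or the literal string "None" is treated as a failed attempt.

```python
from typing import Callable, Iterable, Optional


def first_transcript(audio, backends: Iterable[Callable[[object], Optional[str]]]) -> Optional[str]:
    """Try each backend in order and return the first usable transcript.

    `first_transcript` is a hypothetical helper, not part of SpeechRecognition.
    """
    for backend in backends:
        try:
            text = backend(audio)
        except Exception as exc:  # e.g. a missing package or a network failure
            name = getattr(backend, "__name__", repr(backend))
            print("{0} failed: {1}".format(name, exc))
            continue
        # Defensive check: skip None, blank strings, and the literal "None"
        if isinstance(text, str) and text.strip() and text != "None":
            return text
    return None
```

For example, `first_transcript(audio, [sphinx, google])` would try the offline engine first and only go to the network if Sphinx returns nothing.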

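Now use the functions defined above on an actual audio file. Note that sr.AudioFile only reads WAV (PCM), AIFF, and FLAC files, so anything else, such as MP3, has to be converted to WAV first: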

from pydub import AudioSegment  # pydub needs ffmpeg installed to decode MP3

# Reuses sr, r, and the backend functions defined above


def speech_to_text(path_file):
    # Convert the MP3 to WAV; "audio.wav" is written to the current working directory
    song = AudioSegment.from_mp3(path_file)
    song.export("audio.wav", format="wav")

    # sr.AudioFile is initialized with a file path and acts as a context manager for reading the file
    with sr.AudioFile("audio.wav") as source:
        audio = r.record(source)  # read the whole file into an AudioData object

    print("Submitting to speech-to-text:")
    determined = sphinx(audio)  # swap in google, wit, bing, or ibm here to use another backend
    print(determined)
    return determined
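Before handing a file to the recognizer, it can help to confirm that it really is a readable WAV and to see its basic properties. Here is a small sketch using only the standard library's `wave` module, which reads uncompressed PCM WAV files. `describe_wav` is a hypothetical helper name of my own, not part of SpeechRecognition or pydub.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file, or None if it cannot be read as one.

    `describe_wav` is a hypothetical helper for illustration.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            frames = wav_file.getnframes()
            rate = wav_file.getframerate()
            return {
                "channels": wav_file.getnchannels(),
                "sample_width_bytes": wav_file.getsampwidth(),
                "sample_rate_hz": rate,
                "duration_s": frames / float(rate) if rate else 0.0,
            }
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, or unreadable
        return None
```

Calling `describe_wav("audio.wav")` after the conversion step gives a quick sanity check; a None result suggests the file still needs converting.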
