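I have recently been experimenting with speech-to-text and picked up a few related things along the way, so here are my notes.
1. The Python SpeechRecognition library
SpeechRecognition is a third-party Python library (installed with pip, not part of the standard library) that wraps several speech recognition engines and APIs behind one interface. Plenty of detailed write-ups are available online; one reference: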
https://blog.csdn.net/alice_tl/article/details/89684369
2. Usage
- Install the SpeechRecognition library
pip install speechrecognition
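A quick way to confirm the installation is to check that the module can be found. The sketch below uses only the standard library. The helper name `check_packages` is made up for illustration, and `pocketsphinx` and `pydub` are included on the assumption that the offline Sphinx backend and the MP3 conversion described later in these notes will be used.

```python
import importlib.util


def check_packages(names=("speech_recognition", "pocketsphinx", "pydub")):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Note that the import name is `speech_recognition`, while the package is installed under the name `SpeechRecognition`.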
- Usage: define one wrapper function per recognition backend
import speech_recognition as sr

# One shared recognizer instance used by all the wrappers below.
r = sr.Recognizer()

# Each wrapper returns the transcript, or None if recognition fails.
# Note: recognize_bing and the username/password form of recognize_ibm come from
# older SpeechRecognition releases and may be missing or different in the
# version you have installed.

# Google speech API
def google(audio):
    try:
        print("Google: ")
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return None

# Wit.ai
def wit(audio):
    WIT_AI_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx"  # Wit.ai keys are 32-character uppercase alphanumeric strings
    try:
        return r.recognize_wit(audio, key=WIT_AI_KEY)
    except sr.UnknownValueError:
        print("Wit.ai could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Wit.ai service; {0}".format(e))
    return None

# Microsoft Bing Voice Recognition
def bing(audio):
    BING_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    try:
        return r.recognize_bing(audio, key=BING_KEY)
    except sr.UnknownValueError:
        print("Microsoft Bing Voice Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
    return None

# IBM Speech to Text
def ibm(audio):
    IBM_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxx"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    IBM_PASSWORD = "xxxxxxxxxxxxxxxxx"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
    try:
        return r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, show_all=False)
    except sr.UnknownValueError:
        print("IBM Speech to Text could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from IBM Speech to Text service; {0}".format(e))
    return None

# CMU Sphinx (offline)
def sphinx(audio):
    try:
        text = r.recognize_sphinx(audio)
        # Report success only after recognition has actually returned.
        print("-------------Sphinx successfully recognized the audio ---------")
        return text
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))
    return None
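Note that only Sphinx works offline, and it requires the pocketsphinx package to be installed; the other backends all need a network connection. Google can be used without registering, while the others require an API key or account credentials.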
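Because one backend is offline and the rest depend on the network, it can be convenient to try them in order and keep the first usable result. The following is only a sketch of that idea, not part of the SpeechRecognition library: `first_successful` is a hypothetical helper, and it treats `None`, an empty string, and the literal string "None" as failure so that it works with wrappers written in either style.

```python
def first_successful(audio, backends):
    """Try each (name, recognize_fn) pair in order.

    Returns (name, text) for the first backend that yields a usable
    transcript, or (None, None) if every backend fails.
    """
    for name, recognize in backends:
        try:
            text = recognize(audio)
        except Exception as exc:  # e.g. a backend whose package is not installed
            print(f"{name} raised {exc!r}; trying the next backend")
            continue
        # Treat None, "" and the literal string "None" as "no result".
        if text and text != "None":
            return name, text
    return None, None


if __name__ == "__main__":
    # Stand-in backends so the sketch runs without any audio or network access.
    def offline_stub(audio):
        return None

    def online_stub(audio):
        return "transcript of " + audio

    print(first_successful("clip", [("offline", offline_stub), ("online", online_stub)]))
```

With the wrappers defined above, a call could look like `first_successful(audio, [("sphinx", sphinx), ("google", google)])`, which prefers the offline engine and only contacts Google when Sphinx returns nothing.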
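- Use the functions defined above to transcribe an audio file. Note that sr.AudioFile reads only WAV, AIFF, and FLAC files, so any other format (MP3 here) must be converted to WAV first.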
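import speech_recognition as sr
from pydub import AudioSegment

r = sr.Recognizer()

def speech_to_text(path_file):
    # Convert the MP3 to WAV (pydub needs ffmpeg installed for this)
    song = AudioSegment.from_mp3(path_file)
    song.export("audio.wav", format="wav")  # written to the current working directory
    # sr.AudioFile is initialised with the path of an audio file and acts as a
    # context manager for reading and processing the file's contents.
    with sr.AudioFile("audio.wav") as source:
        audio = r.record(source)  # read the whole file into an AudioData object
    print(audio)
    print("Submitting To Speech to Text:")
    determined = sphinx(audio)  # sphinx() is defined above; google, wit, ibm or bing can be used instead
    print(determined)
    return determined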
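The function above always goes through `AudioSegment.from_mp3`, which is only appropriate for MP3 input. A minimal sketch of a more general preparation step follows. `NATIVE_SUFFIXES`, `needs_conversion`, and `ensure_readable` are hypothetical names introduced here, and the sketch assumes that the file extension reflects the real format and that pydub and ffmpeg are installed whenever a conversion is needed.

```python
from pathlib import Path

# Extensions that sr.AudioFile can open directly (WAV, AIFF, FLAC).
NATIVE_SUFFIXES = {".wav", ".aiff", ".aif", ".flac"}


def needs_conversion(path_file):
    """True if the file extension is not one sr.AudioFile reads directly."""
    return Path(path_file).suffix.lower() not in NATIVE_SUFFIXES


def ensure_readable(path_file, out_dir="."):
    """Return a path sr.AudioFile can open, converting to WAV when necessary."""
    if not needs_conversion(path_file):
        return str(path_file)
    # Imported lazily so the extension check works even without pydub installed.
    from pydub import AudioSegment

    wav_path = Path(out_dir) / (Path(path_file).stem + ".wav")
    # from_file lets ffmpeg detect the input format instead of assuming MP3.
    AudioSegment.from_file(path_file).export(str(wav_path), format="wav")
    return str(wav_path)
```

In `speech_to_text`, the two conversion lines could then be replaced by `wav_path = ensure_readable(path_file)` followed by `with sr.AudioFile(wav_path) as source:`. Files that are already WAV, AIFF, or FLAC are passed through untouched.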