Recognizing speech in audio files (WAV) with Python's SpeechRecognition library

I have recently been experimenting with speech-to-text and learned a few things along the way, so here are my notes.

I. The Python SpeechRecognition library

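SpeechRecognition is a third-party Python library (it does not ship with Python and must be installed) that provides one interface to several speech recognition engines and APIs. Plenty of detail is available online; for example, see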

https://blog.csdn.net/alice_tl/article/details/89684369

 

II. Usage

  1. Install the SpeechRecognition library
    pip install speechrecognition

     

  2. Usage: define one function per recognition backend
    import speech_recognition as sr
    
    # A single Recognizer instance shared by all of the functions below
    r = sr.Recognizer()
    
    # Google Web Speech API
    def google(audio):
        try:
            print("Google: ")
            return r.recognize_google(audio)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
            return None
    
    # Wit.ai
    def wit(audio):
        # recognize speech using Wit.ai
        WIT_AI_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx"  # Wit.ai keys are 32-character uppercase alphanumeric strings
        try:
            #print("Wit.ai: ")
            return r.recognize_wit(audio, key=WIT_AI_KEY)
        except sr.UnknownValueError:
            print("Wit.ai could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Wit.ai service; {0}".format(e))
            return None
    
    # Microsoft Bing Voice Recognition
    def bing(audio):
        BING_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        # recognize speech using Microsoft Bing Voice Recognition
        try:
            #print("Microsoft Bing Voice Recognition: ")
            return r.recognize_bing(audio, key=BING_KEY)
        except sr.UnknownValueError:
            print("Microsoft Bing Voice Recognition could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
            return None
    	
    # IBM Speech to Text
    def ibm(audio):
        # recognize speech using IBM Speech to Text
        # (username/password is the signature of the library version used here; newer releases may expect different credentials)
        IBM_USERNAME = "xxxxxxxxxxxxxxxxxxxxxxxxxx"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        IBM_PASSWORD = "xxxxxxxxxxxxxxxxx"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
        try:
            #print("IBM Speech to Text: ")
            return r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, show_all=False)
        except sr.UnknownValueError:
            print("IBM Speech to Text could not understand audio")
            return None
        except sr.RequestError as e:
            print("Could not request results from IBM Speech to Text service; {0}".format(e))
            return None
    
    # CMU Sphinx (works offline)
    def sphinx(audio):
        try:
            text = r.recognize_sphinx(audio)
            # Only report success once recognition has actually returned
            print("-------------Sphinx successfully recognized the audio ---------")
            return text
        except sr.UnknownValueError:
            print("Sphinx could not understand audio")
            return None
        except sr.RequestError as e:
            print("Sphinx error; {0}".format(e))
            return None

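    Note that only Sphinx works offline, and it requires the Sphinx package (pocketsphinx on PyPI) to be installed; all the other backends need a network connection. Google can be used without registering, while the others require an API key or account credentials.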
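Because some backends are offline and others need a network connection or credentials, it can be convenient to try them in a preferred order. The following is only a minimal sketch of that idea, not part of the SpeechRecognition library: `first_successful` is a hypothetical helper name, and it assumes each backend is a function that takes the audio object and returns either a transcript or a failure marker (`None`, an empty string, or the literal string "None").

```python
from typing import Any, Callable, Iterable, Optional


def first_successful(
    audio: Any,
    backends: Iterable[Callable[[Any], Optional[str]]],
) -> Optional[str]:
    """Try each backend in order and return the first usable transcript.

    Hypothetical helper: a backend is any callable that accepts the audio
    object and returns text on success or a failure marker otherwise.
    """
    for backend in backends:
        text = backend(audio)
        # Treat None, "" and the literal string "None" as failures
        if text and text != "None":
            return text
    # Every backend failed
    return None
```

With the functions defined above, a call such as `first_successful(audio, [sphinx, google])` would try the offline engine first and only fall back to the online service if Sphinx returns nothing.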

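  3. Use the functions defined above to recognize a specific audio file. Note that AudioFile cannot read MP3 and similar compressed formats (it accepts WAV, AIFF and FLAC), so convert such files to WAV first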

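    import speech_recognition as sr
    from pydub import AudioSegment  # pydub needs ffmpeg installed to decode MP3
    
    
    r = sr.Recognizer()
    
    
    def speech_to_text(path_file):
        # Convert the input (assumed here to be an MP3 file) to WAV
        song = AudioSegment.from_mp3(path_file)
        song.export("audio.wav", format="wav")  # written to the current working directory
    
        # AudioFile is initialized with the path to an audio file and provides
        # a context manager for reading and processing its contents
        with sr.AudioFile("audio.wav") as source:
            audio = r.record(source)  # read the whole file into an AudioData object
            print(audio)
    
        print("Submitting To Speech to Text:")
        determined = sphinx(audio)  # Instead of sphinx, you can use google, ibm or bing here
        print(determined)
        return determined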
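Since conversion is only needed when the input is not already WAV, one option is to check the file before calling pydub. The sketch below uses only the standard-library `wave` module; `is_pcm_wav` is a hypothetical helper name that does not appear in the original code. It assumes the path exists, so a missing file still raises `FileNotFoundError`.

```python
import wave


def is_pcm_wav(path: str) -> bool:
    """Return True if `path` opens as an uncompressed PCM WAV file.

    Hypothetical helper: relies on the standard-library wave module,
    which rejects files that are not RIFF/WAVE or not PCM encoded.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            # The wave module reports "NONE" for uncompressed PCM data
            return wav_file.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # wave.Error: not a WAV file or unsupported encoding
        # EOFError: file too short to contain a valid header
        return False
```

A caller could then skip the `AudioSegment` export step when `is_pcm_wav(path_file)` is true and pass the original path straight to `sr.AudioFile`.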

     
