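iFlytek (科大訊飛) -- teach your app to speak, imitate, joke, and sing

####### Keywords: iFlytek, speech recognition, speech synthesis, configuration

In this blog post, you will learn how to:

use the iFlytek speech SDK for speech recognition
use the iFlytek speech SDK to read text aloud
understand how the SDK wraps these features internally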

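1. Getting to know iFlytek (iFly)

Speech technology enables voice interaction between people and machines, making it as easy to communicate with a machine as with another person. Its two key technologies are speech synthesis and speech recognition: synthesis lets the machine talk, and recognition lets the machine understand what people say. Speech technology also covers speech coding, voice conversion, spoken-language assessment, and noise reduction and enhancement, so it has a wide range of applications.

Early speech recognition was laughably bad; even Siri was full of mistakes when it first launched. iFlytek, however, has improved quickly in recent years after a long period of steady work, which is what you would expect from a technology-driven project once its early research starts to pay off.
Baidu has also released its own speech recognition, but with less accumulated experience, its porting and testing experience is not yet as good as iFlytek's (this is my personal opinion).

iFlytek started out doing only speech recognition and speech synthesis, and now offers advertising, analytics, a community plaza, face recognition, voiceprint recognition, and push notifications. Its ambition is clear: build a comprehensive platform without giving up its core business (and it comes with a hard-to-remember English abbreviation and logo).

Using the iFlytek SDK, you can feel the sincerity: much of the design is user-friendly, and many testing and usage interfaces are free, which earns a lot of goodwill. That is also why I am giving it so much free advertising.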

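2. Setting up the environment

Log in to the developer platform

Register an account and log in
Create a new application

Choose to create a new application:

You can fill in this form fairly freely, but make sure you pick the right platform.

Once the application is created, write down the Appid that iFlytek generates for it: 56678310 (everyone gets a different one)
Add services to the new application

The new application appears under "My Applications" (我的應用). At first it does not use any SDK, so we need to tell iFlytek which services our app needs.

Click "Enable more services" (開通更多服務) and select the two SDKs, speech dictation and online speech synthesis. The first entry, open semantics, is added automatically.
Download the SDK

Go to the SDK download page. There are several ways to get there, and the page may not match the screenshots, but that is not a problem.

Choose "Combined service SDK download" (組合服務SDK下載) and tick the first two items in the screenshot.

Select the platform

Finally, select the application you just created and click download.
Create a new Xcode (Single View) project and drag the iflyMSC framework from the lib folder of the downloaded package into the project
Add the required libraries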

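3. Code

In the view controller in the storyboard, drag in a few controls: one UILabel to display the text recognized from speech, and two UIButtons to trigger "live recognition with UI" and "live recognition without UI". Then create outlets and action methods for them.

In AppDelegate.m, add the following registration code to didFinishLaunchingWithOptions:

#import <iflyMSC/iflyMSC.h>

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Log in to the iFlytek server with your appid (mine was 56678310); this step authenticates the app.
    NSString *initString = [[NSString alloc] initWithFormat:@"appid=%@",@"your appid, do not use mine"];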
    [IFlySpeechUtility createUtility:initString];
    return YES;
}

Here is the finished ViewController code:

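#import "ViewController.h"
#import <iflyMSC/iflyMSC.h>
// What is this header for? See the note in onResult: further down.
#import "ISRDataHelper.h"
// And this one? It holds the recognition settings read in initSpeechRecognizer.
#import "IATConfig.h"


@interface ViewController ()<IFlyRecognizerViewDelegate, IFlySpeechRecognizerDelegate>
// The recognized text is shown on this label.
@property (weak, nonatomic) IBOutlet UILabel *textView;

/*!
 *  Speech recognition view
 *    While recording, tap the view to stop recording and start recognition (the old "stop"); tap elsewhere to cancel recording and end the session (cancel)
 *  On error, tap the view to restart the session (the old "say it again"); tap elsewhere to cancel recording and end the session (cancel)
 *
 */
@property (nonatomic,strong)IFlyRecognizerView * iflyRecognizerView;

/*!
 *  Speech recognition class
 *   This class is designed as a singleton: obtain the shared object and never call release/dealloc on it. All speech recognition operations go through this class.
 */
@property (nonatomic, strong)IFlySpeechRecognizer * iFlySpeechRecognizer;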
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    
    // Set up and configure the speech recognition view.
    [self initRecognizerView];

    // Initialize and configure the speech recognizer class.
    [self initSpeechRecognizer];
}

// !!!: Speech recognition view setup and configuration.
-(void)initRecognizerView{
    _iflyRecognizerView = [[IFlyRecognizerView alloc] initWithCenter:self.view.center];
    _iflyRecognizerView.delegate = self;
    [_iflyRecognizerView setParameter: @"iat" forKey: [IFlySpeechConstant IFLY_DOMAIN]];
    // ASR_AUDIO_PATH is the file name for the saved recording; set the value to nil to disable saving. The default directory is Documents.
    [_iflyRecognizerView setParameter:@"asrview.pcm" forKey:[IFlySpeechConstant ASR_AUDIO_PATH]];
}
// !!!: Speech recognizer class initialization and configuration.
-(void)initSpeechRecognizer{
    // Singleton; this is the instance without UI
    if (_iFlySpeechRecognizer == nil) {
        _iFlySpeechRecognizer = [IFlySpeechRecognizer sharedInstance];
        
        [_iFlySpeechRecognizer setParameter:@"" forKey:[IFlySpeechConstant PARAMS]];
        
        // Use dictation mode
        [_iFlySpeechRecognizer setParameter:@"iat" forKey:[IFlySpeechConstant IFLY_DOMAIN]];
    }
    _iFlySpeechRecognizer.delegate = self;
    
    if (_iFlySpeechRecognizer != nil) {
        IATConfig *instance = [IATConfig sharedInstance];
        
        // Maximum recording time
        [_iFlySpeechRecognizer setParameter:instance.speechTimeout forKey:[IFlySpeechConstant SPEECH_TIMEOUT]];
        // End-of-speech endpoint (trailing silence, VAD_EOS)
        [_iFlySpeechRecognizer setParameter:instance.vadEos forKey:[IFlySpeechConstant VAD_EOS]];
        // Beginning-of-speech endpoint (leading silence, VAD_BOS)
        [_iFlySpeechRecognizer setParameter:instance.vadBos forKey:[IFlySpeechConstant VAD_BOS]];
        // Network timeout
        [_iFlySpeechRecognizer setParameter:@"20000" forKey:[IFlySpeechConstant NET_TIMEOUT]];

        // Sample rate; 16 kHz is recommended
        [_iFlySpeechRecognizer setParameter:instance.sampleRate forKey:[IFlySpeechConstant SAMPLE_RATE]];

        if ([instance.language isEqualToString:[IATConfig chinese]]) {
            // Language
            [_iFlySpeechRecognizer setParameter:instance.language forKey:[IFlySpeechConstant LANGUAGE]];
            // Dialect
            [_iFlySpeechRecognizer setParameter:instance.accent forKey:[IFlySpeechConstant ACCENT]];
        }else if ([instance.language isEqualToString:[IATConfig english]]) {
            [_iFlySpeechRecognizer setParameter:instance.language forKey:[IFlySpeechConstant LANGUAGE]];
        }
        // Whether to return punctuation
        [_iFlySpeechRecognizer setParameter:instance.dot forKey:[IFlySpeechConstant ASR_PTT]];
    }
}

- (void)didReceiveMemoryWarning {
    [super didReceiveMemoryWarning];
    // Dispose of any resources that can be recreated.
}

// !!!: Tap handler: recognition with the built-in UI
- (IBAction)voiceToText:(id)sender {
    [self.iflyRecognizerView start];
}

// !!!: Tap handler: recognition through the recognizer class (no UI)
- (IBAction)voiceToTextWithoutUI:(id)sender {
    self.textView.text = @"";
    // Without the UI, any session still in progress has to be stopped manually.
    [_iFlySpeechRecognizer cancel];

    // Use the microphone as the audio source
    [_iFlySpeechRecognizer setParameter:IFLY_AUDIO_SOURCE_MIC forKey:@"audio_source"];

    // Return dictation results as JSON
    [_iFlySpeechRecognizer setParameter:@"json" forKey:[IFlySpeechConstant RESULT_TYPE]];

    // Save the recording in the SDK working directory; if none is set, it defaults to Library/Caches
    [_iFlySpeechRecognizer setParameter:@"asr.pcm" forKey:[IFlySpeechConstant ASR_AUDIO_PATH]];
    
    [_iFlySpeechRecognizer setDelegate:self];
    
    [_iFlySpeechRecognizer startListening];
}

// !!!: Delegate methods
// !!!: Watch for the trailing "s": this is the result callback for recognition with UI
-(void)onResult:(NSArray *)resultArray isLast:(BOOL)isLast
{
    NSMutableString *result = [[NSMutableString alloc] init];
    // firstObject returns nil for an empty array instead of throwing
    NSDictionary *dic = [resultArray firstObject];
    for (NSString *key in dic) {
        [result appendFormat:@"%@",key];
    }

    // Note: the recognition callback returns a JSON string. Parsing it is tedious, and we only need the text inside it. iFlytek found this tedious too, so it provides a helper class that parses the JSON and returns the final string. That is why ISRDataHelper.h was imported above.
    NSString * resu = [ISRDataHelper stringFromJson:result];
    self.textView.text = [NSString stringWithFormat:@"%@%@",self.textView.text,resu];
}
// !!!: Delegate method called when recognition fails
-(void)onError:(IFlySpeechError *)error
{
    NSLog(@"Recognition failed");
}

// !!!: Result callback of the recognizer class (recognition without UI)
- (void) onResults:(NSArray *) results isLast:(BOOL)isLast{
    NSMutableString *result = [[NSMutableString alloc] init];
    NSDictionary *dic = [results firstObject];
    for (NSString *key in dic) {
        [result appendFormat:@"%@",key];
    }
    NSString * resu = [ISRDataHelper stringFromJson:result];
    self.textView.text = [NSString stringWithFormat:@"%@%@", self.textView.text, resu];
}

@end

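The code above uses two classes:

#import "ISRDataHelper.h"
#import "IATConfig.h"

What they do is already described in the comments, but where do you get their source files? Let me think...
Never mind. I will upload this project to git at the end, and you can grab them from there.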
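Until you have the project files, here is a minimal sketch of what a helper like ISRDataHelper's stringFromJson: might do. It is not the SDK's own source. The function name TextFromDictationJSON is hypothetical, and the JSON layout (a "ws" array of words, each with a "cw" array of candidates whose "w" field holds the text) is an assumption that should be checked against real results logged from onResults:.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for [ISRDataHelper stringFromJson:].
// Assumed result layout: {"ws":[{"cw":[{"w":"text"}]}, ...]}
static NSString *TextFromDictationJSON(NSString *json) {
    if (json.length == 0) {
        return @"";
    }
    NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];
    id root = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];
    if (![root isKindOfClass:[NSDictionary class]]) {
        return @""; // invalid JSON or an unexpected top-level type
    }
    NSArray *words = ((NSDictionary *)root)[@"ws"];
    if (![words isKindOfClass:[NSArray class]]) {
        return @"";
    }
    NSMutableString *text = [NSMutableString string];
    for (id word in words) {
        if (![word isKindOfClass:[NSDictionary class]]) {
            continue;
        }
        NSArray *candidates = ((NSDictionary *)word)[@"cw"];
        if (![candidates isKindOfClass:[NSArray class]]) {
            continue;
        }
        // Take the first candidate for each word.
        id best = candidates.firstObject;
        if (![best isKindOfClass:[NSDictionary class]]) {
            continue;
        }
        id piece = ((NSDictionary *)best)[@"w"];
        if ([piece isKindOfClass:[NSString class]]) {
            [text appendString:piece];
        }
    }
    return text;
}
```

The type checks are deliberate: the result comes from the network, so the sketch returns an empty string rather than crashing when the layout is not what it expects.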

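4. Speech synthesis

iOS actually ships with its own speech synthesis, so you can get this effect without iFlytek. The code below makes your app read text aloud.
In AppDelegate.m, import AVFoundation, add a class extension, and add the following code to didFinishLaunchingWithOptions:

#import <AVFoundation/AVFoundation.h>

@interface AppDelegate ()
@property(nonatomic,strong)AVSpeechSynthesizer * speechSynthesizer; // the synthesizer
@property(nonatomic,strong)AVSpeechUtterance * speechUtterance; // what the synthesizer says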
@end

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Override point for customization after application launch.
    
    self.speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    
    self.speechUtterance = [[AVSpeechUtterance alloc] initWithString:@"啪啪啪"];
    
    [self.speechSynthesizer  speakUtterance:self.speechUtterance];
    
    return YES;
}

After running, the app reads "啪啪啪" aloud. The female voice sounds better.
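If the default voice does not suit the text, AVFoundation lets you pick a voice by language code and adjust the rate and pitch. This is a minimal sketch: the helper name MakeUtterance and the zh-CN language code in the usage line are assumptions for illustration, not part of the original project.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: builds an utterance with an explicit voice, rate, and pitch.
static AVSpeechUtterance *MakeUtterance(NSString *text, NSString *languageCode) {
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:text];
    // voiceWithLanguage: can return nil; in that case the system default voice is kept.
    AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:languageCode];
    if (voice != nil) {
        utterance.voice = voice;
    }
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate;
    utterance.pitchMultiplier = 1.0f; // valid range is 0.5 to 2.0
    return utterance;
}
```

With the synthesizer property from the code above, usage would look like `[self.speechSynthesizer speakUtterance:MakeUtterance(@"啪啪啪", @"zh-CN")];`.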

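Doing the same read-aloud with iFlytek
Let's add it directly to the project above.
First, drag one more text field into the view controller in the storyboard; iFlytek will read its contents aloud.

ViewController.m: this version puts all three features into one controller, which makes it rather bloated. Tidy it up yourself and wrap the pieces into classes for later reuse.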

#import "ViewController.h"
#import <iflyMSC/iflyMSC.h>
#import "ISRDataHelper.h"
#import "IATConfig.h"

#import "PcmPlayerDelegate.h"
#import "PcmPlayer.h"
#import "TTSConfig.h"

typedef NS_ENUM(NSInteger, SynthesizeType) {
    NomalType           = 5, // normal synthesis
    UriType             = 6, // URI synthesis
};

@interface ViewController ()<IFlyRecognizerViewDelegate, IFlySpeechRecognizerDelegate, IFlySpeechSynthesizerDelegate>
// The recognized text is shown on this label.
@property (weak, nonatomic) IBOutlet UILabel *textView;
// The text in this field is read aloud.
@property (weak, nonatomic) IBOutlet UITextField *VoiceText;

/*!
 *  Speech recognition view
 *    While recording, tap the view to stop recording and start recognition (the old "stop"); tap elsewhere to cancel recording and end the session (cancel)
 *  On error, tap the view to restart the session (the old "say it again"); tap elsewhere to cancel recording and end the session (cancel)
 *
 */
@property (nonatomic,strong)IFlyRecognizerView * iflyRecognizerView;

/*!
 *  Speech recognition class
 *   This class is designed as a singleton: obtain the shared object and never call release/dealloc on it. All speech recognition operations go through this class.
 */
@property (nonatomic, strong)IFlySpeechRecognizer * iFlySpeechRecognizer;


@property (nonatomic, strong) IFlySpeechSynthesizer * iFlySpeechSynthesizer; // speech synthesis object
@property (nonatomic, strong) PcmPlayer *audioPlayer; // plays the audio
@property (nonatomic, assign) SynthesizeType synType; // which synthesis mode is used
@property (nonatomic, assign) BOOL hasError; // whether an error occurred during synthesis
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    
    // Set up and configure the speech recognition view.
    [self initRecognizerView];

    // Initialize and configure the speech recognizer class.
    [self initSpeechRecognizer];

    // Initialize speech synthesis
    [self initMakeVoice];
}
// !!!: Speech recognition view setup and configuration.
-(void)initRecognizerView{
    _iflyRecognizerView = [[IFlyRecognizerView alloc] initWithCenter:self.view.center];
    _iflyRecognizerView.delegate = self;
    [_iflyRecognizerView setParameter: @"iat" forKey: [IFlySpeechConstant IFLY_DOMAIN]];
    // ASR_AUDIO_PATH is the file name for the saved recording; set the value to nil to disable saving. The default directory is Documents.
    [_iflyRecognizerView setParameter:@"asrview.pcm" forKey:[IFlySpeechConstant ASR_AUDIO_PATH]];
}
// !!!: Speech recognizer class initialization and configuration.
-(void)initSpeechRecognizer{
    // Singleton; this is the instance without UI
    if (_iFlySpeechRecognizer == nil) {
        _iFlySpeechRecognizer = [IFlySpeechRecognizer sharedInstance];
        
        [_iFlySpeechRecognizer setParameter:@"" forKey:[IFlySpeechConstant PARAMS]];
        
        // Use dictation mode
        [_iFlySpeechRecognizer setParameter:@"iat" forKey:[IFlySpeechConstant IFLY_DOMAIN]];
    }
    _iFlySpeechRecognizer.delegate = self;
    
    if (_iFlySpeechRecognizer != nil) {
        IATConfig *instance = [IATConfig sharedInstance];
        
        // Maximum recording time
        [_iFlySpeechRecognizer setParameter:instance.speechTimeout forKey:[IFlySpeechConstant SPEECH_TIMEOUT]];
        // End-of-speech endpoint (trailing silence, VAD_EOS)
        [_iFlySpeechRecognizer setParameter:instance.vadEos forKey:[IFlySpeechConstant VAD_EOS]];
        // Beginning-of-speech endpoint (leading silence, VAD_BOS)
        [_iFlySpeechRecognizer setParameter:instance.vadBos forKey:[IFlySpeechConstant VAD_BOS]];
        // Network timeout
        [_iFlySpeechRecognizer setParameter:@"20000" forKey:[IFlySpeechConstant NET_TIMEOUT]];

        // Sample rate; 16 kHz is recommended
        [_iFlySpeechRecognizer setParameter:instance.sampleRate forKey:[IFlySpeechConstant SAMPLE_RATE]];

        if ([instance.language isEqualToString:[IATConfig chinese]]) {
            // Language
            [_iFlySpeechRecognizer setParameter:instance.language forKey:[IFlySpeechConstant LANGUAGE]];
            // Dialect
            [_iFlySpeechRecognizer setParameter:instance.accent forKey:[IFlySpeechConstant ACCENT]];
        }else if ([instance.language isEqualToString:[IATConfig english]]) {
            [_iFlySpeechRecognizer setParameter:instance.language forKey:[IFlySpeechConstant LANGUAGE]];
        }
        // Whether to return punctuation
        [_iFlySpeechRecognizer setParameter:instance.dot forKey:[IFlySpeechConstant ASR_PTT]];
    }
}

// !!!: Speech synthesis initialization
-(void)initMakeVoice{
    TTSConfig *instance = [TTSConfig sharedInstance];
    if (instance == nil) {
        return;
    }

    // The synthesis service is a singleton
    if (_iFlySpeechSynthesizer == nil) {
        _iFlySpeechSynthesizer = [IFlySpeechSynthesizer sharedInstance];
    }

    _iFlySpeechSynthesizer.delegate = self;

    // Speaking speed, 1-100
    [_iFlySpeechSynthesizer setParameter:instance.speed forKey:[IFlySpeechConstant SPEED]];

    // Volume, 1-100
    [_iFlySpeechSynthesizer setParameter:instance.volume forKey:[IFlySpeechConstant VOLUME]];

    // Pitch, 1-100
    [_iFlySpeechSynthesizer setParameter:instance.pitch forKey:[IFlySpeechConstant PITCH]];

    // Sample rate
    [_iFlySpeechSynthesizer setParameter:instance.sampleRate forKey:[IFlySpeechConstant SAMPLE_RATE]];

    // Voice (speaker) name
    [_iFlySpeechSynthesizer setParameter:instance.vcnName forKey:[IFlySpeechConstant VOICE_NAME]];
}


- (void)didReceiveMemoryWarning {
    [super didReceiveMemoryWarning];
    // Dispose of any resources that can be recreated.
}

// !!!: Tap handler: recognition with the built-in UI
- (IBAction)voiceToText:(id)sender {
    [self.iflyRecognizerView start];
}
// !!!: Tap handler: recognition through the recognizer class (no UI)
- (IBAction)voiceToTextWithoutUI:(id)sender {
    self.textView.text = @"";
    [_iFlySpeechRecognizer cancel];

    // Use the microphone as the audio source
    [_iFlySpeechRecognizer setParameter:IFLY_AUDIO_SOURCE_MIC forKey:@"audio_source"];

    // Return dictation results as JSON
    [_iFlySpeechRecognizer setParameter:@"json" forKey:[IFlySpeechConstant RESULT_TYPE]];

    // Save the recording in the SDK working directory; if none is set, it defaults to Library/Caches
    [_iFlySpeechRecognizer setParameter:@"asr.pcm" forKey:[IFlySpeechConstant ASR_AUDIO_PATH]];
    
    [_iFlySpeechRecognizer setDelegate:self];
    
    [_iFlySpeechRecognizer startListening];
}
- (IBAction)speechAction:(id)sender {
    if ([self.VoiceText.text isEqualToString:@""]) {
        return;
    }
    
    if (_audioPlayer != nil && _audioPlayer.isPlaying == YES) {
        [_audioPlayer stop];
    }
    
    _synType = NomalType;
    
    self.hasError = NO;
    [NSThread sleepForTimeInterval:0.05];
    _iFlySpeechSynthesizer.delegate = self;
    [_iFlySpeechSynthesizer startSpeaking:self.VoiceText.text];
}

// !!!: Delegate methods
// !!!: Watch for the trailing "s": this is the result callback for recognition with UI
-(void)onResult:(NSArray *)resultArray isLast:(BOOL)isLast
{
    NSMutableString *result = [[NSMutableString alloc] init];
    // firstObject returns nil for an empty array instead of throwing
    NSDictionary *dic = [resultArray firstObject];
    for (NSString *key in dic) {
        [result appendFormat:@"%@",key];
    }

    // Note: the recognition callback returns a JSON string. Parsing it is tedious, and we only need the text inside it. iFlytek found this tedious too, so it provides a helper class that parses the JSON and returns the final string. That is why ISRDataHelper.h was imported above.
    NSString * resu = [ISRDataHelper stringFromJson:result];
    self.textView.text = [NSString stringWithFormat:@"%@%@",self.textView.text,resu];
}
// !!!: Delegate method called when recognition fails
-(void)onError:(IFlySpeechError *)error
{
    NSLog(@"Recognition failed");
}

// !!!: Result callback of the recognizer class (recognition without UI)
// This is a recognition callback, not a synthesis callback.
- (void) onResults:(NSArray *) results isLast:(BOOL)isLast{
    NSMutableString *result = [[NSMutableString alloc] init];
    NSDictionary *dic = [results firstObject];
    for (NSString *key in dic) {
        [result appendFormat:@"%@",key];
    }
    NSString * resu = [ISRDataHelper stringFromJson:result];
    self.textView.text = [NSString stringWithFormat:@"%@%@", self.textView.text, resu];
}
@end
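The listing above sets the controller as the synthesizer's delegate but does not implement any synthesis callbacks. As a rough sketch of what they could look like, the class below handles the start and completion events. The class name SpeechOutputLogger is hypothetical, and the callback names onSpeakBegin and onCompleted:, along with the errorCode and errorDesc properties, are written from memory of the iflyMSC headers, so verify them against the IFlySpeechSynthesizerDelegate.h that ships with your SDK version.

```objective-c
#import <Foundation/Foundation.h>
#import <iflyMSC/iflyMSC.h>

// Hypothetical helper class; the original post keeps everything in ViewController.
@interface SpeechOutputLogger : NSObject <IFlySpeechSynthesizerDelegate>
@end

@implementation SpeechOutputLogger

// Assumed callback: playback is about to start.
- (void)onSpeakBegin {
    NSLog(@"Started speaking");
}

// Assumed callback: synthesis has ended. An error code of 0 is assumed to mean success.
- (void)onCompleted:(IFlySpeechError *)error {
    if (error != nil && error.errorCode != 0) {
        NSLog(@"Synthesis failed: %d %@", error.errorCode, error.errorDesc);
    } else {
        NSLog(@"Finished speaking");
    }
}

@end
```

Moving the callbacks into a small class like this is one way to start on the cleanup suggested earlier: the controller keeps a strong reference to the logger and assigns it as the synthesizer's delegate instead of self.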
