
Linear Microphone Array

The M240 linear 4-microphone array can pick up sound sources within 5 m. Its sound source localization range is 0-180° (it cannot distinguish front from back), and its wake-up angle resolution is 5°.

The linear microphone array uses a fixed beam to pick up sound sources within a specified angle range while suppressing sound sources at other angles. A fixed beam is the audio for one particular direction that the module produces by filtering the signals of the array's microphones according to the differences in what each one picks up. The beam direction only determines the pickup direction. It does not affect or change the wake-up angle or any other wake-up information. The beam diagram is shown below.

[Figure: fixed-beam diagram]
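The M240's actual beamforming algorithm is not documented here. As a rough illustration of how a fixed beam can be steered with a linear array, the classic delay-and-sum method delays each microphone's signal so that sound arriving from the chosen angle lines up across channels, and then averages the channels. The sketch below assumes a 4-mic array with a hypothetical 35 mm spacing, a 16 kHz sample rate, and a speed of sound of 343 m/s. None of these values come from the M240 documentation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 °C)
SAMPLE_RATE = 16000     # Hz, assumed
MIC_SPACING = 0.035     # m, hypothetical spacing between adjacent mics
NUM_MICS = 4


def steering_delays(angle_deg, num_mics=NUM_MICS, spacing=MIC_SPACING,
                    fs=SAMPLE_RATE, c=SPEED_OF_SOUND):
    """Whole-sample delays that align a far-field source at angle_deg.

    angle_deg is measured from the array axis, pointing from mic 0 toward
    the last mic (0-180 degrees).
    """
    theta = math.radians(angle_deg)
    # Relative to mic 0, mic m hears the wavefront m * d * cos(theta) / c
    # seconds earlier (negative means later), so it needs that much extra delay.
    raw = [m * spacing * math.cos(theta) / c * fs for m in range(num_mics)]
    offset = min(raw)  # shift so every delay is non-negative
    return [round(r - offset) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the channels.

    channels: one equal-length list of samples per microphone.
    """
    length = len(channels[0])
    out = [0.0] * length
    for samples, d in zip(channels, delays):
        for n in range(d, length):
            out[n] += samples[n - d]
    return [v / len(channels) for v in out]
```

With these assumed values, steering_delays(0) returns [0, 2, 3, 5], and steering to 90° (broadside) needs no delay at all. Because cos θ takes the same value for a source at the same angle on either side of the array axis, a linear array cannot separate front from back. This matches the 0-180° range quoted above.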

GitHub repository

https://github.com/elephantrobotics/mercury_speech

Code flow

Offline wake-up -> recording -> online voice dictation -> online speech synthesis -> audio playback

How the code runs: after you start the Python script, say the wake-up word ("Xiaowei Xiaowei"). Once the wake-up word is detected, the script records for 4 s (the default) and saves the audio as a .pcm file. It then uploads the recording to iFLYTEK's online service and gets back the text produced by voice dictation. It also sends the reply text to iFLYTEK's online service, which synthesizes the reply audio. The script then plays that audio.
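The recordings and synthesized replies are raw .pcm files with no header, so most audio players cannot open them directly. Below is a minimal sketch that wraps such a file in a WAV header using Python's standard wave module. It assumes 16 kHz, 16-bit, mono samples. This matches the audio/L16;rate=16000 synthesis setting used in main.py, but the recording format is an assumption and should be checked against mercury_speech.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, sample_width=2, channels=1):
    """Copy raw PCM samples into a WAV container.

    Defaults assume 16 kHz, 16-bit (2-byte), mono audio.
    """
    with open(pcm_path, "rb") as f:
        frames = f.read()
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def pcm_duration(num_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Duration in seconds of a raw PCM buffer in the given format."""
    return num_bytes / (sample_rate * sample_width * channels)
```

For example, pcm_to_wav('/home/elephant/r818.pcm', '/home/elephant/r818.wav') produces a file that a normal player can open. In this format, the default 4 s recording is 4 × 16000 × 2 = 128,000 bytes.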

You need to apply for the online voice dictation and online speech synthesis services yourself.

Applying for iFLYTEK Services

https://www.xfyun.cn/

Go to the iFLYTEK Open Platform, register an account, log in, and complete real-name authentication.

1. On the home page, find Console and click it.

2. In the upper-left corner, click Create New App.

3. Fill in the application name, category, and description as needed, then submit.

4. Return to My Apps and click the application you just created.

5. Open the online voice dictation service. The application's authorization information is shown in the upper-right corner.

Note: a newly created application is on the free trial tier. The service quota for some features is valid for 30 or 90 days, depending on the feature. If you need a larger quota, you can purchase an additional package.


https://www.xfyun.cn/services/voicedictation

6. For online speech synthesis, apply for authorization to use the English speakers. The speech synthesis service is also a trial version. If you need a larger quota, you can purchase an additional package.

https://www.xfyun.cn/services/online_tts

7. Obtain the APPID, APISecret, and APIKey, and choose the Chinese/English parameters and the speech synthesis audio parameters.

The SpeechManager class manages the speech functions: voice wake-up, online speech recognition, online speech synthesis, recording, and PCM file playback.

Parameters:

  • APPID: the application's APPID, used for API calls.
  • APISecret: the API secret, used for authentication.
  • APIKey: the API key, used for API authorization.
  • Recognition_BusinessArgs: business parameters for online speech recognition, which control how recognition behaves.
  • Synthesis_BusinessArgs: business parameters for online speech synthesis, which control how synthesis behaves.
  • port: serial port (default '/dev/ttyACM0'), used for offline wake-up.
  • baudrate: serial port baud rate (default 115200), used for offline wake-up communication.
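This sketch shows what the APISecret and APIKey are used for. As we understand iFLYTEK's public WebSocket demos for online dictation, each connection is signed with an HMAC-SHA256 signature over the host, date, and request line, keyed by APISecret. The APIKey is then placed in a base64-encoded authorization string. The sketch reproduces that scheme. The host ws-api.xfyun.cn and path /v2/iat are taken from those demos, and we have not verified that speechmanager builds its URL in exactly this way.

```python
import base64
import hashlib
import hmac
from datetime import datetime
from time import mktime
from urllib.parse import urlencode
from wsgiref.handlers import format_date_time


def build_auth_url(api_key, api_secret, host="ws-api.xfyun.cn", path="/v2/iat", date=None):
    """Return a signed wss:// URL in the style of iFLYTEK's WebAPI demos (assumed scheme)."""
    if date is None:
        # RFC 1123 date, e.g. "Mon, 01 Jan 2024 00:00:00 GMT"
        date = format_date_time(mktime(datetime.now().timetuple()))
    # String to sign: host, date and request line, separated by newlines
    signature_origin = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signature_origin.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    # APIKey identifies the caller; the signature proves knowledge of APISecret
    authorization_origin = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode("utf-8")
    query = urlencode({"authorization": authorization, "date": date, "host": host})
    return f"wss://{host}{path}?{query}"
```

The APISecret itself is never sent; only the signature derived from it is. This is why it must be kept private even though the URL travels over the network.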


Environment Setup

1. Clone the code

git clone https://github.com/elephantrobotics/mercury_speech.git

2. Install the dependencies by entering the following commands in the terminal:

sudo apt-get install portaudio19-dev python3-dev libportaudio2
cd ~/mercury_speech/
pip install -r requirement.txt

Code Example

1. Identify the voice module's serial port

Enter cutecom in the terminal to open the serial port assistant, and look for the ttyACM* serial devices.

If you get a "/dev/ttyACM*: permission denied" error, enter:

sudo chmod 777 /dev/ttyACM*

To avoid repeating this after every reboot, add your user to the dialout group instead. Replace username with your own user name, then log out and back in for the change to take effect:

sudo usermod -aG dialout username
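Before starting main.py, you can check from Python whether the current user can open the voice module's port. This small sketch uses only the standard library. The /dev/ttyACM* pattern follows the text above, and check_acm_ports is a hypothetical helper, not part of mercury_speech.

```python
import glob
import os


def check_acm_ports(pattern="/dev/ttyACM*"):
    """Map each matching serial device to whether it is readable and writable."""
    return {
        port: os.access(port, os.R_OK | os.W_OK)
        for port in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    ports = check_acm_ports()
    if not ports:
        print("No /dev/ttyACM* devices found; is the voice module plugged in?")
    for port, ok in ports.items():
        print(f"{port}: {'OK' if ok else 'permission denied (see chmod/usermod above)'}")
```

os.access checks the real user and group IDs of the running process. After usermod, the check reports OK only in a new login session.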

The voice module sends the handshake request "a5 01 01 04 00 00 00 a5 00 00 00 b0" every 500 ms. The port that shows this data is the voice module (ttyACM3 in this example).
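The handshake bytes can also be checked programmatically. The frame appears to follow the layout of iFLYTEK's AIUI serial protocol. In that layout, a 0xA5 sync byte comes first, followed by a user ID, a message type, a little-endian 2-byte payload length, a 2-byte message ID, the payload, and a checksum chosen so that all bytes sum to 0 modulo 256. This layout is our assumption, not something stated in this document. The checksum does hold for the frame above: the first 11 bytes sum to 0x150, and 0x150 + 0xB0 = 0x200. Here is a minimal parser sketch.

```python
import struct

HANDSHAKE = bytes.fromhex("a5 01 01 04 00 00 00 a5 00 00 00 b0")


def parse_frame(frame):
    """Split one frame into fields, assuming the AIUI-style layout described above.

    Returns a dict, or None if the frame is malformed or the checksum fails.
    """
    if len(frame) < 8 or frame[0] != 0xA5:
        return None
    # Bytes 1-6: user ID, message type, payload length (LE), message ID (LE)
    user_id, msg_type, length, msg_id = struct.unpack_from("<BBHH", frame, 1)
    if len(frame) != 7 + length + 1:  # 7-byte header + payload + checksum
        return None
    # The checksum byte makes the sum of every byte a multiple of 256
    if sum(frame) % 256 != 0:
        return None
    return {
        "user_id": user_id,
        "msg_type": msg_type,
        "msg_id": msg_id,
        "payload": frame[7:7 + length],
    }


def find_handshake(buffer):
    """Return True if the handshake frame appears anywhere in the received bytes."""
    return HANDSHAKE in buffer
```

In practice, you would read about a second of data from each candidate port (for example with pyserial at 115200 baud, as in main.py) and pass it to find_handshake.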

2. In main.py, set port to the serial port you identified, then save the file.

3. Click the upper-right corner of the desktop, open System Settings, and select the microphone and speaker devices:

The output device is: Analog Output - USB Audio Device

The input device is: Multichannel Input - XFM-DP-V0.0.18

4. Enter python main.py in the terminal to run the program.

import json  # only needed if you uncomment the json.dumps() line below

from speechmanager import SpeechManager

if __name__ == "__main__":
    manager = SpeechManager(APPID='your_APPID',  # replace with the credentials of your own iFLYTEK application
                            APISecret='your_APISecret',
                            APIKey='your_APIKey',
                            Recognition_BusinessArgs = {"domain": "iat", "language": "zh_cn", "accent": "mandarin", "vinfo":1,"vad_eos":10000},     # Chinese online voice dictation
                            # Recognition_BusinessArgs = {"domain": "iat", "language": "en_us", "accent": "mandarin", "vinfo":1,"vad_eos":10000},   # English online voice dictation
                            Synthesis_BusinessArgs = {"aue": "raw", "auf": "audio/L16;rate=16000", "vcn": "xiaoyan", "tte": "utf8"},                # Chinese online speech synthesis
                            # Synthesis_BusinessArgs = {"aue": "raw", "auf": "audio/L16;rate=16000", "vcn": "x4_enus_luna_assist", "tte": "utf8"},  # English online speech synthesis
                            port = '/dev/ttyACM3',  # adjust to the serial port found in step 1
                            baudrate = 115200
                            )

    while True:
        # Check for new wake-up information
        wakeup_info = manager.get_wakeup_info()
        if wakeup_info:
            print("Wake word detected!")
            # Handle the wake-up information, e.g. print it
            # print(json.dumps(wakeup_info, indent=4))  # print the captured JSON data
            # print(f"Wakeup information: {wakeup_info}")
            print(f"Result: {wakeup_info['content']['result']}")
            print(f"Info: {wakeup_info['content']['info']}")

            manager.start_recording(4, '/home/elephant/r818.pcm')  # record for 4 s and save as r818.pcm (absolute path)

            transcribed_text = manager.online_speech_recognition('/home/elephant/r818.pcm')  # online voice dictation: upload r818.pcm to iFLYTEK

            print(f"Transcribed Text: {transcribed_text}")  # print the text produced by voice dictation

            # Online speech synthesis: send the text to iFLYTEK and save the audio as reply.pcm
            # (the text means "OK, I have heard your recording")
            manager.online_speech_synthesis('好的,你的录音我已听到', '/home/elephant/reply.pcm')

            manager.play('/home/elephant/reply.pcm')  # play the synthesized audio

            manager.play('/home/elephant/r818.pcm')  # play the recorded audio


Running Result

[Figure: running result]


Function Interface Description

online_speech_recognition(AudioFile)

  • Online speech recognition (voice dictation)

    param AudioFile: path of the recording file

    return: the recognized text

online_speech_synthesis(Text, pcm_file)

  • Online speech synthesis

    param Text: text to synthesize

    param pcm_file: name of the file in which to save the synthesized audio

play(filename)

  • Play the specified PCM file

    param filename: path of the PCM file

start_recording(TIME, pcm_file)

  • Set the recording parameters, validate the requested duration, and start recording

    param TIME: recording duration in seconds; must be between 0 and 60 (default 4)

    param pcm_file: name of the file in which to save the recording (default 'r818.pcm')

get_wakeup_info()

  • Get the wake-up information: if there is new data, return it and reset the event state.

    return: the wake-up information (wake_result)
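get_wakeup_info() is described as returning new wake-up data and then resetting an event. One plausible way to implement that pattern uses the standard threading module. A background reader stores the latest wake-up message and sets an Event, and the polling call consumes that message. This is a hypothetical illustration of the described behavior (the class name WakeupMailbox is invented), not the actual speechmanager code.

```python
import threading


class WakeupMailbox:
    """Hand the latest wake-up message from a reader thread to a polling loop."""

    def __init__(self):
        self._event = threading.Event()
        self._lock = threading.Lock()
        self._info = None

    def publish(self, info):
        """Called by the serial-reader thread when a wake-up message arrives."""
        with self._lock:
            self._info = info
            self._event.set()

    def get_wakeup_info(self):
        """Return new wake-up info once, then reset; None if nothing is new."""
        with self._lock:
            if not self._event.is_set():
                return None
            self._event.clear()
            info, self._info = self._info, None
            return info
```

Returning None when there is nothing new lets a polling loop like the `if wakeup_info:` check in main.py skip iterations. Clearing the event ensures that each wake-up is handled only once.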
