Adjusting the silence timeout for Azure speech-to-text in JavaScript

Question

I am using the Azure Speech SDK to transcribe speech to text with recognizeOnceAsync. My current code looks like this:

var SpeechSDK, recognizer, synthesizer;
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
new Promise(function(resolve) {
    recognizer.onend = resolve;
    recognizer.recognizeOnceAsync(
        function (result) {
            recognizer.close();
            recognizer = undefined;
            resolve(result.text);
        },
        function (err) {
            alert(err);
            recognizer.close();
            recognizer = undefined;
        }
    );
}).then(r => {
    console.log(`Azure STT interpreted: ${r}`);
});

In my HTML file, I import the Azure package in the following manner:

<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>

I want to extend the amount of silence allowed before recognizeOnceAsync returns its result, so that I can pause and take a breath without the method assuming that speech has ended. Is there a way to do this with fromDefaultMicrophoneInput? I have tried the following:

const SILENCE_UNTIL_TIMEOUT_MS = 5000;
speechConfig.SpeechServiceConnection_EndSilenceTimeoutMs = SILENCE_UNTIL_TIMEOUT_MS;
audioConfig.setProperty("Speech_SegmentationSilenceTimeoutMs", SILENCE_UNTIL_TIMEOUT_MS);

Unfortunately, neither of these settings extends the allowed silence.

For reference, I have been consulting the following resource: https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest

javascript azure speech-to-text azure-cognitive-services azure-speech

Answer 1

From your description, you need to set the segmentation silence timeout. However, a current issue in the JS SDK prevents PropertyId.Speech_SegmentationSilenceTimeoutMs from being applied correctly.

As a workaround, you can set the segmentation timeout through the connection's speech.context message:

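// With the browser bundle from the question, the classes live on the SpeechSDK global.
// In Node.js, require("microsoft-cognitiveservices-speech-sdk") instead.
const { SpeechConfig, SpeechRecognizer, Connection } = SpeechSDK;

// subscriptionKey and subscriptionRegion are placeholders for your own values.
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const recognizer = new SpeechRecognizer(speechConfig);

// Send a custom segmentation timeout with the speech.context message.
const connection = Connection.fromRecognizer(recognizer);
connection.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000
        }
    },
    mode: "Interactive"
});

recognizer.recognizeOnceAsync(
    (result) => {
        console.log("Recognition completed!!!");
        // Handle the recognition result
    },
    (error) => {
        console.log("Recognition failed. Error: " + error);
    });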

It's important to note that the segmentation timeout should fall within the range of 100-5000 ms (inclusive).
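
If you set this value in more than one place, it may help to check the range before sending it. The following is only a sketch, not part of the SDK: buildPhraseDetection and applySegmentationTimeout are hypothetical helper names, the integer check is an assumption on my part, and fakeConnection is a stand-in so the snippet runs without the SDK. In real code you would pass the object returned by Connection.fromRecognizer(recognizer).

```javascript
// Bounds quoted in the answer above (inclusive).
const MIN_SEGMENTATION_SILENCE_MS = 100;
const MAX_SEGMENTATION_SILENCE_MS = 5000;

// Hypothetical helper: builds the same "phraseDetection" payload as the
// workaround above, rejecting values outside the stated range.
// Requiring an integer is an assumption, not something the answer states.
function buildPhraseDetection(timeoutMs) {
    if (!Number.isInteger(timeoutMs) ||
        timeoutMs < MIN_SEGMENTATION_SILENCE_MS ||
        timeoutMs > MAX_SEGMENTATION_SILENCE_MS) {
        throw new RangeError(
            "segmentationSilenceTimeoutMs must be an integer between " +
            MIN_SEGMENTATION_SILENCE_MS + " and " + MAX_SEGMENTATION_SILENCE_MS +
            ", got " + timeoutMs);
    }
    return {
        INTERACTIVE: {
            segmentation: {
                mode: "custom",
                segmentationSilenceTimeoutMs: timeoutMs
            }
        },
        mode: "Interactive"
    };
}

// Hypothetical helper: applies the payload to a connection obtained from
// Connection.fromRecognizer(recognizer).
function applySegmentationTimeout(connection, timeoutMs) {
    connection.setMessageProperty(
        "speech.context", "phraseDetection", buildPhraseDetection(timeoutMs));
}

// Stand-in connection that records calls, so the sketch runs without the SDK.
const fakeConnection = {
    sent: [],
    setMessageProperty(path, name, value) {
        this.sent.push({ path, name, value });
    }
};

applySegmentationTimeout(fakeConnection, 3000);
console.log(JSON.stringify(fakeConnection.sent[0].value, null, 2));
```

Throwing early on an out-of-range value makes a misconfiguration visible in your own code instead of leaving you to wonder why the silence window did not change.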
