I am currently using the Azure Speech SDK to transcribe speech to text with recognizeOnceAsync. My existing code is as follows:
var SpeechSDK, recognizer, synthesizer;
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
new Promise(function (resolve) {
  recognizer.onend = resolve;
  recognizer.recognizeOnceAsync(
    function (result) {
      recognizer.close();
      recognizer = undefined;
      resolve(result.text);
    },
    function (err) {
      alert(err);
      recognizer.close();
      recognizer = undefined;
    }
  );
}).then(r => {
  console.log(`Azure STT interpreted: ${r}`);
});
In my HTML file, I import the Azure package in the following manner:
<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>
My problem is that I want to extend the amount of silence allowed before recognizeOnceAsync returns its result. I want to be able to pause and take a breath without the method assuming that speech has ended. Is there a way to achieve this with fromDefaultMicrophoneInput? I have tried several approaches, such as:
const SILENCE_UNTIL_TIMEOUT_MS = 5000;
speechConfig.SpeechServiceConnection_EndSilenceTimeoutMs = SILENCE_UNTIL_TIMEOUT_MS;
audioConfig.setProperty("Speech_SegmentationSilenceTimeoutMs", SILENCE_UNTIL_TIMEOUT_MS);
Unfortunately, none of these attempts extends the allowed silence time.
For reference, I have been consulting the following resource: https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest
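To make the intent concrete, here is a minimal sketch of how I understand these settings are meant to be applied, going through speechConfig.setProperty with the PropertyId enum from the page linked above rather than assigning a field directly. The helper name applySilenceTimeouts and its sdk parameter are hypothetical and only there to keep the snippet self-contained. I am assuming that property values have to be passed as strings, and I have not confirmed that either property actually changes when recognizeOnceAsync decides that speech has ended.

```javascript
// Hypothetical helper (not part of the SDK): applies the same timeout to both
// silence-related properties through SpeechConfig.setProperty.
// `sdk` is expected to be the SpeechSDK global loaded by the script tag.
function applySilenceTimeouts(sdk, speechConfig, timeoutMs) {
  // Assumption: setProperty expects string values, so convert the number.
  var value = String(timeoutMs);

  // Silence that marks the end of a spoken phrase (segmentation).
  speechConfig.setProperty(
    sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs,
    value
  );

  // End-of-speech silence timeout on the service connection.
  speechConfig.setProperty(
    sdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs,
    value
  );

  return speechConfig;
}
```

In my page this would be called as applySilenceTimeouts(SpeechSDK, speechConfig, 5000) before the SpeechRecognizer is constructed, so that the recognizer is created from the already-configured speechConfig.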