Speech Recognition

BitSmith · 11 Aug 2017, 13:41

Greetings!

I am attempting to incorporate very simple speech recognition into a simple script. Please see https://autohotkey.com/board/topic/9645 ... cognition/ for what I am using to accomplish this.

I have downloaded both the Speech Recognition.ahk from https://gist.github.com/Uberi/6263822
and
SpeechSDK5.1 from https://www.microsoft.com/en-us/downloa ... x?id=10121

When running the ahk file, I am receiving the following error and am unsure as to why.

Error: Could not set Grammar diction state: 0x88890008 -
Source: (null)
Description: (null)
HelpFile: (null)
HelpContext: 0

--> Line 238: Throw, Exception(Could not set Grammar diction state)

Any help would be MUCH appreciated. Thanks so much!

SOTE · 05 Jan 2018, 19:29

I was looking at this, and found Uberi's script to still work. This can be the basis of speech recognition that links to hotkeys or various commands from your script. However, a person must do some preliminary work.

You likely already have Speech Recognition installed if you are using Windows 8 or Windows 10, so downloading SAPI or the Speech Platform SDK may be unnecessary. It is probably best to first test whether Speech Recognition is already working on your computer.

1) The person should make sure that their microphone is working and plugged in.

2) Check if Speech Recognition is working on your computer by going to the Control Panel and selecting "Start Speech Recognition".
2B) You can also do a search for Speech Recognition; Windows logo key + S, then type Speech Recognition

3) Go through the steps of Start Speech Recognition. You will see a large Windows Speech Recognition microphone at the top center of your desktop.

Note 1- From that point, you have confirmed that Speech Recognition is working properly on your computer. You can now close it. Uberi's script does NOT require the Windows Speech Recognition microphone. It was just being used, in this case, to confirm that Speech Recognition was working.

Note 2- I did notice that Speech Recognition on Windows can be a bit flaky, so you might have to go to task manager and kill it, then restart it again.

4) Uberi's script includes several examples at the top. You can edit one of them so that it works for testing purposes: remove the "/*" and "*/" comment markers around it.
https://gist.github.com/Uberi/6263822

Example: recognizing a specific list of phrases

Code:

TrayTip, Speech Recognition, Say a number between 0 and 9 inclusive
s := new SpeechRecognizer
s.Recognize(["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"])
Text := s.Prompt()
TrayTip, Speech Recognition, You said: %Text%
Sleep, 3000
ExitApp

GeneBene · 05 Jan 2018, 23:47

Would you know how I can write what I speak to a text file?

Delta Pythagorean · 06 Jan 2018, 13:45

You can grab what you say by doing this.
You must put the class from the main script in a separate file (the one referenced in the #Include line).

Code:

#NoEnv
#SingleInstance, Ignore
#Warn All
#Warn LocalSameAsGlobal, Off
#Persistent

OnExit, Exit

S := New CustomSpeech(A_ScriptDir "\Log_Speech.log")
S.Recognize(True)
Return

Exit:
	OnExit, % S := "" ; Keep this here so when the program closes, it doesn't go in a loop.
	ExitApp

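Class CustomSpeech Extends SpeechRecognizer {
	__New(File) {
		base.__New() ; Run the parent constructor so the recognizer is set up (assumes it takes no parameters, as in Uberi's gist).
		This.FileObj := FileOpen(File, "w") ; Use the File parameter; This.File was never set.
	}

	OnRecognize(Text) {
		; Every time you finish saying something, it is written to the file.
		FormatTime, Time,, M/d/yyyy h:mm tt
		This.FileObj.Write("`n" Time ": " Text)
	}
}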

#Include, %A_ScriptDir%\SpeechRecognizer.ahk

theimmersion · 07 Jan 2018, 08:44

I tried it; sadly, it's not working well enough. I added a few keywords to recognize, but it recognizes some of them when I say completely different things, which makes it useless.

theimmersion · 07 Jan 2018, 08:46

Man, if only I could somehow make Cortana run a script and pass the word it recognized as an argument to an AHK script. I could literally do anything via speech, because Cortana seems to recognize my words really well. Sadly, it's so limited that it too is useless to me. xD

A_AhkUser · 07 Jan 2018, 14:07

As for me, I had no luck using it, since it does not seem to recognize any language other than English (so, by the way, if anyone knows how to set the recognition language, especially programmatically, I would be very grateful...). When I need speech recognition capabilities in a script, I use this interface [francophone forum], written in AHK and able to communicate with a Chrome extension, since I do not have Windows 10 and so cannot try Cortana. I'm a native French speaker, but I found that it also recognizes me very well when I speak Russian or Spanish, for example.

A simple example that displays interim results in a TrayTip and runs a program if a specific word has been pronounced in the selected recognition language:

Code:

#NoEnv
#SingleInstance ignore
#KeyHistory 0
SetWorkingDir % A_ScriptDir
SendMode, Input
#NoTrayIcon
; #Warn
#Warn, ClassOverwrite, Off ; 1.1.27.00+

#Include %A_ScriptDir%\Class.Dictation.ahk

global Sr
global wordToRecognize := "blabla" ; the word to recognize

if not (Sr:=new Dictation()) {
	MsgBox, 64,, Could not initialize Dictation.
	ExitApp
} else Sr.onInterimResult("updateInterimResults"), Sr.onResult("saveToClipboard") ; specify callback functions
Sr.setRecognitionLanguage("Français") ; set the recognition language using its native name (here french)
for __language, __LID in Dictation.LID ; for all available languages...
	Menu, Languages#Language, add, % __language, TrayMenu_Languages#Language
Menu, Tray, add, Language, :Languages#Language
Menu, Languages#Language, Check, % Sr.recognitionLanguage
Menu, Tray, Icon ; shows the tray icon once Dictation interface is loaded
OnExit, handleExit
return

!r:: ; start/stop recognition
	Sr.recognitionToogleState()
	if (ErrorLevel) {
		MsgBox, 64,, Could not interact with Dictation.`r`nThe program will exit.
		ExitApp
	}
return

handleExit:
	Sr := "" ; release the reference to Sr (this will also automatically close the hidden instance of chrome)
ExitApp

TrayMenu_Languages#Language(__itemName, __itemPos, __menuName) {
	Menu, % __menuName, Uncheck, % Sr.recognitionLanguage
	Sr.setRecognitionLanguage(__itemName)
	Menu, % __menuName, Check, % Sr.recognitionLanguage
}

updateInterimResults(__dictation, __lastInterimResult) { ; updateInterimResults callback

	if (__dictation.waitForInterimResultTimeRemaining) { ; If there is still any time left to recognize...
		TrayTip, % A_ScriptName, % __lastInterimResult,, 0x1 ; display any interim result in a traytip
		if (InStr(__lastInterimResult, wordToRecognize)) { ; if the word-to-recognize has been pronounced...
			run % "https://duckduckgo.com/?q=" . wordToRecognize ; ...search it using DuckDuckGo
			__dictation.recognitionToogleState() ; stop the recognition
		}
	} else __dictation.recognitionToogleState() ; otherwise stop the recognition

}
saveToClipboard(__dictation, __result) { ; onResult callback
	clipboard := __result
	TrayTip, % A_ScriptName, Result has been copied to clipboard.
}
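
On the side question of setting the recognition language programmatically with SAPI itself, here is a minimal, untested sketch. It assumes that a recognizer for the target language is actually installed in Windows, and it uses the SAPI automation calls GetRecognizers and the Recognizer property; the attribute string "Language=40C" (French, LCID 0x040C written in hex without the 0x prefix) is only an example. With an in-process recognizer, the engine would have to be selected before the audio input and recognition context are set up.

```autohotkey
#NoEnv
; Sketch only: list the installed SAPI recognizers, then try to select a French one.
recognizer := ComObjCreate("SAPI.SpInprocRecognizer")

; Enumerate every installed recognition engine (the collection is zero-based).
engines := recognizer.GetRecognizers()
list := ""
Loop, % engines.Count
	list .= engines.Item(A_Index - 1).GetDescription() "`n"
MsgBox, % "Installed recognizers:`n" list

; Ask only for engines whose Language attribute matches French (France).
french := recognizer.GetRecognizers("Language=40C")
if (french.Count > 0)
	recognizer.Recognizer := french.Item(0) ; select the engine before creating the recognition context
else
	MsgBox, No French recognizer appears to be installed.
```

If the second list is empty, the language pack for that language probably has no speech recognition component installed, and no script-side setting will change that.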

07 Jan 2018, 14:50

Not sure if translation is one of the services it triggers ...
[probably OT] [Sirius] and its successor [Lucida] [/probably OT]

scriptor2016 · 07 Jan 2018, 14:55

I've been running this script for years and it works perfectly. Found it somewhere on the forums, doesn't say the author in it. But try it out..

Code:

#Persistent
#SingleInstance
; For voice recognition to work you need Microsoft SAPI installed in your PC, some versions of Windows don't support voice recognition though.
; You may also need to train voice recognition in Windows so that it will understand your voice.

MsgBox, Please wait until voice recognition is activated. You will be notified when that happens. Press OK to continue.

pspeaker := ComObjCreate("SAPI.SpVoice")

;plistener := ComObjCreate("SAPI.SpSharedRecognizer") 

plistener:= ComObjCreate("SAPI.SpInprocRecognizer") ; For not showing Windows Voice Recognition widget.


paudioinputs := plistener.GetAudioInputs() ; For not showing Windows Voice Recognition widget.

plistener.AudioInput := paudioinputs.Item(0)   ; For not showing Windows Voice Recognition widget.

paudioinputs := "" ; Release the collection, it is not needed anymore (ObjRelease expects a raw pointer, not a COM wrapper object).

pcontext := plistener.CreateRecoContext()

pgrammar := pcontext.CreateGrammar()

pgrammar.DictationSetState(0)

prules := pgrammar.Rules()

prulec := prules.Add("wordsRule", 0x1|0x20)

prulec.Clear()

pstate := prulec.InitialState()

;================================================================================
; Add your words here:

pstate.AddWordTransition( ComObjParameter(13,0) , "Apple") ; ComObjParameter(13,0) is value Null for AHK_L
pstate.AddWordTransition( ComObjParameter(13,0) , "Banana") ; ComObjParameter(13,0) is value Null for AHK_L
;================================================================================



prules.Commit()

pgrammar.CmdSetRuleState( "wordsRule", 1)

prules.Commit()

ComObjConnect(pcontext, "On")

If (pspeaker && plistener && pcontext && pgrammar && prules && prulec && pstate)
{
   pspeaker.speak("Voice recognition initialisation succeeded. Available voice commands:")
   MsgBox, Voice recognition initialisation succeeded. Available voice commands:`nApple`nBanana
}
Else
{
   pspeaker.speak("Voice recognition initialisation failed")
   MsgBox, Voice recognition initialisation failed
}
return

OnRecognition(StreamNum,StreamPos,RecogType,Result)
{
   Global pspeaker

   MsgBox, Command Recognised.

   ; Grab the text we just spoke
   pphrase := Result.PhraseInfo()
   sText := pphrase.GetText()

   pspeaker.Speak("You said " sText)
   MsgBox, Command is %sText%

   voice_command = %sText%

   ; If a label with the same name as the voice command exists, go to that subroutine.
   if (IsLabel(voice_command))
      gosub, %voice_command%

   pphrase := "" ; Release the COM object; sText is a plain string and needs no release.
}

;;;; Voice command Labels
Apple: 
Msgbox, You said Apple
Return 

Banana: 
Msgbox, You said Banana
Return

07 Jan 2018, 15:24

scriptor2016 wrote:I've been running this script for years and it works perfectly. Found it somewhere on the forums, doesn't say the author in it. But try it out..

[...]

The oldest thread that I've found that uses 'your' (AHK_L) code is [here]. If that's it, the credit would go to our fellow member sp.

scriptor2016 · 07 Jan 2018, 15:36

That looks like it's it! Credit to sp for making an awesome voice script

scriptor2016 · 07 Jan 2018, 20:12

Regarding the code that I posted in this topic. When I run the script, it sets my microphone input recording level to '62'. This is a problem, as I need to keep the input level much lower at all times such as, say, '5'. But when I run the script it bumps it back up to '62'.

Would anyone happen to know exactly where in the code it's resetting the microphone level? I've tried cancelling out many of the lines one-by-one in the code to see which one is resetting the mic, but so far no dice.

The code is the same as in my earlier post above.

theimmersion · 09 Jan 2018, 21:20

I just tried the voice recognition script from sp and it's essentially performing the same as Uberi's script. I talk random stuff, trying to use words that don't even remotely sound like "apple", but it keeps popping up "You said Apple".

Still in search of something that will enable me to do some crazy stuff with ahk and voice recognition.

A_AhkUser · 09 Jan 2018, 23:04

Same here, it keeps popping up 'You said Apple', so I think it is an advertisement that just isn't called one...

theimmersion wrote:Still in search of something that will enable me to do some crazy stuff with ahk and voice recognition.

Did you try the script I provided? It still works fine for me, and I successfully implemented voice recognition capabilities this way in a previous project. I posted a snippet in the forum some time ago that demonstrates how you can even do without a Chrome extension. Admittedly, this solution works only with Chrome.

theimmersion · 15 Jan 2018, 10:31

Yep, twice.
It didn't work. The first time, it tried to open some site that got blocked by Chrome's native pop-up blocker; when I tried again, it just didn't start anymore.
It is bad practice on your part for test scripts to use #NoTrayIcon and to open random internet sites, at least without asking me first.
I take it it's a test script because it's in the help forum. You should include #NoTrayIcon but comment it out with ;.
It is also bad practice on my part not to check it before I ran it.
I've got lots of work and no time to debug the script, sorry.

But I still think it won't come even close to Cortana's capabilities. Makes me think, WHY THE F*** didn't they add a better API for it.
Honestly, the capabilities are hindered by 90%. All I'm asking for is one custom word to start the listening procedure, and after that to pass any recognized words to whatever .exe as a parameter. >:(
OK, admitting here, the special word would be "Hey GLaDOS", and any words after that would execute this procedure:
Recognized words > run some AHK script with the recognized words as parameters; the AHK script would take action based on those parameters.
Something like "open downloads", "open documents", "select this folder", etc. Pretty much anything would be possible this way via AHK.
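
The dispatch step described here could be sketched roughly as follows, independently of which engine does the recognizing. The function name GetCommandAfterWakeWord and the handler script VoiceActions.ahk are hypothetical, and the recognized phrase is hard-coded for illustration; in practice it would come from whatever recognizer callback is in use.

```autohotkey
#NoEnv
; Hypothetical helper: returns the words spoken after the wake phrase,
; or an empty string when the phrase does not start with it.
GetCommandAfterWakeWord(Heard, WakeWord := "hey glados") {
	Heard := Trim(Heard)
	; != is case-insensitive by default in AHK v1 (StringCaseSense Off).
	if (SubStr(Heard, 1, StrLen(WakeWord)) != WakeWord)
		return ""
	; Drop the wake phrase, then strip spaces, commas and periods from both ends.
	return Trim(SubStr(Heard, StrLen(WakeWord) + 1), " ,.")
}

; Example phrase; a real script would receive this from the recognizer.
Command := GetCommandAfterWakeWord("Hey GLaDOS, open downloads")
if (Command != "")
	Run, "%A_AhkPath%" "%A_ScriptDir%\VoiceActions.ahk" "%Command%" ; VoiceActions.ahk is a hypothetical handler script
```

The handler script would then read the phrase from its first command-line parameter (%1% in AHK v1) and decide what to do with it.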

A_AhkUser · 15 Jan 2018, 22:42

theimmersion wrote:Didnt work, the first time it tried to open some site that got blocked by some native chrome pop-up blocker

I guess you are talking about an ERR_BLOCKED_BY_CLIENT message. Actually, there's no need to debug the script. In Dictation.__New one can find:

Code:

RegRead, __regKey, HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\App Paths\Chrome.exe
; ...
run % """" . __regKey . """ --app=" . (__url:="chrome-extension://" . Dictation.ID . "/popup.html#progress"),, UseErrorLevel

This run command will lead to this message only in the following two cases:

- Dictation.ID is blank (that is, you did not specify the ID on the top of Class.Dictation).
- The extension was disabled the moment you start the script.

theimmersion wrote:It is bad practice on your part for test scripts to use #NoTrayIcon

It is also bad practice not to read comments:

Code:

Menu, Tray, Icon ; shows the tray icon once Dictation interface is loaded

#NoTrayIcon is not bad practice as long as you make sure you can show the icon thereafter (for example by means of a hotkey). I had assumed that Dictation.__New would necessarily return, since once Chrome is launched __New uses only a few window commands that cannot prevent the script from reaching the Menu, Tray, Icon command (or the ExitApp upon ErrorLevel)... but it appears that one of those commands, a WinWait, lacked its timeout parameter... an omission which, by the way, also explains why, when you

theimmersion wrote:tried it again [...] it just doesnt start anymore.

since #SingleInstance ignore is specified at the top of the script (the script was still waiting for the window)... I admit it is a fault on my part...
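
For illustration, a WinWait with a timeout might look like the sketch below. The window criteria (ahk_exe chrome.exe) and the 10-second timeout are assumptions chosen for the example, not the values used in Class.Dictation.ahk.

```autohotkey
#NoEnv
; Wait at most 10 seconds for a Chrome window instead of waiting forever.
WinWait, ahk_exe chrome.exe,, 10
if (ErrorLevel) { ; WinWait sets ErrorLevel to 1 when it times out
	MsgBox, 48,, The Chrome window did not appear within 10 seconds.
	ExitApp
}
; From here on the window exists, and the script can go on to show its tray icon.
Menu, Tray, Icon
```

Without the third parameter, WinWait blocks indefinitely, so a script started with #NoTrayIcon and #SingleInstance ignore stays hidden and silently refuses later launches.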

In any event, by calendar coincidence, the website has got a new skin, so I updated the code. Even if I see that you fell in love with Cortana...

Hopefully some ahk superstar could help you in this direction.

Cheers.

theimmersion · 16 Jan 2018, 09:50

Menu, Tray, Icon ; shows the tray icon once Dictation interface is loaded

I did read it, but then forgot about it afterwards. So I wanted to point it out.

Sometimes scripts bug out (a code error, or my c**ppy laptop swallowing an instruction? I know... weird) and remain running without any visual indicator for me to notice and kill them.
I don't really want to regularly search through Task Manager to kill scripts, but oh well.

Well, I'm not really in love with Cortana. I'm simply interested in whatever could provide good voice recognition. The only reason I kind of want to force Cortana is that it's native to Windows in my case, since I am using Win 10. And I'm focusing on either AHK-based functionality or what is native to the OS, because permissions, licensing, etc. are a pain in the a** when I want to release something publicly with third-party programs. And don't get me started on wanting to add (god forbid) a donation option to perhaps help with keeping the server up. Know what I mean? And I'm rather shocked how well Cortana works since... ya know... it's Microsoft.

Could you somehow send me some kind of i*iot-proof zip with all the scripts needed for a test? Lots of work, little time.
