Using WinHttp to Send Audio File to Whisper on OpenAI for Transcription

dox · 17 May 2024, 13:36

Reference: viewtopic.php?f=76&t=82198#p358662

Background:

viewtopic.php?f=82&t=127698#p571301
https://github.com/kdalanon/ChatGPT-AutoHotkey-Utility/blob/main/ChatGPT%20AutoHotkey%20Utility.ahk
https://platform.openai.com/docs/api-reference/making-requests
https://platform.openai.com/docs/guides/speech-to-text/quickstart

Greetings! I'm completely new to v2 and am trying to accomplish something very practical, but I keep bumping into walls. I simply want to use "native AHK v2" to send an audio file from my hard drive to OpenAI for quasi-real-time transcription. As we know, there is so far no "AHK library" among OpenAI's community-maintained libraries.

Nonetheless, folks have used the JXON library and WinHttp to interact with ChatGPT successfully. I'm not that interested in ChatGPT itself, but I would like to get live transcription from OpenAI's hosted Whisper. My experience with Colab versions of Whisper and other instances has not been that positive, whereas transcribing audio through OpenAI itself has been fast and painless. Using the Python snippets OpenAI publishes, uploading an audio file is a cinch, and it's not difficult to call the Python script from AHK with a hotkey, etc.

However, I would prefer an "all-AHK" method.

Below is a bare-bones script, tested and working with ChatGPT, that returns a random aphorism by Oscar Wilde:

Code:

#Requires AutoHotkey v2.0.2
#SingleInstance
#Include "_jxon.ahk"

API_Key := "sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
API_URL := "https://api.openai.com/v1/chat/completions"
API_Model := "gpt-4"
Prompt := "Give me a random aphorism by Oscar Wilde"
Messages := '{ "role": "user", "content": "' Prompt '" }'

F2::
{

    ; Create the WinHttp COM object and open an asynchronous POST request
    WHR := ComObject("WinHttp.WinHttpRequest.5.1")
    WHR.open("POST", API_URL, true)
    WHR.SetRequestHeader("Content-Type", "application/json")
    WHR.SetRequestHeader("Authorization", "Bearer " API_Key)
    JSON_Request := '{ "model": "' API_Model '", "messages": [' Messages '] }'
    ; Resolve, connect, send and receive timeouts, in milliseconds
    WHR.SetTimeouts(60000, 60000, 60000, 60000)
    WHR.Send(JSON_Request)

    WHR.WaitForResponse
    try
    {
        if (WHR.status == 200)
        {
            ; Decode the raw response bytes as UTF-8 instead of relying on ResponseText
            SafeArray := WHR.responseBody
            pData := NumGet(ComObjValue(SafeArray) + 8 + A_PtrSize, 'Ptr')  ; SAFEARRAY.pvData
            length := SafeArray.MaxIndex() + 1
            JSON_Response := StrGet(pData, length, 'UTF-8')
            ; Parse the JSON and pull out the assistant's reply text
            var := Jxon_Load(&JSON_Response)
            JSON_Response := var.Get("choices")[1].Get("message").Get("content")
            MsgBox(JSON_Response)
        } 
    }

}
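One caveat with the script above: Prompt is spliced into the JSON by plain concatenation, so a prompt containing a double quote, backslash or line break produces invalid JSON. A minimal escaping helper could look like the sketch below. JsonEscape is a hypothetical name, not part of _jxon.ahk, and it only handles backslash, double quote, CR, LF and TAB; other control characters are left as-is.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper (not part of _jxon.ahk): escapes text for use inside a JSON string literal.
; Handles backslash, double quote, CR, LF and TAB only; other control characters are not escaped.
JsonEscape(s) {
    s := StrReplace(s, "\", "\\")   ; must run first so the backslashes added below are not doubled
    s := StrReplace(s, '"', '\"')
    s := StrReplace(s, "`r", "\r")
    s := StrReplace(s, "`n", "\n")
    s := StrReplace(s, "`t", "\t")
    return s
}

; Usage: a prompt containing quotes would otherwise break the hand-built JSON
Prompt := 'Explain the phrase "carpe diem" in one line'
Messages := '{ "role": "user", "content": "' JsonEscape(Prompt) '" }'
```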

However, I can't get the Whisper transcription request below to work. I suspect the hold-up is the file upload, specifically the "multipart/form-data" encoding.

Code:

#Requires AutoHotkey v2.0.2
#SingleInstance
#Include "_jxon.ahk"

API_Key := "sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
SAPI_URL := "https://api.openai.com/v1/audio/transcriptions"
;;FilePath := '{ "C:\PathToFile\Recording.m4a" }'
path := "C:\PathToFile\Recording.m4a"
API_Model := "whisper-1"
JSON_Request := '{ "model": "' API_Model '" }'

F2::
{
    SplitPath path, &fileName
    f := FileOpen(path, "r")
    sfArray := ComObjArray(VT_UI1:=0x11, f.length)
    pvData := NumGet(ComObjValue(sfArray) + 8+A_PtrSize, 'Ptr')
    f.RawRead(pvData + 0, f.length)
    f.Close()

    WHR := ComObject("WinHttp.WinHttpRequest.5.1")
;;    WHR.open("POST", SAPI_URL, true)
    WHR.open("PUT", "https://api.openai.com/v1/audio/transcriptions" fileName, true)
    WHR.SetRequestHeader("Content-Type", "multipart/form-data")
    WHR.SetRequestHeader("Authorization", "Bearer " API_Key)
    WHR.SetTimeouts(60000, 60000, 60000, 60000)
;;    WHR.Send(FilePath)
;;    WHR.Send(JSON_Request)
    WHR.Send(sfArray)


    WHR.WaitForResponse
    try 
    {
        if (WHR.status == 200) 
        {
            SafeArray := WHR.responseBody
            pData := NumGet(ComObjValue(SafeArray) + 8 + A_PtrSize, 'Ptr')
            length := SafeArray.MaxIndex() + 1
            JSON_Response := StrGet(pData, length, 'UTF-8')
            var := Jxon_Load(&JSON_Response)
            JSON_Response := var.Get("choices")[1].Get("message").Get("content")
            MsgBox(JSON_Response)
        } 
    }

}

Any ideas on how I might use WinHttp to send an audio file through the OpenAI speech-to-text API? If needed, I can post an API key for anyone to test with, although OpenAI can be fairly trigger-happy about banning a key if it detects what it considers an "unusual" usage pattern.

Thanks in advance!
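A minimal sketch of a WinHttp version that should get past the multipart/form-data hurdle, assuming AutoHotkey v2.0 and the same _jxon.ahk (for Jxon_Load) used above. Compared with the attempt above, it sends a POST (not a PUT) to the bare endpoint URL, adds the boundary parameter to the Content-Type header, wraps the model name and the audio file in separate multipart parts, and reads the "text" field of the reply rather than "choices". BuildMultipartBody, the boundary string and the generic application/octet-stream part type are illustrative assumptions, not anything OpenAI prescribes.

```autohotkey
#Requires AutoHotkey v2.0
#SingleInstance
#Include "_jxon.ahk"  ; same JSON helper as above; provides Jxon_Load

API_Key := "sk-proj-xxxxxxxx"  ; placeholder key
API_URL := "https://api.openai.com/v1/audio/transcriptions"
path    := "C:\PathToFile\Recording.m4a"

; Hypothetical helper: builds a multipart/form-data body with one text part per entry
; in `fields`, followed by a "file" part holding the raw bytes of filePath.
; Returns a VT_UI1 SAFEARRAY, which WinHttpRequest.Send() transmits as binary.
BuildMultipartBody(fields, filePath, boundary, mime := "application/octet-stream") {
    SplitPath(filePath, &fileName)
    head := ""
    for name, value in fields
        head .= "--" boundary "`r`n"
            . 'Content-Disposition: form-data; name="' name '"' "`r`n`r`n"
            . value "`r`n"
    head .= "--" boundary "`r`n"
        . 'Content-Disposition: form-data; name="file"; filename="' fileName '"' "`r`n"
        . "Content-Type: " mime "`r`n`r`n"
    tail := "`r`n--" boundary "--`r`n"

    ; Encode head and tail as UTF-8; each Buffer ends with a null byte that is not copied
    headBuf := Buffer(StrPut(head, "UTF-8"))
    StrPut(head, headBuf, "UTF-8")
    tailBuf := Buffer(StrPut(tail, "UTF-8"))
    StrPut(tail, tailBuf, "UTF-8")
    fileBuf := FileRead(filePath, "RAW")  ; raw audio bytes as a Buffer

    headLen := headBuf.Size - 1
    tailLen := tailBuf.Size - 1
    body := ComObjArray(0x11, headLen + fileBuf.Size + tailLen)  ; 0x11 = VT_UI1
    pData := NumGet(ComObjValue(body) + 8 + A_PtrSize, "Ptr")    ; SAFEARRAY.pvData
    DllCall("RtlMoveMemory", "Ptr", pData, "Ptr", headBuf, "UPtr", headLen)
    DllCall("RtlMoveMemory", "Ptr", pData + headLen, "Ptr", fileBuf, "UPtr", fileBuf.Size)
    DllCall("RtlMoveMemory", "Ptr", pData + headLen + fileBuf.Size, "Ptr", tailBuf, "UPtr", tailLen)
    return body
}

F2::
{
    boundary := "----AHKFormBoundary" A_TickCount
    body := BuildMultipartBody(Map("model", "whisper-1"), path, boundary)

    WHR := ComObject("WinHttp.WinHttpRequest.5.1")
    WHR.Open("POST", API_URL, true)  ; nothing appended to the URL
    WHR.SetRequestHeader("Content-Type", "multipart/form-data; boundary=" boundary)
    WHR.SetRequestHeader("Authorization", "Bearer " API_Key)
    WHR.SetTimeouts(60000, 60000, 60000, 60000)
    WHR.Send(body)
    WHR.WaitForResponse()

    if (WHR.Status != 200) {
        MsgBox("HTTP " WHR.Status "`n" WHR.ResponseText)
        return
    }
    ; Decode the reply as UTF-8, same technique as the ChatGPT script above
    resp := WHR.ResponseBody
    pResp := NumGet(ComObjValue(resp) + 8 + A_PtrSize, "Ptr")
    jsonText := StrGet(pResp, resp.MaxIndex() + 1, "UTF-8")
    ; The transcription endpoint replies with {"text": "..."}, not a "choices" array
    MsgBox(Jxon_Load(&jsonText).Get("text"))
}
```

On a non-200 reply the MsgBox shows the status code and OpenAI's error response, which is the first place to look if the upload is still rejected.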

dox · 18 May 2024, 10:57

Update: going through "WinHttp" is no longer necessary for my immediate purposes, although out of curiosity, and as a backup, I'd still like to get it working. For now, though, it's only of "academic interest".

I suspect I'd need to work things out from the v1 example in here: viewtopic.php?t=68417, which is way beyond my skill level.

In the end, I simply went with cURL. As soon as I got the right tips on cURL formatting in the Windows shell, it became the easiest of the easy, using Run %ComSpec% and variations.

If it helps anyone: https://github.com/doxgt/PlayGround/blob/main/GPT_cURL.ahk
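For comparison, a minimal sketch of the cURL route in v2 syntax. This is not the linked script; it assumes curl.exe is on the PATH (recent Windows 10/11 builds ship it) and reuses _jxon.ahk for Jxon_Load. Calling curl.exe directly with -o to write the reply to a file avoids the need for ComSpec redirection; BuildCurlCommand and the temp-file name are hypothetical. curl's -F switches make it send multipart/form-data with its own boundary, which is exactly the part that is awkward to do by hand in WinHttp.

```autohotkey
#Requires AutoHotkey v2.0
#SingleInstance
#Include "_jxon.ahk"  ; provides Jxon_Load, as in the scripts above

API_Key := "sk-proj-xxxxxxxx"  ; placeholder key
path    := "C:\PathToFile\Recording.m4a"

; Hypothetical helper: builds the curl.exe command line for the transcription endpoint.
; -s silences progress output, -F adds multipart form fields, -o writes the reply to outFile.
BuildCurlCommand(filePath, apiKey, outFile) {
    return 'curl.exe -s https://api.openai.com/v1/audio/transcriptions'
        . ' -H "Authorization: Bearer ' apiKey '"'
        . ' -F "model=whisper-1"'
        . ' -F "file=@' filePath '"'
        . ' -o "' outFile '"'
}

F3::
{
    outFile := A_Temp "\whisper_reply.json"
    exitCode := RunWait(BuildCurlCommand(path, API_Key, outFile), , "Hide")
    if (exitCode != 0) {
        MsgBox("curl exited with code " exitCode)
        return
    }
    raw := FileRead(outFile, "UTF-8")
    FileDelete(outFile)
    jsonText := raw
    data := Jxon_Load(&jsonText)
    ; A successful reply looks like {"text": "..."}; otherwise show the raw JSON (e.g. an error object)
    MsgBox(data.Has("text") ? data["text"] : raw)
}
```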
