OpenAI Whisper - Multipart/form-data that is binary

Started by bottomleypotts, July 31, 2024, 05:58:49 AM

bottomleypotts

Just curious if anyone has done something like this before and has a solution that does not involve building a gawd awful stream.

According to OpenAI, to use Whisper I can use a curl command like:

curl https://api.openai.com/v1/audio/translations \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/german.m4a" \
  -F model="whisper-1"

I cannot work out how to easily create this request in WinBatch. Any help would be appreciated.

td

It can be done with either COM Automation or the WinInet extender. The following is a very rough guess and will not work. Hopefully, you can correct what needs to be corrected to make it usable if you decide to use COM Automation.

objHttp=CreateObject('MSXML2.XMLHTTP.6.0')

url = "https://api.openai.com/v1/audio/translations"
objHttp.Open('POST',url,@FALSE)
objHttp.SetRequestHeader('Authorization','Bearer $OPENAI_API_KEY')
objHttp.SetRequestHeader('Content-Type','multipart/form-data')

; 'm not sure about the following send line...
objHttp.Send('file=@/path/to/file/german.m4a:':'&':'model=whisper-1')
While objHttp.ReadyState!=4
   objHttp.WaitForResponse(100)
EndWhile

; Get either text or full response body...
strResponse = objHttp.responseText()
;or full response body...
hBuf = BinaryAllocArray(objHttp.ResponseBody)
BinaryWrite(hBuf, 'OpenAiresponse.m4a') ; Choose the appropriate file name and extension.
BinaryFree(hBuf)

; Do something with the response...
exit
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

bottomleypotts

Does an MSXML2.XMLHTTP.6.0 object really have the ability to read in a binary file by substitution, like curl, and submit it as data using the Content-Type multipart/form-data according to RFC 2388? I was expecting to have to build this part entirely using an ADODB.Stream object that I had to build the long way!

td

Yes, it does. Binary data is returned as a COM safearray of VT_UI1 values. Of course, you can always create an ADODB.Stream object if you really want to, but it is not necessary. The BinaryAllocArray function understands placing VT_UI1 safearrays into binary buffers without data loss.

td

Forgot to mention that BinaryAllocArray can handle more than just arrays with elements of type VT_UI1.

bottomleypotts

Well, as I thought: no, MSXML2.XMLHTTP.6.0 does not do any substitution. The send command is totally wrong. I actually want to upload the file at "/path/to/file/german.m4a" as a form field called "file", in a multipart/form-data request.

-F file="@/path/to/file/german.m4a"
The -F option makes curl upload the contents of the file at that path as the form field named 'file'; it does not just send the path through as text.
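For anyone wondering what that means on the wire, here is a minimal sketch of the body a multipart/form-data upload carries. It is written in Python rather than WinBatch purely to show the byte layout, and it is not a description of what curl does internally. The function name build_multipart, the boundary string, the default application/octet-stream type and the three stand-in bytes are all assumptions made up for the illustration.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    file_type="application/octet-stream", boundary=None):
    """Return (content_type_header, body_bytes) for a multipart/form-data POST."""
    boundary = boundary or uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"

    # File part: text headers, a blank line, then the raw bytes untouched.
    disposition = (
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'
    )
    body = delim + crlf
    body += disposition.encode("utf-8") + crlf
    body += b"Content-Type: " + file_type.encode("ascii") + crlf + crlf
    body += file_bytes + crlf

    # Plain text fields such as model=whisper-1.
    for name, value in fields.items():
        body += delim + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode("utf-8")
        body += crlf + crlf + str(value).encode("utf-8") + crlf

    # The closing delimiter carries two extra trailing hyphens.
    body += delim + b"--" + crlf
    return f"multipart/form-data; boundary={boundary}", body


# Stand-in bytes; a real caller would read the audio file in binary mode.
content_type, body = build_multipart(
    {"model": "whisper-1"}, "file", "german.m4a", b"\x00\x01\xff",
    boundary="ExampleBoundary",
)
print(content_type)
print(len(body), "bytes")
```

The point is that the Content-Type header has to name the same boundary string that separates the parts, and the file bytes go into the body unencoded. Whatever builds the request in WinBatch has to produce that same layout.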

td

As I originally stated, you would need to make modifications to get it to work. Note the comment "'m not sure about the following send line...".

Here is a link to a Stack Overflow example that illustrates uploading multipart form data.

https://stackoverflow.com/questions/43130609/need-help-to-post-multipart-form-data-using-the-post-method-in-vba

There is always the WinInet extender. If you scroll down a bit, you can find a multipart POST example using WinInet:

https://techsupt.winbatch.com/webcgi/webbatch.exe?techsupt/nftechsupt.web+Tutorials+HTTP~and~WinInet~-~An~Opus.txt

You can also use WinBatch CLR hosting.

My suggestions are just a starting point. You may need to try a few things and think through the problem, as the intention is not to provide a canned answer.

bottomleypotts

As promised.

; Function to convert text to binary
#DefineFunction TextToStream(text)
stream=CreateObject(`ADODB.Stream`)
stream.Type=2 ; Text
stream.Charset=`ascii`
stream.Open
stream.WriteText(text)
stream.Position=0
stream.Type=1 ; Binary
Ret=stream.Read
stream.Close
Return Ret
#EndFunction


; Constants
cForReading=1
cBoundary=`----FormBoundary7MA4YWxkTrZu0gW`

; Set the API key and file path
apiKey=`YOUR API KEY`
fileName=`toTranscribe.mp3`
apiUrl=`https://api.openai.com/v1/audio/transcriptions`
responseFileName=`response.txt`

; Create the request body parts
header=`--`:cBoundary:@CrLf
header:=`Content-Disposition: form-data; name="file"; filename="`:fileName:`"`:@CrLf
header:=`Content-Type: audio/mpeg`:@CrLf:@CrLf

footer=@CrLf:`--`:cBoundary:@CrLf
footer:=`Content-Disposition: form-data; name="model"`:@CrLf:@CrLf
footer:=`whisper-1`:@CrLf
footer:=`--`:cBoundary:@CrLf
footer:=`Content-Disposition: form-data; name="timestamp_granularities[]"`:@CrLf:@CrLf
footer:=`segment`:@CrLf
footer:=`--`:cBoundary:@CrLf
footer:=`Content-Disposition: form-data; name="response_format"`:@CrLf:@CrLf
footer:=`verbose_json`:@CrLf
footer:=`--`:cBoundary:`--`

; Initialize the ADODB Stream object for reading the file
objStream=CreateObject(`ADODB.Stream`)
objStream.Type=1 ; Binary
objStream.Open
objStream.LoadFromFile(fileName)

; Read the binary data from the file
fileContent=objStream.Read()
objStream.Close

; Combine the request parts into a single stream
objCombinedStream=CreateObject(`ADODB.Stream`)
objCombinedStream.Type = 1 ; Binary
objCombinedStream.Open
objCombinedStream.Write(TextToStream(header))
objCombinedStream.Write(fileContent)
objCombinedStream.Write(TextToStream(footer))

; Initialize the WinHttpRequest object
objHttp=CreateObject(`WinHttp.WinHttpRequest.5.1`)

; Open the HTTP connection
objHttp.Open(`POST`,apiUrl,@False)
; Option 9 is WinHttpRequestOption_SecureProtocols; 2560 (0x0A00) enables TLS 1.1 and TLS 1.2
objHttp.Option(9)=2560

; Set the headers
authStr=`Bearer `:apiKey
objHttp.SetRequestHeader(`Authorization`,authStr)
contentType=`multipart/form-data; boundary=`:cBoundary
objHttp.SetRequestHeader(`Content-Type`,contentType)

; Set timeouts to infinity
objHttp.SetTimeouts(0,0,0,0)

;objHttp.SetProxy(2,`127.0.0.1:8888`,``)

; Send the request with the combined stream data
objCombinedStream.Position=0
objHttp.Send(objCombinedStream.Read())

; Get the response
responseText=objHttp.ResponseText

Rc=FilePut(responseFileName,responseText)

Exit

spl

I applaud you! Although I have no need to work with Whisper in WB, it is nice when a user brings in a new direction for WB. There may well be debates over .NET streaming / Binary functions vs. ADODB.Stream, but ADODB is the ticket if one knows how to use it.

Stupid question, but is there a size limit for the recorded text? I think in MS Azure it is 25 MB?
Stan - formerly stanl [ex-Pundit]

bottomleypotts

That was the purpose of the initial question. I would be interested in knowing how else to address the solution.

And yes, there is a 25 MB file limit. It's rather accurate. I would like to compare Whisper with Google Cloud's Speech-to-Text.
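Given that limit, it is probably worth checking the file size before uploading. A rough sketch of the check, again in Python only for illustration: the names MAX_UPLOAD_BYTES and fits_upload_limit are made up, and reading "25 meg" as 25 * 1024 * 1024 bytes is my assumption, not a figure taken from the API documentation.

```python
import os
import tempfile

# Assumption: "25 meg" is read here as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True when the file at path is no larger than limit bytes."""
    return os.path.getsize(path) <= limit


# Demonstrate with a small temporary file standing in for the audio.
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
    tmp.write(b"\x00" * 1024)
    sample_path = tmp.name

small_ok = fits_upload_limit(sample_path)
too_big = fits_upload_limit(sample_path, limit=512)
os.remove(sample_path)
print(small_ok, too_big)
```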

spl

Quote from: bottomleypotts on August 09, 2024, 09:33:13 PM
That was the purpose of the initial question. I would be interested in knowing how else to address the solution.

I stand to be fact-checked, but I believe ADODB.Stream does not perform well with Unicode, while the .NET StreamReader/StreamWriter classes [both work with WB CLR] permit choosing an encoding. You can search 'streamreader' in this section for threads with code examples.
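To show why the choice of encoding matters for the text parts of the request, here is a small sketch in Python (only for illustration). It does not reproduce what ADODB.Stream itself does with an `ascii` charset; it only shows the general effect of pushing a non-ASCII file name through an ASCII encoder versus UTF-8. The helper encode_header and the file name are made up for the example.

```python
def encode_header(text, charset):
    """Encode header text, substituting '?' for characters the charset lacks."""
    return text.encode(charset, errors="replace")


header = 'Content-Disposition: form-data; name="file"; filename="Gespräch.mp3"'

ascii_bytes = encode_header(header, "ascii")
utf8_bytes = encode_header(header, "utf-8")

# The ASCII version has lost the umlaut; the UTF-8 version round-trips.
print(ascii_bytes)
print(utf8_bytes.decode("utf-8") == header)
```

With plain ASCII file names and field values, as in the script posted above, the two encodings produce identical bytes, so the difference only shows up once non-ASCII text is involved.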

As for the 25 MB limit, there is PyDub (which you probably already know about) for chunking larger audio files, or it may be possible with .NET Stream classes.

td

Quote from: bottomleypotts on August 09, 2024, 09:33:13 PM
That was the purpose of the initial question. I would be interested in knowing how else to address the solution.

If I recall correctly, you should be able to use binary buffers or WIL arrays to POST data using the "Send" method if you convert the data to a Safearray before dropping it into the method. When I get the time I will try to verify that.