How To Add OpenAI Speech to Text To Your Bubble App (Complete Guide)

If you want to build your own speech-to-text feature using OpenAI’s Whisper and Bubble, you’re in the right place. This hands-on tutorial walks you through the exact steps needed to build an app that lets users effortlessly convert audio recordings into accurate text transcriptions.

Imagine the possibilities: from transcribing crucial meetings and calls to developing your personalized AI-driven note-taking tool, the potential applications are endless.

The best part is, you don’t need any prior coding experience to embark on this journey. We’re here to guide you every step of the way, ensuring a smooth and stress-free learning experience.

The steps to integrate OpenAI Speech-to-Text with your Bubble app are:

  1. Generating your own OpenAI API key
  2. Creating an API call from Bubble to OpenAI
  3. Designing the UI of your Bubble app
  4. Building the workflows that power the feature
  5. Setting up your Bubble database
  6. Creating an audio player
  7. Finalizing the workflows
  8. Previewing the finished app

Full Transcript of Tutorial

1. Generating your own OpenAI API key

Before creating a custom API call, we’ll need to first create a connection between our Bubble app and OpenAI.

To do this, we’re going to head over to our Bubble editor, then open the plugins tab. Inside the plugins library, you’ll need to install the free ‘API Connector’ plugin, which is built by Bubble. 

After installing the plugin, we’ll need to add a new API within our Bubble app. When it comes to this API, we’re going to follow the instructions laid out within the documentation that OpenAI provides. So, within our checklist, we’ve added a link to a documentation page for the specific API we’re going to use today, which is the ‘Speech-to-Text’ API, also known as ‘OpenAI Whisper’. 

If we scroll down, you’re going to see the ‘Quickstart’ option here for the Transcriptions API, which is exactly what we want to work with. When you’re viewing this API, you can choose to view the example request as cURL or in Python. In our opinion, it’s easier to work with cURL. There are other benefits to this, like being able to import the cURL snippet directly into Bubble. Today, however, we’re going to show you how we can do this step by step from scratch.
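For reference, the cURL example in OpenAI’s quickstart looks roughly like this at the time of writing (the file path is a placeholder, and $OPENAI_API_KEY stands in for your own key):

    curl https://api.openai.com/v1/audio/transcriptions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: multipart/form-data" \
      -F file="@/path/to/audio.mp3" \
      -F model="whisper-1"

Everything we set up in Bubble over the next few steps (the URL, the two headers, and the ‘file’ and ‘model’ parameters) maps directly onto this request.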

So, essentially, what we need to do is move all of this information over into our Bubble app so we can easily start to communicate with OpenAI. 

Back in our Bubble editor, we’re going to create our very first API. And we’re going to call this ‘OpenAI’ because this is the service that we’re connecting to. When you’re creating APIs, the name you add here is going to be the name of the platform you’re connecting to, not the name of the specific API service. 

Further down, you’ll see that you can name an API call. In this part of our tutorial, this is what we would call the ‘Whisper API’ or the ‘Speech-to-Text API’.

For this API, Bubble needs to recognize where it connects to. So, we need to create what’s known as an ‘authentication’. Today, we’re going to select the option known as “Private Key in Header”. 

And how do we know to select this option? When you’re working with APIs, most of the time, this is the option you’ll choose. It’s standard practice within the industry. But what we can also see in our documentation are the parameters listed under ‘Headers’.

This is the section where OpenAI wants us to connect an authorization using our API key. And look, if you’re brand new to Bubble, we completely understand if everything we just said sounded like complete jargon.

Essentially, we have two services (Bubble and OpenAI) that need to be able to communicate with each other across platforms. The term ‘key’ is kind of like opening a door so the two services can talk to each other.

What you’ll notice is the word ‘authorization’ appears in the header of the API documentation. If you re-open your Bubble editor, it now displays the same word within an input field. Next to this field, you’ll need to paste in your OpenAI API key. 

In order to source your API key, select the ‘API keys’ tab within your OpenAI account. You can then create a new secret key, then copy that value across to your Bubble editor. 

Over in the documentation, you can see that our “token” here, which is known as our API key, has the word ‘Bearer’ followed by a space right before it.

Once again, adding the word ‘Bearer’ before an API key is a pretty standard industry practice. It just means that you’re the person who bears the API key, so you’re the owner of it. So, we’ll type in the word ‘Bearer’ with a capital ‘B’ and then a space. And then from here, you’re going to paste in your secret key. So, we’re going to paste our key in. And that is all we’ll need to change for this point in time. 
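In practice, the value Bubble stores and sends in that header ends up looking something like this (the key below is a made-up placeholder, not a real secret key):

    Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxx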

By adding our ‘Authorization’ key to the overall API, we don’t need to reauthorize every single API call we create. Each call is going to use the universal authorization that we’ve just created here. That’s because within the one service of OpenAI, there are plenty of different API calls you could reference. There are things like:

  • Text generation, 
  • Image generation, 
  • The speech-to-text, 
  • Text-to-speech, and so on. 

So, if you didn’t add this ‘Authorization’ key at the top level, you would essentially need to create it for every single individual API call that you add in. The other thing you may have noticed is that in our documentation, there was another header key, known as the ‘Content-Type’.

You could theoretically add this at the top level for your API. But we like to add that one on every single API call, just because the ‘Content-Type’ of each API call you use is going to be different. So, if you’re working with images, you might need to work with files, whereas if you’re working with text generation, you’re going to need to use text.

The content format will be different for every single use case, so instead of adding that as a universal ‘Content-Type’ on the overall API, we like to add it on every single individual API call. And that’s exactly what we’re going to do. We’re now going to create our very first API call, which will allow us to send some data through to OpenAI and have it return some information. In our use case today, we’re going to send it an audio file, and it’s going to return text.

2. Creating an API call from Bubble to OpenAI

After connecting both platforms, we’ll now need to create an API call. When it comes to this API call, we’ll name this ‘Whisper (Speech-to-Text)’. Feel free to call this whatever you like. It doesn’t matter, but this is just the naming convention we’re going to use today. 

From here, we need to make a few minor tweaks. When it comes to API calls, you can choose to either receive data from a service or send data to a service. So, if you’re receiving data from a service, it’s known as the ‘Data’ option, whereas if you’re sending data to a service, it’s known as an ‘Action’. And so obviously in this case today, what we’d like to do is send some data to OpenAI, and that data is going to be an audio file that we record.

Then, for the next field, which is the ‘Data type’, we’re just going to leave that as “JSON”. But below this, you’ll see an input field, which is very important. This is where we need to add the address for this API service. And if we jump back over to our OpenAI documentation, you’ll see right at the top, that there’s a URL. What we need to do is highlight this, make a copy, jump back into Bubble, and paste this in. This URL tells Bubble where this API lives on the internet. So, it’s kind of like the postal address.

And so what we want to do today is write a letter and post it to that address. And inside that letter is going to be an audio file, and we’re going to say, “Hey when you receive this, can you please turn this audio into text?”.

And it will then return a letter with all of that text, but of course, in the backend, it’s going to be a whole lot more technical than that. That then leads us to our next point. We just need to update the way we’re going to work with this API. So, we’re not going to get data from this API. Instead, we’ll post data to this API. The data we’ll be posting is, of course, that audio file. And this is all we’ll need to change for this little top section. 

If we scroll down the page, you’ll see the option to add a header. This is what we were referring to before when we mentioned that you could add specific header values for each individual API call. So, in our documentation, there was the header value known as the ‘Content-Type’. Instead of adding that to the overall API itself, we’ll now add this to our individual API call. 

To do this, we’ll make a copy of this text, then revert back to Bubble and add a new header. We’re now going to paste that in as the header key. Then for the value, we just need to jump back into the documentation and copy across the value that’s provided. So, this is the ‘multipart/form-data’. 

We’ll re-open Bubble, then paste that into the relevant field. That was super easy, but we also just need to uncheck the ‘private’ option for this header, which is just going to make it easier to access later on.

This is the only header value we’ll need to add on this specific API call because, of course, we’ve already added the ‘Authorization’ header to the overall API.
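So, between the API level and the call level, the two headers this call sends through look roughly like this:

    Authorization: Bearer sk-xxxxxxxxxxxxxxxxxxxx    (shared header, set once on the OpenAI API)
    Content-Type: multipart/form-data                (set on this specific API call)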

So, now that we’ve added both headers, we can move down and update the next field. This field is known as the ‘body type’.

When looking at the OpenAI documentation, it mentions that the body type should be set as ‘multipart/form-data’. This means we’ll select the option from the dropdown menu listed as ‘Form-data’. Now, we can add in all of the necessary parameters.

Parameters are just a fancy way of saying what information we’re going to send through an API call. In our API today, there are going to be two pieces of data, and OpenAI actually lays these out for us within the documentation. The first parameter is an audio file. This parameter will need to send the URL of where an audio file lives on the internet. When creating this parameter, we’ll need to call it ‘file’.

If you’re brand new to working with APIs in Bubble, one thing you should note is that you’ll need to add a sample value for this parameter before you initialize this API call. This means that when you test the call, it has some information to use within a demo. Since we’re sending through an audio file, we now need to create one.

If you’ve got something like an MP3 on your device, you can upload this. But if you don’t, what you can do is record a quick audio file. For instance, we’re currently using a Mac, so if we open up the Voice Memos app, we can easily create a quick audio recording. We’re going to say something like,

“Hello, YouTube. Today, we’re going to teach you how to integrate the OpenAI Whisper API with Bubble’s no-code tool”. 

Once recorded, we can then upload this audio to the file uploader provided in Bubble. Once uploaded, we’ll then uncheck the box which makes the parameter private. This will ensure that we can modify this parameter when building our workflows.

When it comes to our next parameter, we’ll need to determine which OpenAI model we’d like to use on our audio file. In our tutorial today, the model will be ‘whisper-1’. Now, obviously in the future, as OpenAI releases new updates, you might want to leverage Whisper-2, 3, 4, and who knows where we’ll get – maybe 99.

But for today, we’ll just highlight this value here known as Whisper-1, copy that, jump back over into Bubble, add a new parameter, and we’re going to call this the “model” spelled in all lowercase letters. 

For the value of this, we’ll paste in the specific model, which, of course, is ‘whisper-1’. Like before, we’ll also need to uncheck the ‘private field’. And this is now the fun part. We are entirely finished setting up our API integration. 

In order to double-check our work, we’re going to click the ‘initialize’ button, and Bubble will then confirm whether our API call is successful. If it is, it will display a popup showing the data returned by the call.
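For reference, a successful call returns a small piece of JSON with a single ‘text’ field, something along these lines (the text here is just our sample recording):

    {
      "text": "Hello, YouTube. Today, we're going to teach you how to integrate the OpenAI Whisper API with Bubble's no-code tool."
    }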

If, for some reason, Bubble displays an error message saying that you don’t have enough credit, what you’ll need to do is just go into your OpenAI account, head to your usage tab, and buy some credits. They do give you a couple of free API calls, but if you are testing multiple APIs, you will need to put something like $5 or $10 on your account. Thankfully, the API calls are fairly cheap anyway, but that’s just something we wanted to point out. From here, though, we’re going to choose to save this API, and we are done.

3. Designing the UI of your Bubble app

Now look, from here, this is where the fun part actually starts. We can now design the Bubble app. In our opinion, playing around with APIs isn’t too sexy, but building workflows and interfaces in Bubble definitely is. Let’s jump back into Bubble and open up our design tab.

Now, ladies and gentlemen, we’d like to introduce you to the grand opening of ‘Transcribify’, our new SaaS that takes any voice notes (or meeting notes, for that matter) and just transcribes them into text. This, of course, is just one small use case for transcribing audio.

We’re sure you probably have your own app where you’d like to integrate this, but for now, we’re just going to quickly walk you through how we’ve set this up. And more importantly, we’re going to show you how to build out the workflows to connect with that API we just created.

First of all, we just want to walk you through the elements on our page. We obviously have some text headings at the top. We’ve got a button that displays the words ‘Start recording’. And below this, we’ve added an audio recorder element. This comes from the free ‘Audio recorder and visualizer’ plugin built by Bubble, which you’ll need to install from the ‘Plugins’ tab. We’ve added this element to our page, and when this button is clicked, we’ve selected to start recording in this particular audio recorder. So if we quickly select this button here and open up the workflow, what you’ll see is that we have two separate workflows.

4. Building the Workflows That Power the Feature

Within our first workflow, you’ll see that we’re just starting our audio recorder. If we wanted to rebuild that from scratch, this is what it would look like. 

We would choose to run a workflow whenever an ‘element is clicked’. The element would be our ‘start recording’ button, and when it comes to the workflow, we would just type in the word ‘start,’ and we’re going to choose to start/stop a recorder. As we only have one audio recorder on our page, it’s automatically going to reference that element.

But the thing about working with this audio recorder is that starting and stopping are the exact same action. So we just need to be specific and help Bubble recognize when it needs to start and when it needs to stop. And that’s why we need to add a condition on our workflow trigger.

We only want this audio recorder to start if it obviously isn’t already recording something. So on our workflow trigger, we’re going to create a condition, and we’re only going to allow this to run when our audio recorder element’s ‘is recording’ status is currently ‘no’, meaning it is not recording and it should, in fact, start recording.

Now, for the sake of our tutorial, we’ve also updated the workflow color to green because green means go, and we’re going to start recording that audio. Now look, we’re going to delete that there because the next workflow is where a little bit of the magic is going to start to happen. 

So we’ve created an exact replica of our first workflow, but when it comes to the condition, we’ve made this the inverse option: it runs when the audio recorder’s ‘is recording’ status is, in fact, ‘yes’, not ‘no’, and of course, the trigger is still when the ‘button is clicked’. We’ll again choose to start/stop the audio recorder (which stops it this time), but from here, we’re then going to upload that audio file into our database.
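To recap, the two triggers end up looking something like this (written out as a rough sketch rather than exact Bubble labels):

    When Button 'Start recording' is clicked
      Only when: AudioRecorder's 'is recording' is 'no'
      Step 1: Start/stop recording AudioRecorder   (starts it)

    When Button 'Start recording' is clicked
      Only when: AudioRecorder's 'is recording' is 'yes'
      Step 1: Start/stop recording AudioRecorder   (stops it)
      Steps 2-3: upload the recording and save it (covered in the next section)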

5. Setting up your Bubble database

Now we should also show you how we’ve set up our database. It’s super straightforward. Over in our ‘Data’ tab here, we have a data type known as ‘audio recording,’ and inside of this data type, we have two fields. 

Number one, there’s the ‘Audio file’, and this is set as a file type. Then there’s the ‘Transcription’, which is set as text. So obviously at this point in time, we would have an audio file if we’ve just recorded something, so we need to create a new audio recording in our database and store that within the ‘Audio file’ field. But right then and there, we wouldn’t have the transcription. We obviously need to send the recording through to OpenAI so it can transcribe that file.
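Sketched out, the data type in our ‘Data’ tab looks like this:

    Audio recording
      Audio file      (type: file)
      Transcription   (type: text)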

Now, your data type might look completely different based on the app you’re trying to build. You might be trying to transcribe meeting notes, so your data type might be something like a ‘meeting’, but you just need to make sure whatever you’re trying to create and transcribe, you have these two data fields.

From here, we’ll now revert back to our workflow tab and continue building out the necessary workflows.

After our audio recorder has been stopped, what we’ll need to do is upload a version of that audio as a file. So if we type in the word ‘upload’, you’ll see a workflow action related to our audio recorder element called ‘upload content of a recorder’. And once again, by default, we only have one recorder on our app, so that’s why it’s already selected.

Now, if you want, you can choose to customize this name with a dynamic value, but we’re just going to leave ours as the standard static file name here. But after this, once we’ve uploaded that piece of content, we need to store that in our database. To do this, choose to add another action to your workflow, open the ‘data’ tab, and choose the ‘create a new thing’ option.

This will allow us to create a new audio recording, in which we just want to store the audio file, and that’s going to be the ‘result of step 2’. That is literally all we need to change.
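Picking up the sketch from earlier, the second half of that ‘stop’ workflow now reads something like this:

    Step 2: Upload the content of AudioRecorder
    Step 3: Create a new Audio recording
      Audio file = Result of step 2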

Now, from here, if we jump back into our design tab at the bottom of our page, we’ve just added a repeating group to display all of the audio recordings in our database. So for the type of content we’re displaying, it’s a list of audio recordings. For the data source, we’re just searching for all the audio recordings in our database. So there’s nothing too complex about this, but inside of our repeating group, we’ve also added a standard group so that it can store all of the elements that we need to display. 

On the left-hand side here, we’re just displaying the date that this particular audio recording was created. So in this case, it’s just the ‘parent group’s audio recording’ and the ‘creation date’, and we’ve chosen to format that as a time. And then we’ve got a dash, and then we’re adding the creation date once again, and we’re formatting that as a date. So that way it should just look like a timestamp followed by the date.

6. Creating an audio player

Inside our repeating group, we’ll also need to add an audio player. This will allow us to actually play back the audio that we’ve recorded.

So if you want to add an audio player, in theory, you could install a plugin to do that. There are plenty of great plugins that don’t require any code or any HTML. But today, we’re going to opt for a simpler choice, which is just adding an HTML audio player.

Now, if you don’t know how to add this, it’s super straightforward. If you visit the W3 Schools website, you’ll see they provide a snippet of HTML code that contains an audio player.

Simply copy all of this HTML, then re-open Bubble. In Bubble, add an HTML element into your repeating group. Inside this element, paste in the audio player code from W3 Schools.
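At the time of writing, the W3 Schools snippet looks roughly like this:

    <audio controls>
      <source src="horse.ogg" type="audio/ogg">
      <source src="horse.mp3" type="audio/mpeg">
      Your browser does not support the audio element.
    </audio>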

The only thing you’ll need to change is the file URL that you’d like to play.

As you’ll see in their example, the files are called ‘horse.ogg’ and ‘horse.mp3’, so all we need to do is add the URL of our own audio recording. To do this, back in Bubble, we’re just going to remove that text that’s added in, and between our quotation marks, we’re going to insert dynamic data and just reference the ‘parent group’s audio recording’, its audio file, and the URL of that.

So that’s where this particular audio file has been uploaded and where it lives on the internet. We’ll then need to do the same thing for the source here. So for the ‘horse.MP3’, we’re going to delete that, then we’re going to insert dynamic data, and once again, reference the ‘parent group’s audio recording’, the audio file, and the URL of where that file lives. And just like that, you will now have a fully functional audio player.
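Once the dynamic data has been swapped in, the HTML element reads along these lines, where the bracketed expression stands for Bubble’s dynamic data reference rather than literal text (you may also want to adjust the ‘type’ attribute to match the format your recorder produces):

    <audio controls>
      <source src="[Parent group's Audio recording's Audio file's URL]" type="audio/mpeg">
      Your browser does not support the audio element.
    </audio>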

Now, we personally try not to look at all that HTML because it definitely can seem overwhelming. But once again, it wasn’t too difficult to add that in. 

From here, we’re interested in showing you the feature that allows us to transcribe this audio recording. So within our group, you’ll see we have a couple of hidden elements: a button, and then a text element. This button will trigger the workflow that sends our audio file through to OpenAI to have it transcribed.

Something we should just point out: when it comes to this button, under our ‘Layout’ tab, we have deselected the option for it to be ‘visible on page load’. So by default, it’s not going to be shown. And the reason for that is we only want this button to be shown when this text has not yet been transcribed. If it has been transcribed, we obviously don’t want someone to think that they could transcribe it again. That’s pointless.

So we’ve created a condition on this button that recognizes when the parent group’s audio recording’s transcription field is, in fact, empty (meaning there’s no value there and it hasn’t been transcribed), and in that case this element should be visible.

We’ve then done the exact same thing but the inverse option for a text element below this. So for this text element, all we’ve done is add the word ‘transcription’, and we chose to bold that text. 

Now, once again, by default, this text element is not visible on page load; however, we’ve created a condition that recognizes when the parent group’s audio recording’s transcription text is not empty, meaning it has been transcribed, and in that case we’ve ticked that this element should be visible.
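Put side by side, the two visibility conditions are just mirror images of each other (sketched out rather than exact Bubble labels):

    Button 'Transcribe'
      Visible on page load: unchecked
      When Parent group's Audio recording's Transcription is empty → this element is visible

    Text 'Transcription'
      Visible on page load: unchecked
      When Parent group's Audio recording's Transcription is not empty → this element is visible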

7. Finalizing the workflows

This leads us to the very last and most important part of our tutorial, which is how we can build out the workflow that runs when our ‘Transcribe’ button is clicked. So if we edit our workflow, what you’ll see is we’ve already added a couple of steps here. Now look, we’re happy to build these out from scratch again so you can follow along. 

Within this workflow, we’re going to do two things. The first thing is send our audio file using our ‘API call’. So if we head to the plugins menu, you’ll see our API call, which is known as ‘OpenAI Whisper Speech to Text’. We’ll click on this, and what you’ll now see is that we have the ability to send information using our parameters. 

The first parameter is the ‘File’. We just need to reference which particular audio file we’re going to send through to OpenAI. And in this case, it’s the audio file of the current cell that we just hit ‘Transcribe’ on. But before we reference the URL of that audio file, we just need to make a minor tweak.

If we look at an example of what an audio file URL looks like, you’ll see it’s got the link, but something it doesn’t have is the initial ‘https’ at the start of the URL. It’s got the two slashes that come after that, but because this needs to be a proper URL, we need to add the scheme in. So we’ll remove this static text, type in the letters ‘https’, add a colon, and then we’re going to reference the audio file that we want to transcribe.

So we’ll insert Dynamic data, and we’re just going to pull the ‘parent group’s audio recording’, its actual audio file. Then for the next parameter, which is our model, this isn’t going to change; it’s just going to be ‘Whisper-1’ because that is the OpenAI model we need to send this data to.
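So the ‘file’ parameter ends up as a tiny piece of static text plus a dynamic expression, roughly like this (the S3 address is just illustrative of where Bubble hosts uploaded files):

    https:  +  //s3.amazonaws.com/.../your-recording.mp3
    =  https://s3.amazonaws.com/.../your-recording.mp3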

Then finally, once we’ve sent this information, OpenAI will return the end result, which is the transcription. We then just need to store that transcription in our database. To do this, we’re going to add an additional step within our workflow. We’re going to head to our ‘data’ tab, then we’re going to choose to make changes to an existing thing. In this case, the existing thing we’ll change is the ‘parent group’s audio recording’. We just want to change the transcription, and we want it to be the ‘result of step one’ where we had sent this audio to OpenAI. That is everything we need to build out inside of this workflow.
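Putting it all together, the finished ‘Transcribe’ workflow reads something like this in the editor (a sketch; the exact name of the returned field depends on how the call initialized, but it’s typically ‘text’):

    When Button 'Transcribe' is clicked
      Step 1: OpenAI - Whisper (Speech-to-Text)
        file  = "https:" + Parent group's Audio recording's Audio file's URL
        model = whisper-1
      Step 2: Make changes to Parent group's Audio recording
        Transcription = Result of step 1's text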

8. Previewing the finished app

After building all of the workflows, we can now run a preview of the finished product. 

In a preview of our application, we’re going to choose to ‘start recording’ a new audio file, and we’re going to say something like:

‘We love the pleasures in life like OpenAI, Bubble, no code, and anything else in between.’

We’re then going to stop our recording; it’s going to create that audio file entry in our database.

Then if we scroll down, we can see that we have a brand new audio recording here, but it has not yet been transcribed. So of course, what we need to do is transcribe this. 

After selecting the transcribe button, it’s going to run our workflow, send the audio to OpenAI, and as you’ll see, OpenAI has returned the transcribed text.

And just like that, you now know how to integrate your Bubble app with OpenAI to create your very own speech-to-text feature. And as you’ll see, the entire process wasn’t too complex. It’s nothing that we couldn’t handle.
