CAPITAL FRIDAY

How To Use AI To Start A Transcription Business

You’ve likely already heard about ChatGPT, but I bet you haven’t heard about its little brother, Whisper. Whisper was also created by OpenAI and is an automatic speech recognition (ASR) system.

What Is an Automatic Speech Recognition (ASR) System?

Automatic Speech Recognition (ASR) is a technology that enables machines to recognize and transcribe spoken language into text. ASR uses machine learning algorithms and neural networks to analyze spoken words and phrases and convert them into a written form that can be easily processed and analyzed by machines. ASR systems are widely used in various applications, including voice assistants, automated phone systems, captioning for video content, and more. The accuracy of ASR systems can vary depending on the quality of the audio input, the complexity of the language being spoken, and the overall performance of the ASR algorithm being used. With advancements in machine learning, ASR technology has improved significantly in recent years, making it more accurate and reliable for a wide range of applications.

Whisper, OpenAI’s ASR system, was trained on 680,000 hours of data collected from the web. And it just became available to use as the backbone of your business: OpenAI added the model to its API in March 2023.

How Much Does It Cost to Use Whisper, OpenAI’s Speech-to-Text Artificial Intelligence?

Now this is the interesting part if you’re looking to disrupt the transcription business. Whisper currently costs $0.006 per minute of audio transcribed, which works out to $0.36 for an hour of audio… That’s cheap.
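To make that rate concrete, here is a minimal sketch of the arithmetic. It assumes billing is strictly proportional to audio length at the $0.006 per minute quoted above; the function name `whisper_cost` and the sample durations are made up for illustration, and actual billing rules may round differently.

```python
from decimal import Decimal

# Rate quoted in the post (USD per minute of audio, as of March 2023).
WHISPER_RATE_PER_MINUTE = Decimal("0.006")


def whisper_cost(minutes: float) -> Decimal:
    """Return the estimated API cost in USD for `minutes` of audio.

    Assumes cost scales linearly with duration (an assumption, not a
    statement of how OpenAI rounds its invoices).
    """
    cost = Decimal(str(minutes)) * WHISPER_RATE_PER_MINUTE
    return cost.quantize(Decimal("0.0001"))


# Illustrative workloads (hypothetical durations).
for label, minutes in [("60-minute lecture", 60), ("10 hours of podcasts", 600)]:
    print(f"{label}: ${whisper_cost(minutes)}")
```

Under that assumption, a one-hour lecture costs about 36 cents and ten hours of podcast audio costs $3.60.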

What Do Transcription Sites Charge to Transcribe Audio Using AI?

Sites right now charge more than what you would pay to use OpenAI to back your own transcription service. Prices below reflect ad hoc or monthly pricing as of March 2023, but they are going to come down quickly.

See this chart in the original post

Where’s the Opportunity in AI Transcription?

I think the opportunity is to create simple interfaces for transcription targeted at niche uses. How often do you get to use a world-leading artificial intelligence as the background tech for your business for a few pennies? Not often. If I could focus on one idea, I would create a simple audio capture / transcription app structure and then market it to students for lectures, office jockeys for meetings, podcasters for transcripts to boost SEO, etc. Then I would create a super simple one-page website where you can upload a file, get the transcript, and pay for that one transaction.
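The core of that one-page site could be sketched roughly like this: one function to price a single job and one to send the uploaded file to Whisper. This is a sketch under assumptions, not a finished product. It assumes version 1.x of the official `openai` Python package and an `OPENAI_API_KEY` environment variable; the retail price of $0.05 per minute and the function names `quote` and `transcribe` are hypothetical and do not come from any real service.

```python
from pathlib import Path

API_COST_PER_MINUTE = 0.006  # Whisper API rate quoted earlier (March 2023)
PRICE_PER_MINUTE = 0.05      # hypothetical retail price, chosen for illustration


def quote(minutes: float) -> dict:
    """Price one transcription job and show the margin over the API cost."""
    price = round(minutes * PRICE_PER_MINUTE, 2)
    api_cost = round(minutes * API_COST_PER_MINUTE, 4)
    return {"price": price, "api_cost": api_cost, "margin": round(price - api_cost, 4)}


def transcribe(path: str) -> str:
    """Send one audio file to the Whisper API and return the transcript text."""
    # Imported here so the pricing helper works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with Path(path).open("rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

For a 30-minute recording, `quote(30)` gives a price of $1.50 against an API cost of $0.18. Payment handling, file size limits, and the upload form itself are left out; those would sit around these two functions.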

Currently there are services that offer additional value-adds for certain sectors and the option to have a human transcribe as well. On the other hand, you have Whisper and AWS, which will let you use AI to transcribe through their APIs. But as far as I can tell, there’s no minimum viable product that is just an easy interface to access the AI tech.

Actually, I think a super interesting website would be one where you can use OpenAI’s Whisper and AWS side by side. People may be interested to see what works best for their use case, and you could potentially make more money as people transcribe items twice to see which AI does the job better.
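One way such a side-by-side page might show the result is a word-level diff of the two transcripts. The sketch below uses only Python’s standard `difflib` module and takes the two transcripts as plain strings; how they are fetched from each provider is left out. The function name `compare_transcripts` and the sample sentences are made up for illustration.

```python
import difflib


def compare_transcripts(a: str, b: str) -> dict:
    """Compare two transcripts word by word, ignoring case.

    Returns a similarity score between 0 and 1 and the list of
    (text_in_a, text_in_b) pairs where the transcripts disagree.
    """
    words_a, words_b = a.lower().split(), b.lower().split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    differences = [
        (" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return {"similarity": round(matcher.ratio(), 3), "differences": differences}


# Hypothetical outputs from two different transcription services.
service_one = "The quick brown fox"
service_two = "the quick brown box"
print(compare_transcripts(service_one, service_two))
```

A score like this only shows where the two engines disagree, not which one is right; the user would still need to listen to the disputed words to decide.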