Bhavin Tandel


Exploring ML Tools - AWS Transcribe

Audio is part of our lives in many forms: voice messages, podcasts, songs, lectures, recorded conversations and so on. These files are hard for a computer to process, so they usually just sit on magnetic disks or more expensive storage and are never used proactively until someone needs them. Machines, however, have become very good at processing text. So we can convert audio to text and put machines to work on getting insights out of it.


Introduction

There has been a good deal of research in speech-to-text conversion, with models such as Hidden Markov Models (HMM), Dynamic Time Warping and many neural networks. AWS provides a fully managed service called AWS Transcribe, which can be used to transcribe audio and video files, and also medical recordings.

How it Works?

Features

Use Cases

Usage

AWS Transcribe can be used via:

Input

Process

The following steps walk through transcribing a file with the Python SDK (boto3).

import json
import time

import boto3
import requests

transcribe_client = boto3.client('transcribe', region_name='eu-west-1')
short_job_uri = "https://s3.eu-west-1.amazonaws.com/exploring-ml-tools/aws-transcribe/assets/VodaFoneCallCenter.mp3"

# Start the transcription job
job_name = "TEST_JOB_With_VodaFone"
transcribe_response = transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': short_job_uri},
    MediaFormat='mp3',
    LanguageCode='en-IN',
    Settings={
        # Also return up to 6 alternative transcriptions per segment
        'ShowAlternatives': True,
        'MaxAlternatives': 6
    }
)

# Poll the job every 5 seconds until it has completed or failed
while True:
    status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)

# Download and parse the transcript, but only if the job completed
if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
    uri = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
    response = requests.get(uri)
    if response.status_code == 200:
        json_data = json.loads(response.text)
        print(json_data['results']['transcripts'])
        # Output >>>[{'transcript': 'his taste for a phone call centre.'}]
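The polling loop above waits for as long as the job takes. As a minimal sketch, the same idea can be wrapped in a helper with a timeout. The function name wait_for_job and its poll_seconds and timeout_seconds parameters are my own and not part of boto3; the only SDK call used is get_transcription_job.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=600):
    """Poll a transcription job until it is COMPLETED or FAILED.

    `client` is expected to be a boto3 Transcribe client (or any object with
    a compatible get_transcription_job method). The helper itself is
    hypothetical, written for illustration only.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=job_name
        )['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ('COMPLETED', 'FAILED'):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_name} did not finish within {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


# Usage with a real client (requires AWS credentials):
#   job = wait_for_job(transcribe_client, job_name)
#   if job['TranscriptionJobStatus'] == 'COMPLETED':
#       uri = job['Transcript']['TranscriptFileUri']
```

Passing the client in as an argument keeps the helper easy to try out with a stub object before pointing it at a real account.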

If you don't specify an output bucket, the transcript is deleted when the job expires, which happens after 90 days.
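To keep transcripts longer, the job can write to a bucket you own. Here is a small sketch of how the job arguments could be assembled. OutputBucketName and OutputKey are parameters of start_transcription_job; the helper build_job_args and the bucket name my-transcripts-bucket are made up for illustration.

```python
def build_job_args(job_name, media_uri, output_bucket=None, output_key=None):
    """Assemble keyword arguments for start_transcription_job.

    Hypothetical helper: when output_bucket is given, the transcript is
    written to that bucket instead of the service-managed one.
    """
    args = {
        'TranscriptionJobName': job_name,
        'Media': {'MediaFileUri': media_uri},
        'MediaFormat': 'mp3',
        'LanguageCode': 'en-IN',
    }
    if output_bucket:
        args['OutputBucketName'] = output_bucket
        if output_key:
            # Optional object key (or prefix) for the transcript file
            args['OutputKey'] = output_key
    return args


job_args = build_job_args(
    'TEST_JOB_With_VodaFone',
    'https://s3.eu-west-1.amazonaws.com/exploring-ml-tools/aws-transcribe/assets/VodaFoneCallCenter.mp3',
    output_bucket='my-transcripts-bucket',  # assumed bucket name
)

# With a real client (requires AWS credentials and write access to the bucket):
#   transcribe_client.start_transcription_job(**job_args)
```

As far as I understand, the transcript URI then points at an object in your bucket, so it has to be read with your own S3 credentials (for example via boto3's S3 client) rather than a plain requests.get.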

Output

The output file is stored in your own bucket if you specify one in the job. Otherwise the transcription job keeps the data for up to 90 days. The output also contains the alternatives, which show other candidate transcriptions with their confidence levels.
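As a rough sketch of how the alternatives can be read, the snippet below walks over results.segments and lists each alternative with an average word confidence. The sample dictionary is hand-written for illustration and is not real Transcribe output; the field layout (segments, alternatives, items, confidence as strings) is my assumption about the JSON shape, so check it against your own transcript file. Averaging the word confidences is my own choice of summary, not something the service reports.

```python
def summarize_alternatives(transcript_json):
    """Return (segment_start, rank, transcript, mean_confidence) tuples."""
    rows = []
    for segment in transcript_json['results'].get('segments', []):
        for rank, alt in enumerate(segment['alternatives'], start=1):
            # Confidence values are assumed to arrive as strings on spoken words
            scores = [
                float(item['confidence'])
                for item in alt.get('items', [])
                if item.get('type') == 'pronunciation' and 'confidence' in item
            ]
            mean_conf = sum(scores) / len(scores) if scores else None
            rows.append((segment['start_time'], rank, alt['transcript'], mean_conf))
    return rows


# Hand-written sample in the assumed layout (NOT real service output)
sample = {
    'results': {
        'transcripts': [{'transcript': 'hello world'}],
        'segments': [{
            'start_time': '0.0',
            'end_time': '1.2',
            'alternatives': [
                {'transcript': 'hello world',
                 'items': [
                     {'type': 'pronunciation', 'content': 'hello', 'confidence': '1.0'},
                     {'type': 'pronunciation', 'content': 'world', 'confidence': '0.5'},
                 ]},
                {'transcript': 'hollow word',
                 'items': [
                     {'type': 'pronunciation', 'content': 'hollow', 'confidence': '0.5'},
                     {'type': 'pronunciation', 'content': 'word', 'confidence': '0.25'},
                 ]},
            ],
        }],
    }
}

for start, rank, text, conf in summarize_alternatives(sample):
    print(start, rank, text, conf)
```

With the job above, the same function could be called on json_data once the transcript has been downloaded.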

Also, if redaction is enabled, the redacted file replaces each PII value with a [PII] tag.
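Redaction is switched on per job through the ContentRedaction argument of start_transcription_job. Below is a minimal sketch; the helper with_redaction is hypothetical, and the en-US language code is an assumption, since redaction is only offered for some languages and you should check whether yours is supported.

```python
def with_redaction(job_args, keep_unredacted=False):
    """Return a copy of job_args with PII redaction enabled.

    Hypothetical helper. RedactionOutput 'redacted' produces only the
    redacted transcript; 'redacted_and_unredacted' produces both.
    """
    args = dict(job_args)  # shallow copy so the caller's dict is untouched
    args['ContentRedaction'] = {
        'RedactionType': 'PII',
        'RedactionOutput': (
            'redacted_and_unredacted' if keep_unredacted else 'redacted'
        ),
    }
    return args


base_args = {
    'TranscriptionJobName': 'TEST_JOB_Redacted',  # assumed job name
    'Media': {'MediaFileUri': 's3://my-bucket/call.mp3'},  # assumed location
    'MediaFormat': 'mp3',
    'LanguageCode': 'en-US',  # assumption: a language that supports redaction
}
redacted_args = with_redaction(base_args)

# With a real client (requires AWS credentials):
#   transcribe_client.start_transcription_job(**redacted_args)
```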

Cleanup

# Delete the transcription job created above
response = transcribe_client.delete_transcription_job(
    TranscriptionJobName=job_name
)

Transcription jobs expire after 90 days.
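If several test jobs have piled up, waiting 90 days is not much fun. The sketch below lists jobs whose name contains a given string and deletes them one by one. It uses list_transcription_jobs (with JobNameContains and NextToken) and delete_transcription_job; the helper delete_jobs_matching is my own and should be used with care, since it deletes everything that matches.

```python
def delete_jobs_matching(client, name_contains):
    """Delete every transcription job whose name contains name_contains.

    `client` is expected to be a boto3 Transcribe client. Returns the list
    of deleted job names. Hypothetical helper, for illustration only.
    """
    deleted = []
    kwargs = {'JobNameContains': name_contains}
    while True:
        page = client.list_transcription_jobs(**kwargs)
        for summary in page.get('TranscriptionJobSummaries', []):
            name = summary['TranscriptionJobName']
            client.delete_transcription_job(TranscriptionJobName=name)
            deleted.append(name)
        token = page.get('NextToken')
        if not token:
            return deleted
        # Ask for the next page of results
        kwargs['NextToken'] = token


# Usage with a real client (requires AWS credentials):
#   delete_jobs_matching(transcribe_client, 'TEST_JOB')
```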

Findings

Pricing

Bibliography
