OpenAI Whisper

pro

audio_video

Created Mar 29, 2025

$0.006 per minute

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition, translation, and language identification.

Technical Specifications

Service Type: Speech-to-Text Generation
Supported Formats:
- Input: Publicly accessible audio file URL (MP3, WAV, M4A, etc.)
- Output: Text transcription with optional English translation

Usage Examples

Convert speech in audio to text

User prompt:

use whisper speech to text https://storage.oaphub.ai/19/194371087193079808/138778300075511946903f4?k=ca694125

Result:

Result: " This is a short test of the speech generation system. How does it sound?"

Raw tool call (how an LLM might use this tool)

[
  {
    "name": "Whisper-Speech-To-Text",
    "arguments": {
      "audio": "https://storage.oaphub.ai/19/194371087193079808/138778300075511946903f4?k=ca694125"
    }
  }
]
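
A minimal Python sketch of how a client might assemble a tool call in the shape shown above. The helper name `build_whisper_call` and the placeholder URL are hypothetical and not part of the service; only the tool name and the `audio` argument come from this page.

```python
import json


def build_whisper_call(audio_url: str) -> list:
    """Return a tool-call payload shaped like the raw example above.

    Hypothetical helper: it only builds the JSON structure and does not
    contact any server.
    """
    return [
        {
            "name": "Whisper-Speech-To-Text",
            "arguments": {"audio": audio_url},
        }
    ]


# Placeholder URL, used for illustration only.
payload = build_whisper_call("https://example.com/sample.mp3")
print(json.dumps(payload, indent=2))
```

How the payload is actually sent depends on the MCP client in use, so that step is left out here.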

Raw tool result from the MCP server

[
  {
    "type": "text",
    "text": "Result: \" This is a short test of the speech generation system. How does it sound?\"",
    "annotations": null
  }
]
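
A hedged Python sketch of how the transcript could be pulled out of a result like the one above. It assumes the `Result: "..."` wrapper seen in this single sample is stable, which the page does not guarantee; `extract_transcript` is a hypothetical helper name.

```python
import json

# The raw tool result from above, kept as a raw string so the JSON
# escapes (\") reach the parser unchanged.
raw_result = r'''[
  {
    "type": "text",
    "text": "Result: \" This is a short test of the speech generation system. How does it sound?\"",
    "annotations": null
  }
]'''


def extract_transcript(result_json: str) -> str:
    """Join all text items and strip the 'Result:' prefix and quotes.

    Assumption: the prefix and surrounding quotes match the sample above.
    """
    items = json.loads(result_json)
    text = "\n".join(i["text"] for i in items if i.get("type") == "text")
    prefix = "Result:"
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Drop outer whitespace, then the wrapping quotes, then inner padding.
    return text.strip().strip('"').strip()


transcript = extract_transcript(raw_result)
print(transcript)
```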

Tools

Whisper-Speech-To-Text

Usage: Convert speech in audio to text

Input Arguments:

Name	Type	Required	Description
`audio`	string	✓	URL of the audio file to transcribe. Must be a publicly accessible URL pointing to an audio file (MP3, WAV, M4A, etc.).
`translation`	boolean		When set to `true`, translates the transcribed text into English. Default: `false`. Useful for non-English audio content.
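
The argument table above can be checked on the client side before a call is made. The following Python sketch is one possible pre-flight check, not the server's own validation; `validate_arguments` is a hypothetical helper, and it cannot confirm that a URL is actually publicly reachable.

```python
from urllib.parse import urlparse


def validate_arguments(args: dict) -> list:
    """Return a list of problems found in a Whisper-Speech-To-Text call.

    An empty list means the arguments match the table above.
    """
    errors = []

    audio = args.get("audio")
    if not isinstance(audio, str) or not audio:
        errors.append("'audio' is required and must be a non-empty string")
    else:
        parsed = urlparse(audio)
        # Assumption: "publicly accessible URL" means http or https.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("'audio' must be an http(s) URL")

    # 'translation' is optional; the table lists its default as false.
    if "translation" in args and not isinstance(args["translation"], bool):
        errors.append("'translation' must be a boolean")

    return errors


# Non-English audio, with translation to English requested.
print(validate_arguments({"audio": "https://example.com/talk.m4a", "translation": True}))
```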