Import media content to transcribe

Take advantage of Whisperit's transcription features

Last updated about 2 months ago


Upload your audio and video files to simplify note-taking and meeting reports and to speed up processing.

With Whisperit, you can import text or audio files. This feature lets you add specific context to your requests and produce better documents faster by transcribing your media content.

How to import your audio and video files into Whisperit

Importing and transcribing audio and video files is easy with the built-in document assistant.

Step 1: Create a New Document

  • Log in to the application and navigate to the Documents section.

  • Click to create a new blank document.

Step 2: Access the Assistant

  • Find the assistant interface on the side panel of your document workspace.

Step 3: Upload Your Audio or Video File

  • Locate the “Upload” or file drag-and-drop area within the assistant.

  • Drag your media file into this area. Wait until the upload completes.

Step 4: Listen and Transcribe

  • Once uploaded, you’ll be able to play back the audio directly.

  • To transcribe, click the “Reprocess” button.

Step 5: Review and Use the Transcription

  • The transcribed text will appear instantly (or after processing, for multi-speaker files).

  • Review the results directly inside your document.

The different transcription modes

Before clicking “Reprocess”, you can select a transcription mode from the assistant button.

The Speech-to-Text modes

The use cases below help you select the appropriate template for your transcription or content-processing goal. Each template serves a distinct role, from raw transcription to structured document production or multi-speaker analysis, so you can quickly choose the best one for your intended output.

Standard

Converts speech to text exactly as spoken.

  • Suitable for straightforward speech-to-text conversion without additional processing or response.

  • Use when you only need an exact text transcript as spoken.

Multi-speaker meeting (Diarization)

Designed for transcriptions involving multiple speakers in meetings.

  • Use when you need to separate and identify who said what for clear meeting documentation.

  • Indicate the number of speakers if prompted, then reprocess the file.

  • The assistant will attempt to identify and separate speakers for more detailed transcription.

Note: Multi-speaker processing may take longer than standard transcription.

The AI-Enhanced Transcription modes

  • Smart Transcription
    Use when detailed understanding and interactive processing of speech is required. It transcribes speech accurately, analyzes for questions or instructions, and responds helpfully, especially when complex knowledge is involved (legal, medical, business, etc.).

  • Note
    Optimal for cleaning up dictated notes by fixing writing issues, adding punctuation, and emphasizing key points while keeping the original words largely intact.

  • List
    Best when you want all dictated items extracted and presented clearly in a list for easy reference.

  • Professional Email
    Use to convert notes or dictation into a polished professional email format, including subject line creation and recipient mention where applicable.

  • Journal
    Suitable for turning spoken input into a journal entry format with clear titling and date to keep a personal or professional daily record.

  • Surgical Edit
    Ideal for precise document creation or updates where exact editing is needed, such as formal documents or legal texts requiring careful revision and accuracy.

Add your instructions

Once your files are imported, you can include a voice or written instruction if needed. This helps the AI better understand your expectations and take both the files and your guidance into account.

What types of files can I import?

Whisperit supports a variety of common file formats to suit your needs:

  • Audio files: .wav, .mp3, .mpeg, .webm, .ogg, .m4a, etc.

  • Video files: .mp4, .mov, etc.
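
If you prepare many files before uploading, a quick local check of the file extension can save a failed upload. The sketch below is only an illustration, not part of Whisperit: the function name media_kind is hypothetical, and the extension sets contain only the formats named above. Because both lists end with "etc.", a file reported as "unlisted" may still be accepted by the application.

```python
from pathlib import Path

# Extensions named in the lists above. Both lists end with "etc.",
# so these sets are illustrative and not exhaustive.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".mpeg", ".webm", ".ogg", ".m4a"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def media_kind(filename: str) -> str:
    """Return "audio", "video", or "unlisted" based on the file extension."""
    # Compare in lowercase so that "Meeting.MP3" and "meeting.mp3" match.
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unlisted"


if __name__ == "__main__":
    for name in ["Meeting.MP3", "interview.mov", "notes.txt"]:
        print(name, "->", media_kind(name))
```

The check looks only at the file name, not at the file contents, so it cannot detect a corrupted or mislabeled file.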