Splitting a transcript into individual speaker files

Have you ever received an interview transcript from a third party and wanted to apply it to your multi-track sequence? Or maybe you started transcribing an interview with automatic transcription and later want to apply a third-party transcript to your edited composition.

In this Advanced-level tutorial, we'll show how you can take a pre-written transcript and break it up into individual files using Visual Studio Code, a free code editing program available on Microsoft.com.

After following Microsoft's instructions for installation, go ahead and load Visual Studio Code to get started. You'll also want to make sure your transcript follows these formatting tips before proceeding.

To get started, copy the contents of your script into a new Visual Studio Code file.

Copy_Script_to_VSC.gif

You'll want to make sure to remove any titles/headers from the file. Additionally, each line must begin with a speaker label. If a line has no label, either add the speaker's name to it or backspace it up into the previous line.

Give_Every_Line_a_Label.gif
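
If your transcript is long, joining the unlabeled lines by hand can get tedious. Here's a minimal Python sketch of the same cleanup, assuming a speaker label is a short name followed by a colon at the start of a line. The LABEL pattern and the join_unlabeled_lines function are hypothetical names made up for this illustration, and you may need to adjust the pattern to match your own transcript.

```python
import re

# Assumption: a speaker label is a short name (letters, digits, spaces,
# periods, apostrophes, hyphens) followed by a colon at the start of a line.
LABEL = re.compile(r"^[A-Za-z][\w .'-]{0,39}:")


def join_unlabeled_lines(text: str) -> str:
    """Append every line without a speaker label to the line before it."""
    merged: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if LABEL.match(stripped) or not merged:
            # A labeled line (or the very first line) starts a new entry.
            merged.append(stripped)
        else:
            # An unlabeled line is joined onto the previous entry.
            merged[-1] += " " + stripped
    return "\n".join(merged)


sample = "Alice: Hello there.\nHow are you?\n\nBob: Fine, thanks."
print(join_unlabeled_lines(sample))
# Alice: Hello there. How are you?
# Bob: Fine, thanks.
```

Note that this sketch does not remove titles or headers, so you'll still want to delete those yourself first.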

Once complete, open the search panel on the left-hand sidebar of Visual Studio Code.

Open_VSC_Search_Bar.gif

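Here you'll enter the name of your first speaker, the colon character, and then the characters .* (period - asterisk). For example, if your first speaker is labeled Alice (a made-up name), you'd enter Alice:.* in the search bar. Next, click on the Use Regular Expression button (which also looks like a period and asterisk) just to the right of the search bar. This will match every line for your first speaker, from the speaker's name through the end of the line.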

Select_First_Speaker.gif
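
To see what this search pattern is doing, here's a small Python sketch of the same idea. The function name lines_for_speaker and the sample speakers are made up for illustration. One difference to keep in mind: this version anchors the pattern to the start of each line, while the search described above will match the speaker's name wherever it appears in a line.

```python
import re


def lines_for_speaker(text: str, speaker: str) -> list[str]:
    """Return every line that starts with '<speaker>:'."""
    # re.escape protects against speaker names containing regex characters.
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    pattern = re.compile(rf"^{re.escape(speaker)}:.*$", re.MULTILINE)
    return pattern.findall(text)


sample = "Alice: Hi.\nBob: Hello.\nAlice: Bye."
print(lines_for_speaker(sample, "Alice"))
# ['Alice: Hi.', 'Alice: Bye.']
```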

You'll want to quickly scan through each line of your transcript to ensure that there aren't any missing or un-selected lines for this speaker. If there are, add the speaker to that line or backspace the line until it's joined with the correct line before it.

Move your cursor to any of the search results, right-click, and select Copy all to copy all of the selected lines in the script. Double-click the space to the right of your file's tab to create a new file. Right-click in the body of the new file and select Paste.

Copy_Paste_First_Speaker_Selection.gif

The new file may include some formatting that you'll need to remove. To remove the line numbers at the beginning of each line, first clear the text in the Search bar. Then type:

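  • two spaces ('  ')
  • backslash, d, plus ('\d+' - this is a regular expression that matches one or more digits, here the first number)
  • comma (',')
  • backslash, d, plus ('\d+' - this matches the second number)
  • colon (':')
  • space (' ')

Typed together, the search text is '  \d+,\d+: ' (without the quotes).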

Make sure the "Replace" box is empty. Once complete, click the Replace All button just to the right of the "Replace" text box to remove the line number prefix from your speakers. Afterward, you can remove the extra title at the top.

Remove_Prefix_and_Title.gif
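
The same prefix cleanup can be sketched in Python. The strip_prefixes function and the sample text are made up for illustration, and the sample simply follows the prefix shape described above (two spaces, a number, a comma, a number, a colon, and a space). This version also anchors the pattern to the start of each line, which is an extra assumption on top of the search text above.

```python
import re

# Two spaces, digits, a comma, digits, a colon, and a space at the start
# of a line. The ^ anchor is an addition for this sketch.
PREFIX = re.compile(r"^  \d+,\d+: ", re.MULTILINE)


def strip_prefixes(pasted: str) -> str:
    """Remove the number prefix from the start of every pasted line."""
    return PREFIX.sub("", pasted)


pasted = "  1,1: Alice: Hi.\n  3,1: Alice: Bye."
print(strip_prefixes(pasted))
# Alice: Hi.
# Alice: Bye.
```

Like the Replace All step, this leaves the extra title line untouched, so you'd still remove that separately.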

You now have an extracted transcript that can be used to sync with your first audio file using the Import Transcript process.

Repeat this process for each remaining speaker in your script until every speaker's file has been imported into Descript.
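
If you have many speakers, repeating these steps can take a while. As a rough alternative, here's a Python sketch that groups the lines by speaker in one pass and writes one text file per speaker. It assumes your transcript is already cleaned up so that every line starts with a speaker label followed by a colon. The function names and the output file naming are hypothetical choices for this example, not part of Visual Studio Code or Descript.

```python
from collections import defaultdict
from pathlib import Path


def group_by_speaker(text: str) -> dict[str, list[str]]:
    """Map each speaker label to the list of that speaker's lines."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # Everything before the first colon is treated as the speaker label.
        speaker, colon, _ = line.partition(":")
        if colon and speaker.strip():
            groups[speaker.strip()].append(line)
        # Lines without a colon are skipped in this sketch.
    return dict(groups)


def write_speaker_files(groups: dict[str, list[str]], out_dir: str) -> list[Path]:
    """Write one '<speaker>.txt' file per speaker and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for speaker, lines in groups.items():
        # Spaces in the label become underscores in the file name.
        path = out / f"{speaker.replace(' ', '_')}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


sample = "Alice: Hi.\nBob: Hello.\nAlice: Bye."
groups = group_by_speaker(sample)
print(list(groups))
# ['Alice', 'Bob']

# To write the files, you could call, for example:
# write_speaker_files(groups, "speaker_files")
```

Because the label is taken from the text before the first colon, any unlabeled line that happens to contain a colon would be treated as its own speaker, so it's worth checking the list of speakers before writing the files.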