Combining Audio with a Static Text Image for YouTube with Ruby

Why post to YouTube with a static text image for audio?

We had a big occasion (the Ordination/Installation Mass for a new bishop) for which I was helping lead one of the sections. I quickly realized that if I could record the parts and share them, everyone could get in extra practice and be prepared. The large collection of parts, broken out by voice and by song, quickly became too much to email. I first went with file sharing and/or SoundCloud, but YouTube is the most ubiquitous [and relatively free] sharing platform and presented far less friction for choir members than other platforms.

Generating the Files

For my original run, I just manually added text on a picture of my piano for all of the videos (using ffmpeg, which I’ve returned to for this script). I may add the background image back in for the next run, but manually adding all of the text to the individual parts was tedious and left room for me to grab the wrong parts, both when combining them into video and when uploading them.

OBS Studio was a natural follow-up, because I just added all the audio track parts I was going to use to the library and tweaked the static “title card” for each render. Still, I wanted a consistent set of text across a whole set of videos.

What you’ll need

  • ffmpeg (I installed via brew install ffmpeg a while ago)
  • imagemagick (Again, brew install imagemagick for macOS)
  • A recent Ruby with RMagick gem installed (Just gem installed at the moment)
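Before running anything, it can help to confirm the tools are actually reachable. The following is a minimal sketch, not part of the original script: find_executable is a hypothetical helper that searches a PATH-style string, and the check for RMagick simply tries to require it.

```ruby
# Hypothetical helper (not from the original script): returns the full path
# of an executable found in the given PATH-style string, or nil if absent.
def find_executable(name, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if __FILE__ == $PROGRAM_NAME
  warn 'ffmpeg not found on PATH (try: brew install ffmpeg)' unless find_executable('ffmpeg')
  begin
    require 'rmagick'
  rescue LoadError
    warn 'RMagick gem not installed (try: gem install rmagick)'
  end
end
```

Run on its own, the script prints a warning for each missing piece and stays silent otherwise.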

Creating the Title Card Image

I have a function, title_text, that takes the name of the audio file in “Title – Description.mp3” format and creates a "Title\nDescription" string. (TITLE is a constant, even though a title could be extracted from the filename, because the filename title is an abbreviation.) create_title_card renders the title text as white text on a black image and saves it to a .png named after the original file. I also create a text file with the same info so that I can copy/paste it into YouTube on upload.

# basename without extension (and without spaces)
def squished_filename(filepath)
  File.basename(filepath).gsub(/\..*$/, '').gsub(' ', '')
end

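def title_text(filename)
  sub_desc = filename.split('-')[1]
  # The gsub patterns are assumed from the replacement strings.
  full_sub_desc = case sub_desc
                  when /And/
                    sub_desc.gsub(/(.*)And(.*)/, '\1 with \2 accompaniment')
                  when /Over/
                    sub_desc.gsub(/(.*)Over(.*)/, '\1 primary with \2 voice and accompaniment')
                  else
                    sub_desc
                  end
  "#{TITLE}\n#{full_sub_desc}"
end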


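def create_title_card(file)
  filename = squished_filename(file)
  # 1920 is an assumed width (only the 1080 height is certain). Newer RMagick
  # releases expect the block-argument form: { |img| img.background_color = 'black' }
  canvas = Magick::Image.new(1920, 1080){self.background_color = 'black'}

  text = Magick::Draw.new
  text.pointsize = 50
  text.fill = 'white'
  # center horizontally and vertically
  text.gravity = Magick::CenterGravity

  text.annotate(canvas, 0, 0, 0, 0, title_text(filename))

  canvas.write("#{filename}.png")
  # save the same text for pasting into YouTube on upload
  File.open("#{filename}.txt", "wt") do |f|
    f.puts title_text(filename)
  end
end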
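To make the file-name handling concrete without needing RMagick, here is a slightly more defensive sketch of the parsing step. It is a variation, not the script's own code: split_title_and_description, expand_description, and card_text are hypothetical names, the example file names are made up, and accepting an en dash as well as a hyphen is my assumption.

```ruby
# Splits "Abbrev - Description.mp3" into [abbrev, description].
# Accepts a hyphen or an en dash (U+2013) as the separator; description is
# nil when the name has no separator. (Assumed behavior, for illustration.)
def split_title_and_description(filepath)
  base = File.basename(filepath, File.extname(filepath))
  abbrev, description = base.split(/\s*[-\u2013]\s*/, 2)
  [abbrev, description]
end

# Expands "XAndY" / "XOverY" into the longer phrases used on the title card.
def expand_description(description)
  case description
  when /\A(.+)And(.+)\z/
    "#{$1.strip} with #{$2.strip} accompaniment"
  when /\A(.+)Over(.+)\z/
    "#{$1.strip} primary with #{$2.strip} voice and accompaniment"
  else
    description.to_s
  end
end

# Builds the "Title\nDescription" text; omits the second line if there is
# no description in the file name.
def card_text(title, filepath)
  _abbrev, description = split_title_and_description(filepath)
  [title, expand_description(description)].reject(&:empty?).join("\n")
end

puts card_text('Example Title', 'EHT - SopranoAndPiano.mp3')
```

The full title is passed in rather than read from the file name, matching the post's point that the file name only carries an abbreviation.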

Creating the Video

For now, I’m just calling out to ffmpeg using backticks, with the .png, .mp3, and .mp4 names (image input, audio input, and video output) interpolated. The -i parameter is used for both the input image and the input audio file.


Dir[File.expand_path(File.join(MP3_DIR, WILDCARD))].each do |file|
  `ffmpeg -loop 1 -i #{squished_filename(file)}.png -i "#{file}" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest #{squished_filename(file)}.mp4`
end
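As an alternative to interpolating into backticks, the same command can be built as an argument list and run without a shell, which sidesteps quoting problems when file names contain spaces. This is a sketch of that idea rather than what the script does; ffmpeg_args and render_video are hypothetical names, and the flags are the ones from the command above.

```ruby
# Builds the ffmpeg argument list for one still image plus one audio file.
def ffmpeg_args(image_path, audio_path, video_path)
  [
    'ffmpeg',
    '-loop', '1',           # repeat the single image as the video stream
    '-i', image_path,       # first input: title card image
    '-i', audio_path,       # second input: audio track
    '-c:v', 'libx264',      # encode video as H.264
    '-tune', 'stillimage',  # x264 tuning for static content
    '-c:a', 'aac',          # encode audio as AAC
    '-b:a', '192k',         # audio bitrate
    '-pix_fmt', 'yuv420p',  # pixel format most players accept
    '-shortest',            # stop when the shorter input (the audio) ends
    video_path
  ]
end

# Runs ffmpeg directly (no shell), so paths with spaces need no quoting.
# Returns true on success, false on a non-zero exit, nil if ffmpeg is missing.
def render_video(image_path, audio_path, video_path)
  system(*ffmpeg_args(image_path, audio_path, video_path))
end
```

Inside the loop, this would be called as render_video("#{squished_filename(file)}.png", file, "#{squished_filename(file)}.mp4").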



Garage Band mp3 exports
Generated Text, Static Text Image, and mp4 files for upload

The full code is available on my GitHub repo (this commit for the source here)

Title Card and Audio File Combined
