Why post to YouTube with a static text image for audio?
We had a big occasion (an Ordination/Installation Mass for a new bishop) for which I was helping lead one of the sections, and I quickly realized that if I could record the parts and share them, everyone could get in extra practice and come prepared. It was a large collection of parts broken out by voice and by song, and it quickly became too much to email. I could have gone with file sharing and/or SoundCloud, but the fact of the matter is that YouTube is the most ubiquitous [and relatively free] sharing platform and presented far less friction for choir members than the alternatives.
Generating the Files
For my original run, I manually added text to a picture of my piano for all of the videos (using ffmpeg, which I’ve returned to for this script). I may add the background image back in for the next run, but manually adding the text for each individual part was tedious and left room for me to grab the wrong parts, both when combining them into video and when uploading them.
OBS Studio was a natural follow-up, because I just added all the audio track parts I was going to use to the library and tweaked the static “title card” for each render. But I still wanted a consistent set of text across a whole set of videos.
What you’ll need
- ffmpeg (I installed via brew install ffmpeg a while ago)
- imagemagick (again, brew install imagemagick for macOS)
- A recent Ruby with the RMagick gem installed (just gem install'ed at the moment; a quick sanity check is sketched right after this list)
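Before generating anything, it’s worth confirming the pieces can see each other. This isn’t part of the script itself, just a minimal sketch assuming the three prerequisites above are installed: RMagick will only load if it can find ImageMagick, and the script shells out to ffmpeg, so both need to be reachable.

require 'rmagick'

# RMagick loading at all proves ImageMagick is installed and linked.
puts Magick::Magick_version
# The video step shells out to ffmpeg, so make sure it is on the PATH.
abort('ffmpeg not found on PATH') unless system('ffmpeg -version > /dev/null 2>&1')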
Creating the Title Card Image
I have a function title_text that takes the name of the audio file in a “Title - Description.mp3” format and creates a "Title\nDescription" string. (TITLE is a constant, despite the title being extracted from the filename, because the filename title is an abbreviation.) create_title_card writes that title text out as white-on-black text and saves it to a .png named after the original file. I also create a text file with the same info so that I can copy/paste it into YouTube on upload.
require 'rmagick'

# TITLE is defined elsewhere in the script; it holds the full song title,
# since the title embedded in the filename is an abbreviation.

# basename without the extension or spaces
def squished_filename(filepath)
  File.basename(filepath).gsub(/\..*$/, '').gsub(' ', '')
end

# Build the "Title\nDescription" string, expanding the abbreviated
# description that follows the '-' in the squished filename.
def title_text(filename)
  sub_desc = filename.split('-')[1]
  full_sub_desc = case sub_desc
                  when /And/
                    sub_desc.gsub(/(.*)And(.*)/, '\1 with \2 accompaniment')
                  when /Over/
                    sub_desc.gsub(/(.*)Over(.*)/, '\1 primary with \2 voice and accompaniment')
                  end
  "#{TITLE}\n#{full_sub_desc}"
end

# Render the title text as a white-on-black 1920x1080 PNG, and write the
# same text to a .txt file for copy/pasting into YouTube on upload.
def create_title_card(file)
  filename = squished_filename(file)
  canvas = Magick::Image.new(1920, 1080) { self.background_color = 'black' }
  text = Magick::Draw.new
  text.pointsize = 50
  text.fill = 'white'
  # center horizontally and vertically
  text.gravity = Magick::CenterGravity
  text.annotate(canvas, 0, 0, 0, 0, title_text(filename))
  canvas.write("#{filename}.png")
  File.open("#{filename}.txt", "wt") do |f|
    f.puts(title_text(filename))
  end
end
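To make the And/Over handling concrete, here is a hypothetical walk-through; the filename and the TITLE value below are made up for illustration and just follow the same naming pattern:

TITLE = 'Do You Hear' # stand-in value for this example

squished_filename('~/Music/GarageBand/DoYouHear - Soprano And Piano.mp3')
# => "DoYouHear-SopranoAndPiano"

title_text('DoYouHear-SopranoAndPiano')
# => "Do You Hear\nSoprano with Piano accompaniment"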
Creating the Video
For now, I’m just calling out to ffmpeg using backticks, with the .png, .mp3, and .mp4 names (image input, audio input, and video output) interpolated into the command. The -i parameters are used for both the input image and the input audio file.
MP3_DIR="~/Music/GarageBand"
WILDCARD="DoYouHear*.mp3"

Dir[File.expand_path(File.join(MP3_DIR, WILDCARD))].each do |file|
  create_title_card(file)
  # -loop 1 repeats the still image, -tune stillimage optimizes x264 for a static
  # picture, -shortest stops encoding when the audio ends, and yuv420p keeps the
  # output playable in most players.
  `ffmpeg -loop 1 -i #{squished_filename(file)}.png -i "#{file}" -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest #{squished_filename(file)}.mp4`
end
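When the loop finishes, each matching MP3 has a .png title card, a .txt description, and a .mp4 ready for upload sitting in whatever directory the script was run from (the part name below is hypothetical):

Dir['DoYouHear*.{png,txt,mp4}'].sort
# => ["DoYouHear-SopranoAndPiano.mp4",
#     "DoYouHear-SopranoAndPiano.png",
#     "DoYouHear-SopranoAndPiano.txt"]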
Results


The full code is available on my GitHub repo (this commit for the source here).