I always like finding new ways to apply artificial intelligence (AI) tools to my day-to-day productivity tasks. Last year, I showed how I used generative AI to rescue some bad audio and otherwise tweak a short how-to video. I used Photoshop's Generative Fill, Adobe Podcast, and what was then a new background replacement feature in Final Cut Pro.
This time, I'm using an AI gimbal to help the camera follow my movements, Apple's Voice Memos AI transcription feature in macOS Sequoia to transcribe an unscripted video, and ChatGPT to suggest titles, tags, and a description for an unboxing video.
Let's start with the project. I do videos for my YouTube channel as often as I can, but my primary work product is writing. So I try to find ways to optimize my limited non-writing time for my various YouTube projects.
The video I worked on most recently was the unboxing of a multi-filament 3D printer. The Anycubic Kobra 3 Combo can print using up to four colors at once. Unboxing videos have always been popular with my viewers, so I wanted to get the video done quickly.
The challenge with unboxing is that it's often hard to know what to film because I never know what's inside the box until I open it. The best way to be sure I get good film is to place a bunch of cameras all around my work area, and then just do my unboxing thing.
The problem is, I often move around the workshop while unboxing. In previous videos, I'd often wind up with shots where I'm out of frame, or coming in and out of frame. I tried some auto-follow gimbals in the past, but they always got confused unless I was facing the gimbal directly at all times.
David Gewirtz
Not this time.
I picked up the Hohem iSteady v3 gimbal on sale at Amazon for $100. (It's usually $129.) I watched a few reviews of this gimbal, and began to realize that gimbal AI has come a long way in the past year. This gimbal has a whole bunch of app-assisted features, but what I liked most is that it has an "AI module" that orients the gimbal properly, regardless of whether you're running an app, or even what you're using for a camera.
Even if you don't have the app installed, the gimbal responds to a few simple hand gestures. I have yet to install the app and I've made a great video with amazing tracking of my movement.
The setup is super easy. Charge it via USB-C, then pull out the little built-in tripod legs and insert your camera. I used my old iPhone SE in the little clamp. Long-pressing the power button turns it on. It will auto-calibrate, setting your phone to film in portrait mode.
To switch to landscape mode, you simply point both your thumbs to the left. Then, give it the OK sign and it will track you as you walk around.
This gimbal completely solved my out-of-frame problem right out of the box because the onboard machine learning in the AI module tracked me perfectly. It tracked me correctly when I moved behind a workbench and behind the big box I was unboxing. It tracked me when I walked toward the camera and when I turned around and walked away. The only time it lost track of me was when I walked completely out of the room, and all I had to do to get its attention again was hold my hand up in the OK sign.
In addition to the phone in the gimbal, I used a second iPhone pointed down from a high vantage point. I also used two iPads that were filming from their front-facing cameras so I could watch what was on-frame while filming. Yes, the front-facing cameras are a little lower in resolution, but it's worth the trade-off to have a built-in monitor at all times.
This video was entirely off the cuff, so I didn't have a pre-written script I could feed into YouTube for closed captions. I also didn't have a script to give to ChatGPT to help me with SEO and tag suggestions.
Instead, I just recorded my commentary into the DJI Mic 2, which was connected via Bluetooth to one of my iPads. After recording into all four iOS devices, I transferred the video into Final Cut Pro and used the multicam feature to match up the timing of all four camera angles. That allowed me to easily switch between angles during editing by simply typing 1, 2, 3, or 4, corresponding to whichever camera I wanted to show footage from at that point in the film.
To get an audio file suitable for transcription, all you have to do is open the completed video file produced by Final Cut in QuickTime Player. Under the File menu, select Export As > Audio. You're not given a choice of formats, so you're stuck with m4a. Fortunately, this will work for our purposes.
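If you'd rather script this step than click through QuickTime Player, here's a minimal sketch of one alternative. It is not what I did for this video. It assumes you have the ffmpeg command-line tool installed, and the file name unboxing.mov is just a placeholder. The function only builds the command, so you can inspect it before running anything.

```python
from pathlib import Path


def build_audio_extract_command(video_path: str) -> list[str]:
    """Build an ffmpeg argument list that saves a video's audio track as .m4a."""
    source = Path(video_path)
    target = source.with_suffix(".m4a")  # same name, .m4a extension
    return [
        "ffmpeg",
        "-i", str(source),  # input: the finished video exported from the editor
        "-vn",              # drop the video stream, keep audio only
        "-c:a", "aac",      # encode the audio as AAC, the usual codec in .m4a
        str(target),
    ]


if __name__ == "__main__":
    # "unboxing.mov" is a hypothetical file name used for illustration.
    command = build_audio_extract_command("unboxing.mov")
    print(" ".join(command))
    # To actually run it (requires ffmpeg on your PATH):
    # import subprocess
    # subprocess.run(command, check=True)
```

Either route leaves you with an m4a file you can drag into Voice Memos.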
Next, open the Voice Memos app in Sequoia. This won't work on earlier versions of macOS. There's no import option in Voice Memos, but if you drag and drop your m4a audio file onto the list of recordings, you'll briefly see a green plus sign and it will be accepted into the list of clips. Note that Voice Memos places your clip chronologically based on when it was recorded, not based on when you insert it into Voice Memos.
Once it's imported, click the very tiny gray transcribe icon.
Screenshot by David Gewirtz
Wait a minute and it will generate a transcript.
Screenshot by David Gewirtz
Let's be clear. This is a poor transcription. It got my name wrong, it got the product names wrong, and it didn't have any concept of paragraphs or line breaks. It doesn't seem to use any sort of custom on-device dictionary culled from the millions of words I've typed on the Mac it's running on.
It's nothing like what would come from the commercial Rev.com service, but at two bucks a minute for human transcription, this little video would have cost over $20. Using this Apple Voice Memos hack was free (although you do get what you pay for). I'm not knocking Rev.com. I use the service anytime that quality is important for client work.
But for my little box opening? It just wasn't worth the cost.
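To make the math concrete, here's a quick sketch of the cost calculation. The $2-per-minute rate is the figure I mentioned above. The 10.5-minute running time is a made-up example, chosen only because any video longer than 10 minutes lands above $20 at that rate.

```python
def transcription_cost(duration_minutes: float, rate_per_minute: float = 2.00) -> float:
    """Estimate the cost of per-minute transcription, rounded to cents."""
    return round(duration_minutes * rate_per_minute, 2)


if __name__ == "__main__":
    # 10.5 minutes is a hypothetical running time, not the actual video length.
    print(f"${transcription_cost(10.5):.2f}")
```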
To get the text out of Voice Memos, hit the Edit button and copy. You'll need to paste it into your text editor of choice and save it for later. Take a brief moment to make some edits. You'll want to search and replace on your name and product names, so at least they're correct in the transcript.
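If you do this often, the search-and-replace pass is easy to script. Below is a minimal sketch, assuming you keep your own list of the mistakes the transcription tends to make. The misheard spellings in the example are invented for illustration, so swap in whatever actually shows up in your transcript.

```python
import re


def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace each misheard phrase with its correct form, ignoring case."""
    for wrong, right in corrections.items():
        # \b keeps us from rewriting the middle of a longer word.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement inserts `right` literally, with no escape handling.
        transcript = pattern.sub(lambda match, fixed=right: fixed, transcript)
    return transcript


if __name__ == "__main__":
    # Hypothetical mishearings; the real ones depend on your recording.
    corrections = {
        "david gerwitz": "David Gewirtz",
        "any cubic cobra": "Anycubic Kobra",
    }
    raw = "Hi, I'm David Gerwitz with the any cubic cobra 3."
    print(apply_corrections(raw, corrections))
```

Save the corrected text as a plain text file so it's ready to upload later.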
We're going to use this transcript for a few things on YouTube Studio. First, once your video is uploaded, go into YouTube Studio and click the Subtitles tab. If your video has been in the system for a while, YouTube is likely to have generated automatic captions, as shown with my video.
Screenshot by David Gewirtz
But above the Automatic Captions entry, there's usually a series of three dots where you can enter your own "English (video language)" transcript. Mine's complete here, but you would click on the arrow and upload your somewhat corrected text file from earlier.
YouTube uses this to help produce closed captions, comparing what you upload with what it creates internally. I've also heard from other YouTubers that having a full set of uploaded captions gets you a bit more SEO juice because YouTube has more insight into what your video is about, and the algorithm is reputed to maximize exposure based on that.
Next up are three easy-to-write components of the video listing: the headline, the description, and the tags. As a writer, this is the easiest part of the whole project for me, but as an AI researcher, here's another opportunity to see what we can get an LLM like ChatGPT to do for us.
I was pleasantly surprised. The AI wouldn't directly "watch" my video, but ChatGPT Plus did ingest my transcript. I gave it the prompt:
Read the following and then wait for additional instructions.
Then I gave it this prompt:
This is a transcript from a YouTube video. Please give me 10 high-impact possible YouTube video titles.
It returned the following 10 video titles: