Text on YouTube Thumbnails
Some of YouTube's highest-CTR thumbnails have zero text. Others would fail without it. The answer isn't a rule — it's a decision that depends on what your image can communicate on its own.
Same Video. Too Much Text vs. Right Amount.
The thumbnail with less text communicates more — because the image does the work the text was trying to do.
The Question Most Creators Answer Wrong
Most creators treat text on thumbnails as a default — something to add unless there's a specific reason not to. The more useful default is the opposite: start with no text, and add it only when removing it would make the thumbnail worse.
This reversal matters because text on thumbnails has a cost. Every word you add reduces the space available for visual elements. Every word you add requires a viewer to read rather than just perceive. And reading is slower than visual recognition — in a feed where the window to capture attention is under half a second, switching a viewer from image processing to text processing is a disadvantage unless the text is doing something the image cannot do on its own.
The goal of this guide is not to tell you to use text or avoid it. It is to help you determine which is true for your content — and give you the rules that apply when you do add it.
When Text on a Thumbnail Hurts Your CTR
When the image is already self-explanatory
A before-and-after fitness transformation, a completed DIY project, or a finished dish from a cooking video communicates its own story visually. Adding text that says "INCREDIBLE TRANSFORMATION" or "AMAZING RESULTS" doesn't add information — it adds noise. The image is the hook. Text in this context dilutes it by splitting the viewer's attention between reading and seeing.
When the text duplicates the title
If your title says "I Lost 20 Pounds in 60 Days" and your thumbnail also says "20 LBS IN 60 DAYS," you have used two surfaces to communicate one piece of information. The viewer who reads the title already has that information. The viewer who sees the thumbnail before the title gets no additional reason to click. Duplicate information between thumbnail and title is the most common text mistake — and the easiest to fix by removing the text and letting the image use that space.
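One rough way to catch duplication before publishing is to measure how many of the thumbnail's words already appear in the title. The sketch below is illustrative only: the function names, the abbreviation table, and the idea of treating a high overlap as a warning sign are all assumptions, not an established metric.

```python
import re

# Hypothetical abbreviation table so "LBS" and "Pounds" count as the same word.
ALIASES = {"lbs": "pounds", "lb": "pounds"}

def tokens(text: str) -> set[str]:
    """Lowercase the text, split it into words and numbers, normalize abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {ALIASES.get(w, w) for w in words}

def overlap_ratio(thumbnail_text: str, title: str) -> float:
    """Fraction of distinct thumbnail-text tokens that also appear in the title."""
    thumb = tokens(thumbnail_text)
    if not thumb:
        return 0.0
    return len(thumb & tokens(title)) / len(thumb)

# Every thumbnail word is already in the title: full duplication.
print(overlap_ratio("20 LBS IN 60 DAYS", "I Lost 20 Pounds in 60 Days"))  # 1.0
```

A ratio near 1.0 suggests the thumbnail text is restating the title and the space could go back to the image.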
When you have a well-recognized face doing the work
For established creators, the face is itself a signal: viewers who recognize it will click based on that recognition. Adding text in this context competes with the face for visual attention without adding a meaningful incentive to click. The exception is text that adds specific context the face doesn't carry. "I Was Wrong About This" next to a surprised expression works because it creates curiosity that the face alone doesn't resolve.
When the text is too small to read at thumbnail scale
This is the most damaging form of thumbnail text. Text that looks readable on a 1280×720 canvas becomes illegible at 250 pixels wide, which is roughly the width at which most search result thumbnails are displayed. Text that can't be read communicates nothing and clutters the image. If a viewer would have to zoom in or squint to read your text, it should not be there.
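The shrinkage is simple proportional scaling, and working it out makes the problem concrete. The following is a minimal sketch using the 1280-pixel canvas and 250-pixel display width mentioned above; the function name is hypothetical, and actual display sizes vary by device and surface.

```python
def scaled_height(text_px: float, canvas_width: int = 1280, display_width: int = 250) -> float:
    """Height of a text element after the canvas is scaled down to the display width."""
    return text_px * display_width / canvas_width

# Text heights measured on the full-size canvas, in pixels.
for height in (40, 72, 108):
    print(f"{height}px on canvas -> {scaled_height(height):.1f}px in search results")
# 40px on canvas -> 7.8px in search results
# 72px on canvas -> 14.1px in search results
# 108px on canvas -> 21.1px in search results
```

Text that is 40 pixels tall in the editor ends up under 8 pixels tall in the feed, which is why it looks fine while designing and vanishes once published.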
When Text on a Thumbnail Meaningfully Increases Clicks
Topic-based content with no visual equivalent
Finance, software tutorials, and abstract educational content often lack a visual hook that communicates the topic clearly. A video about tax optimization doesn't have an obvious image. A video about a programming concept has no visual shorthand. In these cases, text is not competing with the image — it is completing an image that couldn't communicate on its own. "SAVE $4,200" over an image of a spreadsheet is more clickable than the spreadsheet alone.
Numbers as the hook
Specific numbers — dollar amounts, timeframes, quantities, percentages — are among the most clicked elements in thumbnails when they are large, high-contrast, and central. The reason: numbers are concrete. They communicate a specific, verifiable claim rather than a vague promise. "7 DAYS" next to a challenge video, "$0 TO $10K" next to a business video, "97%" next to a product review — each creates a specific, evaluable proposition that a viewer can decide to click or skip based on real information.
Contradiction or surprise text
Short phrases that create cognitive dissonance — "I WAS WRONG," "NEVER AGAIN," "ACTUALLY..." — work because they create an immediate question the viewer needs the video to answer. This text isn't giving information; it's creating a gap. The key is that the text must be paired with an expression or image that reinforces the contradiction. "I WAS WRONG" next to a neutral expression doesn't create the same pull as "I WAS WRONG" next to a visibly regretful or surprised face.
The Rules That Apply When You Do Use Text
Word count: three to five words maximum
This is not a stylistic preference. It is a function of reading time at thumbnail scale. A viewer has approximately 150–200 milliseconds of attention on a thumbnail before deciding to continue or scroll. Reading speed for well-formatted text is approximately 200–250 words per minute for average readers, which works out to roughly 240–300 milliseconds per word. At that pace, five words would take 1.2–1.5 seconds to read in full, several times longer than the attention window, so a scrolling viewer catches a word at most and reads the rest only if that word makes them pause. Three to five short words is the ceiling at which the text can still register; six or more all but guarantees that most of it goes unread. Short text is read. Long text is skipped.
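The per-word cost follows directly from the words-per-minute figure. The sketch below assumes reading time scales linearly with word count, which is a simplification: a glance at a few large, familiar words may be faster than continuous reading. The function name is hypothetical.

```python
def reading_time_ms(words: int, words_per_minute: float) -> float:
    """Time to read `words` words at a steady pace, in milliseconds."""
    return words / words_per_minute * 60_000

# Fast and slow ends of the 200-250 words-per-minute range.
for n in (1, 3, 5):
    fast = reading_time_ms(n, 250)
    slow = reading_time_ms(n, 200)
    print(f"{n} word(s): {fast:.0f}-{slow:.0f} ms")
# 1 word(s): 240-300 ms
# 3 word(s): 720-900 ms
# 5 word(s): 1200-1500 ms
```

Under this model even a single word takes longer than a 150–200 millisecond glance, which is the case for keeping thumbnail text as short as the message allows.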
Font weight: bold only
Regular-weight fonts become illegible at thumbnail scale in most display contexts. Light-weight fonts are completely unreadable. Use bold or extra-bold typefaces. The stroke width needs to be substantial enough to survive JPEG compression, which tends to blur thin strokes. Sans-serif fonts in bold weight (Inter, Montserrat, Impact, Anton) are the most legible at small sizes.
Contrast: text must pass the squint test
Your text must be readable when you squint at the thumbnail from arm's length. If it disappears when the image is slightly blurry, it will disappear at small thumbnail sizes. The two reliable solutions: dark text on a light area, or light text with a dark stroke or background panel. A dark drop shadow behind light text also works but requires more weight to survive compression. Never place text over a busy, multi-colored area without a background element behind it.
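The squint test is subjective, so a numeric proxy can help when comparing color choices. The sketch below swaps in the WCAG contrast ratio, a web accessibility measure rather than anything specific to thumbnails, computed between a text color and a single representative background color. Treating it as a stand-in for the squint test is an assumption, and it says nothing about busy, multi-colored backgrounds.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(text_rgb: tuple[int, int, int], bg_rgb: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(text_rgb), relative_luminance(bg_rgb)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# White text on a black panel gives the maximum possible ratio.
print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))  # 21.0
```

For reference, WCAG asks for at least 4.5:1 for normal text and 3:1 for large text on web pages. Thumbnail text shrinks far more than web text does, so a higher bar is a reasonable working assumption.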
How Thumbnail Text Relates to Your Video Title
The strongest thumbnail-title combinations divide the information rather than duplicating it. The thumbnail shows what can be shown visually — the result, the expression, the subject. The title provides the context that the image can't convey — the timeframe, the why, the stakes. When the thumbnail has text, that text should be the one piece of information the image couldn't communicate on its own — not a compressed version of the title.
The test: cover the title and look at only the thumbnail. Does it make you want to know more? Now cover the thumbnail and read only the title. Does it make you want to see the video? If both answers are yes, the two elements are working as a unit. If covering either one makes you less interested in clicking, the unit is not balanced.
Almost always use text: Finance, business, technology tutorials, news/commentary — the topic often has no strong visual equivalent.
Situational: Fitness, lifestyle, cooking — use text for numbers and specific hooks, skip it for transformation/result visuals.
Rarely necessary: Travel cinematography, nature, ASMR, visual art — the image is the content. Text competes rather than complements.
Depends on face presence: Reaction, commentary, vlog — if the expression tells the story, text is often redundant; if not, a single phrase can create the missing curiosity gap.
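For anyone building a checklist or template around these categories, the guidance above can be kept as a simple lookup. This is a sketch only: the dictionary keys, the labels, and the fallback message are illustrative names chosen here, and the niches listed are just the ones named above.

```python
# Guidance levels paraphrased from the niche list above; keys are illustrative.
TEXT_GUIDANCE = {
    "finance": "almost always",
    "business": "almost always",
    "technology tutorials": "almost always",
    "news/commentary": "almost always",
    "fitness": "situational",
    "lifestyle": "situational",
    "cooking": "situational",
    "travel cinematography": "rarely necessary",
    "nature": "rarely necessary",
    "asmr": "rarely necessary",
    "visual art": "rarely necessary",
    "reaction": "depends on face presence",
    "vlog": "depends on face presence",
}

def text_guidance(niche: str) -> str:
    """Return the guidance for a niche, or a fallback for niches not listed."""
    return TEXT_GUIDANCE.get(niche.strip().lower(), "not listed: apply the removal test")

print(text_guidance("Cooking"))   # situational
print(text_guidance("Gaming"))    # not listed: apply the removal test
```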
Frequently Asked Questions
Should YouTube thumbnails have text on them?
It depends on whether the image communicates the video's hook without text. For topic-based content (finance, tech, education) where there is no strong visual shorthand for the topic, text usually increases CTR. For visually self-explanatory content (transformation results, before/after, food, travel) text often reduces CTR by competing with the image. The default question to ask: "Does removing this text make the thumbnail less clickable?" If yes, keep it. If no, remove it.
How big should text be on a YouTube thumbnail?
Large enough to read at 250 pixels wide — which is the approximate display size in YouTube search results. A practical rule: if your text is less than 15% of the thumbnail height, it will likely be illegible in search. Test by scaling your design down to 250px wide in your editor and reading it from arm's length. If you cannot read it easily at that scale, increase the font size.
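The 15% rule can be checked numerically before exporting. The sketch below assumes a 720-pixel-tall canvas and measures the height of the text itself; the function name and defaults are hypothetical, and the threshold is the rule of thumb stated above rather than a YouTube requirement.

```python
def passes_height_rule(text_px: float, canvas_height: int = 720, min_fraction: float = 0.15) -> bool:
    """True if the text occupies at least `min_fraction` of the canvas height."""
    return text_px / canvas_height >= min_fraction

# On a 720px-tall canvas, 15% corresponds to text about 108px tall.
print(passes_height_rule(108))  # True
print(passes_height_rule(72))   # False
```

This is a screening check only. The scale-down-and-read test remains the final word, because font weight and contrast affect legibility as much as size does.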
What font works best for YouTube thumbnails?
Bold sans-serif fonts with thick strokes. Popular choices: Anton (heavy, high-impact), Montserrat Bold or Black, Inter Bold, Bebas Neue (for shorter words), and Impact (older but still effective). Avoid thin fonts, script fonts, and decorative fonts, which become illegible under JPEG compression and at small display sizes. Using the same font across your channel's thumbnails also builds brand recognition over time.
Should thumbnail text repeat what is in the video title?
No — this is the most common text mistake on thumbnails. If your title says "How I Made $5,000 in One Month" and your thumbnail also says "$5,000 / MONTH," you have used two surfaces to say the same thing. The thumbnail and title work as a unit; they should divide the information between them rather than duplicate it. The thumbnail should show what can be shown visually. The title should explain the context the image can't convey.