Add TF-IDF to Your SEO Research Before Your Competition Discovers It

No, we’re not about to tell you to completely scrap your keyword research. But we are going to advise you to add a few columns to the spreadsheet: crucial columns that could help you leapfrog the competition.

Today, we’re going to discuss TF-IDF, which we find insanely exciting. As Toronto’s best SEO agency, we pride ourselves on staying on top of SEO trends, without jumping on every little hack or trick that may or may not still be a thing in 12 months.

With that in mind, we can confidently say that TF-IDF is “for reals,” as the kids say… Do kids still say that? We have no idea. SEO trends are the only ones we follow.

In any case, TF-IDF is not an SEO fad. It’s a legit game-changer that very few people are taking advantage of. This is why you need to jump on this right now before “The Other Guys” do.

We’re going to take a deep dive into TF-IDF and explore what it is and what it is not. We will also show you exactly how it can help your site, and frame it all by putting it in the context of keyword research changes in recent years.

The Keyword Research That You Are Probably Doing Now

You’re probably using some variation of this content model right now:

  1. Use a keyword tool like Google Keyword Planner, ahrefs (our go-to), or SEMrush
  2. Find the keywords that match your customers’ problems and your company’s offering
  3. Weigh search volume against the competition score to find the keywords you want
  4. Pop all of this data into a big ol’ spreadsheet or content planner
  5. Create each respective blog with a primary and secondary keyword in mind
  6. Add them to your blog as naturally as possible, without stuffing them in
  7. Repeat for each respective blog

To be clear, this system works! This is some rock-solid SEOing. If you’re doing this now, you’re ahead of most businesses out there today. For God’s sake, more than a third of surveyed businesses are still keyword stuffing.

If you’re using the method above and your competition is also doing it, it’s a battle to see who can do it better.

Allow us to show you how to do it better.

What is TF-IDF?

TF-IDF stands for “Term frequency–inverse document frequency.” 
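If you’re curious what that name means in practice, here is a minimal sketch in Python. It assumes one common textbook weighting (term count divided by document length, multiplied by the natural log of total documents over documents containing the term). SEO tools and search engines may well use different variants, and the three sample “pages” below are made up purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(term, doc_tokens, corpus_tokens):
    """Score one term in one document against a small corpus.

    tf  = occurrences of the term / total words in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    docs_with_term = sum(1 for tokens in corpus_tokens if term in tokens)
    if docs_with_term == 0:
        return 0.0  # the term appears nowhere, so there is nothing to score
    idf = math.log(len(corpus_tokens) / docs_with_term)
    return tf * idf


# Hypothetical mini-corpus standing in for three ranking pages.
pages = [
    "use painter tape to paint stripes on the wall",
    "paint the wall with a base coat first",
    "choose colors for the wall",
]
corpus = [tokenize(p) for p in pages]

tape_score = tf_idf("tape", corpus[0], corpus)  # "tape" is on 1 of 3 pages
wall_score = tf_idf("wall", corpus[0], corpus)  # "wall" is on all 3 pages
```

Under this particular weighting, a word that every page uses (“wall”) scores zero, while a rarer word (“tape”) scores higher. That is the “inverse document frequency” half of the name at work.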

How does it impact your keyword strategy? Simply put, the tried-and-trusted keyword tools we mentioned are fantastic at telling you the keywords you need to optimize to earn Google’s attention. TF-IDF takes this a step further by showing you the other words that you will need to legitimize this piece as authoritative and complete.

Performing a TF-IDF analysis reveals the relevant words used in the top 10 results for whatever keyword you’re going after. It can show you words that could be viewed as conspicuous by their absence, in the eyes of Google.

Let’s say you’re writing a blog on the Avengers. You would probably use keywords like Iron Man, Hulk, Thor, and Black Widow. However, Google’s algo may not see your piece as truly authoritative if you’re not using words like superhero, comic book, hammer, or Marvel. But, doing some TF-IDF research would show you that 10/10 of the highest ranking blogs on The Avengers all have those other words. The data also shows you exactly how frequently each word is used in each blog.

For the purposes of this article, we’re not going to get into the granular minutiae of the algo. We’ll focus on its impact on SEO and the world of keywording as we know it.

How to Use TF-IDF With Your Keywords

Let’s say you wanted to rank for the term, ‘How to paint stripes on a wall.’ This is a pretty good mix of search volume and low competition. If you own a local paint or hardware store, this could be a good one to go after.

So, you would punch ‘How to paint stripes on a wall’ into your keyword tool (we used ahrefs, as usual) and get the following:

From here, you would look at the keyword ideas by volume and make ‘How to paint stripes on a wall’ your primary keyword and use secondary/ tertiary keywords such as:

  • How to paint horizontal stripes on a wall
  • How to paint vertical stripes on a wall
  • How to paint stripes on a wall without tape

That’s a great start. Now, let’s take this to the next level by doing some TF-IDF analysis to see the other words that we should be using.

Today, we’re using the Surfer keyword analyzer. There are other options out there, but we like Surfer’s simple layout and deep-dive insights.

First of all, let’s enter our primary keyword, ‘How to paint stripes on a wall’ to see the other words that we should be using in our blog.

From the Popular Words tab, we can see what words frequently showed up in the current top-ranking sites for this query. We can see: 

  • The #1 ranked result used the word ‘Tape’ 35 times (1.94% density)
  • The #2 used it 52 times (0.73% density)
  • The #3 used it 20 times. 

You can also see the breakdown for other words like ‘Paint’ and ‘Wall.’

Would you have written this article without the word ‘Tape?’ Probably not. But this gives you an idea of how often the top performers are using it.
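To make the “density” figures concrete, here is a rough sketch of how such a number could be calculated. We’re assuming density simply means occurrences divided by total words; Surfer may calculate it differently. The function name and the sample text are ours, not the tool’s.

```python
import re


def keyword_density(text, word):
    """Return (count, density_percent) for one word in a block of text."""
    # Treat runs of letters, digits, and apostrophes as words.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    count = tokens.count(word.lower())
    density = 100 * count / len(tokens) if tokens else 0.0
    return count, density


# Hypothetical snippet, not taken from any real ranking page.
sample = "Apply tape along the line. Press the tape down, paint, then peel the tape."
count, density = keyword_density(sample, "tape")
```

If density does work this way, 35 uses at 1.94% would imply a page of roughly 1,800 words (35 / 0.0194).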

Let’s see what phrases the highest-ranked articles have in common. We click the Popular Phrases tab.

Now, we see the multi-word phrases that appear in the top results the most frequently.

Some of these phrases won’t apply to your blog. ‘Reply Beth’ was used in the #2 blog three times, but likely makes no sense in yours. But you can see other terms that you should be using, like ‘Base Coat’ or ‘Paint Stripes.’

Let’s drill down a bit further. 

The Common Words tab will show you which words showed up in all 10 of the top 10 results. In this case, you can see they all used the words: 

2019, wall, colors, color, paint, tape, time, stripes, painting, walls, painter, measure, base, stripe

It’s interesting to see that you may not have included ‘2019’ in your blog, but every page ranking in the Top 10 did.
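Conceptually, a “common words” list like this is just the overlap between the vocabularies of every page. Here is a minimal sketch of that idea, assuming naive whitespace tokenization. The page texts are invented stand-ins, and this is not a description of how Surfer does it internally.

```python
def common_words(pages):
    """Return the words that appear in every page, sorted alphabetically."""
    word_sets = [set(page.lower().split()) for page in pages]
    if not word_sets:
        return []
    return sorted(set.intersection(*word_sets))


# Hypothetical page texts standing in for top-ranking results.
top_pages = [
    "measure the wall then tape and paint stripes",
    "paint stripes on a wall with tape",
    "tape the wall before you paint stripes in 2019",
]
shared = common_words(top_pages)  # words found on all three pages
```

Here only “paint,” “stripes,” “tape,” and “wall” survive, because every other word is missing from at least one page.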

Now you can take a similar drill down with the Common Phrases Tab.

You can see that 9 out of the Top 10 used the terms ‘Paint stripes’ and “Painter s tape.” Your blog probably would have too, but it’s good to know. It’s pretty cool stuff and shows you a more complete view of what you’re trying to rank against.
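The “9 out of 10” idea can be sketched the same way for phrases: count how many pages contain each two-word phrase, then keep the ones that clear a threshold. This is an illustrative assumption about the approach, with made-up page texts and function names of our own choosing.

```python
from collections import Counter


def bigrams(text):
    """Return the set of two-word phrases found in a text."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def phrases_in_at_least(pages, min_pages):
    """Map each two-word phrase to the number of pages containing it,
    keeping only phrases found in at least `min_pages` pages."""
    page_counts = Counter()
    for page in pages:
        page_counts.update(bigrams(page))  # a set, so each page counts once
    return {p: n for p, n in page_counts.items() if n >= min_pages}


# Hypothetical page texts, not real search results.
sample_pages = [
    "use painter tape to paint stripes",
    "paint stripes over a dry base coat",
    "a base coat helps when you paint stripes",
]
frequent = phrases_in_at_least(sample_pages, 2)
```

With these sample pages, “paint stripes” shows up on all three and “base coat” on two, so both clear the threshold of two pages.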

From here, you can see that if you and your competition each write an article about how to paint a stripe on a wall, adding the word “2019” could possibly give you the edge you need to outrank them… even though it may not have shown up in your initial keyword research.

How to Use and Implement TF-IDF Data

As you can see, TF-IDF data doesn’t replace your old keyword data; it gives it a nitrous oxide boost.

So, now what? What do you do with all of these new insights? You can see that the word ‘Tape’ makes up 1.94% of the #1-ranked blog. Does that mean you’re going to make sure that you use it 19 times in 1000 words? Or even bump it up to 20?

No, please do not do that! Unless you want to frustrate your reader and publish a truly terrible blog. 

Think of this data as more of a checklist to ensure you’re telling a complete story and creating something that Google’s algo will see as comprehensive and authoritative.

Do not look at this data and say:

“We’re only using the word ‘Tape’ 6 times, we need to bump that up or we won’t rank.”

Instead, look at it and say:

“We’re already using the word ‘Tape,’ which is good. We’ll try to add it more where it makes sense. It looks like we should also pepper in the terms ‘Base Coat’ and ‘2019’ where we can.” 

Like traditional keyword stuffing, trying to awkwardly shoehorn these words in where they don’t belong will do 3 things:

  1. Kill the quality of the writing
  2. Frustrate the human reader
  3. Tip Google off to the fact that this is a not-so-good piece 

✖ The Wrong Way To Use This Information 

Create a finite list of words you need to use in your content, and use them according to the density your research has revealed.

✅ The Right Way to Use This Information

Use this data to identify any gaps in the story you may be telling (i.e. words you’re not using) to create the best odds for your content to rank.
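In checklist form, that gap check could look something like the sketch below. It only reports which terms are absent from a draft and deliberately ignores how often the present ones are used. The function name, the draft, and the term list are hypothetical examples.

```python
import re


def missing_terms(draft, competitor_terms):
    """Return the competitor terms that never appear in the draft, in original order."""
    # Normalize to lowercase words separated by single spaces, padded so that
    # whole-word (and whole-phrase) matches can be checked with `in`.
    words = re.findall(r"[a-z0-9']+", draft.lower())
    padded = " " + " ".join(words) + " "
    return [t for t in competitor_terms if f" {t.lower()} " not in padded]


# Hypothetical draft and term list; the terms echo the examples in this post.
draft = "Use tape to mark each stripe, then paint the wall."
terms = ["tape", "base coat", "2019", "paint"]
gaps = missing_terms(draft, terms)  # terms the draft has not covered yet
```

Here the result flags “base coat” and “2019” as gaps. What you do with that list is still an editorial call: add a term only where it genuinely fits the story.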

Why You Want This Data Before Your Competition

If you’re currently embroiled in a competitive battle over spots in the SERPs, this is the leg-up you were looking for.

TF-IDF Data can help you make more informed SEO decisions and help you create better content. Or, you can use it to identify why some of your content creation work hasn’t paid off and you’re not seeing the traffic or ranking wins you had hoped for.

We sank a good amount of time and effort into a few long-form pieces of content, only to see some disappointing results. We were left scratching our heads a bit. These were (in our humble view) high-quality pieces, with valuable and original content. We meticulously researched and used primary and secondary keywords.

However, after a quick TF-IDF analysis, we discovered a few gaps. We saw that the top-ranking posts that we were trying to overtake were more complete, and they had several terms that we simply did not.

This research uncovered sub-topics that we did not cover. It was a real lightbulb moment.

Simply put, these are the fresh and deep insights you need to claim SEO space from your competition. You can use it for a boost to leapfrog them in the rankings and send more traffic to your site. 

Vicious SEO From Outta Nowhere

If you didn’t know about TF-IDF’s value in SEO, you could be left staring at your reports wondering how your competition came from out of nowhere to overtake you. 

That’s because the changes you make with this data are small and subtle, almost imperceptible. By contrast, people can see when you suddenly start adding (or retroactively adding) keywords to your title tags or headings. That’s an obvious change and a clear sign of why you or your competition could be gaining ground in the rankings.

But simply adding a few (seemingly benign) words to your copy? Even a hummingbird couldn’t catch that work.

The Evolution of Keyword Research Over the Years

We truly believe that TF-IDF is not just a fad, but a legit turning point in the journey to a more evolved internet. For the past few years, Google has sought to reward the people and companies who are offering the most complete user experience.

To give you a better idea of where we are going, it may be helpful to take a look at where we’ve been.

Here is a brief overview of the history of SEO and keywording.

1991 – 2000: The Wild West

The early internet seems laughable to us now. Dial-up modems, AOL CDs, and webpages that looked like they were made in a Word Document.

There was no real law in this wild, wild west. The entire world was trying to figure out what this newfangled internet thing was and businesses tried to figure out how to ride this wave.

Many of the spammy techniques we still (sigh) see being used today were born in the early days … because they actually worked back then. Keyword stuffing and getting meaningless backlinks could actually help you!

It’s important to remember that we didn’t have to worry about Google updates, because Google hadn’t taken over the world yet. Google wasn’t even founded until 1998. With no de facto choice, search engines were a matter of personal preference and you could choose from:

  • HotBot
  • AltaVista
  • Excite
  • WebCrawler
  • Ask Jeeves
  • Yahoo

The content was pretty much all text-based. Dial-up speeds were painfully slow, so easily consumable videos and pictures weren’t even close to being on the radar yet.

Keyword research was very primitive, as was optimization. There was an overall feeling of, “If it works, keep doing it,” with no real playbook besides the one you wrote.

Spammers, keyword stuffers and other black hats were able to thrive in the 90s because there was no central adjudicator to make them stop. Again, there were half a dozen prominent search engines and their algorithm updates were all very slow to roll out, so you could get away with a lot for a long period of time.

Even the biggest brands in the world were guilty of keyword stuffing and using link schemes. But more on that later…

2000 – 2007: One Search Engine to Rule Them All

Somewhere around the turn of the century (it’s hard to lock down an exact date), Google started to grow from a start-up company to a household name. You didn’t look things up anymore, you Googled them. This forced the other search engines out of the picture. Excite declared bankruptcy in 2001, and pretty soon, the rest fell off of the map. Google’s algorithm became the only one marketers needed to care about.

As Google grew in size and reach, it also became more sophisticated. This is when we first started to hear “content is king” as the old tactics of simply stuffing and spamming were now being shunned in a more evolved internet.

It wasn’t just small businesses that had to adapt. Major international brands were being called out for shady tactics to gain online traction. For example, BMW was completely removed from Google’s index for massive keyword stuffing and using doorway pages.

There was now a rulebook, and we all had to follow it. Keyword research now had to be much more complete and methodical, as did the way you used keywords in your content.

2008 – 2011: The Age of Enlightenment 

Google was evolving. Google’s Universal Search now blended different types of results (such as news, images, and video) into one streamlined results page, and users were finding what they wanted in less time than ever.

Marketers had to do a lot more to earn a piece of that traffic. However, they now had the tools to do it. Google’s algorithm updates were now a 2-way conversation, as their Webmasters Blog would give us a fair warning with transparent (yet secretive) updates about what changes were on the horizon.

At the same time, marketers now had Google Analytics and other tools to tap into the keywords they needed to zero in on, and the playbook to do so.

Organic SEO started to take shape, as an art and science.

2011 – 2014: Content is Truly King

Marketing teams no longer simply had to worry about “SEO.” There were now different aspects that you needed to blend to build an entire web presence.

Your keyword research would now have to include:

  • On-page SEO (Your website and blog)
  • Off-page SEO (Link building, guest posts and influencer marketing)
  • Social media (Facebook, Twitter, and YouTube)
  • Paid search (Google AdWords and PPC)

Google’s guidelines were now more than the law; they were The Commandments. Nobody was above them, as JCPenney found out when it was exposed for building its web presence via link schemes in 2011. The same year, another once-prominent brand disappeared from the SERPs it once dominated after it was discovered exchanging discounts for links.

2014 – Present: Living in a Mobile-first World

The widespread use of smartphones and voice searches has placed more importance on longtail keywords. Would-be customers are now Googling complete questions and the SEO wins go to the companies who provide the complete answers.

Smarter keyword research tools like ahrefs and SEMrush now help companies do more granular research, allowing them to pick their SEO battles more methodically. You can now focus less on high-competition search terms (e.g., “iPhone”) and more on specific, lower-hanging fruit (“cracked screen on an iPhone 6”).

The name of the game is organic. The Google algo is now rewarding people and businesses who earn traffic through organic content. At the same time, you’re now able to use your keywords in your content more organically, without having to use exact match.

We’re also now living in a mobile-first world, where your mobile site has to be better than your desktop site. This is placing a newfound focus on tapping into the exact terms your target audience is searching for while they’re on the go. You also have to maximize every pixel of space on a tiny mobile site with SEO-rich (but not stuffed) copy and keyword-optimized images.

Parting Thoughts

As we like to say around here, “The Google algo has never made more sense than it does today.” 

The TF-IDF algo also makes a lot of sense. However, to content creators, it’s less of an exact match formula and more of a checklist to ensure a given piece of content is covering all of the ideas your user will expect.

Looking at the density of TF-IDF data to find ‘supporting words’ is massively helpful. However, if you try to replicate those exact words, in those exact ratios, you will ruin your content. It’s just like how trying to stick to a strict 2.5% keyword ratio would ruin your content. Also, neither will lead to any SEO wins. 

Use these new terms as organically as possible. Like your actual keywords, it is probably best to know your supporting words in advance of even starting to write a piece. That way, you’re not scrambling to retroactively add them, which can lead to awkward sentences, choppy content and perceived ‘stuffing.’

As always, it helps to work with an experienced SEO firm who can guide you through thorough TF-IDF research and help you make it a part of a comprehensive SEO strategy. If you want to use TF-IDF to supercharge your keywords, we would love to talk about it.
