How to build a data-driven SEO strategy using NLP

Natural language processing – or NLP – is becoming increasingly prominent in the world of SEO. But how can marketers harness the power of NLP to build a truly data-driven SEO strategy? I discussed this last week at the BrightonSEO conference, and have put together a handy summary of the key points from the talk.

What is an SEO strategy?

A successful SEO strategy needs a transparent narrative, a strategic direction and a plan to improve organic search traffic and drive relevant users. There are four key pillars to consider:

  1. Growth: Identifying opportunities and new territories to target
  2. Optimisation: Maximising your current site’s organic search footprint
  3. Utilisation: What your activities are going to look like and how you prioritise
  4. Measurement: How you’re going to measure success and forecast ROI

When we say SEO, we mean more than just keywords

Within SEO there are four key areas to consider:

  • Intent: How and why users are searching for keywords – the three overarching categories being transactional, informational and navigational
  • Format: How well you meet the needs of the user in accordance with the SERP – is it a product page, listicle or FAQ?
  • Authority: Is your content receiving citations and links from authoritative sites?
  • Accessibility: This involves the more technical aspects – such as speed and Core Web Vitals – but also the text on the page.

Creating a narrative

Being able to clearly communicate SEO activities to peers, clients and the wider masses is crucial. If someone asks you what the SEO strategy is, you don’t want to have to pull out a spreadsheet or a deck with hundreds of slides. Create a summary like this:

  • State of play: Where you are now
  • Action: What you are doing
  • Climax: Where you’re going to be

We’re finding now that SEO is naturally getting harder because search is constantly changing. With core changes to search algorithms like BERT and other elements such as voice search becoming more powerful, Google and other search engines want us to be more conversational with machines.

This means that search queries are becoming broader, and that means more data. Broad keywords that get lots of search volume are now being siphoned into longer-tail keywords, as these better match user intent. Working out the intent behind a query in order to serve relevant results appears to be working: according to the American Customer Satisfaction Reports, 79% of Google’s users were satisfied with their results.

These improvements are oriented around improving language understanding, particularly for more natural language/conversational queries, as BERT is able to help better understand the nuances and context of words in searches and better match those queries with helpful results.

This is particularly relevant for longer, more conversational queries or searches where prepositions like “for” and “to” matter a lot to the meaning. Search will be able to understand the context of the words in your query, so you can search in a way that feels natural to you.

What is natural language processing – and why does it matter to SEO?

Natural language processing (NLP) is a field in AI that gives machines the ability to read, understand and derive meaning from human languages. It allows companies to perform efficient indexing on masses of unstructured text, and distil relevant information.

NLP matters to SEO for these reasons:

  • Search is changing – Post-BERT, Google is tapping into more natural conversational searches that will become harder to quantify.
  • There’s an abundance of data – Now more than ever, we have access to huge amounts of data, from Google Search Console to Semrush.
  • Search is getting ‘messy’ – Even with all this data at hand, extracting actionable top-level insights such as keyword intent is time-consuming.
  • We need critical insights – Automating the analysis saves your brain some computing power and surfaces core strategy recommendations and actions.

The good news is NLP is becoming more accessible and there are now rich resources on how to use it.

Python, a highly versatile programming language, is essential for automating and processing these tasks. It’s so beneficial because of its simplicity and the support it has within the SEO community. You don’t have to be a developer to use it either: just searching for “python for SEO” will give you a plethora of resources online.

Alongside Python, you can use Google Colab, a Google Research product that provides hosted Jupyter notebooks you can run and share in the browser. You can collaborate, share your projects, build on others’ work and take advantage of the communal aspect.

So let’s now take a look at some of the ways in which you can use NLP to inform the four key pillars of any SEO strategy – growth, optimisation, utilisation, and measurement.


Topic modelling, clustering and taxonomy

Categorising keywords gives you a top-level view that can show you areas with the most volume but can also inform things like information architecture. You’ll find that many keywords are semantically similar, and variations of a keyword may bring near-identical results, so you want to be able to cluster these queries together (using Python). For instance, taking all search queries containing ‘hat’ and clustering them together gives the following results.

[Image: SEO strategy using NLP]
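
As a minimal sketch of the clustering idea, the snippet below groups queries by the word sitting directly before the head term ‘hat’. The sample queries and the function name `cluster_by_modifier` are illustrative assumptions rather than the data or script from the talk, and this is deliberately simpler than semantic clustering with a language model.

```python
from collections import defaultdict


def cluster_by_modifier(queries, head_term):
    """Group queries by the word that directly precedes the head term."""
    clusters = defaultdict(list)
    for query in queries:
        tokens = query.lower().split()
        # Treat a simple plural ("hats") the same as the head term ("hat").
        positions = [
            i for i, tok in enumerate(tokens)
            if tok in (head_term, head_term + "s")
        ]
        if not positions:
            continue  # the query does not mention the head term
        i = positions[0]
        label = f"{tokens[i - 1]} {head_term}" if i > 0 else head_term
        clusters[label].append(query)
    return dict(clusters)


# Hypothetical sample queries, for illustration only.
sample_queries = [
    "bucket hat",
    "mens bucket hats",
    "black bucket hat",
    "sun hat for women",
    "baby sun hat",
    "hat",
    "winter coat",
]

clusters = cluster_by_modifier(sample_queries, "hat")
# Print the clusters, largest first.
for label, members in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
    print(f"{label}: {len(members)} queries")
```

Counting the queries (or summing search volume) per cluster gives the top-level view described above.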

You can also use Querycat, the demo repository created by JR Oakes, which is helpful for the categorisation of keywords and gives you the ability to use BERT for visualisation. It has a Google Colab notebook with the scripts already set up.

The Apriori algorithm is also relevant here. It is a classic algorithm for Association Rule Mining (ARM), a machine learning method that finds items which frequently occur together in itemsets. ARM is used across the web for things like product or content recommendations. Applied to a set of keywords, it lets you categorise them by the terms they share, within a strict limit such as a minimum support threshold.
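
To make this concrete, here is a small, dependency-free sketch of Apriori-style frequent itemset mining, treating each keyword as a ‘basket’ of words. The function name, the sample keywords and the 0.5 support threshold are assumptions for illustration; in practice you may prefer an existing library implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    # Level 1: support of each single word.
    counts = Counter(frozenset([item]) for b in baskets for item in b)
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    size = 2
    while frequent and size <= max_size:
        items = sorted({item for s in frequent for item in s})
        # Apriori property: a candidate is only worth counting if every
        # smaller subset of it was already found to be frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in frequent
                   for sub in combinations(c, size - 1))
        ]
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        size += 1
    return result


# Hypothetical keywords, for illustration only.
keywords = [
    "black bucket hat",
    "bucket hat mens",
    "bucket hat womens",
    "sun hat womens",
    "sun hat baby",
    "black sun hat",
]
itemsets = frequent_itemsets([k.split() for k in keywords], min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

Raising `min_support` produces fewer, broader categories; lowering it lets rarer modifiers through.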

Quantifying your data

By quantifying all your data, you can start to see patterns and areas that you may want to target. You can even use this information for building out your information architecture and taxonomy.

Knowing which keywords to target doesn’t give you the full picture: you’ll also want to extract the user intent behind each one, to better understand what type of content or page you need to create. By combining a labelled training data set with Ludwig (created by Uber Engineering) and Google machine learning products like TensorFlow, you can map each keyword to a specific intent. RankSense has a handy guide on how to do this.
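
Before training a model, a rule-based baseline can be handy for sanity-checking intent labels. To be clear, the sketch below is not the Ludwig and TensorFlow approach mentioned above: it is a simple trigger-word heuristic, and the trigger lists, the brand name and the function name are all hypothetical.

```python
import re

# Hypothetical trigger words; tune these for your own market and language.
INTENT_TRIGGERS = {
    "transactional": {"buy", "cheap", "price", "deal", "deals", "sale", "order"},
    "informational": {"how", "what", "why", "guide", "ideas", "tutorial"},
}


def classify_intent(keyword, brand_terms=()):
    """Assign a coarse intent label to a keyword using trigger words."""
    tokens = set(re.findall(r"[a-z0-9]+", keyword.lower()))
    # Queries that mention a brand are treated as navigational.
    if tokens & {brand.lower() for brand in brand_terms}:
        return "navigational"
    for intent, triggers in INTENT_TRIGGERS.items():
        if tokens & triggers:
            return intent
    return "unclassified"


# "acme" is a made-up brand name used only for this example.
for kw in ["buy bucket hat", "how to wash a bucket hat",
           "acme hats login", "bucket hat"]:
    print(kw, "->", classify_intent(kw, brand_terms=("acme",)))
```

Keywords that come back as "unclassified" are the ones where a trained classifier is likely to add the most value.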


Using data based on current rankings either via paid tools or even using Google Search Console, you can see key areas to tackle. Using the same method of topic modelling mentioned earlier (Apriori algorithm), you can create a quantified footprint. Mix that with intent classification and you can get key insights into what your activities should focus on.

Here we come across the concept of entities and the Google patent called ‘Question answering using entity references in unstructured data’, where an entity may be a person, place, item, idea and so on. Google uses entities to find broad relationships with keywords: in the example below, James Joyce is matched to being a writer, which in turn helps with matching intent and contextual search.

[Image: SEO strategy using NLP]

Using Google’s Natural Language demo, we’re able to see entities possibly known to Google, which is powerful for understanding how Google might interpret the text. You’ll also see a salience score under each entity within the panel, which indicates how relevant the NLP model considers the entity to be within the entire document.

Advanced: Entity matching

By scraping content from your competitors and your own site and comparing the two via Google’s natural language tool, you can see whether your content is aligned with what surfaces on the first page of the SERP.

  1. Map your URLs to your target keyword
  2. Scrape content from your site using the web scraping tool Trafilatura
  3. Use Google’s natural language tool to extract the main entity on the page with a salience score
  4. Then using your target keyword as the query, plug this into Querycat to gather the URLs for the first page of Google
  5. Scrape this content using Trafilatura and extract the main entity and salience score for each URL

What you should be able to see is whether the content you have matches with the competitor URLs, and this should help you to optimise accordingly. This will give you an overall idea of how much work needs to be done to pages in order to align with user intent.
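
Once the entities and salience scores have been collected, the comparison step could be sketched as below. It assumes you have already exported each page’s entities into plain `{entity: salience}` dictionaries; the scores shown are hand-written placeholders rather than real API output, and the function names are hypothetical.

```python
from collections import Counter


def main_entity(entities):
    """Return the entity name with the highest salience score."""
    return max(entities, key=entities.get)


def compare_entities(own, competitors):
    """Compare one page's entities with those of the ranking pages."""
    # The main entity most ranking pages agree on.
    mains = Counter(main_entity(page) for page in competitors)
    serp_main, _ = mains.most_common(1)[0]

    # Entities found on at least half of the ranking pages but not on ours.
    seen = Counter(name for page in competitors for name in page)
    missing = sorted(
        name for name, pages in seen.items()
        if pages * 2 >= len(competitors) and name not in own
    )
    return {
        "own_main_entity": main_entity(own),
        "serp_main_entity": serp_main,
        "aligned": main_entity(own) == serp_main,
        "missing_entities": missing,
    }


# Placeholder data, for illustration only.
own_page = {"hat": 0.62, "summer": 0.21, "beach": 0.08}
competitor_pages = [
    {"bucket hat": 0.71, "cotton": 0.10, "festival": 0.05},
    {"bucket hat": 0.55, "hat": 0.20, "festival": 0.07},
    {"hat": 0.48, "bucket hat": 0.30},
]
report = compare_entities(own_page, competitor_pages)
print(report)
```

Here the page’s main entity is ‘hat’ while the first page of results centres on ‘bucket hat’, which suggests the copy needs refocusing.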


Actioning insights from Google’s Natural Language Demo

To start putting insights from Google’s Natural Language Demo into action, there are several key areas you will need to look at:

  1. What does the tool say is the most relevant entity in your text? Does it relate to the keyword(s) targeted?
  2. Is there a main entity in the text that isn’t being picked up?
  3. Study the structure of competitor URLs and compare its relation to your own copy.
  4. Look at the key attributes of language use such as passive, assertive, positive, neutral and negative.
  5. Is it able to semantically relate the text back to a relevant category?

Sentiment analysis

Sentiment analysis “inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer’s attitude as positive, negative, or neutral.” This helps with understanding the intent behind search and how machines are interpreting the consensus. It scores sentiment on a range from -1.0 (very negative) to 1.0 (very positive).

Syntax analysis looks at how copy is structured and how the words relate to one another in context. It can give a really good insight into how NLP distinguishes attributes within the dataset and finds non-linear connections in sentences. Google’s tool is also able to classify the content based on the text, with a confidence score.

Summarising the text with TensorFlow

TensorFlow is an extremely powerful open-source platform for machine learning created by Google. We can utilise it quite easily through the Transformers library created by the Hugging Face team. The part we are interested in is the SummarizationPipeline, which lets us run T5 (Text-to-Text Transfer Transformer) using the TensorFlow framework.
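
A hedged sketch of what this might look like is below. It assumes a 4.x release of the `transformers` package with TensorFlow installed, and uses the small public `t5-small` checkpoint purely as an example; the function names and length limits are illustrative choices, not settings from the talk.

```python
import re


def clean_text(text):
    """Collapse runs of whitespace so scraped copy becomes one tidy string."""
    return re.sub(r"\s+", " ", text).strip()


def summarise(text, model_name="t5-small", max_length=60, min_length=10):
    """Summarise text with a T5 checkpoint via the summarization pipeline."""
    # Imported here because transformers and TensorFlow are heavy,
    # optional dependencies that download model weights on first use.
    from transformers import pipeline

    # framework="tf" asks for the TensorFlow weights (assumes transformers 4.x).
    summariser = pipeline("summarization", model=model_name, framework="tf")
    result = summariser(
        clean_text(text), max_length=max_length, min_length=min_length
    )
    return result[0]["summary_text"]
```

Calling `summarise(page_copy)` on the text scraped from a URL returns a short summary string, which can serve as a first draft to edit rather than finished copy.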

Create meta descriptions using BERT

If you’re new to Python, Andrea Volpini created an excellent article on how to do this with a step-by-step Google Colab template, allowing you to create meta descriptions from your website in minutes.



If you’ve followed the method for topic modelling, you should be able to find key areas you’d like to target. Looking back at our hats example from earlier:

  • For penetrating new territories, you could set a KPI to gain a footprint within the “bucket hat” sphere, for instance.
  • For optimisations, you could look to improve click-through rate for branded terms.

Essentially it’s about setting the scene and creating that narrative. 

Topic modelling and forecasting

Utilising an automated time series model like Facebook Prophet can bring you powerful insight into where specific categories of your SERP footprint are heading.

You’ll need historical data with dates for each category, which is easy to pull from Google Search Console. This can then be plugged into Facebook Prophet via Python and plotted with matplotlib. Set an upper or lower yield for certain dates based on SEO activity. If you’re new to using Facebook Prophet, Ahrefs has an excellent article – How to Use Data Forecasting for SEO – with the scripts included.
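
As a rough sketch of that workflow, the snippet below reshapes categorised Search Console rows into the `ds`/`y` columns Prophet expects and then fits a default model. The row format (`date`, `category`, `clicks`) and the function names are assumptions about how you have exported and labelled your data, and the forecasting function needs the `pandas` and `prophet` packages installed.

```python
from collections import defaultdict


def to_prophet_rows(rows, category):
    """Sum clicks per day for one category, using Prophet's ds/y naming."""
    totals = defaultdict(int)
    for row in rows:
        if row["category"] == category:
            totals[row["date"]] += row["clicks"]
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return [{"ds": day, "y": totals[day]} for day in sorted(totals)]


def forecast_category(rows, category, periods=90):
    """Fit Prophet on one category and forecast `periods` days ahead."""
    # Imported here so the reshaping helper works without these packages.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(to_prophet_rows(rows, category))
    history["ds"] = pd.to_datetime(history["ds"])
    model = Prophet()
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    # yhat is the forecast; yhat_lower and yhat_upper bound its uncertainty.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical export rows, for illustration only.
gsc_rows = [
    {"date": "2021-01-02", "category": "bucket hat", "clicks": 12},
    {"date": "2021-01-01", "category": "bucket hat", "clicks": 10},
    {"date": "2021-01-01", "category": "sun hat", "clicks": 4},
    {"date": "2021-01-01", "category": "bucket hat", "clicks": 5},
]
print(to_prophet_rows(gsc_rows, "bucket hat"))
```

A real forecast needs far more history than this toy sample; several months of daily data per category is a more sensible starting point.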

What’s next?

There’s a huge amount of potential for SEOs to harness the value that NLP can bring to any site.

It can be used to create meaningful content strategies at scale, as well as assist content writers by creating automated content briefs with detailed SERP analysis. It can also feed into schema opportunity spotting. The possibilities are endless.

If you want to know more, get in touch.

by Croud
31 March 2021


