Beyond keywords: How entities impact modern SEO strategies
Learn the paradigm shift from keywords to entities, entity SEO's importance in modern search and ways to adapt and thrive in this new era.
This article was co-authored by Paul DeMott.
The transition from “search engine 2.0” to “search engine 3.0” has brought significant changes, particularly with the introduction of entities.
This article explores these shifts, the impact of entities on modern SEO and how to adapt your strategies to thrive in this new era.
Building your own SEO ‘notional machine’
In my early years of learning to code, a teacher introduced an impactful concept known as “notional machine,” which reshaped my approach to programming and, later, SEO.
Simply put, it’s a developer’s approximate mental model of what happens inside the computer when they click run.
My teacher emphasized that the more detailed and accurate this mental representation was, the better equipped I would be to tackle new problems.
The most successful programmers were those who had developed the most accurate and reliable notional machines!
Drawing a parallel to SEO, when we absorb new concepts, examine a case study, or observe the impact of a change, we are continuously updating our mental model (our own notional machine) of how search engines work.
The difference between skilled SEO’s and unskilled SEOs is that they can drive results because they can pull solutions from a more accurate model.
Research in the field of expertise conducted by Anderson Ericsson provides substantial evidence to affirm this point.
His studies on expertise reveal that those who excel in their fields possess superior and more readily accessible mental models.
These models enable them to understand the intricate cause-and-effect relationships, distinguish what truly matters in a complex scenario, and perceive the underlying processes that are not immediately apparent.
With the introduction of entity SEO, several major components within Google’s search engine were altered.
It appears that many SEO professionals still operate under the rules of “search engine 2.0′, even though “search engine 3.0” now follows a slightly different set of rules.
Entity SEO introduces vocabulary and concepts that originate from machine learning and information retrieval disciplines.
These terms may seem complex because they have not been simplified into their core meanings. Once we distill them, you’ll find the concepts are not overly complicated.
My goal is to construct a simple yet effective notional machine of how the latest search engines use entities.
More specifically, I want to illustrate how your understanding of SEO needs to be updated to reflect this new reality.
While understanding the “why” behind these changes might seem unimportant, many SEO professionals effectively “hack the matrix” by using their understanding of how Google interprets the web to their advantage.
In recent times, people have built million visitor sites and transformed Google’s understanding of subject matter by manipulating these concepts.
Refresher: How we arrived at search engine 2.0
Before exploring the differences between “search engine 2.0” and “search engine 3.0”’, let’s review the core changes from the initial version 1.0.
In the beginning, search engines operated on a simple “bag of words” model.
This model treated a document as a mere collection of words, neglecting the contextual meaning or arrangement of these words.
When a user made a query, the search engine would refer to an inverted index database – a data structure mapping words to their locations in a set of documents – and retrieve documents with the highest number of matches.
However, due to its lack of understanding of the context and semantics of both documents and user queries, this model often fell short of delivering relevant and precise search results.
For example, if a user searched for “jaguar” using a “bag of words” model, the search engine would simply pull up documents containing the word “jaguar” without considering the context.
This could yield results about the Jaguar car brand, the jaguar animal, or even the Jacksonville Jaguars football team, irrespective of the user’s intent.
With the advent of “search engine 2.0,” Google adopted more sophisticated strategies. Instead of just matching words, this iteration aimed to decipher the user’s intent behind their query.
For instance, if a user searched for “jaguar,” the engine could now consider the user’s search history and location to infer the likely context.
If the user had been searching for car models or resided in an area where Jaguar cars were popular, the engine might prioritize results about the car brand over the animal or the football team.
Introducing personalized search results – considering factors like user history and location – significantly enhanced the relevance and precision of search results. This marked a significant evolution from the basic “bag of words” model to “search engine 2.0”.
Search engine 2.0 vs. 3.0
As we transitioned from “search engine 1.0” to “search engine 2.0”, we had to update our mental models and change our practices.
The quality of backlinks became crucial, prompting SEO professionals to abandon automated backlinking tools and seek backlinks from higher-quality websites, among a number of key changes.
In the era of “search engine 3.0”, it’s clear that the mental shift to accommodate these changes is still in progress.
Many concepts from the 2.0 era persist, largely because practitioners need time to observe the correlation between their adjustments and the subsequent outcomes.
A significant number of SEO professionals have yet to fully adapt to these substantial changes, or they may have made attempts to do so but haven’t quite hit the mark.
To clarify these new distinctions and provide guidance on modifying your approach, I will present an oversimplified yet useful comparison of “search engine 2.0” and “search engine 3.0”.
Query processing and information retrieval
Imagine typing the search query “Elvis” into Google.
In the era of Google’s search engine 2.0, the sophistication of the underlying algorithms enabled an understanding of user intent behind a query, beyond just matching keywords.
For instance, if a user searched for “Elvis”, the system would use natural language processing and machine learning to understand and anticipate the intent behind the query.
It would look up “Elvis” in its index and return results that mentioned the word “Elvis” or were based (almost completely) on the relevance of copy on webpages, and personalization parameters such as user history and location.
However, this model still had its limitations, as it was largely dependent on keywords, user search history, location and phrases within the text of indexed webpages.
The context of “Elvis” could mean Elvis Presley, Elvis Costello, or even a local restaurant named “Elvis”.
The challenge was that it largely relied on the user to specify and refine their query, and was still limited by the semantics of the keywords.
Query processing improvements in 3.0
Many people don’t yet realize how fundamentally the introduction of entities revolutionized the way search worked.
Since 2012, Hummingbird and RankBrain have paved the way for entities to take a more central role.
In this 3.0 model, entities refer to distinct and unique concepts or things, whether they are people, places, or objects.
Using our earlier example, “Elvis” is no longer simply a keyword, but recognized as an entity, likely referring to the famous musician Elvis Presley.
For instance, when an entity like “Elvis Presley” is identified, the search engine can now associate a wealth of attributes with this entity, including aspects such as his music, his filmography, and his birth and death dates.
This new approach significantly broadens the scope of the search. Before, a query for “Elvis” might primarily consider about 2,000,000 pages that contained the exact keyword “Elvis.”
Now, in this entity-centric model, the search engine looks beyond this to consider any pages related to Elvis’s attributes.
This could potentially widen the search field to include 10,000,000 pages, even if some of them don’t explicitly mention “Elvis.”
Moreover, this model allows the search engine to understand that other keywords related to the attributes of the Elvis entity, such as “Graceland” or “Blue Suede Shoes,” are implicitly connected to “Elvis”.
Therefore, searching for these terms could also bring up information about Elvis, widening the net of potential search results.
Get the daily newsletter search marketers rely on.
Query processing and topic boundaries in search engine 3.0
Another significant shift brought about by these improvements to entities in query processing was how Google perceives the scope of topics that should reside on a single page.
In the "search engine 2.0" era, it was advantageous to create a separate page for each identified keyword, so that the page could be optimized for that term specifically.
However, in "search engine 3.0", the boundaries have become more fluid and are updated in real-time based on machine learning predictions and observed user behavior.
In this new era, the boundaries of a subject matter can be vast or narrow, covering a wide range of topics or focusing intensely on a particular aspect. This flexibility allows for websites to become authorities in both broad and niche areas.
Consider the example of crayons. One website might aim to cover all there is to know about crayons in general – their history, types, manufacturing process, usage tips, etc.
This website aims to become a topical authority on 'crayons' as a whole.
On the other hand, another website might focus solely on red crayons – their unique pigments, popularity statistics, cultural significance, and so forth.
This site is trying to establish its topical authority in a narrower context, but still a valid one. It is crucial, however, that the focus on 'red crayons' aligns with the overall purpose of the website.
Adding micro-contexts that don't match the broader purpose of your website might confuse Google about the relevance and authority of your site, potentially diluting its topical authority.
In theory, a website could even delve further into a micro-context and center its content solely around the “labels used on red crayons.”
This is an incredibly specific focus, and one might wonder if Google would recognize it as a topical authority.
Social media websites use machine learning to predict user interactions with content items related to a certain topic.
If a user frequently interacts with content about “labels used on red crayons,” the system might identify this as a topic of interest for the user, and the website providing the content could be recognized as an authority on this topic.
It can be theorized that Google can do something similar or at least maintain expectations of how good content should perform according to the user metrics they track.
To determine this, Google considers several factors:
Is there a significant amount of search activity around this topic?
If people are actively searching for information about the 'labels used on red crayons,' and the site provides comprehensive, valuable content on this topic, it could very well be recognized as a topical authority in this micro-context.
Are there good user metrics?
If users spend a long time on the site, have low bounce rates, and demonstrate other signs of engagement, Google may interpret this as a sign of the site's authority on the topic.
Remember, topical authority is a concept based on the relativity of different subjects (entities). Your site can be considered a topical authority on subjects as broad as 'technology' or as narrow as “vintage typewriters.”
The crucial factor is that your site exhibits positive user behavior and employs entities effectively to establish relationships within the content. By doing so, Google begins to rely on your site to enhance its own understanding of the subject matter, regardless of the overall search volume of the topic.
SEO applications and takeaways
More comprehensive content wins
In prior versions, many webpages were overlooked for queries because they did not contain the exact words included in the search.
For instance, a well-linked page that didn't incorporate a particular search term would not appear in the results, regardless of its other robust ranking factors, such as user engagement and backlinks.
This encouraged SEO’s to write fewer more focused content pieces to achieve rankings for a target keyword.
However, with the advent of 3.0 and its focus on understanding entities and their relationships, the game has changed.
It's not about whether the exact search term appears on the page. Google will now search for relative entities on your page, and attempt to link these entities to related entities throughout your entire site.
It will then determine approximative relativity and rank you accordingly. This fundamental shift brings pages with strong ranking factors into the competition, even if they're missing specific terms
The key takeaway for content creators and SEO strategists is to lean toward creating more comprehensive and expansive content.
Centralize your backlink efforts on these broad, in-depth pieces instead of splitting topics across multiple narrow-focused articles.
Use current SERPs as a starting point to identify important topics, but don't be confined by them.
Aim to go beyond the existing topical coverage in SERPs and provide valuable, comprehensive content to the user.
This will cater to the user's existing query and potential related queries they might have, ultimately boosting your content's relevance and visibility in this new era of search.
Answer intent instead of focusing on keyword usage be careful with headlines
In the "search engine 3.0" era, SEO strategy has evolved. It's no longer enough to simply insert keywords from your Search Console report into your content and hope for improved rankings.
Google's advanced algorithms can now detect when a keyword is used out of context, which can confuse the algorithm and potentially lead to lower rankings.
Header order matters
Use your brain to connect the key ideas most relevant to your page’s goal. Make sure the content under the header matches the topic of the header.
Remember the days of brainstorming for your writing classes in elementary school?
We would draw out circles, write topics within the circles, then link them by drawing a straight line to smaller circles with relative topics to our story.
Don’t overcomplicate things. Use this strategy to form your headings, too.
In short, "search engine 3.0" requires a more thoughtful approach to keyword usage, addressing user intent and maintaining context to improve relevance and ranking potential.
Scoring and ranking the documents
Once a search engine like Google has fetched potentially relevant documents, the next crucial step is to score these pages and rank them for a user to select.
The evolution of artificial intelligence (AI) and natural language processing (NLP) has significantly transformed the way documents are ranked, marking a clear distinction between the 2.0 and 3.0 eras.
2.0 era (post-bag-of-words, pre-RankBrain)
In the 2.0 era, Google's scoring system was primarily driven by algorithms like PageRank, Hummingbird, Panda, and Penguin.
These algorithms relied heavily on keyword matching and the number of backlinks to rank documents. Each document would get a score based on the pages and be sorted based on rank order.
Algorithm evolutions like Panda and Penguin were less about moving away from keyword matching and more about penalizing sites trying to game the system.
Keyword-based systems were still more efficient, and the hardware wasn’t advanced enough to deliver fast search results with evolved language methods.
Scoring and ranking in the era of search engine 3.0
In the "search engine 3.0" landscape, Google's approach to scoring and ranking documents has evolved significantly.
This is the result of both software and hardware improvements. Google assesses a page's suitability for a search query based on several key factors.
The key difference is an improved ability to quantify relevancy, rather than relying on outside signals like backlinks to identify the best content pieces:
Factually accurate content from reputable sources continues to rank higher. Google's Knowledge-Based Trust confirms this, stating:
"We call the trustworthiness score we computed Knowledge-Based Trust (KBT)... Manual evaluation of a subset of the results confirms the effectiveness of the method.
User interaction signals
"Post low-quality content now and edit later" strategies can be problematic for these reasons. Google now considers both historical and present user engagement data associated with a webpage.
This shift is outlined in Google's patent titled "Engagement and Experience Based Ranking" (US20140244560A1), which emphasizes the use of historical engagement scoring as part of their ranking considerations.
Engagements, such as long clicks where a user stays on your page for a significant amount of time, are beneficial.
However, non-quality engagements, like quick returns to the search results (known as "pogo-sticking"), can negatively impact your ranking.
These engagement metrics can influence your ranking position and impressions, boosting your topical authority.
However, poor user engagement can lead to a drop in your page's ranking. Recovery from such a drop can take time, underscoring the importance of consistently providing high-quality, relevant content that encourages positive user engagement.
SEO takeaways and applications
Google can check factual accuracies. Invest time in creating factually accurate content.
This includes proper research, fact-checking, and citing reputable sources. Implement fact-check schema to build credibility, and relevance, for your informative articles
Pay attention to your page's user engagement metrics. If your content isn't engaging users as expected, consider revising your content strategy.
Crawling and indexing
As we wrap up our exploration of the search process, let's look at how Google's web crawling and indexing techniques have evolved with its focus on entities.
Understanding these changes is crucial as they directly impact how you should structure your website and formulate your content strategy, including constructing your topical map.
In the "search engine 2.0" era, Google's web crawlers, also known as spiders, systematically browsed the internet to discover new and updated pages.
They would follow links from one webpage to another and collect data about each page to store in Google's index. This process was primarily about discovering new content and ensuring the index stayed up-to-date.
Once the crawlers discovered a page, it was added to Google's index – a massive database of all the webpages Google has found.
The content of each page (including text, images, and videos) was analyzed, and the page was categorized based on this content.
The primary focus was on the keywords and phrases within the text and factors like backlinks, which were used to determine a page's relevance and authority.
Fast forward to the "search engine 3.0" era, and things have become more complex.
Google's crawlers are still discovering new and updated pages by following links across the internet. But now, they're also trying to understand the entities that the keywords on a page represent.
For example, a page about "Elvis" might also be indexed under related entities like "rock and roll music," "Graceland" and "Blue Suede Shoes."
Additionally, they are following your internal links to understand which entities your site relates together.
This is a bit like a librarian not just cataloging books based on their titles but also reading them to understand how the chapters relate to each other and to the book's overall theme.
This deeper understanding helps Google deliver more relevant and precise search results.
But how does crawling relate to topical authority and entities?
Well, when Google crawls a website, it's not just looking at the individual pages in isolation anymore. It's also looking at the overall theme or topic of the website.
This is where topical authority comes in.
If a website consistently publishes high-quality content on a specific topic, it can be seen as an authority on that topic.
If Google deems the site an authority, it can boost it in the search results. (Often, you’ll see sites with small backlink profiles ranking across competitive terms, likely due to a topical authority score boost they are getting.)
Interestingly enough, the concept of topical authority has been around for at least a few years now, but it’s only recently been acknowledged by Google.
On May 23, 2023, Google published “Understanding News Topic Authority.”
Although many seasoned SEOs believed that topical authority was a ranking factor, no one could verify this through Google-published content (outside of digging through pending patents).
Don’t be misled by the word “news” in this release. Topic authority relates to all sites on the web that Google crawls, not just news sites.
This concept of topical authority is outlined in Google's patent US20180046717A1.
The patent describes a process of determining a website's authority based on the consistency and depth of a specific topic within the site.
For example, a website consistently publishing high-quality content about "organic gardening" may have a high purity factor (yes, Google looks at your site's ability to stay on topic), contributing to a higher authority score.
Moreover, Google can extract the main themes from your content and graph your content, much like ChatGPT graphs words in embeddings (a feature vector).
This allows Google to visually see if your content is similar and consistent, further enhancing its understanding of your website's topical authority.
So, in essence, the shift in Google's indexing system is not just about understanding the content of individual pages but also about recognizing the topical focus of a website.
This underscores the importance of maintaining a consistent focus in your content strategy, as it can significantly impact your website's visibility in search results.
SEO takeaways and applications
Consistent topic focus
Google can identify when your site deviates from its main topic. If your content is inconsistent, it can confuse your website's purpose and objective.
Maintain a consistent focus in your content strategy to benefit from scoring boosts associated with topical authority.
Building depth in your content is key, but it should be relevant depth. Use your understanding of your site's primary purpose to guide the depth of your content.
For instance, if your site's primary purpose is to provide information about digital photography techniques, don't divert to writing in-depth about the history of film cameras.
While it's related to photography, it doesn't align closely with your site's primary focus on digital techniques. Instead, deepen your content by exploring various digital photography techniques, reviewing digital cameras, or providing tips for editing digital photos.
Too much content may dilute your authority
Too much content on your website can dilute the meaning and purpose of your website.
Comb through your sitemap and make sure it only includes content that supports your key ideas and that the content is quality enough to help Google understand entities.
Use of contextual bridges
When creating new content, it's important to use "contextual bridges" to connect it back to the primary purpose of your site.
Instead of simply adding new content to your website, always ask yourself how you can tie a new page back to your primary goal.
This will allow Google to begin associating your new page’s entities with your primary goal entity.
Limitations and constraints of topical authority
While we want to focus on building topical authority across any site we create, there are still some limitations.
These limitations are lingering ranking factors from the Web 2.0 days to which Google still grants a reasonable amount of ranking power: time on the web and backlinks.
First of all, topic authority takes time to build. With the recent explosion of AI content creation tools, this timeline can be drastically shortened, but it still takes time.
The use of topical authority is also relative to exactly how ‘authoritative’ other sites in your niche are.
For example, if you create great content based on an incredible topical map, you will still be compared to other sites in your niche.
If these other sites also have developed a great topical authority over time, we then defer to the age-old problem of backlinks and time on the web.
It is extremely difficult to outrank sites that have developed a great entity development and have done so on a domain that has been on the web for several years or longer. Possible, sure, but difficult nonetheless.
Let’s talk about backlinks.
While it is quite possible to build out sites that rank well without the use of backlinks, even seasoned SEOs may struggle doing so.
Backlinks are still a very important ranking factor. Sure, they might not be as powerful as they once were, but they are still powerful.
The problem with giving such a huge amount of ranking power to backlinks comes from large news conglomerate sites that don’t actually "specialize" in any topic.
We’ve all seen it: we Google "best widget for xyz" and the first 10-15 results are news network sites that all claim to have the best guides for buying these widgets.
Do the news sites specialize in the development or selling of these widgets?
Do these news sites have topical authority when it comes to these widgets?
Not at all.
If the news sites don’t have topical authority over these widgets, why are they still dominating SERPs? It comes down to time on the web, and backlink profiles.
Since editors of these large news networks know they will rank extremely high once they click the publish button, they solicit the sale of ad space on their sites.
Companies also know their product will come up toward the top of the Google SERPs, so they willingly pay thousands and thousands of dollars for this feature.
They are, in essence, leaching off of the ability of the news site to dominate SERPs whenever they publish anything – hence the name parasite SEO.
No matter how topically authoritative your site is, it will have difficulty competing with these news site powerhouses.
Unfortunately, until Google addresses this issue, becoming a topical authority isn’t enough to compete with some of these hot SERPs that news sites dominate.
Mastering SEO in the age of entities
Hopefully, by guiding you through the journey from query processing to indexation and ranking, I have helped you update your "notional machine" to better account for the newest changes to the Google search engine.
This refined understanding should help improve your tactics, where you focus your time and rankings of your own website, and those of your clients.
Lastly, it's crucial to remember that theory truly shines when applied in practice.
For instance, affiliate SEO practitioners discovered quite some time ago that producing a substantial amount of content on their subject matter could trigger a topical authority SEO boost.
This was realized long before the evolution of our understanding of entity SEO came into play.
The journey of SEO is always evolving, full of opportunities for discovery and improvement.
So, armed with this knowledge and insights, it's time for you to dive in, experiment, and shape your own SEO strategies. After all, the proof of the pudding is in the eating. Happy testing!
This is the third article in the entity SEO series. If you would like to start by reading the first two articles, they are linked here:
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.
New on Search Engine Land