Have you ever wondered how a search engine like Google thinks and ranks content?
Search engines do what we call “crawling” to find and index pages and display them to users when they search for something. But how does it work exactly?
In this article, we will explore how search engines work, particularly Google, and understand how it thinks and ranks content. But first, let’s briefly cover what a search engine is and some of its basics.
The term “search engine” is broad; it does not necessarily mean Google or Bing. By definition, a search engine is any program that searches for items in a database based on user queries.
Search engines built for the web, like Google and Bing, all have a similar job. Google, for example, retrieves webpages from its database, which is called Google’s index. When you search for a keyword like “buy coffee,” the search engine scans its index of billions of webpages, identifies pages that closely match the keyword and its intent, and presents you with the most relevant ones to visit.
A search engine has two main components through which it works:
Search index
Ranking algorithms
A search index is a library or directory of billions of webpages that the search engine has been able to find and read. When a user searches for something, the search engine uses its search/ranking algorithms to find the best results and present them to the user.
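Conceptually, a search index works much like an inverted index: a map from each word to the pages that contain it, so lookups don’t have to scan every page. Here is a minimal sketch in Python; the example pages and the whitespace tokenizer are simplified assumptions, and a real index stores far more (word positions, metadata, quality signals):

```python
from collections import defaultdict

# Toy corpus: URL -> page text (hypothetical example pages)
pages = {
    "example.com/coffee": "buy fresh coffee beans online",
    "example.com/tea": "buy loose leaf tea online",
    "example.com/brewing": "how to brew coffee at home",
}

# Build the inverted index: word -> set of URLs containing that word
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def search(query):
    """Return URLs that contain every word of the query."""
    results = set(pages)
    for word in query.lower().split():
        results &= index.get(word, set())  # intersect per-word page sets
    return sorted(results)

print(search("buy coffee"))  # ['example.com/coffee']
```

The key point the sketch illustrates: the engine never re-reads the pages at query time; it only intersects precomputed word-to-page lists.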
Also called search algorithms, ranking algorithms are computer programs that evaluate the webpages within the index and decide the order in which to display them to users. There are hundreds of algorithms that work together to surface the best possible results for users.
For example, Google’s ranking algorithms evaluate hundreds of factors to rank webpages, including relevancy and recency.
A search engine uses ranking factors, which are the criteria it uses to evaluate and rank webpages.
Every search engine has its own ranking factors, and there can be hundreds of them. Nobody knows exactly what these ranking factors are beyond what search engines officially reveal; the secrecy prevents people from gaming the search engine and exploiting its ranking system.
However, people have been able to “guess” some ranking factors through analysis and trial and error. For example, one Google ranking factor we know is the number of external links pointing to a webpage, which makes it more likely to rank. Many independent analyses support this.
Web crawlers are automated programs that find and collect webpages. They are known as bots, crawlers, or even spiders (for a more sci-fi feel); Google’s crawler is called Googlebot.
A crawler travels from one page to another page via a link that points to that page. For example:
Suppose Webpage-A has a hyperlink to Webpage-B.
When crawling Webpage-A, Google’s crawler discovers the link to Webpage-B.
After crawling Webpage-A, it follows the link to Webpage-B and crawls it as well.
Likewise, Webpage-B might have links to Webpage-C and -D, and so on. This is how Google and other search engines continue to discover newer webpages from existing ones.
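The discovery process above can be pictured as a breadth-first traversal of the link graph. The toy crawler below works on an in-memory graph rather than making real HTTP requests; the page names and links are the hypothetical ones from the example:

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to
links = {
    "Webpage-A": ["Webpage-B"],
    "Webpage-B": ["Webpage-C", "Webpage-D"],
    "Webpage-C": [],
    "Webpage-D": [],
}

def crawl(start):
    """Discover pages breadth-first by following links, skipping pages already seen."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)          # stand-in for downloading the page
        for target in links.get(page, []):
            if target not in seen:  # never crawl the same page twice
                seen.add(target)
                queue.append(target)
    return order

print(crawl("Webpage-A"))  # ['Webpage-A', 'Webpage-B', 'Webpage-C', 'Webpage-D']
```

Real crawlers add politeness delays, robots.txt checks, and priority queues, but the core idea of discovering new pages from links on known pages is the same.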
What happens when a webpage is crawled?
When a page is crawled, the crawler downloads the webpage, including its text and media. In addition to downloading, Googlebot also renders the page to see it exactly as users would.
When we say how Google thinks, we mean how it ranks and presents webpages. This is a three-step process: crawling, indexing, and ranking and serving the results.
Before Google can rank and present a webpage to users, it has to discover and collect that page. This process of discovering and collecting webpages is known as “crawling.”
Crawling happens in different ways. Here are the three most common ones:
Following Links and Existing Pages: As explained, a crawler can locate and use links on existing pages to discover newer and undiscovered pages. This process happens automatically and continuously as the search engine’s crawlers continue to explore the web. However, finding and indexing individual pages this way could take time because of the sheer vastness of the internet.
Direct URL Submissions: Google Search Console allows webmasters and site owners to directly submit a webpage URL for crawling. It is like requesting Google to index that specific page; the request is queued until the page is finally crawled and indexed, often within a day or so. This is much faster than relying on crawlers to find the page on their own.
XML Sitemaps: An XML (Extensible Markup Language) sitemap is like a blueprint of a website. It is essentially a file that lists all the URLs on a website that the owner wants search engines to know about and crawl, which is another method to get webpages crawled.
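At its simplest, an XML sitemap is just a `<urlset>` root element containing one `<url>`/`<loc>` entry per page. The sketch below builds one with Python’s standard library; the URLs are hypothetical, and real sitemaps often add optional fields such as `<lastmod>`:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal XML sitemap string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        loc = ET.SubElement(entry, "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/",
])
print(sitemap)
```

A site owner would typically serve the resulting file at a well-known location (commonly `/sitemap.xml`) and submit it through Search Console.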
Additionally, there are indexing APIs and feed-style sitemaps that allow websites to get faster and more frequent crawling. These are usually used by big websites that change fast and frequently, like news websites.
Once Google crawls a page, it moves on to indexing it.
Indexing is when Google understands and stores the webpage.
Once it collects the page, it analyzes its text and any media elements, like videos or images, to understand what the page is about. After that, it stores the webpage in its index alongside billions of other webpages.
But that’s not all. Google also assesses a webpage’s quality during the indexing stage. For example, it checks whether the page is unique or duplicates existing content. It also looks at other signals, like the page’s language and region, for a better understanding of it. This helps Google determine where the page should appear; for example, Google likely won’t show you an Arabic webpage if you’re searching in English.
The search engine prioritizes high-quality content over low-quality content and might exclude the latter from indexing altogether to prevent it from appearing in users’ searches.
How does Google understand which page is about what?
Google understands the meaning and purpose of a page by looking at its keywords and processing other key elements, such as the title tag, headings, metadata, and alt text. This allows it to better understand the page’s purpose and present it accordingly.
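Extracting those key elements is straightforward in principle. Here is a rough sketch using Python’s built-in HTML parser to pull the title and headings out of a hypothetical page; a real indexing pipeline does far more (rendering, language detection, media analysis):

```python
from html.parser import HTMLParser

class KeyElementParser(HTMLParser):
    """Collect the <title> and heading (<h1>-<h6>) text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.elements = {}    # tag name -> collected text
        self._current = None  # tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        # Track only the elements we consider "key": title and h1..h6
        if tag == "title" or (len(tag) == 2 and tag[0] == "h" and tag[1].isdigit()):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.elements[self._current] = self.elements.get(self._current, "") + data

html = ("<html><head><title>Buy Coffee Online</title></head>"
        "<body><h1>Fresh Coffee Beans</h1></body></html>")
parser = KeyElementParser()
parser.feed(html)
print(parser.elements)  # {'title': 'Buy Coffee Online', 'h1': 'Fresh Coffee Beans'}
```

These extracted strings are exactly the kind of signals an indexer would weight more heavily than ordinary body text.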
The last stage of the process involves ranking and presenting the indexed webpages to users. Here are three key stages of this process:
It starts with a user query. When a user searches for something, Google analyzes the query’s keywords to understand the “intent” behind it: the “why” behind the search. Why is the user searching? Are they looking for an answer? A location? A website? To better understand the query, Google also corrects spelling mistakes in the keywords and looks for synonyms.
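A tiny sketch of that query-analysis step: lowercase the query, fix known misspellings, and expand simple synonyms. The correction and synonym tables here are hypothetical toy data; real engines learn these mappings statistically from vast query logs:

```python
# Hypothetical lookup tables; real engines learn these from data.
CORRECTIONS = {"cofee": "coffee", "bui": "buy"}
SYNONYMS = {"buy": ["purchase"], "coffee": ["espresso"]}

def normalize_query(query):
    """Lowercase, spell-correct, and synonym-expand a query's words."""
    words = []
    for word in query.lower().split():
        word = CORRECTIONS.get(word, word)    # fix known misspellings
        words.append(word)
        words.extend(SYNONYMS.get(word, []))  # add synonyms to widen recall
    return words

print(normalize_query("Bui Cofee"))  # ['buy', 'purchase', 'coffee', 'espresso']
```

The expanded word list is then what gets matched against the index, which is why a search for “bui cofee” can still return coffee pages.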
After query analysis, Google’s algorithms start evaluating the relevant indexed webpages based on ranking signals. Google has over 200 “known” ranking factors, and possibly many more unknown ones, which decide which results appear in the SERP (search engine results page) for the query and in what order. Some common ranking factors include:
Relevance: Relevance measures how well the page's content matches the user's query. It considers the keywords present in the content, especially in the body, headings, and title.
Quality & Trust (E-E-A-T): E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness. It is a set of quality criteria that Google uses to judge content by factors like the author who wrote it and the site's overall reputation.
Backlinks (Authority): A backlink is a link on one page (say, page B) pointing to another page (page A, the one being judged). Google considers whether the page is linked to by other credible, high-quality websites, treating such links as "votes of confidence" in the page's value. Good backlinks increase a page's "authority" in Google's eyes, as well as the likelihood of it ranking higher than others.
Usability: Usability determines a user’s experience on a page. Is the page fast to load, mobile-friendly, and easy for a user to interact with? Core Web Vitals, which are metrics for load time, interactivity, and visual stability of a page, are vital for ensuring a smooth page experience. And the better the page experience, the more favorable it might be to Google.
Context: Factors like the user's location, language, search settings, and recent search history are further used to personalize and localize the results. For example, if you search for “cafe,” Google will show you nearby cafes to visit. You won’t see a Wikipedia entry on “Café,” because it understands that that’s not what most people want when searching for this keyword.
Google also evaluates various other SEO signals, like content originality. If your content plagiarizes or duplicates existing pages, it will likely not appear at the top and might be penalized with low rankings. These are just a few of the ranking signals that determine which webpages appear when you search for something.
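As an illustration only, the combination of signals described above can be pictured as a weighted score per page. Everything here is hypothetical: Google's real signals number in the hundreds and its actual weights are not public.

```python
# Made-up weights for a handful of signals; real weights are unknown.
WEIGHTS = {"relevance": 0.5, "quality": 0.2, "backlinks": 0.2, "usability": 0.1}

# Toy candidate pages with hypothetical per-signal scores in [0.0, 1.0]
candidates = [
    {"url": "a.com", "relevance": 0.9, "quality": 0.8, "backlinks": 0.4, "usability": 0.9},
    {"url": "b.com", "relevance": 0.7, "quality": 0.9, "backlinks": 0.9, "usability": 0.6},
    {"url": "c.com", "relevance": 0.3, "quality": 0.5, "backlinks": 0.2, "usability": 0.8},
]

def score(page):
    """Weighted sum of the page's signal scores."""
    return sum(WEIGHTS[signal] * page[signal] for signal in WEIGHTS)

ranked = sorted(candidates, key=score, reverse=True)
print([p["url"] for p in ranked])  # ['a.com', 'b.com', 'c.com']
```

Notice that b.com has the strongest backlinks yet still ranks below a.com, because relevance carries more weight in this toy model; that trade-off between signals is the essence of ranking.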
Finally, once the algorithms rank the webpages, the pages with the highest scores, based on relevance, quality, and the other evaluated factors, get displayed at the top.
However, this is only true for organic results. Site owners can pay Google to place their website at the top via Google Ads. When that happens, all organic results appear beneath the paid results.
Additionally, Google may display special features at the very top, like AI answers, which synthesize information from existing content to answer user queries concisely on the spot. This, of course, takes up space and pushes the organic results down a bit. Sometimes, though, Google may display a webpage above the AI answer snippet. The search engine can also display maps, image carousels, and tools like a color picker or currency converter before organic or even paid results.
This is how Google thinks and ranks content for a user. The user can scroll the SERP and visit their favorite website by simply clicking the link.
Google’s thinking and ranking process has three major phases: crawling, indexing, and ranking and presenting the results. When a new webpage is published, Google crawls the page to download it. It then understands the page by looking at its key elements, including keywords, headings, and the title, and indexes it, meaning it stores the webpage in its database. This is how Google stores billions of webpages, ready to display to users. When a user searches for something, the search engine analyzes the query and searches its database for relevant webpages. It then evaluates each candidate webpage’s relevance and quality, ranks them on these signals, and displays the most relevant, highest-quality results at the top, followed by less relevant and lower-quality pages.