On the 4th of November, 2020, a team of Google employees published a podcast episode about dupe detection and canonicalization at Google. The hosts were John Mueller, Martin Splitt, Gary Illyes, and Lizzi Harvey. They discussed how Google processes the huge volume of content available online and how it keeps search results relevant by surfacing original, high-quality content.

The podcast opened on a light note, and then Gary Illyes explained the significant difference between dupe detection and canonicalization.

What is dupe detection? 

To begin the process, Google creates a checksum for each page: a fingerprint computed from the words on that page. By comparing the checksums of many pages, Google can identify the pages that have the same or similar content. In general computing terms, a checksum is a small piece of data derived from a larger block of digital data, traditionally used to detect errors that may have occurred during transmission or storage. A checksum verifies the integrity of data, but it does not verify its authenticity.
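The idea of fingerprinting pages and grouping them by fingerprint can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the podcast does not say which hash function is used, so SHA-256 is an assumption here, and the function names and example URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def page_checksum(text):
    """Fingerprint a page from its words; case, punctuation and spacing are ignored."""
    words = re.findall(r"\w+", text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()

def cluster_by_checksum(pages):
    """Group URLs whose page text produces the same checksum."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[page_checksum(text)].append(url)
    return [sorted(urls) for urls in clusters.values()]

# Hypothetical pages: the first two differ only in spacing and letter case.
pages = {
    "https://example.com/a": "Fresh coffee beans, roasted daily.",
    "https://example.com/a?ref=home": "Fresh  coffee beans,  roasted DAILY.",
    "https://example.com/b": "Loose leaf tea, blended weekly.",
}
clusters = cluster_by_checksum(pages)
for urls in clusters:
    print(urls)
```

Comparing two short fingerprints is much cheaper than comparing two full pages, which is why reducing each page to a checksum first makes the comparison practical at scale.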

Going further, Gary mentioned that dupe detection and canonicalization are two different things. Dupe detection is the first step, followed by canonicalization. In dupe detection, Google clusters similar-looking pages together; choosing one page from each cluster as the final one, or "leader," is known as canonicalization. Handling duplication as a whole therefore covers both cluster building and canonicalization. Dupe detection mainly relies on reducing content to a hash or checksum and then comparing those values, which is much easier than comparing the content itself. Gary explains further that scanning the full text takes more resources and gives almost the same results that Google gets from checksums.

In the process of dupe detection, checksums are used to catch both exact duplicates and near duplicates. Google also has several algorithms that find and exclude boilerplate from pages. In other words, Google leaves out navigation and footer content when calculating the checksum and examines only the centerpiece of each page.
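The podcast does not describe how near duplicates are scored, so the following is only a sketch of one well-known technique, word shingling with Jaccard similarity, and not a description of Google's method. The shingle size, the helper names, and the sample sentences are all assumptions made for illustration.

```python
import hashlib
import re

def shingle_hashes(text, size=3):
    """Reduce text to a set of hashes, one per run of `size` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    count = max(1, len(words) - size + 1)
    shingles = {" ".join(words[i:i + size]) for i in range(count)}
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}

def similarity(a, b):
    """Jaccard similarity: hashes shared by both texts divided by all hashes seen."""
    ha, hb = shingle_hashes(a), shingle_hashes(b)
    return len(ha & hb) / len(ha | hb)

original = "the quick brown fox jumps over the lazy dog"
near_copy = "the quick brown fox leaps over the lazy dog"
unrelated = "canonical tags tell search engines which url to index"

print(similarity(original, near_copy))  # 0.4 (4 shared shingles out of 10)
print(similarity(original, unrelated))  # 0.0
```

Pages whose score passes a chosen threshold could be placed in the same cluster; the threshold itself would be a tuning decision and is not something the podcast specifies.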
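A rough way to picture boilerplate removal is to drop the text inside navigation and footer elements before hashing. The sketch below uses Python's standard `html.parser` module; the list of tags treated as boilerplate and the sample pages are assumptions, and real boilerplate detection is certainly more sophisticated than a tag filter.

```python
import hashlib
from html.parser import HTMLParser

# Assumed list of tags whose content counts as boilerplate.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside a boilerplate tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def content_checksum(html):
    """Checksum of the page's main text only, ignoring boilerplate sections."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical pages: A and B share a centerpiece but have different nav and footer.
page_a = "<nav>Home | Shop</nav><main><p>Fresh coffee beans.</p></main><footer>(c) 2020 Shop A</footer>"
page_b = "<nav>Start | Blog</nav><main><p>Fresh coffee beans.</p></main><footer>Shop B Ltd.</footer>"
page_c = "<nav>Home | Shop</nav><main><p>Loose leaf tea.</p></main><footer>(c) 2020 Shop A</footer>"

print(content_checksum(page_a) == content_checksum(page_b))  # True
print(content_checksum(page_a) == content_checksum(page_c))  # False
```

Without this step, two copies of the same article on sites with different menus would get different checksums, while two unrelated pages on the same site would look partly alike.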

What happens after dupe detection?

Once duplicates have been detected and clustered, how does Google handle canonicalization? The canonicalization signals include the content itself, PageRank, HTTPS, presence in a sitemap file, server redirects, and the rel=canonical tag. Machine learning algorithms decide the weight of each signal, and they generally give more weight to redirects and the canonical tag. Gary further explains that although ML puts more emphasis on some signals, this has no effect on rankings. The page that Google chooses as canonical is the one that can rank, but how well it ranks is not decided by these canonicalization signals.
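Picking a leader from weighted signals can be illustrated with a simple scoring function. The weights below are invented for the example: the podcast says only that the weights are learned by machine learning and that redirects and the canonical tag tend to weigh more. The signal names, URLs, and values are likewise hypothetical.

```python
# Hypothetical weights; in the process described above they are learned, not hand-set.
WEIGHTS = {
    "redirect_target": 5.0,  # other URLs in the cluster redirect to this one
    "rel_canonical": 4.0,    # other URLs point to this one via rel=canonical
    "https": 1.5,
    "in_sitemap": 1.0,
    "pagerank": 1.0,
}

def canonical_score(signals):
    """Weighted sum of a URL's signals; signals that are missing count as zero."""
    return sum(WEIGHTS[name] * float(signals.get(name, 0)) for name in WEIGHTS)

def pick_canonical(cluster):
    """Return the URL with the highest weighted score in a duplicate cluster."""
    return max(cluster, key=lambda url: canonical_score(cluster[url]))

# A hypothetical cluster of three URLs that serve the same content.
cluster = {
    "http://example.com/shoes": {"pagerank": 0.6},
    "https://example.com/shoes": {
        "redirect_target": 1, "rel_canonical": 1, "https": 1, "in_sitemap": 1, "pagerank": 0.4,
    },
    "https://example.com/shoes?utm=x": {"https": 1, "pagerank": 0.1},
}

print(pick_canonical(cluster))  # https://example.com/shoes
```

The score is used only to choose which URL represents the cluster. It says nothing about where that URL appears in search results, which matches the point that canonicalization and ranking are separate.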


Smruti Kakkad, Digital Marketing Manager

After graduating from Be.IT, Smruti decided to build her expertise in digital marketing. Be it SEO or paid advertising, she knows how to make it work like a pro. She also likes to watch a variety of movies and has great taste in fashion. She is known for leading her team with plenty of motivation and inspiration.
