On 4 November 2020, a team of Googlers published a podcast episode about duplicate ("dupe") detection and canonicalization at Google. The hosts were John Mueller, Martin Splitt, Gary Illyes, and Lizzi Harvey. They discussed things every site owner should know: how Google processes the huge volume of content available online, and how it keeps search results relevant by surfacing high-quality, original content.
After a light-hearted opening, Gary Illyes explained the significant difference between dupe detection and canonicalization.
What is dupe detection?
To begin, Google creates a checksum for each page: a compact fingerprint computed from the words on that page. By comparing the checksums of many pages, Google can identify pages with the same or similar content. In general terms, a checksum is a small block of data derived from a larger set of digital data, typically used to detect errors introduced during transmission or storage. Checksums can verify the integrity of data, but they are not designed to verify its authenticity.
Gary went on to stress that dupe detection and canonicalization are two different things. Dupe detection comes first: Google clusters pages with similar-looking content together. Canonicalization comes second: Google picks one page from each cluster as the final one, or “leader.” Duplication is the umbrella term that covers both cluster building and canonicalization. Dupe detection relies mainly on hashes or checksums, which are made by reducing the content and are then compared with one another. Reducing content to a hash or checksum makes dupe detection much easier. Gary added that scanning the full text would take more resources while producing almost the same results that Google gets from checksums.
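The cluster-building step described above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the helper names `page_checksum` and `build_dupe_clusters`, the choice of SHA-256, the whitespace and case normalization, and the example URLs are all assumptions made for the sketch.

```python
import hashlib
from collections import defaultdict


def page_checksum(text: str) -> str:
    """Reduce a page's text to a fixed-size fingerprint (hypothetical helper)."""
    # Assumption: lowercase and collapse whitespace so trivial formatting
    # differences do not change the checksum.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_dupe_clusters(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose text reduces to the same checksum."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[page_checksum(text)].append(url)
    return [sorted(urls) for urls in clusters.values()]


# Made-up example pages: /a and /b differ only in case and spacing.
pages = {
    "https://example.com/a": "Hello   World, this is a page.",
    "https://example.com/b": "hello world, this is a page.",
    "https://example.com/c": "A completely different page.",
}
clusters = build_dupe_clusters(pages)
```

Comparing short checksums instead of full texts is what keeps this cheap: each page is read once, and grouping is a dictionary lookup. A plain cryptographic hash like this one only catches pages that are identical after normalization, so it does not model the detection of merely similar content.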
During dupe detection, checksums are used to catch both “exact” and near-duplicate content. Google has several algorithms that find and remove boilerplate from pages. In other words, Google leaves navigation and footer content out of the checksum calculation and examines only the centerpiece of each page.
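To make the boilerplate step concrete, here is a rough Python sketch that drops navigation and footer markup before computing a checksum. It is an assumption-laden illustration: the podcast does not say how Google detects boilerplate, so the tag list in `BOILERPLATE_TAGS`, the `CenterpieceExtractor` class, and the `centerpiece_checksum` helper are all hypothetical.

```python
import hashlib
from html.parser import HTMLParser

# Assumption: treat these elements as boilerplate. Real detection is far
# more sophisticated than a fixed tag list.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class CenterpieceExtractor(HTMLParser):
    """Collect text that is not nested inside a boilerplate element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of boilerplate elements currently open
        self.chunks = []  # text fragments from the page's centerpiece

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def centerpiece_checksum(html: str) -> str:
    """Checksum of the page text with boilerplate removed (hypothetical helper)."""
    parser = CenterpieceExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up pages: A and B share the article but not the navigation or footer.
page_a = "<nav>Home | About</nav><main><p>Same article text.</p></main><footer>Site A 2020</footer>"
page_b = "<nav>Start | Contact</nav><main><p>Same article text.</p></main><footer>Site B</footer>"
page_c = "<nav>Home | About</nav><main><p>Other article text.</p></main><footer>Site A 2020</footer>"

same_centerpiece = centerpiece_checksum(page_a) == centerpiece_checksum(page_b)
different_centerpiece = centerpiece_checksum(page_a) != centerpiece_checksum(page_c)
```

Pages A and B end up with the same checksum even though their menus and footers differ, while page C, which shares its template with page A, does not.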
Once duplicates have been detected and clustered, how does Google handle canonicalization? The signals it considers include content, PageRank, HTTPS, the sitemap file, server redirects, and rel=canonical. Machine learning algorithms decide the weight of each signal, and redirects and the canonical tag generally get the higher weights. Gary further explained that although ML puts more emphasis on some signals, this has no effect on rankings. The page that Google chooses as canonical is the one that will rank, but its ranking is not based on these signals.
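A weighted choice of the “leader” within one cluster might look roughly like the following Python sketch. Everything numeric here is invented for illustration: the podcast says the weights are learned by machine learning and does not disclose them, so the `WEIGHTS` table, the `Candidate` fields, and the `choose_canonical` helper are assumptions. The only property taken from the discussion is that redirects and rel=canonical weigh more than the other signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    is_https: bool
    in_sitemap: bool
    redirect_target: bool       # other URLs in the cluster redirect here
    rel_canonical_target: bool  # other URLs name this one in rel=canonical
    pagerank: float             # assumed to be normalized to the range 0..1


# Hypothetical hand-picked weights; the real ones are learned, not fixed.
WEIGHTS = {
    "redirect_target": 5.0,
    "rel_canonical_target": 4.0,
    "is_https": 2.0,
    "in_sitemap": 1.0,
    "pagerank": 1.0,
}


def canonical_score(c: Candidate) -> float:
    """Weighted sum of canonicalization signals for one candidate."""
    return (
        WEIGHTS["redirect_target"] * c.redirect_target
        + WEIGHTS["rel_canonical_target"] * c.rel_canonical_target
        + WEIGHTS["is_https"] * c.is_https
        + WEIGHTS["in_sitemap"] * c.in_sitemap
        + WEIGHTS["pagerank"] * c.pagerank
    )


def choose_canonical(cluster: list[Candidate]) -> Candidate:
    """Pick the highest-scoring page in a dupe cluster as its leader."""
    return max(cluster, key=canonical_score)


# Made-up cluster of three duplicate URLs.
cluster = [
    Candidate("http://example.com/page", False, False, False, False, 0.9),
    Candidate("https://example.com/page", True, True, True, True, 0.4),
    Candidate("https://example.com/page?ref=feed", True, False, False, False, 0.2),
]
leader = choose_canonical(cluster)
```

Note that this score only decides which URL represents the cluster. In line with Gary's point, it says nothing about where that URL ranks against pages from other clusters.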