Wednesday, July 08, 2009

Duplicate Content

Most people think of duplicate content as either republished articles or pages that appear on a website with only the slightest variation.

I'm sure Google discounts or ignores these pages. If content is widely copied across the web, though, I suspect Google gives the original source some extra credit, since everyone appears to be copying and citing it.

To illustrate what I believe, assume:
- an article first appears on site A
- the article then appears all over the web with the same outgoing links to site B, but also with links back to the original site A
Then Google will:
- count the links back to site A heavily, since the content appears to be widely cited, as evidenced by both the links back and the copying itself
- not count the links to site B heavily, since it's basically the same link all over the place
- neither reward nor punish the copycat sites

I started thinking about duplicate content because of a major article on Sphinn about detecting duplicate content within a website. It seems mostly concerned with duplicated page titles and meta descriptions. This is a sloppy error that I'm sure my sites have a lot of. sigh sigh sigh. The article is by Shimon Sandler: Finding Duplicate Content with Free Tools
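
For what it's worth, checking a site for duplicated page titles and meta descriptions is easy to sketch in a few lines of Python. This is only a rough illustration, not the method or the tools from Sandler's article. It assumes you already have each page's HTML in hand as a string, and the names HeadParser and find_duplicates are made up for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collects the <title> text and the meta description of one page.

    Hypothetical helper for this sketch, built on the standard library parser.
    """

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't hide a duplicate."""
    return " ".join(text.split()).lower()


def find_duplicates(pages):
    """pages maps URL -> HTML string (an assumed input format).

    Returns two dicts, one for titles and one for descriptions,
    each mapping a repeated value to the list of URLs that share it.
    """
    by_title = defaultdict(list)
    by_description = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        parser.close()
        title = normalize(parser.title)
        description = normalize(parser.description)
        if title:
            by_title[title].append(url)
        if description:
            by_description[description].append(url)
    dup_titles = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    dup_descriptions = {d: urls for d, urls in by_description.items() if len(urls) > 1}
    return dup_titles, dup_descriptions
```

Feed it a dict of your own pages and anything that comes back in either result is a title or description shared by more than one URL. It only catches exact matches after lowercasing and whitespace cleanup, so near-duplicates with a word or two changed will slip through.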


