A view on Google’s Patent: Information Retrieval Based on Historical Data


 by: Peter Faber

Google never stops innovating its search engine, and where others try to follow, Google is not just 1 step ahead but 10. Its latest innovation, which may actually have been in place for a year or longer, is described in the patent “Information Retrieval Based on Historical Data.”

The abstract of the patent is: “A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data.”

The goal of this article is to give a simplified representation of this patent and to recommend the SEO techniques most likely to obtain high rankings, with a specific focus on links. This article is the opinion of the writer, and following the recommendations in this article is done at your own risk.

Google’s search results have become increasingly difficult to explain, and many theories have been developed about what is going on. The most popular is the “sand box” theory, which says that a new site is put in a virtual sand box and has to wait until it has aged before obtaining high rankings. This patent has some excellent information that can explain this phenomenon.

Information Retrieval

Based on the historical data, Google’s invention claims to retrieve the following information:

  1. Age/Time

  2. Change

  3. Trends

A score is calculated from the above 3 factors, which can then be used, at least partially, to rank the selected pages.

Historical Data

The patent describes a huge amount of historical data. The following is an overview of most of the items for which historical data can be measured:

  • Pages/sites

  • Links

  • Anchor Texts

  • Content

  • Query

  • Traffic

  • Ranking

  • User

  • Domain

Ranking Based On Information Retrieved From Historical Data

The patent describes in considerable detail how selected pages are ranked based on the information retrieved from historical data. This section describes the basic logic applied.

Age/Time

For all historical data, a date of inception is used to determine 4 important values:

  • Age

  • Average Age

  • Date

  • Average Date

These factors can be determined for pages, links, anchor text, content, topics, queries, etc. Comparing the age or date of a page to the average of the site, for example, tells the search engine whether this information is relatively new or old.

Comparing the average age or date of a page to the average age or date of all pages selected for a query (keyword phrase) tells the search engine if the page is relatively new or old. This information can be used to rank the selected pages.

Comparing to an average has the advantage that no preset rules are needed to determine the ranking of a page. For one query 6 months may be considered new (product descriptions, for example), while for another query 6 days may be considered old (news items, for example). It all depends on the average age.
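The comparison to an average can be illustrated with a small Python sketch. The patent does not publish a formula, so the function name relative_age, the simple ratio, and the sample ages are assumptions made for illustration only.

```python
def relative_age(page_age_days, result_ages_days):
    """Divide a page's age by the average age of all pages selected for a query.

    Hypothetical measure: below 1.0 means newer than average,
    above 1.0 means older than average.
    """
    average_age = sum(result_ages_days) / len(result_ages_days)
    return page_age_days / average_age


# Invented ages (in days) of the pages selected for two different queries.
news_ages = [1, 2, 6]            # average age: 3 days
product_ages = [180, 360, 540]   # average age: 360 days

news_ratio = relative_age(6, news_ages)           # 6 days old among news items
product_ratio = relative_age(180, product_ages)   # 6 months old among product pages

print(news_ratio)     # 2.0 -> relatively old
print(product_ratio)  # 0.5 -> relatively new
```

The 6-day-old news page comes out as old and the 6-month-old product page as new, without any fixed threshold for what counts as old.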

The same logic applies to links. To determine how popular a page or site is, the search engine can use the average age of all back links to see whether the popularity of the page is recent or not. It makes sense that if most back links were obtained 4 years ago and hardly anybody has been interested in linking to this page/site since then, the page is not as popular as the existing back links would suggest.

The patent even goes as far as determining age factors for the anchor texts of links.

Change

Information changes over time. Opinions change, knowledge changes, popularity changes, etc. As mentioned before, a page that was popular 4 years ago may be totally forgotten now but still have most of the backlinks it obtained when it actually was popular. However, if this page all of a sudden becomes popular again and new back links start showing up, the average age of the backlinks will remain high. This will prevent the page from ranking high.

Detecting changes is crucial to give old information the chance to rank high again. Conversely, a lack of change can be a reason to lower the rank of a page.
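A short Python sketch shows why the average alone is not enough and how a change could be detected. The 30-day window, the function names, and the link ages are hypothetical; the patent does not specify how changes are measured.

```python
def average_link_age(link_ages_days):
    """Average age in days of all back links of a page."""
    return sum(link_ages_days) / len(link_ages_days)


def links_gained(link_ages_days, window_days=30):
    """Count links gained in the latest window and in the window before it.

    The 30-day window is an assumption chosen for this example.
    """
    current = sum(1 for age in link_ages_days if age < window_days)
    previous = sum(
        1 for age in link_ages_days if window_days <= age < 2 * window_days
    )
    return current, previous


# Invented example: 100 links obtained 4 years ago, 10 links obtained 10 days ago.
link_ages = [1460] * 100 + [10] * 10

average = average_link_age(link_ages)
current, previous = links_gained(link_ages)

print(round(average))     # 1328 -> the average age is still very high
print(current, previous)  # 10 0 -> but the page recently started gaining links
```

Comparing the two windows exposes the renewed interest even though the average age of the backlinks has barely moved.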

Trends

Even though comparing to averages is a great way to get information about freshness, it fails to recognize smaller events like a sudden increase in the popularity of a page. Though detecting changes does help to recognize smaller events, more information can be obtained by detecting trends.

Sudden increases in popularity can be caused by seasonal events like Christmas or the Super Bowl. For this reason the search engine will try to determine trends within pages, links, anchor text, content, topics, queries, etc. Detecting trends makes it possible to rank pages high that would not rank high with the standard ranking methods or when compared to average ages or dates. Google has recognized here a very important fact about information: the relevance and importance of information are (con)temporary.
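One simple way to picture a seasonal trend is a seasonal index: the average volume of a calendar month divided by the overall monthly average. The Python sketch below uses invented query volumes, and the technique is a common textbook approach chosen for illustration, not one taken from the patent.

```python
def seasonal_index(monthly_counts):
    """Return 12 ratios: average count per calendar month / overall average.

    monthly_counts must start in January and cover whole years.
    """
    overall = sum(monthly_counts) / len(monthly_counts)
    index = []
    for month in range(12):
        values = monthly_counts[month::12]  # the same month in every year
        index.append((sum(values) / len(values)) / overall)
    return index


# Invented monthly query volumes for two years, with a peak each December.
year_one = [100] * 11 + [500]
year_two = [100] * 11 + [600]

index = seasonal_index(year_one + year_two)
peak_month = index.index(max(index))  # 0 = January, 11 = December

print(peak_month)   # 11
print(index[11])    # 4.0 -> December runs at 4 times the average volume
```

A page about a December event would look unremarkable against the yearly average, while the seasonal index singles out the month in which it matters.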

Detecting Spam Using Historical Data

Having all kinds of historical data available makes it possible to detect search engine spam. Unexpected events that happen to a site can be an indication of spam. Obviously, a strong improvement in 1 single factor would not be a direct indication of spam; generally, multiple factors show strange behavior when a site is using spam to increase rankings. It would not be in Google’s interest to penalize a site for advertising. However, excessive advertising on sites/pages that are totally unrelated will not do your site any good.
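The idea that a single unusual factor is not enough, while several together are suspicious, can be sketched in Python as follows. The factor names, the growth limit of 5 and the minimum of 2 factors are all hypothetical values picked for this example; the patent gives no such thresholds.

```python
def unusual_factors(previous, current, max_growth=5.0):
    """Return the names of factors that grew by more than max_growth times."""
    flagged = []
    for name, old_value in previous.items():
        new_value = current[name]
        if old_value > 0 and new_value / old_value > max_growth:
            flagged.append(name)
    return flagged


def looks_suspicious(previous, current, min_factors=2):
    """Flag a site only when several factors behave strangely at once."""
    return len(unusual_factors(previous, current)) >= min_factors


# Invented measurements for one site at two points in time.
last_month = {"backlinks": 100, "anchor_texts": 20, "traffic": 1000}
after_campaign = {"backlinks": 120, "anchor_texts": 22, "traffic": 8000}
after_spam = {"backlinks": 5000, "anchor_texts": 21, "traffic": 9000}

print(looks_suspicious(last_month, after_campaign))  # False: only traffic jumped
print(looks_suspicious(last_month, after_spam))      # True: backlinks and traffic jumped
```

The advertising campaign moves only one factor and is left alone, which matches the point that advertising as such should not be penalized.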

Recommendations

Nothing has changed with regard to links. This patent pretty much confirms what we at www.textlinkbrokers.com already knew and have been explaining to our customers. The following recommendations can be helpful:

Keep links related

Related links matter; unrelated links can be considered spam.

Build links on a continuous, moderate basis

As the patent describes, the average age of your backlinks should not be too high. It is therefore wise to continue adding backlinks to secure a reasonable average age of all your backlinks. How many you need to add over time depends on your market.
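The effect of continuous link building on the average age can be estimated with a small Python simulation. The starting number of links and the monthly rate are invented figures; what counts as a reasonable average age in your market is not something this sketch can tell you.

```python
def average_age_after(initial_links, links_per_month, months):
    """Simulate link building and return the average link age in months.

    Every month all existing links age by one month, then
    links_per_month new links (age 0) are added.
    """
    ages = [0] * initial_links
    for _ in range(months):
        ages = [age + 1 for age in ages]
        ages.extend([0] * links_per_month)
    return sum(ages) / len(ages)


# Invented scenario: a site starts with 10 links and is followed for 2 years.
no_new_links = average_age_after(10, 0, 24)
one_per_month = average_age_after(10, 1, 24)

print(no_new_links)             # 24.0 months
print(round(one_per_month, 1))  # 15.2 months
```

Even a single new link per month keeps the average age well below that of a site that stopped building links.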

Be better than the average

It is very important to be better than the average, but don’t overdo it; that would be expensive and unnecessary.

Focus on seasonal events

A good way to increase the success of your website is to set up text link campaigns for seasonal events. Start your advertising campaign 2 to 3 months before the actual event to give Google time to find the links and update its information about your site. After the event you can let these links go again.

Spread links over multiple sites (unique backlinks)

A very important factor is the number of unique websites in your backlinks. Google seems to put a strong emphasis on this factor.