Saturday, 2 April 2016

The Amazing PageRank Patent

It is amazing because it has made Google a billion-dollar company. US Patent 6285999 is owned by Stanford University. The inventor is Larry Page, one of the two founders of Google. For that reason, this patent is extremely interesting to those in SEO (Search Engine Optimization), which may be described as the art of getting your page to rank well on Google.

The patent, which you can find here, is a rather important one, although it should be expiring by January 2017 (its priority date is 10th January 1997).

In the abstract, it is described as:
"A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality."
In simple language, it's a means of assigning value to hypertext documents, based on the ranks of the documents citing them. A second factor is a constant representing the probability that a user browsing the database will jump to that document at random, rather than arrive there by following a link.
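
For readers who prefer symbols, those two factors are often written as a single formula. This is the commonly quoted textbook form, not a quotation from the patent, and the symbols are labels assumed here for illustration: N is the total number of documents, B_p is the set of documents linking to document p, L(q) is the number of outgoing links on document q, and d (often set to 0.85) is the probability that the browser follows a link instead of jumping at random.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}
```

The first term is the random-jump constant from the abstract. The second term is the rank a document inherits from the documents citing it, with each citing document splitting its rank evenly among its links.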

Unfortunately, for the citing documents to be ranked themselves, the ranks of the documents citing those citing documents must also be ascertained. This is why Mr. Page wrote that the method requires:
"obtaining a plurality of documents, at least some of the documents being linked documents, at least some of the documents being linking documents, and at least some of the documents being both linked documents and linking documents, each of the linked documents being pointed to by a link in one or more of the linking documents; assigning a score to each of the linked documents based on scores of the one or more linking documents; and processing the linked documents according to their scores."

Steps to PageRank.

  1. Get a bunch of documents, at least some of them linked to each other, whether one way or two;
  2. Assign scores to the linked documents based on the scores of the documents linking to them;
  3. Process the linked documents according to their scores.
Naturally, this raises a question: when Google first started, none of the documents were ranked. So how do you assign rankings to documents based on documents which have not themselves been ranked? It makes for a very interesting question. It's almost like circular reasoning. "Rank this, based on other documents citing it." Oh good. But we don't know their ranking. "Rank them, based on documents citing them." Yes, about that... We don't know the ranking of those documents, either...
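
One common way out of this circle is to start with a guess and keep refining it: give every document the same score, recompute each score from the scores of the documents linking to it, and repeat until the numbers stop changing. The sketch below is a minimal illustration of that idea, not Google's implementation and not code from the patent. The function name pagerank, the toy_web data, the damping value of 0.85 and the even spreading of rank from pages with no outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Iteratively score pages; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    # Start from a uniform guess: no page has a known rank yet.
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page receives the same random-jump share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A linking page splits its rank evenly among its links.
                share = damping * ranks[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: a page with no links spreads its rank over all pages.
                share = damping * ranks[p] / n
                for q in pages:
                    new[q] += share
        change = sum(abs(new[p] - ranks[p]) for p in pages)
        ranks = new
        if change < tol:
            break
    return ranks


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy web, C ends up on top because three of the four pages link to it, while D, which nobody links to, keeps only its random-jump share.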

Of course, the question in the preceding paragraph is no longer relevant. It shouldn't be, because Google's been around for years and its stock is trading merrily on NASDAQ. That means that there are sufficiently ranked documents online for new documents to be ranked every day. Ranked, and re-ranked. Because documents live forever on the Internet. (No, not really. It's just my April Fools' joke.)

Applying the same logic to the patent itself, it seems to be a rather important one. If US Patent 6285999 could be assigned a PageRank, it would have to be quite high, because there are many, many patents citing it. I copied the whole bunch into LibreOffice (I'm an open source fan) and ran a bunch of data sorts, to find that:
  • As of 1st April 2016, the PageRank patent was cited by 1,245 other patents;
  • The first patent to cite it was US Patent 6,631,496, which was filed on 22nd March 1999 by NEC Corporation ("System for personalizing, organizing and managing web information");
  • The second patent to cite it was US Patent 7,302,429, which was filed on 11th April 1999 by William Paul Wanker ("Customizable electronic commerce comparison system and method") (real name, not an April Fools' joke);
  • The last patent to cite it was US Patent 9,280,212, which was filed on 11th February 2015 by Google Inc. ("Velocity based content delivery");
  • The second-last patent to cite it was US Patent 9,152,678, which was filed on 8th December 2014 by Google Inc. ("Time based ranking");
  • About 461 patents by Google Inc. cite this patent.
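
The same kind of sort can be done in a few lines of Python instead of a spreadsheet. This is only a sketch: it uses just the four patents named in the list above, not the full set of 1,245, and the variable names are made up for the example.

```python
from datetime import date

# The four citing patents named above: (patent number, filing date, applicant).
citing = [
    ("9,280,212", date(2015, 2, 11), "Google Inc."),
    ("6,631,496", date(1999, 3, 22), "NEC Corporation"),
    ("9,152,678", date(2014, 12, 8), "Google Inc."),
    ("7,302,429", date(1999, 4, 11), "William Paul Wanker"),
]

# Sort by filing date, earliest first.
by_filing_date = sorted(citing, key=lambda row: row[1])
first, second = by_filing_date[0], by_filing_date[1]
second_last, last = by_filing_date[-2], by_filing_date[-1]

# Count how many rows in this small sample were filed by Google Inc.
google_count = sum(1 for row in citing if row[2] == "Google Inc.")

print("First to cite:", first[0], "filed by", first[2])
print("Last to cite:", last[0], "filed by", last[2])
```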
I'm obviously not an SEO expert, but I've read on other blogs that PageRank is only one of the ranking factors today. There are "positive signals" and "negative signals" now. Presumably, by learning those signals, I might be able to get this blog to rank higher in Google's search results.

I am qualified as a patent agent. If you need someone to draft a patent application for you, please feel free to drop me a line. (My contact details are on this blog.)

Thanks for reading.
