Saturday, 4 September 2010

Check for Plagiarism with Search Engines

Dear reader

Plagiarism is a problem for content providers. Sometimes companies or publications pay independent writers to provide write-ups for their websites, only to find that the articles supplied were lifted from other sources. This would not be so bad if the articles were paraphrased or summarized, but in some cases they are reproduced word for word. This is known as "plagiarism" and should not go unheeded. It is embarrassing for the purchaser, who has unknowingly bought stolen content in the belief that the article is original and genuine.

A lecturer at a private college once told me that plagiarism among students is also a problem. Some of these students do not want to, or cannot be bothered to, do their own research and come to their own conclusions. They find the art of writing articles or white papers too boring. Who can blame them? This is a generation raised in an always-connected environment, where entertainment is just a click away. Fail to engage their attention within 5 minutes and they start looking for an exit. (It used to be thought that attention-deficit syndrome was confined to certain hyperactive children.) The problem is that these students cut and paste from various sources and call it "research". This is definitely not the proper way to write a paper.

I was told then, and am suggesting to you now, that the way to find out whether a document, article, or paper you are inspecting is original is to enter certain key phrases from it into a search engine. Of course, not everything is available online. Sometimes even subscription-based databases make their listings visible on Google, but accessing the full articles is another story.
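For readers who like to automate this, the key-phrase check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names (`key_phrases`, `search_url`), the choice of eight-word phrases, and the idea of preferring longer sentences are all hypothetical, not a tested recipe. It picks a few runs of consecutive words from the document and wraps each in double quotes, which most search engines treat as an exact-phrase search.

```python
import re
from urllib.parse import quote_plus


def key_phrases(text, words_per_phrase=8, max_phrases=3):
    """Pick a few runs of consecutive words to use as exact-match queries.

    Assumption: longer sentences are more distinctive, so they are tried first.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = sorted(sentences, key=lambda s: len(s.split()), reverse=True)
    phrases = []
    for sentence in candidates:
        # Keep words only, dropping punctuation such as commas and full stops.
        words = re.findall(r"[\w'-]+", sentence)
        if len(words) >= words_per_phrase:
            phrases.append(" ".join(words[:words_per_phrase]))
        if len(phrases) == max_phrases:
            break
    return phrases


def search_url(phrase, base="https://www.google.com/search?q="):
    """Build a search URL; the double quotes request the exact phrase."""
    return base + quote_plus('"' + phrase + '"')


if __name__ == "__main__":
    sample = "Short one. This sentence has exactly nine words in it total. Tiny."
    for phrase in key_phrases(sample):
        print(search_url(phrase))
```

Open each printed link in a browser and see whether the same wording turns up elsewhere. A phrase of seven or eight words is usually long enough that an exact match is unlikely to be a coincidence.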

One plagiarism-focussed search engine I came across recently is CopyScape. You enter the address of the relevant web page, and it searches online for matching content. It differs from Google and other search engines in that your input is the URL of the page rather than a phrase, and the search engine does the computation. It's a little like handing your article to the librarian and trusting that s/he will somehow magically pull from the shelves those articles which infringe the copyright of, or have been infringed by, your article.

On 14th February 2008, an article compared the performance of five different search engines in an effort to gauge which was most effective at detecting plagiarism. The test compared the ability of the search engines to detect plagiarism of various types of content, based on their length and wording. Yahoo! and Google finished in a tie, whereas Bing (Microsoft's search engine) was not represented, because it was launched only on 1st June 2009. Incidentally, shortly after that launch, on 29th July 2009, Yahoo! and Microsoft announced a deal under which Bing would power Yahoo!'s search. (Interested readers can read the article dated 3rd June 2009, which compared the performance of Bing, Yahoo! and Google in detecting plagiarism.)

For the interested reader, here are links to some articles on how to use search engines to detect plagiarism.
  1. Wikipedia entry on plagiarism detection, which includes a few links to both free and commercial online plagiarism detection systems.
  2. Search Engine Journal's article, "Top Online Plagiarism Checkers - Protect Your Content", dated 27 October 2008.

It may be of interest that in 2001 Microsoft filed a patent on a method of comparing two texts for plagiarism. This appears in US patent no. 7356188 "Recognizer of text-based work" under claim no. 7.
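
To give a feel for what "comparing two texts" can mean in practice, here is a small Python sketch. It is emphatically not the method claimed in the patent above. It is a common textbook technique that I am substituting for illustration: break each text into overlapping runs of five words ("shingles") and measure how many runs the two texts share (Jaccard similarity). The function names and the run length of five are my own assumptions.

```python
import re


def shingles(text, n=5):
    """Return the set of all runs of n consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=5):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0.

    Texts shorter than n words have no shingles and score 0.0.
    """
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox jumps over a sleeping cat"
    print(overlap(original, suspect))
```

A score near 1.0 suggests word-for-word copying, while a score near 0.0 means the texts share almost no five-word runs. Paraphrased material will score low, so a low number is not proof of originality.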

Other computer-based systems for plagiarism detection have also been patented or applied for. But that is a discussion for another article. What is certain is that when there is plagiarism, the law that applies is copyright. Plagiarism is an infringement of copyright.