((lambda (x) (x x)) (lambda (x) (x x)))

Monday, September 22, 2008

Phishing and the Robots Exclusion Standard

Phishing:

'Phishing' is a term for a variety of techniques for perpetrating crime via the Internet, all built on one core mechanism: producing content that first deceives its consumer into believing the content creator to be a trusted party, and then gives the consumer a plausible motivation to take a course of action that transmits their private information to the content creator. The personal information sought most often is financial, such as bank account or credit card numbers. The trusted parties that a phisher may try to impersonate often include (but are certainly not limited to) banks, insurance agencies and other financial institutions. The most common means of delivering the deceptive content is an email message inviting the user either to reply with personal information, or to click on a link leading to a similarly deceptive website into which personal information may be entered. These are not the only means used, however, and email is not always involved. Another popular technique is to purchase domain names closely resembling that of a trusted entity but with single-letter typos, in the hope of deceiving unwary web surfers who accidentally type the incorrect address into their browser and thereby come across the false site.
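To make the "single-letter typo" idea concrete, here is a minimal Python sketch that flags a domain name differing from a trusted one by exactly one inserted, deleted or substituted character. The domain examplebank.com and the function names are hypothetical, chosen only for illustration, and the check is deliberately naive: it illustrates what a typo domain looks like, and is not a defense anyone should rely on.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def looks_like_typo_of(candidate: str, trusted: str) -> bool:
    """True if candidate is exactly one edit away from trusted.
    Domain names are case-insensitive, so compare in lower case."""
    return edit_distance(candidate.lower(), trusted.lower()) == 1


if __name__ == "__main__":
    trusted = "examplebank.com"  # hypothetical trusted domain
    for candidate in ("examplebank.com", "examplebamk.com", "examplebnk.com"):
        print(candidate, looks_like_typo_of(candidate, trusted))
```

Note that two swapped adjacent letters count as two edits under this measure, so such a domain would not be flagged by this sketch.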

The best defense an individual user has against such a scheme is unfortunately not a technical one that can be installed on their computer and trusted to do its work, but perceptiveness on the part of the user. A user should cast a critical eye on any online communication with institutions likely to be impersonated. The deceptive content produced by phishers is often of less than sterling quality, and a careful observer is likely to notice many minor errors in its text. If the user has any suspicion regarding the veracity of a communication, they should immediately contact the trusted party by another channel, using contact information acquired previously and known to be correct, and seek confirmation that the communication received was authentic.

Numerous technical measures exist with the goal of making it easier for users to determine whether a suspect email message or website is legitimate, but the problem is at its core not a technical one, and as such no purely technical solution can truly solve it. Educating users in how to identify false communications is the only method likely to have any impact.


Robots Exclusion Standard:

The Robots Exclusion Standard is an informal convention, adopted by many webmasters and search engine operators, that lets a webmaster determine which pages on a website may be indexed by web robots, the tools search providers use to discover and index pages on the World Wide Web. Under this convention, the webmaster places a specially formatted text file named robots.txt in the top-level directory of the web site, and web robots complying with the standard will not index the files and directories specified in that file.

The biggest weakness of the standard is that, as an informal convention, it places no requirement on any given piece of web robot software to obey it. Providers who wish to disobey the standard can do so largely at their leisure, and there is no particular legal countermeasure to discourage such behaviour.

An example robots.txt configuration, taken from my personal home web server, is as follows:

User-agent: *
Disallow: /

When the robots.txt file is configured as above, all cooperating web robots will refrain from indexing any content on the web site in question.
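How a cooperating robot might interpret this file can be sketched with Python's standard urllib.robotparser module. The robot name ExampleBot, the host example.com and the helper is_allowed are assumptions made for illustration; a real robot would download the file from the site rather than read it from a string.

```python
from urllib.robotparser import RobotFileParser

# The same two-line configuration shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robot name and URLs, used only for illustration.
    for path in ("/", "/index.html", "/photos/cat.jpg"):
        url = "http://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Every path is reported as disallowed, since "Disallow: /" matches all URLs on the site and "User-agent: *" applies the rule to every robot. Nothing here is enforced by the web server: the check runs inside the robot, which is why the convention depends entirely on the robot's cooperation.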


References:
(This section left blank as no published articles or other sources of information were referenced during the writing of this article.)
