Input! Input!


In the classic 1986 film "Short Circuit", the main character is a cute robot
with an acute thirst for knowledge, known only as 'Number 5'.

http://www.dvdreview.com/fullreviews/shortcircuit.shtml

One of Robot Number 5's most engaging traits is that whenever he stumbles
across anything interesting he rushes off enthusiastically to collect and
index the data with cries of "Input! Input!".

Robots designed to collect and index data actually exist, but they are not
made of exotic metals: they are software programs, used by Search Engines to
gather information about the relevance of your web site to a particular
search term. These robots (sometimes called spiders or crawlers) are smart
but not very selective. Unless you provide unambiguous ground rules for
visiting Search Engine robots, excluding them from areas you don't want them
to enter, every file on your web site will be perceived as "Input!" and is
likely to get indexed.

"But", I hear you ask, "I want to get indexed by Search Engines, so why is
this a problem?"

Indexing everything sounds superficially smart; however, as part of a
coherent web site promotion and Search Engine optimization strategy it has a
number of important disadvantages:

Search Engine spiders should be actively discouraged from visiting areas
where sensitive information might be stored.
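As a sketch, a robots.txt rule blocking compliant crawlers from a private area might look like this (the directory names are hypothetical; substitute your own):

```
# Hypothetical paths - substitute your own sensitive directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

Note that robots.txt is purely advisory: it keeps well-behaved spiders out, but it is not a security mechanism, so genuinely sensitive files still need server-side access control.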

Indiscriminately indexing everything can seriously dilute the relevancy of
your web site's overall theme and can produce a sub-optimal rank in Search
Engine listings.

Allowing a Search Engine spider to index everything can even inadvertently
lead some Search Engines to perceive your web site as containing spam, which
can get your site blacklisted.

For multilingual web sites it's imperative to focus English-language
robots onto the relevant English-language pages and to direct robots from
international Search Engines, which might be looking for Spanish, German or
French language resources, to the appropriately localized content areas of
your site.
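Since robots.txt can only exclude, not redirect, "focusing" a robot means disallowing everything except the content you want it to index. A hedged sketch, where the user-agent names and directory layout are purely illustrative:

```
# "ExampleBot-FR" is a hypothetical French-language crawler;
# keep it out of the English section so it indexes only /fr/.
User-agent: ExampleBot-FR
Disallow: /en/

# Keep a hypothetical English-only crawler out of the localized sections.
User-agent: ExampleBot-EN
Disallow: /fr/
Disallow: /de/
Disallow: /es/
```

Check each target Search Engine's documentation for the user-agent string its spider actually sends before writing rules like these.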

Search Engine robots can only "read" text. Dynamic content or graphical
components cannot be read or indexed, so a site built entirely from such
elements is effectively invisible to Search Engines.

Some robots "rapid fire" requests, causing severe server-loading problems
which can detract from your visitors' browsing experience and ultimately
cause loss of business.
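For this last problem, some spiders honor a nonstandard Crawl-delay directive in robots.txt (support varies by Search Engine, and some crawlers ignore it entirely), sketched here:

```
# Nonstandard directive: honored by some crawlers, ignored by others.
User-agent: *
Crawl-delay: 10
```

The value asks compliant robots to pause (here, ten seconds) between successive requests, spreading the load on your server.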

The answer to these problems lies in having a robot exclusion file on your
web server.

Robot exclusion files, normally named "robots.txt", are ASCII text files
which reside in the document root directory of a web server and are used to
set access permissions and control the actions of robots or spiders.
Most of the major US and international Search Engines deploy spiders which
look for a robots.txt file during their visit to a web site. There is an
agreed industry standard for robots.txt files (the Robots Exclusion
Standard) and, in order to work as anticipated, robots.txt has to be
correctly formatted and placed in the proper location on the web server.
Once uploaded to your server, robots.txt is used to notify individual
spiders about which elements of a web site cannot be visited and should not
be made available on the public Internet.
Used in conjunction with Search Engine optimization tools and/or services,
robots.txt can significantly enhance your site's chances of that
all-important first page listing on the major US and international Search
Engines by focusing individual spiders on specific content.
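As a minimal sketch of that format (the paths and spider name are hypothetical), a robots.txt placed in the document root, so that it is reachable at http://www.yoursite.com/robots.txt, might read:

```
# This record applies to all robots unless a more specific
# User-agent record matches.
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/

# A specific spider can be given its own rules; "ExampleBot" is
# a hypothetical user-agent name - this record bars it entirely.
User-agent: ExampleBot
Disallow: /
```

Each record pairs one or more User-agent lines with Disallow lines; an empty Disallow value permits everything, while "Disallow: /" excludes the whole site for that robot.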

Although only a small ASCII text file, robots.txt enables a significant
degree of fine tuning to be applied to your Search Engine optimization
program. Used intelligently, robots.txt can do a big job, significantly
improving your knowledge about, and control of, visiting Search Engine
robots. This is particularly the case where a web site owner either wishes
to deliver specific content optimized for a particular Search Engine, or has
paid for an accelerated Search Engine listing service where it would be
useful to track the activity of the robot associated with that specific
paid-for service.
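One way to track which robots are fetching your robots.txt is to scan the server's access log. A minimal sketch, assuming a common-log-format file; the log entries below are fabricated purely for illustration:

```shell
# Build a tiny sample access log; real entries come from your web server.
cat > access.log <<'EOF'
66.249.66.1 - - [10/Mar/2004:10:00:01 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Googlebot/2.1"
192.0.2.5 - - [10/Mar/2004:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"
66.249.66.1 - - [10/Mar/2004:10:01:12 +0000] "GET /en/ HTTP/1.0" 200 980 "-" "Googlebot/2.1"
EOF

# List each user agent that requested robots.txt, with a request count.
grep '"GET /robots.txt' access.log | awk -F'"' '{print $6}' | sort | uniq -c
```

Comparing these user agents against the robots named in your robots.txt file shows you which spiders are respecting your rules and which paid-for listing services are actually crawling.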

Just as Robot Number 5 gathered more and more input and transformed this
data into useful information, so web site owners can use the data generated
by the interaction between robots.txt, visiting spiders and their web logs
to gain significant competitive advantage.

About the Author

Ken Garner is the CEO of Atlanta-based Analyst Software Inc. Visit their web
site (www.analystsoftware.com) for links to outstanding Internet tools,
products and services designed to enhance the immediate growth of your
on-line enterprise.