CHAPTER 5
Website design tips

Competitions

Adding a competition to your website encourages people to sign up to your newsletter, which lets you capture user data for future marketing campaigns. It also adds a form of sticky content that gives users a reason to return to your website.

New!! Products

Make new products stand out from the crowd and highlight them so that regular customers can find them easily. Think of the TV adverts that always say "NEW & Improved": customers love the word NEW, so draw attention to it. Adding new products keeps your content fresh and gives your users an incentive to return. There is nothing worse than visiting a website on a regular basis only to find that nothing has changed.

Thou shalt always validate thy HTML

The first time I thought about this apparently simple act of validating my HTML, I found it rather amusing. Not any more! Get this: I had a set of sub-pages built and listed very high on three search engines for over six months (a commercial site, no less). I couldn't figure out why people weren't sticking around for more than one click after hitting the site. I finally validated my HTML and found that people using Netscape were getting a blank page because my table tags were non-standard (shiver). After going back and looking at the logs, sure enough, the Netscape users were gone after one click. (Read and heed.)

Did you know a recent university study showed that you lose 10% of your visitors with a "best viewed with" browser button on your home page? It's insulting to those using another type of browser, and they leave your page.

Robots META Tag

The Robots META tag tells a robot whether it is OK to index a page. It is also used to invite a spider to walk down through all your pages. It is growing in importance, and it is useful if you don't have access to your server's root directory to control a robots.txt file. Some search engines, such as Inktomi, now fully obey the Robots META tag. Inktomi will crawl down through a site if the INDEX,FOLLOW syntax is used.

Robots Meta Tag Format

The Robots META tag is placed in the HEAD section of your HTML document. The format is quite simple (case is not significant): a META element whose NAME is ROBOTS and whose CONTENT attribute lists the directives.

Robots Meta Tag Options

At this point, only the four combinations of INDEX or NOINDEX with FOLLOW or NOFOLLOW make sense. The INDEX directive tells the robot it is OK to index the page. The FOLLOW directive tells the robot it is OK to follow the links found on the page. Some search engine articles on the Robots META tag say the predefined defaults are INDEX and FOLLOW; that is not true of Inktomi, whose default is INDEX,NOFOLLOW. There are also two global directives that specify both actions: ALL is equivalent to INDEX,FOLLOW, and NONE is equivalent to NOINDEX,NOFOLLOW.

Robots Meta Tag Examples:

    <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">

Robots.txt Tutorial

Search engines will look in the root of your domain for a special file named "robots.txt" (http://www.mydomain.com/robots.txt). The file tells the robot (spider) which files it may spider (download). This system is called the Robots Exclusion Standard.

The format of the robots.txt file is special. It consists of records. Each record consists of two parts: a User-agent line and one or more Disallow: lines. Each line is made up of a field name, a colon, and a value.
The robots.txt file should be created with UNIX line endings! Most good text editors have a UNIX mode, or your FTP client *should* do the conversion for you. Do not attempt to create a robots.txt file with an HTML editor that does not specifically have a text mode.
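
As a minimal sketch of the line-ending advice above, the following Python snippet writes a small robots.txt with UNIX (LF) line endings whatever platform it runs on. The helper name `write_robots_txt`, the record contents, and the temporary output path are illustrative assumptions, not part of the standard.

```python
import os
import tempfile


def write_robots_txt(path, records):
    """Write records to path using UNIX (LF) line endings.

    records is a list of (user_agent, [disallow_paths]) pairs.
    """
    lines = []
    for user_agent, disallows in records:
        lines.append(f"User-agent: {user_agent}")
        # An empty list becomes a single blank Disallow line,
        # which means nothing is disallowed for this agent.
        for rule in disallows or [""]:
            lines.append(f"Disallow: {rule}".rstrip())
        lines.append("")  # blank line separates records
    # newline="\n" stops Python translating "\n" to "\r\n" on Windows.
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write("\n".join(lines))


# Hypothetical example: keep every robot out of /cgi-bin/.
demo_path = os.path.join(tempfile.mkdtemp(), "robots.txt")
write_robots_txt(demo_path, [("*", ["/cgi-bin/"])])

# Read the file back as bytes to confirm no carriage returns were written.
with open(demo_path, "rb") as f:
    raw = f.read()
print(b"\r" in raw)
```

If you upload the file by FTP, binary mode should leave these line endings untouched.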
User-agent

The User-agent line names the robot that the record applies to. For example:

    User-agent: googlebot

You may also use the wildcard character "*" to specify all robots:

    User-agent: *

You can find user agent names in your own logs by checking for requests to robots.txt. Most major search engines have short names for their spiders.

Disallow

The second part of a record consists of Disallow: directive lines. These lines specify files and/or directories. Paths are matched from the root of the site, so they should begin with a slash. For example, the following line instructs spiders that they may not download email.htm:

    Disallow: /email.htm

You may also specify directories:

    Disallow: /cgi-bin/

This would block spiders from your cgi-bin directory. If you leave the Disallow line blank, it indicates that ALL files may be retrieved. At least one Disallow line must be present for each User-agent directive for the record to be correct. A completely empty robots.txt file is the same as if it were not present.

White Space & Comments

Comments begin with the "#" character. A comment at the end of a directive line is bad style:

    Disallow: bob #comment

Some spiders will not interpret the above line correctly and will instead attempt to disallow "bob#comment". The moral is to place comments on lines by themselves.

Examples

This one keeps all robots out:

    User-agent: *
    Disallow: /

This one bans Roverdog from all files on the server:

    User-agent: Roverdog
    Disallow: /

This one keeps googlebot from getting at the cheese.htm file:

    User-agent: googlebot
    Disallow: /cheese.htm

For more complex examples, try retrieving some of the robots.txt files from big sites like CNN or Looksmart.

Extensions to the Standard

Although there have been proposed extensions to the standard, such as an Allow line or robot version control, there has been no formal endorsement by the Robots Exclusion Standard working group.
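
The rules described in the Examples section can be sanity-checked before a robots.txt file is uploaded. The sketch below feeds them to Python's standard `urllib.robotparser` module. The host name is the placeholder used earlier in this chapter, and the helper `allowed` and the page names are illustrative assumptions. Real spiders may resolve edge cases differently, so treat this as a rough check, not a guarantee of how any particular robot behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


SITE = "http://www.mydomain.com"  # placeholder host

# Keep all robots out.
keep_all_out = "User-agent: *\nDisallow: /\n"

# Ban only Roverdog from every file on the server.
ban_roverdog = "User-agent: Roverdog\nDisallow: /\n"

# Keep googlebot away from cheese.htm (note the leading slash).
no_cheese = "User-agent: googlebot\nDisallow: /cheese.htm\n"

results = {
    "all out, googlebot": allowed(keep_all_out, "googlebot", SITE + "/index.htm"),
    "roverdog banned": allowed(ban_roverdog, "Roverdog", SITE + "/index.htm"),
    "roverdog rule, googlebot": allowed(ban_roverdog, "googlebot", SITE + "/index.htm"),
    "googlebot, cheese.htm": allowed(no_cheese, "googlebot", SITE + "/cheese.htm"),
    "googlebot, index.htm": allowed(no_cheese, "googlebot", SITE + "/index.htm"),
}

for label, ok in results.items():
    print(f"{label}: {'allowed' if ok else 'blocked'}")
```

Only the Roverdog request, the wildcard request, and googlebot's request for cheese.htm are reported as blocked; the other two are allowed.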
Copyright 2004, Viverdi Ltd. All rights reserved.