Website Crawlability
Getting the search engines to index every page of your site can decide whether your website succeeds or fails. There are several things you need to consider when designing a website.
Navigation Structure
The navigation structure needs planning before a site is designed and goes live on the internet. It is much harder to change once the site is live, because moving pages to different locations can break any back-links that point to them.
Avoid burying pages inside many levels of nested folders.
e.g. /website/optimisation/articles/may/2008/what-is-website-optimisation.html
This example has too many subfolders. It is recommended that you keep pages as close to the root folder as you can, which also makes the page address easier for the user to remember. Deep folder structures are an old technique that SEO companies used to use to add keywords into the address, and we have done extensive testing on this. I would suggest that you go only two folders deep, followed by the page name.
e.g. /optimisation/articles/what-is-website-optimisation.html
As you can see, the drill-down here goes only two folders deep. Google and other search engines like to run through a site quickly and not have to follow deep links to reach pages. Getting your site pages indexed can mean make or break for you.
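If you already have a list of page addresses, a quick check like the one below can flag the ones that sit too deep. This is only a minimal sketch: the function names and the MAX_FOLDER_DEPTH constant are my own invention for illustration, and it assumes the last part of the address is the page name unless the address ends in a slash.

```python
from urllib.parse import urlparse

# Assumed limit, taken from the suggestion above: two folders, then the page name.
MAX_FOLDER_DEPTH = 2


def folder_depth(url):
    """Count the folders between the site root and the page name."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Treat the last segment as the page name unless the path ends in "/".
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def too_deep(urls, limit=MAX_FOLDER_DEPTH):
    """Return the URLs whose pages sit more than `limit` folders deep."""
    return [u for u in urls if folder_depth(u) > limit]


pages = [
    "/website/optimisation/articles/may/2008/what-is-website-optimisation.html",
    "/optimisation/articles/what-is-website-optimisation.html",
]
for page in too_deep(pages):
    print("Too deep:", page)
```

With the two example addresses from this article, only the first one is reported.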
Something related to navigation that should be mentioned here is that users do not want to click several times to find the product or service they are looking for. When mapping your site, make sure it takes no more than 3 clicks to reach any page, product or service, or visitors will just leave your site and look somewhere else. A little off track, I know, but worth mentioning.
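The 3-click rule can be checked on paper before the site is built. The sketch below assumes you have written your planned navigation down as a simple table of "this page links to these pages"; the site_links data and the function names are made up for illustration. It uses a breadth-first search from the home page to find the fewest clicks needed to reach each page.

```python
from collections import deque


def click_depths(links, home="/"):
    """Fewest clicks needed to reach each page, starting from `home`."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_beyond(links, max_clicks=3, home="/"):
    """Pages that take more than `max_clicks` clicks to reach from `home`."""
    depths = click_depths(links, home)
    return sorted(page for page, d in depths.items() if d > max_clicks)


# Hypothetical site plan: each page maps to the pages it links to.
site_links = {
    "/": ["/optimisation/"],
    "/optimisation/": ["/optimisation/articles/"],
    "/optimisation/articles/": ["/optimisation/articles/archive.html"],
    "/optimisation/articles/archive.html": ["/optimisation/articles/old-post.html"],
}
print(pages_beyond(site_links))
```

Pages that are not linked from anywhere never show up in the result at all, so it is worth comparing the output against your full page list as well.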
When designing the pages, make sure the navigation uses text links, as Google can read these the best and the link text itself can carry weight.
Add a sitemap
If every page has a link to a sitemap, then no matter which page the search engine hits first, it will find the map and understand where all the pages are. Another sitemap that should be added is an .xml sitemap. This is really easy to do, as there are several tools that produce these for you. Here is a site I use: xml-sitemaps.com. It does all the work for you: just add the domain name and it will crawl your site and build a sitemap that you can copy and paste or download, whichever you prefer. All you need to do then is add a robots.txt file to the root of the site that tells the search engine where the sitemap is. Here is an example of how this can be done:
User-Agent: *
Allow: /
Sitemap: http://www.yourdomain.com/sitemap.xml
Copy the three lines above into a text file and save it as robots.txt
It is that easy. The file says that you allow all pages to be indexed, and it also says where the sitemap is. This will give you peace of mind that the search engines know about all the pages on your website. In reverse, you can ask for pages not to be indexed if you deem it necessary.
Just add:
Disallow: /foldername/pagename.extension
e.g.
Disallow: /optimisation/pagerank.html
Disallow: /search-engines/google.php
It’s that easy.
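Before uploading, you can test the rules with the robots.txt parser that ships with Python. This is a rough sketch under a few assumptions: the is_crawlable helper is my own name, www.yourdomain.com is a placeholder, and the Disallow lines are listed before Allow: / because Python's parser applies the first rule that matches. Real search engines may resolve overlapping rules differently, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Example rules from this article; Disallow lines come first because
# Python's parser stops at the first rule that matches a path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /optimisation/pagerank.html
Disallow: /search-engines/google.php
Allow: /
Sitemap: http://www.yourdomain.com/sitemap.xml
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The domain is a placeholder; only the path is matched against the rules.
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


for path in ["/optimisation/pagerank.html",
             "/optimisation/articles/what-is-website-optimisation.html"]:
    print(path, "allowed" if is_crawlable(path) else "blocked")
```

A misspelt directive is not recognised by the parser, so a page you meant to block will come back as allowed, which makes this a handy way to catch typos.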
When completed, just upload the file to the root folder within your hosting space and the job is done. For more information please go to robotstxt.org. I hate linking to sites that are full of adverts, but the information on the page is too good not to be passed on.
By following the above rules, your site should get indexed easily by the search engines, giving you time to work on adding more fresh content that you know will get indexed.
Final Note
When adding new pages, you will need to amend the sitemap to include them so the search engines know those pages exist. If you have any questions on website crawlability, drop us a line.
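If you would rather rebuild the .xml sitemap yourself each time you add pages, something like the sketch below would do it. It assumes you keep your own list of page addresses; the build_sitemap name and the yourdomain.com addresses are placeholders, and only the basic loc entry from the sitemaps.org format is written.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return sitemap XML as a string, with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder addresses; replace with your own full page list.
pages = [
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/optimisation/articles/what-is-website-optimisation.html",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Save the output as sitemap.xml (UTF-8) in the root of the site, at the address given on the Sitemap line of your robots.txt.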