Page-based Tracking vs. Log File Analysis
Why use page-based tracking for your website analysis rather than analysing your server's log files?
Page-based tracking, such as that used in InSite, is around 30% more accurate.
Why is it more accurate?
Website hosting companies, your internet service provider, your company's own network and even your own browser will all try to cache (save) copies of the web pages you request.
Why do they do this?
To reduce the amount of traffic on their respective networks. Rather than going out onto the network, your browser can first check whether it already has the page in its local cache on your computer. If it does, it will serve that copy instead.
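The caching behaviour described above can be sketched in a few lines. Everything here is an illustrative stand-in (the `server_log` list, the `request_from_server` helper), not real browser internals:

```python
server_log = []  # stand-in for the web server's log file

def request_from_server(url):
    """Stand-in for a real network request; this is the ONLY point
    at which a log file entry would be written."""
    server_log.append(url)
    return f"<html>content of {url}</html>"

cache = {}  # stand-in for the browser's local cache

def fetch(url):
    """Return a page, preferring the local cache. Only cache misses
    ever generate a request the web server can record."""
    if url not in cache:
        cache[url] = request_from_server(url)  # recorded in the log
    return cache[url]  # cache hits never reach the server

fetch("/home")  # first visit: the request reaches the server
fetch("/home")  # repeat visit: served from cache, invisible to the log
```

After the two calls above, `server_log` holds a single entry even though the page was viewed twice, which is exactly the undercounting that log file analysis suffers from.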
Why does this affect the accuracy of log file analysis?
Log files are primarily a technical resource. They keep a record of every time the web server is asked to do something; their primary purpose is to help server administrators manage the technology.
For this reason, only pages requested DIRECTLY from the web server will be recorded. Any requests satisfied by a cached copy will not be. Therefore, if a visitor is served a cached copy of a page, you will never know.
Why does this not affect page-based tracking?
A page-based tracking system uses a technique (referred to in the advertising world as 'cache-busting') to ensure that even if your page is served from a cache, the system still knows it has been viewed by a visitor.
To put it simply, a page-based tracking system records each and every time one of your web pages is loaded into a visitor's browser, regardless of where that page was actually served from.
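A minimal sketch of the cache-busting idea, assuming a hypothetical tracker endpoint (`tracker.example.com` and the parameter names are illustrative, not InSite's actual interface): appending a value that is unique per page load makes every tracking request URL distinct, so no cache can answer it and every view reaches the tracking server.

```python
import time
import uuid
from urllib.parse import urlencode

def beacon_url(page_path):
    """Build a tracking-beacon URL whose query string is unique per
    page load, so no intermediate cache can serve a stored response."""
    params = {
        "page": page_path,
        # The cache-busting value: a timestamp plus a random token
        # guarantees the URL differs on every load.
        "cb": f"{int(time.time() * 1000)}-{uuid.uuid4().hex}",
    }
    return "https://tracker.example.com/hit?" + urlencode(params)

# Two loads of the same page produce two different URLs, so both
# requests reach the tracking server and both views are counted.
first = beacon_url("/products.html")
second = beacon_url("/products.html")
```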
So what does this all mean?
Web server log analysis will consistently under-report page requests by about 30%. Page-based tracking systems such as Google Analytics will give you a far more accurate picture of what is happening on your website.
Spiders and robots
A large percentage of internet traffic is non-human activity - spiders and robots. This activity distorts your website analysis.
What are spiders and robots?
A spider is a program that visits websites and reads their pages and other information in order to create entries for a search engine index.
A robot is any program that automatically visits websites, for any number of purposes. Spiders are one type of robot.
How do they affect my website analysis?
From a marketing analysis perspective, this activity is nothing more than noise. You want to understand what PEOPLE are doing on your site. If this information is muddied by the activity of spiders and robots then you will not be able to get a clear picture of how your site is being used.
What do I do about it?
You need to ignore as much of this activity as you can. The more you can filter out, the more accurate a picture you will have of what your visitors are really doing.
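As a sketch of that filtering, here is one way robot hits might be excluded from a server log by matching user-agent strings. The log format and the signature list are illustrative assumptions, not a complete solution:

```python
# Substrings that commonly appear in robot user-agent strings.
# This list is illustrative, not exhaustive.
BOT_SIGNATURES = ("bot", "crawler", "spider", "slurp")

def is_robot(user_agent):
    """Return True if the user-agent looks like a known robot."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def filter_human_hits(log_lines):
    """Keep only log lines whose user-agent does not match a robot.
    Assumes the user-agent is the final quoted field, as in the
    common 'combined' log format."""
    human = []
    for line in log_lines:
        user_agent = line.rsplit('"', 2)[-2]  # text inside final quotes
        if not is_robot(user_agent):
            human.append(line)
    return human

log = [
    '1.2.3.4 - - [10/Oct/2023] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '5.6.7.8 - - [10/Oct/2023] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
hits = filter_human_hits(log)  # only the Mozilla line survives
```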
How does page-based tracking help?
Because of the way page-based tracking works - most robots never load the tracking code that a page-based system relies on - around 90% of this non-human activity is never recorded, leaving you with a much clearer picture of what your visitors are doing.