Design for Analysis

If you are interested in web analysis you know you need to measure your online activities. However, just buying log analyzers or plugging some tracking code into your web pages is not enough. There are some additional steps you need to take in the design of your ads and your web pages. If you miss these steps you won’t have enough detail to understand what is happening.

This article explains the basic steps you need to take to ensure you gather data in enough detail to measure and improve. This concept of “Design For Analysis” originated with Netgenesis, but is now a commonly accepted first principle of web analysis.

Trackable URLs

The most important components in any tracking structure are your URLs. When you look at visitor behavior, conversion rates, or target actions, the atomic unit of your analysis will be the URL. Visitors come from URLs, land on URLs, and read URLs. Your target actions are URLs. If you do web analysis, you live and breathe URLs. Your URLs need to be sufficiently clean for analysis and sufficiently granular to provide the detail you need, but not so granular that you lose sight of the patterns in the detail.

Clean URLs

Here are two (genuine) URLs for different product groups from the same site. See if you can spot the difference:



In actual fact, I’ve edited these down to half their length; the originals were much worse. If you read closely you can see that the first is for used books and the second for new books.

These URLs are extremely difficult to read. If you’re presented with a table of landing pages made up of entries like this, it is very hard to make sense of it. While I recognize that some aspects of a page name are determined by your content management system, there is always some room for movement. In the example above “%20” represents a space. Spaces are not allowed in URLs, so they have to be translated into “escape sequences” which represent them by their code numbers (a percent sign followed by the character’s hexadecimal code, and 20 is the code for a space). This means a designer created components in their site called “used books” and “new books.” Since these will always be converted into “used%20books” and “new%20books,” even for visitors to the site, this helped no one. The words should have been merged together as “usedBooks” or “used_books.”

The lesson is – don’t use spaces in your names.
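
As a minimal sketch of the point above, the following JavaScript shows what a space turns into once a name ends up in a URL, and one way to build a name without spaces in the first place. The helper toUrlName is hypothetical and not part of any content management system; the lowercase-with-underscores convention is just one reasonable choice.

```javascript
// toUrlName is a hypothetical helper: it turns a human-readable label
// into a name that needs no escape sequences in a URL.
function toUrlName(label) {
  return label
    .trim()                 // drop leading and trailing whitespace
    .toLowerCase()
    .replace(/\s+/g, "_");  // each run of whitespace becomes one underscore
}

const rawName = "used books";

// What the browser has to send if the space is left in the name:
console.log(encodeURIComponent(rawName)); // used%20books

// What you get if the space is removed when the name is created:
console.log(toUrlName(rawName));          // used_books
console.log(toUrlName("New Books"));      // new_books
```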

Here’s another example:


Any idea what the product we’re examining is? You may argue that someone working on this site will know what the numbers mean, but the numbers here suggest hundreds of categories with hundreds of products in each. In actual fact, this site is an electrical components wholesaler with literally tens of thousands of products online, and no one can remember what the numbers mean. If I want to analyze pages in this site, I need a code book as a reference while I try to look for patterns.

Some content management systems demand numbers, but not all, so use meaningful words instead of numbers wherever you can.

Incoming Links

Much of your analysis involves looking at traffic sources. You often need to understand behavior by source. In addition, if you’re buying visitors, you need to know what you’re getting for that spend.

Tracking incoming traffic requires that the link into your site contain a specific component which you can attribute to that activity. This is especially important with regard to search advertising. If you take nothing else from this article, at least do this. A typical tracking parameter I use involves adding “?src=gaw” to links into client sites, so the link in the ad reads something like:

The ?src=gaw has no functional effect on the website; it’s merely there for tracking purposes.
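
To illustrate, here is a small sketch of adding such a parameter to a landing-page link before you paste it into an ad. The helper name addTrackingSource and the example.com address are assumptions made for illustration; only the “src” parameter and the “gaw” value come from the text above.

```javascript
// addTrackingSource is a hypothetical helper. It adds (or overwrites) the
// "src" query parameter and leaves the rest of the link untouched.
function addTrackingSource(landingUrl, sourceCode) {
  const url = new URL(landingUrl);
  url.searchParams.set("src", sourceCode);
  return url.toString();
}

console.log(addTrackingSource("https://www.example.com/books", "gaw"));
// https://www.example.com/books?src=gaw

console.log(addTrackingSource("https://www.example.com/books?page=2", "gaw"));
// https://www.example.com/books?page=2&src=gaw
```

Using the URL object rather than string concatenation means the link stays valid whether or not it already carries a query string.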

If you are running Google ads, you will be getting traffic from those ads in Google, and in the sites which take syndicated Google ads. If you see visitors coming from another site with ?src=gaw in the request, you know that site is running your Google AdWords ads. Without that parameter you might think the site simply has a link to you.

This also means you can aggregate all the visitors, from any source, who enter with ?src=gaw, so you can get an accurate understanding of how the ad is performing across all sources.

As well as using tracking parameters to aggregate, you can use them to separate. If you are listed in the search engines, and also doing search ads, without a tracking parameter you won’t be able to separate visitors from the listings from visitors from the ads. If someone comes from MSN, was it the native listing or the Overture ad?
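
The aggregation and separation described above can be sketched in a few lines. The landing records below are invented sample data, and countBySource is a hypothetical helper; a real analysis tool would read these values from its own logs or tracking database.

```javascript
// Each record stands in for one recorded landing: the referring site and
// the URL that was requested. The data is invented for illustration.
const landings = [
  { referrer: "https://www.google.com/",      url: "https://www.example.com/books?src=gaw" },
  { referrer: "https://partner.example.net/", url: "https://www.example.com/books?src=gaw" },
  { referrer: "https://www.google.com/",      url: "https://www.example.com/books" },
];

// Count landings by the value of the "src" parameter, whatever the referrer.
function countBySource(entries) {
  const counts = {};
  for (const entry of entries) {
    const src = new URL(entry.url).searchParams.get("src") || "(untagged)";
    counts[src] = (counts[src] || 0) + 1;
  }
  return counts;
}

const counts = countBySource(landings);
console.log(counts.gaw);           // 2: both ad clicks, from Google and from the partner site
console.log(counts["(untagged)"]); // 1: the Google visit that came from the native listing
```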

In preparation for this article I ran a quick test to see how much ad tracking was going on. I ran searches in Google, then examined the ads and landing pages to count how many used tracking URLs. For “buy books online,” 4 out of 8 were tracking their ads. This is a mature and competitive industry, yet some extremely large online merchants were not measuring the effect of their spend. “mortgage online” is one of the most expensive phrases you can bid for. Because the value of a mortgage over its lifetime is so high, mortgage providers can spend extremely large sums to buy visitors. However, only 1 out of the 8 listings had any form of measurement. By now I wanted to find a sector where this was done properly, and we all know where the money is on the web. However, a search for “sex online” revealed that even here only 5 out of 8 sites were tracking their ads.

If you want to do any form of segmentation, I recommend multiple tracking parameters in your ads. Use combinations which represent market segments and sources. For example, we could use “ub” for used books and “nb” for new books with “go” for Google and “ov” for Overture. If someone entered our site with a parameter of “?src=ubgo” we know they came from a Google AdWords ad for used books. Segmentation should go down to the level you wish to respond to. There’s no point separating two sets of visitors from each other if you’re going to funnel them through the same pages in exactly the same way.
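
A combined code like this is easy to take apart again at analysis time. The sketch below assumes the two-letter segment code always comes first and the two-letter source code second, as in the “ubgo” example; the lookup tables and the parseTrackingCode helper are hypothetical.

```javascript
// Lookup tables for the example codes used in the text.
const SEGMENTS = new Map([["ub", "used books"], ["nb", "new books"]]);
const SOURCES = new Map([["go", "Google"], ["ov", "Overture"]]);

// parseTrackingCode is a hypothetical helper. It returns null for any code
// that does not follow the assumed segment-then-source layout.
function parseTrackingCode(code) {
  if (typeof code !== "string" || code.length !== 4) return null;
  const segment = SEGMENTS.get(code.slice(0, 2));
  const source = SOURCES.get(code.slice(2));
  if (segment === undefined || source === undefined) return null;
  return { segment, source };
}

const parsed = parseTrackingCode("ubgo");
console.log(parsed.segment);           // used books
console.log(parsed.source);            // Google
console.log(parseTrackingCode("gaw")); // null
```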

Direct email campaigns and email newsletters should also contain tracking links so that when people click on the links you can identify them. Email doesn’t report itself to the website as a referring source the way websites do, so if you fail to do this you will be unable to separate email-sourced visitors from people who come straight to your site by typing in your URL.

Outgoing Links

If you link to other people’s sites, and you want to track that outgoing traffic, you have two choices. The sophisticated option is to have some code in each outgoing link which records the click. This is great if your system can do this, and if the people creating these links remember to add the correct codes.

The cheaper alternative is to have the outgoing link go to a redirect page on your site which then sends the visitor to the external site. This redirect page view needs to get recorded by your system, so the way it redirects visitors needs to match your recording mechanism. Servers can redirect visitors when a specific page is requested. This will probably work if you are using log analysis, because the request for the redirect page still appears in the log. However, if you are using page-based tracking, a server redirect is never tracked, because the visitor is sent on before any page carrying your tracking code loads. For page-based reporting systems you need a web page which contains something like a JavaScript onLoad handler that sets document.location. Thus the viewing of this page is recorded before the visitor is sent on.
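
As a rough sketch of the page-based variant, the function below generates the HTML for one such redirect page. The helper buildRedirectPage, the function name sendVisitorOn, and the example addresses are all assumptions; trackingSnippet stands for whatever tag your own tracking system asks you to place on each page and is passed through unchanged.

```javascript
// buildRedirectPage is a hypothetical helper that returns the HTML for a
// redirect page: the tracking tag loads with the page, then the onload
// handler sends the visitor to the external site.
function buildRedirectPage(targetUrl, trackingSnippet) {
  // JSON.stringify gives a quoted, escaped JavaScript string literal.
  // Escaping "<" stops a stray "</script>" in the URL from ending the script early.
  const targetLiteral = JSON.stringify(targetUrl).replace(/</g, "\\u003c");
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    "<title>Redirecting</title>",
    trackingSnippet,
    "<script>",
    "function sendVisitorOn() { document.location = " + targetLiteral + "; }",
    "</script>",
    "</head>",
    '<body onload="sendVisitorOn()"></body>',
    "</html>",
  ].join("\n");
}

const redirectPage = buildRedirectPage(
  "https://partner.example.org/",
  '<script src="/js/tracker.js"></script>' // placeholder tracking tag
);
console.log(redirectPage);
```

Whether the page view is reliably recorded before the visitor leaves depends on how your tracking tag reports it, so test this against your own reports before relying on it.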

Record Non-page Activity

Not all of the important activity in your site involves reading web pages. Often a critical target action is sending an email or downloading something. In many sites this behavior is never recorded, which can make determining conversion rates and other KPIs impossible.

Recording Email

Clicking an email link on your pages does not result in a page view or a record in your log files, but you can still record it, even on a system which doesn’t record every mouse click. As with outgoing traffic, you can have the email link call a page which redirects to email. The JavaScript command would be something like document.location="…'Mail_Enquiry'"; where the assigned value is a mailto: link.

As with outgoing traffic, it is this redirection page which gets recorded.
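
A sketch of how the redirect command for such a page might be assembled follows. The address enquiries@example.com is a placeholder, the helpers are hypothetical, and passing “Mail_Enquiry” as the message subject is an assumption about how that label would be used.

```javascript
// buildMailtoUrl is a hypothetical helper. The subject is percent-encoded
// so that spaces and punctuation survive inside the link.
function buildMailtoUrl(address, subject) {
  return "mailto:" + address + "?subject=" + encodeURIComponent(subject);
}

// Returns the line of JavaScript to place in the redirect page.
function buildEmailRedirectScript(address, subject) {
  return "document.location = " + JSON.stringify(buildMailtoUrl(address, subject)) + ";";
}

console.log(buildEmailRedirectScript("enquiries@example.com", "Mail_Enquiry"));
// document.location = "mailto:enquiries@example.com?subject=Mail_Enquiry";
```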

Record Downloads

Many sites offer software, PDF files, Word documents, Excel spreadsheets, and other material for downloading. If you are using log analysis all this activity will be recorded, but not if you’re using page-based tracking. These downloads can be tracked using the redirection technique. In other words, rather than link directly to the material, link to a page which then redirects to the material to be downloaded.
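
One way to avoid writing a separate redirect page for every file is to use a single page that takes the file as a parameter. The sketch below assumes a page called /download.html and a parameter called “file”; both names and both helpers are hypothetical.

```javascript
// Builds the link to put on your pages instead of a direct link to the file.
function trackedDownloadHref(filePath) {
  return "/download.html?file=" + encodeURIComponent(filePath);
}

// Meant to be called on the redirect page with document.location.search.
// Returns the file to send the visitor to, or null if the parameter is
// missing or does not point at a path on your own site.
function resolveDownloadTarget(search) {
  const file = new URLSearchParams(search).get("file");
  if (!file || !file.startsWith("/")) return null;
  if (file.startsWith("//") || file.startsWith("/\\")) return null; // would leave the site
  return file;
}

console.log(trackedDownloadHref("/files/catalog.pdf"));
// /download.html?file=%2Ffiles%2Fcatalog.pdf

console.log(resolveDownloadTarget("?file=%2Ffiles%2Fcatalog.pdf"));
// /files/catalog.pdf
```

The check in resolveDownloadTarget is there so the redirect page cannot be used to bounce visitors to someone else’s site; it is a minimal safeguard, not a complete one.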