Statistics

  Jun 15, 2004

Finding a decent host proved to be far more work than I first anticipated. I thought I was done two weeks ago when, after rigorous research, I signed up with UnitedHosting. I was not.

It's not that UnitedHosting aren't a serious business; they very much are, and they're really professional. However, it seems they've taken on more customers than they have the time or capacity to manage. Besides that, the connection speed to the server varied quite a bit, from very fast to very slow.

So I did them and myself a favor and moved to a Swedish hosting provider; equally serious, not nearly as big, but one that has time and capacity to handle their customers (knock on wood). So far they've been quite flexible and very helpful. They even tried their best to get my installation of AwStats running properly, even though that's somewhat outside the scope of their services.

I almost gave up on AwStats and tried long and hard to find an alternative free statistics package, but there was none; none that I found to be as good as AwStats, at least. In fact, I did not get AwStats running in CGI mode at my new host, and instead settled for having it generate static pages. It doesn't really matter.

As I was poking around with AwStats though, changing settings back and forth, I realized that my website statistics have become more and more ambiguous as more and more people subscribe to the XML feeds. Crawlers and aggregators are pounding on my feeds, making it harder and harder to read anything out of the statistics.

Before I started syndicating content, it was my opinion that hits were insignificant; in terms of numbers, what mattered was page downloads, visits, and unique visitors. But what is a hit on an XML feed? Is it just a hit, or is it a page download? It's a page download, because it's content being downloaded, not graphics or CSS that, roughly speaking, only serve to make the content prettier. But it's also just a hit, because a request for the XML feed doesn't mean anyone read it; perhaps it was only checked by the aggregator software.

The first and absolutely necessary step is to differentiate between a web browser and a crawler. By default, AwStats considers NetNewsWire to be a browser, not a robot. I don't agree with that definition. It might be semantically correct, but it takes meaning away from my statistics. So I removed NetNewsWire from the list of browsers (browsers.pm in the lib directory), and instead added it, and all other aggregators that showed up in the list of "unknown" user agents, to robots.pm, also in the lib directory.

AwStats handles robots differently than it handles web browsers. For instance, a visit by a robot does not count as a visit, only as a hit. I'm not sure whether AwStats considers it a page download, but whether it does or not, this is still not enough to make sure that aggregators aren't polluting the statistics with their regular and oftentimes fanatical polling of feeds, because the list of aggregators in robots.pm is not, and never will be, complete or all-inclusive.
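The browser-versus-robot split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how AwStats does it internally: the pattern list and the name `classify_user_agent` are made up for the example, and any aggregator other than NetNewsWire in the list is just a placeholder.

```python
import re

# Hypothetical and deliberately incomplete list of aggregator
# user-agent fragments. Only NetNewsWire comes from the post itself.
AGGREGATOR_PATTERNS = ["netnewswire", "bloglines", "feeddemon"]

_AGGREGATOR_RE = re.compile(
    "|".join(re.escape(p) for p in AGGREGATOR_PATTERNS),
    re.IGNORECASE,
)


def classify_user_agent(user_agent):
    """Return "robot" for a known aggregator, "browser" for anything else."""
    if _AGGREGATOR_RE.search(user_agent):
        return "robot"
    return "browser"
```

Note that an aggregator missing from the list is still counted as a browser, which is exactly why this step alone isn't enough.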

So what I had to do was to isolate the requests of the XML feeds from the page downloads, because otherwise the page downloads statistics just didn't make any sense at all. I did that by changing what AwStats considers to be a "page", by adding RDF, XML and RSS files to the "NotPageList" list:

NotPageList="css js class gif jpg jpeg png bmp ico xml rdf rss"

Great, now my page statistics aren't polluted by aggregators and crawlers pounding on my feeds every hour, or minute. My page statistics will show only page downloads, not downloads of XML feeds.
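To make the effect of that setting concrete, here is a rough Python approximation of the page/not-page decision. It assumes the decision is made purely on the file extension of the requested path; AwStats' actual matching may differ in its details, and `is_page` is a name invented for this sketch.

```python
import posixpath

# Same extensions as in the NotPageList line above.
NOT_PAGE_EXTENSIONS = set(
    "css js class gif jpg jpeg png bmp ico xml rdf rss".split()
)


def is_page(url_path):
    """Return True if a request for url_path should count as a page download."""
    path = url_path.split("?", 1)[0]  # ignore any query string
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension not in NOT_PAGE_EXTENSIONS
```

With this rule, a request for `/index.html` or `/` counts as a page, while `/index.xml` is only a hit.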

Super, but that doesn't bring any clarity to how often my feeds are downloaded, nor to how many people download them. At this point, I'm glad I stuck with AwStats instead of finding an alternative log analyzer, because AwStats has this neat feature called "Extra Sections", or "Marketing Sections", which allows you to make an additional customized chart of traffic regarding a specific page, user agent, host or referrer.

Having added the following lines to the configuration file, AwStats presents me with a chart of any (existing) feed being requested, how many times each was downloaded, and how much data was downloaded:

# Report of requests of xml/rdf/rss feeds
ExtraSectionName1="Feed Requests"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,(\.xml)$|URL,(\.rss)$|URL,(\.rdf)$"
ExtraSectionFirstColumnTitle1="Feed"
ExtraSectionFirstColumnValues1="URL,(.+)"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=HBL
ExtraSectionAddAverageRow1=1
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=20
MinHitExtra1=1

In this particular case, I chose to include requests which were answered with HTTP Status 304, "Not Modified", but it makes just as much sense to only include those responded to with HTTP Status 200, "OK", which should roughly equate to the number of times a person has read your content using his/her aggregator.
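The difference between the two choices can be checked by hand against a raw access log. The Python below is a minimal sketch that assumes Apache's common or combined log format; `count_feed_requests` and both regular expressions are written for this example and are not part of AwStats.

```python
import re
from collections import Counter

# Assumes lines like: host - - [date] "GET /index.xml HTTP/1.1" 200 5120 ...
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')
# Same file extensions as the ExtraSectionCondition1 line above.
FEED_RE = re.compile(r"\.(xml|rss|rdf)$")


def count_feed_requests(log_lines, statuses=("200", "304")):
    """Count requests per feed URL, keeping only the given status codes."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that don't look like a request
        url, status = match.groups()
        url = url.split("?", 1)[0]  # ignore any query string
        if status in statuses and FEED_RE.search(url):
            counts[url] += 1
    return counts
```

Calling it with the default `statuses` corresponds to the "200 304" filter, while `statuses=("200",)` counts only the requests where the feed was actually sent.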

I've chosen to add charts for feed requests (the one above), feed downloads (the one above, except only for 200 responses), as well as top aggregators by host (i.e. crawlers), and top aggregators by user agent.

Oh, and I thought I'd hook up some of my regulars with Gmail. Sorry, I'm all out now. We'll do this again if/when Google hands me more invitations.

Update: If you're interested in getting a Gmail invitation, read the comments on this post; my pal cyberhill has a few left.


Tables

  Jun 09, 2004

A few days ago, I was given the task of developing a very basic company presentation, a two-day project, from graphic design, through HTML/CSS implementation, to final touches and finish. Figuring I probably didn't have time to develop it using XHTML and CSS, I had to resort to table-based design...


OS X

  Jun 05, 2004

While regular use of Mac OS X has made a favorable impression on me, I wouldn't say it's "superior" to Windows XP. I would say that using Mac OS X is a more laid-back kind of experience though. As they say, it's the "little things" that make the experience such an appreciated one...


New Host

  Jun 03, 2004

So I signed up with DreamHost, only to leave them two days later. Support tickets usually took a full day to be answered, and the connection to the server was ridiculous, rivalling that of $1 hosts...
