It’s really common to add Google Analytics to your site to gain valuable insights about who’s visiting. I think most companies use Google Analytics, and because it’s free and easy, many solo developers and smaller sites do too. But recently, I think a lot of people have been questioning its value. For many small or independent blogs and websites, Google Analytics is an overly complicated product that’s difficult to use and navigate. Moreover, using Google Analytics requires loading their relatively large, bloated JavaScript on your site. When Google Analytics competitor Plausible was starting out a few years ago, they highlighted a bunch of other reasons you might not want to use Google Analytics. In short, there’s gotta be a better analytics tool for small websites, right?
This blog is a small website, and I used to use Google Analytics. (I don’t anymore!) I removed Google Analytics from this blog back in 2021. I agree with most of the criticisms in the article linked above: I don’t want to force my visitors to be tracked, I don’t want a bloated script potentially slowing down my site, and Google Analytics is way overkill for what I need. All I really care about is occasionally checking how many people are visiting my site, seeing which pages they look at, and watching for errors. Everything else is a bonus, as long as it’s easy to use and understand. For a year or two, I operated without any analytics at all. I used Google Search Console to gain a little insight into which pages people were visiting. Search Console is actually great, and I’m still using it today. It provides a simple view of how people find your site from Google, which is a good approximation of which pages are popular anyway. But Search Console was never a true representation of all my traffic (because some people don’t come to my site from Google), so it isn’t a perfect replacement for Google Analytics.
I wanted to find a better tool for analyzing traffic to my website, so I started looking for alternatives. Plausible seemed really appealing, and I liked a lot of their selling points. Simple analytics that are easy to understand. Lightweight script. No cookies. Open source. Unfortunately, I didn’t like their price tag. As of December 2023, their cheapest plan is $9/mo. While that’s probably reasonable for most customers, it just doesn’t make sense for me. This blog is a statically generated site that I operate for dollars per year. And while I could afford to pay $9/mo, I just don’t want to. I don’t think I’d get $108 of annual value from analytics on a personal blog that doesn’t make any money. On top of that, I’ve come to enjoy operating this blog really frugally. I take some pride in keeping things simple and cheap, and in knowing that you can run a website really well on a very low budget using open source software.
So, if not Plausible Analytics, where should I turn? I was intrigued by the idea of server-side analytics. I shouldn’t need clients to run JavaScript to know who’s visiting my website. I should be able to tell who’s connecting to my server and what page they’re requesting! Even though my blog is a static site, my hosting provider uses Apache, and they offer access to the logs. That should be all I need for basic analytics, right?
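To make that idea concrete, here’s a minimal Ruby sketch of pulling visitors and pages out of an access log. It assumes the log is in Apache’s "combined" format; the regex, the helper names (parse_line, page_counts), and the sample lines are all made up for illustration and are not taken from my real logs or from any tool.

```ruby
# Minimal sketch: parse Apache "combined" log lines and count successful
# requests per path. The regex and sample data are illustrative assumptions.
COMBINED_LOG = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]*" (?<status>\d{3}) (?<bytes>\S+)(?: "(?<referrer>[^"]*)" "(?<agent>[^"]*)")?/

# Returns a hash of the interesting fields, or nil if the line doesn't match.
def parse_line(line)
  m = COMBINED_LOG.match(line)
  return nil unless m

  { host: m[:host], path: m[:path], status: m[:status].to_i, referrer: m[:referrer] }
end

# Counts 200 responses per requested path, skipping unparseable lines.
def page_counts(lines)
  lines.filter_map { |line| parse_line(line) }
       .select { |request| request[:status] == 200 }
       .map { |request| request[:path] }
       .tally
end

# Hypothetical sample lines (documentation-range IP addresses).
SAMPLE = [
  '203.0.113.5 - - [10/Dec/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "https://www.google.com/" "Mozilla/5.0"',
  '203.0.113.9 - - [10/Dec/2023:13:56:02 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '198.51.100.7 - - [10/Dec/2023:13:57:41 +0000] "GET /missing HTTP/1.1" 404 196 "-" "curl/8.0"'
].freeze

p page_counts(SAMPLE)
```

A real log analyzer does far more than this (bot filtering, unique visitors, time ranges), which is exactly why I’d rather use an existing tool than maintain a script like this myself.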
AWStats is a great tool! Despite the first three letters, AWStats has nothing to do with AWS. In fact, it was created in May 2000, six years before AWS, at a time when PHP, ASP, Perl, ColdFusion, and JSP were some of the most popular server-side languages. AWStats stands for Advanced Web Statistics. It’s a log analyzer for Apache (and nginx and other) log files. Whether you’re serving static HTML through Apache or using something like mod_php, AWStats will parse your server log files and provide info and statistics about your web requests and visitors. It’s designed to be run on your web server and hosted by Apache, sort of like an admin panel for your website. It has a separate process that can run on a cron schedule to ingest new log files, and it uses Apache to serve the reports as web pages. I wondered if I could use AWStats to analyze the Apache log files for my static site!
Normally, AWStats expects to be run on a web server. It expects to save its configuration in /etc/awstats/awstats.mysite.conf and it expects to create a database in /var/lib/awstats. I could run AWStats on my hosting provider, but I’d prefer not to. For one, I’m currently only hosting static pages, and don’t want to set up (or pay for) AWStats to run CGI Perl scripts. Also, because I pay (pennies) for (megabytes of) server storage, and in the spirit of doing things frugally, I don’t want to leave all my logs on the server indefinitely. And finally, I’m the only one that needs to see the reports. I don’t want them available on the internet, and don’t want to deal with locking down my configuration to prevent public access. As an alternative to all this, I could install AWStats on my laptop. This would be easy since I use Ubuntu – just sudo apt install awstats and I’m done. But I don’t really like that solution either. AWStats expects to be on a web server, and I don’t want to deal with a bunch of special (and tricky to reproduce) web server config on my laptop. Docker to the rescue!
With Docker, I can mount my log files into a (temporary) AWStats container running on my laptop to parse my log files and serve reports. I mount the log files into the container and run the AWStats script to ingest them. I can save state between runs by persisting /var/lib/awstats in a Docker volume. And I can run Apache from the container to view my reports! (AWStats can generate static HTML to be viewed without running Apache, but only for a single month at a time, so I find it much nicer to let Apache run the CGI script and change the time frame as I please.)
I started using this AWStats Docker container to view my server statistics a few weeks ago, and I’m really happy with it! AWStats produces really useful reports, and I have all the info I need about traffic to my site without running any analytics JavaScript on clients! Perhaps more importantly, it includes a section showing statistics about my error status codes and which URLs produced the errors.
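To give a rough idea of what that error section is summarizing, here’s a small Ruby sketch that tallies 4xx and 5xx responses by status code and URL. This is not how AWStats computes it; the regex, the error_counts helper, and the sample lines are hypothetical and only assume an Apache-style access log.

```ruby
# Sketch: tally error responses (status >= 400) by [status, path].
# REQUEST_AND_STATUS and ERROR_SAMPLE are illustrative assumptions.
REQUEST_AND_STATUS = /"\S+ (?<path>\S+) [^"]*" (?<status>\d{3}) /

def error_counts(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    m = REQUEST_AND_STATUS.match(line)
    next unless m && m[:status].to_i >= 400

    counts[[m[:status].to_i, m[:path]]] += 1
  end
end

# Hypothetical sample lines (documentation-range IP addresses).
ERROR_SAMPLE = [
  '198.51.100.7 - - [10/Dec/2023:13:57:41 +0000] "GET /missing HTTP/1.1" 404 196 "-" "curl/8.0"',
  '198.51.100.8 - - [10/Dec/2023:14:01:10 +0000] "GET /missing HTTP/1.1" 404 196 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/Dec/2023:14:02:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/Dec/2023:14:03:00 +0000] "POST /wp-login.php HTTP/1.1" 403 199 "-" "bot"'
].freeze

# Print the most frequent errors first.
error_counts(ERROR_SAMPLE).sort_by { |_, count| -count }.each do |(status, path), count|
  puts "#{status} #{path}: #{count}"
end
```

A list like this is handy for spotting broken links (repeated 404s on a URL that should exist) versus background noise from bots probing for pages that were never there.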
Here’s my current setup. For a long time, I’ve used a Rakefile (a Makefile, but with Ruby) to build and deploy my site. I added a task to my Rakefile to grab the access logs off the server and save them to my laptop (where I keep them and back them up). I also added a task to ingest the logs into the Docker container (with the AWStats database persisted on a Docker volume) and a task to serve the reports. (I’m using the pabra AWStats Docker image, which appears to be the most well-maintained and up-to-date.) Here’s what that looks like. (Most of this is just bash, so it could easily be ported to a Makefile or another script.)
require 'time' # Time#iso8601 comes from the 'time' standard library

desc 'Fetch logs from the server'
task :logs do
  # String#bold is not core Ruby; it assumes a string-styling gem is loaded elsewhere in the Rakefile.
  puts '======== FETCH LOGS ========'.bold
  sh "scp mikekasberg.com:/home/logs/access_log #{__dir__}/logs/access.log"
  size = File.size("#{__dir__}/logs/access.log")
  if size > 10_000_000 # 10MB
    puts 'Size exceeds 10MB. Rotating log file to local only.'
    # Keep the large log locally under a timestamped name, then remove it from the server.
    date_string = Time.now.getutc.iso8601
    FileUtils.mv("#{__dir__}/logs/access.log", "#{__dir__}/logs/access_#{date_string}.log")
    sh "ssh mikekasberg.com 'rm /home/logs/access_log'"
  end
end

desc 'Ingest logs for AWStats Docker'
task :awstats_ingest do
  puts '========== INGEST =========='.bold
  sh 'docker volume create awstats-db-mike-jekyll'
  # ${PWD} is expanded by the shell, not by Ruby. The logs are mounted read-only.
  sh "docker run --rm -e AWSTATS_CONF_LOGFILE='logresolvemerge.pl /logs/*.log |' -e AWSTATS_CONF_LOGFORMAT=1 -e AWSTATS_CONF_SITEDOMAIN=www.mikekasberg.com -e AWSTATS_CONF_HOSTALIASES='localhost 127.0.0.1' -e AWSTATS_CONF_ALLOWFULLYEARVIEW=3 -v ${PWD}/logs/:/logs/:ro -v awstats-db-mike-jekyll:/var/lib/awstats pabra/awstats:7.9 awstats.pl -config=awstats -update"
end

desc 'Run AWStats Interface'
task :awstats => :awstats_ingest do
  puts '========== AWStats =========='.bold
  sh 'xdg-open "http://localhost:8080/awstats.pl"'
  # exec replaces the rake process with the container, which serves reports until stopped.
  exec("docker run --rm -it -p 8080:80 -e AWSTATS_CONF_SITEDOMAIN=www.mikekasberg.com -e AWSTATS_CONF_HOSTALIASES='localhost 127.0.0.1' -e AWSTATS_CONF_ALLOWFULLYEARVIEW=3 -v awstats-db-mike-jekyll:/var/lib/awstats pabra/awstats:7.9")
end
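The ingest and serve commands above repeat several of the same -e settings. One possible cleanup, sketched here as an assumption rather than something in my actual Rakefile, is to keep the shared settings in a hash and build the flags with a small helper. AWSTATS_ENV and docker_env_flags are hypothetical names; the environment variable names and values are the ones used in the tasks above.

```ruby
require 'shellwords'

# Hypothetical constant: settings shared by the ingest and serve containers.
AWSTATS_ENV = {
  'AWSTATS_CONF_SITEDOMAIN' => 'www.mikekasberg.com',
  'AWSTATS_CONF_HOSTALIASES' => 'localhost 127.0.0.1',
  'AWSTATS_CONF_ALLOWFULLYEARVIEW' => '3'
}.freeze

# Hypothetical helper: turns a hash into shell-safe "-e KEY=VALUE" flags.
# Shellwords.escape protects values containing spaces or shell metacharacters.
def docker_env_flags(env)
  env.map { |key, value| "-e #{Shellwords.escape("#{key}=#{value}")}" }.join(' ')
end

puts docker_env_flags(AWSTATS_ENV)
```

Each task could then interpolate docker_env_flags(AWSTATS_ENV) into its docker run string, with the ingest task merging in its extra AWSTATS_CONF_LOGFILE and AWSTATS_CONF_LOGFORMAT entries first.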
Grabbing logs from my server is as easy as running rake logs, and I do that at least every couple of months. And viewing my website statistics is as easy as running rake awstats. (The awstats task depends on the ingestion task, so it will ingest any new logs first.) I even added a line that will automatically open the reports in my web browser so I don’t need to remember the URL! (The xdg-open command works on Linux; on macOS you could use the open command instead.)
What can you do with AWStats? It does the basics well (visitors and page views). Beyond that, it’s great for looking at the most common error URLs, and it’s also good for looking at referrers.
AWStats is really old technology, but it’s really good at what it does and it’s reliable! There are lots of potential problems with JavaScript analytics, but perhaps the most important is this: users don’t like it, and many try to block it. I think server-side analytics is a great (obvious?) alternative to client-side analytics that’s less intrusive, and could be a great fit for small websites and blogs. Running AWStats in a temporary Docker container is a great way to get web statistics from server logs!
👋 Hi, I'm Mike! I'm a husband, I'm a father, and I'm a senior software engineer at Strava. I use Ubuntu Linux daily at work and at home. And I enjoy writing about Linux, open source, programming, 3D printing, tech, and other random topics.