on tracking users

Audrey Watters has opted out of tracking people on her websites. It’s a good read. I agree 100%.

I’ve felt creeped out by the pervasive tracking networks online – analytics, ad networks, cookies, super-cookies, browser fingerprinting, etc. This surveillance ecosystem is the end result of an arms race to find out about the people reading web pages. There are a few reasons for it, but my gut says it boils down to 2:

  1. Monetization
  2. Ego stroking

Monetization – selling ads, in whatever flavour, requires metrics. How many people view a page? How many see an ad? How many click on it? How many then buy a product? How many return to the site? etc… So much data. I don’t sell anything, I don’t have ads[1], and I don’t sell product placement guest posts (never have, never will). So, this isn’t a reason to track people on my site.

Ego stroking – this is actually a good reason to track readers. Is it worth the privacy violation? I don’t think it’s worth feeding Google’s surveillance machine, so I used a self-hosted copy of Piwik for a year. But, after reading Audrey’s post and thinking it over, I’ve stopped that, too[2]. My ego is just fine. I don’t need to be propped up. I do this for me. If people read it, hey, that’s great. If nobody (else) reads it, hey, that’s OK too. So, this isn’t a reason to track people on my site.

I just ran the excellent Ghostery plugin for Firefox to report web trackers. My site reports clean, with two exceptions: Gravatar and Google Analytics. Gravatar is from the WordPress comment system. It’s innocuous. Wait. Google Analytics? I don’t even USE Google Analytics! It’s there because I embedded a YouTube or Vimeo video, which brings along all kinds of snooping trackers as part of the deal. Awesome. Once that post falls off the front page, Google Analytics will drop off (except on the pages that display that post, and others with embedded media from YouTube etc.).

For work projects, though, it’s not as simple. A variation of Reason #1 applies. It isn’t monetization in the pure sense, but I need to be able to answer the questions “Does anyone use the website or resource? Is it worth supporting it?” There are 2 ways to answer that. The first is with web analytics. The second is with testimonials from users. I need to be able to provide both.

But, I won’t feed Google’s surveillance machine in order to meet my needs. So, I host a copy of Piwik on campus, and use it to track aggregated and anonymized web analytics. I’ve set it to not store full IP addresses – I have no idea which on-campus computers are accessing our stuff. I have no idea about any individual users. But, I can show traffic patterns and spikes, and that is important information when we’re planning support – I know exactly when spikes occur during semesters, and I know exactly when we need to develop support resources and have them online and available before the traffic spikes.
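For the curious, here is a rough sketch of what "not storing full IP addresses" can mean in practice. This is not Piwik's code, just an illustration of the general idea of zeroing the low-order bits of an address before it gets written anywhere. The function name `anonymize_ip` and the number of bits kept (16 for IPv4, 48 for IPv6) are my own assumptions for the example.

```python
import ipaddress


def anonymize_ip(address: str, keep_v4_bits: int = 16, keep_v6_bits: int = 48) -> str:
    """Zero out the low-order bits of an IP address before it is stored.

    The default bit counts are assumptions chosen for illustration only.
    """
    ip = ipaddress.ip_address(address)
    keep = keep_v4_bits if ip.version == 4 else keep_v6_bits
    # strict=False accepts an address with host bits set and returns the
    # enclosing network; its base address has those host bits zeroed.
    network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses from the documentation-only ranges, not real visitors.
    print(anonymize_ip("192.0.2.57"))
    print(anonymize_ip("2001:db8:1234:5678::1"))
```

With only the leading part of the address kept, many computers collapse into the same stored value, so the logs can still show traffic patterns and spikes without pointing at any one machine.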

  1. I dabbled with Google Adsense on my blog several years ago – made some crazy cash and bought an expensive lens for my camera – then had an epiphany that it was creepy and not enough money to justify selling my soul – so I nuked all ads long ago.
  2. I nuked it from my blog last night, but it will take some time for me to find and delete the tracking code in static pages and various project stuff.

UGuelph and D2L sitting in a tree

News of a new collaboration between UGuelph and D2L, on a major pedagogy research initiative:

The pedagogy research project strives to help schools track and report on learning outcomes across programs over time. Researchers will use D2L’s predictive analytics capabilities to document and discover the effectiveness of assessment tools on specific subjects while working with educators to develop a curriculum that results in greater student success.

via University of Guelph to Leverage Desire2Learn’s Integrated Learning Platform for $6 Million Pedagogy Research Initiative | Desire2Learn Press Release.

2 quick thoughts[1] on this:

  1. awesome! D2L really does play well with others, and invests in improving teaching and learning rather than just polishing shiny baubles.
  2. surely there is more to this than just predictive analytics. I’d love to see a pedagogical collaboration that was about in-the-trenches teaching (and learning) online, and not just massaging the data gathered about online activities. D2L has been trying to foster an online community of teachers (and others) in their D2L Community site[2][3]. It would be really cool to push that community up a few notches and open the doors so anyone can follow along (or join in).

Desire2Learn really feels like they care about teaching and learning – the Fusion conference last year was different from any other vendor conference I’ve been to, and felt decidedly like a good teaching-and-learning conference rather than a buy-our-shiny-products vendor conference.

  1. my own thoughts, not the official position of the university or anything
  2. which is actually running in the D2L LMS itself
  3. but it requires a login to see the stuff that goes on inside it

goaccess live webserver stats on hippie hosting

I just installed the GoAccess apache log processing application on the Hippie Hosting Co-op server, giving users a way to watch the stats for their sites in realtime, without having to rely on privacy-invading analytics bugging software. This software works on the command line, so just SSH into your account and type:

goaccess -f statistics/logs/access_log

That tells goaccess to load with the logfile at the specified location. You can feed it other logfiles, but the default one for a Hippie Hosting account should be at statistics/logs/access_log.

It will prompt you for the type of log file. Select NCSA Combined (arrow-down, hit enter to select, then F10 to continue. yeah. intuitive software…)
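If you're wondering what the NCSA Combined format actually holds, here is a small Python sketch that pulls apart lines in that format and counts hits per path. It is not how goaccess works internally, just an illustration of the fields in each line. The regular expression is a simplified assumption (it skips lines with escaped quotes in the request), and the sample lines are made up, using documentation-only IP addresses.

```python
import re
from collections import Counter

# Simplified pattern for the NCSA Combined Log Format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_paths(lines):
    """Count hits per requested path, skipping lines that do not parse."""
    hits = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue
        # The request field looks like: GET /index.html HTTP/1.1
        parts = match.group("request").split()
        if len(parts) >= 2:
            hits[parts[1]] += 1
    return hits


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.9 - - [04/Aug/2010:11:22:37 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
    '198.51.100.4 - - [04/Aug/2010:11:23:01 -0600] "GET /about.html HTTP/1.1" 200 2048 "http://example.org/" "ExampleBrowser/1.0"',
    '203.0.113.9 - - [04/Aug/2010:11:24:12 -0600] "GET /index.html HTTP/1.1" 304 - "-" "ExampleBrowser/1.0"',
    'this line is not in the combined format',
]

if __name__ == "__main__":
    for path, count in count_paths(SAMPLE).most_common():
        print(count, path)
```

To try it on a real file, something like `count_paths(open("statistics/logs/access_log"))` should work, since a file object yields its lines one at a time.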

After that, goaccess will show the stats for your site, updating live.

collusion – tracking and mapping links between websites

I’ve been pretty mindful about avoiding trackers on my site. I don’t use an external web analytics package (I do have the apache logs, crunched by AWStats, but nothing anywhere near the level of a Google Analytics or even WordPress Stats tracking). But, websites connect to other websites. That’s kind of their job. And other websites track stuff. So, even a website that doesn’t directly track people, by using YouTube videos and other hosted media, exposes people’s activity to those who track them.

I saw a post about Collusion – a Firefox add-on that maps links between websites, both the ones you go to directly, and the ones that send media and pull tracking info.

Here’s what about an hour’s worth of activity looks like, after letting Collusion monitor my browsing in Firefox:

The icons that are glowing are sites that I went to directly (all work-related, of course…), and the non-glowing icons are sites that either fed media to the sites I did visit, or who tracked my activity as a third party. Looking at my blog, with no third party tracking explicitly set up, there are still several sites indirectly monitoring activity of people.

That’s kind of creepy. The only way to completely avoid this is to host everything yourself, and never link to anything else. But that kind of goes against the whole purpose of this online-community-hootenanny thing…
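For a rough idea of which outside sites a page pulls in, without a browser add-on, something like the following Python sketch can list the third-party hosts referenced in a page's HTML. It is not how Collusion works. The class and function names, the sample page, and the `myblog.example` hostname are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceHosts(HTMLParser):
    """Collect hostnames of resources a page asks the browser to load."""

    # Tag/attribute pairs the browser usually fetches on its own. <a href> is
    # left out on purpose: a plain link is only fetched when someone clicks it.
    LOADED = {
        ("script", "src"),
        ("img", "src"),
        ("iframe", "src"),
        ("embed", "src"),
        ("link", "href"),
    }

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.LOADED and value:
                # Relative URLs have no hostname and are skipped.
                host = urlparse(value).hostname
                if host:
                    self.hosts.add(host)


def third_party_hosts(html, own_host):
    """Return the sorted hosts that are neither own_host nor its subdomains."""
    parser = ResourceHosts()
    parser.feed(html)
    return sorted(
        h for h in parser.hosts
        if h != own_host and not h.endswith("." + own_host)
    )


# Made-up page, for illustration only.
PAGE = """
<html><body>
<img src="/images/photo.jpg">
<iframe src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
<img src="https://secure.gravatar.com/avatar/abc">
<a href="https://example.org/elsewhere">a link</a>
</body></html>
"""

if __name__ == "__main__":
    print(third_party_hosts(PAGE, "myblog.example"))
```

This only sees the first layer. An embedded video player loads its own scripts and trackers once it is running in the browser, and those never show up in the page's own HTML.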

still no analytics

It’s been almost 6 months since I killed all active analytics on my blogs. I scrubbed them of Google Analytics and WordPress.com Stats. The only numbers I get now are passive, highly aggregated, and anonymized: webserver logs automatically crunched by Urchin.

I don’t miss the detailed active analytics one bit. I still find out if anyone links to my stuff, through the WordPress Dashboard links widget. But I have no clue about how many people read my stuff, nor how many RSS subscribers there are.

And that’s (still) highly liberating. I can’t let myself play egocentric mind games with numbers. I can’t delude myself into believing this space is Important, or *cringe* **popular** because those things aren’t real, and don’t matter.

So now, it’s still just me. And, maybe, a few others out there somewhere. And I don’t think I could ever go back to the number OCD of active analytics. I’ve let go of meaningless statistics.

WSJ on web trackers

The Wall Street Journal has been on a roll, looking at privacy online. The [latest article looks at the trackers, bugs, beacons, and cookies](http://blogs.wsj.com/wtk/) used by various websites to monitor you (and then share that data). For example, the simple site dictionary.com tracks a fair bit of data about visitors:

***234*** activity trackers. To look up the definition of a word.


(via [information aesthetics](http://infosthetics.com/archives/2010/08/what_they_know_how_websites_expose_visitors_to_monitoring.html))

I’m wondering what it’s going to take before we have some form of regulatory oversight on what is allowed to be collected, by whom, and how (if at all) it is allowed to be shared. We’ve stumbled our way into an extremely invasive and pervasive culture of active and passive monitoring of everything we do online. And since we do much of our communication and other activities online, it affects a significant portion of our lives. But we don’t seem to know or care…

If any government agency had proposed building a system capable of monitoring this much information about citizens, there would have been an uproar.

Oh. Wait. No, there wouldn’t. But people would have **totally** changed their Twitter avatars or something…

over one million served

I just cracked open the Google Analytics stats for my blog, and was curious to see how much data was available. I had it display all data (going back as far as November 16, 2005, which is apparently when I first started using Analytics). Google has tracked over 1 million page views on my blog. Over 600,000 unique visitors. The scale of that just blows my mind.