on tracking users

Audrey Watters has opted out of tracking people on her websites. It’s a good read. I agree 100%.

I’ve felt creeped out by the pervasive tracking networks online – analytics, ad networks, cookies, super-cookies, browser fingerprinting, etc. This surveillance ecosystem is the end result of an arms race to find out about people reading web pages online. There are a few reasons, but my gut says it boils down to 2:

  1. Monetization
  2. Ego stroking

Monetization – selling ads, in whatever flavour, requires metrics. How many people view a page? How many see an ad? How many click on it? How many then buy a product? How many return to the site? And so on. So much data. I don’t sell anything, I don’t have ads [1], and I don’t sell product placement guest posts (never have, never will). So, this isn’t a reason to track people on my site.

Ego stroking – this is actually a good reason to track readers. Is it worth the privacy violation? I don’t think it’s worth feeding Google’s surveillance machine, so I used a self-hosted copy of Piwik for a year. But, after reading Audrey’s post and thinking it over, I’ve stopped that, too [2]. My ego is just fine. I don’t need to be propped up. I do this for me. If people read it, hey, that’s great. If nobody (else) reads it, hey, that’s OK too. So, this isn’t a reason to track people on my site.

I just used the excellent Ghostery plugin for Firefox to report web trackers. My site reports clean, with two exceptions: Gravatar and Google Analytics. Gravatar is from the WordPress comment system. It’s innocuous. Wait. Google Analytics? I don’t even USE Google Analytics! It’s there because I embedded a YouTube or Vimeo video, which brings along all kinds of snooping trackers as part of the deal. Awesome. Once that post falls off the front page, Google Analytics will drop off (except on the pages that display that post and others with embedded media from YouTube, etc.).
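For a rough idea of how embedded media shows up in a page, here is a minimal Python sketch that lists the third-party hosts referenced by `src` attributes in a chunk of HTML. It is not how Ghostery works. The function name, the tag list, and the sample markup (including the video ID and the `example.org` host) are all made up for illustration. It also only sees what is written in the page itself. Anything the embedded player loads afterwards, which is presumably where the analytics tracker comes from, would not show up here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ThirdPartyHostParser(HTMLParser):
    """Collects hosts from src attributes that differ from the site's own host."""

    # Tags that commonly pull in external resources (an illustrative choice).
    EMBED_TAGS = {"script", "iframe", "img", "embed"}

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.EMBED_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value:
                # Relative URLs have no hostname and are treated as first-party.
                host = urlparse(value).hostname
                if host and host != self.own_host:
                    self.hosts.add(host)


def third_party_hosts(html, own_host):
    """Return the set of external hosts referenced by the given HTML."""
    parser = ThirdPartyHostParser(own_host)
    parser.feed(html)
    return parser.hosts


# Hypothetical page fragment: one local image, one local script, one embedded video.
sample = """
<img src="/images/photo.jpg">
<script src="https://example.org/js/site.js"></script>
<iframe src="https://www.youtube.com/embed/VIDEO_ID"></iframe>
"""

found = third_party_hosts(sample, "example.org")
print(sorted(found))
```

Only the embedded video's host is reported for this sample, since the image and the script are served from the site itself.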

For work projects, though, it’s not as simple. There are variations of Reason #1 that are needed – although not monetization in the pure sense, I need to be able to answer the question “does anyone use the website or resource? Is it worth supporting it?” – there are 2 ways to answer that. The first is with web analytics. The second is with testimonials from users. I need to be able to provide both.

But, I won’t feed Google’s surveillance machine in order to meet my needs. So, I host a copy of Piwik on campus, and use it to track aggregated and anonymized web analytics. I’ve set it to not store full IP addresses – I have no idea which on-campus computers are accessing our stuff. I have no idea about any individual users. But, I can show traffic patterns and spikes, and that is important information when we’re planning support – I know exactly when spikes occur during semesters, and I know exactly when we need to develop support resources and have them online and available before the traffic spikes.
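The idea behind not storing full IP addresses can be sketched in a few lines of Python. This is not Piwik's code, just an illustration of the general technique: zero out the trailing bytes of an address before it is stored, so that visits can still be counted but can't be tied to a single machine. The function name and the default of two masked bytes are my own assumptions, and the addresses are from the reserved documentation ranges.

```python
import ipaddress


def mask_ip(address, masked_bytes=2):
    """Zero out the last `masked_bytes` bytes of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    keep_bits = ip.max_prefixlen - 8 * masked_bytes
    if masked_bytes < 0 or keep_bits < 0:
        raise ValueError("masked_bytes is out of range for this address")
    # strict=False lets us pass a host address and get back its enclosing network.
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


print(mask_ip("192.0.2.123"))      # two bytes masked
print(mask_ip("192.0.2.123", 1))   # one byte masked
```

With two bytes masked, every machine in a large block collapses into the same stored value, which is the point: traffic patterns and spikes survive, individual computers don't.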

  1. I dabbled with Google Adsense on my blog several years ago – made some crazy cash and bought an expensive lens for my camera – then had an epiphany that it was creepy and not enough money to justify selling my soul – so I nuked all ads long ago.
  2. I nuked it from my blog last night, but it will take some time for me to find and delete the tracking code in static pages and various project stuff.

Pearson and surveillance of students

Pearson is apparently monitoring social media, to detect signs of cheating during exams. That’s insanely creepy, and a horrible violation.

“And for those who think "Well, its Twitter, its public", remember this: So is walking down the street. But is it OK for the government to monitor us with street surveillance cameras and send us fines for not crossing with the crosswalk?”

via Pearson Caught Spying On Students. Big Brother Is Here.

I’m going to go on record with this: I will do everything I can to prevent this kind of surveillance culture at my university. Thankfully, we have a highly student-centric administration, and I can’t imagine something like this taking hold on our campus. But, we have recurring requests from instructors for TurnItIn or similar tools, which I see as the thin edge of the wedge leading to Pearson and Tracx. (Their whitepaper bragging about this stuff is down, and a version of that whitepaper archived in 2014 only mentions staff use of social media.)

With Pearson pushing into the LMS with textbook integrations (and now, SIS integration as well), this stuff makes me really nervous. This takes Creepy Treehouse to a whole new level.

schneier on wiretapping the internet

from [Bruce Schneier](http://www.schneier.com/blog/archives/2010/09/wiretapping_the.html):

> Formerly reserved for totalitarian countries, this wholesale surveillance of citizens has moved into the democratic world as well. Governments like Sweden, Canada and the United Kingdom are debating or passing laws giving their police new powers of internet surveillance, in many cases requiring communications system providers to redesign products and services they sell. More are passing data retention laws, forcing companies to retain customer data in case they might need to be investigated later.

and

> Any surveillance system invites both criminal appropriation and government abuse. Function creep is the most obvious abuse: New police powers, enacted to fight terrorism, are already used in situations of conventional nonterrorist crime. Internet surveillance and control will be no different.

and

> An infrastructure conducive to surveillance and control invites surveillance and control, both by the people you expect and the people you don’t. Any surveillance and control system must itself be secured, and we’re not very good at that. Why does anyone think that only authorized law enforcement will mine collected internet data or eavesdrop on Skype and IM conversations?

and the clincher:

> It’s bad civic hygiene to build technologies that could someday be used to facilitate a police state. No matter what the eavesdroppers say, these systems cost too much and put us all at greater risk.

Building the technology to support pervasive surveillance is harmful. Participating in that form of surveillance, even/especially in exchange for free zombie-super-poking apps, is a shameful waste of liberty.

Google is watching us

Google has been powering almost all search queries for an eternity in internet years. It knows an awful lot about what we all search for. And it keeps pushing into new ways to index data and mine the activity of people.

It started out pretty simple:

* Public content on the web (web pages)
* Search queries
* Websites viewed as a result of search queries

And they kept adding individually trackable data on:

* [Google Groups](http://groups.google.com/) (including the entire known history of Usenet)
    * posted content
    * activity patterns (who responds to whom, etc…)
* [Analytics](http://www.google.com/analytics/)
    * tracking visitors to enabled websites, including data about where they are, what they do on the site, how they found the site, and what they click on
* Advertising tracking
    * every page that serves you ads. and how you got there. and what you did there. and where you went afterward.
* [Adsense](https://www.google.com/adsense/)
    * if you put adsense ads on your site, they have your banking info as well.
* [Doubleclick](http://doubleclick.com/)
* [Google Alerts](http://www.google.com/alerts) – search queries that you care enough to subscribe to.
* [GMail](http://mail.google.com)
    * all messages you send or receive
    * contacts
    * connection info when using other protocols (POP, IMAP, SMTP)
* [Google Talk](http://www.google.com/talk)
    * chat data
    * contacts
    * activity patterns (chat status, times active, locations, etc…)
* [Google Calendar](http://calendar.google.com/)
    * calendar data
    * contacts
    * who has access to shared calendars
    * who is invited to your events
    * subscriptions
        * what calendars do you subscribe to?
        * who subscribes to your calendars?
* [Google Contacts](http://www.google.com/contacts) – your addressbook. everyone you know.
* [Orkut](http://www.orkut.com) – professional social network (if you’re Brazilian…)
* [Google Docs](http://docs.google.com)
    * document content
    * contacts and activity
    * collaborators
    * viewers of published documents
* [Maps](http://maps.google.ca/) & [Earth](http://earth.google.com)
    * your location via GPS, cell, IP location, etc…
    * searched locations
    * customized maps (paths, locations, notes, areas of interest)
    * directions (from, to, method of travel)
* [Street View](http://www.google.com/intl/en_us/help/maps/streetview/)
    * image of locations
    * data sniffed by camera vehicle
    * WIFI hotspot location matching
* Satellite view
* [Google DNS](http://code.google.com/speed/public-dns/)
    * every server your computer contacts, on any protocol, including the time, location, and IP address of your requests.
* [Bookmark sync](http://www.google.com/tools/firefox/browsersync/)
* [Tasks](http://mail.google.com/mail/help/tasks/)
* [GReader](http://www.google.com/reader/)
    * subscribed feeds
    * activity
    * read items
    * starred items
    * shared items
    * time and location of user activity while reading feeds
    * people you follow, and what they do
    * people who follow you, and what they do
* [Feedburner](http://feedburner.google.com)
    * Everyone that subscribes to a blog powered by Feedburner
        * who they are
        * where they are
        * what app(s) they use to read feeds
        * matching other sites of interest to those subscribers
* [News](http://news.google.com)
    * sources of news read by a person
    * news items read
* [Picasa Galleries](http://picasaweb.google.com/)
    * photo data
    * geolocation (time and place of photos)
    * contacts (invited to view photos)
    * face recognition (are you in a crowd photo somewhere?)
* [Translated content](http://translate.google.com/?hl=en#)
    * pages translated
    * source and target language(s)
* Online video ([Google Video](http://video.google.com/) and [YouTube](http://www.youtube.com))
    * content uploaded
    * activity
    * views
    * searches
    * comments
    * contacts (subscriptions, etc…)
    * faves, playlists, etc…
* [iGoogle](http://www.google.com/ig) Gadgets – which gadgets and data sources are grouped together? which ones used most often?
* [Google Desktop](http://desktop.google.com/)
    * searches (for Google query lookup)
    * usage data
* [Google Books](http://books.google.com)
    * which online books you read
    * which parts of these books you read
* [Google Notebook](http://www.google.com/notebook) – content of notes
* [Google Wave](https://wave.google.com/wave/) (if anybody used it)
    * content
    * activity
    * contacts
    * other apps and data that you integrate
* [Buzz](http://www.google.com/buzz) – wtf is buzz, anyway? but they index whatever it is…
* [Google Checkout](https://checkout.google.com/main)
    * purchase history – merchant, item, price, time and location, etc…
    * credit card info
* [Google Health](http://health.google.com)
    * medical history
    * hospitals and clinics you’ve used
    * prescriptions
* [Android](http://www.android.com/)
    * where you are (location data sent by phone)
    * what apps you have
    * who you call, and who calls you

And now they’re adding:

* Visitors to websites using [WebFonts](http://code.google.com/webfonts)
* [TV viewing activity](http://en.wikipedia.org/wiki/Google_TV)
* Web apps through the proposed [Chrome Web Store](https://chrome.google.com/webstore)

I’m probably missing a bunch of stuff. Much of this is pretty innocuous. Much of it is opt-in, or voluntarily contributed. But the sheer scope and scale of the managed data, and the widely varied sources of the data, make it potentially possible for some interesting connections to be made. Sure, much of it is claimed to be anonymized, but there’s [not really any such thing as true anonymity](http://www.readwriteweb.com/archives/eff_your_browser_has_a_fingerprint.php).

What is to stop Google from connecting the dots to say “*show me a list of people who have searched for ‘alternative medicine’ who have visited an out of country clinic, have a history of cancer, and have searched for ‘google jobs’ and ‘insurance plan’?*”
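To show how little machinery "connecting the dots" needs once separate data sources share an identifier, here is a toy Python sketch. Everything in it is invented: the dataset names, the user IDs, and the assumption that a common identifier exists across sources. It says nothing about what Google actually stores or does. It only shows that the question above is a plain set intersection.

```python
# Hypothetical, hand-made data: each source maps to the set of user IDs seen in it.
datasets = {
    "searched_alternative_medicine": {"user-a", "user-b", "user-c"},
    "visited_out_of_country_clinic": {"user-b", "user-c"},
    "history_of_cancer": {"user-c", "user-d"},
    "searched_insurance_plan": {"user-a", "user-c"},
}


def connect_the_dots(sources):
    """Return the IDs that appear in every one of the given sources."""
    if not sources:
        return set()
    return set.intersection(*sources.values())


matches = connect_the_dots(datasets)
print(matches)
```

Each source on its own is fairly boring. It's the shared identifier that turns four boring lists into one very specific person.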

My point is, if any government agency proposed tracking this level of data on individuals, there would be (should be) riots in the streets. At the very least, it would be a high profile election issue.

But, we just accept it. Google makes things easy, so we just ignore what’s going on. People complain about the “evil” of the iPhone App Store, because fart apps are not approved. But we then ignore that this much data is being systematically collected on us, by a company that chants “don’t be evil.”

This isn’t meant as a paranoid “the government is keeping aliens in area 51, and cars that run on water, man. WATER!” post. But there is something big going on, and we’re complicit in it.

**Update**:

Oh, and [they get private user data from social networking sites through advertising, without users’ consent](http://tiny.darcynorman.net/7b). Nice.