more on going stealth online

I’ve been trying to extricate myself from Google’s All Seeing Gaze. (For more info on why, see this article linked by @brlamb.)

There are plugins and opt-out cookies etc… but all of those work only in the browser. Often, in just a specific browser. I think I’ve found a better way. No opt-out. Works for any app that touches The Tubes.

Just modify your /etc/hosts file to include the contents of this great shared .hosts file. All requests for nefarious tracking servers will be dumped to 127.0.0.1 (your own computer) rather than routed out to The Big Snoops In The Ether. Some semblance of privacy, without having to opt out in every browser you use.

The sample file had ad.doubleclick.net commented out because it breaks sears.com and other sites that somehow route actual content through the ad tracking network. I say, if a site is that evilly designed, screw ’em. I’ve uncommented the line and am blocking all requests for known doubleclick servers.
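For anyone who would rather script the merge than paste by hand, here is a rough sketch of the idea in Python. It is not the shared file's own tooling: the function name, the `force_block` parameter, and the sample excerpt are all made up for illustration, and it assumes the shared list uses the usual `127.0.0.1 hostname` layout. It only prints the entries; appending them to /etc/hosts (with admin rights) is left to you.

```python
def build_block_entries(shared_hosts_text, force_block=()):
    """Return '127.0.0.1 host' lines from a shared hosts-style list.

    Commented-out lines are skipped unless their hostname appears in
    force_block, in which case they are re-enabled.
    """
    entries = []
    for raw in shared_hosts_text.splitlines():
        line = raw.strip()
        commented = line.startswith("#")
        parts = line.lstrip("#").split()
        # A usable entry looks like: <address> <hostname> [anything else]
        if len(parts) < 2 or parts[0] != "127.0.0.1":
            continue
        host = parts[1]
        if commented and host not in force_block:
            continue
        entry = f"127.0.0.1 {host}"
        if entry not in entries:  # drop repeats, keep first-seen order
            entries.append(entry)
    return entries


# Made-up excerpt standing in for the shared file (hypothetical contents).
sample = """\
# shared tracking-server list (made-up excerpt)
127.0.0.1 www.google-analytics.com
#127.0.0.1 ad.doubleclick.net
127.0.0.1 tracker.example.com
"""

for entry in build_block_entries(sample, force_block={"ad.doubleclick.net"}):
    print(entry)
```

Passing a hostname in `force_block` is the scripted equivalent of uncommenting that line by hand.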

Also, I switched my DNS away from the convenient and fast Google DNS servers. Sure, they’re fast, but using their DNS servers means they’re able to see everything I do online, no matter what app, no matter what protocol. No, thanks.

Finally, I’ve stopped using Google Quick Search Box. It’d probably be OK to just turn off the “send usage data to Google” and “suggest web pages…” settings, but I’m reverting to just using Spotlight instead. It’s local. It doesn’t report stuff to The Cloud.

Google is watching us

Google has been powering almost all search queries for an eternity in internet years. It knows an awful lot about what we all search for. And they keep pushing into new ways to index data and mine the activity of people.

It started out pretty simple:

* Public content on the web (web pages)
* Search queries
* Websites viewed as a result of search queries

And they kept adding individually trackable data on:

* [Google Groups](http://groups.google.com/) (including the entire known history of Usenet)
    * posted content
    * activity patterns (who responds to whom, etc…)
* [Analytics](http://www.google.com/analytics/)
    * tracking visitors to enabled websites, including data about where they are, what they do on the site, how they found the site, and what they click on
* Advertising tracking
    * every page that serves you ads. and how you got there. and what you did there. and where you went afterward.
* [Adsense](https://www.google.com/adsense/)
    * if you put adsense ads on your site, they have your banking info as well.
* [Doubleclick](http://doubleclick.com/)
* [Google Alerts](http://www.google.com/alerts) – search queries that you care enough to subscribe to.
* [GMail](http://mail.google.com)
    * all messages you send or receive
    * contacts
    * connection info when using other protocols (POP, IMAP, SMTP)
* [Google Talk](http://www.google.com/talk)
    * chat data
    * contacts
    * activity patterns (chat status, times active, locations, etc…)
* [Google Calendar](http://calendar.google.com/)
    * calendar data
    * contacts
    * who has access to shared calendars
    * who is invited to your events
    * subscriptions
    * what calendars do you subscribe to?
    * who subscribes to your calendars?
* [Google Contacts](http://www.google.com/contacts) – your addressbook. everyone you know.
* [Orkut](http://www.orkut.com) – professional social network (if you’re Brazilian…)
* [Google Docs](http://docs.google.com)
    * document content
    * contacts and activity
    * collaborators
    * viewers of published documents
* [Maps](http://maps.google.ca/) & [Earth](http://earth.google.com)
    * your location via GPS, cell, IP location, etc…
    * searched locations
    * customized maps (paths, locations, notes, areas of interest)
    * directions (from, to, method of travel)
* [Street View](http://www.google.com/intl/en_us/help/maps/streetview/)
    * image of locations
    * data sniffed by camera vehicle
    * WIFI hotspot location matching
* Satellite view
* [Google DNS](http://code.google.com/speed/public-dns/)
    * every server your computer contacts, on any protocol, including the time, location, and IP address of your requests.
* [Bookmark sync](http://www.google.com/tools/firefox/browsersync/)
* [Tasks](http://mail.google.com/mail/help/tasks/)
* [GReader](http://www.google.com/reader/)
    * subscribed feeds
    * activity
    * read items
    * starred items
    * shared items
    * time and location of user activity while reading feeds
    * people you follow, and what they do
    * people who follow you, and what they do
* [Feedburner](http://feedburner.google.com)
    * Everyone that subscribes to a blog powered by Feedburner
    * who they are
    * where they are
    * what app(s) they use to read feeds
    * matching other sites of interest to those subscribers
* [News](http://news.google.com)
    * sources of news read by a person
    * news items read
* [Picasa Galleries](http://picasaweb.google.com/)
    * photo data
    * geolocation (time and place of photos)
    * contacts (invited to view photos)
    * face recognition (are you in a crowd photo somewhere?)
* [Translated content](http://translate.google.com/?hl=en#)
    * pages translated
    * source and target language(s)
* Online video ([Google Video](http://video.google.com/) and [YouTube](http://www.youtube.com) )
    * content uploaded
    * activity
    * views
    * searches
    * comments
    * contacts (subscriptions, etc…)
    * faves, playlists, etc…
* [iGoogle](http://www.google.com/ig) Gadgets – which gadgets and data sources are grouped together? which ones used most often?
* [Google Desktop](http://desktop.google.com/)
    * searches (for Google query lookup)
    * usage data
* [Google Books](http://books.google.com)
    * which online books you read
    * which parts of these books you read
* [Google Notebook](http://www.google.com/notebook) – content of notes
* [Google Wave](https://wave.google.com/wave/) (if anybody used it)
    * content
    * activity
    * contacts
    * other apps and data that you integrate
* [Buzz](http://www.google.com/buzz) – wtf is buzz, anyway? but they index whatever it is…
* [Google Checkout](https://checkout.google.com/main)
    * purchase history – merchant, item, price, time and location, etc…
    * credit card info
* [Google Health](http://health.google.com)
    * medical history
    * hospitals and clinics you’ve used
    * prescriptions
* [Android](http://www.android.com/)
    * where you are (location data sent by phone)
    * what apps you have
    * who you call, and who calls you

And now they’re adding:

* Visitors to websites using [WebFonts](http://code.google.com/webfonts)
* [TV viewing activity](http://en.wikipedia.org/wiki/Google_TV)
* Web apps through the proposed [Chrome Web Store](https://chrome.google.com/webstore)

I’m probably missing a bunch of stuff. Much of this is pretty innocuous. Much of it is opt-in, or voluntarily contributed. But the sheer scope and scale of the managed data, and the widely varied sources of the data, make it potentially possible for some interesting connections to be made. Sure, much of it is claimed to be anonymized, but there’s [not really any such thing as true anonymity](http://www.readwriteweb.com/archives/eff_your_browser_has_a_fingerprint.php).

What is to stop Google from connecting the dots to say “*show me a list of people who have searched for ‘alternative medicine’ who have visited an out of country clinic, have a history of cancer, and have searched for ‘google jobs’ and ‘insurance plan’?*”

My point is, if any government agency proposed tracking this level of data on individuals, there would be (should be) riots in the streets. At the very least, it would be a high profile election issue.

But, we just accept it. Google makes things easy, so we just ignore what’s going on. People complain about the “evil” of the iPhone App Store, because fart apps are not approved. But we then ignore that this much data is being systematically collected on us, by a company that chants “don’t be evil.”

This isn’t meant as a paranoid “the government is keeping aliens in area 51, and cars that run on water, man. WATER!” post. But there is something big going on, and we’re complicit in it.

**Update**:

Oh, and [they get private user data from social networking sites through advertising, without users’ consent](http://tiny.darcynorman.net/7b). Nice.

over one million served

I just cracked open the Google Analytics stats for my blog, and was curious to see how much data was available. I had it display all data (going back as far as November 16, 2005, which is apparently when I first started using Analytics). Google has tracked over 1 million page views on my blog. Over 600,000 unique visitors. The scale of that just blows my mind.

[Screenshots: visitors_views, stats_overview]

Google Earth on iPod Touch

Google Earth. On my iPod Touch. Seriously. This app is fracking amazing. Pinch to zoom or rotate. Tilt the iPod to tilt the view. The controls are so smooth and intuitive that I was actually disappointed when the view didn’t rotate as I spun my chair around. Maybe on a fancy schmancy iPhone 3G? Still – VERY cool app. Well done, Google Earth team!

Network vs. Machine

Cole wrote a post about how his Twitter network helped him solve a problem. His blog suddenly decided to stop accepting comments, and he wasn’t sure how that happened, or how to fix it. I was just going to post this as a comment on his blog, but, well, it’s still not accepting comments 😉 (and I apologize if this post comes across as snarky – not intended to – it’s just a pre-caffeinated response to a blog, first thing in the morning…)

Posting a question to the Network via Twitter etc. is great, and it really IS impressive that people provide answers so quickly. But one thing that I wonder about is the reliance on other people rather than our own referencing and querying skills. I’m probably more guilty of this than anyone I know – heck, I have a whole tag of “lazyweb” posts here on my blog.

What I find puzzling, and I’m not meaning to pick on Cole here, is that the same answers to the same question could have been found in less than 5 seconds with a properly worded Google query. Like this, for instance:

Google Query for wordpress enable comments on all posts

The trick is to know roughly what you’re looking for. Key words like “enable comments” might not just roll off the fingertips of everyone with the problem. But variations might work as well.

I’m really NOT trying to discount the power of the Network in pooling resources and brains, but we also need to remember that we have tools at our own fingertips to help enlist the huge databases of the Machine to help find information to solve problems independently.

on google and the recursive cycle of spam

The spam problem has been the bane of openly available “web 2.0” sites since, well, forever. Everyone universally hates spam. Everyone, universally, wants to see it go away. Why is it still a problem?

Wait. Not everyone wants it to go away. There are two groups of people who benefit from spam.

  • spammers
  • google

Of course spammers won’t stop – they have a money factory running, and are locked in an arms race against the global online community in an effort to game ever larger lumps of cash from Google.

Google says they want it to stop. They came up with a wonderful solution that would have stopped spam in its tracks – the only downside was that the solution would have destroyed the network effects of the web by negating links. Baby? Meet bathwater. Meet half-assed “solution” that lets Google say “hey! we tried! Really we did!”

But, why did Google stop at a half-assed solution? Why not go fully-assed? Because they benefit from spam. Every time some moron stupidly clicks on a spam factory’s Google ads, Google gets a cut, and they happily send cash to the spammer.

Recursive Cycle of Spam

The evil spam roaches inflict their spam on the various “web 2.0” resources – anything that has an open form intended to foster dialogue and conversation – this spam gets indexed by Google, who then send the roaches a cut of all proceeds from the ads on those spam factory websites.

Anyone else see a conflict of interest here?

There is an easy solution.

Google: to stop the spam, you have to stop paying the spammers.

How to do that? Well, I’m not a multi-bajillion-dollar company stuffed to the rafters with PhDs or anything, but how about this for a start:

If someone reports a website as a spam factory, their adsense revenue goes into an escrow-like state until it can be shown to NOT be spam. They don’t lose any money if they’re legit, but they have the opportunity to lose their revenue if they are shown to be evil spam roaches. What to do with the revenue seized from verified spam factory adsense accounts? Google can’t keep it – it just maintains the conflict of interest. They should donate it all to the EFF or something similar.

1 Month with Google Reader

I can’t believe it’s been a whole month since I started trying out Google Reader (GR) full time. I wanted to see if I could live in a browser-based aggregator, and was curious about how far it had come since the early days.

The short version is: it’s less efficient at reading boatloads of feeds and items. But, the always-on, available-anywhere design of GR makes it worthwhile.

The long version is, well, longer. I still miss many of the niceties of BlogBridge (BB). Things like having a “photo gallery” view, for viewing images in feeds (I subscribe to a fair number of Flickr tag feeds, so this is quite handy). I’ve got a workaround for the star ratings that BB uses – I’ve created two “tags” in GR: “5-stars” and “4-stars” and have applied them to appropriate feeds. That definitely helps prioritize reading important stuff from all of my feeds/tags without having to hunt for them. Because it’s browser based, I can use native del.icio.us interfaces, so that feature from BB isn’t missed. The most annoying thing I’ve found with GR isn’t directly GR’s fault. I have to do a fair bit of clicking to get through all of my tags. I need to do some more work to add appropriate feeds to “5-stars”, “4-stars”, “3-stars” etc… so I can focus on levels of importance rather than subjects.

I do like the “trends” view in GR. Not because it is helpful in organizing or accessing information (it isn’t), but it’s kinda interesting in its own right. Here’s a screenshot as of 5 minutes ago:

Google Reader Trends - first month

I’m a bit surprised at just how much I’m reading. Almost 18,000 items in a month? I’d have never guessed that. Actually, almost half of that isn’t really “reading” per se, but “viewing”. Photos from Flickr. Which is why the “photo gallery” view would be great.

There are some shortcomings.

  1. I’ve got a nagging feeling that by using GR, I am continuing to “feed the beast” – by teaching Google about what interests me, and by providing guidance about relationships between feeds and items.
  2. There isn’t a “blogroll” or live OPML view of my tags/folders. BlogBridge lets me publish tags as live OPML documents, which is how my edublogs directory is managed. There isn’t currently a way to replicate that from within GR. Yes, I could periodically export a tag as an OPML file, and post that somewhere. Not the same.
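The periodic-export workaround could at least be scripted. Here is a minimal sketch that builds an OPML document from a list of feeds using only Python's standard library. It doesn't talk to Google Reader or BlogBridge at all; it assumes you already have the titles and URLs for a tag in hand, and the function name and sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, feeds):
    """Build an OPML 2.0 document (as a string) from (title, url) pairs."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed_title, feed_url in feeds:
        # One <outline> element per subscribed feed.
        ET.SubElement(
            body,
            "outline",
            {"type": "rss", "text": feed_title, "xmlUrl": feed_url},
        )
    return ET.tostring(opml, encoding="unicode")


# Hypothetical feeds standing in for the contents of one tag/folder.
edublogs = [
    ("Example Edublog", "http://example.com/feed.xml"),
    ("Another Edublog", "http://example.org/rss"),
]

print(feeds_to_opml("edublogs directory", edublogs))
```

It's still a snapshot rather than a live document, so it has the same "not the same" problem. It just makes the snapshot cheap to regenerate.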

All in all, I think I’ll keep using Google Reader for now. I’ll have to figure out how to reconcile my feed subscriptions with BB so that I can keep maintaining the edublogs directory, but that will work itself out somehow.

Heading back to BlogBridge

I tried. I really did. I wanted to give Google Reader a full week to see how well it works as a full-time feed aggregator.

I couldn't do it.

My morning check-in took 5 times longer than normal this morning. Google Reader seems like it would be nice for a small set of feeds, but it becomes unwieldy on my subscriptions. Endless scrolling, lots of clicking on folders, and waiting for items to be added to the bottom of the page, with no indication of how far you've come through the items in a folder (the scroll bar eventually becomes pegged at the bottom, even if there are 300 items left to read). And GR has no concept of a photo feed, so they're all displayed inline rather than in a grid, making it take an order of magnitude longer to go through my Flickr feeds. Frustrating.

GR has no real concept of ratings for feeds. I can star feed items, but not feeds. I can tag a feed with "5 stars" or the like, but GR doesn't know to treat that feed any differently (like bubbling items from a "5 star" feed to the top of a list, etc…).
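To show what I mean by "bubbling", here is a rough sketch of the behaviour I'd want, written in Python. It has nothing to do with how GR or BlogBridge actually work internally. It assumes feeds are tagged with names shaped like "5-stars" and that each item knows which feed it came from; the function names and sample data are made up.

```python
def rating_from_tags(tags):
    """Return the highest N among tags shaped like 'N-stars', or 0 if none."""
    best = 0
    for tag in tags:
        number, sep, suffix = tag.partition("-")
        if sep and suffix == "stars" and number.isdigit():
            best = max(best, int(number))
    return best


def bubble_by_rating(items, feed_tags):
    """Sort items so those from higher-rated feeds come first.

    sorted() is stable, so items keep their original order within a rating.
    """
    return sorted(
        items,
        key=lambda item: -rating_from_tags(feed_tags.get(item["feed"], ())),
    )


# Hypothetical feeds and items, purely for illustration.
feed_tags = {
    "favourite-blog": ["edublogs", "5-stars"],
    "photo-feed": ["flickr", "3-stars"],
    "misc-feed": ["misc"],
}
items = [
    {"feed": "misc-feed", "title": "Unrated post"},
    {"feed": "photo-feed", "title": "New photo"},
    {"feed": "favourite-blog", "title": "Must-read post"},
]

for item in bubble_by_rating(items, feed_tags):
    print(item["title"])
```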

So, I'm back to BlogBridge.  Ahhhh… that's better. There's no place like home… 

Trying Google Reader Again

I've been a raving, drooling BlogBridge fanboy for some time now. It's the best darned desktop aggregator I've used. That hasn't changed.

But, with all of the cool kids using Google Reader, I decided it's time to really give it a chance again. I dropped it like it's hot the last time I tried it because it doesn't have a feed star rating system, nor smart feeds. But, it's got a pretty flexible feed tagging system, which can be easily cajoled into performing these duties.

So, I just imported my feeds from BlogBridge to Google Reader via OPML, and I'll try giving it a shot for a week or so. I'm liking it after just a few minutes, but I'm not sure I can really switch away from BlogBridge.

I added a new tag called "5-stars" and tagged a bunch of feeds with it. By viewing new items in that tag, I can simulate the 5-star smart feed in BlogBridge. I can add 4-stars and 3-stars etc… as needed. Here's what my 5-stars tag looks like right now:

 

I'll keep trying it out for a week or so, and if I'm still using it then, I'll likely stick with it. So far, the single biggest reason to move to Google Reader is that it can actually parse the feed from OLDaily, which I've been missing for a couple of months now (BlogBridge has had trouble dealing with some of the slightly off-spec portions of that feed, but GR chews through it without complaining).

Update: Firefox has locked up on me twice now, forcing me to restart it. Safari is downright jittery when using Google Reader, so I'll have to deal with it. On the up side, synchronicity dropped this guide to "Getting Good with Google Reader" into my reader… 
