Community Detection on Twitter

I’ve been thinking a lot about how to visualize online presence and community. There are lots of great tools to do post-hoc analysis, but I’m thinking about something more realtime. It doesn’t exist yet, though. In the meantime, I’m playing around with the current tools to get a feel for what stories they can pull from the social graph data.

Yesterday, I followed the howto from Caleb Jones to pull the social graph data from my Twitter account. The process took about 15 hours, because of Twitter’s helpful throttling of API calls. Thankfully, the twecoll Python tool takes that into account and gracefully pauses when the Twitter API tells it to cool it.
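
For a rough idea of what that kind of graceful pausing looks like, here is a minimal sketch. It is not twecoll’s actual code: the `RateLimited` exception, the `fetch` callback and the `fetch_all` helper are hypothetical names made up for illustration, and the wait time is whatever the (imagined) API client reports.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a fetch function raises when the API says to wait."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_all(ids, fetch, sleep=time.sleep):
    """Call fetch(id) for every id, pausing whenever the API asks for it.

    `fetch` is an assumed callable that returns data for one account or
    raises RateLimited. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    results = {}
    for user_id in ids:
        while True:
            try:
                results[user_id] = fetch(user_id)
                break  # got the data, move on to the next account
            except RateLimited as err:
                # Cool it for as long as the API asked, then retry this id.
                sleep(err.retry_after)
    return results
```

With a few hundred accounts and a limit that only allows a handful of calls per window, most of the wall-clock time goes into those sleeps, which is roughly how a pull like this stretches out to many hours.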

Once twecoll pulled out the raw data, I fed it into Gephi, and then followed Caleb’s howto for community detection.
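
Gephi’s Modularity statistic uses the Louvain method, which searches for a grouping of nodes that scores well on a measure called modularity. The sketch below does not do the search; it only computes the score for a grouping you hand it, to show what “a good set of communities” means numerically. The account names and the tiny graph are invented for the example, and it assumes an undirected, unweighted graph.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs, each edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    inside = defaultdict(int)  # edges with both ends in the same community
    degree = defaultdict(int)  # sum of node degrees per community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] += 1
        degree[cb] += 1
        if ca == cb:
            inside[ca] += 1
    # Fraction of edges inside each community, minus the fraction expected
    # if edges were placed at random with the same node degrees.
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two made-up triangles of accounts joined by a single follow.
edges = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("cat", "dan"),
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
]
split = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}
lump = {name: 0 for name in split}

print(round(modularity(edges, split), 3))  # 0.357
print(round(modularity(edges, lump), 3))   # 0.0
```

Splitting the graph into its two triangles scores higher than lumping everyone together, which is the kind of difference a modularity-based detector is hunting for when it colours the clusters.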

I tweaked the layout a bit, played with the rendering settings, and came up with this:

dlnorman Twitter Graph 2015-01-05 v2
Community detection on @dlnorman, showing clusters of “edtech”, “LMS”, “UCalgary” and “Calgary” network members.

There are a few main concentrations of people. The blue-ish one on the right is loosely “edtech folks” – but it’s strongly biased by “BC Edtech Folks”. The red patch at the top is “LMS-ish folks”, strongly represented by D2Lers. The far left is “UCalgary” – and it was able to pull out a cluster of official-ish accounts, student union accounts, and various other subclusters from UCalgary. The bottom left is loosely “Calgary” – and includes subclusters for politics, media, design, and cycling. Lots of overlap between design and cycling subclusters. Go figure.

Lessons learned from this exercise:

  1. It takes waaaay too long to do anything with this kind of community analysis on the fly. Post-hoc analysis is where things are now.
  2. Even with super-helpful scripts, the process is not something most people will do. And the new Gephi 0.9 is fantastic – but, again, it’s an excellent tool for researchers, and most people aren’t going to use it. The user experience for personal-social-network-analysis needs to come a long way before it can be used by everyone.
  3. Even with the pretty picture and community detection – so what? What can you actually do with this information? I have some ideas about that, but need to do some exploration first.

Update: I tweaked the layout. Here’s a better version of my Twitter network graph with community detection:

dlnorman 2

Discussion Visualization with Gephi

I’ve been playing around with Gephi today, to see what I could come up with to display the discussion threads from my research data. Lots of manual data entry later, and I’ve got this:

and this:

WordPress sites are shown in red, Blackboard discussion forums in blue. So far, just a pretty picture, but I’ll hopefully be able to coax out a diagram or two that shows the difference in interaction patterns between the two platforms…
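
Some of that manual data entry could probably be scripted. Gephi’s spreadsheet import reads an edge table with `Source` and `Target` columns, and extra columns come along as edge attributes. Here is a minimal sketch, assuming the discussion data can be boiled down to (who replied, who they replied to, platform) records. The `write_edge_list` helper, the `Platform` column and the sample names are all made up for illustration, not taken from the research data.

```python
import csv
import io


def write_edge_list(replies, out):
    """Write (replier, replied_to, platform) records as a Gephi edge table."""
    writer = csv.writer(out)
    # "Source" and "Target" are the column names Gephi looks for;
    # "Platform" is an extra attribute assumed here for colouring.
    writer.writerow(["Source", "Target", "Platform"])
    for replier, replied_to, platform in replies:
        writer.writerow([replier, replied_to, platform])


# Invented sample records, one row per reply.
replies = [
    ("student_a", "student_b", "WordPress"),
    ("student_b", "student_a", "WordPress"),
    ("student_c", "instructor", "Blackboard"),
]

buf = io.StringIO()  # swap for open("edges.csv", "w", newline="") to save a file
write_edge_list(replies, buf)
```

Keeping the platform as a column on each edge means the two platforms can be filtered or coloured separately once the file is loaded, instead of maintaining two hand-built graphs.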