discussion visualization with gephi

I’ve been playing around with Gephi today, to see what I could come up with to display the discussion threads from my research data. Lots of manual data entry later, and I’ve got this:

and this:

WordPress sites are shown in red, Blackboard discussion forums in blue. So far, just a pretty picture, but I’ll hopefully be able to coax out a diagram or two that shows the difference in interaction patterns between the two platforms…
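For anyone wanting to script the data entry rather than do it by hand, here is a rough sketch of building an edge list that Gephi's spreadsheet import can read (it accepts a CSV with `Source`, `Target` and optional `Weight` columns). The record layout, participant names and the `edge_list_csv` helper are all made up for illustration; this is not how the data above was actually prepared.

```python
import csv
import io
from collections import Counter

# Hypothetical records: (author, author being replied to).
# None marks a post that starts a thread and so creates no edge.
replies = [
    ("student_a", None),
    ("student_b", "student_a"),
    ("instructor", "student_a"),
    ("student_b", "student_a"),
]

def edge_list_csv(records):
    """Return CSV text with one weighted, directed edge per (replier, poster) pair."""
    weights = Counter((src, dst) for src, dst in records if dst is not None)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(weights.items()):
        writer.writerow([src, dst, weight])
    return out.getvalue()

print(edge_list_csv(replies))
```

Repeated replies between the same two people are collapsed into a single edge with a higher weight, which is one reasonable choice; keeping one row per reply would work too.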

discussion network visualization

I just put together some quick network maps for the online discussions from my thesis research data. Haven’t done any analysis – just some purty pictures to see any at-a-glance differences:

Both discussion platforms had about the same number of posts and responses, but the pattern of connections is markedly different for some reason…

aggregated metadata for online discussions

Here’s a quick look at the aggregated metadata for all of the online discussions I’m using in my thesis:

About the same number of posts in each platform, with a bit more of a time-spread in the WordPress discussions, substantially longer posts in WordPress, about the same (non) use of images, more links in WordPress posts, and more attachments in Blackboard posts.

basic metadata analysis

Here’s a quick pass at analyzing the basic metadata for the online discussions.

I plotted a few calculated values (Excel pivot tables fracking ROCK, BTW…), to try to compare activity patterns. What’s interesting in this graph is the average wordcount (green line) – low for the Blackboard discussion board threads (the left 5 items) and markedly higher for the 8 student blogs (the right 8 items).

The number of posts in each discussion (dark blue line) is relatively consistent across all discussions. Slightly lower for the WordPress blog sites, but not dramatically so.

Also interesting is the red line – standard deviation of the “day of course” for posts. It’s a rough estimate of how rapidly posts occur – a low standard deviation indicates the posts occurred relatively close together on the calendar. A high value indicates the posts occurred over a longer spread of days. This suggests that Blackboard posts were added in brief, rapid bursts, while the WordPress posts and comments were posted over longer durations. People kept coming back to blog posts long after they were started. Interesting. There could be a number of reasons for this – it’s easier to see Bb discussion boards all in one place, and easier to forget to check various blogs for activity, etc… Or do they just reflect more, and more deeply, on blogs? Interesting… I’d love to find out the reasons behind the different values…
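The three plotted values (post count, average wordcount, and standard deviation of “day of course”) can be reproduced outside of Excel. A minimal sketch, assuming each post is reduced to a (discussion, day of course, wordcount) row; the sample rows and the `summarize` name are invented, and the sample standard deviation is used here, which may or may not match the pivot table setting behind the graph.

```python
from statistics import mean, stdev

# Hypothetical rows: (discussion, day_of_course, word_count)
posts = [
    ("bb_forum_1", 3, 40),
    ("bb_forum_1", 4, 60),
    ("bb_forum_1", 5, 50),
    ("wp_blog_1", 2, 300),
    ("wp_blog_1", 12, 200),
    ("wp_blog_1", 22, 250),
]

def summarize(rows):
    """Group rows by discussion and compute the three plotted values."""
    grouped = {}
    for discussion, day, words in rows:
        grouped.setdefault(discussion, []).append((day, words))

    summary = {}
    for discussion, items in grouped.items():
        days = [day for day, _ in items]
        words = [count for _, count in items]
        summary[discussion] = {
            "posts": len(items),
            "avg_words": mean(words),
            # Sample standard deviation needs at least two posts;
            # a single-post discussion gets a spread of 0.
            "day_spread": stdev(days) if len(days) > 1 else 0.0,
        }
    return summary

for name, values in summarize(posts).items():
    print(name, values)
```

In the made-up rows, the forum posts land on consecutive days and the blog posts are ten days apart, so the blog's `day_spread` comes out much larger, which is the pattern described above.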

So… The WordPress discussions occurred over longer periods, using slightly fewer posts/responses, but with dramatically longer posts than was seen in the Blackboard discussions…

on visualizing online discussions

For my MSc thesis research, I’m working with a bunch of data collected through online discussions during a blended course. Part of the discussions took place using Blackboard’s discussion board feature, part took place on students’ blogs. One of the things I need to do is to document how the discussions played out, to try and tease out any differences between the two venues. I’ll be using the Community of Inquiry model to describe the social/teaching/cognitive components of posts, but I’ve been wanting to describe the flow of discussion as well. How do the discussions occur? Are there patterns of activity, in time or size of responses? I’ve been struggling with how to document these. In my thesis, it’s really just a glorified case study, so I’ve had to constantly force myself to stop thinking of it as controlled experimental data. What I’m doing is describing the activity within a single course, in 2 venues of online discussion.

I had a bit of an epiphany this afternoon, while working through some preliminary work to prep for CoI coding. I thought about Hans Rosling’s statistical visualizations and how he was able to incorporate several axes of data into a graph by using size, colour, shape, etc…

And then it hit me – it would be relatively straightforward to apply that approach to the data documenting an online discussion. The timestamp data is there. The info about the individual is there. Basic “demographic” data is there (number of words, types of things included – images, links, attachments, media, etc…), and if I combine those, I get something like this:

On this rough mockup visualization, time is the vertical axis, transformed into a simple “number of days” integer. The horizontal axis is “threads of discussion.” This displays the discussion in a “FAQ” discussion board used in the course. There were 9 primary threads (plus one forked thread).

Each circle represents a post. The size of the circle represents the number of words in a post or response – in this mockup, I just did a simple conversion where the number of words directly translated into the width of the circle (a post with 100 words is 1.00″, a post with 50 words is .50″, a post with 150 words is 1.50″ etc…). The colour of the circle indicates the person who posted it. White circles are the instructor. Black circles are anonymous students (who did not provide consent to participate in the research, so the content of their posts was deleted from my working archive), and other colours indicate individual students.
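The two conversions used in the mockup are simple enough to write down. A minimal sketch, with hypothetical function names and a placeholder start date; whether the first day of the course counts as day 0 or day 1 is an assumption (day 0 here).

```python
from datetime import date

# Placeholder start date, for illustration only.
COURSE_START = date(2011, 1, 10)

def day_of_course(posted_on, course_start=COURSE_START):
    """Vertical axis: whole days elapsed since the course started (start = day 0)."""
    return (posted_on - course_start).days

def circle_diameter_inches(word_count):
    """Circle width: 100 words maps to 1.00 inch, scaled linearly."""
    return word_count / 100.0

print(day_of_course(date(2011, 1, 13)))  # a post three days after the start
for words in (50, 100, 150):
    print(words, circle_diameter_inches(words))
```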

This is a very rough mockup. I’m hoping to refine it a bit more, to include a way to represent the CoI coding for each message – an indicator of the relative social/cognitive/teaching aspect of the post, as well as a way to indicate other interesting things about a post (how many images/links/attachments/embedded media were included? etc…)

Problems with the mockup:

  1. It’s messy when posts occur close together. Overlap makes the circles obscure each other.
  2. The literal translation of wordcount to size means larger posts overwhelm the other posts in the diagram, in a way that over-represents the difference as seen in the actual discussion (a post that is 5x the size of another post doesn’t necessarily drown out the other posts, but it is given prominent emphasis in the diagram…)
  3. Forking of threads could get confusing – how to best indicate the branch points? I tried with a dotted line, but it’s unclear which post/circle it originates from…
  4. Threads that are displayed beside each other may not be directly related, but they may appear to be intertwined because of the overlap of circles (a large post in thread 6 overlaps threads 5 and 7, etc…)

I’d like to extend the mockup, after figuring out ways to get around these issues, to show all posts in all discussions in the entire course. It should be interesting to see the temporal overlap between discussions, and see some data about patterns of interaction from participants across the entire thing – does a given participant start most threads? do they respond with giant posts? do they stay in one CoI aspect, or do they cover the whole thing? etc…

I would love to see a large visualization, with vertical lanes for each thread in an entire course, across all venues of online discussion, with posts displayed as shown above, and with the CoI coding indicated. What better way to compare activity across discussions in a course?

It strikes me that this visualization is extremely simple – perhaps too simple? perhaps so obvious in hindsight that someone else has already come up with a solution? Scott Leslie sent me a link to Boardtracker, which looks extremely interesting, but it looks like it’s strictly based on time and not threads, and doesn’t appear to handle representing individual contributions. Also, it appears to be under construction…

update: I was thinking about the overly-large-circle problem, and wondered what the diagram would look like if it was laid out more like an autoradiogram, with opacity of a block indicating the “size” of a contribution, and symbols overlaid to represent data like contributor and potentially coding info…

Size of contribution (wordcount) is the opacity of each block. The coloured circle represents the contributor (white is instructor, black is anonymous, etc…) This representation makes it harder to see at a glance, but probably displays the conversation more accurately.
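One way the wordcount-to-opacity mapping could work is sketched below. The exact scaling used in the diagram isn't stated, so this assumes a linear scale against the longest post, plus a small minimum opacity so very short posts don't vanish; the `block_opacity` name and the `floor` parameter are made up.

```python
def block_opacity(word_count, max_words, floor=0.1):
    """Map a wordcount to an opacity between `floor` and 1.0, linearly."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    # Clamp to [0, max_words] so out-of-range counts stay inside the scale.
    clamped = max(0, min(word_count, max_words))
    return floor + (1.0 - floor) * (clamped / max_words)

# Hypothetical wordcounts for a handful of posts in one thread.
word_counts = [20, 100, 200]
longest = max(word_counts)
for words in word_counts:
    print(words, round(block_opacity(words, longest), 2))
```

Because every block is the same size, a post five times longer than its neighbour just looks darker rather than crowding the next lane.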

update 2: working in some of Tim’s suggestions via his comment, I came up with this version. It’s a little closer to Rosling’s work. Now, I need to figure out how to indicate the CoI coding for each post…

update 3: I put all of the metadata from the Blackboard discussions, and one WordPress site, into OmniGraphSketcher to see what it would look like. Some interesting things become apparent:

Blackboard posts (and responses) are circles, WordPress posts (and comments) are diamonds. At a glance, discussion board interactions appear to be briefer – fewer words – and more immediate (posts usually occur within a few days, and then stop). Blog posts appear to be longer (more words), and extend conversation over a longer period – with several days being common between post and comment. The WordPress blog posts also appear to have elicited longer responses via comments (at least in the first WordPress site I entered data for…)

Visualization tools that may be useful:

  • SNAPP – works with major LMS applications, but appears to not like our old version of Blackboard (Bb8), and doesn’t grok WordPress, so couldn’t be used to visualize my entire data set.
  • Meerkat – sounds like it might support custom data imports. I’ve signed up for an account so I can try it out.
  • AGNA
  • DiscoverText

Ben Cowie on introducing first-year geoscience students to the primary scientific literature in a large classroom setting

What a fantastic series of posts by Dr. Ben Cowie, a geology prof here at the UofC. He worked with his first-year undergrads on going to the primary research lit, rather than just settling for teh wikarpedia.

  • Part 1: the motivation and desire to initiate this program
  • Part 2: the implementation of the work
  • Part 3: details of how the students handled the material and what the most commonly used strategies were for the students

so good. so happy he’s teaching here at UCalgary, and that he’s blogging the stuff he’s doing.

Motion capture

I bumped into a computer science prof who was lugging a cart to get coffee. On the cart was a big homebrew remote controlled car, with a Microsoft Kinect sensor strapped to it. Turns out, they use it to capture the motions of athletes. They use the data both to analyse the motion later, and to provide immediate audio feedback to the athlete. They’re working on a model that can follow speed skaters, at full speed, around the track. Awesome!