discussion network visualization

I just put together some quick network maps of the online discussions from my thesis research data. I haven’t done any analysis – just some purty pictures to look for at-a-glance differences:

Both discussion platforms had about the same number of posts and responses, but the pattern of connections is markedly different for some reason…
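For the curious: maps like these are easy to sketch in Python with networkx. A minimal sketch – the reply list below is made up for illustration, not my actual data:

```python
# Rough sketch: draw a reply network from (poster, replied-to) pairs.
# The edge list is invented for illustration.
import networkx as nx
import matplotlib.pyplot as plt

replies = [
    ("instructor", "student_a"),
    ("student_a", "student_b"),
    ("student_b", "instructor"),
    ("student_c", "student_a"),
]

G = nx.DiGraph()
G.add_edges_from(replies)

# Scale node size by degree, so busier participants stand out.
sizes = [300 * (1 + G.degree(n)) for n in G.nodes()]
nx.draw_networkx(G, pos=nx.spring_layout(G, seed=42), node_size=sizes)
plt.axis("off")
plt.show()
```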

aggregated metadata for online discussions

here’s a quick look at the aggregated metadata for all of the online discussions I’m using in my thesis:

About the same number of posts in each platform, with a bit more of a time-spread in the WordPress discussions, substantially longer posts in WordPress, about the same (non) use of images, more links in WordPress posts, and more attachments in Blackboard posts.

basic metadata analysis

Here’s a quick pass at analyzing the basic metadata for the online discussions.

I plotted a few calculated values (Excel pivot tables fracking ROCK, BTW…) to try to compare activity patterns. What’s interesting in this graph is the average wordcount (green line) – low for the Blackboard discussion board threads (the left 5 items) and markedly higher for the 8 student blogs (the right 8 items).

The number of posts in each discussion (dark blue line) is relatively consistent across all discussions. Slightly lower for the WordPress blog sites, but not dramatically so.

Also interesting is the red line – the standard deviation of the “day of course” for posts. It’s a rough estimate of how rapidly posts occur – a low standard deviation indicates the posts occurred relatively close together on the calendar, while a high value indicates they were spread over a longer run of days. This suggests that Blackboard posts were added in brief, rapid bursts, while the WordPress posts and comments accumulated over longer durations. People kept coming back to blog posts long after they were started. Interesting. There could be a number of reasons for this – it’s easier to see Bb discussion boards all in one place, and easier to forget to check various blogs for activity, etc… Or do students just reflect more, and more deeply, on blogs? I’d love to find out the reasons behind the different values…
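For anyone doing this outside of Excel: the same calculated values fall out of a quick groupby in Python/pandas. A sketch, with made-up rows and hypothetical column names:

```python
import pandas as pd

# Hypothetical per-post metadata: one row per post, tagged with its discussion.
posts = pd.DataFrame({
    "discussion": ["bb-1", "bb-1", "bb-1", "wp-1", "wp-1"],
    "day":        [12, 12, 13, 10, 24],   # "day of course" the post appeared
    "words":      [45, 70, 55, 320, 410],
})

# Pivot-table equivalents: posts per discussion, average wordcount, and the
# standard deviation of "day of course" (the burstiness estimate above).
stats = posts.groupby("discussion").agg(
    n_posts=("words", "count"),
    avg_words=("words", "mean"),
    day_stddev=("day", "std"),
)
print(stats)
```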

So… The WordPress discussions occurred over longer periods, using slightly fewer posts/responses, but with dramatically longer posts than was seen in the Blackboard discussions…

full online discussion metadata visualization

I’ve finally entered all of the metadata for the online discussions I’m using in my thesis. This includes the person who posted each item, the date, and the size of the post. I worked through my earlier visualization mockup, and wanted to try it with the full set of data. So, here are the Blackboard discussions (top image) and WordPress blog posts (bottom image):


It’s only the most basic of metadata, but differences in activity patterns are already becoming apparent. Both images are on the same time and size scales. The WordPress discussions appear to use significantly longer posts and comments, spread over much more time. Blackboard discussions appear to be shorter posts, over briefer durations.
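The shared scales are what make the comparison work. A rough sketch of forcing that in Python/matplotlib, with made-up post data:

```python
import matplotlib.pyplot as plt

# Made-up posts: (thread index, day of course, wordcount) for each venue.
bb = [(1, 3, 60), (1, 4, 45), (2, 4, 80), (3, 6, 55)]
wp = [(1, 2, 240), (1, 9, 310), (2, 20, 180), (3, 33, 400)]

# sharex/sharey keeps both panels on the same thread and time scales, and
# marker size uses the same wordcount units in both, so sizes compare too.
fig, (top, bottom) = plt.subplots(2, 1, sharex=True, sharey=True)
for ax, posts, title in [(top, bb, "Blackboard"), (bottom, wp, "WordPress")]:
    threads, days, words = zip(*posts)
    ax.scatter(threads, days, s=words, alpha=0.5)
    ax.set_title(title)
    ax.set_ylabel("day of course")
bottom.set_xlabel("thread")
plt.show()
```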

Next up, I get to code each post for Community of Inquiry model “presences” – as described by indicators for social, cognitive and teaching contributions in the posts. I’ll figure out some way to overlay that information on top of the basic metadata visualization.

on visualizing online discussions

For my MSc thesis research, I’m working with a bunch of data collected through online discussions during a blended course. Part of the discussions took place using Blackboard’s discussion board feature, and part took place on students’ blogs. One of the things I need to do is document how the discussions played out, to try to tease out any differences between the two venues. I’ll be using the Community of Inquiry model to describe the social/teaching/cognitive components of posts, but I’ve been wanting to describe the flow of discussion as well. How do the discussions occur? Are there patterns of activity, in time or in size of responses? I’ve been struggling with how to document these. My thesis is really just a glorified case study, so I’ve had to constantly force myself to stop thinking of it as controlled experimental data. What I’m doing is describing the activity within a single course, in 2 venues of online discussion.

I had a bit of an epiphany this afternoon, while working through some preliminary work to prep for CoI coding. I thought about Hans Rosling’s statistical visualizations, and how he was able to incorporate several axes of data into a graph by using size, colour, shape, etc…

And then it hit me – it would be relatively straightforward to apply that approach to the data documenting an online discussion. The timestamp data is there. The info about the individual is there. Basic “demographic” data is there (number of words, types of things included – images, links, attachments, media, etc…), and if I combine those, I get something like this:

On this rough mockup visualization, time is the vertical axis, transformed into a simple “number of days” integer. The horizontal axis is “threads of discussion.” This displays the discussion in a “FAQ” discussion board used in the course. There were 9 primary threads (plus one forked thread).

Each circle represents a post. The size of the circle represents the number of words in the post or response – in this mockup, I just did a simple conversion where the number of words directly translates into the width of the circle (a post with 100 words is 1.00″, a post with 50 words is 0.50″, a post with 150 words is 1.50″, etc…). The colour of the circle indicates the person who posted it: white circles are the instructor, black circles are anonymous students (who did not provide consent to participate in the research, so the content of their posts was deleted from my working archive), and other colours indicate individual students.
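The encoding itself is simple to script. A sketch of the same mapping in Python/matplotlib (the posts and names are made up) – one wrinkle is that matplotlib sizes markers by area, so the wordcount has to be squared to reproduce the mockup’s direct words-to-width conversion:

```python
import matplotlib.pyplot as plt

# Made-up posts: (thread, day of course, words, poster).
posts = [(1, 2, 100, "instructor"), (1, 5, 50, "student_a"),
         (2, 3, 150, "anonymous"), (2, 4, 75, "student_b")]

colours = {"instructor": "white", "anonymous": "black",
           "student_a": "tab:blue", "student_b": "tab:orange"}

for thread, day, words, poster in posts:
    # scatter's `s` is marker *area* in points^2, so squaring the wordcount
    # makes the circle's diameter scale linearly with words, matching the
    # mockup's direct words -> width conversion.
    plt.scatter(thread, day, s=(words / 10) ** 2,
                c=colours[poster], edgecolors="grey")
plt.xlabel("thread")
plt.ylabel("day of course")
plt.gca().invert_yaxis()  # time runs down the page, like the mockup
plt.show()
```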

This is a very rough mockup. I’m hoping to refine it a bit more, to include a way to represent the CoI coding for each message – an indicator of the relative social/cognitive/teaching aspect of the post, as well as a way to indicate other interesting things about a post (how many images/links/attachments/embedded media were included? etc…)

Problems with the mockup:

  1. It’s messy when posts occur close together. Overlap makes the circles obscure each other.
  2. The literal translation of wordcount to size means larger posts overwhelm the other posts in the diagram, over-representing the difference seen in the actual discussion (a post that is 5× the size of another doesn’t necessarily drown out the other posts, but it is given prominent emphasis in the diagram…). One workaround is sketched after this list.
  3. Forking of threads could get confusing – how to best indicate the branch points? I tried with a dotted line, but it’s unclear which post/circle it originates from…
  4. Threads that are displayed beside each other may not be directly related, but they can appear intertwined because of overlapping circles (a large post in thread 6 overlaps threads 5 and 7, etc…).
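One possible workaround for problem 2 (a sketch, not something I’ve settled on): scale the circle’s area to the wordcount, or compress it with a log, so a 5× post still reads as bigger without swallowing its neighbours:

```python
import math

def circle_width(words, mode="area"):
    """Convert a wordcount to a circle width in inches.

    'linear' is the original mockup mapping (100 words -> 1.00").
    'area'   makes the circle's area, rather than width, track the wordcount.
    'log'    compresses further, for very long posts.
    """
    if mode == "linear":
        return words / 100
    if mode == "area":
        return math.sqrt(words / 100)
    if mode == "log":
        return math.log1p(words) / math.log1p(100)
    raise ValueError(mode)

# A post 5x the length of another no longer gets a circle 5x as wide:
print(circle_width(100), circle_width(500))                 # 1.0 vs ~2.24
print(circle_width(100, "log"), circle_width(500, "log"))   # 1.0 vs ~1.35
```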

I’d like to extend the mockup, after figuring out ways to get around these issues, to show all posts in all discussions in the entire course. It should be interesting to see the temporal overlap between discussions, and see some data about patterns of interaction from participants across the entire thing – does a given participant start most threads? do they respond with giant posts? do they stay in one CoI aspect, or do they cover the whole thing? etc…

I would love to see a large visualization, with vertical lanes for each thread in an entire course, across all venues of online discussion, with posts displayed as shown above, and with the CoI coding indicated. What better way to compare activity across discussions in a course?

It strikes me that this visualization is extremely simple – perhaps too simple? perhaps so obvious in hindsight that someone else has already come up with a solution? Scott Leslie sent me a link to Boardtracker, which looks extremely interesting, but it looks like it’s strictly based on time and not threads, and doesn’t appear to handle representing individual contributions. Also, it appears to be under construction…

update: I was thinking about the overly-large-circle problem, and wondered what the diagram would look like if it was laid out more like an autoradiogram, with opacity of a block indicating the “size” of a contribution, and symbols overlaid to represent data like contributor and potentially coding info…

The size of a contribution (wordcount) is mapped to the opacity of each block. The coloured circle represents the contributor (white is the instructor, black is anonymous, etc…). This representation makes it harder to read at a glance, but probably displays the conversation more accurately.
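The opacity mapping is nearly a one-liner: normalize the wordcount and clamp it so short posts don’t vanish entirely. A sketch in Python/matplotlib, with made-up numbers:

```python
import matplotlib.pyplot as plt

# Made-up posts: (thread, day of course, words).
posts = [(1, 2, 40), (1, 5, 500), (2, 3, 150), (2, 9, 900)]
max_words = max(w for _, _, w in posts)

for thread, day, words in posts:
    # Wordcount drives opacity: faint blocks are short posts, dense blocks
    # are long ones. Floor at 0.1 so tiny posts stay visible.
    alpha = max(0.1, words / max_words)
    plt.bar(thread, 1, bottom=day, width=0.8, color="tab:blue", alpha=alpha)
plt.xlabel("thread")
plt.ylabel("day of course")
plt.gca().invert_yaxis()
plt.show()
```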

update 2: working in some of Tim’s suggestions via his comment, I came up with this version. It’s a little closer to Rosling’s work. Now, I need to figure out how to indicate the CoI coding for each post…

update 3: I put all of the metadata from the Blackboard discussions, and one WordPress site, into OmniGraphSketcher to see what it would look like. Some interesting things become apparent:

Blackboard posts (and responses) are circles, WordPress posts (and comments) are diamonds. At a glance, discussion board interactions appear to be briefer – fewer words – and more immediate (posts usually occur within a few days, and then stop). Blog posts appear to be longer (more words), and extend conversation over a longer period – with several days being common between post and comment. The WordPress blog posts also appear to have elicited longer responses via comments (at least in the first WordPress site I entered data for…)

Visualization tools that may be useful:

  • SNAPP – works with major LMS applications, but appears to not like our old version of Blackboard (Bb8), and doesn’t grok WordPress, so couldn’t be used to visualize my entire data set.
  • Meerkat – sounds like it might support custom data imports. I’ve signed up for an account so I can try it out.
  • AGNA
  • DiscoverText

Notes: Jyothi, McAvinia & Keating: A visualisation tool to aid exploration of students’ interactions in asynchronous online communication

Jyothi, S., McAvinia, C., & Keating, J. (2012). A visualisation tool to aid exploration of students’ interactions in asynchronous online communication. Computers & Education, 58(1), 30–42. doi:10.1016/j.compedu.2011.08.026

Abstract

This paper describes a visualisation tool to aid the analysis of online communication. The tool has two purposes: first, it can be used on a day-to-day basis by teachers or forum moderators to review the development of a discussion and to support appropriate interventions. Second, the tool can support research activities since the visualisations generated provide the basis for further qualitative and quantitative analysis of online dialogue.

The visualisation software is designed to encode interaction types simply and quickly. The software was tested and then used to analyse data from a sample of forums within the Moodle VLE. The paper discusses both the method of visualisation and analysis of the online interactions as a pilot for further research analysing interaction in discussion forums.

Intro

This paper describes the design and implementation of a diagnostic tool which provides simple visual representations of the exchanges in asynchronous discussion forum threads. The visual representation is shown within a webpage, with hyperlinked nodes displaying the body text of messages posted to discussion forums. These graphical images might assist a teacher or moderator to intervene in the discussions whenever necessary, and the visual representations of online discussions can support researchers undertaking further analysis [1].

Analysing asynchronous discussions in online environments

Given the importance ascribed to dialogue and CMC in educational theory, it follows that a means of reviewing and potentially analysing CMC interactions would therefore be useful to teachers and researchers, and research would benefit from an evidence base showing that online interactions had positive effects on students’ learning. However, the best ways of analysing CMC are not clear. Studies that have analysed the content of the online discussions are also limited. This may be due to the time required to perform such analyses (Hara, Bonk, & Angeli, 2000) and the lack of a reliable instrument or an analytical framework to analyse the online discussions. As Goodyear (2001) notes:

Analysing the content of networked learning discussions is a troublesome research area and several commentators have remarked on the difficulty of connecting online texts to discourse to learning. (Goodyear, cited in Mehanna 2004: 283)

on assessing online discussions:

Formal assessment offers one indication of students’ learning, and online dialogue may then be argued to have supported this. However, unless the method of assessment includes the forum discussion in some way, it is not usually clear where and how learning in forums may have happened. Course feedback and evaluation mechanisms, similarly, may highlight the use of discussion forums as a useful supplement or yield examples of how students have used them, but ‘use’ cannot be equated with learning. Some researchers have instead proposed treating forum messages as qualitative data, and thereby draw on qualitative methods for analysis.

why build a tool to automate analysis/visualization of discourse?

Even for people accustomed to using qualitative methods as part of their research activities, they may be time-consuming to use in the context of evaluating learning in CMC. The methodological difficulties of analysing discussion forum data are therefore compounded by the practical constraints of time and experience. These issues have wider implications for the evidence base in e-learning: it is difficult to build up case studies of appropriate and effective use of technology to enhance learning, where practitioners lack the tools to make these studies.

[screenshot from the paper]

So, VIMS looks pretty awesome at this… Unfortunately, I can’t seem to find a fracking thing about the tool itself…

WTF is VIMS? No project website found, but the paper describes it:

VIMS provides real-time, radial-tree visualisation of the forum interactions, realised using a combination of SVG (Scalable Vector Graphics) using Perl with JavaScript. Visualisation maps are presented as interactive scalable images, viewable using most web browsers; the version described here can be seamlessly incorporated into Moodle. The technologies combined in VIMS allow the visualisation to have ‘hot spots’, on which the mouse can hover to access full details of a message. There is a continuous link between the image and the web server, implemented using AJAX, which means that the visualisation will change according as new messages are sent to the forum. An algorithm within the software depicts borders, differentiating between the threads of a discussion forum.

and the visualizations look something like:

[screenshot: VIMS radial-tree visualisation, from the paper]

on the role of VIMS:

VIMS has considerable advantages as a visualisation tool. First, the discussions are shown in a systematic way, with the people starting the discussion placed at the first level. There is no on-screen clutter from message text and all threads in a discussion forum can be viewed at a glance. Navigation on-screen allows the discussion to be viewed as a whole, or for the viewer to zoom in on certain areas. One or more threads can be compared easily. This visual aid could help the instructor develop a collaborative environment, by aiding him/her to visualise the active and inactive participants, and therefore inform appropriate interventions.

It is important to acknowledge the limitations of the VIMS tool too: it is in essence a support for coding and management of the data, rather than offering in and of itself a new method for analysing that data. For such analysis, we need to consider the wider model used by Schrire or indeed to pursue existing qualitative methods. VIMS does not yet allow us a way to analyse the multi-modal nature of the student discourse in unmoderated Forums, and the inclusion of images, sounds and other media which students are now accustomed to using. This is a further area of work we need to address, but one for which the other visualisation tools described in this paper are (similarly) unsuited.

Lots of other interesting papers cited in this one. Mine it.

But, I don’t understand how VIMS doesn’t appear to have a project website or information available. Is it secret sauce?

  [1] I’m wondering if this might be a useful way to display the discourse in the data I’m gathering…

tweetcloud: dnorman

Apparently, my Twitter account became the primary stress test for the cool Tweetcloud service, which crunches through every tweet posted for a given account, and generates a cloud of words ranked by frequency. Although I’ve been posting to Twitter like a madman today, they were actually able to get it to crunch my account:

Tweetcloud: dnorman
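(The core of a service like this is just word counting. A toy sketch in Python – the tweets here are invented, and Tweetcloud’s actual pipeline is surely fancier:)

```python
from collections import Counter
import re

tweets = [  # stand-in data; the real service pulls an account's full history
    "fracking spreadsheet ate my fracking data",
    "WTF is wrong with this LMS",
    "coffee first, data entry second",
]

STOPWORDS = {"is", "my", "this", "with", "first", "second"}

words = Counter(
    w for tweet in tweets
    for w in re.findall(r"[a-z']+", tweet.lower())
    if w not in STOPWORDS
)

# The top-ranked words become the biggest entries in the cloud.
for word, count in words.most_common(5):
    print(word, count)
```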

One thing that surprised me: I was sure “fracking” would be the #1 word, followed closely by “WTF”. Surprise!

Thanks to John Krutsch and Jared Stein for their work on beefing up Tweetcloud to be able to handle the sheer scale of my self-absorbed banality.