chainsawriot

Reflections of my 2018 ICA conference #2: How to be a better computational researcher?

Posted on May 31, 2018 by Chung-hong Chan


Computational methods. What a beautiful name. I do that for a living. I think it is the coolest group in the ICA. People who hang out in this group are really smart and they have a problem-solving mentality. They know how to solve some hard problems in communication science. Computational methods is arguably one of the most important branches of communication research right now. Because of its importance or popularity, other ICA divisions also devoted some sessions to computational methods during the conference. I attended the computational methods sessions organised by other divisions. To me, these sessions should be better integrated into the program of the computational methods interest group, so that researchers with experience in computational methods could review those papers. But of course, for administrative reasons, they were not integrated.

We as communication researchers can of course borrow whatever methods the computer scientists or statisticians have developed. They are cool people. But why can’t we just adopt whatever newest methods the CS or Stat people developed into our field of communication research? Before I give my own answer, I hope you can think about it a bit first. Here, I would like to give you a test. Probably a master-level research methods test.

Case 1: I read a WSJ article with the keyword “Merkel” in it, can I say the frame of this WSJ article is Angela Merkel?

I don’t know about you, but if I wrote something like that and submitted my paper to a more traditional division such as political communication or journalism studies, it would certainly be rejected. The reason for rejection is simple: my operationalisation is incorrect. A frame is a theoretical construct and shouldn’t be operationalised simply using keywords and topics.¹ Right?

So, here comes case 2.

Case 2: Suppose you use TF-IDF and a topic model to extract a number of topics from your WSJ corpus. Judging by the topic words and some top documents from each topic, you think there are topics about Angela Merkel. You then interpret the topics you found as frames. Do you think it is okay?

To me, it is not okay. It becomes more apparent when you compare case 1 and case 2. But when communication researchers are presented only with case 2, the camouflage generated by the computational methods short-circuits our critical thinking, especially for researchers who are unfamiliar with those methods. The novelty factor somehow overrides the requirement that theoretical constructs be operationalised with validity and reliability. Computer scientists are still arguing whether or not topics from topic models are human-interpretable.² The short-circuited thinking asserts that topics are human-interpretable, and also that they can be interpreted as frames.
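To make the parallel between case 1 and case 2 concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the four "headlines" are invented (not real WSJ text), the helper names are hypothetical, and the TF-IDF variant is the simple tf × log(N/df) form. It deliberately leaves out the topic model step. The only point is that the output of the computational route is still a ranked list of words, the same kind of object as the keyword in case 1.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not real WSJ text).
docs = [
    "merkel meets macron to discuss the eurozone budget",
    "merkel defends refugee policy in the bundestag",
    "fed raises interest rates as inflation climbs",
    "tech stocks slide as interest rates rise",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def contains_keyword(text, keyword):
    """Case 1: does the keyword appear in the document at all?"""
    return keyword in tokenize(text)

def tfidf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

def top_terms(doc_weights, k=3):
    """Case 2 (simplified): the k highest-weighted terms of one document."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

weights = tfidf(docs)
for doc, w in zip(docs, weights):
    # Both columns are statements about words, not about frames.
    print(contains_keyword(doc, "merkel"), top_terms(w))
```

Neither column of the printout says anything about problem definition, causal interpretation, moral evaluation or treatment recommendation. Getting from these word lists to a frame still needs a validated coding step.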

This applies not only to text analysis, but also to network analysis. For example, when one observes (or does not observe) community structure in a Twitter communication network, can we say we have evidence for or against the existence of a filter bubble? If we dig deeper into Pariser’s definition of the filter bubble,³ he talks about the filtering of “other sides” by algorithmic personalisation. The active avoidance of hearing from the “other sides”, which involves users’ agency, is not a filter bubble. I mean, you can call that something else, but filter bubble is not the correct descriptor. If we want to use community structure on Twitter as evidence of a filter bubble, we first need to prove that all interactions on Twitter are driven by algorithms with no user agency involved. This assumption is 9000% wrong and once again, computational methods short-circuit us.
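The same point can be shown with a small worked example. The sketch below computes Newman's modularity Q for a given partition of an undirected network, using Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. The six-node network and its two-community partition are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy interaction network: two triangles joined by one bridge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

# Hypothetical partition: each node mapped to a community id.
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edges, partition):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

q = modularity(edges, partition)
print(round(q, 3))
```

For this toy network Q is 5/14, roughly 0.357, so there is clear community structure. But the only inputs are the edge list and the partition. Nothing in the calculation knows whether an edge exists because an algorithm recommended it or because a user chose it, which is exactly the distinction the filter bubble definition depends on.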

Now, we come back to the question of why we cannot adopt whatever newest methods the CS and Stat people developed. To me, the reason is simple: the methods they develop are in general geared towards solving real-life problems rather than measuring theoretical constructs. A good computational communication researcher should always be mindful of that. This mindset can help us not to overinterpret the findings. This skepticism towards computational methods is healthy. I think a better approach to reviewing substantive papers with computational methods is to do a thought experiment on a parallel analogue version, like the aforementioned “case 1”. It is even more important if someone claims to measure a theoretical construct (such as frames) using computational methods. The classical psychometric theories still apply. We still demand validity and reliability of a measurement. Computational methods are not your silver bullets. In the end, topics are topics. Communities are communities. Don’t extrapolate. After all, a computer is still a machine that does calculations, not a machine that distorts reality.
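As one illustration of what demanding reliability can look like in practice, here is a minimal sketch of Cohen's kappa, the chance-corrected agreement between two sets of codes: κ = (p_o − p_e) / (1 − p_e). The frame labels and both coders' codes are made up for the example, and the function name is hypothetical. The same statistic is often used to compare machine output against human coding, although that comparison speaks more to validity than to reliability.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of nominal codes."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # p_o: observed proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each coder's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one label")
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight articles (invented labels and data).
coder_1 = ["conflict", "conflict", "economic", "economic",
           "conflict", "economic", "conflict", "economic"]
coder_2 = ["conflict", "economic", "economic", "economic",
           "conflict", "conflict", "conflict", "economic"]

kappa = cohens_kappa(coder_1, coder_2)
print(kappa)
```

Here the coders agree on 6 of 8 articles (75%), but with two equally frequent labels half of that agreement is expected by chance, so kappa is 0.5. A raw agreement rate alone would overstate how reliable the measurement is.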

¹ Entman (1993) defines framing as to “select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described.” I don’t think he says a frame is simply a bunch of keywords or topics.
² Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296).
³ In his book or his TED talk.