the semblance of objectivity in numbers

I just received my first ever first-authored conference paper rejection from FSE. The primary reasons, quoted from the reviews, include:

  • “The qualitative nature of the study … is liable to misinterpretation and bias.”
  • “I was expecting a quantitative analysis: is there any correlation between some of the characteristics and between [the results] and the time a bug takes to resolve and its resolution status?”
  • “I would have thought that what types of elements to look for in discussion should be decided before by the researchers as it should be based on the problem”
  • “I was expecting concrete advice on HOW the tools should structure the discussion.”

I had hoped the reviewers would be more epistemologically informed. For example, the first and second quotes are quite telling: they imply that some forms of empiricism are not subject to misinterpretation or bias. But quantitative empirical measures are just as subject to bias as any other measure. Suppose I had counted certain kinds of data and run correlations between these counts and other outcome measures: not only would one in twenty of the correlations have been “statistically significant” by chance, but whether the variables carried any real meaning would depend on the construct validity of the quantitative measurements. For example, if I had correlated hyperboles with bug resolution time, not only would the hyperbole measure have the same limitations it had as a qualitative classification, but bug resolution time is shaped by any number of contextual factors that could mask whatever effect hyperbole actually had on consensus. Transforming empirical observations into numbers does NOT make them objective, nor does it prevent bias and misinterpretation.
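To make the one-in-twenty point concrete, here is a minimal simulation sketch (Python, with entirely made-up numbers: 200 hypothetical bug reports and 20 unrelated “feature counts”, none of it drawn from our actual data). It correlates each random feature against a random resolution time and reports how many come out “significant” at p < 0.05 purely by chance:

```python
# Illustrative only: none of these variables come from the study.
# The 20 feature counts and the resolution time are generated independently,
# so any "significant" correlation is a false positive.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_bugs = 200       # hypothetical number of bug reports
n_features = 20    # hypothetical number of counted discussion features

resolution_time = rng.exponential(scale=30.0, size=n_bugs)        # made-up "days to resolve"
feature_counts = rng.poisson(lam=3.0, size=(n_features, n_bugs))  # made-up per-bug counts

false_positives = 0
for counts in feature_counts:
    r, p = pearsonr(counts, resolution_time)  # a correlation that should not exist
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_features} unrelated features correlate "
      f"'significantly' with resolution time at p < 0.05")
```

On average about one of the twenty comes out “significant” despite there being no relationship at all, which is exactly the trap that uncorrected correlation hunting sets.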

The third quote is ironic: this reviewer seems to believe that the only way to analyze a problem is to make some assumption about its nature upfront. The whole point of qualitative research is that the more upfront assumptions you make, the more you bias your findings. What this reviewer proposed would have lessened the objectivity of the results and prevented us from uncovering the trends we did.

The last quote reveals the systemic bias in software engineering research (and also in some HCI venues): qualitative studies are only valuable if they explicitly inform design. What this really reduces to is a view that material goods are real work, but the production of knowledge comes for free. Building a system or automating some activity, even if the system and automation are entirely impractical in the real world, is treated as more valuable than understanding the real world. The comment also reveals the reviewer’s lack of understanding of design: innovations don’t come from studies; they come from people. Studies can support design decisions (and the results throughout our rejected submission have been quite valuable in our current design efforts), but they cannot generate ideas. People generate ideas.

Had I really wanted the paper in, I would have littered the submission with arbitrary but seemingly objective quantifications and correlations of our data (which is what most quantifications in software engineering papers amount to). This has worked in past papers and is a tried-and-true workaround for the software engineering community’s lack of experience with qualitative methods. Reviewers would have thought, “I don’t get all of this qualitative stuff, but these numbers are great.” I decided not to do this on principle, since doing so would only have made the results seem more objective without adding any real objectivity.

So much for principle. Time to start correlating things!

10 thoughts on “the semblance of objectivity in numbers”

  1. Pingback: Bad surveys « Catenary

  2. I have been a member of the PC that so frustrated you; and as future ESEC/FSE program chair, I am concerned about your implications. I cannot confirm a systemic bias against qualitative research (nor against quantitative research, for that matter). What I am looking for in a PC is causation, not correlation; and for causation, you not only need a well-established (quantitative) correlation, but also a (qualitative) theory on why things are as they seem. And the reason we’re looking for causation is that indeed, we’d like results to be actionable — such that people can change the cause in order to influence the effect. (In another sense, these papers carry information rather than just data.)

    Few papers excel in establishing causality, but they tend to dominate papers that are less actionable. And if a paper provides only the initial steps towards further research, it will be dominated by research that has gone all the way.

    (I don’t have access to your submission or its reviews, as I have a conflict with UW; so none of this need apply to your paper. But if you send the material to me, I’ll be happy to give you some extra advice.)

    Best — Andreas

    • Thanks for your insights, Andreas. I completely agree about the importance of establishing some form of causation. Software engineering, as a design discipline, is about changing and improving practice, and understanding the causal relationships between the variables we see in practice is the most reliable way to impact practice. I fully support this perspective.

      Unfortunately, I’ve found the software engineering research community to have a very limited notion of causality. First, many of the reviews I’ve seen seem to think that one can *establish* causality. Empiricism can do no such thing. We can only gain confidence in it. And in fact, as any good epistemologist will tell you, most of our confidence comes from convergent validity: using a variety of methods and measures to study the same phenomenon and finding the same underlying truth through each lens. If the software engineering research community continues to limit publishable methods to quantitative empiricism, we will have a very skewed (and I would argue shallow) understanding of practice.

      And it’s not like the quantitative empiricism in software engineering research is that good (though it is getting better). Testing theories and demonstrating causality requires experiments and replications of experiments. These rarely get published because of concerns that there’s nothing new being established, yet a well-designed study with a null result is often quite important in confirming our understanding of the world. Without such work, we practice weak empiricism.

      Yet there is a larger, more systematic problem in what work the community values. Currently, the community only incentivizes work that *tests* theories, but has little interest in (qualitative) work that generates new theories. The whole point of qualitative research is to rigorously extract new theories from empirical observation. This was the goal of my paper, but the reviewers were of the mind, “These theories are great and we will use them to inform tool and process design, but I don’t see any real work here.” Apparently, three months of rigorous analysis of over 10,000 bug report comments isn’t work. Unless those analyses involved numbers 🙂

      I think it’s perfectly reasonable to have work that “goes all the way” dominate, especially work that transforms our understanding of how to improve software engineering. But so much of this work lacks any substantive shift in our view of how software engineering work happens and how it can be improved. These shifts in understanding don’t come from narrow experiments or clever automations. They come from deep, rigorous analysis of what makes software engineering challenging.

      This brings me to a larger critique of PCs and conferences in general. Because our work is archived in conferences, which have limited slots, the goal of a PC is unfortunately to choose the “best” of the submitted work rather than all work above some level of quality. If we were a discipline interested in creating generalizable, broad knowledge about software engineering, we would structure our modes of dissemination to ensure that all high-quality knowledge is spread to the larger community. Currently, the artificial cutoffs imposed by conference venues mean that we only spread the most conservative of high-quality work. Work that applies new methods or tests a plausible but exotic theory rarely finds a home.

      I’m actually not bitter; I predicted the paper would be rejected for the methods it used. I just hope for a day when software engineering research uses a wider variety of methods to understand and impact software engineering work. From my view, the field is still in its infancy from a scientific perspective, with most researchers confusing “qualitative” with “subjective”.

  3. Your third point has been bothering me for the past month: are studies worthwhile if they fail to inform design?

    My current work has no clear path for informing design (right now). In the long term, it may, but I can’t articulate how at the moment.

    Is it worth pursuing? I think so. But if I can’t articulate a clear motivation in the form of informing design, how can I sell it to a committee or funding source?

    • I think the issue is that studies always have implications, but they don’t always have implications for design. Sometimes the implications are for process, organization, or training. The problem is that most SE conferences don’t really view these other forms of implications as interesting or relevant.

  4. I suspect that ESEM would have been more receptive to this work, especially given Carolyn Seaman’s involvement in the community. Admittedly, ESEM is not considered as prestigious as FSE, but you get to maintain your principles!

    • That’s a good suggestion. We just submitted a revised version to CSCW today, but ESEM would be a good place if that doesn’t work out. Thanks!

    • That’s what I usually do. This time I didn’t do it out of principle, since it shouldn’t be necessary 🙂 So much for principle, back to pragmatism.

  5. Pingback: Andy Ko and the semblance of objectivity in numbers « Catenary
