the semblance of objectivity in numbers

I just received my first ever first-authored conference paper rejection from FSE. The primary reasons, quoted from the reviews, include:

  • “The qualitative nature of the study … is liable to misinterpretation and bias.”
  • “I was expecting a quantitative analysis: is there any correlation between some of the characteristics and between [the results] and the time a bug takes to resolve and its resolution status?”
  • “I would have thought that what types of elements to look for in discussion should be decided before by the researchers as it should be based on the problem”
  • “I was expecting concrete advice on HOW the tools should structure the discussion.”

I was hoping the reviewers would have been more epistemologically informed. The first and second quotes are quite telling: they imply that some forms of empiricism are not subject to misinterpretation or bias. But quantitative empirical measures are just as subject to bias as any other measure. Suppose I had counted certain kinds of data and run correlations between these counts and other outcome measures. At the conventional p < 0.05 threshold, about one in twenty of those correlations would be expected to come out “statistically significant” by chance alone, even if no real relationships existed, and whether the variables carried any real meaning would still depend on the construct validity of the quantitative measurements. For example, if I had correlated hyperboles with bug resolution time, not only would the hyperbole measure have the same limitations as it did as a qualitative classification, but the bug resolution time would be influenced by any number of contextual factors, making it a poor reflection of the hyperbole’s impact on consensus. Transforming empirical observations into numbers does NOT make them objective, nor does it prevent bias and misinterpretation.
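To make the one-in-twenty point concrete, here is a minimal sketch that correlates pairs of pure random noise and counts how often the result clears the usual significance bar. Everything in it is an assumption for illustration: the function names, the sample size of 30, and the critical value of roughly 0.361 for |r| (the two-tailed p < 0.05 cutoff at that sample size). None of it uses data from the rejected paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def false_positive_rate(n_tests=2000, n_samples=30, r_crit=0.361, seed=0):
    """Fraction of noise-vs-noise correlations that look 'significant'.

    r_crit is the approximate two-tailed p < 0.05 cutoff for n_samples = 30;
    it is hard-coded here and would need to change with the sample size.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Two variables with no relationship at all: independent Gaussian noise.
        xs = [rng.gauss(0, 1) for _ in range(n_samples)]
        ys = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson_r(xs, ys)) > r_crit:
            hits += 1
    return hits / n_tests


def prob_at_least_one(k, alpha=0.05):
    """Chance that at least one of k independent null tests passes the cutoff."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    print("share of noise correlations passing p < 0.05:", false_positive_rate())
    print("chance of at least one 'finding' in 20 tests:", prob_at_least_one(20))
```

The simulated share should hover around 0.05, and under the (strong) assumption that the tests are independent, running twenty of them gives better than even odds of at least one spurious “finding.”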

The third quote is ironic: this reviewer seems to believe that the only way to analyze a problem is to make some assumption about its nature upfront. The whole point of qualitative research is that the more assumptions you make upfront, the more you bias your findings. What this reviewer is proposing would have lessened the objectivity of the results and prevented us from uncovering the trends we did.

The last quote reveals the systemic bias in software engineering research (and also some HCI venues): the belief that qualitative studies are only valuable if they explicitly inform design. What this really reduces to is a view that material goods are real work, but the production of knowledge comes for free. Under that view, building a system or automating some activity, even if the system and automation are entirely impractical in the real world, is treated as more valuable than understanding the real world. The comment also reveals the reviewer’s lack of understanding about design: innovations don’t come from studies, they come from people. Studies can support design decisions (and the results throughout our rejected submission have been quite valuable in our current design efforts), but they cannot generate ideas. People generate ideas.

Had I really wanted the paper in, I would have littered the submission with arbitrary but seemingly objective quantifications and correlations of our data (which is what most quantifications are in software engineering papers). This has worked in past papers and is a tried-and-true workaround for the software engineering community’s lack of experience with qualitative methods. Reviewers would have thought, “I don’t get all of this qualitative stuff, but these numbers are great.” I decided not to do this on principle, since doing so would have only made the results seem more objective without adding any real objectivity.

So much for principle. Time to start correlating things!