Emerson Murphy-Hill interviews me (part 1)

About a month ago, Emer­son Murphy-Hill (cur­rently a post-doc at UBC) asked if he could inter­view me about the chal­lenges of doing HCI research about Soft­ware Engi­neer­ing (and vice versa). I’ll post our inter­view in two parts: the first, listed here, cov­ers HCI and soft­ware engi­neer­ing research and the sec­ond cov­ers pub­lish­ing in HCI venues.

Q: What are the biggest differences between HCI and software research?

Both pur­suits are very problem-driven: we want things that work, that demon­stra­bly solve issues, and move prac­tice and our under­stand­ing of prac­tice for­ward. Not only that, but both pur­suits are largely inter­ested in solv­ing the same prob­lem: we want to find ways of cre­at­ing soft­ware and tech­nol­ogy that achieves its require­ments and ulti­mately serves cus­tomer and user needs.

Where HCI and software research differ is in their methods. Most HCI research focuses on understanding the context being designed for and using this understanding to drive innovation. Software research often works the other way around, seeking innovative technological approaches to well-established problems, but not often doing formative research to discover new problems. The other difference between the two pursuits is the breadth of their methodological toolboxes. HCI researchers will use whatever method is appropriate for a research question, whether that means controlled experiments, interviews, ethnographic field work, or any number of theoretical frameworks and formalisms. Software research tends to be more restricted methodologically, focusing mainly on first-order logic and quantitative empiricism. In my view, this restricted set of epistemological tools prevents software researchers from seeing the larger context of the problems we try to solve.

Q: It sounds like you are suggesting that software research could make more of an effort to do formative research to find problems. Are there any areas in software research where you think we don’t truly understand the nature of the problems?

I think that soft­ware research, as a whole, severely under­es­ti­mates the role of com­mu­ni­ca­tion, coor­di­na­tion, and man­age­ment in suc­cess­ful soft­ware projects. The impor­tance of this fac­tor has been claimed for decades and stud­ied infre­quently for nearly 20 years, but it plays such a minor role in most soft­ware engi­neer­ing research. This is sur­pris­ing, since most other engi­neer­ing dis­ci­plines focus heav­ily on the actual human process of engi­neer­ing dif­fer­ent goods.

Another area that is understudied is the notion of requirements and what they actually mean. Ultimately, the role of software in humanity is to support humanity, but more often than not, software engineers let the medium, rather than humanity, dictate the design. There are interesting connections between Requirements Engineering and HCI, in that both seek to elicit requirements, but using different methods. I’ve seen very little work that bridges these different approaches to software design. Generally, most software research focuses on “getting the design right” rather than “getting the right design.”

Q: I noticed that you men­tioned that soft­ware research has begun to use quan­ti­ta­tive empiri­cism, but did not men­tion qual­i­ta­tive empiri­cism. Do we not use both?

In my expe­ri­ence, soft­ware engi­neer­ing researchers are highly skep­ti­cal of qual­i­ta­tive meth­ods. It is quite rare to see a paper at a top con­fer­ence that uses qual­i­ta­tive meth­ods exclu­sively. I’ve per­son­ally received reviews that sug­gested I con­vert my qual­i­ta­tive obser­va­tions into num­bers in order to make them more objec­tive (which, epis­te­mo­log­i­cally, is both inef­fec­tive and naive). Unfor­tu­nately, there are some ques­tions for which quan­tifi­ca­tion is insuf­fi­cient. For exam­ple, if you want to know how a soft­ware devel­op­ment team selects which bugs to address first, what do you mea­sure? Some objects of study are processes and activ­i­ties, with no sin­gle mea­sur­able dimension.

I understand why the community is skeptical: we come from quantitative traditions, and most software engineering researchers only have a vague idea of what sociologists and anthropologists do. It’s unfortunate that at the moment, to publish research about inherently qualitative phenomena, one has to create artificial and unhelpful measurements of those phenomena to make the work palatable.

Q: Do you have any rec­om­mended read­ing for doing research in HCI?

There’s no small set of read­ing that would suf­fice. HCI spans over 50 years, 20 con­fer­ences, and at least 20 jour­nals. CHI, the lead­ing HCI con­fer­ence, is the sec­ond largest ACM con­fer­ence, sec­ond only to SIGGRAPH. Decid­ing what to read can be daunt­ing and there’s really no way to reduce its complexity.

Instead, I’d sug­gest that learn­ing to do research in HCI is more about choos­ing which meth­ods you want to excel at. Per­son­ally, I focus on user inter­face design, eval­u­a­tion, and empir­i­cal research, and that only cov­ers a small sub­set of the kinds of meth­ods that peo­ple use in HCI.

I can rec­om­mend some books, which offer some per­spec­tive on the mind­set of HCI researchers. For exam­ple, Bill Buxton’s Sketch­ing User Expe­ri­ences [2] is a fan­tas­tic look at what it means to design sys­tems and how find­ing the right design (the HCI part) can greatly sim­plify get­ting the design right (the soft­ware engi­neer­ing part). I very much sub­scribe to his per­spec­tive (which isn’t sur­pris­ing, since Bill unof­fi­cially advised my advi­sor, Brad Myers, at Toronto).

Q: At what point in the research process should researchers con­sider what venue to sub­mit to? Should the venue influ­ence how you con­duct your research?

There are definitely more experienced people to ask than me! But perhaps I have a fresh perspective on the issue, as I straddle the boundary between the HCI and software engineering worlds. Ideally, researchers would select important, interesting problems and publish the work when it’s done. The venue should only matter once one knows what the contribution is and which communities would appreciate it.

Unfortunately, the conference culture in both HCI and software engineering tends to incentivize work of limited and conservative scope. This has been discussed across a variety of venues, including HCI articles and conference panels, as well as several ICSE keynotes and papers. On top of that, a lot of good work gets rejected because it’s not yet fully formed. I believe that journals, with their multiple rounds of review and lack of deadlines, offer a healthier process with which to vet and disseminate academic research.

Q: Tasks used in HCI stud­ies often appear to require lit­tle domain exper­tise and can be con­ducted in a short amount of time whereas soft­ware stud­ies often require sub­stan­tial domain exper­tise and can be dif­fi­cult to struc­ture to com­plete in a short amount of time. Is this state­ment true in your expe­ri­ence and if so, how have you man­aged the issue?

I don’t think this is a fair characterization of HCI research. In the past, HCI has focused a lot on novice tasks, partly because user interfaces were so bad; there is also a subset of HCI research that focuses on input techniques, which are more amenable to experimentation because of the more limited variance in human motor performance. But in the past few decades, there’s been a broad focus in HCI on supporting experts and expert teamwork in a variety of domains. Designing studies to support these activities is just as difficult as, if not more difficult than, designing studies to evaluate software tools. This is one reason why HCI has adopted so many other kinds of methodologies: one can’t design a controlled experiment to learn how first responders use cell phones to coordinate. We have the same challenges when designing controlled experiments to learn about coordination in software teams.

I deal with this chal­lenge in my own work in a few ways. First, like researchers in all other empir­i­cal fields, I care­fully design my mea­sure­ments, stat­ing their lim­i­ta­tions and poten­tial con­founds, and then move for­ward despite the threats. The ulti­mate prod­uct of any empir­i­cal work is not the one per­fectly designed study, but a large col­lec­tion of stud­ies that repeat­edly demon­strate con­sis­tent and con­ver­gent results across a vari­ety of con­texts and with a vari­ety of oper­a­tional­iza­tions. There is still an atti­tude in soft­ware engi­neer­ing research that a sin­gle study should suf­fice; we need to move away from that view and start to plan for decades of study and exper­i­men­ta­tion on fun­da­men­tal issues.

Another way that I deal with this challenge is to design studies that explain what my tools do for people and how they do it. For example, I’m designing a study at the moment with James Fogarty and Kayur Patel to evaluate how their integrated classifier development environment helps developers find bugs. The goal of the study is less about showing a difference in success (since success in the real world depends on too many other factors) and more about explaining what the tool does differently that contributes to developers’ success. To do this, we’re asking participants to verbally state changes in their goals, and associating these shifts in goals with the use of different parts of the tool. This way, the study result is not “participants were more successful,” but “participants were more successful because they spent more time confirming fewer hypotheses.” This is the kind of knowledge that helps design other debugging tools.

Q: So do you think that any parts of HCI or soft­ware research will have a last­ing impact?

Well, this is a controversial topic within HCI, but I personally believe that there is fundamental HCI research and then there are applications of HCI methods (which are actually the methods of other communities, such as cognitive psychology, anthropology, and design). My body of work, for example, is largely an application of HCI methods to the problems of software engineering practice. I view the core areas of HCI as input and output devices and anything else having to do with feedback and interactivity. This latter category has had and will continue to have lasting impact.

Soft­ware engi­neer­ing, like HCI, has made sev­eral foun­da­tional con­tri­bu­tions to prac­tice, such as ver­sion con­trol, lim­ited forms of model check­ing, com­pil­ers, debug­gers, and devel­op­ment envi­ron­ments. How­ever, many of the coor­di­na­tion, plan­ning, and man­age­ment aspects of soft­ware engi­neer­ing have moved along largely with­out the help of research. I think the chal­lenge for soft­ware engi­neer­ing research is to rec­og­nize that many of the fun­da­men­tal chal­lenges in prac­tice are human chal­lenges, and that many basic soft­ware engi­neer­ing tools must be designed with these chal­lenges in mind.

One philosophical issue surrounding the future of both applications-driven HCI research and software engineering is whether the domains we study and design for are moving targets. Psychology, medicine, and the natural sciences operate under the assumption that people and nature don’t change in their fundamental nature (or at least not very quickly). This makes it possible to advance knowledge with empirical study over the course of 100 years. Can we make the same assumptions for the nature of coordination in software development? Are there really fundamental, unchanging aspects of software engineering practice, or are all of the challenges we observe ephemeral? This is an open question that neither HCI nor software research has begun to address.

Q: You mention that doing HCI studies is hard. How might one get started doing an empirical evaluation for the first time, considering both the need to get useful results and the high likelihood of making a mistake?

To really get good at empir­i­cal eval­u­a­tion, a lot of things are nec­es­sary. First, find an expert at empir­i­cal eval­u­a­tion who is inter­ested in apply­ing their skills out­side of their con­tent area. These might be sta­tis­ti­cians, exper­i­men­tal psy­chol­o­gists, or researchers in pol­icy depart­ments. Sec­ond, get a good book about epis­te­mol­ogy: there’s no end of gen­tle intro­duc­tions to the power and per­ils of mea­sure­ment. I rec­om­mend The Num­bers Game [1] for an intu­itive sense of the com­plex­ity of mea­sur­ing things. The key thing is to learn to be extremely skep­ti­cal about the valid­ity, reli­a­bil­ity, and seman­tics of measurement.

The rest of the chal­lenge is know­ing your audi­ence. Do you really need an exper­i­men­tal study to sup­port your claims? Or would find­ing one per­son to adopt your tool for a week suf­fice? Do you really need to demon­strate causal­ity, or are there other more press­ing ques­tions that might be inter­est­ing to inves­ti­gate? There are lots of ways to gain con­fi­dence that your design choices were good by some measure.

What’s surprising?

A common complaint about research is that it’s not “surprising.” For example, a reviewer might say, “The study was well done, but the results weren’t really that surprising,” or, “I found the results a bit predictable.”

But what do these state­ments really mean? Do they mean, “Had you asked me the research ques­tion, I could have guessed the results with some degree of con­fi­dence.”? Or, “If you asked your research ques­tion of 100 experts, 95 of their guesses would have been right.”?

Maybe we might intend for them to mean that, but they don’t actu­ally cap­ture what hap­pens when a reviewer reads a paper. What usu­ally hap­pens is the reviewer reads the research ques­tion and thinks, “Hm, I could guess, but I’m not sure.” Then, upon read­ing the results, the reviewer thinks, “Well of course, that’s not sur­pris­ing at all.” The test exe­cuted here is not whether an expert can con­fi­dently pre­dict the answer to a research ques­tion, but whether in hind­sight it seems plau­si­ble that an expert could have guessed the result.

In this sense, what makes a result “sur­pris­ing” has less to do with what we know as sci­en­tists and more to do with what we think we know about what other researchers know.

This social fabric that apparently underlies our judgements of what is known has other interesting effects on what is accepted as advancing knowledge. For example, that some finding has not been published is rarely a satisfactory argument for why something should be published. What underlies this belief is that it is not our goal as scientists to document everything that we know. Instead, it is our job to document the subset of what we know that is interesting, important, and surprising.

But aren’t most judgements of what is interesting and important grounded in the present? How are we to know what is interesting or important in the future? Who are we to judge that the future of humanity will find no interest in the uninteresting, unimportant results of today? Take, for example, a recent review I wrote on a paper about using multitouch, tabletop displays for engineering design. I argued that it was unclear what problem was being solved. But what if it solves a problem that doesn’t exist yet? Or what if it solves it in such a way that another problem I hadn’t even thought of becomes trivial? On what basis could I really judge whether the work would have future worth?

All of this makes me think I don’t give papers a fair shake. Maybe I’ll adopt a new reviewing protocol: instead of reading the paper straight through and recording my thoughts, I’ll look at the authors’ research question and try to answer it myself for five minutes. Then, I’ll read the paper and if they came up with a different solution or answer than mine (that is of course reliable, sound, etc.), whether or not I’m surprised, the authors get credit for discovering or inventing something that I didn’t know. Of course, if I guessed their results or solution in a mere five minutes, what could they possibly have contributed?