spreadsheet error costs time and money, yet again

Back in November, I got my first water, sewage, and gas bill from a company called ISTA. My apartment management had taken a while to set up the billing after the previous billing company went out of business (or dropped the contract; I don’t remember exactly what happened). So I hadn’t actually paid water and sewage for two or three months.

So when the bill came for $173, I wasn’t too surprised. I didn’t really remember what I’d paid the previous year, but this seemed reasonable for a few months of water, sewage, and gas. I wrote the check and forgot about ISTA.

Forty-five days later, I got the next bill, but this time something seemed wrong: $463, and it had only been a month and a half. What the hell was going on? I looked back at my old bills and noticed that my average 30-day bill was about $30, even in the winter. Either the company was trying to extort money from me or somebody had made an accounting error.

I looked more closely at the bill, which had three columns: previous reading, current reading, and usage. The difference between the first two columns was exactly 1,000. The value in the third column was 10,000. Was there some hidden multiplier I didn’t understand? Maybe there was some rate that happened to be 10, and I had simply kept my apartment and showers really warm this winter.

So I called ISTA and disputed the bill. They immediately escalated it to their dispute manager, who called me back after a few days. They said that there had been a misread meter, that they had corrected the reading, and that after applying the credit, the bill was now only about $270. When I got the call, I had a meeting to get to, so I didn’t think about it much.

After I got home that night, it still didn’t seem right. $270 for 45 days? What happened to the rates? They must have gone up by a factor of 10! So the next day, I called ISTA back and spoke to a nice lady about my problem. Rather than calling the dispute manager again, she told me she was opening my spreadsheet. She proceeded to walk through the calculations with me, describing the rates and the formulas. I jotted them down on paper as we went. Finally, we got to the total calculation, and she said, “so this times the multiplier is … wait, it shouldn’t be.” She immediately put me on hold.

A few minutes later, she came back, saying that she needed to have the accounting department look at my spreadsheet. My spreadsheet, implying that every customer has their own. She said that the dispute manager would, yet again, call me back in a few days.

Four days later, the dispute manager called me back and explained that there was some sort of disagreement between billing and accounting, regarding the cause of the problem. Billing thought it was the spreadsheet and accounting thought it was the meter readings. She said she’d call back in a few more business days, after they’d worked out their differences.

When she did call back, she leveled with me: accounting was wrong, there was an error in the spreadsheet, and after fixing the multiplier cell, my bill was reduced by a factor of 10. After the credit calculations, they determined that I had overpaid on the previous bill by about $100 and that I probably wouldn’t have a bill for the next two cycles. She apologized for how long it took to resolve the issue, but reassured me that it wouldn’t happen again.
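To make the arithmetic concrete, here is a minimal sketch of that kind of calculation in Python rather than a spreadsheet. The readings, rate, and names are all made up; this is not ISTA’s actual formula, just an illustration of how a single multiplier cell left at 10 instead of 1 inflates an ordinary charge tenfold.

    # Hypothetical billing calculation; the readings and rate are invented for illustration.
    def usage_charge(previous_reading, current_reading, rate_per_unit, multiplier):
        usage = current_reading - previous_reading    # e.g., a 1,000-unit difference
        return usage * multiplier * rate_per_unit     # the "times the multiplier" step

    rate = 0.03  # made-up dollars per unit
    print(usage_charge(12_000, 13_000, rate, multiplier=1))   # 30.0  -> an ordinary charge
    print(usage_charge(12_000, 13_000, rate, multiplier=10))  # 300.0 -> ten times too high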

But I wasn’t thinking about me at this point. I was thinking about all of the other customers, whose spreadsheets probably had the same error. Would the accountants audit all of the spreadsheets that copied the error? How many customers would call about the bills? How many would insist, like I did, that there was a spreadsheet error, and demand that it be properly diagnosed? And how much of this feedback would ever make it to the accountants writing the buggy spreadsheets?

Oh, end-user programming. Your manifestations in society abound.

Emerson interview (part 2); writing for HCI venues

Here is part two of Emerson Murphy-Hill’s interview with me. This part covers some of the challenges in publishing in HCI venues.

Q: A prominent proponent of empirical software engineering once told me that he typically spends a full page discussing the threats to validity of his evaluations. At the same time, it’s not unusual to find a CHI paper that doesn’t discuss threats. How does one choose which threats to include and exclude, and how to present those threats, to the CHI community?

Most CHI papers clearly discuss threats, just not in a section titled “threats to validity.” This tradition comes from CHI’s cognitive psychology research, where the threats were inherent to the study design and discussed throughout the method and discussion sections. There never needed to be a separate section because it was expected that discussion of the limitations would appear throughout the article. As a guideline, one should always discuss all non-obvious threats to validity. It’s a necessary part of honest scholarly work.

Q: Where do you draw the line about whether a threat is obvious?

Some threats are common to all empirical research: the sample size was too small, the study may not generalize, the situations may not have been representative. These are standard disclaimers, and it’s always worth mentioning them briefly. The ones to really spend time on are the definitions and measures one uses: how likely they are to actually capture the concept of interest (construct validity) and whether they have any meaning for the real world (ecological validity).

Q: Have you had an HCI reviewer suggest that your work is better suited for a software engineering venue, or vice versa? If so, how did you deal with the suggestion? If not, how do you think you preempted it in the first place?

No, I’ve never had a reviewer suggest that. Of course, the work that I publish at HCI venues usually has more to do with the actual work of software engineers, their collaborations, or their interactions with users, as opposed to conventional software engineering research on automation. I think one of the main stumbling blocks that software engineering researchers will have trying to publish at HCI venues is demonstrating that the problems they work on are significant. For example, a common type of software engineering paper will find some specific set of circumstances that can be exploited to automate bug finding or prove correctness within a certain set of assumptions. In general, HCI researchers aren’t interested in these types of narrow contributions, unless there’s some good evidence that the set of circumstances exploited is large and generalizable to some degree.

Q: In an HCI paper, where do you make the argument about generalizability? Is there room for speculation?

There’s always room for speculation. That’s what discussion and limitation sections are for. The whole point of studies is to use a kernel of rigorous and trusted analysis in order to make predictions about the larger context of the world. In fact, I think too many software engineering papers simply report results and ignore what impact a tool design or study might have on our understanding of software engineering. Tools, after all, are embodiments of theories about the world, and they have just as much potential to teach us about our surroundings as studies – perhaps more.

Q: As a reviewer for HCI venues, what is the most common mistake that you see software researchers making?

Being more fascinated with technology itself than with what technology does for people (whether those people are technology users or hardcore software developers). More often than not, I will read software engineering papers published at HCI venues that try hard to persuade me that the clever tricks they devised are interesting enough to overcome the minimal impact the tricks will have on users’ work and experience with a tool.

I also see software engineering researchers try to make knowledge contributions about software development practice without citing the large body of work done at CSCW and other conferences about group work. HCI researchers tend to view software development as just one of many examples of collaborative work. The argument that it’s special and unique usually doesn’t fly without evidence.

Q: Although HCI submissions are often anonymous, people tend to be suspicious of “outsiders,” and may treat outsiders’ work with some undue hostility. What can a software researcher do to avoid identifying himself as an outsider in the HCI community?

All HCI researchers are outsiders. There’s not enough of a concentration on any one topic or problem for there to be a common core. The best way to avoid sounding naive is to read as much as possible about a topic outside of your discipline. HCI draws from cognitive science, psychology, design, computer science, engineering, anthropology, social psychology, communication, education, and several other fields. Chances are, there’s work in all of those fields you should at least be aware of, if not read and cite.

Q: Suppose that you attempt to solve a usability problem for a certain kind of software tool; HCI researchers may perceive that you are solving only a very narrow problem, and thus your contribution is small. How do you deal with that?

The typical solution to this problem is finding a community that thinks your problem is broad instead of narrow. HCI research tends to have a fairly broad view of the world, since it’s so applied, so understandably, many problems will be viewed as small (just like any non-academic would view our problems as narrow). The best one can do is demonstrate what relation the problem has to society and what impact it might have on the world – not just on the tool users.

Q: Where should software researchers send their human-centered papers?

CHI is an obvious choice, but it’s the premier conference in HCI, which makes it a difficult target even for very experienced HCI researchers. Beyond CHI, what would I recommend? I’ve been investing in VL/HCC, the logical successor to the sadly defunct Empirical Studies of Programmers (ESP) conference. It’s a strong secondary conference, with a first-rate community, but with less content about professional software engineers than I would like. I’m hard-pressed to recommend another HCI conference.

Emerson Murphy-Hill interviews me (part 1)

About a month ago, Emerson Murphy-Hill (currently a post-doc at UBC) asked if he could interview me about the challenges of doing HCI research about Software Engineering (and vice versa). I’ll post our interview in two parts: the first, listed here, covers HCI and software engineering research and the second covers publishing in HCI venues.

Q: What are the biggest differences between HCI and software research?

Both pursuits are very problem-driven: we want things that work, that demonstrably solve issues, and that move practice and our understanding of practice forward. Not only that, but both pursuits are largely interested in solving the same problem: we want to find ways of creating software and technology that achieve their requirements and ultimately serve customer and user needs.

Where HCI and software research differ is in their methods. Most HCI research focuses on understanding the context being designed for and using this understanding to drive innovation. Software research often works the other way around, seeking innovative technological approaches to well-established problems, but not often doing formative research to discover new problems. The other difference between the two pursuits is the breadth of their methodological toolboxes. HCI researchers will use whatever method is appropriate for a research question, whether that means controlled experiments, interviews, ethnographic field work, or any number of theoretical frameworks and formalisms. Software research tends to be more restricted methodologically, focusing mainly on first-order logic and quantitative empiricism. In my view, this restricted set of epistemological tools prevents software researchers from seeing the larger context of the problems we try to solve.

Q: It sounds like you are suggesting that software research could make more of an effort to do formative research to find problems. Are there any areas in software research where you think we don’t truly understand the nature of the problems?

I think that software research, as a whole, severely underestimates the role of communication, coordination, and management in successful software projects. The importance of these factors has been claimed for decades and studied infrequently for nearly 20 years, but they still play a minor role in most software engineering research. This is surprising, since most other engineering disciplines focus heavily on the actual human process of engineering different goods.

Another area that is understudied is the notion of requirements and what they actually mean. Ultimately, the role of software is to support humanity, but more often than not, software engineers let the medium, rather than humanity, dictate the design. There are interesting connections between Requirements Engineering and HCI, in that both seek to elicit requirements, but using different methods. I’ve seen very little work that bridges these different approaches to software design. Generally, most software research focuses on “getting the design right” rather than “getting the right design.”

Q: I noticed that you mentioned that software research has begun to use quantitative empiricism, but did not mention qualitative empiricism. Do we not use both?

In my experience, software engineering researchers are highly skeptical of qualitative methods. It is quite rare to see a paper at a top conference that uses qualitative methods exclusively. I’ve personally received reviews that suggested I convert my qualitative observations into numbers in order to make them more objective (which, epistemologically, is both ineffective and naive). Unfortunately, there are some questions for which quantification is insufficient. For example, if you want to know how a software development team selects which bugs to address first, what do you measure? Some objects of study are processes and activities, with no single measurable dimension.

I understand why the community is skeptical; we come from quantitative traditions, and most software engineering researchers have only a vague idea of what sociologists and anthropologists do. It’s unfortunate that, at the moment, to publish research about inherently qualitative phenomena, one has to create artificial and unhelpful measurements of those phenomena to make the work palatable.

Q: Do you have any recommended reading for doing research in HCI?

There’s no small set of readings that would suffice. HCI spans over 50 years, 20 conferences, and at least 20 journals. CHI, the leading HCI conference, is the second-largest ACM conference, behind only SIGGRAPH. Deciding what to read can be daunting, and there’s really no way to reduce that complexity.

Instead, I’d suggest that learning to do research in HCI is more about choosing which methods you want to excel at. Personally, I focus on user interface design, evaluation, and empirical research, and that only covers a small subset of the kinds of methods that people use in HCI.

I can recommend some books, which offer some perspective on the mindset of HCI researchers. For example, Bill Buxton’s Sketching User Experiences [2] is a fantastic look at what it means to design systems and how finding the right design (the HCI part) can greatly simplify getting the design right (the software engineering part). I very much subscribe to his perspective (which isn’t surprising, since Bill unofficially advised my advisor, Brad Myers, at Toronto).

Q: At what point in the research process should researchers consider what venue to submit to? Should the venue influence how you conduct your research?

There are definitely more experienced people to ask than me! But perhaps I have a fresh perspective on the issue, as I straddle the boundary between the HCI and software engineering worlds. Ideally, researchers would select important, interesting problems and publish the work when it’s done. The venue should only matter once one knows what the contribution is and which communities would appreciate it.

Unfortunately, the conference culture in both HCI and software engineering tends to incentivize work of limited and conservative scope. This has been discussed across a variety of venues, including HCI articles and conference panels, as well as several ICSE keynotes and papers. On top of that, a lot of good work gets rejected because it’s not yet fully formed. I believe that journals, with their multiple rounds of review and lack of deadlines, offer a healthier process with which to vet and disseminate academic research.

Q: Tasks used in HCI studies often appear to require little domain expertise and can be conducted in a short amount of time, whereas software studies often require substantial domain expertise and can be difficult to structure to complete in a short amount of time. Is this statement true in your experience, and if so, how have you managed the issue?

I don’t think this is a fair characterization of HCI research. In the past, HCI focused a lot on novice tasks, partly because user interfaces were so bad; there is also a subset of HCI research that focuses on input techniques, which are more amenable to experimentation because of the more limited variance in human motor performance. But in the past few decades, there’s been a broad focus in HCI on supporting experts and expert teamwork in a variety of domains. Designing studies of these activities is just as difficult as, or more difficult than, designing studies to evaluate software tools. This is one reason why HCI has adopted so many other kinds of methodologies: one can’t design a controlled experiment to learn how first responders use cell phones to coordinate. We have the same challenges when designing controlled experiments to learn about coordination in software teams.

I deal with this challenge in my own work in a few ways. First, like researchers in all other empirical fields, I carefully design my measurements, stating their limitations and potential confounds, and then move forward despite the threats. The ultimate product of any empirical work is not the one perfectly designed study, but a large collection of studies that repeatedly demonstrate consistent and convergent results across a variety of contexts and with a variety of operationalizations. There is still an attitude in software engineering research that a single study should suffice; we need to move away from that view and start to plan for decades of study and experimentation on fundamental issues.

Another way that I deal with this challenge is to design studies that explain what my tools do for people and how they do it. For example, I’m designing a study at the moment with James Fogarty and Kayur Patel to evaluate how their integrated classifier development environment helps developers find bugs. The goal of the study is less about showing a difference in success (since success in the real world depends on too many other factors) and more about explaining what the tool does differently that contributes to developers’ success. To do this, we’re asking participants to verbally state changes in their goals, and associating these shifts in goals with the use of different parts of the tool. This way, the study result is not “participants were more successful,” but “participants were more successful because they spent more time confirming fewer hypotheses.” This is the kind of knowledge that helps design other debugging tools.
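To make that kind of analysis concrete, here is a minimal sketch of how one might associate coded goal shifts with tool use. The event log, feature names, and time window below are hypothetical, invented for illustration; they are not data or instruments from the actual study.

    # Hypothetical sketch: count which tool features were used shortly before
    # each verbalized goal shift in a coded session log.
    from collections import Counter

    # Coded session events: (seconds_into_session, kind, detail); all invented.
    events = [
        (12, "feature_use", "summary_view"),
        (15, "goal_shift",  "from exploring data to inspecting misclassified examples"),
        (40, "feature_use", "example_browser"),
        (55, "feature_use", "example_browser"),
        (70, "goal_shift",  "from inspecting examples to confirming a hypothesis"),
    ]

    def features_preceding_shifts(events, window_seconds=30):
        """Count tool features used within a window before each goal shift."""
        counts = Counter()
        for time, kind, _ in events:
            if kind != "goal_shift":
                continue
            for other_time, other_kind, feature in events:
                if other_kind == "feature_use" and time - window_seconds <= other_time < time:
                    counts[feature] += 1
        return counts

    print(features_preceding_shifts(events))
    # Counter({'example_browser': 2, 'summary_view': 1})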

Q: So do you think that any parts of HCI or software research will have a lasting impact?

Well, this is a controversial topic within HCI, but I personally believe that there is fundamental HCI research and then there are applications of HCI methods (which are actually the methods of other communities, such as cognitive psychology, anthropology, and design). My body of work, for example, is largely an application of HCI methods to the problems of software engineering practice. I view the core areas of HCI as input and output devices and anything else having to do with feedback and interactivity. This latter category has had, and will continue to have, lasting impact.

Software engineering, like HCI, has made several foundational contributions to practice, such as version control, limited forms of model checking, compilers, debuggers, and development environments. However, many of the coordination, planning, and management aspects of software engineering have moved along largely without the help of research. I think the challenge for software engineering research is to recognize that many of the fundamental challenges in practice are human challenges, and that many basic software engineering tools must be designed with these challenges in mind.

One philosophical issue surrounding the future of both applications-driven HCI research and software engineering research is whether the domains we study and design for are moving targets. Psychology, medicine, and the natural sciences operate under the assumption that people and nature don’t change in their fundamental nature (or at least not very quickly). This makes it possible to advance knowledge with empirical study over the course of 100 years. Can we make the same assumptions about the nature of coordination in software development? Are there really fundamental, unchanging aspects of software engineering practice, or are all of the challenges we observe ephemeral? This is an open question that neither HCI nor software research has begun to address.

Q: You mention that doing HCI studies is hard. How might one get started doing an empirical evaluation for the first time, considering both the need to get useful results and the high likelihood of making a mistake?

To really get good at empirical evaluation, a lot of things are necessary. First, find an expert at empirical evaluation who is interested in applying their skills outside of their content area. These might be statisticians, experimental psychologists, or researchers in policy departments. Second, get a good book about epistemology: there’s no end of gentle introductions to the power and perils of measurement. I recommend The Numbers Game [1] for an intuitive sense of the complexity of measuring things. The key thing is to learn to be extremely skeptical about the validity, reliability, and semantics of measurement.

The rest of the challenge is knowing your audience. Do you really need an experimental study to support your claims? Or would finding one person to adopt your tool for a week suffice? Do you really need to demonstrate causality, or are there other more pressing questions that might be interesting to investigate? There are lots of ways to gain confidence that your design choices were good by some measure.