CHI 2017: Automation, Agency, and Learning

The CHI conference (the ACM Conference on Human Factors in Computing Systems) is a strange beast. From 1,000 feet, it’s an incredible gathering of thousands of researchers, teachers, and practitioners interested broadly in how people and technology interact. From 10 feet, however, there’s massive diversity. Some people come to share new ways for people to interact with and through technology; others come to critique how technology is shaping and perhaps even eroding our humanity. Attendees somehow coexist and learn from each other despite our dramatic differences in interests.

Because of the scale of the conference, there’s no way to summarize what happened at it. At any given time, there are dozens of parallel sessions and hundreds of hallway conversations happening. Any trip report is therefore mostly a personal account of ideas that were salient, interesting, and impactful.

With my recent pivot to computing education research, learning and education were the focus of many of my conversations. Many of these were about the practice of being a teacher. For example, I had a great conversation with Scott Hudson (Carnegie Mellon) about their new undergraduate degree in HCI, and some of the challenges with trying to recruit students into a major from high school, where there’s barely any visibility of computing, let alone HCI. I also talked to Thomas Fritz (University of British Columbia) about the challenges of incorporating active learning into software engineering education at scale. Brian Dorn (University of Nebraska, Omaha) shared the many unique challenges of state- and local-level K-12 computing education policy development.

My other conversations about learning were grounded in research. David Karger (MIT) shared his views with me on programming language learning and programming problem solving. I had a great conversation with Nathalie Riche (Microsoft Research) about the role of learning in harnessing the power of interactive information visualizations. I also engaged in a riveting deconstruction with Jonathan Grudin (Microsoft Research) about the history of education policy in the U.S. and the implications of that history for present-day flaws in high school and college.

Because many attendees knew of my foray into startups, I had many interesting conversations about technology transfer, the role of design in product management, and my own personal experiences with these things. Jason Hong (Carnegie Mellon) shared details about a new master’s in product management, while Bonnie John (Bloomberg) shared her practical experiences with product managers and how they interact with designers. I talked to many of our MHCI+D students about startup life, mentoring them both on how to compare startup and non-startup jobs and on how to negotiate offers. Danyel Fisher (Microsoft Research) described his new work with Andy Begel on understanding the vast diversity of barriers in technology transfer between Microsoft Research and Microsoft proper. Geraldine Fitzpatrick (TU Wien) also interviewed me for her podcast on changing academic life about my recent blog post on work-life balance, where I discussed how the time stressors inherent to building a business motivated me to develop more rigorous time management skills.

Each year, we throw a DUB party in collaboration with Georgia Tech and Michigan, usually attracting hundreds of attendees who want to network, drink, and reconnect with friends. I had a great time learning about many of our former doctoral students’ experiences with faculty life.

I don’t usually go to talks at CHI, mostly because I find them to have too low an information density to be valuable. I did go to a few great ones, however, two of which concerned accessibility. One was my student Amanda’s talk on Genie, in which she described several clever techniques for automatically transforming interactive websites to support multiple forms of input. The other came at the beginning of the conference, when my colleague and friend Jake Wobbrock accepted his CHI Social Impact Award, discussing ability-based design, the idea that we should be designing for what people can do, not what they can’t do, adapting systems to individual abilities.

The other two notable talks I attended were the opening and closing plenaries, both of which critiqued commercial software and its impact on society. Wael Ghonim (Quora) dissected social media, enumerating the many consequences for our media diet of driving traffic through popularity metrics such as “likes” and ad impressions. He argued that news feeds that editorialize content via these metrics result in mob rule, where whoever is loudest and most controversial controls the conversation. Nicholas Carr, in his closing plenary, argued that automation actually disrupts our ability to learn, which creates dependency and ultimately a less capable humanity rather than a more capable one. He argued that commercial software enterprises, whether they realize it or not, must automate in order to create this dependency and make a profit. Both of these talks aligned well with a birds-of-a-feather session run by Jonathan Grudin (Microsoft Research) and Umer Farooq (Microsoft) on the topic of human-computer integration, the idea that digital agents will become so autonomous that they will come to act as our assistants and friends. Unlike the two talks, this session was framed more optimistically, trying to uncover compelling examples of integration, but also open problems.

While some people might find the breadth and diversity of the topics above a bit overwhelming and potentially irrelevant to their work, I always find them energizing. They contextualize my work and offer methods, techniques, and perspectives that help shape, motivate, and refine it. I never leave CHI with new knowledge about the questions I’m trying to answer in my research, but I always leave with a new way of asking and answering those questions.

Genie: Input Retargeting on the Web through Command Reverse Engineering

Amanda Swearngin presenting Genie

Amanda Swearngin, one of my newer Ph.D. students, just presented her work on Genie at CHI 2017, in collaboration with me and her co-advisor James Fogarty. Genie is a clever technique that applies program analysis to reverse engineer a model of any interactive website’s commands, then uses that model to create alternative interfaces for accessing those commands via other input modalities such as keyboard, mouse, or speech. The really cool thing about this work is that web sites don’t have to be built for Genie: all of this works without modification, without the coordination of website developers, and continues to work as web sites evolve.
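To make the general approach concrete, here is a minimal sketch of the idea, not Genie’s actual implementation: harvest a rough command model from elements that look interactive, then retarget those commands to another input modality (here, the number keys). Genie’s real program analysis of event handlers and command metadata is far more sophisticated; the element selectors, labels, and key bindings below are purely illustrative assumptions.

```typescript
// Illustrative sketch only (not Genie's actual analysis): enumerate elements
// that look interactive, wrap them as "commands", and retarget those commands
// to the keyboard by binding them to the number keys 1-9.

interface Command {
  label: string;       // best-guess human-readable name for the command
  invoke: () => void;  // fires the underlying interaction
}

// Harvest a rough command model from the page.
function harvestCommands(root: Document = document): Command[] {
  const candidates = root.querySelectorAll<HTMLElement>(
    "a, button, [onclick], [role='button']"
  );
  return Array.from(candidates).map((el) => ({
    label: (el.getAttribute("aria-label") || el.textContent || "unnamed").trim(),
    invoke: () => el.click(), // replay the command with a synthetic click
  }));
}

// Retarget the command model to another modality: pressing 1-9 invokes a command.
function attachKeyboardInterface(commands: Command[]): void {
  document.addEventListener("keydown", (event) => {
    const index = Number(event.key) - 1;
    if (Number.isInteger(index) && index >= 0 && index < Math.min(commands.length, 9)) {
      commands[index].invoke();
    }
  });
}

attachKeyboardInterface(harvestCommands());
```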

Here’s a demo:

What’s exciting about this work is that it could allow the entire interactive web to be accessible to people with diverse abilities without requiring web developers to design for diverse abilities. Better yet, if developers do design for diverse abilities, Genie works even better, extracting even more meaningful command metadata.

Read our publication for more details.

How I (sometimes) achieve academic work life balance

I was a young father. Just twenty-one and a senior in college when my daughter was born in 2001. I probably don’t have to say this, but having a child at 21 wasn’t a smart move, generally: my (then) wife and I had basically no income, lots of student debt, and only an impression of who we were as people. Fortunately, we were both also pretty mature and goal-driven. Why not have a kid while in grad school? As students, we’d have fewer responsibilities and being under the poverty line, we wouldn’t get caught up in materialism. It’d be us, our love for our child, and our professional dreams.

This might sound overly romantic, but it was true. As a doctoral student, I really only had a few responsibilities: 1) learn to do research and 2) do a lot of it, well. What’s usually hard about this is focus: there’s so much to read, so many projects one can work on, and so many paths to take that students can get stuck trying to find perfect projects, trying to motivate themselves, and trying to find ways to have the greatest impact. I had endless peers in grad school who were lost in this soup, and often spent 10-12 hours a day searching.

This search can be a wonderful part of grad school. But as a poor young father who had a wife in school too, I didn’t really have the luxury of such expansive time. I had different priorities: 1) be a great father and 2) get a job that I loved and that would provide my family stability (tenure-track professor). You’d think (as I thought) that these two might be incompatible. How is there possibly enough time in the day to achieve both goals?

The only way I found to reconcile the two was to budget time. I gave myself 9 hours per day to make progress on research. I gave the rest of my time to my family. I negotiated some exceptions with my wife (paper deadlines, conference travel, late meetings that others scheduled), but generally, I committed to a 45-hour work week throughout grad school.

This had several positive effects:

  1. I worked the hell out of those 9 hours each day. As most of my grad school peers can tell you, I was always working. I took breaks to stay healthy, went to class, and met with advisors and collaborators. But I spent every minute of every weekday practicing research.
  2. I leveraged required activities for research. One of my more widely cited papers, for example, was based on data I gathered as a teaching assistant. Another class project led to award-winning CHI and ICSE papers. These weren’t luck; I went into these classes with plans, knowing that I had to make the most of the experiences.
  3. With the help of my advisor, I became ruthlessly critical of the potential outcomes of research opportunities. I learned to pursue projects that would result in discovery regardless of the outcomes, so I wouldn’t have any dead ends.

This 45-hour weekly cap also had some negative effects. For example, I spent too little time making friends and maintaining friendships. Because I gave the rest of my time to family, and I wanted to make the most of my work time, I passed on parties, extracurricular activities, and other social time that really would have grown me as a person, and would have grown my network of colleagues and collaborators.

To be productive within these constraints, I had to develop some robust time management skills:

  • I learned to religiously maintain my calendar, protecting research time.
  • I used professional to-do list management tools and built a practice of reviewing my list multiple times a day.
  • I became extremely disciplined about capturing to-do items so I’d never forget ideas I generated or anything I’d committed to.
  • I became facile at decomposing tasks, breaking large, difficult-to-start tasks (e.g., write this paper) into hundreds of smaller tasks, making it easy to squeeze progress into 5-10 minute chunks.

All of these skills also helped me develop better self-regulation skills, giving me more awareness about which skills I was developing, which I excelled at, and which kinds of tasks would require more undivided attention than others. This self-knowledge helped me better select when to do particular types of tasks and plan my time accordingly.

All of this carried over into faculty life. Because my responsibilities were more numerous (adding teaching, service, Ph.D. student management, grant management, fundraising, impact efforts, etc.), I’ve had to learn some new skills over the past nine years of faculty life:

  • I keep logs about the multiple projects I’m engaged in, spending a few minutes before a context switch to capture where I left off, allowing me to switch back more easily.
  • I set quotas on commitments and time, giving myself a maximum number of papers to review each year, a maximum time commitment to committee work, a maximum number of talks to go to each week, and a maximum amount of time to spend on classes I’m teaching.
  • I maintain a “commitment” calendar for each month into the next two years, to keep track of all categories of activities I’m engaged in. This helps me assess whether I’ve run out of time to say yes to something (e.g., so I can write emails like “I want to say yes to this review, but I have no hours left to do it in May”).
  • To the extent that I can, I organize the tasks each day around a single role (teaching, research, service). A typical week has 2.5 research days, 1.5 teaching days, and 1 service day.
  • I don’t schedule meetings with doctoral students. Instead, I have a weekly lab meeting for reporting and block off entire half days for ad hoc advising. My students know when these times are and that they’ll be able to talk to me then. This means I meet with students for less time overall, but the meetings are focused and therefore far more useful. (I do schedule quarterly mentoring meetings to proactively discuss career planning, networking, milestones, etc.).
  • I (try to) read email only once per day.
  • I revise my courses each quarter to identify ways of streamlining my time while improving learning outcomes.
  • When possible, I schedule 30 minute meetings, forcing attendees to come prepared. This often has the effect of resolving the meeting topic over email.
  • I try to use Slack instead of email, since it gives me a more visible context for a conversation with a person or group.

I still aim for 45 hours a week. I still have exceptions, but they mostly come from collaborations now, rather than my own work (e.g., students working up to a deadline, collaborators working up to a deadline). And even then, I work hard to teach my students their own good time management skills, both to help them have better work-life balance and to help me maintain mine. The key to avoiding these exceptions is to not overcommit and to always pre-crastinate, preparing papers, grants, and other deliverables at least a few days before they’re due.

All of this, of course, takes one big commitment: I have to commit to doing less. There’s constant pressure in academia to publish as much as possible. It’s really hard to say no to an opportunity. It takes a lot of discipline (and desire) to turn down a collaboration or not pursue a grant, especially since I usually want to do these things. That’s a natural by-product of loving what I do.

Why do I set a limit if I like what I do? Lots of reasons:

  • My family and friends matter to me more than ever.
  • I have to take care of myself, both physically and mentally (exercise and sleep matter!)
  • I believe I’m genuinely more creative when I have open, unrestricted time to think. (I don’t count this as work time; if my mind is wandering at the grocery store or on a walk, so be it).

As much as I love nearly everything about my job, I’ve learned to enjoy my free time just as much. It makes me feel like a fuller, more integrated citizen and human, and unquestionably a better father. In a surprising way, I feel like maintaining this discipline over my time makes me a better scholar too. Others agree that open time is actually a critical resource for strong, deep scholarship.

Of course, I fail at capping my time all the time. I failed multiple times in the past few months while engaging in faculty searches. I fail when I’m not sufficiently ahead of deadlines. I fail when a student fails to be ahead of a deadline, and I’ve committed to helping them. And I’m failing right now, writing this blog post at 9 pm!

That’s okay. The point isn’t to be flawless; it’s to draw a line and try to stay on one side of it.

What five years of early career research funding buys the world

Whenever I close out a grant, I like to reflect on what I achieved with the money. Well, to be clear, NSF likes me to do that too, in the form of a final report of project outcomes. And as it should be: the average American gave me a tenth of a penny to do some research. What did it buy them?

This particular grant was my CAREER award, granted in 2009. This grant is given out to a select few faculty each year who “have the potential to serve as academic role models in research and education and to lead advances in the mission of their department or organization.” Really though, it’s an award given out for important research by new faculty.

In my final outcomes report, I described my work like this (note that these reports are intended for the general public and aren’t supposed to require any expertise to understand):

When software companies release software, there are only a few ways for them to learn about problems that users experience. They can wait for users to report problems, which leads to large amounts of unstructured text that is difficult to aggregate and analyze. They can also automatically monitor for easily detectable problems such as crashes, errors, and performance issues. The broad set of usability and usefulness issues that arise, however, are difficult to monitor and aggregate, making it difficult for teams to improve software for users.

I then went on to summarize the discoveries and impact of the work:

Across the seven years of the project, we made numerous discoveries about this problem. We learned how developers, designers, and product managers evolve software, finding that many ignore feedback that comes through technical support channels, that feedback often comes from highly technical users, and that when developers do engage with user feedback, they often view it as irrelevant minority opinion. We also found that when developers discuss these issues, they tend to ignore evidence, relying instead on anecdote, speculation, and hyperbole. We also discovered that the most expert software engineers are more rational and evidence-based in their decision making and assessment of feedback, relying on objective data sources to inform their product decisions. However, we also found that expert engineers require substantial interpersonal skills to persuade less experienced developers who rely on less objective decision-making practices.

We invented many approaches to address these problems. One was a way for users to request help while using software without having to express their problem. It dynamically creates a repository of frequently asked questions, predicts which questions a user will have based on their context, and provides structured data to software teams about which questions users have and where. This data can then be used to make more evidence-based decisions about how to improve software. In addition to this, we invented new algorithms for mining software feedback from technical support forums and for automatically detecting usability problems in software without even having to release software.
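To illustrate the kind of context-based prediction described above, here is a small, hypothetical sketch (not the actual system’s design or data model) of ranking stored questions by how well they match the page and UI element a user has selected. The Question fields and scoring weights are invented purely for illustration.

```typescript
// Hypothetical sketch of context-based question prediction, loosely inspired
// by the description above; the data model and scoring weights are invented.

interface Question {
  id: string;
  text: string;
  pageUrl: string;   // page on which the question was originally asked
  selector: string;  // UI element the question was attached to
  askCount: number;  // how many users have asked it
}

// Rank stored questions by how closely they match the user's current context.
function predictQuestions(
  repository: Question[],
  currentUrl: string,
  selectedSelector: string,
  limit = 5
): Question[] {
  return repository
    .map((q) => {
      let score = 0;
      if (q.pageUrl === currentUrl) score += 2;        // same page
      if (q.selector === selectedSelector) score += 3; // same element
      score += Math.log1p(q.askCount);                 // popularity prior
      return { question: q, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.question);
}
```

The same structured records of which questions were asked, where, and how often are what would give a software team evidence about where users struggle.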

Were all of the facts above worth the $600K that I received over 7 years (including 2 years of “no cost extensions” while I was on leave)? When they’re summarized as they are above, it’s hard to judge, since facts alone probably aren’t the most valuable thing to anyone in the general public—they’re more useful to us academics trying to build larger truths about software engineering. The questionable value of intermediate scientific discoveries is why NSF also requires reports to describe “broader impacts”. I described mine like this:

We disseminated this work in diverse ways. We co-founded a software startup called AnswerDash that sells the help technology and raised venture capital; to date, it has created dozens of jobs while increasing the sales of numerous companies, indirectly creating more jobs. At the time of this grant’s expiration, over 10 million people have used the product to seek help. We also shared our discoveries through multiple articles in the popular press, through a webinar reaching over 30,000 software engineers, and through a podcast reaching over 10,000 software engineers. The PI also developed a new software engineering course and wrote a free online book to support the course, which summarizes the forty-year history of research on human aspects of software engineering. The grant also supported the professional development of the PI, directly supported the research of four doctoral students (two of whom are now faculty), and trained over a dozen undergraduates in research, several of whom pursued graduate degrees.

Tech transfer? Teaching 40,000 software engineers? A new course and a new textbook? Those are pretty good, right?

Then there are the things that the general public wouldn’t really care about at all, but that I care about as an academic:

  • The grant supported the general research infrastructure at the University of Washington, including buildings, electricity, staff, and other expenses associated with the research. This is called “overhead”, and while it’s generally supposed to cover research-related expenses, it supports highly coupled resources like buildings, which inadvertently also support the educational mission of the university.
  • My lab published 25 papers with the funding, spanning HCI, Software Engineering, and Computing Education venues. Four of those papers received best paper awards.
  • These papers have already been cited over 350 times by other researchers in the world, impacting the ideas and directions of other researchers.
  • I was invited to give my first keynote at SPLASH 2016, which challenged me to think bigger about programming languages and equity.

Most importantly, because I wasn’t spending as much time fundraising all of these years, I was able to focus on becoming a better teacher, a better researcher, a better mentor, and a better leader. Without the support of the CAREER grant, there’s no way I’d have achieved the level of success and impact that I have at this point in my career. And there’s no way I’d be in a position to resume frantic fundraising now without failing at my teaching, mentorship, and leadership duties. Because of the grant, I’m a more productive, effective, prolific, and impactful public intellectual, which ultimately helps the hundreds of students I teach every year be more productive, effective, and impactful people.

All that cost the average American a tenth of a penny (and given our current tax brackets, more like a penny for upper-middle-class Americans and basically nothing for everyone else). Is the world that much better for its investment?

In this case, clearly yes: my co-founders (my colleague Jake Wobbrock and our former Ph.D. student Parmit Chilana) and I convinced a venture capitalist to invest $2.54 million in a local U.S. company that created more than two dozen jobs, instead of investing somewhere else in the world. Even if you don’t care about anything above except direct financial returns, that’s a $1,940,000 profit on a $600,000 investment—a 323% return!
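For anyone who wants to check the arithmetic behind that figure (all numbers come from the paragraph above, treating the grant as the investment and the venture capital it attracted as the return):

```latex
\$2{,}540{,}000 - \$600{,}000 = \$1{,}940{,}000,
\qquad
\frac{\$1{,}940{,}000}{\$600{,}000} \approx 3.23 \;\Rightarrow\; \text{a 323\% return}
```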

Take that, Trump-kins. Research beats the stock market when done right.

A glimpse into state-level CS education policy implementation

This past Tuesday I had the privilege of attending a meeting of the Washington State Computer Science Leadership Team, a group of leaders in the state of Washington responsible for devising and implementing K-12 CS education policy. From a computing education research perspective, it was both an exciting chance to observe a state try to systematically implement significant changes to public education and a unique opportunity to help shape policy by disseminating computing education research findings.

The meeting was held in Facebook Seattle’s current offices in Westlake, which is interesting in its own right. The Facebook leaders who sponsored the space have a clear interest in strong local computing education, but they represented one piece of a much larger public/private partnership effort in state education policy. The room had STEM education representatives from more than a dozen of our state’s school district offices, some directly representing districts and others representing public educational services that serve multiple districts. There were also educational non-profits such as Washington STEM, Code.org, Pacific Northwest National Labs, the Pacific Science Center, the University of Washington (Stuart Reges and myself), Seattle Pacific University, and for-profit organizations like Facebook.

The meeting itself was a mix of updates and planning. The updates were both exciting and intimidating. There are districts like Bellevue Schools doing very impressive things to incorporate CS teacher training and CS courses with small amounts of resources. And then there were the scary numbers: only about 10% of Washington state schools offer some form of computing education, and less than 1% of Washington state students are engaging in it. That’s pretty far from universal access and even further from universal engagement. So far, the vast majority of teachers are unfamiliar with the CSTA or its curriculum frameworks. The scale of the dissemination effort required for all of this is astounding, even at the scale of a relatively small state like Washington, with only about 1 million students.

Because I had to leave for afternoon meetings, I missed the afternoon planning, but I had plenty of opportunity in the morning for conversations with several attendees, laying the foundation for future research dissemination. The interesting challenge from a research perspective is finding ways to pitch research in light of all of the other existing challenges in this massive change, such as money for teacher training and salaries, curriculum, and other resources. Discoveries have to be incredibly clear, concise, and actionable to have any chance of being adopted amidst all of this other change. All that said, researchers should be involved at every level: in policy planning, policy implementation, teacher training, curriculum framework development, and technology design. These efforts can be a great way of disseminating research, but also of discovering new research opportunities.

Review of Grudin’s “From Tool to Partner: The Evolution of HCI”

Last week there was far too much news I didn’t want to hear, so instead of reading news, I read Jonathan Grudin’s new book on the history of HCI. (On my phone. On buses. In the dark. In five-minute spurts!) Since you probably haven’t read it yet, I’ll do my best to summarize it here and tell you what I thought.

First, Grudin tackles a lot in this book. He synthesizes no less than the history of AI, HCI, Information Science, and Human Factors, trying to show how these fields emerged and intersected, but rarely engaged each other, despite their immense interest in the interaction between people and computing. It’s a massive amount of history about fields emerging at the beginning of the digital age, and so the scope can be overwhelming.

Grudin does a reasonable job covering this scope, organizing the book chronologically, but bouncing between different fields, presenting big claims about the assumptions, ideas, and lenses that shaped what the different fields investigated. Throughout, he’s seeking to explain why these fields studied what they did, how that led to their ultimate lack of intermixing, and how that resulted in different fields’ differential impact on practice.

One of the big ideas in the book is the difference between the kind of discretionary computer use that happens in consumer settings and compulsory use that happens in organizational contexts. Grudin theorizes that this is the primary reason why information systems and LIS withered while HCI flowered: discretionary use just became the dominant, visible change in the world, bringing computing to every facet of life, giving HCI a mountain of interesting, diverse things to study and therefore broadening its methods and perspectives, while the world of compulsory use inside of organizations moved more slowly, constrained by the difficulty of studying whole organizations and their glacial adoption of consumer trends.

One of the more interesting, perhaps implied, ideas is that these other fields of Management Information Systems, Human Factors, and Library & Information Sciences, despite withering, still have a wealth of knowledge to share about people’s interactions with computers, but the lack of disciplinary intermingling prevented that knowledge from informing some of the big changes that occurred in computing. Google, with its roots as an NSF-funded Library & Information Sciences digital libraries project, was one of the few exceptions. Look at the impact that emerged from its interdisciplinary foundations. What would the world look like if our major shifts in computing had been informed by all of these fields instead of primarily computer science?

Zooming to present day, Grudin isn’t sure what to make of the iSchool movement, which seeks to embrace some of these interdisciplinary threads that never quite connected through history. These fields are finding their way together after decades apart, with faculty from computing backgrounds like myself mingling with faculty from these other fields. Will we find ways to combine our disciplinary perspectives into new, more powerful ideas that will shape our computational futures? Or is it too late, with computer science shaping the conversation, but narrowly? I suppose that’s literally up to me and my colleagues to decide.

Grudin is convinced there’s plenty more runway to find out. He predicts a future that goes well beyond interaction, to human-computer integration. In fact, he predicts that future is now, and that we’re only just beginning to figure out how to reason about interactions that infuse computing into our everyday decisions and communications. He predicts that understanding people and communication will be key to that, and that interdisciplinary perspectives on communication and information will be key to progress.

Aside from Grudin’s overarching thesis, the book is full of interesting little twists, turns, and origin stories in the history of computing, all told through the lens of interaction. If you’re interested in the history of computing from a research perspective, this is a great entry point to its rich and recent past. I also found it helpful in contextualizing my own epistemologies, my own training, and my interactions with my colleagues at my own Information School. If you find yourself in an interdisciplinary setting, I highly recommend it.

The only critique I’ll make is that the book wanders. It doesn’t wander in a particularly frustrating or unhelpful way. It feels more like wandering through a zoo, constantly pulled forward by an interesting bird or a lumbering primate. By the end, you feel like you’ve seen much of the biodiversity in the world, but you’re not quite sure you’ve seen it all, and it all seems a bit artificial. Maybe it’s not possible to recreate a history faithfully, or in a way that feels faithful. Maybe the best we can do is a menagerie.

How I applied learning sciences to undergraduate design education

I’m no fan of student evaluations. They’re fraught with gender bias, age bias, and all kinds of construct validity issues. They certainly are not good measures of learning outcomes or teaching quality. At their best, they are good indicators of an instructor’s success at creating a coherent, engaging experience, which is important to learning. And engagement is no small feat in a world that increasingly frames colleges as businesses and students as customers, compelling students to constantly question the value of what they’re learning to their career paths.

Since we nevertheless gather student evaluations every quarter at the University of Washington, I do use them to track my own progress at engaging students in learning. And I’ve usually done pretty well on whatever they’re measuring. Take, for example, my Design Methods course, which is basically an introduction to HCI and design methods for undergraduates. Since I started teaching it about eight years ago, I’ve generally earned anywhere from a 4.0 to a 4.6, which is generally considered by faculty to be excellent. At the University of Washington, these scores are the median of all students’ averages across four prompts on a scale of very poor to excellent (how was the course, how was the content, how were the instructor’s contributions, and how effective was the instructor at teaching). So my generally high scores mean that most of my students believe I can engage them, believe I can explain things to them, and believe that I have sufficient expertise on design. None of this means I can actually do these things well, but pre-tenure, that was good enough for me.
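As a concrete illustration of that aggregation, here is a small sketch of computing a course score as the median of per-student averages. It assumes a 0-5 numeric scale and is only my reading of the description above, not UW’s actual tooling.

```typescript
// Illustrative only: a course score as the median of each student's average
// across the four prompts, assuming a 0-5 scale (0 = very poor, 5 = excellent).

function courseScore(responses: number[][]): number {
  // One row per student; each row holds that student's four prompt ratings.
  const perStudentAverages = responses.map(
    (ratings) => ratings.reduce((sum, r) => sum + r, 0) / ratings.length
  );
  const sorted = [...perStudentAverages].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Example: three students' ratings across the four prompts yields a 4.5.
console.log(courseScore([[5, 4, 5, 4], [4, 4, 4, 4], [5, 5, 4, 5]]));
```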

On sabbatical last year, however, I began to read learning sciences and education research more deeply, partly because I’ve been doing more computing education research, and partly because I wanted to become a better teacher. What I found was that while my teaching was adequate, it was far from ideal. While reading through the book How People Learn, I found countless opportunities to produce better learning outcomes, usually without significantly more effort (and sometimes with less effort!).

The source of most of these opportunities was a simpler, but more robust theory of learning. In essence, I learned from learning sciences that effective, efficient learning requires three things:

  • A clear sense of the knowledge to be taught.
  • Deliberate practice of that knowledge (meaning practice with immediate, targeted feedback).
  • Attention, and therefore motivation, on that practice.

That’s it. I learned that the complexity isn’t so much in learning (humans seem to do that quite naturally) but in setting up conditions that predispose people to learning. Getting students motivated, and therefore attending to practice, is hard. And designing effective deliberate practice is hard, often because we don’t know exactly what we’re teaching or what’s hard about what we’re teaching. It’s also hard to scale targeted, immediate feedback to individual learners.

Given these basics, I spent part of my sabbatical redesigning my Design Methods course to achieve better learning outcomes. Here are a few of the things I did, applying the theory above.

One of the first and easiest things I did was share with students my theory of learning, to frame how I was engaging with them. I taught Carol Dweck’s work on theories of intelligence, explaining that every student has beliefs about where ability comes from, and that those beliefs actually mediate how much people learn. I encouraged them to adopt a growth mindset, remembering that all ability comes from deliberate practice, and that the class would be structured to give them that practice. I also told them that as much as it was my job to structure an environment conducive to learning, it would only happen if they engaged, believed in their ability to learn, and listened closely to the feedback I provided.

Next, I tackled the problem of motivating students. I’ve always had some model in my head of what my undergraduates care about, but that model was always based on a few close relationships with undergraduate researchers, generic surveys, or student feedback in evaluations. None of these work that well in providing substantial insight into what motivates my students. To solve this, I spent the first day of class asking students to write a brief essay in class answering the question “Why are you in college and what does design have to do with it?” Then, rather than reading them privately, I had students share them with each other in small groups, and then construct an elaborate whiteboard diagram of their life trajectories and how design fit into them. What we learned was that because my course was a required course, most had little intrinsic motivation to learn design, but they were curious about it and thought it might be useful. Most also had very concrete life goals, including specific careers, visions for where they would live and how much money they needed to make to live there, and what kinds of friends and family they wanted. For most of them, school was a tool for getting them to those futures.

I used this model of my students’ motivations to shape a third pedagogical practice: at the beginning of every class and every in-class activity, I explicitly stated how I thought the day’s activity would contribute to their life goals. Devising these links was not easy and couldn’t be done in advance; I was constantly updating my model of what was motivating my students so I could come up with a single justification that would work best with the whole class. For example, the day I taught heuristic evaluation, I said something to the effect of: “So we’ve talked a lot about UX designers in class so far and how their general responsibility is to envision seamless user interface designs. Some of you want this job, others of you will be working with UX designers to implement their visions. How will they know if their design is good? And how can they know in just a few days, which is the time scale that many designers have to work at? There’s one method invented back in the 1990s that tried to solve this problem. We’re going to talk about it today, learn its strengths and weaknesses, and discuss when it makes sense to use it.” Note that this kind of justification is essentially the same justification that Jakob Nielsen used in his book Usability Engineering. I just needed to link his motivation to students’ individual aspirations.

Another challenge was in motivating students to learn the declarative knowledge about HCI and design, such as important methods, concepts, histories, and ideas in design. How could I motivate students to read about these things? In addition to using the same strategy above (simply explaining how it linked to students’ own goals), I designed a series of reading exercises that aimed to be frictionless, but also engaging. Twice a week, students would read a short, blog-post-length chapter that I personally wrote as an introduction to a subject in design. They were short enough that students would read them, but deeply linked, so that throughout the reading, there were multiple follow-up readings students could do to deepen their knowledge. Then, to motivate students to read them, I held a reading quiz at the beginning of class to verify that they had read it (which had the added benefit of getting them to show up to class on time). I also required a brief summary of a reading of their own choosing, selected from the readings I linked to, or from any other reading, podcast, or video on the web that concerned the same subject. After the reading quiz, students engaged in “think-pair-share”, turning to a few of their neighbors and explaining what they read and what they found interesting about it. Then, after a few minutes of sharing, I asked students to voluntarily share the most interesting readings they heard about from their peers. In just about 20 minutes of class, we covered a range of readings, many of which were entirely new to me. I had to be ready to rapidly synthesize and relate the topics they raised to the subject of the day, but this kept me engaged as well. It also reinforced every day that I genuinely did have the expertise to be teaching the subject.

After the reading period, we would engage in an in-class activity. I explained to students that our time together was precious, because it was the only time that we could actually do design together (as design is rarely done alone). For each topic, I carefully designed an activity with a very specific form of deliberate practice, always beginning with a justification and ending with a reflection that tied together the practice they engaged in with feedback on what they did right and wrong in their practice. My role in these activities was to facilitate and closely observe so I could provide this feedback. One example was a 90-minute usability testing activity in which teams of two designed a paper prototype alarm clock interface, designed a task to verify its usability, and conducted a series of usability tests with their classmates. The rules governing this activity were carefully designed to mimic the kind of usability tests that people run in industry, but also to reveal the fundamental scholarly questions behind usability testing (namely, how reliable is the knowledge they produce). I tried to design each of these activities to feel like a game, with some clear notion of the rules and definition of winning, but also to align these with authentic ideas in practice and to make their authenticity clear to the students.

The result of these readings and activities was that every day, students got to come to class to share what they learned in their selected readings, learn from each other, and then engage with each other with my help to acquire a skill that would help them get closer to their life goals. Almost all students came on time, excited about class, and many left craving more time to go into more depth (which we never had).

I can say with some certainty (both from student evaluations and my own observations) that students were engaged: my median student evaluation score was a 4.9/5.0, the highest I’ve ever received across eight years of teaching and twenty-five courses, and the highest I’ve ever seen amongst my colleagues. Unfortunately, what I still can’t say is that they learned any better. We simply don’t know how to measure design skill with any reliability or validity. And so I take it on faith, given what we know about learning, that as a natural byproduct of deeper, more sustained engagement, the students practiced the content I gave them more, and more deliberately.

Now I just have to figure out if it’s the right content! And if students’ perceptions of my teaching skills have anything to do with the quality of their learning. And how to figure out what they’ve learned about design. And a million other unanswered questions about design education!

Assessment is a computing education grand challenge

How do you know what someone knows about computing?

This question is foundational and pops up everywhere. It arises in classrooms, where teachers need to be able to accurately determine what a student has learned, both to help them learn better (through formative assessments) and to establish a record of how well they’ve learned it (a summative assessment). But it also arises in professional settings such as hiring: when an applicant says they “know” Java, what does that actually mean? What is it predictive of? Surely there are better ways for an employer to know how well someone knows a programming language than self-report or having passed a class at a university. We don’t even know how well these indicators actually predict ability.

Isn’t this just a matter of writing tests? It turns out that writing good tests is very difficult. It’s not enough to write an exam that asks people to define concepts and solve problems. If the wording of the questions is off, people may get the answers wrong even though they know the answer, or even get the answers right even though they don’t. These are examples of poor test validity, where the test measures something other than the knowledge one is trying to assess. Some tests aren’t reliable, in that using the test repeatedly produces different results for the same individual in different settings. Reliability issues can arise from ambiguous wording, ill-defined concepts, or poorly constructed definitions of correct answers leading to unreliable scoring.

Making a reliable, valid test is a considerable amount of work. Several of the students from Mark Guzdial’s lab have spent a substantial portion of their time as doctoral students developing reliable, valid tests for measuring how well students can mentally simulate (or trace) the behavior of simple imperative programs (see the FCS1 and SCS1). Even after their rigorous efforts, these assessments are hard to reproduce and sensitive to overuse, making it difficult to scale these efforts to other concepts or other languages.
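To give a flavor of what such instruments measure, here is a hypothetical tracing question of the kind described above, written in TypeScript purely for illustration; it is not an actual FCS1/SCS1 item.

```typescript
// A hypothetical "trace this program" question: what does it print, and why?
// Answering correctly requires mentally simulating each iteration of the loop.
let total = 0;
for (let i = 1; i <= 4; i++) {
  if (i % 2 === 0) {
    total += i; // runs only when i is even: i = 2, then i = 4
  }
}
console.log(total); // prints 6 (2 + 4)
```

Writing even a simple item like this so that it measures tracing ability, rather than familiarity with a particular syntax or wording, is exactly where the validity and reliability problems above creep in.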

The implications of unreliable, low-validity tests can be severe. Bad tests in introductory programming classes can fail students who actually know quite a lot or pass students who know quite little. This poor signal can trickle down to employers, who might use courses, grades, and other credentials as an indicator of ability. And because tests are garbage-in, garbage-out, all of this happens without a teacher or employer ever really knowing, producing a garbled, sometimes overconfident sense of what students know.

I’ve seen these problems as a student myself. I remember graduating back in 2002 with my undergraduate degree in CS with many of my high-performing peers admitting that despite all of their high grades, they still couldn’t sit down in front of an empty code editor and write a program to solve a problem. Sure, they solved lots of problems in class with the help of peers, TAs, and highly scaffolded assignments within the scope of problems their teachers had discussed. But they often didn’t know why their solutions worked. I remember getting partial credit for regurgitating partial solutions that I really didn’t understand, resulting in inflated grades that miscommunicated the level of my understanding.

Does it matter that developers understand the code that they write if the code still works? If correct code stayed correct and code could be “correct enough,” this might not matter. Unfortunately, correctness matters: programs receive unexpected inputs and developers have to debug, and developers can only do this well with a deep understanding of the semantics of a program’s execution. Moreover, this deep understanding likely would have prevented some of these defects from occurring in the first place.

All this said, there are some people who obviously develop a deep, nuanced, accurate understanding of computation. These people are our best programmers, our computer science faculty, and others who’ve likely devoted their life to eradicating every misconception about computation from their mind through incredible amounts of deliberate practice. I suspect these individuals aren’t confined to the limits of assessment because they’ve learned to self-assess their knowledge. In fact, computing might be unique in that people can actually test their understanding of computation by carefully probing program behavior, using the computer itself as a source of feedback about their understanding. Perhaps this is how people are able to develop robust understandings of computation despite the failures of assessment. This might also explain why CS teachers appear to believe that some students “get it” and some don’t: what’s really going on is that some have an insatiable curiosity about how computers behave, and use that to fuel a limitless quest for more robust knowledge of computing.

Because knowing what computing knowledge is in someone’s head is so hard and so important, I believe it’s a grand challenge of computing education research. If we don’t discover reliable, valid, scalable, replicable ways of knowing what people know about computing—or find a way to give more people an insatiable curiosity about computing—we’ll continue to overlook deficiencies in knowledge, producing defective, unreliable code. It’s up to researchers to make these discoveries and up to society to fund them.

Two truths

My first presidential election as an eligible voter was back in 2000. I was one of those annoying Nader supporters who found Gore boring and soft, and preferred Nader’s rage. My feelings on Bush, of course, were a different matter entirely: he seemed stupid, feckless, ignorant. He couldn’t form coherent sentences. Most of all, his disinterest in the truth was frustrating and discouraging. How could I vote for a man that willfully ignored reality?

The years and the wars and the lies dragged on. Americans died, the country split, and cable news helped. Truthiness came to life. Republicans got better at twisting reality into a story that fit their goals, and using words to hide reality: the Clean Air Act, No Child Left Behind, Mission Accomplished. It was 1984 in 2004, hiding lies behind propaganda, falsehoods as reality.

My generation saw Obama as the answer. He talked about the hard truths and what we might do about them. He acknowledged the messy complex realities of our country and sought to implement pragmatic, incremental remedies. The Affordable Care Act was a pure expression of pragmatism: not quite everything we wanted, but a bit better, with a bevy of little changes that aimed to make things a little bit better for some people. Not great, but better. The incrementalist in me swooned.

Of course, all this time, there was another truth, a competing truth, that writhed under Obama’s rule. This truth was an account of the world based not in science, evidence, or logic, but in emotion, experience, and faith. These were truths that accepted science when it was compatible with faith, rather than accepting faith when it was compatible with science. This was an America that was tired of the elites—the economists, the scientists, the secular urban progressives—who claimed truth as their own and rejected anyone with a different epistemology as not only wrong, but also bigoted, ignorant, and backwards.

The country segregated itself by these truths, with the secular elite seeking diversity, inclusion, and progress in the cities, and the rural faithful seeking homogeneity, privacy, and stability in the countryside. We sorted ourselves: not only geographically and economically, but epistemologically. And when the housing bubble burst, it was the isolated rural who hurt most, losing not only their fragile local economies, but also the small trickle of wealth from the growing urban centers upstream. Rural America watched urban America only grow wealthier and more powerful. The secular truth became a cause of suffering, and the religious truth the only cure.

The segregation of urban and rural America, and the segregation of the truths that came with it, left rural America voiceless. The centers of media and journalism were in the cities. With newspapers’ declining revenues going to Silicon Valley, there was even less reason to drive to the country and report on rural America, especially as cities grew and became the center of American vitality. Rural America was not only abandoned by the economy, and by the elites, but by their only remaining voice in the public sphere, the media. Alone, abandoned, and isolated, a justified hate of cities, of sciences, of progress, and of the media festered.

Trump did not cause this. He exploited it. He spoke to a rural America that had been ignored for years and promised to restore everything that had been lost over the past twenty years. He described a truth that everyone in rural America knew: America was falling apart, or at least their America was, and no one else seemed to know it, not the media, not Democrats, and not even Republicans. Something had to be done to restore it.

Who should it be? Certainly not the establishment. The only way to solve a problem is to accept that it exists, and Trump was the only one who did. The lies, the bluster, the hate, the insults, the misogyny: none of these were desirable things. They certainly weren’t Christian behavior. But when it comes down to restoring faith and restoring livelihood, the latter has to come first. Faith is for the fed.

All the while, my secular urban elite friends and I were oblivious. The economy was growing (in cities), fewer people were in poverty (in cities), more people had jobs (in cities), more people had health insurance (in cities), and violent crime was down (in cities). America’s migration to urban centers combined with the steady improvement of cities masked the decline of small American towns. Our aggregate statistics obscured the opposing forces of our economy. The very tools of our secular truth failed us, while our most human senses, our emotions, were blotted out by distance.

Truth matters. It still does. But more importantly, all truths matter. The truths we discover with our minds, but also the truths we discover with our hearts. Our secular urban methods of science and data can only see part of reality because we only answer the questions we ask. We didn’t ask what was happening in rural America. No one did.

Now we know the answer. And now we have to accept that hidden inside our economic recovery was a tragic economic decline. And if we accept the scientific reality behind this decline, we’ll know that it was scientific and technological progress that caused it, centralizing, automating, and digitizing human activity to a degree that place and people no longer mattered, just information. Secular urban progressives robbed rural America of its vitality with science. And now it’s the secular urban progressives who must restore it, making rural America great again.