Modern compilers are expected to efficiently translate code written in constantly evolving source languages into optimized code in increasingly sophisticated and expanding machine languages; it should be no surprise that they are wont to do so imperfectly. Thus, an important question is: *How can we increase our confidence that a compiler is correct?* Typically, we do so using testing: we compile a large (manually curated) collection of code, and if everything works as expected, the compiler is probably doing its job. There are, however, a few flaws in this basic approach.

This blog post describes a better approach, *random compiler testing,* which has proved highly effective at uncovering compiler bugs. The post’s focus is on research carried out on random compiler testing for C and C++ compilers, first reported in our SPLASH/OOPSLA 2020 Distinguished Paper “Random Testing for C and C++ Compilers with YARPGen.”

What is wrong with just testing a compiler by using it to compile a large collection of code? First, it is not always clear what that code should do when the compiler is correct. Suppose you compile a very large software system, such as Android or Windows, and running it produces a crash. Was a compiler bug responsible? Or was this just a bug in Android? Second, even if a compiler is capable of correctly compiling Android, this doesn’t necessarily mean it will compile the next collection of code correctly. We’d like to do a better job testing compilers.

One approach is to use *random compiler testing*. The idea is simple: Generate a program randomly (i.e., using randomly generated numbers to determine what sorts of expressions and statements it contains), then compile it with a compiler and see what it does. To know whether it’s doing the right thing, compile the same program with another compiler and then see if it behaves the same way as when compiled with the first. If so, that’s good; generate a new program and try again. If not, then one of the compilers must be wrong, and we can investigate the reasons why. This technique has been used to test compilers since at least the 1960s, and it is — for reasons we do not completely understand — often extremely effective in finding bugs. For example, a recent research effort by Dr. Zhendong Su’s group at ETH Zurich used random testing to find more than 1,600 compiler bugs over a period of a few years.
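The differential-testing loop described above is simple enough to sketch in a few lines of Python. Everything here is illustrative: `generate_program`, the toy “compilers,” and the injected sign bug are stand-ins for real tools, not part of any actual testing framework.

```python
import random

def generate_program(seed):
    """Stand-in for a real generator: a tiny arithmetic 'program'."""
    rng = random.Random(seed)
    return f"{rng.randint(-100, 100)} + {rng.randint(-100, 100)}"

def differential_test(compilers, seed):
    """Run one random program under every compiler and report any
    disagreement; agreement proves nothing, so we just move on."""
    program = generate_program(seed)
    results = [compile_and_run(program) for compile_and_run in compilers]
    if len(set(results)) > 1:
        return program, results  # one of the compilers must be wrong
    return None

# Two toy "compilers": a correct one, and one with an injected sign bug.
correct = lambda p: eval(p)
buggy = lambda p: abs(eval(p))  # wrong whenever the sum is negative

bug_reports = [r for s in range(100)
               if (r := differential_test([correct, buggy], s))]
```

With 100 random programs, the harness reliably flags programs on which the two “compilers” disagree, which is exactly the signal a real framework would hand off for investigation.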

The goal of our research project was to explore two new ideas in random compiler testing, to see if they are effective. Both ideas are implemented in a tool that we created called YARPGen, and both ideas are pretty simple! We’ll describe them, but first let’s take a look at the big picture.

YARPGen merely generates random programs. To make it usable in practice, we wrote a program that runs YARPGen, runs several compilers on the programs it generates, monitors the compilers for crashes, and performs several other tasks. The big picture looks like this:

An important point is that we look for two very different kinds of compiler bug. First, those that cause the compiler to crash or otherwise fail to produce an executable program. Second, those where the compiler produces an executable, but that executable is somehow defective: it performs the wrong computation. In either case, our framework takes a program that triggers a compiler bug and runs C-Reduce on it. C-Reduce is an external program that takes a large C or C++ program that triggers a compiler bug and turns it into a small program that still triggers a compiler bug. Thus, our framework can produce output like this without any human intervention:
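C-Reduce itself is far more sophisticated than we can show here (it has many C/C++-aware transformation passes), but the greedy core of any test-case reducer can be sketched in a few lines. In this sketch, `still_triggers` is a placeholder for “compile this candidate and check whether the bug still fires.”

```python
def reduce_program(lines, still_triggers):
    """Repeatedly try deleting one line at a time; keep any deletion
    after which the compiler bug still triggers."""
    progress = True
    while progress:
        progress = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if still_triggers(candidate):
                lines = candidate
                progress = True
                break
    return lines

# Toy interestingness test: the "bug" triggers whenever the marker survives.
big = ["int a;", "int trigger_bug;", "int b;", "int c;"]
small = reduce_program(big, lambda ls: any("trigger_bug" in l for l in ls))
```

The real tool must also guard against the reducer itself introducing undefined behavior into the test case, which is one of the things that makes C-Reduce much more than this loop.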

Our first idea is motivated by the observation that, perhaps paradoxically, random programs, even when they are all different from each other, can end up being similar in an important sense. How? Consider a random image generator that we’re going to use to test a new cat detection algorithm. We generate random images by assigning a random color to each pixel in the image. Unfortunately, almost every such image looks like a blotchy grey field. It might take many years (or many millennia) of testing before we randomly generate a collection of pixels that is sufficiently cat-like to trigger our algorithm. This kind of testing is probably just a waste of time. Our images lack any structure that is coordinated across multiple pixels.

YARPGen does not generate programs as sequences of random bytes: this would work every bit as poorly as the image example we just discussed. Rather, it knows how to generate code that is syntactically and semantically valid C and C++. However, even after doing this, our random programs would still lack some kinds of high-level structure that we believed were needed to trigger compiler bugs. We designed a mechanism called *generation policies* to help add this kind of structure.

Each generation policy applies to some part of the program being generated, and has the effect of making it more likely, or less likely, to generate certain constructs. For example, one generation policy is designed to stress-test a compiler optimization called “common subexpression elimination”, which looks for repeated occurrences of the same computation in the code being compiled, and computes it only once. So this generation policy maintains a buffer of previously generated subexpressions, and injects these into subsequent expressions.
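That policy can be sketched as follows. This is a toy expression generator in Python, not YARPGen’s actual C++ implementation: a buffer remembers previously generated subexpressions, and with some probability the generator re-injects one instead of building something new.

```python
import random

def gen_expr(rng, depth, buffer, reuse_prob=0.3):
    """Generate a random arithmetic expression, sometimes reusing a
    previously generated subexpression (a stand-in for the CSE policy)."""
    if buffer and rng.random() < reuse_prob:
        return rng.choice(buffer)  # inject a repeated subexpression
    if depth == 0:
        return rng.choice(["a", "b", "c", str(rng.randint(0, 9))])
    op = rng.choice(["+", "-", "*"])
    left = gen_expr(rng, depth - 1, buffer)
    right = gen_expr(rng, depth - 1, buffer)
    expr = f"({left} {op} {right})"
    buffer.append(expr)  # remember it so later expressions can reuse it
    return expr

rng = random.Random(1)
buf = []
exprs = [gen_expr(rng, 3, buf) for _ in range(20)]
```

Repeated subexpressions in the output are exactly what gives common subexpression elimination something to do, and therefore something to get wrong.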

We have several generation policies for dealing with numeric constants. Why? Consider this compiler optimization, which results in the generated code being slightly smaller:

**(x + -1) - y → ~y + x**

This transformation, and many similar ones, requires a specific value, -1, for it to work. If we generate numeric constants uniformly across the range of -2^31..2^31-1 or -2^63..2^63-1, then -1 will almost never appear, and we will fail to find defects in the implementations of these optimizations. Thus, we have a policy that generates numeric constants that are commonly scanned for by optimizations, such as 0, 1, -1, the largest signed integer value, etc. Other optimizations don’t require a specific constant, but rather require multiple constant values that are related to each other. For example, if the constant c1 is the bitwise complement of constant c2, then:

**(( x | y) & c1 ) | (y & c2)**

can be rewritten as:

**(x & c1) | y**

To ensure that this kind of optimization has the opportunity to happen, YARPGen ensures that some constant values that it generates are closely related to previous constants, such as being their complement, their predecessor or successor, etc.
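A quick randomized check makes it plausible that these rewrites are correct, and shows how specifically they depend on the constants involved. (This is a Python sketch; Python integers are unbounded, but both identities are pure bit-manipulation facts that hold for Python’s integers as well. The second rewrite yields `(x & c1) | y` and is only valid when `c2` is exactly the bitwise complement of `c1`.)

```python
import random

rng = random.Random(42)
for _ in range(1000):
    x, y, c1 = (rng.getrandbits(32) for _ in range(3))
    c2 = ~c1  # the second rewrite requires c2 to be c1's complement

    # (x + -1) - y  ->  ~y + x   (depends on the specific constant -1)
    assert (x + -1) - y == ~y + x

    # ((x | y) & c1) | (y & c2)  ->  (x & c1) | y   (depends on c2 == ~c1)
    assert ((x | y) & c1) | (y & c2) == (x & c1) | y
```

Pick `c2` unrelated to `c1` and the second assertion fails almost immediately, which is precisely why a uniform constant generator would essentially never exercise this optimization.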

Application code is often idiomatic. For example, cryptographic code tends to have densely-packed bitwise operations. Compilers are equipped to optimize such code, but we can only expose bugs in these optimizations with reasonably high probability if we generate similarly idiomatic code. YARPGen, therefore, has a notion of “code contexts” that preferentially generate subsets of the language’s operators:

- additive: addition, subtraction, negation
- bitwise: binary negation, and, or, xor
- logical: and, or, not
- multiplicative: multiplication, division
- bitwise-shift: binary negation, and, or, xor, right/left shifts
- additive-multiplicative: addition, subtraction, negation, multiplication, division

A fragment of code that was generated in bitwise-shift context might look like this:

**a = ~b & ((c | d) ^ (e << 4));**

Code contexts are applied to random sub-regions of the generated program and can overlap each other.
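A code context amounts to restricting the operator set the generator draws from. A minimal Python sketch (again illustrative, not YARPGen’s implementation):

```python
import random

CONTEXTS = {
    "additive":       ["+", "-"],
    "bitwise":        ["~", "&", "|", "^"],
    "logical":        ["&&", "||", "!"],
    "multiplicative": ["*", "/"],
    "bitwise-shift":  ["~", "&", "|", "^", "<<", ">>"],
}

def gen_in_context(rng, context, depth):
    """Generate an expression using only the operators of one context."""
    if depth == 0:
        return rng.choice(["a", "b", "c", "d", "e"])
    op = rng.choice(CONTEXTS[context])
    if op in ("~", "!"):  # unary operators
        return f"{op}{gen_in_context(rng, context, depth - 1)}"
    left = gen_in_context(rng, context, depth - 1)
    right = gen_in_context(rng, context, depth - 1)
    return f"({left} {op} {right})"

rng = random.Random(2)
expr = gen_in_context(rng, "bitwise-shift", 3)
```

Expressions generated this way are densely packed with one family of operators, mimicking the idiomatic code (such as the cryptographic kernels mentioned above) that compilers work hardest to optimize.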

A final kind of generation policy that YARPGen supports randomly sets certain parameters that apply to an entire generated program. For example, one program generated by YARPGen might have 80% of its variables declared with the type “char,” while the next might have only 3%. There are a total of 23 different parameters whose values are randomly chosen at the start of program generation.

To test whether generation policies are beneficial, we conducted a study where we attempted to rediscover some known compiler bugs using both YARPGen and a modified version of YARPGen that does not use generation policies at all. We found that generation policies are highly effective in helping discover compiler bugs that are triggered with low probability. The generation policies that we built into YARPGen were developed by hand, based on our observations about program characteristics. Automatically devising generation policies, perhaps using a machine-learning-based approach to detect interesting characteristics of real C and C++ programs, would be an interesting future project.

When a Java program attempts to access an array element that is out of bounds, the program error is caught and an exception is thrown. C and C++ take a different approach to such errors: they completely ignore them. If your C++ program attempts to access an array outside its bounds, your program might immediately crash, it might keep executing in a corrupted state, or it might continue to execute as if nothing bad had happened. These languages contain hundreds of “undefined behaviors,” which is just a fancy way of saying “program errors that probably cause something very bad to happen, but we’re not saying what it is — so good luck with that!” A partial list of undefined behaviors in C can be found in Annex J of the C99 specification.

For people testing compilers, the problem with undefined behaviors is that different compilers, or even different optimization levels of the same compiler, take programs that execute undefined behaviors and make them have different results. These different results are not because of compiler bugs, but rather because of program bugs! Here is a simple example where the GNU C compiler turns an undefined program (which violates a somewhat obscure rule about when it is legal to modify a variable) into an executable printing “3” and the Intel C compiler turns the same program into an executable printing “4.” Neither compiler is wrong, and if we try to convince the developers of either compiler that this discrepancy is evidence of a bug, they will get annoyed with us for wasting their time.

To avoid this problem, YARPGen creates programs without undefined behaviors. It does this by embedding an interpreter for C and C++ inside the generator, which looks for undefined behaviors in the randomly generated code. When an undefined behavior is found, YARPGen rewrites the code in such a way that the problem is avoided. For example, a subexpression that is undefined because it contains a signed integer overflow (see Section 6.5, paragraph 5 of the C99 standard) can be rewritten into a closely related subexpression that avoids the overflow:

This process is repeated in a bottom-up fashion on expression trees, and across multiple expressions in a program, giving the final invariant that the program is free of all undefined behaviors:

These rewrite rules simplify generator structure because they are applied locally and guaranteed to succeed. YARPGen’s strategy for avoiding undefined behaviors was largely motivated by previous random program generators, such as Csmith, which inserted runtime checks into generated code to avoid undefined behavior. Our belief is that those runtime checks sometimes make it harder to detect compiler bugs, so we wanted to completely avoid using them.
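The detect-and-rewrite idea can be sketched as follows. This is a toy model rather than YARPGen’s actual rewrite rules: we assume 32-bit signed ints, and when the embedded “interpreter” detects that a randomly generated addition would overflow, we simply regenerate one operand within the range that keeps the sum legal.

```python
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def fits(v):
    """Does v fit in a 32-bit signed int?"""
    return INT_MIN <= v <= INT_MAX

def gen_safe_add(rng):
    """Generate operands for `a + b` that are guaranteed not to overflow
    (the generator-side alternative to inserting a runtime check)."""
    a = rng.randint(INT_MIN, INT_MAX)
    b = rng.randint(INT_MIN, INT_MAX)
    if not fits(a + b):
        # Rewrite: regenerate b inside the range that keeps a + b legal.
        b = rng.randint(max(INT_MIN, INT_MIN - a), min(INT_MAX, INT_MAX - a))
    return a, b

rng = random.Random(7)
pairs = [gen_safe_add(rng) for _ in range(1000)]
```

Because the check succeeds locally and unconditionally, the generator never has to backtrack, which is the property that keeps its structure simple.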

Over a two-year period we were able to find and report more than 220 bugs in GCC, LLVM, and the Intel® C++ Compiler. The majority of bugs that we reported have been fixed by compiler developers. Most of the bugs were in the compilers’ optimizers. Broken down by compiler, the results are:

About 10% of the LLVM bugs that we found were independently rediscovered and reported by application developers.

YARPGen is effective in finding bugs in C and C++ compilers that have been missed by other testing efforts, and it is open source software. We’re currently working on a new version of YARPGen that focuses on finding bugs in loop optimizations.

**Bio:** Vsevolod Livinskii is a PhD student in computer science at the University of Utah, Dmitry Babokin is a Staff Software Engineer at Intel, and John Regehr is a professor of computer science at the University of Utah.

**Disclaimer:** *These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.*

Have you ever had a mentor who helped you get where you are? I’ve had many, and I can’t imagine where I’d be without them. But I also know that it takes some combination of resources and luck to find good mentors right now, and I don’t think it ought to. So for the past six months, I’ve been working to create a new mentoring program — now an official SIGPLAN committee. This post describes the program and the journey that led me to create it.

The mentors in my life didn’t always call themselves mentors, but they made a huge difference for me nonetheless. Here’s one example: After college, I worked as a software engineer at Amazon in Seattle for three years. I decided during that time to apply to graduate school. The only problem was that I had absolutely no idea how to even do that.

So I emailed my undergraduate compilers professor, Jeff Foster. And he told me to crash PLDI 2013. Weird suggestion, I figured, since probably someone would yell at me for crashing whatever this PLDI thing was. But I trusted Jeff, so I crashed PLDI 2013. And there, he introduced me to Dan Grossman, who is now my PhD advisor. Jeff also helped me through the graduate school application process.

I’m grateful for Jeff. But I also know that not everyone has a Jeff. There are software engineers with no way into the programming languages community at all. There are undergraduates at institutions with no programming languages faculty. And even for me, there were times in graduate school when I really, desperately needed another Jeff in another context — an external, long-term mentor with a particular perspective on a barrier I was facing — and I didn’t have one.

So when I had a chance to design a mentorship program at ICFP this past August, I thought hard about what was missing. And after more emails and spreadsheets than I would have ever liked to see in my life, I piloted an international long-term mentorship program with 173 mentees and 95 mentors from over 30 countries. With the help of many wonderful people, this program has since grown into the SIGPLAN Long-Term Mentoring Committee, which we call SIGPLAN-M.

SIGPLAN-M matches mentors with mentees from different institutions for mentoring relationships that last at least a year. As of today, there are about 250 mentees and 150 mentors participating. Any aspiring or current programming languages researcher — student, postdoc, faculty, or industrial researcher — can sign up as a mentor, mentee, or both. I do both, and it is very rewarding. You can sign up on the website, too! And you can follow us on Twitter for the latest news, like our plans for POPL 2021.

Of course, none of this would be possible without wonderful mentors. And serving as a mentor can help not just the mentee, but also the mentor.

So I want to talk about one of those mentors. Her name is Jubi Taneja.

Jubi Taneja is a PhD candidate at the University of Utah. She will graduate and go on to work full time with the Machine Learning Compiler group at Microsoft Research starting in the summer of 2021. Her research broadly focuses on compiler optimizations, correctness, and static analysis, with the goal of helping compiler developers use formal methods. She received the Best Paper Award and Best Student Presentation Award at CGO 2020. Before Utah, she was an undergraduate research fellow at IIT Bombay, and she earned her B.E. with a Gold Medal from Punjabi University.

Jubi mentors five mentees through SIGPLAN-M. But she tells me she had not really considered serving as a short-term mentor at previous SIGPLAN events. All of the mentors at PLDI, for example, seemed to be more senior than she was. “At ICFP, the whole equation changed,” she says. “I [found out] that PhD students could also become mentors.”

Jubi deliberately creates a nonjudgmental environment for her mentees. She likes to ask her mentees open-ended questions — her favorite is “are you happy?” — and let them guide the discussion from then on. This helps her mentees open up, which in turn helps her form authentic connections with her mentees. “If you want to make a connection,” Jubi says, “ask them if they’re happy.”

Jubi had previously mentored high school and college students in India, and she had found great joy in that. But she was worried that she would not be able to connect with mentees from outside of India. And her five mentees through SIGPLAN-M spanned three countries — none of them India. But through her interactions with her mentees, Jubi has come to feel that, in some sense, “all of us are the same.” She finds it reassuring that everyone everywhere has problems — similar problems, even. Across culture barriers, across borders, across gender lines. And that has brought her joy, even in spite of her initial concerns.

“Being a mentor just made me relieved about all of the stresses that I have,” she says. By listening to what other people are going through, she says, “it made me feel that I can relate to them. That I have been through this.” Sometimes, she can help by explaining how she overcame the barriers she faced. And sometimes, she is still facing those same barriers — but she feels relieved to know she is not facing them alone.

Jubi notes that during her PhD, she felt a lot of pressure to “keep on a happy face” all the time, and was afraid of opening up. But with the perspective she has now, she would have spoken about her experiences more openly. “If I had known this earlier, probably my individual PhD journey would have been much much better.”

“My mentees sometimes ask me my personal stories as well, and I love to share everything open-heartedly with them, because 10 years ago and during my PhD, I needed to hear the real stories,” Jubi says. “The moment someone relates to you, to your story, they start believing in you and in themselves.”

Jubi credits her mentees for giving her a new perspective on what mentoring can be. Before serving as a mentor, she used to view mentorship more narrowly, focused on standard career questions. But “being a mentor made me understand that there are so many other things that people are looking for.” Like juggling personal relationships and graduate school, or deciding whether it is the right time to pursue a PhD — and help processing the feelings around those things, like guilt for choosing one path over the other.

From that, Jubi says, she has learned so much about what kinds of questions she can ask her *own* mentors. “So I would give that credit to the mentees who asked those questions so willingly.”

Jubi’s favorite thing about mentoring is helping students — something she has always been passionate about. She had feared that she would miss out on this if she ended up in industry rather than academia.

But as a mentor, Jubi has realized that regardless of her career path, her ability to help students like this “is one thing that cannot be taken away from me. It gives me that same joy,” she says. “It gives me the happiness of two different worlds now.”

And Jubi really has helped her mentees, and during an extremely difficult year at that. “Having someone to talk to about my future has been one of the few good highlights of 2020,” says one of her mentees. “It has made me feel like there might be good things in the future, not just bad ones.”

“The most important lesson that I’ve learned is that people need that human touch,” Jubi says. Especially this year. “People need mentors who can encourage them. People need mentors who can understand them. Who can talk to them so that they don’t feel isolated in their whole journey.”

Jubi plans to be there for her mentees whether or not they choose to stay in programming languages research. To her, mentoring is not just about helping them technically or with their careers — it is about building and maintaining a relationship.

Some of this falls on the mentees. To mentees, Jubi says, “trust your mentor, and tell them your real problems.” And don’t feel like you must only ask technical questions.

But Jubi says that much of that responsibility should fall on the mentor. “You need to put in that extra effort and empathize with [your mentees] more. Because they really need you, you know? That’s why they’ve entered into this mentorship program.”

That’s a lot of work. Is it worth it? For Jubi, the answer is an unequivocal yes: “[SIGPLAN-M is] not only helping mentees, [but] also helping mentors become better human beings.”

*A personal note: I would like to thank the countless people who have given me the opportunity to make SIGPLAN-M a reality, including the committee, all of the mentors and mentees, SIGPLAN, Alexandra Silva, Roopsha Samanta, Nate Foster, Lindsey Kuper, Stephanie Weirich, and everyone who attended ShutdownPL. I would also like to thank the **UW PLSE** group, whose ideas helped shape the program. And finally, I would like to thank John Wickerson for the awesome logo.*

**Bio:** *Talia Ringer is in the final year of earning her PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She is a member of the PLSE lab, and her research specializes in proof engineering; she is a contributor to the Coq interactive theorem prover. She is passionate about and active in advising, mentorship, service, and outreach.*

**Disclaimer:** *These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.*

Probabilistic modeling is a cornerstone of modern machine learning and artificial intelligence. Currently, probabilistic models are typically maintained by a small number of domain experts with deep knowledge of how their models work. As probabilistic modeling proliferates in applications, a natural question arises: *how can we better integrate probabilistic models into the modern software-engineering stack*?

Probabilistic programming languages (PPLs) give an answer to this question: they turn a programming language *into* a probabilistic modeling language. This lets programmers use their well-honed programming skills and intuitions to develop and maintain probabilistic models, expanding the domain of model builders and maintainers.

The central challenge for any probabilistic programming system is scaling up *probabilistic inference*: computing the probability distribution of the value returned by the program. Probabilistic inference is a fantastically powerful general-purpose reasoning tool with countless applications in machine learning and probabilistic verification, including verification of probabilistic networks, natural language processing, and probabilistic graphical models. Dice is our approach to scaling (exact) probabilistic inference. In this blog post, we describe how it works; it is a summary of our SPLASH/OOPSLA 2020 Distinguished Paper, Scaling Exact Inference for Discrete Probabilistic Programs.

We approach the problem of scaling probabilistic inference in part as one of *language design*: dice focuses on a particular class of probabilistic programs, programs that are *discrete* and *finite*. Why this focus? There are two reasons.

First, in practice, there are many interesting finite and discrete probabilistic programs: examples that we discuss in the main paper include text analysis such as automatic frequency analysis of simple ciphers, verification of probabilistic networks, and examples from discrete graphical models. The second reason is that many existing and widely used PPLs struggle with discreteness: discreteness breaks many of the assumptions that are required by powerful approximate inference algorithms like Hamiltonian Monte Carlo and variational inference.

What does a dice program look like? Let’s work through a simple example. The dice language has a few different sources of probabilistic uncertainty: for instance, the syntax “flip 0.4” represents a coin flip that is true with probability 0.4 and false with probability 0.6. We can use this “flip” syntax naturally in the following dice program:

**let x = flip 0.1 in let y = if x then flip 0.2 else flip 0.3 in let z = if y then flip 0.4 else flip 0.5 in z**

For the above program, we can compute the probability that the value “true” is returned as the following sum of products:

**0.1 × 0.2 × 0.4 + 0.1 × 0.8 × 0.5 + 0.9 × 0.3 × 0.4 + 0.9 × 0.7 × 0.5**

The above sum simply enumerates all possible assignments to the random variables in the program: for instance, the first term, 0.1 × 0.2 × 0.4, represents the probability that x is true, y is true, and z is true. This exhaustive enumeration approach to inference is simple, but its fatal flaw is that it cannot scale to large programs: the number of terms in the sum is exponential in the number of flips. In general, scaling inference is the gatekeeper for the broad application of PPLs: finding new and faster approaches to probabilistic inference is a grand challenge for the field.
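The enumeration can be reproduced directly; here is a Python sketch (not the dice implementation) that walks every assignment to x, y, and z and sums the probabilities of the paths on which the program returns true:

```python
from itertools import product

def prob_z_true():
    """Exhaustively enumerate every assignment and sum the probability
    of each path on which z (the program's return value) is true."""
    total = 0.0
    for x, y, z in product([True, False], repeat=3):
        p = 0.1 if x else 0.9                                    # flip 0.1
        p *= (0.2 if y else 0.8) if x else (0.3 if y else 0.7)   # y's flip
        p *= (0.4 if z else 0.6) if y else (0.5 if z else 0.5)   # z's flip
        if z:
            total += p
    return total
```

For this program the answer is 0.471, but the loop visits a number of assignments exponential in the number of flips, which is exactly the scaling problem described above.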

In order to scale inference to large and interesting programs, we will need to do better than exhaustive enumeration. Dice takes a new approach to exact inference in discrete probabilistic programs based on *factorization*.

The key idea behind dice inference begins with an observation about the sum-of-all-paths expression in the first section: *it has lots of redundant computation*. Concretely, we can refactor this sum into a different but equivalent form:

**0.4 × (0.1 × 0.2 + 0.9 × 0.3) + 0.5 × (0.1 × 0.8 + 0.9 × 0.7)**
For this small example, the savings may seem modest. But as the program grows in size, these savings add up to *orders-of-magnitude improvements*.
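The refactored computation can be written out directly: eliminate x first to obtain the marginal probability of y, then use that marginal once for z. This is a Python sketch of the arithmetic, not dice’s implementation:

```python
def prob_z_true_factorized():
    # Eliminate x first: the marginal probability that y is true.
    p_y = 0.1 * 0.2 + 0.9 * 0.3
    # y's marginal is then used exactly once per branch of z.
    return p_y * 0.4 + (1 - p_y) * 0.5
```

This computes the same 0.471 as exhaustive enumeration, but with a number of multiplications that grows linearly rather than exponentially as more variables with this structure are added.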

In the main paper we show that exploiting factorization gives asymptotic improvements on examples from text analysis and network verification, from exponential to linear time. This factorization occurs because the program has a particular data-flow property: the variable z does not depend on the variable x if one knows y’s value. This is known as *conditional independence* and is one form of structure that dice will exploit in order to factorize the inference computation. The question here is: how does dice find and exploit such factorizations?

At a high level, dice compiles probabilistic programs to a binary decision diagram (BDD) in order to perform inference by *weighted model counting*. This compilation is performed modularly — each function is compiled to a BDD separately and subsequently reused at each call site — and exploits the compositional construction of BDDs to remain compact and efficiently manipulable. Once this compilation is complete, inference takes time linear in the size of the BDD. Ultimately, dice strikes a unique balance between a rich variety of program features — including functions, integer operations, control flow, bounded iteration, and Bayesian conditioning — and scalable inference. We show that dice can scale to extremely large probabilistic programs with hundreds of thousands of random variables, and can even compete with specialized Bayesian network inference algorithms on their own benchmarks. In addition, we prove this inference algorithm correct with respect to a denotational semantics.

Now we will expand on these high-level ideas. The main idea is to represent the state space of the program as a BDD, which makes subsequent inference computations fast. The figure shows the compiled BDD for the running example program. BDDs are read top-down. Each logical variable in the BDD corresponds to an assignment to a flip in the original program: the root variable is true if and only if “flip 0.1” is true, and so on. A solid edge denotes the case where the parent variable is true and a dotted edge denotes the case where the parent variable is false. Then, a *model* of this BDD — a set of assignments that cause it to evaluate to true — directly corresponds to a set of assignments to flips that cause the program to return true.

This BDD exploits the program’s conditional independence to efficiently produce a compact canonical representation of the state space. Specifically, there is a single subtree for the “flip 0.4” node, which is shared by both the path coming from the “flip 0.2” node and the path coming from the “flip 0.3” node, and similarly for the “flip 0.5” node. These shared sub-trees are induced by conditional independence: fixing y to the value true — and hence guaranteeing that a path to the “flip 0.4” node is taken in the BDD — screens off the effect of x on z, and hence reduces both the size of the final BDD and the cost of constructing it. The BDD construction process automatically finds and exploits such factorization opportunities by caching and reusing repetitious logical sub-functions.

Once the BDD is constructed, we can perform inference in *linear time* in its size. First, we associate a *weight* with each assignment to each logical variable in the BDD: for instance, the variable for “flip 0.1” has weight 0.1 when assigned true and weight 0.9 when assigned false. The *weight of a model* is the product of the weights of the assignments that constitute it: for instance, 0.1 × 0.2 × 0.4, which is the first term in the unfactorized sum above. Then, the probability that the program returns true is equal to the *weighted model count* (WMC): the sum total of the weights of all models.

Now, inference has been reduced to WMC. By virtue of the shared sub-functions, the BDD in Figure 1 directly describes how to compute the WMC in the factorized manner. Observe that each node is annotated with its weighted model count, which is computed in linear time in a single bottom-up pass over the BDD. For instance, the WMC at the “flip 0.2” node is given by taking the weighted sum of the WMCs of its children, 0.2 × 0.4 + 0.8 × 0.5. Finally, the sum taken at the root of the BDD (the “flip 0.1” node) is exactly the factorized sum.
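The bottom-up pass is easy to sketch. Below, the example program’s BDD is hand-built from tuples (this is a toy data structure, not dice’s BDD library); sharing the “flip 0.4” and “flip 0.5” nodes between both y-branches captures the conditional independence of z from x, and memoization ensures each shared node is counted once.

```python
# Nodes are (label, prob_true, hi_child, lo_child); leaves are True/False.
Z_HI = ("flip 0.4", 0.4, True, False)   # z when y is true
Z_LO = ("flip 0.5", 0.5, True, False)   # z when y is false
Y_HI = ("flip 0.2", 0.2, Z_HI, Z_LO)    # y when x is true
Y_LO = ("flip 0.3", 0.3, Z_HI, Z_LO)    # y when x is false (shares Z_HI/Z_LO)
ROOT = ("flip 0.1", 0.1, Y_HI, Y_LO)    # x

def wmc(node, cache=None):
    """Weighted model count: one bottom-up pass, linear in BDD size
    because shared nodes are computed only once (memoized by identity)."""
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    if cache is None:
        cache = {}
    key = id(node)
    if key not in cache:
        _, p, hi, lo = node
        cache[key] = p * wmc(hi, cache) + (1 - p) * wmc(lo, cache)
    return cache[key]
```

Running `wmc(ROOT)` reproduces the program’s probability of returning true, 0.471, and the intermediate count at the “flip 0.2” node is 0.2 × 0.4 + 0.8 × 0.5 = 0.48 as described above.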

Probabilistic inference, at its core, is a challenging and increasingly important program analysis task. We want PPLs to be expressive, but we also want inference to be fast.

Dice’s design struck a specific tradeoff, using *finite discreteness* to enable a fast inference algorithm at the cost of some expressiveness. Long term, making progress on automated probabilistic program inference will involve a delicate interplay between the modeling language, the inference algorithm, and the programmer. Our job as probabilistic programming language designers is to make this interplay possible. Doing so involves answering some big questions:

- How can we design languages that adequately trade off *expressiveness* and *tractability for inference*?
- What are new *inference algorithms* that broaden the landscape of languages that afford tractable inference?
- What are *new and interesting queries* that we can ask of probabilistic programs, beyond probabilistic inference?

Ultimately answers to these questions will push PPLs towards a future where they are more automated and more user-friendly, enhancing the ability of society at large to make rational scientific decisions.

**Bio:** *Steven Holtzen is a PhD candidate at the University of California, Los Angeles (UCLA). Guy Van den Broeck and Todd Millstein are professors at UCLA.*

**Acknowledgments:** *The authors would like to thank Mike Hicks for helpful comments on this post.*

**Disclaimer:** *These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.*

Using programming frameworks, such as IBM’s Qiskit and Microsoft’s Q#, people can write quantum programs and execute them on classical simulators or real quantum computers. However, program testing and debugging, which have been studied for a long time in classical computing, are still at a very early stage for quantum computing. The reason is that these activities turn out to be surprisingly challenging on quantum computers. This blog post presents our initial effort to tackle these challenges. It is a summary of our paper, Projection-based Runtime Assertions for Testing and Debugging Quantum Programs, named a Distinguished Paper at SPLASH/OOPSLA 2020.

Our work revisits *assertions*, one of the basic program testing and debugging approaches, and applies it to quantum programs. Designing quantum program assertions is challenging for two reasons.

**Very large, complex quantum state.** The state of a quantum system consisting of *n* qubits is represented by a state vector of size 2^*n*, or a density matrix of size 2^*n* × 2^*n*, because a quantum state can be in a superposition of up to 2^*n* classical states. Expressing a predicate over such a state is itself a challenge: the classical languages employed by previous quantum program assertions (Statistical Assertion, Dynamic Assertion) can express only a few types of quantum states, and a large number of complex intermediate states cannot be asserted.

**Destructive measurement.** You need to measure the quantum system to extract information about the tested state, but the measurement operation affects the state. Measuring a quantum state that is a superposition of many classical states returns just one of those states, and collapses the superposition to that state. As such, a single measurement does not characterize the full superposition, and destroys it in the process.

Considering the two obstacles above, we set two objectives for our assertion design.

**Strong expressive power.** The assertion language should be able to express a wide range of predicates.

**High checking efficiency.** The satisfaction of an assertion predicate should be checkable with a small number of measurements/executions.

Traditionally, physicists employ a protocol called quantum state tomography to test an unknown quantum state. This protocol can in principle check any state, but its running time increases exponentially with the number of qubits.

(Note: For readers who would like to learn more about quantum computing, to expand on and solidify the concepts just mentioned, we recommend the interactive essays at quantum.country.)

In our OOPSLA’20 paper, we proposed a new kind of assertion, in which predicates are expressed using *projections*.

Projections are special linear operators in a Hilbert space — the “home” of quantum states. A projection operator maps a state into a linear subspace. Each projection corresponds to a unique subspace, which acts as a predicate: when a state lies in the subspace of the projection, we say that the state satisfies the projection.

**Expressive.** A projection-based assertion predicate language is highly expressive compared with the previous classical languages. The projections form an orthomodular lattice. The assertion languages in previous works (Statistical Assertion, Dynamic Assertion) can express only a few elements of this lattice, while projections can express *all* of its elements.

For example, suppose we are simulating the state of two electrons on three orbitals. When the number of electrons is conserved, we know that a valid state must lie in the subspace spanned by |110⟩, |101⟩, and |011⟩. Here, each qubit represents one orbital, and the values 1 and 0 indicate whether the orbital is occupied by an electron or not, respectively. Therefore a valid state must have exactly two 1’s in the 3-bit string, which can be expressed by the assertion that the state is in span{|110⟩, |101⟩, |011⟩}.

**Efficient and non-destructive.** Checking a projection-based predicate can be very efficient. For the projection in an assertion predicate, we can construct a two-output projective measurement. In this constructed measurement, if the tested state satisfies the projection (i.e., lies in the corresponding subspace), the measurement outcome is always true and the post-measurement state is the original state with probability 1, because a projection operator has no effect on a state that is already in its subspace. If not, we will observe a false outcome and know that the tested state is not what we expect. One remaining possibility is that the tested state does not satisfy the projection but has some component in the subspace of the predicate; in this case we observe both true and false outcomes across different testing executions. Unlike a direct 2^*n*-output measurement, which requires many iterations to fully reconstruct the tested state, a two-output measurement lets us become highly confident that the assertion predicate is satisfied using just a few measurements.
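To make the two-output idea concrete, here is a small illustrative sketch (ours, not the paper’s; in Haskell, with a 3-qubit state as a list of 8 real amplitudes, ignoring complex phases). `project` keeps only the amplitudes inside the subspace, and `probTrue` is the probability of the true outcome of the derived two-output measurement:

```haskell
-- Illustrative sketch (our names, not the paper's): computational-basis
-- projections on a 3-qubit state vector of 8 real amplitudes.
type State = [Double]

-- Project onto the span of the given computational-basis states,
-- identified by index (e.g. |011> = 3, |101> = 5, |110> = 6).
project :: [Int] -> State -> State
project keep psi = [ if i `elem` keep then a else 0 | (i, a) <- zip [0 ..] psi ]

-- Probability of the "true" outcome of the two-output measurement
-- derived from the projection.
probTrue :: [Int] -> State -> Double
probTrue keep psi = sum [ a * a | a <- project keep psi ]

-- The assertion holds when the tested state lies in the subspace,
-- i.e. the "true" outcome has probability 1.
satisfies :: [Int] -> State -> Bool
satisfies keep psi = abs (probTrue keep psi - 1) < 1e-9
```

For the two-electron example above, a uniform superposition of |011⟩, |101⟩, and |110⟩ satisfies the projection with certainty, whereas |111⟩ always yields the false outcome.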

To execute projection-based assertions on a quantum computer, two practical implementation issues must still be addressed. Both stem from the constraints of physical quantum computers.

**Limited measurement basis.** On a physical quantum computer, only measurement in a specific basis, called the computational basis, is directly supported. But the projection in an assertion may require a measurement that is not in the computational basis.

**Dimension mismatch.** The subspaces of the projections can have different dimensions, but only those whose dimension is a power of two can be implemented directly by measuring an integer number of qubits (we cannot measure a fractional number of qubits).

For the first issue, the compiler can rotate the basis with an additional operation; after the measurement, it rotates the state back to recover the correct program state. For the second issue, the compiler can construct the target projection from two larger projections that have no dimension mismatch, or augment the original Hilbert space by introducing auxiliary qubits. Checking one assertion is thereby transformed into checking an array of derived assertions, without changing the semantics.

After these transformations, all projection-based assertions will become executable on physical quantum computers.

To facilitate practical quantum program runtime testing and debugging, we studied how to design and implement effective and efficient quantum program assertions. We selected *projections* as predicates because of their logical expressive power and the ability to check them efficiently at runtime.

Currently the proposed transformations can generate multiple, functionally equivalent assertion implementations with different overhead. This transformation procedure (compilation) can be further optimized to find low-overhead implementations for several possible objectives like gate count and circuit depth.

It is also possible to relax the assertion predicates to find their sound yet easy-to-implement approximations. The degree of predicate relaxation and its effect on the robustness of the assertions in realistic erroneous program debugging need to be studied.

Generally speaking, there is substantial opportunity to explore new language designs and compilation strategies for quantum programs. It’s a rich, exciting area!

**Bio**: Gushu Li is a fourth-year Ph.D. student at the University of California, Santa Barbara. Li Zhou is a postdoctoral scholar at the Max Planck Institute for Software Systems. Nengkun Yu is a faculty member, and Mingsheng Ying is a Distinguished Professor and Research Director, of the Center for Quantum Software and Information at the University of Technology Sydney. Yufei Ding is an Assistant Professor in the Computer Science Department and Yuan Xie is a Professor in the Electrical and Computer Engineering Department at the University of California, Santa Barbara.


PL Perspectives was launched just about 18 months ago, on June 22, 2019, as the blog of the Special Interest Group on Programming Languages (SIGPLAN).

In the time since, we have published 66 posts authored by dozens of PL enthusiasts. These authors include industrial researchers, engineers, students, professors, post-docs, lecturers, and startup company CEOs and CTOs. The topics have been equally varied, covering research topics such as formal verification, gradual typing, program synthesis, static analysis, quantum computing, and AI/ML; non-technical topics such as climate change, the publication process (including open access), and the job search; and interviews with scientists and retrospectives on papers and conferences.

**What will the next 18 months look like?**

We will surely continue to solicit articles from talented and diverse writers who make up the PL community, and adjacent communities besides.

But we also hope to **receive more offered contributions from you**. Several posts we published were offered by readers, and they are among the most popular on the blog. Maybe the next could be yours!

If you have an idea for a post, or series of posts, please share the idea via the *request for contributions box*. Include in your request:

- Your idea (in a sentence or three)
- What material you will draw on to write it (e.g., a class you teach, a paper you wrote, work you’ve done)
- Links to writing samples, if at all possible, to demonstrate past experience.

We are happy to re-publish prior work (with attribution to the original), likely with adjustment to the PL Perspectives audience. We have done this on several occasions (e.g., James Bornholt’s Building your first program synthesizer post and Lindsey Kuper’s My First Fifteen Compilers post).

Feel free to also **nominate topics that you’d like to read more about**, and ideally suggest potential writers for those posts. Do this via the comments below, via the suggestion box, or via direct message.

Thanks for your contributions!


This is the third part of a three-part article describing the approach that Richard Bird and I took to using functional programming for algorithm design in our recent book Algorithm Design with Haskell.

Part 0 of the article argued for using a functional language rather than an imperative one, because it supports the calculation of algorithms from specifications using good old-fashioned equational reasoning. In particular, the *fusion law* is fundamental. We specify a problem as clearly as possible, in a modular fashion; for example, many optimization problems take the form

    aggregate . test . generate

where the generation phase is some combinatorial enumeration, testing filters out some ineligible candidates, and aggregation selects the optimal remaining candidate. We then use fusion to promote aggregation and testing into the generator. The result is hopefully an efficient program; it may be more obscure than the very clear specification, but that’s fine, because it is demonstrably equivalent to it.

Part 1 used greedy algorithms as an illustration (the book also covers other algorithm design techniques, such as dynamic programming and exhaustive search). Just following one’s nose leads to a theorem indicating when an optimization problem as above has a greedy solution. But there is an obstacle: the premises of that theorem are too strong for most applications.

The essence of the obstacle is that the ordering we are optimizing (to find the cheapest, or smallest, or whatever) is often not linear, because it is not antisymmetric: there typically exist distinct solutions tied on cost. Because both specification and implementation are expressed as functions, both choose a specific cheapest solution—but maybe not the same solution. So the specification and implementation are different functions; and of course we then cannot prove that the implementation implements the specification.

The fix is to allow *nondeterministic functions* in a few, carefully controlled places. Specifically, when there is a tie, we consider *all* optimal partial candidates, not an arbitrary single one of them. Here is where we pick up the story.

Optimization problems are often underdetermined: there are typically *multiple distinct optimal solutions*. The specification of the problem should *admit any optimal solution*, without committing in advance to one; this allows maximal freedom in implementing the specification. But ultimately, the implementation (especially if it is to be a greedy algorithm) must *commit to one optimal solution*: it must be optimal, of course, but not all optimal solutions need be reachable. Therefore, the implementation no longer equals the specification; rather, it *refines* it.

Taking this perspective seriously inherently leads to a calculus of *relations* rather than functions, with relational inclusion as refinement. This is the approach taken in the *Algebra of Programming* book by Richard Bird and Oege de Moor, now sadly out of print. I think this is morally the right approach to take; but it is rather complicated. Experience has shown that—with a few honorable exceptions—no-one can keep all the necessary rules in their head long enough to use them fluently.

Fortunately, we can get away without a wholesale switch from functions to relations; a much gentler approach suffices. All we need is to extend our language of algorithm development with *one* special nondeterministic function, `MinWith`, denoted with an initial capital to indicate that it is not a true function, and with one piece of meta-notation, `x ← E`. These satisfy the following property:

    x ← MinWith f xs  ⟺  x ∈ xs  ∧  (∀ y ∈ xs. f x ≤ f y)

Read `MinWith f xs` as “the collection of all minimal elements of finite list `xs` under cost function `f`”, and `x ← F y` as “`x` is a possible result of applying nondeterministic function `F` to `y`”.
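Although nondeterministic selection lives only in specifications, it can help intuition to compute the *set* of possible results. Here is a tiny executable model (the helper name is ours, not the book’s notation):

```haskell
-- All possible results of selecting a minimal element of xs under
-- cost function f: the elements of xs attaining the minimum cost.
minWithAll :: Ord b => (a -> b) -> [a] -> [a]
minWithAll f xs = [ x | x <- xs, f x == m ]
  where m = minimum (map f xs)
```

For example, `minWithAll length ["bb", "a", "ccc", "d"]` is `["a", "d"]`: either of the two shortest strings is a possible result, and a deterministic refinement may pick whichever it likes.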

It’s important to note that this extension is just for problem *specifications*; there is no change to the implementation language. The game is then to manipulate the nondeterministic specification to eliminate all the nondeterminism. Crucially, we can still use most of the familiar results; for example, the distributive law from Part 1 becomes a nondeterministic one: any possible minimal element of a concatenation of lists may be obtained by taking a possible minimal element of each list, and then a possible minimal element among those intermediate results.

The most important result we need is a fusion theorem for `foldr` with nondeterministic post-processing (in particular, with `MinWith`): informally, a deterministic `foldr` refines the nondeterministic post-processing of another `foldr`, provided that each deterministic step refines the corresponding nondeterministic step.

With this theorem, we can make much better progress with greedy algorithms. Starting with the nondeterministic specification

    mcc xs ← MinWith cost (candidates xs)

we can calculate the same deterministic greedy algorithm

    mcc = foldr gstep c0

satisfying `gstep x c ← GStep x c`, where

    GStep x c = MinWith cost (extend x c)

That is, `gstep` is *any* deterministic refinement of the nondeterministic greedy step `GStep`. The calculation goes through under the weaker assumption of refinement rather than equality for the greedy condition: a greedy step from a possible best initial candidate need only yield *a possible* best final candidate.

For example, consider the problem of making change in UK currency, which has coins of denominations 1p, 2p, 5p, 10p, 20p, 50p, £1 (i.e., 100p), and £2 (200p). The smallest denomination is 1p, so any whole-number sum can be achieved; the problem is to do so with the fewest coins. It is well known that the greedy algorithm works for making change in most currencies, including this one: from largest denomination to smallest in turn, use the most coins that you can. Let’s see if we can calculate that!

Define a *tuple* to be a list of coin counts, of the same length and in the same order as the denominations (smallest denomination first), with *count* the total number of coins in a tuple: `count = sum`.

Then the change-making problem is specified in the same shape as before: generate all candidate tuples, keep those that make up exactly the required sum, and select a possible minimal one with `MinWith count`.

In words: components are coin denominations, and candidate solutions are tuples, each paired with the sum remaining to be made up. The initial candidate pairs the empty tuple with the full target sum. Partial solutions are constructed from largest denomination to smallest; for each smaller denomination d, the current candidate is extended by c coins of denomination d, for each possible value of c from 0 up to the most that the remaining sum allows. Finally, we select only those candidates whose remaining sum is 0, project out the tuples and discard the remainders, and take the smallest (by count) tuple.

Then the big question is: does the functional calculation above give us the familiar greedy algorithm for making change? Not always. By happy accident, it does so for UK currency. But consider a currency with denominations 1p, 3p, 7p, for which the greedy algorithm does work, and a sum of 9p. The well-known algorithm (largest coins first, as many coins as possible) will use one 7p and two 1p coins—three coins in total. However, our calculated function instead uses three 3p coins—also three coins, but a different solution. This is because of the particular order in which the generator happens to produce candidates, and the particular tie-breaking rule that the selection function uses. The specification and the implementation are not equal; therefore the deterministic greedy condition cannot hold. However, the nondeterministic greedy condition *does* hold, and the familiar algorithm is indeed a refinement of the specification that admits *all* optimal tuples instead of prematurely committing to one.
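The familiar largest-first algorithm itself is a one-liner (a standalone sketch of the well-known algorithm, not the book’s calculated definition; here denominations are listed largest first, and the result gives a count per denomination):

```haskell
-- Greedy change: for each denomination (largest first), use as many
-- coins as the remaining sum allows.
greedy :: [Int] -> Int -> [Int]
greedy []       _ = []
greedy (d : ds) n = c : greedy ds (n - c * d)
  where c = n `div` d
```

On the 1p/3p/7p currency with a sum of 9p, `greedy [7, 3, 1] 9` returns `[1, 0, 2]` (one 7p, two 1p): three coins, tying with the three-3p solution described above.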

Most optimization problems are like the coin-changing problem: the most natural specification admits multiple distinct optimal solutions. The specification should not prematurely rule out any optimal solution. But a given implementation strategy must eventually make a choice.

Therefore any calculus of program *equality* is too strict; program *refinement* is necessary. But one need not fully commit to a calculus of relations, and all the complexity that entails; it suffices to judiciously introduce just a couple of specification primitives into an otherwise plain calculus of functions.

The book obviously goes beyond greedy algorithms. In particular, even using the nondeterministic greedy condition, not all currencies accommodate a greedy algorithm. One that does not is the pre-decimal UK currency, with denominations 1d, 3d, 6d, 12d (one shilling), 24d (one florin), and 30d (one half-crown), ignoring fractional denominations.

For example, using the greedy algorithm for 48d will yield one half-crown (30d), one shilling (12d), and one sixpence (6d), rather than two florins (24d each). For this instance we need an algorithm that is allowed to maintain multiple partial candidates rather than just one; the book discusses *thinning algorithms* of this kind, and the corresponding nondeterministic operation needed to specify them.
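We can replay the pre-decimal failure concretely, comparing the largest-first coin count against an exhaustive optimum (a standalone sketch, our own code; the denomination list is an assumption based on the coins named above):

```haskell
-- Largest-first greedy: total number of coins used for sum n.
greedyCount :: [Int] -> Int -> Int
greedyCount []       _ = 0
greedyCount (d : ds) n = n `div` d + greedyCount ds (n `mod` d)

-- Exhaustive minimum number of coins (a large penalty marks the
-- infeasible case where the denominations are exhausted early).
bestCount :: [Int] -> Int -> Int
bestCount [] n = if n == 0 then 0 else 10 ^ 9
bestCount (d : ds) n =
  minimum [ c + bestCount ds (n - c * d) | c <- [0 .. n `div` d] ]

-- Pre-decimal denominations in pence, largest first: half-crown 30d,
-- florin 24d, shilling 12d, sixpence 6d, threepence 3d, penny 1d.
preDecimal :: [Int]
preDecimal = [30, 24, 12, 6, 3, 1]
```

For 48d, `greedyCount preDecimal 48` is 3 (half-crown, shilling, sixpence) while `bestCount preDecimal 48` is 2 (two florins): the greedy strategy is suboptimal here.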

Functional programming is a convenient vehicle for presenting algorithm design techniques and algorithmic developments: *higher-order functions* encapsulate the important patterns of computation, and *good old-fashioned equational reasoning* allows the calculation of programs from specifications, directly in terms of program texts rather than in a separate formalism such as predicate calculus. But specifically for optimization problems, it is convenient to step slightly outside the boundaries of pure functional programming, to accommodate *nondeterministic functions* in a handful of places.

**Bio**: Jeremy Gibbons is Professor of Computing at the University of Oxford, where he leads the Algebra of Programming research group and is former Deputy Head of Department. He served as Vice Chair then Past Vice Chair of ACM SIGPLAN, with a particular focus on Open Access. He is also Editor-in-Chief of the Journal of Functional Programming, SC Chair of ICFP, on the Advisory Board of PACMPL, an editor of Compositionality, and former Chair of IFIP Working Group 2.1 on Algorithmic Languages and Calculi.


This is the middle part of a three-part article describing the approach that Richard Bird and I took to using functional programming for algorithm design in our recent book Algorithm Design with Haskell.

Part 0 contrasted algorithm development in a functional style with the corresponding activity using an imperative language. In this part, we see how the functional approach can be applied to the class of *greedy algorithms*: these can be used when the characteristics of a problem allow a globally optimal solution to be reached by making a series of locally optimal steps. Greedy algorithms are often very simple, but the proof that they do indeed yield a globally optimal solution can be surprisingly tricky. The final Part 2 presents the main novelty of the book: a step outside pure FP to admit careful and limited use of *nondeterministic functions*, which turn out to be necessary for the typical situation when the solution ordering being optimized admits ties.

Let’s look at greedy algorithms as a more interesting example. We’ll consider algorithms that construct a single optimal *candidate* out of a collection of *components*. Instances include Huffman coding (a shortest *encoding* constructed from given *symbols*), line breaking (a least wasteful *paragraph* constructed from given *words*), minimum-cost spanning tree again (a lightest *spanning tree* constructed from given *edges*), and making change (a smallest *handful of coins* constructed from given *denominations*).

The greedy approach assembles an optimal candidate step by step from the given components, maintaining just a *single partial candidate* throughout: “think globally, act locally”. This generally leads to a simple algorithm, but often one with a tricky proof of correctness.

The problem can be specified as computing a minimum-cost candidate from given components:

    mcc = minWith cost . candidates

Here, `minWith` selects from a non-empty list the (leftmost) minimal element according to a given cost function:

    minWith f = foldr1 smaller  where smaller x y = if f x <= f y then x else y

(`foldr1` is a standard Haskell function to aggregate a non-empty list). We will suppose that `candidates` can be written as an instance of `foldr`:

    candidates = foldr step [c0]  where step x = concatMap (extend x)

constructing a finite non-empty list of candidates, starting with a trivial initial candidate `c0`, and using a function `extend` that extends a single candidate by a single component in all possible ways. (The book also considers problems where `candidates` uses a different recursion pattern.)

For example, consider sorting a string: the components are characters, and candidates are strings, *permutations* of those components. The trivial initial candidate is the empty string, and a candidate is extended by each component in turn in all possible ways:

    extend x []       = [[x]]
    extend x (y : ys) = (x : y : ys) : map (y :) (extend x ys)

So `candidates = perms`, generating all permutations of the input. The cost to be minimized is the *inversion count*, the number of out-of-order pairs in the permutation:

    ic xs = length [(x, y) | (x : ys) <- tails xs, y <- ys, x > y]

We can calculate a greedy algorithm for computing a minimum-cost candidate by fusing selection `minWith cost` with generation `candidates`, so that we maintain only a single candidate at each step. Since the generation process is an instance of `foldr`, we use the fusion law:

    h (foldr f e xs) = foldr g (h e) xs   provided that h (f x y) = g x (h y)

Now `h` is `minWith cost`, `f` is `step`, `e` is `[c0]`, `h e` is `c0`, and we have to come up with a function `gstep`, the “greedy step”, to play the role of `g`—that is, to satisfy

    minWith cost (step x cs) = gstep x (minWith cost cs)

We calculate:

    minWith cost (step x cs)
      =  { definition of step }
    minWith cost (concat (map (extend x) cs))
      =  { bookkeeping law }
    minWith cost (map (minWith cost) (map (extend x) cs))
      =  { map fusion }
    minWith cost (map (minWith cost . extend x) cs)
      =  { defining gstep x = minWith cost . extend x }
    minWith cost (map (gstep x) cs)
      =  { greedy condition }
    gstep x (minWith cost cs)

The first step is just to unfold the definition of the function `step`. The second step is a “bookkeeping law”: to aggregate a collection of collections, either combine them into one big collection to aggregate, or aggregate each subcollection and aggregate the intermediate results. The third step fuses two consecutive maps into one. The fourth step simply names the subterm `minWith cost . extend x` as `gstep x`. The fifth step

    minWith cost (map (gstep x) cs) = gstep x (minWith cost cs)

is wishful thinking: that taking the best final candidate obtained by making a greedy step from each of a collection of initial candidates gives the same result as taking a greedy step from the single best initial candidate. Call this wishful thought the *greedy condition*.

The point is that the greedy condition alone suffices to satisfy the premises of the fusion law; every other step in the calculation is universally valid. If the greedy condition is satisfied, then we have the greedy algorithm

    mcc = foldr gstep c0

for finding the minimum-cost candidate. In words, from the empty candidate `c0`, make a single greedy step with each component in turn. A single greedy step consists of considering each possible way of extending the current candidate, and taking the (locally) best of them.
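Putting the pieces together for the sorting example gives a complete runnable program (a sketch following the development above; `minWith` breaks ties by taking the leftmost minimal element):

```haskell
import Data.List (tails)

-- Select the leftmost minimal element of a non-empty list under cost f.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- All ways of inserting character x into a string.
extend :: Char -> String -> [String]
extend x []       = [[x]]
extend x (y : ys) = (x : y : ys) : map (y :) (extend x ys)

-- Inversion count: the number of out-of-order pairs.
ic :: String -> Int
ic xs = length [ () | (x : ys) <- tails xs, y <- ys, x > y ]

-- The greedy step, and the greedy algorithm itself.
gstep :: Char -> String -> String
gstep x = minWith ic . extend x

mcc :: String -> String
mcc = foldr gstep ""
```

As it happens, for sorting this program does produce the sorted permutation, e.g. `mcc "edcba"` is `"abcde"`; the subtlety discussed next is about whether the greedy *condition* justifying it holds in general.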

So, does the greedy condition hold for sorting? Sadly, it does not. Consider the two partial candidates “eabc” and “cbae” (in that order), each with inversion count 3, and suppose that the next component is ‘d’. Because these two candidates tie on inversion count, it is an arbitrary choice which is ‘minimal’; our `minWith` happens to pick the leftmost minimal element, namely “eabc”. But the best extension of “eabc” with ‘d’ is “eabcd”, with inversion count 4, whereas the best extension of “cbae” with ‘d’ is “cbade”, with inversion count 3. The single best initial candidate “eabc” does not lead to the best final candidate.

The essence of the obstacle is that ordering by inversion count is not linear, because not antisymmetric: two different permutations can have the same inversion count. I can think of three possible fixes.

The first fix is that sometimes this kind of tie can’t happen in context. In particular, for sorting, there is always a unique minimal-cost permutation, namely the sorted one (at least, it is unique when the elements themselves are linearly ordered). Although “eabc” and “cbae” are tied, they and all other permutations lose to “abce”. One can refine the calculation of the greedy algorithm to take this context into account, which will overcome the obstacle for sorting. However, this fix does not work for all greedy algorithms.

The second fix is to change the cost function. In the case of sorting, we can switch from `ic` to the identity function `id`—no-one said that the ‘cost’ had to be a number. This fixes the issue at a single stroke! The problem is then to find the lexically least permutation (namely, the sorted one), which will also be the one with minimal inversion count (namely, zero). In general, the ordering is refined to a linear order, often by adding dimensions to the cost function. However, I think this is really cheating: prejudicing the specification in light of the intended implementation.

The third fix, and the one taken forward in the book, is to allow *nondeterministic functions* in a few, carefully controlled places. Specifically, when there is a tie, we consider *all* optimal partial candidates, not an arbitrary single one of them. This third fix is explained in Part 2 of the article.

**Bio**: Jeremy Gibbons is Professor of Computing at the University of Oxford, where he leads the Algebra of Programming research group and is former Deputy Head of Department. He served as Vice Chair then Past Vice Chair of ACM SIGPLAN, with a particular focus on Open Access. He is also Editor-in-Chief of the Journal of Functional Programming, SC Chair of ICFP, on the Advisory Board of PACMPL, an editor of Compositionality, and former Chair of IFIP Working Group 2.1 on Algorithmic Languages and Calculi.


Richard Bird and I recently published our book Algorithm Design with Haskell with Cambridge University Press. This book can be seen as a sequel to Richard’s introductory textbook Thinking Functionally with Haskell: as well as being a textbook on algorithm design, it serves as a more advanced book on functional programming.

Our starting assumption was that algorithm design should be taught in a constructive way. Many textbooks take the form of a catalogue of famous algorithms (such as Kruskal’s and Prim’s algorithms for minimum-cost spanning trees), and some discussion of common techniques (such as greedy algorithms). They sometimes even give proofs of correctness of these algorithms. But mostly they do not address the design *process*: how do you get from a fresh problem statement to an algorithmic solution?

We took a constructive algorithmics approach, as espoused particularly by the mathematics of program construction community. Instead of thinking hard, inventing a program, then verifying that it implements its specification, one can *calculate* the program from its specification so that it is correct by construction. As well as increasing your confidence that the answer is correct, calculational techniques are a valuable foothold when you can’t immediately see what the answer should look like. After all, we don’t usually guess then verify the solution to a long division problem—we calculate the quotient and remainder from the dividend and divisor, following a repeatable process.

Our primary purpose in writing the book was to demonstrate that functional languages are a powerful vehicle for presenting both algorithm design techniques in general, and specific algorithmic developments in particular. This is in contrast to traditional textbooks on algorithm design, which are almost universally presented in an imperative style. It is sad that, even in undergraduate CS curricula that do include FP in the first place, little attention is paid to FP after the FP course is over—we are implicitly telling students that FP is useful only for FP courses, and not for anything else.

One advantage of a functional approach is that it is at a higher level: you deal *holistically* with a whole data structure rather than separately with each of its elements, using algebraic datatypes and higher-order operators. Another advantage is that you can conduct equational reasoning directly *with programs*, rather than having to sidestep into another formalism such as predicate calculus or Hoare logic. Programs are formal entities in their own right: why reason *about* them when you can reason *with* them?

The article is in three parts. In this part, Part 0, I’ll set the scene by exploring the difference between presenting developments in a functional style from doing so in an imperative style, illustrating with very simple examples. Part 1 presents some more interesting problems. But there is a catch, which we sort out in Part 2.

Consider a very simple algorithm, namely summing the elements of an array. This seems very trivial, but please bear with me: I’ll go into detail in order to make comparisons later.

Given an array `a[0..N)` of integers, the task is to compute their sum, establishing the postcondition `s = Σ i : 0 ≤ i < N : a[i]`. Where does the following obvious looping program come from?

    s := 0 ; n := 0 ;
    while n ≠ N do
      s := s + a[n] ; n := n + 1

The canonical answer involves Hoare Logic, loop invariants, and weakest preconditions, beautifully laid out in the short book Programming: The Derivation of Algorithms by Anne Kaldewaij.

You could surely just have written the summing program from scratch, then subsequently maybe verified its correctness using loop invariants and so on. But being able to leap to the answer for simple problems is no help when it comes to more complicated problems: you need to understand the process even for simple cases, so that you can apply the same steps in more complicated cases.

For summing an array, one initial insight is needed: that the program requires a loop. Everything else can be deduced from there. Having decided that a loop is required, one first needs to come up with a loop invariant. A standard technique for doing so is to generalize one of the constants in the postcondition to a variable; here, we generalize the constant `N` to a variable `n`, and stipulate that `n` varies from `0` to `N`:

    invariant:  0 ≤ n ≤ N  ∧  s = Σ i : 0 ≤ i < n : a[i]

The invariant is easily established by setting `n := 0` and `s := 0`, because then the range of the summation is empty. Moreover, the invariant together with the termination condition `n = N` implies the postcondition; so for partial correctness it suffices to use `n ≠ N` as the loop guard—if this loop terminates, maintaining the invariant, then clearly it does so satisfying the postcondition. This leads to:

    s := 0 ; n := 0 ;
    while n ≠ N do
      ⟨ re-establish the invariant, making progress ⟩

(Demonstrating that the loop does indeed terminate entails choosing also an integer-valued *variant function*, bounded below and decreasing on every iteration; for this program, we could use `N − n`, which is bounded below by `0`. In the interests of brevity, we skip this aspect.)

All that remains now is to complete the loop body, which starts with the invariant and loop guard both true, and must re-establish the invariant.

Clearly we can make progress by incrementing `n`. The weakest precondition approach involves working backwards through a program, so let’s make `n := n + 1` the last statement in the loop body; what must be true before that, in order that the invariant holds afterwards? The weakest precondition rule for assignment says that we simply have to substitute `n + 1` for `n` in the postcondition of the assignment (namely the loop invariant):

    0 ≤ n + 1 ≤ N  ∧  s = Σ i : 0 ≤ i < n + 1 : a[i]

(The reason for working backwards is precisely the simplicity of this rule for assignment; counterintuitively, it’s not nearly so easy working forwards.) We have to achieve this state starting with the invariant and the guard, so it is helpful to massage this condition to make the second part look more like the invariant:

    0 ≤ n + 1 ≤ N  ∧  s = a[n] + Σ i : 0 ≤ i < n : a[i]

Since the invariant and the guard imply that `0 ≤ n + 1 ≤ N`, we are led to the assignment `s := s + a[n]`—because then the substitution rule for assignment (that is, substituting `s + a[n]` for `s` in the above condition) tells us that we have only to establish

    s = Σ i : 0 ≤ i < n : a[i]

which follows already from the invariant.

Putting these pieces together gives the following justification for the loop:

    { true }
    s := 0 ; n := 0 ;
    { invariant: 0 ≤ n ≤ N ∧ s = Σ i : 0 ≤ i < n : a[i] }
    while n ≠ N do
      { invariant ∧ n ≠ N }
      s := s + a[n] ;
      { invariant with n + 1 substituted for n }
      n := n + 1
      { invariant }
    { invariant ∧ n = N }
    { s = Σ i : 0 ≤ i < N : a[i] }

Seen as a tableau like this, it is straightforward to verify Hoare’s conditions: each assignment establishes its postcondition, given its precondition, and so on. But it is not so straightforward to see how the tableau was developed. It’s like looking at a building, once the scaffolding used for construction has been taken down: it isn’t obvious how the builders got the roof on.

This simple example illustrates rigorously justified imperative program development. But it is quite painful to conduct, for at least two reasons. For one thing, it is very low level: we would rather deal with whole data structures such as the array than with individual scalar elements. And for a second thing, none of the work happens in the programming language—it all happens instead in the predicate calculus. The translation back and forth is unfortunate extra effort, and it would be much more convenient to be able to reason directly in terms of program texts. Fortunately, both pain points can be addressed by switching from an imperative to a functional style, as we shall now see.

A functional approach deals with the first difficulty using algebraic or abstract datatypes and higher-order functions. In Haskell, for example, there is a datatype of lists, as values in their own right. The function that sums a list of integers can be written using explicit recursion:
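A sketch of such a recursive definition (named sumInts here, to avoid clashing with the Prelude’s sum):

```haskell
-- Sum a list of integers by explicit recursion over its structure.
sumInts :: [Integer] -> Integer
sumInts []       = 0
sumInts (x : xs) = x + sumInts xs
```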

That’s already a bit more concise than the imperative loop, because there is no need for the loop counter. But this pattern of recursion over lists is a common one: a constant (here 0) for the empty list; and for a non-empty list, a binary operator (here (+)) to combine the head with the result of a recursive call on the tail. So it is helpful to abstract the pattern as a higher-order function:
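That abstraction is Haskell’s familiar foldr; a sketch of its definition (written myFoldr here, so as not to shadow the Prelude’s version):

```haskell
-- The common pattern: replace [] by the constant e, and each (:) by f.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr f e []       = e
myFoldr f e (x : xs) = f x (myFoldr f e xs)
```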

and instantiate it for specific instances:
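For instance, summation is just one such instantiation (sketched here as sum', to avoid the Prelude name):

```haskell
-- Summation as an instance of foldr: the empty list maps to 0,
-- and each cons maps to (+).
sum' :: [Integer] -> Integer
sum' = foldr (+) 0
```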

which is more concise still.

The standard function foldr happens to be provided in the standard library, but there is nothing special about it—we could have defined it ourselves if it weren’t already there. Indeed, in our book, we defined various similar higher-order functions of our own. One is to aggregate the elements of a non-empty list. There is already another standard library function, foldr1:
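That function is foldr1; a sketch of its definition (written myFoldr1 here, to avoid clashing with the Prelude):

```haskell
-- foldr1: aggregate a non-empty list with a homogeneous operator.
-- Deliberately no clause for [], so it is undefined on empty lists.
myFoldr1 :: (a -> a -> a) -> [a] -> a
myFoldr1 _ [x]      = x
myFoldr1 f (x : xs) = f x (myFoldr1 f xs)
```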

(note that there is no clause for the empty list, so this function is defined only for non-empty lists). It uses a homogeneous (typically associative) binary operator of type a -> a -> a, and therefore aggregates a list of values of one type a to a result of the same type a. But this was too specific for our application. So we simply defined another higher-order function, foldrn (for “foldr for non-empty lists”):
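A sketch of such a definition, as a function foldrn (the name and exact formulation are assumptions here, following the description above):

```haskell
-- foldrn: fold over a non-empty list, converting the last element
-- with g, so the result type b can differ from the element type a.
foldrn :: (a -> b -> b) -> (a -> b) -> [a] -> b
foldrn _ g [x]      = g x
foldrn f g (x : xs) = f x (foldrn f g xs)
```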

This can aggregate a list of values of one type a to a result of a different type b.

Higher-order functions are extremely powerful for this reason: they effectively give you the power to “roll your own control structures”, if the standard ones don’t suit your purposes.

In an OO approach, you would use a library of collection classes and corresponding iterators and visitors instead of algebraic datatypes and higher-order functions. Algebraic datatypes such as lists are *open abstractions*, which makes it easy to add new operations over them. In contrast, OO classes are typically *closed abstractions*: clients can’t add new operations, and must interact through the fixed public interface—but conversely, providers can more easily add new representations. It’s a deliberate trade-off, with each approach having advantages over the other. But for a book on algorithm design and not about data structures, we would rather facilitate adding new operations than new representations.

The second difficulty with the imperative style is that reasoning about programs does not take place in the programming language, but in the predicate calculus instead. Recall that the first step in developing the loop for summing an array was to replace the constant N by a variable n—but *in the postcondition*, not in any program. Similarly, the crucial step in developing the loop body is the transformation of the condition s = a[0] + ⋯ + a[n] into s = a[n] + (a[0] + ⋯ + a[n−1])—again, reasoning that happens in propositions about the program, not in the program text. In contrast, the defining characteristic of pure FP is *referential transparency*. This enables program manipulation by *good old-fashioned equational reasoning*, directly with program texts, without needing to step out into a distinct language such as predicate calculus.

In particular, the fundamental rule for reasoning about foldr is the *fusion law*, which states that h . foldr f e = foldr g e′, provided that h (f x y) = g x (h y) for all x and y, and that h e = e′.

This combines a post-processing phase h with an aggregation (using operator f and constant e) into a single aggregation (using a different operator g and constant e′). This can be done when the premises on the right-hand side can be satisfied—specifically, given h, f, and e, when we can find a g such that

h (f x y) = g x (h y)

for all x and y. (The second conjunct of the right-hand side is easy to satisfy, just by letting e′ be h e.)

For example, let double x = 2 * x, and consider the composition

double . foldr (+) 0

which sums the elements of a list and then doubles the result. The post-processing can be combined with the aggregation into a single pass:

double . foldr (+) 0  =  foldr g 0, where g x y = 2 * x + y

This follows from the fusion law, with h = double, f = (+), e = 0, this g, and e′ = 0. Clearly double 0 = 0; and crucially, for arbitrary x and y, double (x + y) = 2 * x + 2 * y = g x (double y).
No sidestep is needed into loop invariants and so on; equational reasoning *with programs* suffices. If you try to do the same thing with the imperative loop, you will need to work through the development with invariants and postconditions again, and all the reasoning will again be in predicate calculus rather than with programs.
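The claim can also be checked executably; a quick sketch comparing the two-pass and fused one-pass versions:

```haskell
double :: Integer -> Integer
double x = 2 * x

-- Sum then double, as two passes.
unfused :: [Integer] -> Integer
unfused = double . foldr (+) 0

-- The single pass obtained from the fusion law.
fused :: [Integer] -> Integer
fused = foldr (\x y -> 2 * x + y) 0
```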

Fusion is the driving force in many algorithmic developments: many problems have a simple specification written with standard components (maps, filters, folds, etc.), and an efficient implementation can be obtained simply by equational reasoning using fusion. For example, a common form of specification is a pipeline

aggregate . filter valid . generate

as illustrated by minimum-cost spanning tree: generate candidate trees, keep those that span the graph, and select one of minimum cost.

The aggregation step is often some form of *selection*, to select an optimal result. The generation step is often some combinatorial enumeration, such as permutations, segments, partitions, subsequences, or trees; these are usually neatly expressed using standard higher-order operators such as foldr. Then the problem is to fuse the aggregation and testing with the generation.
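To make the shape concrete, here is a toy instance of such a specification (the names candidates, valid, and best are illustrative, not from the book): among the length-2 subsequences of a list, select one of minimum sum.

```haskell
import Data.List (minimumBy, subsequences)
import Data.Ord  (comparing)

-- Generation: enumerate all subsequences.
candidates :: [Integer] -> [[Integer]]
candidates = subsequences

-- Testing: keep only the length-2 candidates.
valid :: [Integer] -> Bool
valid xs = length xs == 2

-- Selection: pick a candidate of minimum sum.
best :: [Integer] -> [Integer]
best = minimumBy (comparing sum) . filter valid . candidates
```

As written, this specification is exponential-time; fusing the selection and testing with the generation is exactly how an efficient implementation would be calculated.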

In Part 2 of this article, we’ll cover a more interesting class of programs, namely *greedy algorithms*, which appear to lend themselves nicely to the functional approach. However, there is a fly in the ointment. Many algorithmic problems are optimization problems, which are often under-determined: for example, there are typically several different minimum-cost spanning trees of the same graph. Proper treatment of these problems entails handling of nondeterminism. We found that the most lightweight way of achieving this—so that we could remain focussed on algorithms and not get bogged down in metatheory—was to step briefly and carefully outside pure FP, with just a small handful of *nondeterministic functions*. That challenge, and our solution to it, is the main novelty in the book, and the subject of Part 2 of the article.

**Bio**: Jeremy Gibbons is Professor of Computing at the University of Oxford, where he leads the Algebra of Programming research group and is former Deputy Head of Department. He served as Vice Chair then Past Vice Chair of ACM SIGPLAN, with a particular focus on Open Access. He is also Editor-in-Chief of the Journal of Functional Programming, SC Chair of ICFP, on the Advisory Board of PACMPL, an editor of Compositionality, and former Chair of IFIP Working Group 2.1 on Algorithmic Languages and Calculi.

**Disclaimer:**

I’m very pleased to announce that Todd Millstein has joined PL Perspectives as my co-Editor. I asked Todd to say a few words about himself.

I’m a Professor and Vice Chair for Graduate Studies in the Computer Science Department at the University of California, Los Angeles. My research interests are broadly in programming languages and software verification. Currently my research is focused on core challenges for automated verification as well as applications of PL technology to computer networks and to probabilistic programming. I’m excited to join this effort to educate one another and the broader community about the power of programming language ideas.

I first got to know Todd when we were both graduate students, and I have kept up with him at conferences and followed his work with interest ever since. He is a fantastic communicator and has produced a broad range of results, from core ideas in semantics and analysis, to language design, to a variety of interesting applications, most recently of software verification to networks. He has some great ideas for how PL Perspectives can expand its coverage and impact. Stay tuned!


In 2018, I was two years into a tenure-track junior faculty position at Carnegie Mellon University when I saw what was happening in the world of software. 2018 was when Cambridge Analytica happened; soon afterward, GDPR became law. Most academics who start companies in industry do so because they are commercializing a specific project; I left to start Akita Software because I saw that the timing was right to build the kinds of tools I wanted to build—those that could help improve the quality and security of software. For the last two years, I’ve been working to build developer tools at the API level for modern web apps.

Because the technical problem was not set in stone at the time I started Akita, I spent the first year of working on the company doing product and user research work. My academic career trained me to *solve problems* in a principled way. Through the process of getting Akita up and running, I learned how to *select problems* in a principled way. This was important because for the first time, the success of my efforts was being defined by how well I solved *problems that people cared about* enough to pay real-world money for. Here are my learnings.

When I was in research, I spent a lot of time talking to programmers “in the wild,” to understand the problems they had. So I thought I was doing my homework in terms of problem selection. In the last two years, I’ve learned about *user research* and realized that my problem selection in academia could have been much more methodical and principled!

When you are trying to understand the precise problem, **there are principled ways of understanding user pain**—even when your users are programmers and you are working on something as complex as programming languages and tools. You can’t prove user pain with math, or measure it the way you measure compiler performance, but you can be scientific all the same by applying rigorous methods. This is something that the field of HCI has been doing for a long time—and that people have been advocating should merge more with programming-tools research—but it still hasn’t reached most of programming languages research. (For instance, see the paper “Programmers are Users Too: Human-Centered Methods for Improving Programming Tools” and Sarah Chasins’s syllabus for “Building User-Centered Programming Tools.”)

One of the main things I learned about user research was to **be rigorous about *how* I asked user questions**. The most straightforward way is to ask all users the same questions in the same order, under the same conditions. It is also important not to ask leading questions, nor to directly ask users what they want—there’s that famous saying that if Henry Ford had asked people what they wanted, they would have said faster horses. If your hypothesis is that tying one’s shoes is a pain point when trying to leave the house, it is not a good idea to ask directly: “Do you have trouble tying your shoes?” A better way is to ask the user to walk you through what happens when they leave the house, and ask them what the pain points are. (For more on this, you may be interested in “Principles of Contextual Inquiry.”)

**Another big lesson I learned during user research was to zoom out.** Previously, when I talked to users while doing programming languages research, my conversations usually centered around the pain developers were having with the specific problem I was working on. What I learned from user research was to understand the user as a whole person, complete with greater motivations and incentives, and to understand how the problem I solved fit in with their goals, hopes, and dreams. For instance, here are the kinds of questions I used to ask as a researcher:

- Tell me about your experience with problem X.
- What solutions for problem X would you find most helpful?
- Would you use my solution for your problem X?

And here are some of the questions we asked dozens of users when I started Akita, that had very little to do with the specific tool we were building at all:

- Tell me about your role in the company.
- Walk me through a day in your job. What tools are you using? How are you spending your time?
- What tools do you use on a daily basis?
- Tell me about what led you to adopt tool X.
- What are issues that would cause you to stay at work past the usual time?
- Tell me about how you interface with the rest of engineering. What frictions do you have there? (This was a question for security engineers.)
- How do you get promoted?

As a programming languages researcher, I certainly did not ask these questions in a consistent and disciplined way during problem selection—if at all. From asking these questions, I learned, for instance, that developer adoption impacted the success of a security tool much more than anything security engineers thought about the tool. And that developer adoption often hinged more on actionability than accuracy. These realizations not only led us to completely pivot what we thought our product should be (we started out focused on security/privacy-related usage of APIs, but pivoted to general-purpose use-cases), but also **forever changed my views on building programming languages** and tools.

Embracing the approach described above led to important changes in what we were doing. When I started Akita, I thought what we were going to build was the equivalent of an application performance monitoring tool for tracing data, targeted towards security and privacy teams. I had come to this initial conclusion from talking to people at a high level about what they thought the problems were, and what they thought was needed. Through doing user research, we instead decided to build low-friction integration tools targeted at inferring and testing against API specs, and to pivot our solution away from targeting security or privacy specifically. Here are some of the lessons that led to that:

- **The most important way a security engineer can spend their time is improving their relationship with software engineering.** When we asked security engineers how they got promoted and/or increased their influence in their organization, it came up over and over again that a security team was only as good as the engineers who listened to them. This gave us confidence that building *developer* tools was the right way to go, since developers have much greater influence!
- **One of the biggest challenges security engineers face is getting software engineers to fix their bugs.** When we asked security engineers to walk us through what they ended up spending a lot of time on, it involved convincing engineers to fix things. For instance, one engineer I spoke to spent a lot of his time coming up with examples to convince engineers that something they wrote was insecure. Another team took it one step further and hired penetration testers just to prove out the importance of certain security/privacy controls to their engineering teams. The results of this user research led us to believe that it wasn’t enough to provide reports to security that something was wrong; a more powerful tool would not only help identify possible problems, but also help developers prioritize and fix them. (Our findings match those of this study.) As a developer tools team, this was right up our alley, leading us to start building tools that helped with providing context and reproducibility.
- **Integration friction matters.** When we asked users what led them to choose the tools they currently used, ease of integration into existing tool chains came up over and over again, as well as how much configuration the tool needed. This was especially true for security engineers, who burned social capital whenever they made asks of other teams. We realized that if we didn’t make a tool that was easy to integrate, it would be very hard to get adoption.
- **Building a developer tool for security means building *for* the developer/security gap.** Over the course of the first year of working on our product, we realized that our product design and go-to-market involved straddling the communication and incentive gap between security engineering and software engineering teams. After understanding the point of view of each side, and seeing that developers did not actually seem very interested in closing the gap, we ultimately decided we needed to choose a side. We chose to become a developer tool with a security use case, instead of the reverse. Without the user research we had done, the choice would not have been nearly as obvious!

Through the process of doing this user research, I also learned to let go of many of the assumptions I had held in academia:

- **Soundness may not be the right goal.** In academia, the goal is often to make static program analyses *sound*: if a bug is not reported, then that bug should not be possible. A common complaint about static analyses is that they often aren’t actually that helpful, because of the false positives. One reason this is a problem is that the more bugs security teams report to developers, the less likely the bugs are to get fixed. In academia, I had thought of false bugs as non-bugs; many industry developers would also call a real but low-priority bug a “false positive.” A surprising (to me) number of engineers said they’d rather not know about all of the bugs, but instead be informed about the subset of bugs that were likely to be fixed. Peter O’Hearn’s “Incorrectness Logic” paper is one that addresses this side of the problem.
- **“Legacy factors” may be more important than guarantees or performance.** More generally, the tradeoffs that people want to make are not what you might expect, so they are worth exploring. When I dug deeper on my problem, for instance, I learned that the barrier to entry was often convincing people to adopt tools at all, which had more to do with integrations and learning curve than with math or performance. If people never got to the point of appreciating what was good about a tool, they would not appreciate the tool! This is definitely a different point of view than the one I had in academia, where the focus was on building clean versions of new features, leaving adoption as an exercise to the reader. While requiring all academic work to worry about adoption would certainly curb a lot of innovation, viewing adoption as “not academia’s problem” may be curbing the opportunity for impact. Given that frictionless integration with legacy tool chains yields surprisingly challenging technical problems and can disproportionately affect impact, it deserves more attention in academia.

My experience learning about user research led me to revisit something that I had previously thought a lot about: how should we be evaluating programming languages research papers? While a paper’s Introduction is what gets reviewers interested in a paper in the first place, the evaluation section gives a champion reviewer the necessary ammunition to go to battle at the Program Committee meeting. Because of this power, what shows up in the evaluation section determines what kind of work gets done in entire fields.

My strong belief is that **programming languages research does not value usability because there is currently no satisfying, agreed-upon way of evaluating it.** Semantics evaluations guarantee properties like soundness, completeness, and type safety, but they don’t say anything about whether anyone will actually care about the guarantees or the language. Benchmarks determine whether a tool runs fast on sample programs; they are great for compilers of known languages, but are often not as appropriate for language or tool design papers that do not focus on performance. The hour-long user studies we see are often inadequate for evaluating complex languages and tools. (Coblenz et al. have a recent TOCHI submission outlining some existing shortcomings in evaluation processes and proposing a new process.)

The parts of language or tool design that have to do with developer experience—which, for a lot of papers, is a lot of things—end up only in the introduction and in various assumptions throughout the paper. What this means is that these parts are not directly evaluated. And when we don’t directly evaluate usability claims, two things happen. First, work that focuses on improving usability in a thoughtful way does not necessarily have a better chance of getting positively peer-reviewed. Second, usability claims do not get critically reviewed, meaning the field is less likely to move towards solutions that improve developer experience. Right now, when a paper’s main strength is usability, it has a *worse* time getting through peer review because it’s up to the specific reviewers whether they choose to appreciate it or not.

Here’s a question that I would like to pose to the field: **what would it look like to apply rigorous evaluation to problem selection for PL papers**, instead of only to case studies of the result? For papers that propose techniques for developer productivity, why not evaluate them against interviews or surveys of where developers are actually struggling? Why not also make the criteria themselves (for instance, soundness or performance) something that needs to be justified by the user research? “Soundness” is a formal statement that an analysis respects semantics according to some semantic property of interest, such as no run-time failures or no information leaks. What would it look like to justify this property itself, e.g., through user interviews?

I don’t know the answers, but here is one version of what it could look like. For researchers, understanding that there exists a framework for evaluating usability may change how you decide to pose and evaluate those claims. There is no need to go all-out and do a collaboration with an HCI researcher, though that could be interesting and helpful for the work. Even conducting any interviews is better than nothing. For reviewers, understanding usability evaluation may help you consider it as an acceptable form of evaluation for PL research—and flag when papers make usability claims *without* backing them up rigorously. And the only way we can move towards this is if the leaders of the community support this shift towards rigorous user-centric work.

A final word: we should be careful not to let the pursuit of any specific kind of evaluation dominate a field, including user-centric evaluation. On the one hand, I love the idea of adding human-centered evaluation sections to highly technical programming languages papers. I have always thought of programming languages research as combining human factors with math and code—and these evaluation sections would finally reflect this marriage. On the other hand, academia *isn’t* industry for a reason. We need room for the what-ifs. And the more that we require of evaluations, the harder it will be to justify the intuition that accompanies truly novel discoveries. While I believe it would move the field forward to consider human-centered evaluation on a similar footing to semantics-centered or performance-centered evaluations, we should be wary of making any one of them required.

**Bio**: Jean Yang is the founder and CEO of Akita Software, an API tools company. She was previously an Assistant Professor in the Computer Science Department at Carnegie Mellon University.

**Acknowledgments**: Thanks to Will Crichton for comments and suggestions on a draft of this post!
