Conversation Engineering

How many times have you had a conversation with a computer today? For most of us, that number is increasing every day. Arguably, the locus of user interaction has forever shifted from graphical to conversational[1]. Thus far, however, the practice of designing conversations has not kept up with this shift. Fundamentally, we have far fewer theories, guidelines, tools and metrics for designing, implementing and evaluating conversational user interfaces than we do for traditional graphical ones.

This is strange, as conversations need to be just as useful, delightful and beautiful as the graphical applications that we interact with every day. Conversations are one of the defining ways we experience modern computing technologies, and should take their rightful place at the center of what we call user experience and design.

The stakes here are also fundamentally different from those of earlier shifts in interface paradigms. A poorly designed webpage is ugly or confusing; a poorly designed conversational agent confirms a payment with the wrong person, contradicts a policy it shared three days earlier, or provides medical guidance it has no business providing. The non-determinism of LLMs, combined with the breadth of their applications, drastically raises the stakes of both successes and errors. This alone warrants the rigorous processes, tools and safeguards that an engineering discipline requires.

From Prompt Engineering to Conversation Engineering

The artifact at the center of this transition is the humble and much-maligned system prompt. For those of us not training or fine-tuning our own foundation models (in other words, the vast majority of us), the system prompt is the primary way we have of controlling the behavior of these powerful models, and thereby architecting the conversation as experienced by the user.

Given this centrality, it is somewhat bizarre that the title “prompt engineer” is so often used in a derogatory sense. Largely, this is because most people consider authoring system prompts to require no specialized skills or training. As a result, authoring system prompts remains an ad hoc set of activities without any systematic process, unifying theory or established metrics[2].

Several organizations we work with on critical applications, including in healthcare, financial services and customer support, have described their current process of authoring system prompts as “terrifying”. Their processes for designing, evaluating and iterating on these prompts are ad hoc and arbitrary. In many cases, they rely on catching runtime errors or discrepancies in production (what is usually called observability) as their primary means of testing and debugging.

This is clearly untenable. As we rely on LLMs for more and more critical tasks, designing, implementing and testing them cannot remain a matter of “folk recipes” and leaky parachutes. Above all, this process needs to be made rigorous, grounded and accountable. In short, we need to move from being prompt engineers to being conversation engineers: centering the process and the holistic experience, and respecting the larger context in which that experience resides, namely the conversation.

Conversation Engineering: A New Practice and Discipline

Luckily, we already have some guidance on how to approach this. We can build on the earlier practice of conversation design, which for years has guided how we design other kinds of conversational interactions, such as interactive voice response (IVR) systems, natural language understanding (NLU) systems, and other assistant-like interactions. On the empirical side, we can build on the field of conversation analysis, which studies the patterns and behaviors of naturalistic human conversation, as well as on a range of psychological and cognitive theories that ground these patterns and behaviors. On the process side, we can draw on the rich history of user-centered design and its iterative cycle of designing, prototyping and evaluating. And from software engineering itself, we inherit practices around specification, versioning, testing and auditability, practices that the existing culture of prompt authoring has so far largely ignored.

Conversation engineering is the discipline of designing, evaluating and operating conversational systems with the rigor that production deployment requires. It encompasses behavioral specification, evaluation methodology, coordination among multiple stakeholders (designers, engineers, compliance and operations), and accountability across the full lifecycle of a conversational system, from authoring through deployment to ongoing operation.

None of this is purely theoretical. Practitioners are already doing rigorous conversation work: voice AI teams at agencies deploying production systems in regulated industries, conversation designers at large platforms, and the engineering teams behind the assistants and agents that increasingly populate our daily lives. Naming the discipline is partly about giving this existing work a proper home, and partly about creating space for the systematic theory, tooling and education that the work deserves but has so far lacked. Toward that end, I’m teaching the first Conversation Engineering course at Cornell Tech, launching in January 2027.

Designing Beautiful Conversations

Designing a conversation requires resolving a whole set of fundamental questions about values, ethics, goals and understanding that AI can never resolve in the absence of humans. We believe that as conversations become more central to how we interact with computers, more and more UX designers, software engineers and other professionals will become conversation engineers. This will open up a wide world of expertise, skills, tools and methods in which humans remain central to the profession.

As LLMs automate the mundane aspects of technology design and implementation, only the most important questions remain: Who are we? What do we want? How do we want to be understood? These are the questions that conversation engineers will be best equipped to address.


Note about AI usage: The initial draft was written by me with no AI assistance. Claude Opus 4.7 contributed structural suggestions and helped refine several passages in the final version.

Footnotes

  1. Notwithstanding the age-old HCI debate around agents vs. mixed initiative, our assertion is not that the GUI will go away, but that the nature of LLMs fundamentally shifts the recognition vs. recall calculus such that conversations are now genuinely competitive with GUIs for a far wider set of tasks.

  2. It is notable that several of the companies we are working with consider their system prompts to be highly protected intellectual property.