Alexander Embiricos


Mar 15, 2024

Designing Usefully Unintelligent AI Summaries

Chatbots are being added to every software product in the cloud and under the sun. But while they're mostly cool, they're not consistently useful.

A cool exploration of adding an AI bot to chat in Multi, which we thankfully did not ship


Many AI features fail because they require too much "intelligence" from both LLMs and users. And it's not because the models "aren't quite good enough." Chatbots are not the future. Instead, LLM features require careful UX design for a specific use case.

Last week, we shipped our first LLM-based features: Summaries and Shared Content References. Unlike other approaches we’ve seen, we deliver our summaries in a real-time multiplayer text editor, and iterated to surprisingly simple underlying prompts. In this post, we’ll explore some of the principles that led to those decisions.

Our use case: Video calls for getting work done together

At Multi, we're building video calls to help software teams get work done together. So far, we’ve been focused on features for multiplayer collaboration during sessions. A modern take on Screenhero.

Now we're starting to use LLMs to help with follow ups, sharing context with teammates, and keeping a team’s knowledge fresh. For any productivity nerds, that’s connecting "sync & async."

Challenges building for software teams

There’s a growing plethora of summarizers and recorders for sales, support and general use cases, but they miss the mark for software teams.

Conversations about building together bring a few additional challenges:

  • Discussion centers around designs, code, or docs. Imagine pairing without seeing the screenshare! The transcript is woefully insufficient.

  • Any knowledge that does lie in the transcript is highly specific in both terms and phrasing. LLMs are unlikely to retain that nuance.

  • Conversations are collaborative, which means that participants need shared outputs, and shared affordances to generate them.


Our goal is for AI features in Multi to feel like they're provided by a proactive AI assistant that coinhabits your workspace and leans on shared context.

Here are the principles we came to:

  1. Lean on the facts

  2. Make the AI feel like a teammate

  3. No prompts. Instead, provide useful actions in context

1. Lean on the facts

LLM summaries are imprecise but can serve as skimmable pointers to factual content

LLMs aren’t suited to writing at the specificity needed to communicate decisions, rationales, or plans. One classic mistake is missed or confused negations. For example “Alice,” a user in our early access, sent us this summary point we'd generated:

· Alice is merging extraction work to avoid blocking production deployment.

She then told us:

"I wasn’t merging anything to avoid blocking a deployment. I was merging stuff and it needs to be tested out before we can deploy to production. So it was inversed."

Although GPT-4 got the nuance of the plan wrong here, it did capture a few relevant themes: “Alice,” “extraction work”, and “blocking production deployment.” This generalizes to something useful: LLMs are good at transforming transcripts into skimmable snippets that capture the gist of a conversation.

Skimmable summaries; factual details

For the details that LLMs are not suited for, we can rely instead on the actual artifacts that were discussed. We can link to those, include screenshots, or find the right moment in the recording. This pairing of skimmable summaries with factual details brings us to a content taxonomy, which is in effect a roadmap for us.

Content Source               | Skimmable Format          | Detailed Format
Transcript                   | Short AI summary          | Detailed notes written with user assistance, or raw transcript at key moment
Reference to shared artifact | Filename / webpage name   | Open the content
Recording                    | Screenshot of key moment  | Clip of recording at key moment

To intuit this, imagine catching up on Slack. Can you see yourself skimming something in the left column, and occasionally wanting to learn more? Conversely, how often will you read a transcript or watch a recording without some skimmable context first?

You'd skim some short notes in Slack, but not a long video + transcript
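The taxonomy above can be sketched as a small data model. This is a hypothetical TypeScript sketch, not Multi's actual code; all type and field names are illustrative:

```typescript
// Hypothetical sketch of the content taxonomy (not Multi's real data model).
// Each content source pairs a skimmable form with a factual, detailed form.
type ContentSource =
  | { kind: "transcript"; summary: string; keyMomentSec: number }
  | { kind: "artifact"; name: string; url: string }
  | { kind: "recording"; screenshotUrl: string; keyMomentSec: number };

// The skimmable form is what a teammate sees first when catching up.
function skimmable(source: ContentSource): string {
  switch (source.kind) {
    case "transcript":
      return source.summary; // short AI summary
    case "artifact":
      return source.name; // filename / webpage name
    case "recording":
      return source.screenshotUrl; // screenshot of key moment
  }
}
```

The detailed form (raw transcript at a key moment, opening the artifact, a recording clip) would hang off the same objects, so every skimmable snippet stays a pointer to factual content.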

The notepad

Let's see these elements in Multi. As of our March 7th changelog, here’s what a notepad looks like after a session:

Skimmable content vs detail content in the notepad

We think the above is still light on facts, and plan to fix that. Here’s an exploration of how we might add citations that reference the transcript and recording:

Citations on summary points, that reference the transcript & recording

2. Make the AI feel like a teammate

High fidelity AI outputs require high fidelity human inputs

Wattenberger writes about the “spectrum of how much human input is required for a task":

Human tasks, tools, and machines
  • "When a task requires mostly human input, the human is in control."

  • "But once we offload the majority of the work to a machine, the human is no longer in control.”

Multi is a team communication tool. How can we elegantly provide control of AI across multi-step workflows like taking notes then sharing them, or tracking action items then filing tickets?

Davis Treybig mentions Debuild as a good example of “validation affordances”:


Debuild "asks you to validate the use cases it thinks you need, and you can edit/modify those use cases before moving forward."

This works well for a single focused workflow, but each custom flow we add distracts from the ongoing discussion and makes the app more complex. Build one for each possible workflow during or after a conversation, and we quickly end up with an overwhelming app.

The AI is just a teammate taking notes with you

Can we feed AI into familiar multiplayer affordances, instead of creating a new system? Luckily, we’ve spent a year designing simple but powerful primitives for multiplayer collaboration in Multi.

We realized that if we build the AI to interact with us like a teammate, all of the following becomes immediately obvious, in real time:

  • What it’s doing, including intermediate steps

  • How to add, edit, and remove intermediate and final AI outputs

  • When teammates are interacting with it

That’s why Multi’s AI primarily delivers output in the same realtime multiplayer notepad that users can take notes in. Here are some of the options we explored to get to that:

Spectrum of AI feeling like a teammate vs a new system

Ultimately, we decided that the closer AI notes felt to teammate notes, the more natural it would feel to correct it—to work with it.

Example workflow: Tracking action items and filing to Linear

We want a system where:

  • The user, AI, or other teammates can add action items.

  • All action items are editable by the user and other teammates.

  • All action items, whether or not they've been edited, can become draft Linear issues with one click.


By making action items blocks in a multiplayer text editor, each with a "Create Issue" button, this all just works!

Create a Linear Issue from an Action Item in the Notepad.
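As a rough sketch (assumed names, not Multi's real code), the three requirements reduce to a single block type that any author can create and edit, plus one conversion to a draft issue:

```typescript
// Illustrative sketch: action items as editable blocks in a shared notepad.
type Author = "user" | "ai" | "teammate";

interface ActionItemBlock {
  text: string;
  addedBy: Author; // the user, AI, or teammates can all add items
  edited: boolean; // anyone can edit, regardless of who added the item
}

interface DraftIssue {
  title: string;
  description: string;
}

// "Create Issue" behaves identically whether the AI or a human wrote the
// block, and whether or not the block has been edited since.
function toDraftIssue(block: ActionItemBlock, sessionName: string): DraftIssue {
  return {
    title: block.text,
    description: `Action item from session: ${sessionName}`,
  };
}
```

Because the AI's output lives in the same block type as everyone else's, no special-case flow is needed for "AI-generated" action items.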

3. No prompts. Instead, provide useful actions in context

Just do the basic thing, automatically

The first AI feature we shipped to early access was open-ended prompting. It was my (terrible) idea. After some inspiring conversations with power users who imagined all sorts of use cases, I was excited to see what they’d come up with.

And… nobody used it. Few tried prompting, and even fewer tried again. The problems were clear: We were making users guess what capabilities our AI had, decide how and when to use them, and also how to prompt for good results.

We quickly pivoted and shipped the obvious thing: AI summaries, triggered automatically after a session, with a prompt we’d refined to be useful most of the time.

The confidence spectrum

Automatically triggering the AI is great when we’re confident it’ll be useful. But what happens when we’re less sure? Depending on our level of confidence, here are some UX patterns that we use or plan to use. In each case, we take as much context as possible from the interface so that the user doesn’t have to prompt.

Trigger automations that are always useful automatically. If they're often useful, suggest. If they're only sometimes useful, progressively disclose a button
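That spectrum can be sketched as a simple mapping from confidence to affordance (labels are illustrative, not our actual implementation):

```typescript
// Hedged sketch of the confidence spectrum: the more confident we are that an
// automation will be useful, the more proactively we surface it.
type Confidence = "always useful" | "often useful" | "sometimes useful";
type Affordance =
  | "trigger automatically"
  | "suggest"
  | "progressively disclose a button";

function affordanceFor(confidence: Confidence): Affordance {
  switch (confidence) {
    case "always useful":
      return "trigger automatically"; // e.g. post-session summary
    case "often useful":
      return "suggest"; // e.g. expand on a topic while taking notes
    case "sometimes useful":
      return "progressively disclose a button"; // e.g. file a Linear issue
  }
}
```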

Expand on a topic

Press space for assistant to continue taking notes on this topic

We can automatically trigger short, skimmable summaries after calls because we’re confident they’ll be useful. However, what if someone wants more detail, or they want it during the call? Since we can’t reliably predict the timing or content, we can give the user explicit affordances to tell us.

File an issue

Create Linear Issue from a Multi Action Item

Similarly, it’s useful to quickly file an issue with detailed content, but for most teams, relatively few action items should become issues.

Transcript-based copilot

GitHub Copilot works well in part because it seamlessly takes context from the workspace, and has a lightweight UX for triaging and incorporating suggestions. What if you had a copilot while taking notes that predicted based on what’s been said?

So, are open-ended inputs ever the right solution?

As long as you aren’t relying on them, we think open-ended inputs are worth incorporating as another means of interaction for power users.

For example, the above button to expand on a line is great while editing, but it’s not discoverable otherwise. What would a permanent affordance to write a detailed summary or answer look like?

Open ended summaries, with suggestions

We brought back our failed experiment of open ended prompts, but with two key fixes that have increased usage to healthy levels:

  • First, this prompt is shown below the skimmable summary. That creates a clear workflow of reading the concise summary, and using the prompt to learn more.

  • Second, we added suggestions for topics. This pattern creates buttons for those topics, avoids being annoying when we’re wrong, and most importantly helps teach users how they can use the feature in context.

Deliver outputs in context. No, really in context

It seems intuitively right for users and AI to come together to write notes in a notepad, but is that really the right place for final delivery? In most cases, no. Users want these notes in Slack, Notion, Linear, etc.

So naturally, we’re exploring how Multi can push notes into existing tools. Most of our reasonable explorations look like the “Create Linear Issue from Action Item” flow above: Some action you can take from the notepad.

The problem is, these might be automations, but they feel like extra work. You shouldn’t have to migrate design crit feedback into Figma—it should just be there when you revisit the designs. Hence what might be the most provocative idea here…

AI that responds to OS context

Followups contextually shown over Figma

Are you getting minor heartburn thinking about how and when this will activate without being annoying? So are we. Please send help.


We’re just getting started with AI interfaces for calls and desktop OSes. I’d love to hear your thoughts on these applied principles, and if you made it here, I'd love to read whatever else you’re finding interesting. Please find me at @embirico on X/Twitter or LinkedIn, or email me at alexander@.

If Multi seems like it could be useful for you, we have a generous free tier that temporarily includes a preview of the above AI features. I’d love for you to check it out!



© Multi Software Co. 2024
