Skip to content
Go back

Why Chat is the Best Interface for LLMs (for now?)

Published:  at  07:00 PM

I think that chat provides the best user experience for LLM apps. This is something of an accepted truth for now, ever since ChatGPT came out in November 2022.

In this post I want to briefly talk about why it is that chat interfaces work best for LLM powered-apps as they exist in late 2023.

The short answer: because LLMs aren’t good enough and we haven’t figured out how to build applications around them that compensate for this.

Limitations of LLM Apps

LLMs, even the best ones like GPT-4, usually don’t do exactly what you want based on a single prompt. This is for two main reasons:

  1. Limited reasoning ability
  2. Insufficient context

Limited Reasoning Ability

The largest of the language models, like GPT-4, are pretty incredible tools but often falls short when it comes to complex reasoning. This is especially true when an LLM has to work through a large prompt with lots of details.

LLMs can struggle with understanding and generating nuanced or deeply reasoned responses. They are better at aggregating and rephrasing existing knowledge than at creating new insights or complex reasoning.

LLMs may lose coherence over long texts or intricate prompts with many details. Their performance can degrade as the length and complexity of the task increase.

One inherent limitation of LLMs that has become less of an issue in LLM-powered systems is that LLMs do not have real-time awareness or personal experiences. Their responses are based on pre-existing data up to their training cut-off, making them less effective for current events or highly personalized queries. However, this limitation has been ameliorated with giving LLMs access to external knowledge with retrieval augmented generation (RAG) or web browsing.

Until the reasoning ability of LLMs improves, the output of LLMs on the first attempt often fall short of what’s desired.

Insufficient Context

It can be hard to prompt an LLM to do exactly what you want it to.

Say you give an LLM the prompt: “write me a funny story about a cocker spaniel named Georgie”

There’s probably a lot of other contextual information that you’d like the LLM to include that isn’t included in the prompt string, such as:

These questions and more could at least partially be baked into the prompt, but it can be hard to know all the details to include in there.

Therefore, when the LLM takes its best crack at answering your initial generic prompt, it’ll probably fall significantly short of your expectations. But, it’ll probably get some stuff write that you can further refine in your chat conversation.

In your follow up messages, you could say things like “make the story more sarcastic” or “make Georgie a girl”.

While LLMs are pretty remarkable at understanding and generating natural language, their performance is still heavily dependent on the quality and specificity of the input they receive. I’ve yet to see systems that can reliably give LLMs the requisite context that’s not stated in the user prompt but should be included in the generated answer.

Chat as the Interface

The power of the chat interface is that it allows for continuous prompt refinement throughout the conversation to compensate for LLM limitations.

With back-and-forth dialogue, users can:

While LLM chat apps don’t have the same full on sci-fi appeal of being able to tell an autonomous agent to do some abstract task for you, it’s still incredibly useful. Say a 30% utility gain instead of 300%. That 30% is a major win, and one that I believe we can optimize current chat-based systems on to get 10s of percent greater utility improvements.

While I haven’t been able to find research on the benefits of LLM chatbots specifically, recent data about using generative AI tools backs this up:

The main conclusion here is that chatbots do not present order of magnitude gains as agents promise, but something tangible and sizable to work with right now.

Evolving the Chat Experience

Now, let’s discuss some ways to achieve these 10s of percent utility gains with generative AI chatbots.

There are a few concrete measures that I believe can significantly improve chatbot performance with currently available technologies and techniques.

These measures are:

Custom GPTs

The easiest way to experiment with an evolved chat experience that I’ve found is ChatGPT’s custom GPTs functionality.

Using GPTs, you can prompt engineer. GPTs even come with a nice chat-based configuration wizard to help you with your prompt engineering.

You can connect to tools. GPTs come with some tools out of the box, including web browsing, knowledge retrieval, and data analysis. You can also add your own custom integrations.

The GPTs also have a mechanism to extend chat memory beyond the context window.

As of now, there’s no user-directed personalization. However, you could take a GPT template and customize its system prompt to suit your personal needs.

Using the ‘Copilot’ Idea to Scope Chatbots

One question regarding LLM-powered chatbots is how to scope their functionality to maximize utility. Due to the above-mentioned limitations of current LLM technology, chatbots should be scoped to a specific domain to best utilize their limited reasoning abilities and context window.

I can imagine a future where, using some clever context window management technique (like how a computer manages multiple applications in memory), a single chatbot can support an arbitrary number of use cases. I haven’t yet seen techniques that can achieve this, though.

For the time being, it makes sense to focus on one specific use case per chatbot. Determining the domain of a chatbot is one of the most important decisions when creating it.

I think using the idea of a ‘copilot’ can help determine what the scope of a chatbot should be. You want a copilot to help guide you through a specific thing. For example, you probably wouldn’t want the same copilot on your plane as your boat.

Microsoft has leaned hard into the idea of the copilot, with CTO Kevin Scott presenting a recent keynote called The Age of the Copilot. They’ve rolled out a bunch of copilots to their tools, such as Github Copilot, Bing Copilot, and Office Copilot.

I expect Microsoft to continue leading the space.

Looking Beyond Chat (to Agents?)

The next evolutionary step for LLM apps beyond the chatbot is the agent. Agents can perform complex tasks autonomously with minimal oversight.

To address the limitations of current LLM apps and pave the way for LLM-powered agents, several improvements are necessary:

There are a lot of people spending a lot of time on a lot of projects for AI agents. To name a few:

And then there are the LLM labs working on making the model better.

Once we get to autonomous agents, even if they’re only for discrete tasks, it has the potential to completely revolutionize how we use software and interface with the world. Bill Gates expands on this idea in his excellent recent blog post, AI is about to completely change how you use computers.

I suspect that before we get to fully autonomous agents, there will be a phase of semi-supervised asynchronous agents. These agents can perform an indeterminate number of actions autonomously and asynchronously but regularly check in with a human for feedback. Think more email back and forth than the current class of chatbot’s instant messaging.

I also suspect that in six months to a year’s time, this blog post could look very different. New LLM models and application-level tools could make agent-driven LLM apps much more tangible.

While folks work on building these future classes of technology, we shouldn’t lose sight of what we can do with the present set of tools to provide immense utility, the chatbot.


Suggest Changes

Previous Post
From RAG to the Knowledge Retrieval Tool
Next Post
MongoDB Podcast: Building the MongoDB Docs AI Chatbot