Tool use with gptel: looking for testers!

2024-12-31

emacs, gptel

TL;DR: Get LLMs to do things from Emacs, with gptel and your help.

A short one today, without the usual flourishes. gptel is a large language model (LLM) client for Emacs. At its core is a wrapper around the HTTP APIs of many LLM providers, including the smaller language models you can run locally on your machine. At its surface… well, it tries not to have a surface at all, and to blend in with Emacs’ design metaphors and affordances instead.

What does that mean? For one, it’s available at any time and in any buffer, even in places where it probably shouldn’t be, like tooltips and the minibuffer. It treats text – yours or that generated by LLMs – as Emacs treats text. You can sling it around, propertize it, redirect it, shove it into buffers and more. You can save LLM conversation states by just saving the buffer, treat hierarchical Org documents as parallel conversation threads, and so on. Second, it’s programmable: you can use a simple API to script gptel into your tools instead of adapting your tools to gptel. (One caveat: gptel is presently async-only, so you’ll have to write a callback instead of just inserting a call to gptel-request in the middle of your code.)
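Here is a minimal sketch of what that callback style looks like. It assumes the (response info) callback convention described in gptel’s README, where RESPONSE is the reply string on success and INFO is a plist whose :status entry describes a failure. The command name my/gptel-summarize-region and its prompt are hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'gptel)

(defun my/gptel-summarize-region (beg end)
  "Summarize the region BEG..END with an LLM and insert the result after it.
Hypothetical example command.  `gptel-request' returns immediately;
the insertion happens later, in the callback."
  (interactive "r")
  (let ((prompt (concat "Summarize the following text in two sentences:\n\n"
                        (buffer-substring-no-properties beg end)))
        ;; A marker keeps tracking the end of the region even if the
        ;; buffer is edited while we wait for the response.
        (marker (copy-marker end)))
    (gptel-request prompt
      :callback
      (lambda (response info)
        (if (stringp response)
            ;; Only insert if the original buffer still exists.
            (when (buffer-live-p (marker-buffer marker))
              (with-current-buffer (marker-buffer marker)
                (save-excursion
                  (goto-char marker)
                  (insert "\n\n" response))))
          ;; On failure RESPONSE is nil and INFO's :status says why.
          (message "gptel-request failed: %s" (plist-get info :status)))))))
```

Select a region and run M-x my/gptel-summarize-region. Emacs stays responsive while the request is in flight, and the summary appears whenever the callback fires.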

This year-old video, while significantly behind the current state of the project, illustrates some aspects of the design.

A generalist tool like an LLM can be used in more ways than can be captured by any rigid design, and we’re some distance away from answering this simple question: Can the most flexible text editor ever written provide a generic interface to the most flexible text generator ever created? I am puzzled.

And at some point – not today – I will write in detail about gptel’s evolving design and the challenges therein. Today, however, I need your help!

Tool use: the layman’s version

At some point folks realized that instead of training LLMs to do every task, it’s easier to train them to delegate actions to specialized tools.

You can think of these tools as capabilities you equip the model with: the ability to look up information, run actions that modify the state of your computer, and so on. This solves two common problems with using LLMs:

Their knowledge of the world is limited to their training corpus – and then further compressed in their internal representation. With access to better information in the mix, it’s no longer just a game of asking a frustratingly cheerful oracle a question and wondering if it’s making up a load of hogwash.
They can generate text (or other multimedia formats) but not actually do anything useful, assuming you can trust them to not trip over themselves.

Of course, the downside is that now you’re responsible for providing these “capabilities”, and for defining the interface and levers that the LLM can choose to pull. So tool use, or “function calling”, is LLM usage where

you include a function specification along with your task/question to the LLM. This includes a detailed listing of the function’s arguments, their types and so on.
The LLM optionally decides to call the function, and supplies the arguments.
You run the function call, and (optionally) feed the results back to the LLM.
The LLM completes the task based on the information received.
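The four steps above form a loop. Here is a toy sketch of that loop in Emacs Lisp, with a stubbed-out model so it runs without any network access. The names my/tools, my/fake-llm and my/run-with-tools are hypothetical, and the message plists are a simplified stand-in, not gptel’s or any provider’s actual wire format.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Step 1: the function specification sent along with the task.
(defvar my/tools
  (list (list :name "add"
              :description "Add two integers"
              :args '((:name "a" :type "integer")
                      (:name "b" :type "integer"))
              :function (lambda (a b) (+ a b))))
  "Hypothetical tool registry for this toy example.")

(defun my/fake-llm (messages _tools)
  "Stand-in for an LLM: request a tool call, then answer from its result."
  (let ((result (cl-find-if (lambda (m) (eq (plist-get m :role) 'tool))
                            messages)))
    (if result
        ;; Step 4: complete the task using the tool's output.
        (list :role 'assistant
              :content (format "The sum is %s." (plist-get result :content)))
      ;; Step 2: the model decides to call a function and supplies arguments.
      (list :role 'assistant
            :tool-call (list :name "add" :args '(2 3))))))

(defun my/run-with-tools (prompt)
  "Send PROMPT to the fake model, running tool calls until it answers."
  (let ((messages (list (list :role 'user :content prompt)))
        reply)
    (while (plist-get (setq reply (my/fake-llm messages my/tools)) :tool-call)
      ;; Step 3: run the requested function and feed the result back.
      (let* ((call (plist-get reply :tool-call))
             (tool (cl-find (plist-get call :name) my/tools
                            :key (lambda (tl) (plist-get tl :name))
                            :test #'equal))
             (value (apply (plist-get tool :function) (plist-get call :args))))
        (setq messages (append messages
                               (list reply (list :role 'tool :content value))))))
    (plist-get reply :content)))

(my/run-with-tools "What is 2 + 3?")
;; => "The sum is 5."
```

A real client replaces my/fake-llm with an HTTP request and has to cope with the model calling several tools, or none, in a single turn.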

You can use this to give the LLM awareness of the world by providing access to APIs, your filesystem, web search, Emacs, etc. You can even get it to control your Emacs frame.

And speaking of the Emacs frame…

Tool use in gptel

gptel supports tool use, but it’s not ready for primetime yet. Here are a couple of illustrations.

You can get it to write boilerplate so you don’t have to. In this demo I provide an LLM with some basic filesystem capabilities, and get it to set up a Nix flake integrated with direnv:

Play by play

Unlike the usual video narration on this blog this one’s going to be rather high level, as the tool use UI isn’t final.

Type in the query, with a reasonable amount of detail.
Ensure that the filesystem tools are available to the LLM. (The “scope” switch sets the tools buffer-locally.)
Run M-x gptel-send.
In the lower window, run the watch command so we can see the ~/Desktop/ghostty directory being populated.
After the LLM is done creating the directory and files we asked for, it opens the directory in dired and reports its actions.
Examine the flake.nix and .envrc files to make sure they look okay.

Here I give the LLM the ability to query my Elfeed and Wallabag databases, which together cover most of the things I’ve read or watched on the Internet over the past twenty years. I access both from Emacs via the elfeed and wombag packages respectively, and they are only available locally on my computer. I ask the LLM for details about something I vaguely remember watching a while ago:

Play by play


Type in the query, with as much detail as I can remember.
Ensure that the local search and web tools are available to the LLM.
Run M-x gptel-send.
It finds the most likely search result from Elfeed, which is a YouTube video.
It fetches the description and a transcript for the video, then summarizes it with links to some references.

I wouldn’t get too excited by this second example – there is no fancy vector embedding or similarity search going on. The LLM just runs a standard keyword search against the Elfeed database, and we were fortunate that the result we were looking for contained the word “derivative” that I asked for.

But you get the idea, I hope.

A sample tool definition

In case you’re curious, the “tools” used in the demos above are elisp functions coupled with a description of their arguments (the schema). Here is an example.

(gptel-make-tool
 :function (lambda (path filename content)
             ;; Resolve FILENAME relative to the directory PATH.
             (let ((full-path (expand-file-name filename path)))
               ;; Write CONTENT to disk via a temporary buffer.
               (with-temp-buffer
                 (insert content)
                 (write-file full-path))
               ;; The returned string is what the LLM sees as the result.
               (format "Created file %s in %s" filename path)))
 :name "create_file"
 :description "Create a new file with the specified content"
 :args (list '(:name "path"
               :type "string"
               :description "The directory where to create the file")
             '(:name "filename"
               :type "string"
               :description "The name of the file to create")
             '(:name "content"
               :type "string"
               :description "The content to write to the file"))
 :category "filesystem")
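The keyword search from the second demo fits the same recipe. What follows is a sketch, not the tool used in the demo (those are collected on gptel’s issue tracker): it assumes Elfeed is installed and uses its with-elfeed-db-visit macro, which walks entries newest to oldest, with elfeed-db-return to stop early. The names my/elfeed-keyword-search and search_elfeed, and the limit of 10 results, are hypothetical choices.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'elfeed)
(require 'gptel)

(defun my/elfeed-keyword-search (keyword &optional limit)
  "Return up to LIMIT (default 10) Elfeed entries whose title contains KEYWORD.
Plain case-insensitive substring matching, newest first; no embeddings."
  (let ((limit (or limit 10))
        (case-fold-search t)
        (results '()))
    ;; with-elfeed-db-visit loads the database itself if needed.
    (with-elfeed-db-visit (entry _feed)
      (let ((title (or (elfeed-entry-title entry) "")))
        (when (string-match-p (regexp-quote keyword) title)
          (push (format "%s <%s>" title (elfeed-entry-link entry)) results)
          ;; Stop walking the database once we have enough matches.
          (when (>= (length results) limit)
            (elfeed-db-return)))))
    (if results
        (mapconcat #'identity (nreverse results) "\n")
      (format "No Elfeed entries match %S" keyword))))

(gptel-make-tool
 :function #'my/elfeed-keyword-search   ; only KEYWORD is exposed to the LLM
 :name "search_elfeed"
 :description "Search the titles of entries in the local Elfeed database"
 :args (list '(:name "keyword"
               :type "string"
               :description "A word or phrase to look for in entry titles"))
 :category "local-search")
```

Returning a plain string of titles and links keeps the result easy for the LLM to read and to follow up on with other tools, such as one that fetches a page.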

Testing tool use

There are dozens of uses for this kind of LLM usage, and dozens of ways in which this can break. That’s where you come in. If you’re interested in this kind of thing, please try out the feature-tool-use branch of gptel – kick the tires a bit. Write your own tools for tasks you’d like to automate, and let me know what breaks, and what’s missing!
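As a starting point for your own tools, here is a sketch of another one in the same shape as create_file: it lets the LLM read the contents of an Emacs buffer. The tool name read_buffer and the names my/read-buffer-tool and my/gptel-use-read-buffer-here are hypothetical. The last part assumes that gptel-make-tool returns the tool it defines and that tool use is controlled by the user options gptel-use-tools and gptel-tools, as in current gptel releases; the feature-tool-use branch may differ. Setting them buffer-locally mimics the “scope” switch in the first demo.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'gptel)

(defvar my/read-buffer-tool
  (gptel-make-tool
   :function (lambda (buffer)
               ;; Report a missing buffer as text the LLM can act on,
               ;; instead of signalling an error.
               (if (buffer-live-p (get-buffer buffer))
                   (with-current-buffer buffer
                     (buffer-substring-no-properties (point-min) (point-max)))
                 (format "Error: buffer %s does not exist" buffer)))
   :name "read_buffer"
   :description "Return the contents of an Emacs buffer"
   :args (list '(:name "buffer"
                 :type "string"
                 :description "The name of the buffer to read"))
   :category "emacs")
  "Hypothetical tool giving the LLM read access to Emacs buffers.")

(defun my/gptel-use-read-buffer-here ()
  "Offer the read_buffer tool to gptel requests sent from this buffer only.
ASSUMPTION: `gptel-use-tools' and `gptel-tools' control tool use."
  (interactive)
  (setq-local gptel-use-tools t)
  (setq-local gptel-tools (list my/read-buffer-tool)))
```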

You can find detailed instructions for testing tool use on gptel’s issue tracker, which is also the best place to report issues or suggestions. gptel does not currently ship with any tools, and the issue page includes the collection of tools that I used in the above demos.

Scratching the surface

Following the theme of general befuddlement, I’m not yet sure how any of this should work. The most I can confidently say is that as an Emacs-native LLM interface, gptel is quite far from any kind of optimum. So here’s hoping we can inch our way there.