Bridging Islands in Emacs: re-builder and query-replace-regexp

2021-04-28

emacs

One of the problems with Emacs, especially out of the box, is that its constituents don’t communicate with each other as comprehensively as they ought to. This is expected given the bazaar nature of its development: it’s an amalgamation of elisp libraries written by different contributors over decades, few of whom were aware of many of Emacs’ existing capabilities they could reuse or plug into. I covered a couple of examples of this deficiency in my series on Batteries included with Emacs (such as the pulse and view libraries).

The experience of using most software teaches us to ignore such shortcomings and annoyances until we don’t even notice them any more. When using Emacs, however, it’s possible to glance occasionally at a lacuna between its islands that’s begging for a bridge. In stark contrast to most software, building arbitrary bridges in elisp is very easy.

The Islands: `re-builder` and `query-replace-regexp`

One such example is a simple connection between the built-in regexp-builder (re-builder) library and the *-replace-regexp functions. re-builder lets you build a regular expression (henceforth “regexp”) with interactive feedback. Text in the main buffer matching the regular expression is highlighted, which helps you catch and correct errors:

Figure 1: `re-builder`: Matches for the current regexp are highlighted in the main buffer.

I’m reasonably familiar with Emacs’ flavor of regular expressions, idiosyncratic as it is (compared to PCRE), and thus often forget that re-builder even exists. On the other hand, query-replace-regexp (henceforth qrr), an everyday tool which does what it says on the tin, does not show matches as you construct the expression to be replaced:¹

Figure 2: `qrr`: No feedback, no highlights.

Building regexps and replacing constructed regexps: these two commands do complementary things and deserve to work together. As things stand now, you’d have to:

1. Open re-builder and construct your regexp.
2. Copy it to the kill ring (C-c C-w by default).
3. Quit re-builder (C-c C-q by default).
4. Run query-replace-regexp (C-M-% by default).
5. Paste the kill (C-y).
6. Fix newlines and backslashes (if using the “read” interface to re-builder, more on this below).
7. Type in the replacement string and press RET to begin the replacements.

Connecting commands through the kill-ring (or clipboard) should be the last resort, not a go-to strategy!

(A)bridged functions

Here they are working in concert:

The idea is this: re-builder and qrr are now effectively the same command. Bring up re-builder using a keybinding (preferably your keybinding for the latter) and build your regexp interactively. Press RET and re-builder exits, with its contents as the input to the replacement command.

This is exactly as many keystrokes as running qrr or re-builder for their individual purposes, but now you can use both fully, go from the latter to the former, and you have to remember one fewer command or keybinding!

A Combinatorial Expansion: `rx` input to `query-replace-regexp`

re-builder has many more advantages than just interactive feedback, and now they all carry over to qrr:

- You construct the regexp in a regular buffer, giving you access to the full suite of Emacs commands for editing, including your own editing customizations.
- You can navigate between matches interactively (like with isearch), and thus choose where in the buffer the replacement should begin. This even doubles as a replacement for isearch-forward-regexp and isearch-backward-regexp, although these can themselves be configured to preview matches, so re-builder is less crucial there.
- You can switch between different modes of regexp entry, including the powerful rx forms that qrr does not allow:

Figure 3: Regexp specification through a much easier to parse Lisp form.

Pressing “RET” will run the replacement on the appropriate condensed version of this string:

To switch re-builder to rx mode, invoke reb-change-syntax (bound to C-c C-i by default). This is persistent, so you only need to do it once. Note that this is a regular Lisp buffer, so you have access to all your Lisp editing tools: smartparens/lispy, autocomplete etc.
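To make the “condensed version” concrete, here is a minimal sketch of an rx form and the string regexp it expands to. The names `my/example-rx` and `my/example-regexp` and the sample pattern are made up for illustration. Setting `reb-re-syntax` from an init file is an assumption about how you might prefer to configure things, not something the bridge code requires.

```elisp
;; Sketch only: the pattern and the variable names are hypothetical.
(require 'rx)

;; Optional assumption: make rx the default re-builder syntax from your
;; init file instead of switching with `reb-change-syntax'.
(setq reb-re-syntax 'rx)

;; An rx form like the one you would type in the re-builder buffer...
(defvar my/example-rx
  '(seq (or "foo" "bar") "-" (one-or-more digit))
  "Hypothetical rx form used only for this example.")

;; ...and the plain string regexp it condenses to, which is the kind of
;; value `query-replace-regexp' ultimately receives.
(defvar my/example-regexp (rx-to-string my/example-rx)
  "String regexp generated from `my/example-rx'.")

;; Returns the index where the match starts, or nil if there is none.
(string-match-p my/example-regexp "id: bar-42")
```

The exact string that `rx-to-string` produces can differ between Emacs versions, so it is safer to rely on what it matches than on how it is spelled.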

re-builder actually has a third “read” interface, where you quote regexps like you would in a string in Lisp code. This is useful to test regexps that you plan to place in code.
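As a small illustration of the difference, the sketch below writes one regexp in “read” syntax, which is how it has to appear inside Lisp source. The function name `my/example-read-syntax-match` and the pattern are invented for this example.

```elisp
;; Sketch only: the pattern and function name are hypothetical.
;; "string" syntax, as typed at the qrr prompt:  \(foo\|bar\)-[0-9]+
;; "read" syntax, as written in Lisp source:     "\\(foo\\|bar\\)-[0-9]+"
;; (In PCRE the same pattern would be written (foo|bar)-[0-9]+ .)
(defun my/example-read-syntax-match (text)
  "Return the whole match and group 1 for a sample regexp in TEXT, or nil."
  (let ((re "\\(foo\\|bar\\)-[0-9]+"))
    (when (string-match re text)
      ;; Group 0 is the whole match, group 1 is the alternation.
      (list (match-string 0 text) (match-string 1 text)))))

(my/example-read-syntax-match "see bar-42")
```

The doubled backslashes are what need fixing when a regexp copied from the “read” interface is pasted into the qrr prompt, which expects the “string” form.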

Thus we now have a route that threads more islands that were originally disjoint: rx, your Lisp editing suite, re-builder and qrr. This is perhaps the lesson here:

Connecting existing libraries in Emacs leads not to a linear growth in its features, but to a combinatorial expansion of its capabilities. This can be significantly more bang for your buck than writing things from scratch, and it will help minimize your cognitive load as the things you already know work in more contexts.

The Actual Bridge

Finally, here’s the elisp that forms the bridge between the two commands:

  (defvar my/re-builder-positions nil
    "Store point and region bounds before calling re-builder.")
  (advice-add 're-builder
              :before
              (defun my/re-builder-save-state (&rest _)
                "Save into `my/re-builder-positions' the point and region
  positions before calling `re-builder'."
                (setq my/re-builder-positions
                      (cons (point)
                            (when (region-active-p)
                              (list (region-beginning)
                                    (region-end)))))))
  (defun reb-replace-regexp (&optional delimited)
    "Run `query-replace-regexp' with the contents of re-builder. With
  non-nil optional argument DELIMITED, only replace matches
  surrounded by word boundaries."
    (interactive "P")
    (reb-update-regexp)
    (let* ((re (reb-target-value 'reb-regexp))
           (replacement (query-replace-read-to
                         re
                         (concat "Query replace"
                                 (if current-prefix-arg
                                     (if (eq current-prefix-arg '-) " backward" " word")
                                   "")
                                 " regexp"
                                 (if (with-selected-window reb-target-window
                                       (region-active-p)) " in region" ""))
                         t))
           (pnt (car my/re-builder-positions))
           (beg (cadr my/re-builder-positions))
           (end (caddr my/re-builder-positions)))
      (with-selected-window reb-target-window
        (goto-char pnt) ; replace with (goto-char (match-beginning 0)) if you want
                        ; to control where in the buffer the replacement starts
                        ; with re-builder
        (setq my/re-builder-positions nil)
        (reb-quit)
        (query-replace-regexp re replacement delimited beg end))))

Additionally, I bind this new replace-regexp function (reb-replace-regexp) to RET in the re-builder buffer, and replace qrr entirely with just re-builder:

  (define-key reb-mode-map (kbd "RET") #'reb-replace-regexp)
  (define-key reb-lisp-mode-map (kbd "RET") #'reb-replace-regexp)
  (global-set-key (kbd "C-M-%") #'re-builder)
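One caveat, offered as a sketch rather than as part of the original setup: `reb-mode-map` and `reb-lisp-mode-map` are defined in re-builder.el, which is normally loaded only when the command is first invoked. If the bindings sit at the top level of an init file, deferring them until the library has loaded avoids a void-variable error. This assumes you are not already using a package macro that handles deferral for you.

```elisp
;; Sketch: defer the RET bindings until re-builder.el has been loaded,
;; since both keymaps are defined by that library.
(with-eval-after-load 're-builder
  (define-key reb-mode-map (kbd "RET") #'reb-replace-regexp)
  (define-key reb-lisp-mode-map (kbd "RET") #'reb-replace-regexp))
```

The global C-M-% binding needs no such guard, because `re-builder` is an autoloaded command.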

Very briefly, the code works as follows:

1. Save the region and point positions into my/re-builder-positions before invoking re-builder, since these are otherwise lost. This is done by advising the function.
2. When you press RET, quit re-builder and call qrr with the built regexp, saved point and region information.
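The shape of the saved state may be easier to see in isolation. The sketch below uses a hypothetical helper, `my/example-unpack-positions`, with made-up numbers standing in for real buffer positions. It mirrors how the replacement command pulls point, region start and region end out of the list.

```elisp
;; Sketch only: the helper name and the numbers are hypothetical.
(defun my/example-unpack-positions (positions)
  "Return (POINT BEG END) from POSITIONS, shaped like `my/re-builder-positions'."
  (list (car positions) (cadr positions) (caddr positions)))

;; Region was active when re-builder was invoked: all three are set.
(my/example-unpack-positions (cons 10 (list 5 20)))

;; No active region: BEG and END come out as nil.
(my/example-unpack-positions (cons 10 nil))
```

When the start and end arguments are nil, `query-replace-regexp` works from point to the end of the buffer, which is why the code can pass them through without checking whether a region was saved.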

Lastly, if you want to insert a newline in the regexp-builder buffer you can now use C-q C-j. Entering literal newlines in a regexp definition is rare enough that dedicating RET to the much more useful qrr is a no-brainer.

¹ Yes, visual-regexp exists. But piling on another thousand lines of code here would be like bringing in a mountain of dirt to create a new self-contained island when the existing ones are lacking but a few connecting strings, and have the opportunity to form a denser network of interactions.
