Fixing Sphinx Search

When switching this site over to denote-style filenames, I changed the output filenames generated by Sphinx so that the corresponding URLs were shorter. Unfortunately, this had the side effect of breaking search, since the search results were still using the true docnames!
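For readers who haven't met denote (an Emacs note-taking package), its filenames follow an IDENTIFIER--TITLE__KEYWORDS pattern, where the identifier is a YYYYMMDDTHHMMSS timestamp. The sketch below is a hypothetical parser for such names. DenoteName, its url property, the year/title URL scheme and the example filename are all illustrative assumptions, not this site's actual code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Denote-style file stem: IDENTIFIER--TITLE__KEYWORDS, where IDENTIFIER is a
# YYYYMMDDTHHMMSS timestamp. (Denote's optional ==SIGNATURE part is ignored here.)
DENOTE_STEM = re.compile(
    r"(?P<identifier>\d{8}T\d{6})"
    r"(?:--(?P<title>[a-z0-9-]+))?"
    r"(?:__(?P<keywords>[a-z0-9_]+))?"
)


@dataclass(frozen=True)
class DenoteName:
    """Hypothetical parsed form of a denote-style filename stem."""

    identifier: str
    title: str
    keywords: tuple[str, ...]

    @classmethod
    def parse(cls, stem: str) -> Optional["DenoteName"]:
        # fullmatch: the whole stem must follow the pattern, otherwise None
        match = DENOTE_STEM.fullmatch(stem)
        if match is None:
            return None
        keywords = match["keywords"]
        return cls(
            identifier=match["identifier"],
            title=match["title"] or "",
            keywords=tuple(keywords.split("_")) if keywords else (),
        )

    @property
    def url(self) -> str:
        # Hypothetical URL scheme: <year>/<title>, falling back to the identifier
        return f"{self.identifier[:4]}/{self.title or self.identifier}"


name = DenoteName.parse("20240101T090000--fixing-sphinx-search__sphinx_python")
print(name.keywords, name.url)
```

Names that don't match the pattern (such as index) parse to None, which makes it easy to leave non-denote pages alone.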

Finding the Problem

After experimenting with the search results page and reading some of the Sphinx source, I found out the following:

The search index is stored in a JavaScript file, searchindex.js.
Most of the search index is incomprehensible to me… but there is a field called docnames which contains the true docnames.
The search index is built using the IndexBuilder class in the sphinx.search module.
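A quick way to confirm which docnames ended up in the index is to load searchindex.js directly. This is a minimal sketch assuming a recent Sphinx release, where the file is plain JSON wrapped in Search.setIndex(...); older releases used a non-JSON serializer (unquoted keys), so this may fail there. The sample string is made up; on a real build you would read searchindex.js from your output directory instead.

```python
import json

PREFIX = "Search.setIndex("
SUFFIX = ")"


def load_search_index(text: str) -> dict:
    """Parse searchindex.js, assuming its payload is JSON inside Search.setIndex(...)."""
    text = text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected searchindex.js layout")
    return json.loads(text[len(PREFIX):-len(SUFFIX)])


# Made-up sample in the shape of a (heavily trimmed) searchindex.js
sample = (
    'Search.setIndex({"docnames":["index",'
    '"content/20240101T090000--fixing-sphinx-search__sphinx"],'
    '"titles":["Home","Fixing Sphinx Search"]})'
)
print(load_search_index(sample)["docnames"])
```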

Implementing a Fix

Warning

This involves monkey-patching the IndexBuilder object, which is definitely not part of Sphinx’s public API, so you should expect this to break eventually!

To fix the search results, we “just” have to rewrite the docnames so that they match the URLs generated as part of the build. It took me a few attempts, but I eventually came up with the following:

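from functools import partial

from sphinx.builders.dirhtml import DirectoryHTMLBuilder


class DenoteHTMLBuilder(DirectoryHTMLBuilder):
    ...

    def dump_search_index(self) -> None:
        # self.indexer is the IndexBuilder instance (it may be None, e.g. when search is disabled)
        if (builder := self.indexer) is not None:
            # Shadow freeze() on this instance only, wrapping the original bound method
            builder.freeze = partial(rewrite_indexed_docnames, freeze=builder.freeze)

        super().dump_search_index()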

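Here I override dump_search_index so that, just before the index is written, the indexer’s freeze method (which generates the data written to searchindex.js) is replaced with a wrapper around the original. rewrite_indexed_docnames is defined as: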

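from collections.abc import Callable
from typing import Any


def rewrite_indexed_docnames(*, freeze: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Rewrite the docnames in the search index so that they align with the URLs
    generated by the DenoteHTMLBuilder."""
    index = freeze()  # output of the original IndexBuilder.freeze()

    docnames = []
    for docname in index["docnames"]:
        # Leave docnames outside content/ untouched
        if not docname.startswith("content/"):
            docnames.append(docname)
            continue

        # Record comes from this site's own code (not shown); parse() returns None
        # for names it cannot handle, which are also left as-is
        if (record := Record.parse(docname.removeprefix("content/"))) is None:
            docnames.append(docname)
            continue

        docnames.append(record.url)

    index["docnames"] = tuple(docnames)
    return index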
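To see why assigning to builder.freeze is enough, here is a self-contained toy version of the same trick. FakeIndexBuilder and the URLS table are hypothetical stand-ins for IndexBuilder and Record; the one behaviour carried over is that dump() looks up self.freeze at call time, which is what lets an instance attribute win over the class method.

```python
import json
from functools import partial
from typing import Any, Callable


class FakeIndexBuilder:
    """Hypothetical stand-in for sphinx.search.IndexBuilder: only freeze() and dump()."""

    def __init__(self, docnames: list[str]) -> None:
        self._docnames = docnames

    def freeze(self) -> dict[str, Any]:
        return {"docnames": tuple(self._docnames), "titles": ("Home", "Post")}

    def dump(self) -> str:
        # self.freeze is resolved at call time, so an instance attribute
        # assigned later shadows the class method
        return json.dumps(self.freeze())


# Hypothetical docname -> URL table standing in for Record.parse(...).url
URLS = {"content/20240101T090000--fixing-sphinx-search__sphinx": "blog/fixing-sphinx-search"}


def rewrite(*, freeze: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    index = freeze()  # calls the original, captured bound method
    index["docnames"] = tuple(URLS.get(d, d) for d in index["docnames"])
    return index


builder = FakeIndexBuilder(["index", "content/20240101T090000--fixing-sphinx-search__sphinx"])
# The right-hand side is evaluated first, so partial captures the original freeze
builder.freeze = partial(rewrite, freeze=builder.freeze)
print(builder.dump())
```

Because partial captures the original bound method before the attribute is replaced, the wrapper calls the real freeze rather than recursing into itself, and the class itself is never modified.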