Fixing Sphinx Search
When switching this site over to denote-style filenames, I changed the output filenames generated by Sphinx so that the corresponding URLs were shorter. Unfortunately, this had the side effect of breaking search, since the search results were still using the true docnames!
Finding the Problem
After playing with the search results page and reading some of the Sphinx code, I eventually found out the following:
- The search index is stored in a JavaScript file, searchindex.js
- The search index is mostly incomprehensible to me… but there is a field called docnames which contained the true docnames
- The search index is built using the IndexBuilder in the sphinx.search module.
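To check which docnames ended up in the index, something like the following sketch can help. It assumes searchindex.js has the shape Search.setIndex(<JSON>), which is what the generated files I have seen look like; the helper name load_search_index and the sample data are made up for illustration.

```python
import json

# Assumed wrapper around the JSON payload in searchindex.js.
PREFIX = "Search.setIndex("
SUFFIX = ")"


def load_search_index(text: str) -> dict:
    """Parse the contents of a searchindex.js file into a dict (hypothetical helper)."""
    text = text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected searchindex.js format")
    # Strip the JavaScript call and decode the JSON object inside it.
    return json.loads(text[len(PREFIX):-len(SUFFIX)])


# Made-up sample; a real index contains many more fields.
sample = 'Search.setIndex({"docnames": ["content/example-page", "index"]})'
index = load_search_index(sample)
docnames = index["docnames"]
```

For a real build, pass in the text of the generated searchindex.js instead of the sample string.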
Implementing a Fix
Warning
This involves monkey-patching the IndexBuilder object, which is definitely not part of Sphinx’s public API. You should expect this to break eventually!
To fix the search results we “just” have to update the docnames so that they are consistent with the ones we generated as part of the build. It took me a few attempts, but I eventually came up with the following:
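from functools import partial

from sphinx.builders.dirhtml import DirectoryHTMLBuilder


class DenoteHTMLBuilder(DirectoryHTMLBuilder):
    ...

    def dump_search_index(self):
        # Wrap the indexer's freeze method so docnames are rewritten before dumping.
        if (builder := self.indexer) is not None:
            builder.freeze = partial(rewrite_indexed_docnames, freeze=builder.freeze)
        super().dump_search_index()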
Here I’m just replacing freeze, the method responsible for generating the data written to searchindex.js, with a wrapper around the original, where rewrite_indexed_docnames is defined as:
from collections.abc import Callable
from typing import Any


def rewrite_indexed_docnames(*, freeze: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Rewrite the docnames in the search index so that they align with the URLs
    generated by the DenoteHTMLBuilder."""
    index = freeze()
    docnames = []
    for docname in index["docnames"]:
        # Only documents under content/ have rewritten URLs.
        if not docname.startswith("content/"):
            docnames.append(docname)
            continue
        # Leave the docname untouched if it is not a denote-style name.
        if (record := Record.parse(docname.removeprefix("content/"))) is None:
            docnames.append(docname)
            continue
        docnames.append(record.url)
    index["docnames"] = tuple(docnames)
    return index
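The wrapping trick is easier to see in isolation. Below is a minimal sketch using a made-up FakeIndexer in place of Sphinx's IndexBuilder, and a simplified rewrite function that only strips the content/ prefix instead of going through Record; both names and the sample docname are hypothetical.

```python
from functools import partial
from typing import Any, Callable


class FakeIndexer:
    """Hypothetical stand-in for Sphinx's IndexBuilder."""

    def freeze(self) -> dict[str, Any]:
        return {"docnames": ("content/20240101T000000--hello", "index")}


def rewrite(*, freeze: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Call the original freeze, then adjust the docnames it returned."""
    index = freeze()
    index["docnames"] = tuple(
        name.removeprefix("content/") for name in index["docnames"]
    )
    return index


indexer = FakeIndexer()
# The original bound method is captured by partial before the attribute is replaced.
indexer.freeze = partial(rewrite, freeze=indexer.freeze)
result = indexer.freeze()
```

Because the replacement is set on the instance rather than the class, other indexer objects keep the original freeze.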