Integrating Typesense with Sphinx Read the Docs Template: A Step-by-Step Guide

In this guide, I will walk you through integrating Typesense with the Read the Docs theme for your documentation website. The integration adds a fast, typo-tolerant search bar to your documentation, making it easier for users to find the information they need.

Summary:

Before we begin, let’s outline the infrastructure and assumptions for this setup:

Infrastructure:

  1. Typesense Server: The Typesense Server will hold your search index.
  2. Accessibility: The Typesense server must be accessible from the search bar in your documentation site.
  3. Docker: You’ll need Docker to run the Typesense Docsearch Scraper.
  4. Typesense Docsearch Scraper: This tool will be used to index your documentation website.
  5. Documentation Site: Our documentation site is self-hosted on AWS S3 and publicly accessible. The site itself is built with reStructuredText and Sphinx.
  6. Documentation Site Format: This guide specifically covers integrating Typesense search with a documentation site that uses the Read the Docs theme (sphinx-rtd-theme).

Assumptions:

  1. This setup is tested on a Mac.
  2. You are familiar with reStructuredText and Sphinx.

Now, let’s dive into the integration process.

Install Typesense Server

  1. Install the Typesense server by running the following commands. For more information or other ways of installing the Typesense server, see the official Typesense Installation Guide.
   brew install typesense/tap/typesense-server@0.25.0
   brew services start typesense-server@0.25.0
  2. Verify that the Typesense server is running:
   curl http://localhost:8108/health

It should return {"ok":true} if the server is running correctly.
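
If you want to run the same check from a script (for example, before starting a crawl), the curl call can be sketched in Python as below. This is a minimal sketch using only the standard library; the function names are my own, and the default URL assumes the local Homebrew install described above.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True when a /health response body reports {"ok": true}."""
    try:
        return json.loads(body).get("ok") is True
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def typesense_is_healthy(base_url: str = "http://localhost:8108",
                         timeout: float = 3.0) -> bool:
    """Fetch <base_url>/health and report whether the server says it is healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

Calling typesense_is_healthy() with no arguments checks the local server; pass a different base_url if your server runs elsewhere.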

Crawl Your Documentation Site using Typesense Docsearch Scraper

Prerequisites:

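  • A running Typesense server.
  • Docker (Docker Desktop or another installation method).
  • The jq command-line tool (brew install jq). Note that pip3 install jq installs only the Python bindings, not the jq command that the scraper command below relies on.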

Crawl your Documentation Site

  1. Create a configuration file for the Typesense Docsearch Scraper. Here’s an example:
   {
     "index_name": "your_index_name",
     "start_urls": [
       "http://docs.titaniamlabs.com/index.html",
       "https://docs.titaniamlabs.com/plugin/es/configuration/plugin_admin.html",
       "https://docs.titaniamlabs.com/server/installation/aws/hello_world_aws.html"
     ],
     "sitemap_urls": ["http://docs.titaniamlabs.com/sitemap.xml"],
     "selectors": {
       "lvl0": {
         "selector": ".wy-nav-content-wrap h1",
         "type": "css",
         "global": true,
         "default_value": "Documentation"
       },
       "lvl1": ".wy-nav-content-wrap h2",
       "lvl2": ".wy-nav-content-wrap h3",
       "lvl3": ".wy-nav-content-wrap h4",
       "lvl4": ".wy-nav-content-wrap h5",
       "text": ".wy-nav-content-wrap p, .wy-nav-content-wrap ul li, .wy-nav-content-wrap table tbody tr"
     },
     "strip_chars": " .,;:#",
     "custom_settings": {
       "separatorsToIndex": "_",
       "attributesForFaceting": ["language", "version", "type"],
       "attributesToRetrieve": [
         "hierarchy",
         "content",
         "anchor",
         "url",
         "url_without_anchor",
         "type"
       ]
     }
   }

Note: The selectors must match your Sphinx HTML theme. The configuration above uses ".wy-nav-content-wrap h1" and similar selectors because the Read the Docs theme wraps the content to index in a <section class="wy-nav-content-wrap"> element. If you use a different Sphinx HTML theme, replace .wy-nav-content-wrap with that theme's content wrapper. Refer to the Typesense Docsearch Scraper documentation and its example configuration files for more details.
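
Before crawling, it can help to confirm that a built page actually contains the wrapper class your selectors rely on. The following is a rough sketch using only Python's standard html.parser; the class and function names are hypothetical, and it only inspects static HTML, so it approximates what the scraper's own selector engine would match rather than reproducing it.

```python
from html.parser import HTMLParser

INDEXED_TAGS = {"h1", "h2", "h3", "h4", "h5", "p"}


class WrapperTagCounter(HTMLParser):
    """Count headings and paragraphs that sit inside an element with a given class."""

    def __init__(self, wrapper_class: str):
        super().__init__()
        self.wrapper_class = wrapper_class
        self.wrapper_tag = None  # tag name of the wrapper once it is found
        self.depth = 0           # nesting depth of same-named tags inside the wrapper
        self.counts = {}         # tag name -> occurrences inside the wrapper

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0:
            # Outside the wrapper: only look for the wrapper itself.
            if self.wrapper_class in classes:
                self.wrapper_tag, self.depth = tag, 1
            return
        if tag == self.wrapper_tag:
            self.depth += 1
        if tag in INDEXED_TAGS:
            self.counts[tag] = self.counts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.wrapper_tag:
            self.depth -= 1


def count_indexable(html: str, wrapper_class: str = "wy-nav-content-wrap") -> dict:
    """Return per-tag counts of indexable elements found inside the wrapper."""
    parser = WrapperTagCounter(wrapper_class)
    parser.feed(html)
    parser.close()
    return parser.counts
```

An empty result for a page (for example, the text of a file from your Sphinx build output passed to count_indexable) suggests the wrapper class is absent and the scraper would likely report 0 records for it.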

2. Create a .env file in the same folder as your configuration file with the following format:

   TYPESENSE_API_KEY=xyz
   TYPESENSE_HOST=host.docker.internal
   TYPESENSE_PORT=8108
   TYPESENSE_PROTOCOL=http

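The host.docker.internal value lets the scraper container reach a Typesense server running on your Mac. If Typesense is hosted on a server with SSL, set TYPESENSE_HOST to that server's hostname, TYPESENSE_PROTOCOL to https, and TYPESENSE_PORT to the port the server listens on (typically 443).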
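
A typo in the .env file tends to surface only as a connection error halfway through a crawl, so a quick sanity check can save a run. The sketch below is one possible check, not part of the scraper; the function names are hypothetical, and it assumes the simple KEY=value format shown above.

```python
REQUIRED_KEYS = (
    "TYPESENSE_API_KEY",
    "TYPESENSE_HOST",
    "TYPESENSE_PORT",
    "TYPESENSE_PROTOCOL",
)


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(text: str) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    values = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("TYPESENSE_PROTOCOL") not in (None, "", "http", "https"):
        problems.append("TYPESENSE_PROTOCOL should be http or https")
    port = values.get("TYPESENSE_PORT", "")
    if port and not port.isdigit():
        problems.append("TYPESENSE_PORT should be a number")
    return problems
```

For example, check_env(open(".env").read()) returns an empty list for the file shown above.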

3. From the folder that contains config.json and .env, run the following command to launch the Typesense Docsearch Scraper as a Docker container:

 docker run -it --env-file=.env -e "CONFIG=$(cat config.json | jq -r tostring)" typesense/docsearch-scraper:0.8.0
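
The jq call in this command only collapses config.json onto a single line so it can be passed as the CONFIG environment variable. If you would rather not depend on jq, roughly the same thing can be done with Python's json module. This is a sketch under that assumption; the helper names are my own, and the image tag is the one used above.

```python
import json


def compact_config(text: str) -> str:
    """Validate JSON text and return it on one line, roughly like `jq -r tostring`."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def docker_command(config_text: str,
                   env_file: str = ".env",
                   image: str = "typesense/docsearch-scraper:0.8.0") -> list:
    """Build the argument list for the scraper's docker run command."""
    return [
        "docker", "run", "-it",
        f"--env-file={env_file}",
        "-e", f"CONFIG={compact_config(config_text)}",
        image,
    ]


# Example (not executed here): run the scraper from the folder with config.json and .env.
#   import subprocess
#   from pathlib import Path
#   subprocess.run(docker_command(Path("config.json").read_text()), check=True)
```

A side benefit is that json.loads raises an error on malformed JSON before Docker is started at all.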

If the run is successful, you should see output similar to the following:

DEBUG:urllib3.connectionpool:http://host.docker.internal:8108 "POST /collections/your_index_name_1696401507/documents/import HTTP/1.1" 200 None
DEBUG:typesense.api_call:host.docker.internal:8108 is healthy. Status code: 200
> DocSearch: https://docs.titaniamlabs.com/plugin/es/configuration/plugin_admin.html 31 records)

If you instead see output similar to the following, with 0 records for every page, something is wrong with your setup. The most likely cause is that the selectors in the configuration file do not match your pages.

https://docs.titaniamlabs.com/plugin/es/installation/hello_world_installation.html 0 records)
> DocSearch: https://docs.titaniamlabs.com/server/installation/aws/hello_world_aws.html 0 records)
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
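
On a larger site the per-page record counts scroll by quickly. If you save the scraper output to a file, a small script can list the pages that produced no records. The sketch below assumes the log lines end in "<url> <count> records)" as in the samples above; the function names are hypothetical.

```python
import re

# Matches both "> DocSearch: <url> 31 records)" and a bare "<url> 0 records)".
RECORD_LINE = re.compile(r"(https?://\S+)\s+(\d+) records\)")


def records_per_page(log_text: str) -> dict:
    """Map each crawled URL to the number of records the scraper reported for it."""
    counts = {}
    for line in log_text.splitlines():
        match = RECORD_LINE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def empty_pages(log_text: str) -> list:
    """Return the URLs that were crawled but produced 0 records."""
    return [url for url, count in records_per_page(log_text).items() if count == 0]
```

If empty_pages returns every URL you crawled, revisit the selectors; if it returns only a few, those pages may simply have no matching content.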

Add a Searchbar to Your Read the Docs Site

Perform the following steps in your reStructuredText source folder.

  1. Add the following CSS to all pages on your documentation site by editing the conf.py file:
   html_css_files = [
       'https://cdn.jsdelivr.net/npm/typesense-docsearch-css@0.3.0'
   ]
  2. Create a searchbox.html file with the following contents:
   {%- if 'singlehtml' not in builder %}
   <div id="searchbar"></div>
   <script src="https://cdn.jsdelivr.net/npm/typesense-docsearch.js@3.4"></script>
   <script>
     docsearch({
       container: '#searchbar',
       typesenseCollectionName: 'your_index_name',
       typesenseServerConfig: { 
         nodes: [{
           host: 'localhost',
           port: '8108',
           protocol: 'http'
         }],
         apiKey: 'xyz',
       }
     });
   </script>
   {%- endif %}

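Note: Ensure that typesenseCollectionName matches the index_name you used in the Typesense Docsearch Scraper configuration. Because this script runs in each visitor's browser, host must be an address your visitors can reach (localhost works only for local testing), and apiKey should be a search-only key, since it is visible to anyone who views the page source.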

  3. Store the searchbox.html file in the _templates folder of your reStructuredText source folder. Make sure _templates is listed in templates_path in conf.py so that Sphinx picks the file up.
  4. Build your HTML documentation using the Sphinx command:
   sphinx-build -b html . _build
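
For reference, the conf.py settings these steps rely on can be sketched together as follows. This is a fragment, not a complete conf.py, and it assumes the default _templates folder name created by sphinx-quickstart. Sphinx searches templates_path before the theme's own templates, which is why a searchbox.html placed there takes the place of the theme's default search box.

```python
# conf.py (fragment): only the settings relevant to the Typesense search bar.

# Theme whose default search box is being replaced.
html_theme = "sphinx_rtd_theme"

# Folders Sphinx searches for template overrides such as searchbox.html.
templates_path = ["_templates"]

# Stylesheet for the Typesense DocSearch widget, loaded on every page.
html_css_files = [
    "https://cdn.jsdelivr.net/npm/typesense-docsearch-css@0.3.0",
]
```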

Your documentation website should now include a Typesense search bar connected to the Typesense server, enhancing the search experience for your users.
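
If the search bar renders but returns nothing, it helps to query the Typesense search endpoint directly and rule out the front end. The sketch below uses only the standard library; the function names are my own, and searching the content field is an assumption about the schema the scraper creates, so adjust query_by if your collection differs.

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url: str, collection: str, query: str,
               query_by: str = "content") -> str:
    """Build the URL for a Typesense search request against one collection."""
    params = urllib.parse.urlencode({"q": query, "query_by": query_by})
    return f"{base_url}/collections/{collection}/documents/search?{params}"


def count_hits(base_url: str, collection: str, query: str, api_key: str) -> int:
    """Run the search and return the number of matching documents ("found")."""
    request = urllib.request.Request(
        search_url(base_url, collection, query),
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)["found"]


# Example (not executed here), using the values from this guide:
#   count_hits("http://localhost:8108", "your_index_name", "installation", "xyz")
```

A result of 0 for a term you know appears in your docs points back to the crawl step rather than to searchbox.html.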
