Demystifying Elasticsearch and Django

July 17, 2017 04:25pm
by Thomas Tran

In this article, we'll explore how to set up Elasticsearch with a Django app. First, we'll look at what Elasticsearch is and why it's faster than traditional full-text SQL queries. Lastly, we'll implement a small Django app that indexes model-based data with Elasticsearch.

The Theory

Elasticsearch is a standalone "database" server that receives input data and stores it as JSON, without requiring you to define a schema up front, in a format optimized for language-based searches. Inputs into Elasticsearch are called "documents", and to find documents, you "query" for them. Queries in Elasticsearch are lightning fast not because the data is schema-less, but because of the way Elasticsearch indexes text, which is very different from how traditional relational databases search it.

Elasticsearch works by taking in documents, splitting them into words (terms), and building an inverted index that maps each word to the documents containing it. When you search for a single word, Elasticsearch looks that word up in the index, and the cost of that lookup is essentially independent of how many documents are stored (close to O(1)). Compare this to a raw SQL query such as WHERE description LIKE '%apple%': a regular index can't help with a pattern that starts with a wildcard, so the database has to scan every row, which is O(n) and gets slower as the table grows. After retrieval, Elasticsearch ranks the results so that the best matches show up first. This is why Elasticsearch is so much faster than raw SQL queries for full-text search.
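
To make the difference concrete, here is a toy Python sketch of the idea. It is not how Elasticsearch (or Lucene, the library it is built on) is actually implemented; the naive tokenizer, the sample documents, and the function names are all made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Very naive analyzer: lowercase and split on whitespace.
    # Real analyzers also strip punctuation, stem words, and more.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_index(index, word):
    # A single dictionary lookup, no matter how many documents exist.
    return sorted(index.get(word.lower(), set()))


def search_scan(docs, word):
    # Linear scan: examine every document, the way a LIKE '%word%'
    # query has to examine every row.
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in tokenize(text))


docs = {
    1: "Fresh red apple",
    2: "Green apple and pear",
    3: "Atlantic salmon fillet",
}
index = build_inverted_index(docs)
print(search_index(index, "apple"))  # [1, 2]
print(search_scan(docs, "apple"))    # [1, 2]
```

Both functions return the same answer; the difference is that the scan touches every document, while the index lookup only touches the entry for the word being searched.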

Download and Run Elasticsearch

Let's download and run Elasticsearch. In this tutorial, we'll be using an older version of Elasticsearch (2.4.5) since the library that connects Django and Elasticsearch, django-haystack, does not yet support the latest version. This version of Elasticsearch is available here.

After your download has finished, extract the elasticsearch-2.4.5 folder to a clean directory. Then open a separate command line, cd into its bin directory, and run the following:

./elasticsearch

...or if you're on Windows...

elasticsearch.bat

This will start the Elasticsearch server, which you'll need to keep alive during development.
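
If you'd like to check from Python that the server is up, here is a small standard-library sketch. It assumes the default address http://127.0.0.1:9200/ and relies on the root endpoint answering with JSON that includes a version.number field; elasticsearch_version is a hypothetical helper name. Simply opening that URL in a browser is an equally good check.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_version(url="http://127.0.0.1:9200/", timeout=2.0):
    """Return the server's version string, or None if it can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that isn't valid JSON.
        return None
    # The root endpoint's JSON includes a "version" object with a "number".
    return info.get("version", {}).get("number")


if __name__ == "__main__":
    print(elasticsearch_version() or "Elasticsearch is not reachable")
```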

Implementation with Python

We'll now set up a Django project and show how to index basic models and how to query for data.

First, I'll assume that you have set up a virtualenv and that you're using Python 3.5. If you're not familiar with virtualenv, you can read more about it here.

Next, run the following commands:


pip install Django==1.11.3
django-admin startproject djangoelastic
cd djangoelastic
python manage.py startapp core

The first command installs Django and makes the django-admin command available in your virtualenv. Assuming you're in the folder where you want your project to live, the second command creates an empty project named djangoelastic. The third command moves into the new project directory, and the fourth creates a Django app called core. This app will encapsulate all code and business logic relevant to our Django Elasticsearch implementation.

We'll create a model class in core/models.py to store the data we want to index. Full-text queries against the database can be costly in time and resources, especially on large tables. Indexing these models hands their data over to Elasticsearch, which can search it much faster than raw SQL queries.

Paste the following into core/models.py:


from django.db import models


class Product(models.Model):

	CATEGORY_CHOICES = (
		('FRU', 'Fruits'),
		('VEG', 'Vegetables'),
		('POL', 'Poultry'),
		('FIS', 'Fish'),
	)

	name = models.CharField(max_length=100, blank=False)
	# HTMLField isn't part of Django itself (it comes from third-party
	# packages such as django-tinymce), so a plain TextField is used here
	description = models.TextField(help_text="Detailed description of the product")
	price = models.FloatField(blank=False)
	category = models.CharField(
		max_length=3,
		choices=CATEGORY_CHOICES,
		blank=False,
		help_text="A category the product belongs in"
	)

	def __str__(self):
		return self.name

Create Indices

To let Elasticsearch know what data will be available and searchable, we'll create indices: Python classes that describe how Django model data should be turned into Elasticsearch documents. For this, we'll use Haystack, a library that abstracts storing data in, and querying against, search engines such as Elasticsearch.

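First, we'll need to install Haystack, along with the official Elasticsearch Python client that its Elasticsearch backend relies on (pinned to the 2.x series to match our 2.4.5 server):

pip install django-haystack
pip install "elasticsearch>=2.0.0,<3.0.0"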

Since django-haystack is an app of its own, we need to include it in our INSTALLED_APPS setting. Remember to also add our core app that we created earlier.


INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'core',
    'haystack',
]

The following settings are also required to make Haystack work:


HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

The HAYSTACK_CONNECTIONS setting tells Haystack where the Elasticsearch server is located and which index name to use. The RealtimeSignalProcessor setting makes Haystack update the index whenever a model instance is saved or deleted, which keeps the data in Elasticsearch in sync with the data in the database.
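
Conceptually, a realtime signal processor hooks into Django's post_save and post_delete signals and pushes each change to the search index. The plain-Python sketch below mimics that flow with a hand-rolled Signal class and a dictionary standing in for Elasticsearch. It is a toy model of the idea, not Haystack's actual code, and every name in it is made up.

```python
class Signal:
    """Minimal stand-in for a Django signal: a list of receiver functions."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, instance):
        for receiver in self.receivers:
            receiver(instance)


post_save = Signal()
post_delete = Signal()
fake_index = {}  # stands in for Elasticsearch: primary key -> indexed text


def on_save(instance):
    # (Re)index the object so the search index matches the database row.
    fake_index[instance["pk"]] = instance["name"]


def on_delete(instance):
    # Drop the object from the index; ignore it if it was never indexed.
    fake_index.pop(instance["pk"], None)


# Roughly what a realtime processor wires up: one handler per signal.
post_save.connect(on_save)
post_delete.connect(on_delete)

apple = {"pk": 1, "name": "Apple"}
post_save.send(apple)    # fake_index == {1: "Apple"}
apple["name"] = "Red apple"
post_save.send(apple)    # fake_index == {1: "Red apple"}
post_delete.send(apple)  # fake_index == {}
```

The trade-off of this design is that every save now also costs a round trip to the search server; for heavy write loads, batching updates with update_index may be the better fit.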

Next, we'll create some indices to let Elasticsearch know how to store information. To do so, we'll create a new file, core/search_indexes.py. All indices within the core app should go into this file.


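from haystack import indexes

from .models import Product


class ProductIndex(indexes.SearchIndex, indexes.Indexable):
	# The main searchable field, rendered from a text template
	text = indexes.CharField(document=True, use_template=True)
	# Store the price as a number rather than a string
	price = indexes.FloatField(model_attr='price')
	category = indexes.CharField(model_attr='category')

	def get_model(self):
		return Product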

When the use_template option is set to True, Haystack looks for a template named templates/search/indexes/core/product_text.txt. This file contains all the text that will be searchable by default (without filtering on specific fields). In our case, it should contain the product's name and description, since that's what we want the search functionality to match against:


{{ object.name }}
{{ object.description }}

Make sure the DIRS option of the TEMPLATES setting includes our new templates location:


TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates').replace('\\', '/')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

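Now, let's index our data. This assumes you already have some Product rows in your database. If you don't, first create the table with python manage.py makemigrations core followed by python manage.py migrate, then use the Django shell (python manage.py shell) to insert a few products. Then, use the following command to index your data: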


python manage.py rebuild_index --verbosity=2

Later on, when you want to update the data inside your index without erasing and rebuilding it from scratch, use the following command:


python manage.py update_index --verbosity=2
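
After indexing, each Product lives in Elasticsearch as one document. The hypothetical helper below approximates what such a document contains, assuming the ProductIndex and text template above. Haystack assembles the real document itself; besides the declared fields it adds bookkeeping fields such as django_ct (app label and model name) and django_id (primary key) so it can map hits back to model rows. The exact field values, for example whether price is stored as a string or a number, follow the field types declared on the index.

```python
def build_product_document(pk, name, description, price, category):
    """Approximate the Elasticsearch document Haystack builds for one Product.

    Hypothetical helper for illustration only; Haystack does this itself
    using ProductIndex and the product_text.txt template.
    """
    return {
        # Bookkeeping fields used to map a search hit back to a model row
        "id": "core.product.%s" % pk,
        "django_ct": "core.product",
        "django_id": str(pk),
        # The document=True field: roughly what the text template renders
        "text": "%s\n%s" % (name, description),
        # Fields copied from model attributes via model_attr
        "price": price,
        "category": category,
    }


doc = build_product_document(1, "Apple", "Crisp and sweet.", 0.5, "FRU")
print(doc["id"])  # core.product.1
```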

Create View

Since we want our backend to accommodate both "traditional" Django and "RESTful" Django, we'll create a view that accepts a query from the frontend, runs the search, and returns the serialized results to the frontend as JSON.

In core/views.py, we'll create a (very) simple and naïve view that takes our query, searches for products containing it, and returns the matching results:


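from django.http import JsonResponse
from haystack.query import SearchQuerySet
from .models import Product

def query_products(request, query):
	# Search only Product documents whose main text matches the query
	results = SearchQuerySet().models(Product).filter(content__exact=query)
	# A SearchQuerySet isn't JSON-serializable, so convert each hit to a dict
	return JsonResponse({
		'query': query,
		'results': [
			{'name': r.object.name, 'price': r.price, 'category': r.category}
			for r in results
		],
	})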

Lastly, in djangoelastic/urls.py, add the following URL pattern to route requests to your view:


from django.conf.urls import url
from django.contrib import admin

from core.views import query_products

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^search/(?P<query>\w+)/$', query_products, name="query_products"),
]
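
Note that \w+ only matches a single run of letters, digits, and underscores, so a multi-word query such as "green apple" won't match the search route and Django will answer with a 404 instead of reaching the view. The quick check below uses Python's re module with the search pattern and its query named group; extract_query is a hypothetical helper, not part of Django.

```python
import re

# Same regex as the search route; Django matches it against the URL path
# without the leading slash, e.g. "search/apple/".
SEARCH_PATTERN = re.compile(r'^search/(?P<query>\w+)/$')


def extract_query(path):
    """Return the captured query, or None if the pattern doesn't match."""
    match = SEARCH_PATTERN.match(path)
    return match.group("query") if match else None


print(extract_query("search/apple/"))        # apple
print(extract_query("search/green apple/"))  # None: \w+ doesn't allow spaces
```

If you need multi-word queries, passing the query as a GET parameter (for example, request.GET.get('q')) avoids the problem altogether.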

Start the development server with python manage.py runserver, then open a browser and go to http://localhost:8000/search/{your query}/ to access this endpoint.

From here, you can display your results on the frontend by parsing the JSON response and then displaying it in a user-friendly manner.
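
Whatever your frontend is written in, the steps are the same: decode the JSON, then turn each result into something readable. Here is a small Python sketch of that step. format_results is a hypothetical helper, and it assumes each entry in results is a dict with name and price keys, which depends on how your view serializes the hits.

```python
import json


def format_results(raw_json):
    """Turn the search endpoint's JSON response into display-ready lines.

    Assumes each entry in "results" is a dict with "name" and "price" keys;
    adjust to whatever fields your view actually serializes.
    """
    payload = json.loads(raw_json)
    if not payload["results"]:
        return ['No products found for "%s".' % payload["query"]]
    # float() accepts both numeric prices and prices stored as strings
    return [
        "%s: $%.2f" % (item["name"], float(item["price"]))
        for item in payload["results"]
    ]


sample = '{"query": "apple", "results": [{"name": "Apple", "price": "0.5"}]}'
print(format_results(sample))  # ['Apple: $0.50']
```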

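On a side note, Haystack doesn't give you strict keyword matching out of the box. Its Elasticsearch backend analyzes text fields with a stemming analyzer (snowball, by default), so a query also matches variants of the word, for example "apples" for "apple". Matches are then ranked by relevance score; if you'd rather have every match scored equally, you'll need to use Elasticsearch's constant_score query.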
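
Here is a hedged sketch of such a query, sent straight to Elasticsearch's REST API with the standard library rather than through Haystack. It assumes the server address and the haystack index name from HAYSTACK_CONNECTIONS and searches the text field defined on ProductIndex; constant_score_body and constant_score_search are hypothetical helper names. Keep in mind that the wrapped match query still runs the term through the field's analyzer, so this changes how results are scored, not which word variants match.

```python
import json
import urllib.request


def constant_score_body(term, field="text"):
    """Build an Elasticsearch 2.x search body where every match scores the same."""
    return {
        "query": {
            "constant_score": {
                # The wrapped query only decides *whether* a document matches;
                # relevance scoring is skipped and each hit gets the same score.
                "filter": {"match": {field: term}},
            }
        }
    }


def constant_score_search(term, base_url="http://127.0.0.1:9200", index="haystack"):
    """POST the query to /<index>/_search and return the decoded JSON response."""
    request = urllib.request.Request(
        "%s/%s/_search" % (base_url, index),
        data=json.dumps(constant_score_body(term)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, constant_score_search("apple")["hits"]["hits"] should list the matching documents, all with the same _score.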