Demystifying Elasticsearch and Django
by Thomas Tran
In this article, we'll explore how to set up Elasticsearch with a Django app. First, we'll explore what Elasticsearch is, then understand how it's faster than traditional full-text SQL queries. Lastly, we'll implement a small Django app to index model-based data with Elasticsearch.
The Theory
Elasticsearch is a standalone "database" server that receives input data and stores it in a flexible, largely schema-less format optimized for language-based searches. Inputs into Elasticsearch are called "documents," and to find a document you "query" for it. Queries in Elasticsearch are fast mainly because the server indexes the words of every document ahead of time, so a search consults that index instead of rereading the documents themselves.
Elasticsearch works by taking in documents, splitting them into words, and building an inverted index that maps each word to the documents containing it. When you search for a single word, Elasticsearch looks that word up in the index, which takes roughly constant time, O(1), with respect to the number of documents. Compare this to a SQL query that cannot use an index (for example, a LIKE '%word%' filter), where the database has to scan every row, giving a time complexity of O(n), which is much slower on large tables. After retrieval, the results are ranked so that the best match shows up before the other results. The index lookup is why Elasticsearch is so fast compared to raw SQL queries.
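To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It only illustrates the lookup-versus-scan difference described above; it is not how Elasticsearch is implemented internally, and the names DOCS, build_index, search_index, and search_scan are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample documents: id -> text.
DOCS = {
    1: "fresh red apples",
    2: "green apples and pears",
    3: "smoked salmon fillet",
}


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_index(index, word):
    """One dictionary lookup: on average the cost does not grow with the number of documents."""
    return sorted(index.get(word.lower(), set()))


def search_scan(docs, word):
    """Naive scan: every document is inspected, so the cost grows with len(docs)."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if word.lower() in text.lower().split())


index = build_index(DOCS)
print(search_index(index, "apples"))  # [1, 2]
print(search_scan(DOCS, "apples"))    # [1, 2]
```

Both functions return the same ids; the difference is that the index is built once up front, while the scan repeats its work on every search.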
Download and Run Elasticsearch
Let's download and run Elasticsearch. In this tutorial, we'll be using an older version of Elasticsearch (2.4.5) since the library that connects Django and Elasticsearch, django-haystack, does not yet support the latest version. This version of Elasticsearch is available here.
After your download has finished, extract the folder elasticsearch-2.4.5 to a clean directory. Then open a separate command-line window, cd into the bin directory, and run the following:
./elasticsearch
...or if you're on Windows...
elasticsearch.bat
This will start the Elasticsearch server, which you'll need to keep alive during development.
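If you'd like to confirm that the server is up before moving on, a small script along these lines may help. It is a sketch that assumes Elasticsearch is listening on its default address, http://127.0.0.1:9200/; the function name elasticsearch_version is made up for this example.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_version(url="http://127.0.0.1:9200/", timeout=2):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
    return info.get("version", {}).get("number")


if __name__ == "__main__":
    # Prints the version number when the server is running, None otherwise.
    print(elasticsearch_version())
```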
Implementation with Python
We'll now set up a Django project and show how to index basic models and how to query for data.
First, I'll assume that you have set up a virtualenv and that you're using Python 3.5. If you're not familiar with virtualenv, you can read more about it here.
Next, perform the following commands:
pip install Django==1.11.3
django-admin startproject djangoelastic
cd djangoelastic
python manage.py startapp core
The first command installs Django and makes the django-admin command available in your virtualenv. Assuming you're in the folder where your project should live, the second command starts an empty project named djangoelastic. The fourth command creates a Django app called core. This app will encapsulate all code and business logic relevant to our Django Elasticsearch implementation.
We'll create a model class in core/models.py, which we'll use to store the info that we want to index. To be clear, running full-text searches directly against the database becomes costly in time and resources as the tables grow. Indexing hands the model info over to Elasticsearch, which answers text searches faster than raw SQL queries.
Paste the following into core/models.py:
from django.db import models


class Product(models.Model):
    # Each choice is a pair: (value stored in the database, human-readable label).
    CATEGORY_CHOICES = (
        ('FRU', 'Fruits'),
        ('VEG', 'Vegetables'),
        ('POL', 'Poultry'),
        ('FIS', 'Fish'),
    )
    name = models.CharField(max_length=100, blank=False)
    # Plain TextField; swap in a rich-text field from a third-party package
    # if you need HTML editing.
    description = models.TextField(help_text="Detailed description of the product")
    price = models.FloatField(blank=False)
    category = models.CharField(
        max_length=3,
        choices=CATEGORY_CHOICES,
        blank=False,
        help_text="A category the product belongs in"
    )
Create Indices
To let Elasticsearch know what data will be available and searchable, we'll create indices: Python classes that tell Elasticsearch how to store Django model info. To do so, we'll use Haystack, a library that abstracts data storage and querying against Elasticsearch.
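First, we'll need to install Haystack. Its Elasticsearch backend also depends on the elasticsearch Python client, so install a 2.x release of the client to match the 2.4.5 server:
pip install django-haystack
pip install "elasticsearch>=2.0.0,<3.0.0"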
Since django-haystack is an app of its own, we need to include it in our INSTALLED_APPS setting. Remember to also add the core app that we created earlier.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'core',
    'haystack',
]
The following settings are also required to make Haystack work:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
The HAYSTACK_CONNECTIONS setting lets Django know where the Elasticsearch server is located. The HAYSTACK_SIGNAL_PROCESSOR setting, set here to RealtimeSignalProcessor, makes Haystack update the index whenever a model instance is saved or deleted. This keeps the data in Elasticsearch in sync with the data inside the database.
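The signal-driven sync can be pictured with a toy model in plain Python. This is only a sketch of the idea, not Haystack's or Django's actual code: the Signal class, the fake_index dictionary, and the stand-in Product class are all hypothetical.

```python
class Signal:
    """Tiny stand-in for a Django signal: stores receivers and calls them on send()."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, instance):
        for receiver in self.receivers:
            receiver(instance)


post_save = Signal()
post_delete = Signal()
fake_index = {}  # hypothetical in-memory "search index": pk -> searchable text


class Product:
    """Plain Python object standing in for the Django model."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    def save(self):
        # A real model would write to the database before the signal fires.
        post_save.send(self)

    def delete(self):
        post_delete.send(self)


def update_index(instance):
    """Receiver: add or refresh the instance's entry in the index."""
    fake_index[instance.pk] = instance.name


def remove_from_index(instance):
    """Receiver: drop the instance's entry from the index."""
    fake_index.pop(instance.pk, None)


# The "signal processor": keep the index in step with every save and delete.
post_save.connect(update_index)
post_delete.connect(remove_from_index)

apple = Product(1, "Apple")
apple.save()
print(fake_index)  # {1: 'Apple'}
apple.delete()
print(fake_index)  # {}
```

The trade-off of updating on every save is extra work per write; for bulk imports, rebuilding the index in one pass afterwards is usually the cheaper route.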
Next, we'll create some indices to let Elasticsearch know how to store information. To do so, we'll need to create a new file, core/search_indexes.py. All indices within the core app should go into this file.
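from haystack import indexes

from core.models import Product


class ProductIndex(indexes.SearchIndex, indexes.Indexable):
    # The main searchable document, rendered from a template (see below).
    text = indexes.CharField(document=True, use_template=True)
    price = indexes.CharField(model_attr='price')
    category = indexes.CharField(model_attr='category')

    def get_model(self):
        return Product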
When the option use_template is set to True, Haystack will look for a file named templates/search/indexes/core/product_text.txt. This file contains all the text that will be available for search by default (without specifying any additional facets). In our case, the file should contain the name and description of the product, since that's what we want to match against when we use the search functionality:
{{ object.name }}
{{ object.description }}
Make sure the DIRS option of the TEMPLATES setting includes our new templates location:
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates').replace('\\', '/')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
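Now, let's index our data. This step assumes the Product table exists (run python manage.py makemigrations core and python manage.py migrate if you haven't yet) and that it already holds several Product rows. If it doesn't, you can easily use the Django shell to insert some data. Then, use the following command to index your data: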
python manage.py rebuild_index --verbosity=2
Later on, when you want to update the data inside your index but don't want to erase it and build it all over again, use the following command:
python manage.py update_index --verbosity=2
Create View
Since we want our backend to accommodate both "traditional" Django and "RESTful" Django, we'll create a view that accepts a query from the frontend, runs the search, and returns the serialized results to the frontend as JSON.
In core/views.py, we'll create a (very) simple and naïve view which will take our query, search for products containing our query, and then return the corresponding results:
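from django.http import JsonResponse
from haystack.query import SearchQuerySet

from core.models import Product


def query_products(request, query):
    hits = SearchQuerySet().models(Product).filter(content__exact=query)
    # A SearchQuerySet is not JSON-serializable, so copy the fields we need into dicts.
    results = [
        {'name': hit.object.name, 'price': hit.price, 'category': hit.category}
        for hit in hits
    ]
    return JsonResponse({'query': query, 'results': results})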
Lastly, in djangoelastic/urls.py, add the following URL pattern to route requests to your view:
from django.conf.urls import url
from django.contrib import admin
from core.views import query_products

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^search/(?P<query>\w+)/$', query_products, name="query_products"),
]
Open up a browser and go to http://localhost:8000/search/{your query}/ to access this endpoint.
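Django passes named regex groups to the view as keyword arguments, so the group in the search route has to be named query to match the view's query parameter. The sketch below exercises such a pattern with Python's re module alone, outside Django; the helper name extract_query is made up for this example.

```python
import re

# A search route with a named group; the name must match the view's `query` argument.
SEARCH_ROUTE = re.compile(r'^search/(?P<query>\w+)/$')


def extract_query(path):
    """Return the captured search term, or None when the path does not match."""
    match = SEARCH_ROUTE.match(path)
    return match.group('query') if match else None


print(extract_query('search/apple/'))      # apple
print(extract_query('search/red apple/'))  # None, because \w does not match spaces
```

Note that \w+ only accepts a single run of letters, digits, and underscores, so multi-word queries would need a more permissive pattern or a query-string parameter instead.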
From here, you can display your results on the frontend by parsing the JSON response and then displaying it in a user-friendly manner.
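As a rough illustration of that parsing step, the sketch below turns a response body into display lines. The payload shape (a results list whose items carry name, price, and category keys) is an assumption made for this example, as are the names SAMPLE_BODY and format_results; adjust them to whatever your view actually serializes.

```python
import json

# Hypothetical response body; the field names are assumptions for illustration.
SAMPLE_BODY = (
    '{"query": "apple", "results": '
    '[{"name": "Apple", "price": 1.5, "category": "FRU"}]}'
)

# Same codes and labels as CATEGORY_CHOICES on the Product model.
CATEGORY_LABELS = {"FRU": "Fruits", "VEG": "Vegetables", "POL": "Poultry", "FIS": "Fish"}


def format_results(body):
    """Turn the JSON text into one display line per product."""
    payload = json.loads(body)
    lines = []
    for item in payload.get("results", []):
        label = CATEGORY_LABELS.get(item["category"], item["category"])
        # float() accepts both numeric and string prices.
        lines.append("{} ({}): {:.2f}".format(item["name"], label, float(item["price"])))
    return lines


print(format_results(SAMPLE_BODY))  # ['Apple (Fruits): 1.50']
```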
On a side note, Haystack does not offer exact keyword searching, meaning that it will match both the query and words similar to the query. To disable this scoring behavior, you must make use of Elasticsearch's constant_score query.
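For reference, a constant_score request body can be built as a plain Python dictionary. This is a sketch under assumptions: the helper name constant_score_query is made up, the field name text follows the document field declared in ProductIndex, and how you send the body (for example, to the _search endpoint of the haystack index) depends on the client you use and bypasses Haystack's own query API.

```python
import json


def constant_score_query(field, term, boost=1.0):
    """Build a body that filters on an exact term and gives every hit the same score."""
    return {
        "query": {
            "constant_score": {
                "filter": {"term": {field: term}},
                "boost": boost,
            }
        }
    }


# "text" is the document field from ProductIndex; "apple" is a sample term.
print(json.dumps(constant_score_query("text", "apple"), indent=2))
```

Because a term filter compares against the tokens stored in the index, the term generally has to be given in its analyzed form (typically lowercase).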