
Text indexes

In MongoDB, a text index is a special type of index that allows for text search queries on string content. Text indexes can include any field whose value is a string or an array of string elements. These indexes are particularly useful for implementing features like search engines, where you need to locate documents based on text content.

How to Create a Text Index

You can create a text index using the createIndex method and specifying the index type as "text".

// Create a text index on the "title" field
db.articles.createIndex({ "title": "text" })

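A collection can have at most one text index, so to search several fields you list them all in a single compound text index. Drop any existing text index first, otherwise the createIndex call fails.

// Drop the single-field text index created above (its default name is "title_text")
db.articles.dropIndex("title_text")
// Create a compound text index on the "title" and "description" fields
db.articles.createIndex({ "title": "text", "description": "text" })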

Features of Text Indexes

Text indexes in MongoDB offer a range of features designed to facilitate complex text-based search queries. Here's a detailed look at these features:

1. Full-Text Search Capabilities

Text indexes enable full-text search, allowing you to search for words or phrases within string fields in your documents. This is particularly useful for implementing search functionalities in applications.

// Search for documents containing the word "apple"
db.products.find({ $text: { $search: "apple" } })
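To make the idea concrete, here is a toy model in plain JavaScript (runnable with Node.js, no MongoDB required) of the lookup a text index enables: each word maps to the documents that contain it, and a search unions the matches for its terms. The function names and sample documents are hypothetical, and MongoDB's real index also stems words, drops stop words, and stores weights.

```javascript
// Toy inverted index: maps each lowercase word to the set of document ids containing it.
// Illustration only; not MongoDB's actual text index implementation.
function buildInvertedIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // Split on anything that is not a letter or digit (Unicode-aware).
    const words = String(doc[field] ?? "").toLowerCase().split(/[^\p{L}\p{N}]+/u);
    for (const word of words) {
      if (!word) continue;
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Look up each search term and union the results (like $text, terms are ORed).
function search(index, searchString) {
  const ids = new Set();
  for (const term of searchString.toLowerCase().split(/\s+/)) {
    for (const id of index.get(term) ?? []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const products = [
  { _id: 1, description: "Fresh apple juice" },
  { _id: 2, description: "Banana bread" },
  { _id: 3, description: "Apple and banana smoothie" },
];
const index = buildInvertedIndex(products, "description");
console.log(search(index, "apple")); // [ 1, 3 ]
```

Because the lookup goes straight from a word to its documents, the search does not have to scan every document's text.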

2. Case Insensitivity

Text searches are case-insensitive by default, so a search for "Apple" also matches documents containing "apple" or "APPLE".

3. Diacritic Insensitivity

Text indexes (version 3, the default since MongoDB 3.2) also ignore diacritics, so a search for "naïve" matches "naive" and vice versa.
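As a rough model (plain JavaScript for Node.js, not MongoDB's actual code), case- and diacritic-insensitive matching amounts to normalizing every term the same way: decompose accented characters, drop the combining marks, and lowercase. The function name normalizeTerm is hypothetical. If you need the opposite behavior, the $text operator accepts the $caseSensitive: true and $diacriticSensitive: true options.

```javascript
// Rough model of how a case- and diacritic-insensitive index normalizes a term.
// Assumption-based sketch; MongoDB's real normalization is more thorough.
function normalizeTerm(term) {
  return term
    .normalize("NFD")       // split "ï" into "i" plus a combining diaeresis
    .replace(/\p{M}/gu, "") // drop the combining marks (the diacritics)
    .toLowerCase();         // fold case
}

console.log(normalizeTerm("Naïve")); // naive
console.log(normalizeTerm("APPLE") === normalizeTerm("apple")); // true
```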

4. Stemming

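MongoDB applies language-specific stemming, reducing words to a common root before indexing and searching. For example, a search for "running" also matches documents containing "run" or "runs". Irregular forms such as "ran" are not reduced to the same root, so they do not match.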
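The sketch below is a deliberately crude, hypothetical suffix stripper, not MongoDB's real language-specific stemmer. It only shows why "running" and "runs" end up as the same index key while the irregular "ran" does not.

```javascript
// Crude suffix-stripping stemmer, for illustration only.
function crudeStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ing", "ed"]) {
    // Require at least three letters to remain, so "sing" and "bed" stay intact.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      let stem = w.slice(0, -suffix.length);
      const last = stem[stem.length - 1];
      // Undo consonant doubling ("runn" -> "run"), but keep "ll", "ss", "zz".
      if (last === stem[stem.length - 2] && !"aeioulsz".includes(last)) {
        stem = stem.slice(0, -1);
      }
      return stem;
    }
  }
  // Strip a trailing "s", but not "ss" ("glass" stays "glass").
  if (w.endsWith("s") && !w.endsWith("ss") && w.length > 3) return w.slice(0, -1);
  return w;
}

console.log(["running", "runs", "run", "ran"].map(crudeStem)); // [ 'run', 'run', 'run', 'ran' ]
```

Since both the stored words and the search terms pass through the same stemmer, any two words that share a stem match each other.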

5. Stop Words

Common words such as "and", "the", and "is", known as stop words, are ignored in text searches. MongoDB maintains a stop-word list for each supported language; setting the language to "none" turns off both stop-word filtering and stemming.
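A minimal sketch of the filtering step, assuming a tiny hypothetical English stop-word list (MongoDB's real list is much longer): the text is tokenized and lowercased, and stop words are dropped before anything would be stored in the index.

```javascript
// Hypothetical, abbreviated English stop-word list.
const STOP_WORDS = new Set(["a", "an", "and", "is", "of", "the"]);

// Tokenize, lowercase, and drop stop words, roughly as a text index does before storing keys.
function indexableTerms(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

console.log(indexableTerms("The apple is the best of the fruits"));
// [ 'apple', 'best', 'fruits' ]
```

One consequence: a search string made up only of stop words, such as "the and", has nothing to look up and returns no documents.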

6. Phrase Search

To search for an exact phrase, enclose it in double quotes inside the search string, escaping those quotes within the JavaScript string literal.

// Search for the exact phrase "apple pie"
db.products.find({ $text: { $search: "\"apple pie\"" } })
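Escaping quotes by hand is easy to get wrong. A small helper can assemble the $search string for you; buildSearchString and its terms/phrases options are hypothetical names used only for this sketch.

```javascript
// Hypothetical helper: build a $search string from loose terms and exact phrases.
// Each phrase is wrapped in literal double quotes, which is what "\"apple pie\"" spells out.
function buildSearchString({ terms = [], phrases = [] }) {
  // Embedded double quotes are removed so they cannot end a phrase early.
  const quoted = phrases.map((phrase) => `"${phrase.replace(/"/g, "")}"`);
  return [...quoted, ...terms].join(" ");
}

const query = { $text: { $search: buildSearchString({ phrases: ["apple pie"] }) } };
console.log(JSON.stringify(query)); // {"$text":{"$search":"\"apple pie\""}}
```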

7. Language Support

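Each text index has a default language (English unless you set default_language) that determines which stemming rules and stop-word list MongoDB applies. Individual documents can override it through a language field, and MongoDB then uses the appropriate stemming and stop words for each document's language.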

// Create a text index with Spanish language support
db.products.createIndex({ "description": "text" }, { default_language: "es" })
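The following is an assumption-based model of how the language is chosen per document: a "language" field on the document (the default language_override field) takes precedence, otherwise the index's default_language applies, and "none" means no stop words or stemming. The alias table and the stop-word excerpts are hypothetical and far smaller than MongoDB's real lists.

```javascript
// Rough model of per-document language selection in a text index (illustrative only).
const LANGUAGE_ALIASES = { en: "english", es: "spanish" }; // ISO codes -> language names
const STOP_WORDS_BY_LANGUAGE = {
  english: new Set(["the", "and", "is"]),
  spanish: new Set(["el", "la", "y", "es"]),
  none: new Set(), // "none": no stop words (and no stemming)
};

function resolveLanguage(doc, defaultLanguage) {
  const language = doc.language ?? defaultLanguage; // the document's own field wins
  return LANGUAGE_ALIASES[language] ?? language;
}

function stopWordsFor(doc, defaultLanguage) {
  return STOP_WORDS_BY_LANGUAGE[resolveLanguage(doc, defaultLanguage)] ?? new Set();
}

// With default_language: "es", a document without a language field is treated as Spanish.
console.log(resolveLanguage({ description: "La tarta de manzana" }, "es")); // spanish
console.log(resolveLanguage({ description: "Apple pie", language: "english" }, "es")); // english
```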

8. Compound Text Indexes

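A single text index can also cover multiple fields. Because a collection can have only one text index, drop the existing one (here, the Spanish-language index from the previous example, whose default name is "description_text") before creating the compound index.

// Drop the existing text index; a collection allows only one
db.products.dropIndex("description_text")
// Create a compound text index on "title" and "description"
db.products.createIndex({ "title": "text", "description": "text" })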

9. Text Score

When you run a text search, MongoDB assigns each matching document a text score: a number indicating how well the document matches the search string. The higher the score, the more relevant the document is to the query.

To include the text score in the query results, you can use the $meta operator in the projection part of the query. Here's how you can do it:

// Include the text score in each result document
db.products.find(
  { $text: { $search: "apple" } },
  { score: { $meta: "textScore" } }
)

In this example, each document in the result set will include a field named score that contains the text score.

Sorting by Text Score

You can also sort the query results based on the text score to get the most relevant documents first:

// Sort by text score, most relevant first
db.products.find(
  { $text: { $search: "apple" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
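The ranking step can be modeled in plain JavaScript with a toy score that simply counts occurrences of the search terms. This counting rule is an assumption for illustration; MongoDB's actual textScore calculation is more involved (for example, it takes field weights into account).

```javascript
// Toy relevance score: how many times the search terms occur in the text.
function toyScore(text, terms) {
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u);
  return words.filter((word) => terms.includes(word)).length;
}

const docs = [
  { _id: 1, description: "Apple pie" },
  { _id: 2, description: "Apple crumble with apple slices" },
  { _id: 3, description: "Banana bread" },
];
const terms = ["apple"];

const ranked = docs
  .map((doc) => ({ ...doc, score: toyScore(doc.description, terms) }))
  .filter((doc) => doc.score > 0)     // only matching documents are returned
  .sort((a, b) => b.score - a.score); // highest score first

console.log(ranked.map((doc) => doc._id)); // [ 2, 1 ]
```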

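10. Excluding Terms

Prefixing a word with a hyphen (-) excludes documents that contain it. There is no corresponding + operator: unquoted terms are combined with a logical OR, so a document matches if it contains any of them. To require specific wording, use a quoted phrase instead.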

// Search for documents containing "apple" but not "pie"
db.products.find({ $text: { $search: "apple -pie" } })
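A toy evaluator makes the two rules explicit: plain terms are ORed, and any negated term rules the document out. The function name matches is hypothetical, and the sketch ignores stemming, stop words, and phrases.

```javascript
// Toy evaluator for a $search string containing plain terms and "-" negations.
function matches(text, searchString) {
  const words = new Set(text.toLowerCase().split(/[^\p{L}\p{N}]+/u));
  const tokens = searchString.toLowerCase().split(/\s+/).filter(Boolean);
  const excluded = tokens.filter((t) => t.startsWith("-")).map((t) => t.slice(1));
  const included = tokens.filter((t) => !t.startsWith("-"));
  if (excluded.some((t) => words.has(t))) return false; // a negated term rules it out
  return included.some((t) => words.has(t));            // otherwise any plain term suffices
}

console.log(matches("Apple pie", "apple -pie"));      // false
console.log(matches("Apple juice", "apple -pie"));    // true
console.log(matches("Banana bread", "apple banana")); // true (terms are ORed)
```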

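11. Wildcard Text Indexes

MongoDB also supports a wildcard text index, created with the "$**" specifier, which indexes every field that contains string data. This is not a wildcard search: $text matches whole (stemmed) words only and does not support partial-word matching, and an asterisk in the search string is not treated as a wildcard.

// Index every field that contains string data (a collection still allows only one text index)
db.products.createIndex({ "$**": "text" })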
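A wildcard text index ({ "$**": "text" }) indexes every field that holds string data. The sketch below collects those values from a sample document, under the assumption that strings inside arrays and embedded documents are included; the function name collectStrings and the sample document are hypothetical.

```javascript
// Collect every string value in a document, recursing into arrays and embedded documents.
// Illustrative model of what a wildcard text index covers, not MongoDB internals.
function collectStrings(value, out = []) {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const field of Object.values(value)) collectStrings(field, out);
  }
  return out; // numbers, booleans, and null are skipped
}

const product = {
  _id: 1,
  title: "Apple pie",
  price: 4.5,
  details: { origin: "Normandy", tags: ["dessert", "baked"] },
};
console.log(collectStrings(product)); // [ 'Apple pie', 'Normandy', 'dessert', 'baked' ]
```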

Considerations

  1. Resource Intensive: Text indexes can consume a significant amount of RAM and disk space, especially for large text fields.

  2. Single Index per Collection: You can only have one text index per collection, although that index can cover multiple fields.

  3. Not for All Queries: A text index serves only $text queries. It cannot be used for equality or range queries on the indexed fields, and sort operations cannot take their order from it.

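  4. Language Limitations: Stemming and stop-word support covers only a fixed set of languages, and even for those it may miss linguistic nuances. For other languages, setting the language to "none" applies simple tokenization without stemming or stop words.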