Text indexes
In MongoDB, a text index is a special type of index that allows for text search queries on string content. Text indexes can include any field whose value is a string or an array of string elements. These indexes are particularly useful for implementing features like search engines, where you need to locate documents based on text content.
How to Create a Text Index
You can create a text index by calling the createIndex method and specifying "text" as the index type for the field.
// Create a text index on the "title" field
db.articles.createIndex({ "title": "text" })
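You can also create a text index that covers multiple fields. A collection can have only one text index, so an existing text index (such as the one above) must be dropped before creating this one.
// Create a text index covering the "title" and "description" fields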
db.articles.createIndex({ "title": "text", "description": "text" })
Features of Text Indexes
Text indexes in MongoDB offer a range of features designed to facilitate complex text-based search queries. Here's a detailed look at these features:
1. Full-Text Search Capabilities
Text indexes enable full-text search, allowing you to search for words or phrases within string fields in your documents. This is particularly useful for implementing search functionalities in applications.
// Search for documents containing the word "apple"
db.products.find({ $text: { $search: "apple" } })
2. Case-Insensitive Search
Text searches are case-insensitive by default, meaning a search for "Apple" will also return documents containing "apple" or "APPLE".
3. Diacritic-Insensitive Search
Text indexes (version 3, the default since MongoDB 3.2) also ignore diacritics by default, so a search for "naïve" will also match "naive" and vice versa.
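As a rough mental model of case and diacritic folding, the sketch below normalizes strings in plain JavaScript so that variants compare equal. It is an approximation for illustration only, not MongoDB's actual implementation, and foldForComparison is a hypothetical name.

```javascript
// Illustrative approximation only: MongoDB's tokenizer and folding rules
// live inside the server and are not reproduced here.
function foldForComparison(text) {
  return text
    .normalize("NFD")        // split letters from their combining marks
    .replace(/\p{M}/gu, "")  // drop the combining marks (diacritics)
    .toLowerCase();          // fold case
}

console.log(foldForComparison("Naïve") === foldForComparison("naive")); // true
console.log(foldForComparison("CAFÉ") === foldForComparison("cafe"));   // true
```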
4. Stemming
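MongoDB uses language-specific stemming to reduce words to their root form, so a search matches other forms that share the same stem. For example, searching for "running" will also match documents containing "run" or "runs". Stemming is suffix-based, so irregular forms such as "ran" are not matched.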
5. Stop Words
Commonly used words like "and", "the", "is", etc., known as stop words, are usually ignored in text searches. MongoDB has a list of stop words for supported languages.
6. Phrase Search
You can search for an exact phrase by wrapping it in double quotes inside the search string. In a JavaScript string literal, the inner quotes must be escaped.
// Search for the exact phrase "apple pie"
db.products.find({ $text: { $search: "\"apple pie\"" } })
7. Language Support
Each text index has a default language, English unless you specify otherwise, which determines the stemming rules and stop words applied. Set it with the default_language option when creating the index.
// Create a text index with Spanish language support
db.products.createIndex({ "description": "text" }, { default_language: "es" })
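The index's default language can also be overridden for a single query through the $text operator's $language option, and the case and diacritic defaults can be changed with $caseSensitive and $diacriticSensitive. The sketch below builds such a filter document; textFilter and its option names are hypothetical helpers, not part of MongoDB.

```javascript
// Hypothetical helper: builds a $text filter, adding only the options supplied.
function textFilter(search, options = {}) {
  const text = { $search: search };
  if (options.language !== undefined) text.$language = options.language;
  if (options.caseSensitive !== undefined) text.$caseSensitive = options.caseSensitive;
  if (options.diacriticSensitive !== undefined) {
    text.$diacriticSensitive = options.diacriticSensitive;
  }
  return { $text: text };
}

const spanishFilter = textFilter("manzanas", { language: "es" });
console.log(JSON.stringify(spanishFilter));
// {"$text":{"$search":"manzanas","$language":"es"}}

// In mongosh, assuming a text index exists on db.products:
//   db.products.find(spanishFilter)
```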
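8. Multi-Field Text Indexes
A single text index can cover multiple fields. (In MongoDB's terminology, a compound index is one that combines a text key with ascending or descending keys on other fields.) Because a collection can have only one text index, any existing text index must be dropped first.
// Create a text index covering "title" and "description"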
db.products.createIndex({ "title": "text", "description": "text" })
9. Text Score
When you run a text search, MongoDB assigns each matching document a text score, a numeric value that indicates how well the document matches the search string. The higher the score, the more relevant the document is to the query.
To include the text score in the query results, use the $meta operator in the projection part of the query:
// Include text score in the query result
db.products.find(
  { $text: { $search: "apple" } },
  { score: { $meta: "textScore" } }
)
In this example, each document in the result set will include a field named score that contains the text score.
Sorting by Text Score
You can also sort the query results based on the text score to get the most relevant documents first:
// Sort by text score
db.products.find(
  { $text: { $search: "apple" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
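Since the projection and the sort use the same $meta expression, it can be convenient to build both from one place. The following is a minimal sketch of that idea; rankedTextQuery is a hypothetical helper, and the limit of 5 in the usage comment is an arbitrary choice.

```javascript
// Hypothetical helper: returns the three documents needed for a ranked text search.
function rankedTextQuery(search) {
  const score = { $meta: "textScore" };
  return {
    filter: { $text: { $search: search } },
    projection: { score }, // expose the score as a "score" field
    sort: { score },       // order results by that same score, highest first
  };
}

const ranked = rankedTextQuery("apple");
console.log(JSON.stringify(ranked.sort)); // {"score":{"$meta":"textScore"}}

// In mongosh, assuming a text index exists on db.products:
//   db.products.find(ranked.filter, ranked.projection).sort(ranked.sort).limit(5)
```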
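10. Term Exclusion (Negation)
You can exclude a word from the results by prefixing it with a hyphen (-). There is no corresponding + operator: terms in the search string are included by default, and multiple terms are combined with a logical OR.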
// Search for documents containing "apple" but not "pie"
db.products.find({ $text: { $search: "apple -pie" } })
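Plain terms, quoted phrases, and negated terms can all appear in one $search string. The sketch below assembles such a string from separate lists; buildSearchString is a hypothetical helper, and it assumes the inputs contain no double quotes of their own.

```javascript
// Hypothetical helper: assembles a $search string from its parts.
// Assumes terms and phrases do not themselves contain double quotes.
function buildSearchString({ terms = [], phrases = [], excluded = [] }) {
  const parts = [
    ...terms,
    ...phrases.map((p) => `"${p}"`), // quoted: matched as an exact phrase
    ...excluded.map((t) => `-${t}`), // hyphen prefix: negated term
  ];
  return parts.join(" ");
}

const search = buildSearchString({
  terms: ["apple"],
  phrases: ["apple pie"],
  excluded: ["cider"],
});
console.log(search); // apple "apple pie" -cider

// In mongosh, assuming a text index exists on db.products:
//   db.products.find({ $text: { $search: search } })
```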
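11. Wildcard Text Indexes
The $text operator matches whole (stemmed) words and does not support partial-word or prefix patterns such as "appl*". What MongoDB does support is a wildcard text index, which uses the $** specifier to index every field that contains string data.
// Create a text index on all string fields in the collection
db.products.createIndex({ "$**": "text" })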
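Where prefix matching on a single field is what you need, an anchored regular expression is a common alternative to $text. The sketch below builds such a filter; prefixFilter and the "name" field are hypothetical. Note that the match is case-sensitive and applies to the start of the field value, not to the start of each word.

```javascript
// Escape regex metacharacters so the prefix is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a filter matching values of `field`
// that begin with `prefix`.
function prefixFilter(field, prefix) {
  return { [field]: new RegExp("^" + escapeRegex(prefix)) };
}

const applFilter = prefixFilter("name", "appl");
console.log(applFilter.name.test("apple pie")); // true
console.log(applFilter.name.test("pineapple")); // false

// In mongosh: db.products.find(applFilter)
```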
Considerations
Resource Intensive: Text indexes can consume a significant amount of RAM and disk space, especially for large text fields.
Single Index per Collection: You can only have one text index per collection, although that index can cover multiple fields.
Not for All Queries: Text indexes are specialized for text search and are not suitable for other types of queries.
Language Limitations: While MongoDB supports multiple languages, it may not cover all linguistic nuances, especially for less commonly used languages.