
Introduction to NoSQL databases, features of MongoDB, and installation procedures
Complete CRUD operations and MongoDB query language examples
Demonstrations of the aggregation framework to group data and perform aggregation operations
Indexing details to improve the performance of a query
Replication and sharding features to increase data availability and avoid data loss in case of a failure
Monitoring the database and backup strategies
Security features to implement proper authentication and authorization
The audience for this book includes all levels of IT professionals and anyone who wants to get a good understanding of MongoDB.
Chapter 1 describes NoSQL databases, various features of MongoDB, the installation procedure, and how to interact with MongoDB.
Chapter 2 describes MongoDB’s rich query language to support create, read, update, and delete (CRUD) operations.
Chapter 3 describes the aggregation framework to perform aggregation operations. The aggregation framework can group data and perform a variety of operations on that grouped data.
Chapter 4 describes indexes, types of index, index properties, and the various indexing strategies to be considered. Indexes are used to improve the performance of a query.
Chapter 5 describes the various replication and sharding strategies in MongoDB. Replication is the process of creating and managing a duplicate version of a database across servers to provide redundancy and increase the availability of data. Sharding distributes data across servers to increase availability.
Chapter 6 describes the multi-document transactions feature. Multi-document transactions help us to achieve all-or-nothing execution to maintain data integrity.
Chapter 7 describes the details of various tools to monitor a MongoDB database and procedures for backup operations. These MongoDB monitoring tools help us to understand what is happening with a database at various levels.
Chapter 8 describes the various security features in MongoDB to verify the identity of the user and access controls to determine the verified user’s access to resources and operations.
Subhashini Chellappan
Dharanitharan Ganesan
The making of this book was a journey that we are glad we undertook. The journey spanned a few months, but the experience will last a lifetime. We had our families, friends, colleagues, and other well-wishers onboard for this journey and we wish to express our deepest gratitude to each one of them.
We would like to express our special thanks to our families, friends, and colleagues who provided the support that allowed us to complete this book within a limited time frame.
Special thanks are extended to our technical reviewers for their vigilant review and expert opinions.
We would like to thank Celestin Suresh John at Apress for signing us up for this wonderful creation. We wish to acknowledge and appreciate all our coordinating editors and the team who guided us through the entire process of preparation and publication.

is a technology enthusiast with expertise in the big data and cloud space. She has rich experience in both academia and the software industry. Her areas of interest and expertise are centered on business intelligence, big data analytics, and cloud computing.

has an MBA in technology management and a high level of exposure to and experience in big data, including Apache Hadoop, Apache Spark, and various Hadoop ecosystem components. He has a proven track record of improving efficiency and productivity through the automation of various routine and administrative functions in business intelligence and big data technologies. His areas of interest and expertise are centered on machine learning algorithms, blockchain in big data, statistical modeling, and predictive analytics.

has been working in the software industry for nearly 20 years. He holds an engineering degree from COEP, Pune (India) and since then has enjoyed an exciting IT journey.
As a Principal Architect at TatvaSoft, Manoj has undertaken many initiatives in the organization, ranging from training and mentoring teams and leading the data science and machine learning practice to designing client solutions for different functional domains.
Starting as a Java programmer, he has been fortunate to work with multiple frameworks and multiple languages as a full-stack developer. In the last five years, he has worked extensively in the fields of business intelligence, big data, and machine learning with technologies such as Hitachi Vantara (Pentaho), the Hadoop ecosystem, TensorFlow, Python-based libraries, and more.
He is passionate about learning new technologies, trends, and reviewing books. When he is not working, he is either working out or reading infinitheism literature.
Manoj would like to thank Apress for providing this opportunity to review this title and his two daughters Ayushee and Ananyaa for their understanding during the process.
Why NoSQL?
What are NoSQL databases?
CAP theorem.
BASE approach.
Types of NoSQL databases.
MongoDB features.
MongoDB installation on Windows.
MongoDB installation on Linux.
MongoDB Compass installation on Windows.
Terms used in MongoDB.
Data types in MongoDB.
Database commands.
Modern applications' storage needs exceed the storage capabilities of the traditional relational database management system (RDBMS).
Modern applications require an unknown level of scalability.
Modern applications should be available 24/7.
Data needs to be distributed globally.
Users should be able to read and write data anywhere.
Users always seek reduction of software and hardware costs.
All these challenges have given birth to the NoSQL databases.
NoSQL databases are open source, nonrelational, distributed databases that allow organizations to analyze huge volumes of data.
Availability.
Fault tolerance.
Scalability.
They do not use SQL as a query language.
Most NoSQL databases are designed to run on clusters.
They operate without a fixed schema: Fields can be added freely without having to define any changes in structure first.
They are polyglot persistent, meaning there are different ways to store data based on the requirements.
They are designed in such a way that they can be scaled out.
Consistency implies that every read fetches the last write.
Availability implies that reads and writes always succeed. In other words, each nonfailing node will return a response in a reasonable amount of time.
Partition tolerance implies that the system continues to function even when parts of it can no longer communicate with each other, such as during a network partition or message loss between nodes.

CAP theorem
Consistency and availability.
Consistency and partition tolerance.
Availability and partition tolerance.
The partition tolerance property is a must for NoSQL databases.
Basic availability: The database should be available most of the time.
Soft state: Temporary inconsistency is allowed.
Eventual consistency: The system will come to a consistent state after a certain period.
Data Model | Example | Description |
|---|---|---|
Key/Value Store | DynamoDB, Riak | • Least complex NoSQL option. • Stores a key and a value. |
Column Store | HBase, Bigtable | • Also known as a wide column store. • Stores data tables as sections of columns of data. |
Document Store | MongoDB, CouchDB | • Extends the key/value idea. • Stores data as documents. • More complex NoSQL option. • Each document contains a unique key that is used to retrieve the document. |
Graph Database | Neo4j | • Based on graph theory. • Stores data as nodes, edges, and properties. |
MongoDB is an open source, document-oriented NoSQL database written in C++. MongoDB provides high availability, automatic scaling, and high performance. The following sections introduce the features of MongoDB.

A document
In many programming languages, a document corresponds to native data types.
Embedded documents help in reducing expensive joins.
Here, the collection Person has two documents, each with different fields, contents, and size.
We discuss MongoDB query language in detail in Chapter 2.
MongoDB provides an aggregation framework to perform aggregation operations. The aggregation framework can group data from multiple documents and perform a variety of operations on that grouped data. MongoDB provides an aggregation pipeline, the map-reduce function, and single-purpose aggregation methods to perform aggregation operations. We discuss the aggregation framework in detail in Chapter 3.
Indexes improve the performance of query execution. MongoDB uses indexes to limit the number of documents scanned. We discuss various indexes in Chapter 4.
GridFS is a specification for storing and retrieving large files. GridFS can be used to store files that exceed the BSON maximum document size of 16 MB. GridFS divides the file into parts known as chunks and stores them as separate documents. GridFS divides the file into chunks of 255 KB, except the last chunk, the size of which is based on the file size.
Replication is a process of copying an instance of a database to a different database server to provide redundancy and high availability. Replication provides fault tolerance against the loss of a single database server. In MongoDB, the replica set is a group of mongod processes that maintain the same data set. We discuss how to create replica sets in Chapter 5.
A single server can be a challenge when we need to work with large data sets and high-throughput applications in terms of central processing unit (CPU) and input/output (I/O) capacity. MongoDB uses sharding to support large data sets and high-throughput operations. Sharding is a method for distributing data across multiple systems, discussed in detail in Chapter 5.
The mongo shell is an interactive JavaScript interface to MongoDB. The mongo shell is used to query and update data and to perform administrative operations. The mongo shell is a component of the MongoDB distributions. After you start the mongod process, you can launch the mongo shell to connect to a MongoDB instance.
Terms | MongoDB |
|---|---|
MongoDB server | mongod |
MongoDB client | mongo |
RDBMS | MongoDB |
|---|---|
Database | Database |
Table | Collection |
Record | Document |
Columns | Fields or key/value pairs |
ACID transactions | ACID transactions |
Secondary index | Secondary index |
JOINs | Embedded document, $lookup |
GROUP_BY | Aggregation pipeline |
Data Type | Description |
|---|---|
String | Strings are UTF-8 |
Integer | Can be 32-bit or 64-bit |
Double | To store floating-point values |
Arrays | To store a list of values into one key |
Timestamps | For MongoDB internal use; values are 64-bit and record when a document was modified or added |
Date | A 64-bit integer that represents the number of milliseconds since the Unix epoch (January 1, 1970) |
ObjectId | Small, unique, fast to generate, and ordered; consists of 12 bytes, where the first four bytes are a timestamp that reflects the ObjectId’s creation |
Binary data | To store binary data (images, binaries, etc.) |
Null | To store a null value |
Refer to the following link for data types in MongoDB: https://docs.mongodb.com/manual/reference/bson-types/
Let us learn how to install MongoDB.
In this recipe, we are going to discuss how to install MongoDB on Windows.
You want to install MongoDB on Windows.
Download the MongoDB msi installer from https://www.mongodb.com/download-center#enterprise.
Let’s follow the steps in this section to install MongoDB on Windows.
Right-click the Windows installer, select Run as Administrator, and follow the instructions to install MongoDB.
Create a \data\db directory in C:\ to store MongoDB data, so that the full path is C:\data\db.

Starting MongoDB Server
You should get a message that states “Waiting for connection on port 27017.”

Starting MongoDB Client
We can see the mongo shell.
This confirms that we have installed MongoDB on Windows.
In this recipe, we are going to discuss how to install MongoDB on Ubuntu.
You want to install MongoDB on Ubuntu.
Download the MongoDB tarball from https://www.mongodb.com/download-center/community.
Let’s follow the steps in this section to install MongoDB on Ubuntu.
Create a /data/db directory as shown in Recipe 1-1.
In this recipe, we are going to discuss how to install MongoDB Compass on Windows. MongoDB Compass is a simple-to-use graphical user interface (GUI) for interacting with MongoDB.
You want to install MongoDB Compass on Windows.
Download the MongoDB Compass msi installer from https://www.mongodb.com/download-center#compass.
Let’s follow the steps in this section to install MongoDB Compass on Windows.

MongoDB Compass GUI
You should see a message that states “Waiting for connection on port 27017.”
The default port number for MongoDB is 27017.

New Connection page

Connecting MongoDB Compass to the MongoDB server
We have now connected MongoDB Compass to the MongoDB server.
Next, we discuss database commands in MongoDB.
In this recipe, we are going to discuss how to create a database in MongoDB.
You want to create a database in MongoDB.
Let’s follow the steps in this section to create a database in MongoDB.
This shows that you are working in the mydb database, so you know we have created a database by that name.
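The steps above can be sketched as a mongo shell session. The database name mydb matches the recipe; use switches to the database (and creates it implicitly), and db prints the current database name.

```javascript
> use mydb
switched to db mydb
> db
mydb
```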
In this recipe, we are going to discuss how to drop a database in MongoDB.
You want to drop a database in MongoDB.
Let’s follow the steps in this section to drop a database in MongoDB.
We have thus dropped the database named mydb.
If no database is selected, the default database test is dropped.
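A minimal session illustrating the drop, assuming the mydb database from the previous recipe. Switch to the database first so the correct one is dropped.

```javascript
> use mydb
switched to db mydb
> db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
```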
In this recipe, we are going to discuss how to display a list of databases.
You want to display a list of databases.
Let’s follow the steps in this section to display a list of databases.
We can now see the list of databases.
The newly created database mydb is not shown in the list. This is because a database must contain at least one collection to appear in the list. The default database is test.
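A sketch of this behavior in the mongo shell: mydb appears in show dbs only after a document (and hence a collection) is created in it. The person collection and its field are illustrative.

```javascript
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
> use mydb
switched to db mydb
> db.person.insertOne({ name: "John" })
> show dbs      // mydb is now listed
```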
In this recipe, we are going to discuss how to display the version of MongoDB.
You want to display the version of MongoDB.
Let’s follow the steps in this section to display the version of MongoDB.
We can see that the version of MongoDB is 4.0.2.
In this recipe, we are going to see how to display the list of MongoDB commands.
You want to display the list of MongoDB commands.
Let’s follow the steps in this section to display a list of commands.
We can now see the list of MongoDB commands.
A few scenarios where we can apply MongoDB are e-commerce product catalogs, blogs, and content management.
In the next chapter, we discuss how to perform CRUD operations using MongoDB query language.
Collections.
MongoDB CRUD operations.
Bulk write operations.
MongoDB import and export.
Embedded documents.
Working with arrays.
Array of embedded documents.
Projection.
Dealing with null and missing values.
Working with limit() and skip().
Working with Node.js and MongoDB.
This command creates a collection named person if it is not present. If it is present, it simply inserts a document into the person collection.
In this recipe, we are going to discuss how to create a collection.
You want to create a collection named person using the db.createCollection() command.
Let’s follow the steps in this section to create a collection named person.
We can now see the collection named person.
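The recipe can be sketched as follows in the mongo shell; show collections confirms the collection was created.

```javascript
> db.createCollection("person")
{ "ok" : 1 }
> show collections
person
```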
In this recipe, we are going to discuss what a capped collection is and how to create one. Capped collections are fixed-size collections for which we can specify the maximum size of the collection in bytes and the maximum number of documents allowed. Capped collections work like a circular buffer: Once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.
You can’t shard capped collections.
You can’t use the aggregation pipeline operator $out to write results to a capped collection.
You can’t delete documents from a capped collection.
Creating an index on a capped collection allows you to perform efficient update operations.
You want to create a capped collection named student using the db.createCollection() method.
Let’s follow the steps in this section to create a capped collection named student.
Here, size denotes the maximum size of the collection in bytes and max denotes the maximum number of documents allowed.
We can see the results are now in the reverse order.
Notice that the first document is overwritten by the third document, as the maximum number of documents allowed in the student capped collection is two.
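A sketch of the recipe in the mongo shell; the size value and the studentId field are illustrative. With max: 2, the third insert overwrites the oldest document.

```javascript
> db.createCollection("student", { capped: true, size: 4096, max: 2 })
{ "ok" : 1 }
> db.student.insert({ studentId: 1 })
> db.student.insert({ studentId: 2 })
> db.student.insert({ studentId: 3 })   // overwrites the oldest document
> db.student.find()                     // studentId 1 is gone; 2 and 3 remain
```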
Create operations allow us to insert documents into a collection. These insert operations target a single collection. All write operations are atomic at the level of a single document.
Method | Description |
|---|---|
db.collection.insertOne() | Inserts a single document |
db.collection.insertMany() | Inserts multiple documents |
db.collection.insert() | Inserts a single document or multiple documents |
In this recipe, we are going to discuss various methods to insert documents into a collection.
You want to insert documents into the collection person.
Let’s follow the steps in this section to insert documents into the collection person.
insertOne() returns a document that contains the newly inserted document’s _id.
If you don’t specify an _id field, MongoDB generates an _id field with an ObjectId value. The _id field acts as the primary key.
Pass an array of documents to the insertMany() method to insert multiple documents.
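The insert methods can be sketched as follows; the field names are illustrative sample data.

```javascript
> db.person.insertOne({ name: "John", age: 25 })
> db.person.insertMany([
    { name: "Smith", age: 30 },
    { name: "James", age: 35 }
  ])
```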
Read operations allow us to retrieve documents from a collection. MongoDB provides the find() method to query documents.
You can specify query filters or criteria inside the find() method to query documents based on conditions.
In this recipe, we are going to discuss how to retrieve documents from a collection.
You want to retrieve documents from a collection.
Let’s follow the steps in this section to query documents in a collection.
You need to pass an empty document as a query parameter to the find() method.
To specify an equality condition, you need to use <field>:<value> expressions in the query to filter documents.
You can use $or in a compound query to join each clause with a logical or conjunction.
The operator $or selects the documents from the collection that match at least one of the selected conditions.
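The query forms described above can be sketched in the mongo shell, assuming the sample person documents from the insert recipe.

```javascript
> db.person.find({})                                           // all documents
> db.person.find({ age: 25 })                                  // equality condition
> db.person.find({ $or: [ { age: 25 }, { name: "Smith" } ] })  // matches either clause
```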
Method | Description |
|---|---|
db.collection.updateOne() | Modifies a single document |
db.collection.updateMany() | Modifies multiple documents |
db.collection.replaceOne() | Replaces the first document in the collection that matches the filter |
In MongoDB, update operations target a single collection.
In this recipe, we are going to discuss how to update documents.
You want to update documents in a collection.
Let’s follow the steps in this section to update documents in a collection.
MongoDB provides update operators, such as $set, to modify field values.
You cannot update the _id field.
Here, the modifiedCount is 2, which indicates that the preceding command modified two documents in the student collection.
You can replace the entire content of a document except the _id field by passing an entirely new document as the second argument to db.collection.replaceOne() as shown here.
Do not include update operators in the replacement document. You can omit the _id field in the replacement document because the _id field is immutable. However, if you want to include the _id field, use the same value as the current value.
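A sketch of the three update methods; the filters and fields are illustrative sample data. Note that replaceOne() takes a plain document, not an update operator.

```javascript
> db.person.updateOne({ name: "John" }, { $set: { age: 26 } })
> db.person.updateMany({ age: { $lt: 30 } }, { $set: { status: "A" } })
> db.person.replaceOne({ name: "Smith" }, { name: "Smith", age: 31, city: "Chennai" })
```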
Method | Description |
|---|---|
db.collection.deleteOne() | Deletes a single document |
db.collection.deleteMany() | Deletes multiple documents |
In MongoDB, delete operations target a single collection.
In this recipe, we discuss how to delete documents.
You want to delete documents from a collection.
Let’s follow the steps in this section to delete documents in a collection.
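The delete methods can be sketched as follows; the filters are illustrative.

```javascript
> db.person.deleteOne({ name: "John" })         // deletes the first matching document
> db.person.deleteMany({ age: { $gte: 30 } })   // deletes all matching documents
> db.person.deleteMany({})                      // deletes all documents, keeps the collection
```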
The mongoimport tool allows us to import content from JSON, comma-separated values (CSV), and tab-separated values (TSV) files. mongoimport supports only files that are UTF-8 encoded.
The mongoexport tool allows us to export data stored in a MongoDB instance as JSON or CSV files.
In this recipe, we are going to discuss how to import data from a CSV file.
You want to import data from the student.csv file to the students collection.
To work with the mongoimport command, start the mongod process and open another command prompt to issue the mongoimport command.
Let’s follow the steps in this section to work with mongoimport.
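A sketch of the import command; the database name is an assumption, and --headerline assumes the first row of student.csv contains the field names.

```shell
mongoimport --db studentdb --collection students \
  --type csv --headerline --file student.csv
```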
In this recipe, we are going to discuss how to export data from the students collection to a student.json file.
You want to export data from the students collection to a student.json file.
To work with the mongoexport command, start the mongod process and open another command prompt to issue the mongoexport command.
Let’s follow the steps in this section to work with mongoexport.
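A sketch of the export command; the database name is an assumption. By default mongoexport writes JSON.

```shell
mongoexport --db studentdb --collection students --out student.json
```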
Here, the marks field contains an embedded document.
In this recipe, we are going to discuss how to query embedded documents.
You want to query embedded documents.
You need to pass the filter document {<field>:<value>} to the find() method where <value> is the document to match.
Let’s follow the steps in this section to query embedded documents.
To perform an equality match on the embedded document, you need to specify the exact match document in the <value> document, including the field order.
We can use dot notation to query a field of the embedded document.
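Both query forms can be sketched as follows, assuming a students collection whose marks field is an embedded document with illustrative subfields.

```javascript
// Exact match: the field order inside the embedded document matters
> db.students.find({ marks: { math: 90, english: 85 } })
// Dot notation: match a single field of the embedded document
> db.students.find({ "marks.math": 90 })
```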
In MongoDB, we can specify field values as an array.
In this recipe, we are going to discuss how to query an array.
You want to query an array.
Let’s follow the steps in this section to query embedded documents.
To specify an equality condition on an array, you need to specify the exact array to match in the <value> of the query document {<field>:<value>}.
Here, one element of the scores array can satisfy the greater than 20 condition and another element can satisfy the less than 24 condition, or a single element can satisfy both the conditions.
Use dot notation to query an array element by its index position.
You can use the $size operator to query an array by number of elements.
You can use the $push operator to add a value to an array.
You can use the $addToSet operator to add a value to an array. $addToSet adds the value only if it is not already present; if the value is present, it does nothing.
You can use the $pop operator to remove the first or last element of an array.
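The array operators just described can be sketched as follows, assuming a students collection with an illustrative scores array.

```javascript
> db.students.find({ scores: [22, 25] })          // exact array match
> db.students.find({ scores: { $gt: 20, $lt: 24 } })  // conditions may be met by different elements
> db.students.find({ "scores.0": 22 })            // query by index position
> db.students.find({ scores: { $size: 2 } })      // query by number of elements
> db.students.updateOne({ _id: 1 }, { $push: { scores: 28 } })      // append a value
> db.students.updateOne({ _id: 1 }, { $addToSet: { scores: 28 } })  // no-op if already present
> db.students.updateOne({ _id: 1 }, { $pop: { scores: -1 } })       // -1 removes first, 1 removes last
```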
In this recipe, we are going to discuss how to query an array of embedded documents.
You want to query an array of embedded documents.
Let’s follow the steps in this section to query an array of embedded documents.
Equality matching on the array of an embedded document requires an exact match of the specified document, including field order.
By default, queries in MongoDB return all fields in matching documents. You can restrict the fields to be returned using a projection document.
In this recipe, we are going to discuss how to restrict the fields to return from a query.
You want to restrict the fields to return from a query.
Let’s follow the steps in this section to restrict the fields to return from a query.
You can project the required fields by setting the <field> value to 1 in the projection document.
You can suppress the _id field by setting its exclusion <field> to 0 in the projection document.
You can exclude fields by setting <field> to 0 in the projection document.
You can use dot notation to refer to the embedded field and set the <field> to 1 in the projection document.
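The projection rules above can be sketched as follows; the field names are illustrative.

```javascript
> db.students.find({}, { name: 1, "marks.math": 1 })  // include fields (_id returned by default)
> db.students.find({}, { name: 1, _id: 0 })           // suppress _id
> db.students.find({}, { marks: 0 })                  // exclude a field
```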
In this recipe, we are going to discuss how to query null or missing values in MongoDB.
You want to query null or missing values.
Let’s follow the steps in this section to query null or missing values.
Here, the query returns both documents.
You can use the $exists operator to check for the existence of a field.
The db.collection.find() method of MongoDB returns a cursor. You need to iterate a cursor to access the documents. You can manually iterate a cursor in the mongo shell by assigning a cursor returned from the find() method to a variable using the var keyword.
If you do not assign the cursor to a variable using the var keyword, the cursor is automatically iterated to print up to the first 20 documents.
In this recipe, we are going to discuss how to iterate a cursor in the mongo shell.
You want to iterate a cursor in the mongo shell.
The limit() method is used to limit the number of documents in the query results and the skip() method skips the given number of documents in the query result.
In this recipe, we are going to discuss how to work with the limit() and skip() methods.
You want to limit and skip the documents in a collection.
Let’s follow the steps in this section to work with the limit() and skip() methods. Consider the numbers collection created in Recipe 2-14.
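A sketch of the two methods against the numbers collection; combining skip() and limit() gives simple pagination.

```javascript
> db.numbers.find().limit(5)            // the first 5 documents
> db.numbers.find().skip(5).limit(5)    // the next 5 documents
```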
MongoDB is one of the most popular databases used with Node.js .
In this recipe, we are going to discuss how to work with Node.js and MongoDB.
You want to perform CRUD operations using Node.js.
Download the Node.js installer from https://nodejs.org/en/download/ and install it.
The link might change in the future.
Let’s follow the steps in this section to work with Node.js and MongoDB.

Data models.
Data model relationship between documents.
Modeling tree structures.
Aggregation operations.
SQL aggregation terms and corresponding MongoDB aggregation operations.
Embedded data models.
Normalized data models.

A denormalized model
This embedded document model allows applications to store related pieces of information in the same record. As a result, the application requires only a few queries and updates to complete common operations.
We can use embedded documents to represent both one-to-one relationships (a “contains” relationship between two entities) and one-to-many relationships (when many documents are viewed in the context of one parent).
For read operations.
When we need to retrieve related data in a single database operation.
Embedded data models update related data in a single atomic operation. Embedded document data can be accessed using dot notation.

Normalized data model
When an embedded data model would result in duplication of data.
To represent complex many-to-many relationships.
To model large hierarchical data sets.
Normalized data models generally do not provide read performance as good as that of embedded data models, because retrieving related data requires additional queries.
Let’s explore a data model that uses an embedded document and references.
In this recipe, we are going to discuss a data model using an embedded document.
You want to create a data model for a one-to-one relationship.
Use an embedded document.
Let’s follow the steps in this section to design a data model for a one-to-one relationship.
With this data model, we can retrieve complete student information with one query.
In this recipe, we are going to discuss a data model using document references.
You want to create a data model for a one-to-many relationship.
Use a document reference.
Let’s follow the steps in this section to design a data model for a one-to-many relationship.
Here, the publisher document is embedded inside each book document, which leads to repetition of the publisher data.
Let’s look at a data model that describes a tree-like structure.
In this recipe, we are going to discuss a tree structure data model using parent references.
You want to create a data model for a tree structure with parent references.
Use the parent references pattern.
Let’s follow the steps in this section to design a data model for a tree structure with parent references.
The parent references pattern stores each tree node in a document; in addition to the tree node, the document stores the _id of the node’s parent.

Tree structure for the author collection
The child references pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the _id value(s) of the node’s children.
Child references are a good choice to work with tree storage when there are no subtree operations.
The array of ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the _id value(s) of the node’s ancestors or path.
In this pattern, each document stores the node’s ancestors in an array field and also a reference to the immediate parent.
This pattern provides an efficient solution to find all descendants and the ancestors of a node. The array of ancestors pattern is a good choice for working with subtrees.
Aggregation pipeline.
Map-reduce function.
Single-purpose aggregation methods.
The aggregation pipeline is a framework for data aggregation, modeled on the concept of data processing pipelines. Each stage performs an operation on its input documents and passes its output to the next stage. Documents enter a multistage pipeline that transforms them into an aggregated result.
In this recipe, we are going to discuss how the aggregation pipeline works.
You want to work with aggregation functions.
Let’s follow the steps in this section to work with the aggregation pipeline.
In the preceding example, field values are referenced by prefixing the field name with the $ sign.
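A minimal pipeline sketch, assuming an orders collection with status, custID, and amount fields: $match filters the documents, and $group sums the amounts per customer.

```javascript
> db.orders.aggregate([
    { $match: { status: "A" } },                                  // stage 1: filter
    { $group: { _id: "$custID", total: { $sum: "$amount" } } }    // stage 2: group and sum
  ])
```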
MongoDB also provides map-reduce to perform aggregation operations. There are two phases in map-reduce: a map stage that processes each document and outputs one or more objects and a reduce stage that combines the output of the map operation.
A custom JavaScript function is used to perform map and reduce operations. Map-reduce is less efficient and more complex compared to the aggregation pipeline.
In this recipe, we are going to discuss how to perform aggregation operations using map-reduce.
You want to work with aggregation operations using map-reduce.
Use a customized JavaScript function.
Let’s follow the steps in this section to work with map-reduce.
To filter on status:"A" and then group it on custID and compute the sum of amount, use the following map-reduce function.
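The filter-then-group operation can be sketched with mapReduce as follows; the output collection name is an assumption. The map function emits one key/value pair per document, and the reduce function combines the values for each key.

```javascript
> db.orders.mapReduce(
    function() { emit(this.custID, this.amount); },       // map: emit custID/amount pairs
    function(key, values) { return Array.sum(values); },  // reduce: sum amounts per custID
    { query: { status: "A" }, out: "order_totals" }       // filter input, name the output
  )
```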
MongoDB also provides single-purpose aggregation operations such as db.collection.count() and db.collection.distinct(). These aggregate operations aggregate documents from a single collection. This functionality provides simple access to common aggregation processes.
In this recipe, we are going to discuss how to use single-purpose aggregation operations.
You want to work with single-purpose aggregation operations.
Let’s follow the steps in this section to work with single-purpose aggregation operations.
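The single-purpose operations can be sketched as follows, assuming the same orders collection.

```javascript
> db.orders.count({ status: "A" })   // number of matching documents
> db.orders.distinct("custID")       // unique values of a field
```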
SQL Term | MongoDB Operator |
|---|---|
WHERE | $match |
GROUP BY | $group |
HAVING | $match |
SELECT | $project |
ORDER BY | $sort |
LIMIT | $limit |
SUM | $sum |
COUNT | $sum |
JOIN | $lookup |
In this recipe, we are going to discuss examples of matching MongoDB operations to equivalent SQL aggregation terms.
You want to understand the equivalent of MongoDB queries for any SQL queries.
Refer to Table 3-1 and use the equivalent MongoDB operator for a respective SQL clause.
Let’s follow the steps in this section to understand the MongoDB queries for certain SQL operations.
Now, let’s find the count of records from the orders collection.
Indexes.
Types of indexes.
Index properties.
The various indexing strategies to be considered.
Indexes are used to improve the performance of queries. Without indexes, MongoDB must scan the entire collection to select the documents that match the query statement. MongoDB therefore uses indexes to limit the number of documents that it must scan.
Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. The index stores a set of fields ordered by the value of the field. This ordering helps to improve the performance of equality matches and range-based query operations.
MongoDB defines indexes at the collection level and indexes can be created on any field of the document. MongoDB creates an index for the _id field by default.
MongoDB creates a default unique index on the _id field, which prevents the insertion of two documents with the same _id value.
In this recipe, we are going to discuss how to work with indexes in MongoDB.
You want to create an index.
Let’s follow the steps in this section to create an index.
Here, the parameter value “1” indicates that empId field values will be stored in ascending order.
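A sketch of the recipe, assuming an employee collection holding the empId field.

```javascript
> db.employee.createIndex({ empId: 1 })   // 1 = ascending, -1 = descending
> db.employee.getIndexes()                // lists the default _id index plus the new index
> db.employee.dropIndex({ empId: 1 })     // drops the index (the _id index cannot be dropped)
```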
We can’t drop _id indexes. MongoDB creates an index for the _id field by default.
In this recipe, we are going to discuss various types of indexes.
You want to create different types of indexes.

Types of indexes
Multikey indexes are useful to create an index for a field that holds an array value. MongoDB creates an index key for each element in the array.
You can create a compound multikey index, but at most one of the indexed fields per document may hold an array; you cannot index two array-valued fields in the same compound index.
MongoDB provides text indexes to support text search queries on string content. You can create a text index on a field that takes as its value a string or an array of string elements.
This command creates a text index for the field post_text.
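A sketch of such a command, assuming a posts collection:

```javascript
// Create a text index on the post_text field:
db.posts.createIndex({ post_text: "text" })

// Text indexes support $text search queries on the indexed string content:
db.posts.find({ $text: { $search: "mongodb" } })
```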
The size of the indexes can be reduced with the help of hashed indexes. Hashed indexes store the hashes of the values of the indexed field. Hashed indexes support sharding using hashed shard keys. In hashed-based sharding, a hashed index of a field is used as the shard key to partition data across the sharded cluster. We discuss sharding in Chapter 5.
Hashed indexes do not support multikey indexes.
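A hashed index can be sketched as follows (the users collection and userId field are assumptions):

```javascript
// Store hashes of userId values instead of the values themselves;
// this index can also serve as a hashed shard key:
db.users.createIndex({ userId: "hashed" })
```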
The 2dsphere index is useful to return queries on geospatial data.
Indexes can also have properties. The index properties define certain characteristics and behaviors of an indexed field at runtime. For example, a unique index ensures the indexed fields do not support duplicates. In this recipe, we are going to discuss various index properties.
You want to work with index properties.
Let’s follow the steps in this section to work with index properties.
Time to Live (TTL) indexes are single-field indexes that are used to remove documents from a collection after a certain amount of time. Data expiration is useful for certain types of information such as logs, machine-generated data, and so on.
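A TTL index sketch, assuming a logs collection with a createdAt date field:

```javascript
// Remove each document 3600 seconds after its createdAt timestamp:
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
```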
A unique index ensures that the indexed fields do not contain any duplicate values. By default, MongoDB creates a unique index on the _id field.
Partial indexes are useful when you want to index the documents in a collection that meet a specific filter condition. The filter condition can use only a limited set of expressions. For example, db.person.find( { age: { $gt: 15 } } ) can be used to find the documents that have an age greater than 15 in the person collection. Partial indexes reduce storage requirements and performance costs because they store only a subset of the documents.
Use the db.collection.createIndex() method with the partialFilterExpression option to create a partial index.
equality expressions (i.e., field: value or using the $eq operator).
$exists: true expression.
$gt, $gte, $lt, and $lte expressions.
$type expressions.
$and operator at the top level only.
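Following that guidance, a partial index on the person collection from the example above might be sketched as:

```javascript
// Index only the documents whose age is greater than 15;
// queries such as db.person.find({ age: { $gt: 15 } }) can use this index:
db.person.createIndex(
  { age: 1 },
  { partialFilterExpression: { age: { $gt: 15 } } }
)
```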
Sparse indexes store entries only for the documents that have the indexed field, even if it contains null values. A sparse index skips any documents that do not have the indexed field. The index is considered sparse because it does not include all the documents of a collection.
Partial indexes determine the index entries based on the filter condition, whereas sparse indexes select the documents based on the existence of the indexed field.
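A sparse index sketch (the nickname field is an assumption):

```javascript
// Documents without a nickname field are skipped by this index entirely:
db.person.createIndex({ nickname: 1 }, { sparse: true })
```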
We must follow different strategies to create the right index for our requirements. In this recipe, we are going to discuss various indexing strategies.
You want to learn about indexing strategies to ensure you are creating the right type of index for different purposes.
Type of executing query.
Number of read/write operations.
Available memory.

Indexing strategies
Let’s follow the steps in this section to work with different indexing strategies.
Creating the right index to support the queries increases query execution performance.
Sort operations use indexes for better performance. Indexes determine the sort order by fetching the documents based on the ordering in the index.
Sorting with a single-field index.
Sorting on multiple fields.
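These two cases can be sketched as follows (collection and field names are assumptions):

```javascript
// Single-field index: supports sorting on empId in either direction,
// because MongoDB can walk the index forward or backward:
db.employees.createIndex({ empId: 1 })
db.employees.find().sort({ empId: -1 })

// Compound index: supports sorts that follow the index key order:
db.employees.createIndex({ dept: 1, empId: -1 })
db.employees.find().sort({ dept: 1, empId: -1 })
```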
When using multiple collections, we must consider the size of the indexes on all collections and ensure that the indexes fit in memory, to avoid the system reading them from disk.
When an index fits entirely into RAM, the system can process it faster.
This query must scan all the documents to return the result of empId values greater than 1.
This query must scan only one document to return the result empId:4.
Replication.
Sharding.
Replication is the process of creating and managing a duplicate version of a database across servers to provide redundancy and increase availability of data.

A replica set
In this recipe, we are going to discuss how to set up a replica set (one primary and two secondaries) in Windows.
You want to create a replica set.
Use a group of mongod instances.
Let’s follow the steps in this section to set up a three-member replica set.

Starting mongod with replica set 1
hostname must be replaced with the machine's IP address, or with localhost if it is the same local machine.

mongod instance waiting for connection on port 20001

Starting mongod with replica set 2
Refer to Figure 5-5 for a mongod instance that is waiting for connection on port 20002.

mongod instance waiting for connection on port 20002

Starting mongod with replica set 3

mongod instance waiting for connection on port 20003

Connect to mongo instance on port 20001
Here, the mongod instance running on port 20001 becomes primary.
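The initiation step that leads to this election can be sketched as follows (the replica set name rs1 and the localhost hostnames are assumptions):

```javascript
// From the mongo shell connected to the instance on port 20001:
rs.initiate({
  _id: "rs1",
  members: [
    { _id: 0, host: "localhost:20001" },
    { _id: 1, host: "localhost:20002" },
    { _id: 2, host: "localhost:20003" }
  ]
})

// Confirm which member was elected primary:
rs.status()
```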
The output Error: error: { ........... } shows that we are getting an error message because we are trying to read data from the secondary node.
In the preceding output, WriteCommandError { ........... } shows that we are getting an error message because we can’t perform write operations in a secondary node. We can perform write operations only in the primary node.

Replication strategy

Primary failover

Primary failover and new primary election
Vertical scaling: We need to increase the capacity of a single server such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.
Horizontal scaling: We need to divide the data set and distribute the workload across the servers by adding additional servers to increase the capacity as required.
Shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
Config servers: Config servers store metadata and configuration settings for the cluster.
In this recipe, we are going to discuss how to create sharding to distribute data across servers.
You want to create sharding to distribute data across servers.
The solution is a group of mongod instances.
Let’s follow the steps in this section to set up sharding.
First, create data directories for three shards as shown here.
Next, start the shards as shown here.

Starting the shard1 server

Starting the shard2 server

Starting the shard3 server

Connect the shard1 server

Initiating the replica on the shard1 server

Starting the config servers
Now it is time to perform sharding. Here, we are going to use one query router, one config server, and three shards.
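The commands issued from a mongo shell connected to the mongos query router might be sketched like this (the shard replica set names, ports, and the demo namespace are assumptions):

```javascript
// Register the three shards with the cluster:
sh.addShard("shard1/localhost:27018")
sh.addShard("shard2/localhost:27019")
sh.addShard("shard3/localhost:27020")

// Enable sharding for a database, then shard a collection on a hashed key:
sh.enableSharding("mydb")
sh.shardCollection("mydb.users", { userId: "hashed" })

// Inspect how data is distributed across the shards:
sh.status()
```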
This shows that user data is distributed to shard1 and shard2.
All these demos are executed in a Windows environment.
In Chapter 5, we discussed replica sets and sharding in MongoDB. In this chapter, we are going to discuss multidocument transactions in MongoDB, a new feature introduced in MongoDB 4.0.
In MongoDB, a write operation on a single document is atomic, even if the write operation modifies multiple embedded documents within a single document. When a single write operation (e.g., db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not.
Starting with version 4.0, MongoDB provides multidocument transactions for replica sets. Multidocument transactions help us to achieve all-or-nothing execution to maintain data integrity.
When a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. The data changes are not visible outside the transaction until the transaction is committed.
When a transaction aborts, all data changes made in the transaction are discarded without ever becoming visible.
Multidocument transactions are available only for a replica set. If we try to use multidocument transactions on a nonreplica set, we would get the error “Transaction numbers are only allowed on a replica set member or mongos,” as shown in Figure 6-1.

Error on usage of multidocument transaction on a nonreplica set
You can specify CRUD operations only on existing collections. The collections can be in different databases.
You cannot perform read/write operations on config, admin, and local databases.
You cannot write to system.* collections.
You cannot create or drop indexes inside a transaction.
You cannot perform non-CRUD operations inside a transaction.
You cannot return an operation’s query plan (i.e., explain).
In MongoDB, transactions are associated with sessions. MongoDB’s sessions provide a framework that supports consistency and writes that can be retried. MongoDB’s sessions are available only for replica sets and sharded clusters. A session is required to start the transaction. You cannot run a transaction outside a session, and a session can run only one transaction at a time. A session is essentially a context.
session.startTransaction(): To start a new transaction in the current session.
session.commitTransaction(): To save changes made by the operations in the transaction.
session.abortTransaction(): To abort the transaction without saving it.
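Putting the three methods together, a minimal transaction sketch (the bank database, accounts collection, and amounts are assumptions for illustration):

```javascript
var session = db.getMongo().startSession();
var accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  // Both updates succeed or fail as a unit:
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
  session.commitTransaction();  // changes become visible outside the transaction
} catch (e) {
  session.abortTransaction();   // discard all changes made in the transaction
  throw e;
} finally {
  session.endSession();
}
```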
In this recipe, we are going to discuss how to work with multidocument transactions.
You want to work with multidocument transactions.
Use session.startTransaction(), session.commitTransaction(), and session.abortTransaction().
Let’s follow the steps in this section to work with multidocument transactions.
We can see the modifications from inside the transaction.
Because the transactions are not committed, we cannot see the modifications outside the transaction.
In this recipe, we are going to discuss how to perform an isolation test between two concurrent transactions.
You want to perform an isolation test between two concurrent transactions.
Use session.startTransaction(), session.commitTransaction() and session.abortTransaction().
Let’s follow the steps in this section to perform an isolation test between two concurrent transactions.
Here, the transactions are isolated, and each transaction shows the modification that it has made itself.
In this recipe, we are going to discuss write conflicts with transactions.
You want to see the error message for write conflicts that occur when two transactions try to modify the same document.
Use session.startTransaction(), session.commitTransaction() and session.abortTransaction().
Let’s follow the steps in this section to manage write conflicts between two concurrent transactions.
Here, MongoDB detects a write conflict immediately, even though the transactions are not yet committed.
In this recipe, we are going to discuss how to discard data changes with abortTransaction.
You want to discard data changes with abortTransaction.
Use session.startTransaction(), session.commitTransaction(), and session.abortTransaction().
Let’s follow the steps in this section to discard data changes using the abortTransaction method.
Here, the data changes are discarded.
Multidocument transactions are available only for deployments that use the WiredTiger storage engine.
MongoDB monitoring tools.
Backup and restore with MongoDB.
MongoDB provides various tools to monitor the database. These MongoDB monitoring tools help us to understand what is happening with a database at various levels.
The first basic tool is the log file.
In this recipe, we are going to discuss MongoDB log files.
You want to see the MongoDB log file and set the log level.
Use the db.setLogLevel function to set the log level.
Let’s follow the steps in this section to see the log file in mongo shell.
We cannot filter the log messages in the console.
MongoDB has several logging levels, which can be set using the db.setLogLevel function.
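For example, a sketch that tunes the query component only:

```javascript
// Raise verbosity to level 2 for the query component only:
db.setLogLevel(2, "query")

// Later, set it back to -1 so it inherits the default level again:
db.setLogLevel(-1, "query")
```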
MongoDB echoes back the previous configuration before the new setting was applied. At the top, you can see the default level, which indicates that any components that are not set will log at this level. Here, it will override only the query component. The -1 verbosity indicates that it inherits the default level from its parent.
Here, the query level is set to 2.
This query will not return any results because there is no collection named demos in the test database.
You can see some additional information about the query.
Now check the end of the log file: We cannot see the extra logging information for this find query.
Use db.setLogLevel(0) to turn off the logging level.
If we do not specify a component (such as query or write), the log level applies to all components.
Log level 5 is very detailed; levels 1 to 3 are more useful and are recommended unless you need all of the detail that level 5 provides.
Also, setting an appropriate log level helps us improve query performance: we can examine the query plans recorded in the log file and identify slow operations by checking the time taken to execute each query.
Refer to Recipe 7-10 to learn more about MongoDB query plans.
It is mandatory to analyze the performance of the database when we develop new applications with MongoDB. Performance degradation could happen due to hardware availability issues, number of open database connections, and database access strategies.
Performance issues may indicate that the database is operating at full capacity or handling an abnormal traffic load, such as an increase in the number of connections.
The database profiler can help us to understand the various operations performed on the database that cause performance degradation.
The database profiler is used to collect information about the commands that are executed against a running mongod instance. The database profiler writes all the collected data to the system.profile capped collection of the database on which profiling is enabled.
MongoDB also has a Performance Advisor tool that automatically monitors slow queries and suggests new indexes to improve query performance. A query is considered to be slow if it takes more than 100 milliseconds to execute.
The Performance Advisor tool is available in the MongoDB Atlas cluster. MongoDB Atlas is the global cloud database service for modern applications.
| Level | Description |
|---|---|
| 0 | The profiler is off and does not collect any data. This is the default profiler level. |
| 1 | The profiler collects data for operations that take longer than the value of the slowms parameter. |
| 2 | The profiler collects data for all operations. |
In this recipe, we are going to discuss how to enable the database profiler and how to set the profiling level.
You want to enable database profiling with a profile level.
Use the db.setProfilingLevel() helper in the mongo shell.
Let’s follow the steps in this section to enable the database profiler.
was is the key/value pair indicating the previous level of profiling.
Here, the slow operation threshold is 30 milliseconds.
That code sets the profiler to sample 53% of all slow operations.
The default threshold value for the slow operation is 100 milliseconds.
The was field indicates the current profiling level.
This command sets the profiling level to 1, the slow operation threshold to 20 milliseconds, and it profiles only 50% of the slow operations.
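That command might be sketched as:

```javascript
// Level 1: profile only slow operations; threshold 20 ms; sample 50% of them:
db.setProfilingLevel(1, { slowms: 20, sampleRate: 0.5 })

// Verify the current profiler settings:
db.getProfilingStatus()
```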
In this recipe, we are going to discuss how to view database profiler information.
You want to view database profiler information.
Use db.system.profile.find() in the mongo shell.
Let’s follow the steps in this section to view database profiler information.
The database profiler logs information in the system.profile collection. You need to query the system.profile collection to view the profiling information.
op: Component type (i.e., query or command).
ts: Timestamp.
ns: Collection details where the query is executed.
These options can be used to filter or sort as in the queries that follow.
ts specifies the timestamp and the value -1 specifies to sort in descending order. We can specify 1 for ascending or -1 for descending order.
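Typical queries along these lines (the namespace in the second filter is an assumption):

```javascript
// Ten most recent profiled operations, newest first:
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

// Only query operations against a particular collection:
db.system.profile.find({ op: "query", ns: "test.employees" })
```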
mongostat is a monitoring utility that provides information about a mongod instance when you are working with a single mongod instance. If you are working with a sharded cluster, it shows information about a mongos instance.
In this recipe, we are going to discuss how to use mongostat to monitor the MongoDB server.
You want to see the details of a MongoDB server.
Use the mongostat tool, which is available in the installation directory. Usually the installation directory is C:\Program Files\MongoDB\Server\4.0\bin or the custom path specified during your installation process.
To avoid navigating to the directory each time you use the mongostat tool, add the directory path to your PATH environment variable.
Let’s follow the steps in this section to view server information using mongostat.
Issue the mongostat command by specifying the hostname and port of MongoDB server.
27017 is the default port. We can simply use mongostat without any options if we want to connect and get statistics for localhost on the default port 27017. To connect to a different or remote instance, specify the --host option as shown.
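For example (the host and port are placeholders):

```shell
# Report statistics from a remote instance every 5 seconds:
mongostat --host 192.168.1.10 --port 27017 5
```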
insert/query/update/delete: These columns show the number of insert, query, update, and delete operations per second.
getmore: This column shows the number of times the getmore operation is executed in one second.
command: This column shows the number of commands executed on the server in one second.
flushes: This column shows the number of times data was flushed to disk in one second.
mapped: This column shows the amount of memory used by the mongo process against a database. It is the same as the size of the database.
vsize (virtual size): This column represents virtual memory allocated to the entire mongod process.
res (resident memory): This column represents the physical memory used by MongoDB.
faults: This column shows the number of Linux page faults per second.
qr|qw: This column shows queued-up reads and writes that are waiting for the chance to be executed.
ar|aw: This column shows the number of active clients performing read (ar) and write (aw) operations.
netIn and netOut: These columns show the network traffic in and out of the MongoDB server within a given time frame.
conn: This column shows the number of open connections.
time: This column shows the time frame in which operations are performed.
mongotop provides a method to track the amount of time spent by the mongod process during reads and writes. mongotop provides statistics on a per-collection level.
In this recipe, we are going to discuss how to use mongotop to track the amount of time spent by the mongod process during reads and writes.
You want to see the amount of time spent by the mongod process during reads and writes.
Issue the mongotop command in the installation path of MongoDB.
Let’s follow the steps in this section to view the amount of time spent during reads and writes.
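A sketch of the command (run from the MongoDB bin directory, or with it on your PATH):

```shell
# Report per-collection read/write time every 5 seconds:
mongotop 5
```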
db.stats() provides the statistics of a single database.
In this recipe, we are going to discuss how to get the statistics for a single database.
You want to see the disk and memory usage estimates for a database.
Use the db.stats command to display the statistics for a database.
Let’s follow the steps in this section to view the statistics for the database.
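For example:

```javascript
// Database statistics; the optional scale argument reports sizes in KB:
db.stats(1024)
```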
db.serverStatus() returns a document that provides statistics for the state of a database process.
In this recipe, we are going to discuss the db.serverStatus() command.
You want to see the memory status of a database.
Use the db.serverStatus() command.
Let’s follow the steps in this section to view the statistics for the memory status of a database.
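For example:

```javascript
// Full server status document:
db.serverStatus()

// Or inspect just the memory section:
db.serverStatus().mem
```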
MongoDB provides mongodump and mongorestore utilities to work with BSON data dumps. These utilities are useful for creating backups for small deployments. To create resilient and nondisruptive backups for large deployments, we can use file system or block-level disk snapshot methods.
We should use a system-level tool to create a copy of the device file system that holds the MongoDB files. The file system snapshot or disk-level snapshot backup methods require additional system configuration outside of MongoDB, so we do not cover them in depth in this book.
When deploying MongoDB in production, we should have a backup and failover strategy for capturing and restoring backups in the case of data loss events.
In this recipe, we are going to discuss how to back up data using mongodump.
You want to back up data using mongodump.
Use the mongodump command.
Let’s follow the steps in this section to back up data.
The mongodump utility makes a backup by connecting to a running mongod or mongos instance.
When you run the mongodump command without specifying any arguments, it connects to the MongoDB instance running on localhost port 27017 and writes the backup to a directory named dump in the current directory.
Ensure mongod is running.
Open a command prompt with administrator privileges, navigate to the mongodb installation folder, and type mongodump as shown here.
You can see the dump folder in the current directory as shown in Figure 7-1.

MongoDB installation folder
We can create scripts to create the backup directory for the current date and time and export the database to the same directory.
For Windows, create a bat (batch script) file with the commands shown in Figure 7-2.

Creating a batch script file in Windows

Creating a shell script file in Unix/Linux
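A Unix/Linux shell-script sketch along those lines (the backup path is a placeholder):

```shell
#!/bin/sh
# Create a timestamped directory and dump all databases into it:
BACKUP_DIR=/backups/mongo-$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
mongodump --out "$BACKUP_DIR"
```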
Schedule the script file to execute at a certain interval. Use task scheduler or any other scheduler for Windows, and use the CRON schedule for Unix/Linux.
In this recipe, we are going to discuss how to restore data using mongorestore.
You want to restore data using mongorestore.
Use the mongorestore command.
Let’s follow the steps in this section to restore data.
Specify the username and password using the --username and --password options only if the remote mongod instance requires authentication.
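For example (the host, credentials, and dump path are placeholders):

```shell
# Restore a dump directory to a remote mongod that requires authentication:
mongorestore --host 192.168.1.10 --port 27017 \
  --username admin --password secret \
  --authenticationDatabase admin dump/
```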
In this recipe, we are going to discuss query plans to understand the MongoDB query execution details with query planner and query optimizer.
You want to understand the query execution plan.
Use the cursor.explain() command.
Let’s follow the steps in this section to understand the query plan.
Observe in this output that there is no parsedQuery in the query planner, and the winning plan shows a complete collection scan.
The explain() method accepts queryPlanner, executionStats, or allPlansExecution as the operating mode. The default mode is queryPlanner if we don’t specify otherwise.
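For example (the employees collection and empId field are assumptions):

```javascript
// Planner output only (the default mode):
db.employees.find({ empId: { $gt: 1 } }).explain("queryPlanner")

// Planner output plus runtime statistics such as nReturned
// and executionTimeMillis:
db.employees.find({ empId: { $gt: 1 } }).explain("executionStats")
```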
"executionSuccess" : true: This shows whether the query is successfully executed or not.
"nReturned" : 5: This is the number of rows returned.
"executionTimeMillis" : 0: This is the execution time in milliseconds.
If the executionTimeMillis value is high, the query can be considered slow, and we should rewrite it to optimize execution time by adding the required filters so that only the required data is retrieved. We could also consider creating an index on the queried keys for better performance.
In Chapter 7, we discussed monitoring and backup methods in MongoDB. In this chapter, we are going to discuss the security features of MongoDB.
Authentication is the process of verifying the identity of the user, whereas authorization or access control determines the verified user’s access to resources and operations.
SCRAM: This is the default authentication mechanism for MongoDB. It verifies the user against their username, password, and authentication database.
x.509 certificates: This mechanism uses x.509 certificates for authentication. MongoDB supports a number of authentication mechanisms that clients can use to verify their identity. These mechanisms allow MongoDB to integrate into an existing authentication system.
In addition to this, MongoDB Enterprise Edition also supports Lightweight Directory Access Protocol (LDAP) proxy authentication and Kerberos authentication.
Replica sets and sharded clusters require internal authentication between members when access control is enabled. Refer to Recipe 8-2 to understand the authentication process for replica sets.
MongoDB enforces authentication to identify users and their permissions to access resources only if access control is enabled. The upcoming recipes explain the procedure to create various roles and the mechanisms to enable access control.
When you install MongoDB, it has no users. First, therefore, we need to create the admin user, which can be used to create other users.

MongoDB built-in roles
First, let’s look at the commonly used built-in roles.
The read role gives the holder access to read all of the user-created collections.
The readWrite role gives access to read all of the user-created collections and also the ability to modify the nonsystem (i.e., user-created) collections.
The dbAdmin role helps us perform administrative tasks such as schema-related tasks, indexing, and gathering statistics. We cannot use this role to grant privileges to users or to manage roles.
We can perform any administrative task on a database using the dbOwner role. This role also includes the privileges of the readWrite, dbAdmin, and userAdmin roles.
We can create and modify roles and users on the current active database.
The clusterAdmin role provides cluster management access. This role also includes other role privileges, such as clusterManager, clusterMonitor, and hostManager roles.
This role helps us to manage and monitor user actions on the mongodb cluster.
In this recipe, we are going to discuss how to create a superuser and authenticate a user.
You want to create a superuser and authenticate a user.
Let’s follow the steps in this section to create a superuser.
Use MongoDB command-line authentication options (i.e., --username, --password, and --authenticationDatabase) when connecting to a mongod or mongos instance.
Connect to a mongod or mongos instance, then run the db.auth method against the authentication database.
You will be connected to the mongo shell. You can also use db.auth() to authenticate a user.
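The steps above can be sketched as follows (the password is a placeholder):

```javascript
// Create the first administrative user in the admin database:
var admin = db.getSiblingDB("admin");
admin.createUser({
  user: "superUserAdmin",
  pwd: "secret",
  roles: [{ role: "userAdminAnyDatabase", db: "admin" }]
});

// Authenticate from inside the shell against the admin database:
admin.auth("superUserAdmin", "secret");
```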
In this recipe, we are going to discuss how to authenticate a server in a replica set using a key file.
You want to authenticate a server in a replica set using a key file.
Use a replica set and a key file.
Let’s follow the steps in this section to authenticate a server in a replica set using a key file.
First, we will create a three-member replica set by following the procedure given here.
Figure 8-2 shows a mongod instance that is waiting for a connection on port 20001.

A mongod instance waiting for connection on port 20001
Figure 8-3 shows a mongod instance that is waiting for connection on port 20002.

A mongod instance waiting for connection on port 20002

A mongod instance waiting for connection on port 20003
We can use any method to generate the key file. To use the openssl method, you need to install openssl. Go to https://indy.fulgan.com/SSL/ to download and install the openssl package (note that this link might be changed in the future).

Extracted files of the openssl package
Use openssl.exe to generate the keys as mentioned earlier. Note that on Windows the generated key file’s permissions are not checked; only on Unix platforms must the key file have no group or other permissions.
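On Unix/Linux, generating and protecting the key file can be sketched as (the path is a placeholder):

```shell
# Generate a 756-byte random key, base64 encoded:
openssl rand -base64 756 > /data/keyfile

# The key file must not have group or other permissions:
chmod 400 /data/keyfile
```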
In this recipe, we are going to discuss how to modify access for the existing user.
Let us use the superUserAdmin user, which we created in Recipe 8-1.
We want to modify the access for superUserAdmin (created earlier).
Check the granted access (i.e., assigned roles and privileges) of the user, then revoke the permissions that are not needed or grant the new roles and privileges.
Let’s follow the steps in this section to modify the access of superUserAdmin.
This output clearly shows the user superUserAdmin has "role" : "userAdminAnyDatabase".
userAdminAnyDatabase is a built-in role that provides the ability to perform administration operations as userAdmin on all databases except local and config.
Repeat the preceding step to identify the roles and to verify the newly added role to the user superUserAdmin.
As you can observe, "role" : "read" is added to the "db" : "local" for the superUserAdmin user.
You can see that "role" : "userAdminAnyDatabase" has been removed from "db" : "admin" for the superUserAdmin user.
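The inspect, grant, and revoke steps can be sketched as:

```javascript
var admin = db.getSiblingDB("admin");

// Show the roles currently assigned to the user:
admin.getUser("superUserAdmin");

// Grant read access on the local database:
admin.grantRolesToUser("superUserAdmin", [{ role: "read", db: "local" }]);

// Revoke the userAdminAnyDatabase role on the admin database:
admin.revokeRolesFromUser("superUserAdmin",
  [{ role: "userAdminAnyDatabase", db: "admin" }]);
```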
In this recipe, we are going to discuss how to change the password for the superUserAdmin user.
You want to change the password for a specific user.
Use changeUserPassword() with the new password.
Let’s follow the steps in this section to change the password of a user.
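For example (the new password is a placeholder):

```javascript
// Change the password for superUserAdmin in the admin database:
db.getSiblingDB("admin").changeUserPassword("superUserAdmin", "newSecret");
```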
Now disconnect the instance and connect with the new password to ensure that the password has been changed.
In this recipe, we are going to discuss how to track the various activities of different users.
You want to track user activity.
Use the auditing feature.
The auditing feature is available in MongoDB Enterprise Edition only.
Let’s follow the steps in this section to enable the auditing feature and store the audit information in different output formats like writing audit events to the console, a JSON file, or a BSON file.

Audit events
We can use auditDestination to enable the auditing feature and specify the output location of audit information.
This syslog option is not available for Windows.
Here, atype refers to the action type and param.db refers to the monitoring database.
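A mongod invocation along these lines (Enterprise Edition only; the paths and filter values are placeholders):

```shell
# Write audit events as JSON to a file, keeping only authenticate
# events against the test database:
mongod --dbpath /data/db \
  --auditDestination file --auditFormat JSON \
  --auditPath /data/audit/auditLog.json \
  --auditFilter '{ "atype": "authenticate", "param.db": "test" }'
```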
In this recipe, we are going to discuss how to encrypt the data in MongoDB.
You want to encrypt the data stored or moved into MongoDB.
Use encryption at the file level or at the whole-database level.
Encryption at rest is available in MongoDB Enterprise Edition only.
Let’s follow the steps in this section to encrypt the data stored and decrypt while reading.
Encrypting data in motion.
Encrypting data at rest.
MongoDB Enterprise Edition supports Transport Layer Security (TLS) and Secure Sockets Layer (SSL) encryption techniques to secure the data being sent or received over the networks.
MongoDB Enterprise Edition supports a native storage-based symmetric key encryption technique to secure the data available on the storage system. It also provides Transparent Data Encryption (TDE), which is used to encrypt the whole database.

The new encryption file

The encryption output
Encrypting the data using the local key file is not recommended, as the secure management of the local key file is critical.