MongoDB Interview Questions

Here are the top 50 commonly asked questions in MongoDB interviews. Whether you’re just starting your preparation or need a quick refresher, these questions and answers will boost your confidence for the interview. Ranging from basic to advanced, they cover a wide array of MongoDB concepts. Practice these questions for campus and company interviews, positions from entry to mid-level experience, and competitive examinations. It’s also important to practice them to strengthen your understanding of MongoDB.

MongoDB Interview Questions with Answers

1. What is MongoDB?

MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. Instead of storing your data in tables and rows as you would with a relational database, in MongoDB you store JSON-like documents with dynamic schemas. Document-oriented databases fall into the broader category of NoSQL databases.

2. What is the difference between NoSQL MongoDB database and RDBMS?

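Unlike an RDBMS, a NoSQL database such as MongoDB does not require a fixed schema, does not enforce foreign-key constraints, and is built to scale horizontally across many servers. Joins and complex multi-record transactions are traditionally RDBMS strengths; early MongoDB releases offered neither, although newer releases add the $lookup aggregation stage (version 3.2) and multi-document transactions (version 4.0). You can still write rich queries against a NoSQL database.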

3. What is replica set?

A replica set in MongoDB is a group of mongod processes that provide redundancy and high availability. The members of a replica set are:

  • Primary: The primary receives all write operations.
  • Secondaries: Secondaries replicate operations from the primary to maintain an identical data set. Secondaries may have additional configurations for special usage profiles.

4. What is sharding in MongoDB?

Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Sharding divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.

5. How is replication achieved in MongoDB?

MongoDB handles replication through an implementation called “replica sets”. A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.

6. What is namespace in MongoDB?

A “namespace” is the concatenation of the database name and the collection name, with a period character in between. Collections are containers for documents that share one or more indexes. Databases are groups of collections stored on disk using a single set of data files.
For example, in the namespace employee.HR, employee is the database name and HR is the collection name.
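
The naming rule above can be sketched in a few lines of plain Python. The helper name split_namespace is hypothetical and is not part of any MongoDB driver; the sketch only assumes that database names cannot contain a period while collection names can, so the split happens at the first period.

```python
def split_namespace(namespace: str) -> tuple[str, str]:
    """Split '<database>.<collection>' at the first period."""
    database, _, collection = namespace.partition(".")
    if not database or not collection:
        raise ValueError(f"not a valid namespace: {namespace!r}")
    return database, collection


# The example from the text: database "employee", collection "HR".
print(split_namespace("employee.HR"))
# A collection name may itself contain periods.
print(split_namespace("employee.system.profile"))
```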


7. How does a collection differ from a table?

Instead of tables, a MongoDB database stores its data in collections. A collection holds one or more BSON documents. Documents are analogous to records or rows in a relational database table. Each document has one or more fields; fields are similar to the columns in a relational database table.

8. Does MongoDB support transactions?

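MongoDB has always provided atomic operations on a single document. Early releases did not support multi-document transactions, but MongoDB 4.0 added multi-document ACID transactions on replica sets, and MongoDB 4.2 extended them to sharded clusters. Because related data can often be embedded in one document, many workloads need only single-document atomicity.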

9. What is GridFS in MongoDB?

GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16 MB. Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document. By default, GridFS uses a chunk size of 255 kB; that is, GridFS divides a file into chunks of 255 kB with the exception of the last chunk.
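
The chunking rule can be illustrated without a database. The sketch below is plain Python, not the GridFS API: split_into_chunks is a hypothetical helper, and each chunk is modeled as a small dict with a sequence number n and its data (real chunk documents also carry a reference to the parent file, omitted here).

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Cut data into fixed-size pieces; only the last piece may be shorter."""
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


payload = bytes(600 * 1024)  # a 600 KiB file filled with zero bytes
chunks = split_into_chunks(payload)
print([len(chunk["data"]) for chunk in chunks])
```

Two full chunks and one shorter final chunk come back, and joining the data fields in order of n reproduces the original file.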

10. What is vertical scaling?

Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amount of RAM are disproportionately more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller instances. As a result there is a practical maximum capability for vertical scaling.

11. What are the components of sharded clusters?

A sharded cluster has the following components: shards, query routers and config servers.

  • Shards store the data. To provide high availability and data consistency, in a production sharded cluster, each shard is a replica set.
  • Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards.
  • Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards. The query router uses this metadata to target operations to specific shards.

12. What is MongoDB projection?

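Projection controls which fields of the matching documents a query returns. You pass a projection document as the second argument to find() or findOne(), listing the fields to include (1) or exclude (0). Projection operators refine this further: for example, the positional $ operator limits the contents of an <array> in the query results to the first element matching the query document. Use $ in the projection document when you only need one particular array element in the selected documents.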
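
To make the idea concrete, here is a rough plain-Python imitation of what a projection does to a document. Both helpers (project and first_matching_element) and the sample student document are hypothetical; they mimic the behavior on ordinary dicts and do not call MongoDB.

```python
def project(document: dict, fields: list[str]) -> dict:
    """Inclusion-style projection: keep the listed fields plus _id."""
    keep = {"_id", *fields}
    return {key: value for key, value in document.items() if key in keep}


def first_matching_element(values: list, predicate) -> list:
    """Mimic the positional $ projection: only the first matching element."""
    for value in values:
        if predicate(value):
            return [value]
    return []


student = {"_id": 1, "name": "Asha", "city": "Pune", "grades": [72, 88, 95]}

print(project(student, ["name"]))
print(first_matching_element(student["grades"], lambda grade: grade >= 85))
```

The first call drops city and grades; the second returns a one-element list even though two grades satisfy the condition.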

13. What is aggregation pipeline?

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results. The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for aggregation tasks where the complexity of map-reduce may be unwarranted.
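
The pipeline idea, where each stage consumes the documents produced by the previous one, can be sketched in plain Python. The three functions below are hypothetical stand-ins for the $match, $group (with $sum) and $sort stages, and the orders data is invented for illustration.

```python
from collections import defaultdict

orders = [
    {"customer": "A", "status": "shipped", "amount": 50},
    {"customer": "B", "status": "pending", "amount": 20},
    {"customer": "A", "status": "shipped", "amount": 30},
    {"customer": "B", "status": "shipped", "amount": 10},
]


def match(docs, field, value):
    """Stage 1: keep only documents whose field equals value."""
    return [doc for doc in docs if doc.get(field) == value]


def group_sum(docs, key, value_field):
    """Stage 2: one output document per key, with a running total."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[value_field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_desc(docs, field):
    """Stage 3: order the grouped documents, largest first."""
    return sorted(docs, key=lambda doc: doc[field], reverse=True)


# Roughly equivalent MongoDB pipeline:
# [{"$match": {"status": "shipped"}},
#  {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
#  {"$sort": {"total": -1}}]
shipped = match(orders, "status", "shipped")
result = sort_desc(group_sum(shipped, "customer", "amount"), "total")
print(result)
```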

14. What are main features of MongoDB?

MongoDB is an open-source document database that provides three main features:

  • High performance: MongoDB provides high-performance data persistence.
  • High availability: MongoDB’s replication facility, called replica sets, provides automatic failover and data redundancy.
  • Automatic scaling: MongoDB provides horizontal scalability as part of its core functionality.

15. What are advantages of using document database?

The advantages of using documents are:

  • Documents (i.e. objects) correspond to native data types in many programming languages.
  • Embedded documents and arrays reduce need for expensive joins.
  • Dynamic schema supports fluent polymorphism.

16. What is journaling in MongoDB?

With journaling in the MMAPv1 storage engine, MongoDB’s storage layer has two internal views of the data set: the private view, used to write to the journal files, and the shared view, used to write to the data files. With journaling enabled, MongoDB writes the in-memory changes first to on-disk journal files, so that after a crash it can replay the journal and bring the data files back to a consistent state.

17. What is the purpose of profiler in MongoDB?

MongoDB includes a database profiler which shows performance characteristics of each operation against the database. Using the profiler you can find queries (and write operations) which are slower than they should be; use this information, for example, to determine when an index is needed.

18. Explain the structure of ObjectID in MongoDB.

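ObjectID is a 12-byte BSON type. In the original layout it consists of:

  • a 4-byte value representing the seconds since the Unix epoch
  • a 3-byte machine identifier
  • a 2-byte process id
  • a 3-byte counter

Newer MongoDB versions replace the machine identifier and process id with a single 5-byte random value; the timestamp and counter are unchanged.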
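
The byte layout listed above can be decoded with the standard library alone. This is a minimal sketch, not a driver function: parse_legacy_object_id is a hypothetical name, and reading the process id as big-endian is an assumption made for illustration. The sample value is just a well-formed 24-character hex string.

```python
from datetime import datetime, timezone


def parse_legacy_object_id(hex_string: str) -> dict:
    """Split a 12-byte ObjectId into the four fields of the 4-3-2-3 layout."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        # Byte order of the process id is assumed here, for illustration only.
        "process_id": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = parse_legacy_object_id("507f1f77bcf86cd799439011")
print(parts["timestamp"].isoformat(), parts["machine"], parts["counter"])
```

Because the timestamp occupies the leading bytes, ObjectIDs generated later sort after earlier ones at one-second granularity.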

19. What is the purpose of embedded documents in MongoDB?

Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document.

20. Explain auditing in MongoDB.

Auditing provides administrators with the ability to verify that the implemented security policies are controlling activity in the system. MongoDB Enterprise includes an auditing capability for mongod and mongos instances. The auditing facility allows administrators and users to track system activity for deployments with multiple users and applications.


21. What is the purpose of index in MongoDB?

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
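
The difference between a collection scan and an index lookup can be sketched in plain Python. A sorted list of (value, position) pairs stands in for the index here; MongoDB actually uses B-tree structures, and the documents and function names are invented for illustration.

```python
import bisect

ages = [31, 25, 40, 25, 58, 19]
documents = [{"_id": i, "age": age} for i, age in enumerate(ages)]


def collection_scan(docs, age):
    """Inspect every document; return matching _ids and documents examined."""
    matches = [doc["_id"] for doc in docs if doc["age"] == age]
    return matches, len(docs)


# A stand-in "index" on age: (age, position) pairs kept in sorted order.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(documents))


def index_scan(docs, index, age):
    """Binary-search the index, then fetch only the matching documents."""
    low = bisect.bisect_left(index, (age, -1))
    high = bisect.bisect_right(index, (age, len(docs)))
    matches = [docs[pos]["_id"] for _, pos in index[low:high]]
    return matches, high - low


print(collection_scan(documents, 25))
print(index_scan(documents, age_index, 25))
```

Both approaches find the same two documents, but the scan examines all six while the index lookup touches only the two that match.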

22. Does MongoDB handle caching?

Yes. MongoDB keeps most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
MongoDB does not cache the query results in order to return the cached results for identical queries.

23. How can concurrency affect replica sets primary?

In replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database.

24. Which storage engines are used by MongoDB?

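A storage engine is the part of a database that is responsible for managing how data is stored, both in memory and on disk. Storage engines are as follows:

  • MMAPv1: the original storage engine, based on memory-mapped files. It was deprecated in MongoDB 4.0 and removed in 4.2.
  • WiredTiger: the default storage engine since MongoDB 3.2, with document-level concurrency control and compression.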

25. Which operator is used to analyze mongodb queries?

Query plans are analyzed with explain. Older releases exposed it as the $explain operator; current releases use the cursor.explain() and db.collection.explain() methods instead. It returns a document that describes the process and indexes used to return the query, along with other statistics.

Advanced MongoDB Interview Questions with Answers

26. What is the difference between Manual references and DBRefs?

With manual references, you save the _id field of one document in another document as a reference. Then your application can run a second query to return the related data. These references are simple and sufficient for most use cases.
DBRefs are references from one document to another using the value of the first document’s _id field, collection name, and, optionally, its database name. Unless you have a compelling reason to use DBRefs, use manual references instead.
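
The two reference styles differ mainly in how much location information the referencing document carries. The sketch below models both with plain Python dicts; the library database, the publishers collection and the resolver functions are all hypothetical, while $ref, $id and $db are the field names a DBRef uses.

```python
# A toy "server": database name -> collection name -> _id -> document.
databases = {
    "library": {
        "publishers": {"oreilly": {"_id": "oreilly", "name": "O'Reilly Media"}},
    }
}

# Manual reference: only the _id is stored; the application knows where to look.
book_manual = {"_id": 1, "title": "MongoDB Basics", "publisher_id": "oreilly"}

# DBRef: the reference itself names the collection and, optionally, the database.
book_dbref = {
    "_id": 2,
    "title": "MongoDB Advanced",
    "publisher": {"$ref": "publishers", "$id": "oreilly", "$db": "library"},
}


def resolve_manual(book: dict) -> dict:
    """Second lookup with the collection hard-coded by the application."""
    return databases["library"]["publishers"][book["publisher_id"]]


def resolve_dbref(ref: dict) -> dict:
    """Second lookup driven entirely by the fields inside the DBRef."""
    return databases[ref["$db"]][ref["$ref"]][ref["$id"]]


print(resolve_manual(book_manual)["name"])
print(resolve_dbref(book_dbref["publisher"])["name"])
```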

27. What is covered query?

An index covers a query when both of the following apply:

  • All the fields in the query are part of an index, and
  • All the fields returned in the results are in the same index
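
The two conditions above amount to a subset check. The function is_covered below is a hypothetical helper written in plain Python, and the field names are invented; it only tests the two listed conditions and ignores other details of real query planning.

```python
def is_covered(index_fields, query_fields, returned_fields) -> bool:
    """True when both the filter and the returned fields live in one index."""
    indexed = set(index_fields)
    return set(query_fields) <= indexed and set(returned_fields) <= indexed


index_on = ["type", "item"]

# Filter on type, return only item: both fields are in the index.
print(is_covered(index_on, ["type"], ["item"]))
# _id is returned by default unless excluded, and it is not in this index.
print(is_covered(index_on, ["type"], ["item", "_id"]))
```

The second case is the usual reason a query that looks covered is not: the projection has to exclude _id explicitly when _id is not part of the index.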

28. What are the three popular types of NoSQL databases?

There are three different, popular types which are as follows:

  • Key/Value: Redis, Tokyo Cabinet, Memcached
  • Column family: Cassandra, HBase
  • Document: MongoDB, CouchDB

29. How does MongoDB provide concurrency?

MongoDB uses multi-granularity locking that allows operations to lock at the global, database or collection level, and allows for individual storage engines to implement their own concurrency control below the collection level.
MongoDB uses reader-writer locks that allow concurrent readers shared access to a resource, such as a database or collection, but in MMAPv1, give exclusive access to a single write operation.

30. How to isolate cursors from intervening write operations?

MongoDB cursors can return the same document more than once in some situations. In older MongoDB releases you can use the snapshot() method on a cursor to isolate the operation for a very specific case; snapshot() guarantees that the query will return each document no more than once. The method was deprecated in MongoDB 3.6 and removed in 4.0.

31. When should I embed documents within other documents?

When modeling data in MongoDB, embedding is frequently the choice for:

  • “contains” relationships between entities.
  • one-to-many relationships when the “many” objects always appear with or are viewed in the context of their parents.

You should also consider embedding for performance reasons if you have a collection with a large number of small documents.

32. How does sharding affect concurrency?

Sharding improves concurrency by distributing collections over multiple mongod instances, allowing query routers (i.e. mongos processes) to perform any number of operations concurrently to the various downstream mongod instances.
In a sharded cluster, locks apply to each individual shard, not to the whole cluster; i.e. each mongod instance is independent of the others in the shard cluster and uses its own locks. The operations on one mongod instance do not block the operations on any other.

33. How does concurrency affect a replica set primary?

With replica sets, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations. When writing to a replica set, the lock’s scope applies to the primary.

34. How does concurrency affect secondaries?

In replication, MongoDB does not apply writes serially to secondaries. Secondaries collect oplog entries in batches and then apply those batches in parallel. Secondaries do not allow reads while applying the write operations, and apply write operations in the order that they appear in the oplog.

35. What happens to unsharded collections in sharded databases?

In the current implementation, all databases in a sharded cluster have a “primary shard.” All unsharded collections within that database will reside on the same shard.

36. How does MongoDB distribute data across shards?

Sharding must be specifically enabled on a collection. After enabling sharding on the collection, MongoDB will assign various ranges of collection data to the different shards in the cluster. The cluster automatically corrects imbalances between shards by migrating ranges of data from one shard to another.
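
Range-based distribution can be sketched as a lookup over sorted split points. Everything in this plain-Python example is hypothetical (the split points, the shard names, the shard_for helper); it only illustrates the idea that each range of shard key values is owned by exactly one shard, with an inclusive lower bound and an exclusive upper bound.

```python
import bisect

# Hypothetical split points on an integer shard key. They define four ranges:
# (-inf, 100), [100, 200), [200, 300), [300, +inf)
SPLIT_POINTS = [100, 200, 300]
RANGE_OWNERS = ["shardA", "shardB", "shardC", "shardA"]


def shard_for(shard_key_value: int) -> str:
    """Find the range containing the value and return the shard that owns it."""
    return RANGE_OWNERS[bisect.bisect_right(SPLIT_POINTS, shard_key_value)]


for value in (50, 100, 250, 999):
    print(value, "->", shard_for(value))
```

Rebalancing in this model is just reassigning an entry in RANGE_OWNERS, which mirrors how a migration moves a range of data from one shard to another without changing the ranges themselves.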

37. Is it safe to remove old files in the moveChunk directory?

Yes. mongod creates these files as backups during normal shard balancing operations. If some error occurs during a migration, these files may be helpful in recovering documents affected during the migration.
Once the migration has completed successfully and there is no need to recover documents from these files, you may safely delete these files. Or, if you have an existing backup of the database that you can use for recovery, you may also delete these files after migration.

38. When do decision tree algorithms stop growing the tree?

Most decision tree algorithms stop growing the tree when one of three criteria is met:

  • The segment contains only one record. (There is no further question that you could ask which could further refine a segment of just one.)
  • All the records in the segment have identical characteristics. (There is no reason to continue asking further segmentation questions since all the remaining records are the same.)
  • The improvement is not substantial enough to warrant making the split.

39. When do the mongos servers detect config server changes?

Mongos instances maintain a cache of the config database that holds the metadata for the sharded cluster. This metadata includes the mapping of chunks to shards.
mongos updates its cache lazily by issuing a request to a shard and discovering that its metadata is out of date.

40. How do indexes impact queries in sharded systems?

If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the query to a specific shard, and the shard will use the index that allows it to fulfill the query most efficiently.
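
The routing decision described above can be sketched as a small function. This is plain Python with invented names: target_shards, the user_id shard key and the modulo-based locate function are assumptions made for illustration and do not reflect how mongos actually maps keys to shards.

```python
ALL_SHARDS = ["shardA", "shardB", "shardC"]


def locate(shard_key_value: int) -> str:
    """Hypothetical placement rule: pick a shard from the key value."""
    return ALL_SHARDS[shard_key_value % len(ALL_SHARDS)]


def target_shards(query: dict, shard_key: str) -> list[str]:
    """Targeted when the query names the shard key, scatter/gather otherwise."""
    if shard_key in query:
        return [locate(query[shard_key])]
    return list(ALL_SHARDS)


print(target_shards({"user_id": 7, "status": "active"}, "user_id"))
print(target_shards({"status": "active"}, "user_id"))
```

The first query reaches a single shard; the second has to be sent to all three, and each shard then relies on its own local indexes to answer it.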

41. Why use journaling if replication already provides data redundancy?

Journaling facilitates faster crash recovery. Prior to journaling, crashes often required database repairs or full data resync. Both were slow, and the first was unreliable.
Journaling is particularly useful for protection against power failures, especially if your replica set resides in a single data center or power circuit.

42. What information do arbiters exchange with the rest of the replica set?

Arbiters exchange the following data with the rest of the replica set:

  • Credentials used to authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
  • Replica set configuration data and voting data.

43. Do hidden members vote in replica set elections?

Hidden members of replica sets do vote in elections. To exclude a member from voting in an election, change the value of the member’s members[n].votes configuration to 0.
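
A small sketch of how the votes setting changes the electorate. The member list below is a hypothetical replica set configuration written as Python dicts; the field names (_id, host, priority, hidden, votes) follow the replica set configuration document, and the host names are placeholders.

```python
members = [
    {"_id": 0, "host": "db0.example.net:27017", "priority": 1, "hidden": False, "votes": 1},
    {"_id": 1, "host": "db1.example.net:27017", "priority": 1, "hidden": False, "votes": 1},
    # Hidden member that still votes (hidden members need priority 0).
    {"_id": 2, "host": "db2.example.net:27017", "priority": 0, "hidden": True, "votes": 1},
    # Hidden member excluded from elections by setting votes to 0.
    {"_id": 3, "host": "db3.example.net:27017", "priority": 0, "hidden": True, "votes": 0},
]

voters = [member["_id"] for member in members if member["votes"] > 0]
hidden_voters = [m["_id"] for m in members if m["hidden"] and m["votes"] > 0]
majority = len(voters) // 2 + 1

print("voting members:", voters)
print("hidden members that vote:", hidden_voters)
print("votes needed to elect a primary:", majority)
```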

44. How do memory mapped files work?

  • With the MMAPv1 storage engine, MongoDB uses memory mapped files for managing and interacting with all data.
  • Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation.
  • MongoDB memory maps data files to memory as it accesses documents. Unaccessed data is not mapped to memory.
  • Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.

45. What is the difference between soft and hard page faults?

Page faults occur when MongoDB, with the MMAP storage engine, needs access to data that isn’t currently in active memory. A “hard” page fault refers to situations when MongoDB must access a disk to access the data. A “soft” page fault, by contrast, merely moves memory pages from one list to another, such as from an operating system file cache.

46. How do write operations affect indexes?

Write operations may require updates to indexes:

  • If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key.
  • When running with the MMAPv1 storage engine, if an update to a document causes the document to grow past its allocated record size, MongoDB moves the document to a new record and updates all indexes that refer to the document, regardless of the field modified.

47. How can I see the size of an index?

The output of db.collection.stats() includes an indexSizes document which provides size information for each index on the collection. Depending on its size, an index may not fit into RAM. An index fits into RAM when your server has enough RAM available for both the index and the rest of the working set. When an index is too large to fit into RAM, MongoDB must read the index from disk, which is a much slower operation than reading from RAM.

48. What is the purpose of BSON in MongoDB?

BSON is the binary encoding of JSON-like documents that MongoDB uses when storing documents in collections. It adds support for data types like Date and binary that aren’t supported in JSON. BSON extends the JSON model to provide additional data types and to be efficient for encoding and decoding within different languages.
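
To show what "binary encoding" means in practice, here is a deliberately tiny encoder that handles only two BSON element types, UTF-8 strings (type byte 0x02) and 32-bit integers (type byte 0x10). The function encode_bson is a hypothetical teaching aid written with the standard struct module, not a replacement for a real BSON library.

```python
import struct


def encode_bson(document: dict) -> bytes:
    """Encode a flat dict of str and int32 values as a BSON document."""
    body = b""
    for key, value in document.items():
        name = key.encode("utf-8") + b"\x00"  # element names are C strings
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type byte, name, int32 byte length (including the NUL), bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # type byte, name, little-endian int32
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # total length (4 length bytes + body + trailing NUL), body, terminator
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
print(len(encoded), encoded)
```

The length prefixes are what let a reader skip over fields it does not need without parsing them, which is one reason BSON is efficient to traverse.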

49. What is the maximum document size in MongoDB?

The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.

50. What are memory mapped files?

A memory-mapped file is a file with data that the operating system places in memory by way of the mmap() system call. mmap() thus maps the file to a region of virtual memory. Memory-mapped files are the critical piece of the MMAPv1 storage engine in MongoDB. By using memory mapped files, MongoDB can treat the contents of its data files as if they were in memory. This provides MongoDB with an extremely fast and simple method for accessing and manipulating data.
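
The same mechanism is available from Python's standard mmap module, which makes for a quick demonstration. This sketch has nothing MongoDB-specific in it: it writes a small temporary file, maps it, and then reads and modifies the file contents through ordinary slice operations on the mapping.

```python
import mmap
import os
import tempfile

# Create a small file on disk to play the role of a data file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello mongodb")
    path = handle.name

with open(path, "r+b") as data_file:
    # Length 0 maps the whole file into virtual memory.
    with mmap.mmap(data_file.fileno(), 0) as view:
        first_word = bytes(view[0:5])  # read as if it were a bytes object
        view[0:5] = b"HELLO"           # write as if it were a bytearray
        view.flush()                   # push the change back to the file

with open(path, "rb") as data_file:
    contents = data_file.read()
os.remove(path)

print(first_word, contents)
```

No explicit read() or write() call touches the mapped region; the operating system pages the data in and out, which is the behavior the MMAPv1 engine relied on.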

Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Algorithms. Stay connected with him at LinkedIn.
