Seven NoSQL Databases in a Week

The embedded data model

In the embedded data model, relationships are maintained by storing related data in a single document. Instead of creating a separate document to define a relationship, we embed the related document structure in a field or an array within the parent document. Such documents are denormalized. This allows us to retrieve all related data in a single operation, at the cost of larger documents.

The embedded document structure allows us to store related pieces of data in the same document, so we can update them together in a single document write without worrying about consistency across documents.

The embedded document structure is used in two cases:

  • When there is a one-to-one relationship with the embedded data. We can store the embedded document as a single object field.
  • When there is a one-to-many relationship with the embedded data. Here, we can store the embedded documents as an array of objects in a field.
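Both patterns can be sketched as plain documents (shown here as Python dicts; the field values and names such as `address` and `addresses` are hypothetical):

```python
# One-to-one: a single address embedded as an object field.
user_one_to_one = {
    "_id": 1,
    "name": "Alice",
    "address": {"street": "12 Main St", "city": "Springfield"},
}

# One-to-many: several addresses embedded as an array of objects.
user_one_to_many = {
    "_id": 2,
    "name": "Bob",
    "addresses": [
        {"street": "12 Main St", "city": "Springfield"},
        {"street": "34 Oak Ave", "city": "Shelbyville"},
    ],
}

# All related data is reachable from the single parent document.
print(user_one_to_one["address"]["city"])
print(len(user_one_to_many["addresses"]))
```

Retrieving either user returns the addresses in the same read, with no join or second query.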

The embedded structure provides better read performance: related data is requested and retrieved in a single database operation, and updated in a single atomic operation. However, this approach increases the size of the document, and a document that grows beyond its allocated space must be relocated on disk, which causes fragmentation and hurts write performance.
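As a sketch of the single atomic update, one update specification can modify a top-level field and an embedded array together (pymongo-style syntax; the collection and field names are assumptions, and the server call is commented out because it needs a running mongod):

```python
# Update spec that changes a top-level field and appends to an
# embedded array in one atomic, single-document operation.
update_spec = {
    "$set": {"name": "Alice B."},
    "$push": {"addresses": {"street": "56 Elm Rd", "city": "Ogdenville"}},
}

# Applied against a live server it would look like:
# from pymongo import MongoClient
# db = MongoClient().mydb
# db.users.update_one({"_id": 1}, update_spec)
print(sorted(update_spec))
```

Because both changes target one document, either both take effect or neither does; no cross-document transaction is involved.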

Modeling application data for MongoDB depends on the data itself as well as on the characteristics of MongoDB. When creating data models for applications, analyze all of the application's read and write operations against the following MongoDB features:

  • Document size: Because MongoDB documents are schemaless, an update operation may increase the size of a document, for example by adding elements to an array or adding new fields. If a document grows beyond its allocated space, MongoDB automatically relocates it on disk; documents are also subject to a hard size limit of 16 MB.
  • Atomicity: In MongoDB, each operation is atomic at the document level: a single operation can change only one document at a time. An operation that modifies more than one document therefore requires multiple write operations, which are not atomic as a group. The embedded document structure suits such scenarios because all related data lives in a single document.
  • Sharding: Sharding provides horizontal scaling in MongoDB, enabling deployments with large datasets and high operation throughput. Sharding partitions a collection and stores its documents across multiple mongod instances or clusters. MongoDB uses the shard key to distribute documents, and the choice of key affects performance: a poor shard key can prevent query isolation and limit write capacity, so choose it carefully.
  • Indexes: We use indexes to improve the performance of common queries. Normally, we build indexes on fields that appear frequently in filter criteria or sort orders, so that searches can use efficient algorithms such as binary search. MongoDB automatically creates an index on the _id field. While creating indexes, consider the following points:
    • Each index requires at least 8 KB of space.
    • Indexes have a negative impact on write operations. For collections with a high write-to-read ratio, indexes are expensive because each insert must also update every index.
    • Collections with a high read-to-write ratio often benefit from indexes. Indexes do not affect read operations on non-indexed fields.
    • Active indexes use disk space and memory. This usage can be significant and we should analyze it for performance considerations.
  • Large numbers of collections: In some use cases, we may be tempted to spread data over many collections instead of one. If each collection would hold only a small number of documents, it is usually better to group the documents by type in a single collection: for example, instead of maintaining three collections named dev_log, prod_log, and debug_log, we can maintain a single collection called log with a field identifying the log type. Having a large number of collections decreases the performance of operations. When adding collections, consider the following points:
    • Each collection has an overhead of a few kilobytes.
    • Each index on _id requires at least 8 KB of data space.
    • Each database stores its metadata in a single namespace file, and every collection and index has an entry in that file.
  • Data lifecycle management: The time-to-live (TTL) feature of a collection expires documents after a certain period of time. Consider using TTL if data in the collection stops being useful after a specific period. If an application uses only recently inserted documents, use a capped collection. Capped collections provide first-in, first-out (FIFO) management of documents and support insert and read operations in insertion order.
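The two lifecycle options above can be sketched with pymongo. The database, collection, and field names here are assumptions, and the server calls are commented out because they require a live mongod:

```python
# TTL: a background thread expires documents whose indexed timestamp
# field is older than expireAfterSeconds.
ttl_field = "createdAt"
ttl_seconds = 3600  # expire documents one hour after insertion

# Capped collection: fixed size, FIFO, preserves insertion order.
capped_options = {"capped": True, "size": 1024 * 1024, "max": 1000}

# Against a running server these would be applied as:
# from pymongo import MongoClient
# db = MongoClient().mydb
# db.sessions.create_index(ttl_field, expireAfterSeconds=ttl_seconds)
# db.create_collection("recent_events", **capped_options)
print(ttl_seconds, capped_options["capped"])
```

TTL suits data that ages out on a schedule (sessions, logs), while a capped collection suits a bounded "most recent N" working set that is always read in insertion order.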