Combining data
In a monolithic environment, combining data is easy: you simply join two tables to create the required view. In a microservices architecture, datasets are distributed across services, and combining them requires moving data between services, which may involve significant network and storage overhead. It also becomes challenging to keep the combined data up to date. There are multiple ways to combine (join) data in a microservices architecture, and the right one depends on the scope of the request.
For example, if you wish to build an order summary page for a particular user, you need only that user's data from the User Service and all the orders for that user from the Orders Service. These can be fetched independently and joined at the requesting service to generate the order summary, as shown in the preceding diagram. This kind of request-time join works well for 1:N relationships, where one user record is combined with that user's many orders.
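The request-time join described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the in-memory `USERS` and `ORDERS` collections and the `fetch_user`, `fetch_orders`, and `order_summary` functions are hypothetical stand-ins for network calls to the User Service and Orders Service.

```python
# Hypothetical in-memory stand-ins for data owned by the two services.
USERS = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}
ORDERS = [
    {"order_id": 101, "user_id": 1, "total": 25.0},
    {"order_id": 102, "user_id": 2, "total": 10.0},
    {"order_id": 103, "user_id": 1, "total": 40.0},
]


def fetch_user(user_id):
    """Stand-in for a call to the User Service (assumed API)."""
    return USERS[user_id]


def fetch_orders(user_id):
    """Stand-in for a call to the Orders Service (assumed API)."""
    return [order for order in ORDERS if order["user_id"] == user_id]


def order_summary(user_id):
    """Join one user (the '1' side) with that user's orders (the 'N' side)."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    return {
        "user": user["name"],
        "order_count": len(orders),
        "total_spent": sum(order["total"] for order in orders),
        "orders": orders,
    }


summary = order_summary(1)
```

Because the two lookups are independent, a real service could issue them concurrently. The join itself is only a matter of assembling the two responses in the requesting service.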
Real-time joins work well for limited datasets, but combining data in real time for every request is expensive. Imagine tens of thousands of similar requests hitting the Order Summary Service every second. In such scenarios, services should instead keep denormalized (https://en.wikipedia.org/wiki/Denormalization) combined data in a cache that is kept up to date using the events generated by the source services. The service can then respond to requests by simply looking up this denormalized cache. This approach scales well, at the expense of the data being only near real time: the cache may be stale for the interval between the source service generating an event and the target service picking it up and updating its cache.
For example, as shown in the preceding diagram, an Interest Service may receive user interests via its API endpoint, but it may also need user and order details from the User and Orders services, respectively. Instead of looking up these details directly for each user interest, the Interest Service may subscribe to the events generated by the User and Orders services and internally maintain a denormalized cache view of interest data, with all the required user and order details readily available.
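An event-driven denormalized cache of this kind might look roughly like the sketch below. Everything here is an assumption made for illustration: the `InterestService` class, its handler names, and the event field names (`user_id`, `name`, `order_id`, `total`) are hypothetical, and a plain dictionary stands in for both the cache and the event subscription mechanism.

```python
class InterestService:
    """Keeps a local, denormalized view fed by events from other services."""

    def __init__(self):
        self._users = {}      # user_id -> user details copied from user events
        self._orders = {}     # user_id -> orders copied from order events
        self._interests = {}  # user_id -> interests received via this service's API

    def on_user_event(self, event):
        # Hypothetical handler for events published by the User Service.
        self._users[event["user_id"]] = {"name": event["name"]}

    def on_order_event(self, event):
        # Hypothetical handler for events published by the Orders Service.
        self._orders.setdefault(event["user_id"], []).append(
            {"order_id": event["order_id"], "total": event["total"]}
        )

    def add_interest(self, user_id, interest):
        # Stand-in for the Interest Service's own API endpoint.
        self._interests.setdefault(user_id, []).append(interest)

    def interest_view(self, user_id):
        # Served entirely from the local cache; no calls to other services.
        return {
            "user": self._users.get(user_id),
            "orders": list(self._orders.get(user_id, [])),
            "interests": list(self._interests.get(user_id, [])),
        }


service = InterestService()
service.add_interest(1, "hiking")
early_view = service.interest_view(1)  # user and order events not yet received
service.on_user_event({"user_id": 1, "name": "Alice"})
service.on_order_event({"user_id": 1, "order_id": 101, "total": 25.0})
view = service.interest_view(1)
```

In this sketch, `early_view` has no user details because the corresponding event has not yet been processed. This is the near-real-time trade-off: reads are cheap local lookups, but the view lags the source services until their events arrive.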