Concurrent Patterns and Best Practices
上QQ阅读APP看书,第一时间看更新

The push for concurrency

Let's come back to the software world. You want to listen to some music at the same time that you are writing an article. Isn't that a basic need? Oh yes, and your mail program should also be running so that you get important emails in time. It is difficult to imagine life if these programs don't run in parallel.

As time goes by, software is becoming bigger and demand for faster CPUs is ever increasing; for example, database transactions/second are increasing in number. The data processing demand is beyond the capabilities of any single machine. So a divide and conquer strategy is applied: many machines work concurrently on different data partitions. 

Another problem is that chip manufacturers are hitting limits on how fast they can make chips go! Improving the chip to make the CPU faster has inherent limitations. See http://www.gotw.ca/publications/concurrency-ddj.htm for a lucid explanation of this problem. 

Today's big data systems are processing trillions of messages per second, and all using commodity hardware (that is, the hardware you and me are using for our day-to-day development)—nothing fancy, like a supercomputer. 

The rise of the cloud has put provisioning power in the hands of almost everyone. You don't need to spend too much upfront to try out new ideas—just rent the processing infrastructure on the cloud to try out the implementation of the idea. The following diagram shows both scaling approaches:

The central infrastructure design themes are horizontal versus vertical scaling. Horizontal scaling essentially implies the use of a distributed concurrency pattern; it's cost effective, and a prominent idea in the big data world. For example, NoSQL databases (such as Cassandra), analytics processing systems (such as Apache Spark), and message brokers (such as Apache Kafka) all use horizontal scaling, and that means distributed and concurrent processing.

On the other hand, installing more memory or processing power in a single computer is a good example of vertical scaling. See https://www.g2techgroup.com/horizontal-vs-vertical-scaling-which-is-right-for-your-app/ for a comparison between the two scaling approaches.

We will look at two common concurrency themes for horizontally scaled systems: MapReduce and fault tolerance.