Designing Data-Intensive Applications I

The main goals when designing data-intensive applications:

1. Reliability: Tolerating hardware faults, software faults and human error
2. Scalability: Measuring load & performance, latency percentiles, throughput
3. Maintainability: Operability, simplicity and evolvability

Databases: store data so that it can be found again later
Caches: remember the result of an expensive operation to speed up reads
Search indexes: allow users to search data by keyword or filter it in various ways
Stream processing: send a message to another process, to be handled asynchronously
Batch processing: periodically crunch a large amount of accumulated data
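A minimal sketch of the caching idea, using Python's standard `functools.lru_cache` to remember results in memory. `expensive_square` and the `CALLS` counter are hypothetical stand-ins for a slow operation, not part of any real system:

```python
import functools
import time

# Hypothetical counter so we can see how often the real work runs.
CALLS = {"count": 0}

@functools.lru_cache(maxsize=128)  # keep up to 128 recent results in memory
def expensive_square(n: int) -> int:
    """Stand-in for an expensive operation such as a slow query."""
    CALLS["count"] += 1
    time.sleep(0.01)  # simulate slow work
    return n * n

# The first call does the work; the second is answered from the cache.
expensive_square(12)
expensive_square(12)
```

A shared cache such as Redis or Memcached applies the same idea across processes; an in-process cache like this one is only the simplest case.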

Redis: a datastore that is also used as a message queue
Kafka: message queues with database-like durability guarantees

Systems that anticipate faults and cope with them are called fault-tolerant or resilient.

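1. Design systems in a way that minimizes opportunities for error.
2. Decouple the places where people make the most mistakes from the places where they can cause failures (e.g. provide sandbox environments to experiment in).
3. Test thoroughly at all levels, from unit tests to whole-system integration tests.
4. Allow quick and easy recovery from human errors to minimize the impact in the case of a failure.
5. Set up detailed and clear monitoring, such as performance metrics and error rates (referred to as telemetry).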

Response time is what the client sees: besides the actual time to process the request (the service time), it includes network delays and queueing delays.
Latency is the duration that a request is waiting to be handled, during which it is latent, awaiting service.
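Because response times vary from request to request, they are usually summarized with percentiles (p50, p99) rather than the mean. A minimal sketch using the nearest-rank percentile method; the sample timings and the `response_time` helper are made up for illustration:

```python
import math

def response_time(service_ms: int, network_ms: int, queueing_ms: int) -> int:
    """What the client sees: service time plus network and queueing delays."""
    return service_ms + network_ms + queueing_ms

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds.
response_times_ms = [32, 35, 38, 40, 41, 45, 50, 60, 120, 1500]

p50 = percentile(response_times_ms, 50)    # the median
p99 = percentile(response_times_ms, 99)    # the tail
mean = sum(response_times_ms) / len(response_times_ms)
```

Here the median is 41 ms while the single 1500 ms outlier pulls the mean up to about 196 ms, which is why percentiles describe the typical and the worst-case experience better than an average.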

Tail latency amplification: even if only a small percentage of backend calls are slow, the chance of getting a slow call increases if an end-user request requires multiple backend calls, so a higher proportion of end-user requests end up being slow.
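The amplification can be made concrete with a little probability. Assuming (as a simplification) that each backend call is slow independently with probability p, a request that makes n calls is slow whenever at least one call is slow, which happens with probability 1 - (1 - p)^n:

```python
def prob_slow_request(p_slow_backend: float, n_backend_calls: int) -> float:
    """Probability that an end-user request hits at least one slow backend call,
    assuming each call is independently slow with probability p_slow_backend."""
    if not 0.0 <= p_slow_backend <= 1.0:
        raise ValueError("p_slow_backend must be a probability")
    return 1 - (1 - p_slow_backend) ** n_backend_calls

one_call = prob_slow_request(0.01, 1)        # 1% of requests are slow
hundred_calls = prob_slow_request(0.01, 100) # roughly 63% of requests are slow
```

With 1% slow backend calls, a request fanning out to 100 backends is slow about 63% of the time.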

Elastic: systems that automatically add computing resources when they detect a load increase.
100,000 requests per second at 1 kB each and 3 requests per minute at 2 GB each have the same data throughput, yet systems designed for these two loads look very different.
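The arithmetic behind the throughput comparison, assuming decimal units (1 kB = 1,000 bytes, 1 GB = 10^9 bytes); with binary units (KiB, GiB) the two figures come out close but not equal:

```python
KB = 1_000          # decimal kilobyte (assumption)
GB = 1_000_000_000  # decimal gigabyte (assumption)

def throughput_bytes_per_sec(requests: int, per_seconds: int, request_size_bytes: int) -> float:
    """Data throughput for `requests` requests every `per_seconds` seconds."""
    return requests * request_size_bytes / per_seconds

small = throughput_bytes_per_sec(100_000, 1, 1 * KB)  # 100,000 req/s x 1 kB
large = throughput_bytes_per_sec(3, 60, 2 * GB)       # 3 req/min x 2 GB
```

Both work out to 100,000,000 bytes per second (100 MB/s): one system must handle many tiny requests, the other a few huge transfers.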

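Abstraction is one of the best tools for reducing complexity: it hides implementation details behind a clean, simple-to-understand interface.
For example, the goal of the relational model was to hide how data is internally represented in the database behind a cleaner interface.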

Impedance mismatch: application code works with data in object-oriented form, but the database stores it in relational tables, so a translation layer is needed between the two.
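A minimal sketch of that translation layer, written by hand instead of with an ORM. The `User` and `Position` classes and the two-table layout (users, positions) are hypothetical: one nested object has to be split into rows for two tables and reassembled on the way back:

```python
from dataclasses import dataclass, field

@dataclass
class Position:
    title: str
    organization: str

@dataclass
class User:
    user_id: int
    name: str
    positions: list[Position] = field(default_factory=list)  # nested one-to-many data

def to_rows(user: User):
    """Flatten one object into a users row and several positions rows."""
    user_row = (user.user_id, user.name)
    position_rows = [(user.user_id, p.title, p.organization) for p in user.positions]
    return user_row, position_rows

def from_rows(user_row, position_rows) -> User:
    """Reassemble the object from rows, matching positions by user_id (a foreign key)."""
    user_id, name = user_row
    positions = [Position(title, org) for uid, title, org in position_rows if uid == user_id]
    return User(user_id, name, positions)

ada = User(1, "Ada", [Position("Engineer", "Acme"), Position("CTO", "Globex")])
user_row, position_rows = to_rows(ada)
```

The round trip works, but every change to the object model means changing this mapping code too; that friction is the mismatch.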

Input fields (e.g. region or industry) should offer an autocompleter over a list of standardized values, to avoid spelling mistakes and inconsistent data.
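A minimal prefix-matching autocompleter, assuming the standardized values fit in a sorted in-memory list (the `REGIONS` values are hypothetical). `bisect_left` jumps to the first candidate, so only matching entries are scanned:

```python
import bisect

# Hypothetical standardized values the user picks from; must stay sorted.
REGIONS = sorted([
    "Greater Boston Area",
    "Greater Seattle Area",
    "Greater London",
    "Greater Paris",
    "Berlin",
])

def autocomplete(prefix: str, options=REGIONS, limit: int = 5) -> list[str]:
    """Return up to `limit` options that start with `prefix` (case-sensitive)."""
    start = bisect.bisect_left(options, prefix)  # first option >= prefix
    matches = []
    for option in options[start:]:
        if not option.startswith(prefix):
            break  # sorted order: no later option can match
        matches.append(option)
        if len(matches) == limit:
            break
    return matches
```

Because users choose from the list instead of typing free text, the stored value is always one of the standardized spellings (and can be stored as an ID).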

Relational databases are a better fit than document databases for data that is more interconnected (many-to-one and many-to-many relationships).
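A small many-to-many example using Python's built-in `sqlite3` module with an in-memory database. The tables (users, organizations, memberships) and their rows are hypothetical; the point is that each organization is stored once and referenced by ID, and a join reassembles the relationship:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
-- many-to-many: a user can belong to many organizations and vice versa
CREATE TABLE memberships (user_id INTEGER REFERENCES users(id),
                          org_id  INTEGER REFERENCES organizations(id));
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO organizations VALUES (10, 'Acme'), (20, 'Globex');
INSERT INTO memberships VALUES (1, 10), (1, 20), (2, 10);
""")

def members_of(org_name: str) -> list[str]:
    """Names of all users who belong to the given organization, via two joins."""
    rows = conn.execute("""
        SELECT u.name
        FROM users u
        JOIN memberships m ON m.user_id = u.id
        JOIN organizations o ON o.id = m.org_id
        WHERE o.name = ?
        ORDER BY u.name
    """, (org_name,)).fetchall()
    return [name for (name,) in rows]
```

Renaming an organization touches a single row; in a document model that embedded the organization name in each user document, every copy would need updating.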

CSS and XSL are both declarative languages: you specify what result you want, not how to achieve it. They are concise and easier to work with, but more limited.
Imperative languages are more powerful, but without that abstraction the code is more verbose and more complex.
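The same query written both ways, as a sketch with hypothetical animal data: an imperative Python loop that spells out each step, and a declarative SQL statement (run through the built-in `sqlite3` module) that only states the condition:

```python
import sqlite3

animals = [("Great white", "Sharks"), ("Lion", "Felidae"), ("Hammerhead", "Sharks")]

def get_sharks_imperative(rows):
    """Imperative: say *how* to build the result, step by step."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks

def get_sharks_declarative(rows):
    """Declarative: say *what* you want; the database picks the access path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
    conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
    # Without ORDER BY, SQL does not promise any particular result order.
    result = [name for (name,) in conn.execute(
        "SELECT name FROM animals WHERE family = 'Sharks'")]
    conn.close()
    return result
```

Because the SQL version does not fix an order or an algorithm, the database is free to add an index or parallelize the scan without the query changing.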

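MongoDB can run JavaScript code inside queries, e.g. the map and reduce functions of a MapReduce query.
Graph-structured data can be stored in two relational tables: one representing vertices and one representing edges. There is no schema restricting which kinds of vertices may be connected, but a query that follows a variable number of edges requires recursion (e.g. WITH RECURSIVE in SQL).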
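The MapReduce model itself can be sketched in plain Python. This is not MongoDB's API (MongoDB's map and reduce functions are written in JavaScript); `map_reduce`, `map_fn`, `reduce_fn` and the observation documents are hypothetical, and the example counts animals observed per month:

```python
from collections import defaultdict
from datetime import date

observations = [
    {"family": "Sharks", "observation_date": date(1995, 12, 25), "num_animals": 3},
    {"family": "Sharks", "observation_date": date(1995, 12, 12), "num_animals": 4},
    {"family": "Sharks", "observation_date": date(1996, 1, 3), "num_animals": 1},
    {"family": "Lions", "observation_date": date(1995, 12, 1), "num_animals": 2},
]

def map_fn(doc):
    """Emit (key, value) pairs: key is 'YYYY-M', value is the number of animals."""
    d = doc["observation_date"]
    yield f"{d.year}-{d.month}", doc["num_animals"]

def reduce_fn(key, values):
    """Combine all values emitted for one key."""
    return sum(values)

def map_reduce(docs, query, mapper, reducer):
    """Filter docs by simple equality, group mapper output by key, reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        if all(doc.get(k) == v for k, v in query.items()):
            for key, value in mapper(doc):
                groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

shark_counts = map_reduce(observations, {"family": "Sharks"}, map_fn, reduce_fn)
```

Here `shark_counts` is `{"1995-12": 7, "1996-1": 1}`. The map and reduce functions must be pure, which is what lets a database run them anywhere, in any order.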
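A sketch of the vertices/edges layout and a recursive query, using SQLite's `WITH RECURSIVE` through Python's `sqlite3` module. The table columns, the place names, and the `within` edge label are hypothetical example data. The query walks `within` edges for as many hops as exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE edges (edge_id     INTEGER PRIMARY KEY,
                    tail_vertex INTEGER REFERENCES vertices(vertex_id),
                    head_vertex INTEGER REFERENCES vertices(vertex_id),
                    label       TEXT);
INSERT INTO vertices VALUES (1, 'Boise'), (2, 'Idaho'),
                            (3, 'United States'), (4, 'North America');
INSERT INTO edges VALUES (1, 1, 2, 'within'), (2, 2, 3, 'within'), (3, 3, 4, 'within');
""")

def places_containing(name: str) -> list[str]:
    """All places that (transitively) contain `name`, following 'within' edges."""
    rows = conn.execute("""
        WITH RECURSIVE containing(vertex_id) AS (
            SELECT vertex_id FROM vertices WHERE name = ?   -- start vertex
            UNION                                           -- UNION also stops cycles
            SELECT e.head_vertex FROM edges e
            JOIN containing c ON e.tail_vertex = c.vertex_id
            WHERE e.label = 'within'
        )
        SELECT v.name FROM vertices v
        JOIN containing c ON v.vertex_id = c.vertex_id
        WHERE v.name <> ?
        ORDER BY v.vertex_id
    """, (name, name)).fetchall()
    return [n for (n,) in rows]
```

`places_containing("Boise")` returns `["Idaho", "United States", "North America"]`, even though no single edge links Boise to North America.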

Triple-stores: all information is stored as (subject, predicate, object) triples, e.g. (Jim, likes, bananas). SPARQL is a query language for triple-stores.
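A toy in-memory triple-store, as a sketch only: real triple-stores index triples and are queried with SPARQL. Here `None` plays the role of a variable in a query pattern (SPARQL writes variables with a `?` prefix), and the triples are hypothetical:

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

triples: list[Triple] = [
    ("Jim", "likes", "bananas"),
    ("Jim", "likes", "apples"),
    ("Lucy", "likes", "bananas"),
    ("Lucy", "married_to", "Alain"),
]

def match(pattern, store):
    """Return triples matching a (subject, predicate, object) pattern; None matches anything."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

what_jim_likes = match(("Jim", "likes", None), triples)
who_likes_bananas = [s for s, _, _ in match((None, "likes", "bananas"), triples)]
```

Any fact, whether a property or a relationship to another entity, fits the same three-part shape, so no schema change is needed to record a new kind of predicate.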
Three main data models: Document, Relational, and Graph

Learning_Stuff
