The main goals when designing data-intensive applications:
1. Reliability: Tolerating hardware and software faults, human error
2. Scalability: Measuring load & performance, latency percentiles, throughput
3. Maintainability: Operability, simplicity and evolvability
Databases: storing data
Caches: remembering the result of an expensive operation to speed up reads
Search Indexes: allow users to search data by keywords
Stream processing: send messages from one process to another, to be handled asynchronously
Batch processing: periodically crunch a large amount of accumulated data
Redis: a datastore that is also used as a message queue
Kafka: message queues with database-like durability guarantees
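The cache idea above can be sketched with Python's standard `functools.lru_cache`, which remembers the result of a function call keyed by its arguments. The `slow_lookup` function and its simulated delay are hypothetical stand-ins for an expensive operation such as a slow database query.

```python
import functools
import time

CALLS = 0  # counts how many times the expensive work actually runs

@functools.lru_cache(maxsize=128)
def slow_lookup(user_id: int) -> str:
    """Hypothetical expensive operation, e.g. a slow database query."""
    global CALLS
    CALLS += 1
    time.sleep(0.01)  # simulate the cost of the real work
    return f"profile-{user_id}"

if __name__ == "__main__":
    print(slow_lookup(42))  # computed, then stored in the cache
    print(slow_lookup(42))  # served from the cache, no extra work
    print("expensive calls:", CALLS)  # → expensive calls: 1
    print(slow_lookup.cache_info())
```

A real cache in front of a database also needs a policy for invalidating entries when the underlying data changes; `lru_cache` only evicts by recency (or manually via `cache_clear()`).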
Systems that anticipate faults and cope with them are called fault-tolerant or resilient.
Ways to make systems reliable despite human error:
1. Design systems in a way that minimizes opportunities for error.
2. Decouple the places where people make the most mistakes from the places where they can cause failures.
3. Test thoroughly, from unit tests to whole-system integration tests.
4. Minimize the impact in the case of failure by making recovery quick and easy.
5. Set up detailed and clear monitoring, referred to as telemetry.
Response time is what the client sees: besides the actual time to process the request (service time), it includes network delays and queueing delays.
Latency is the duration that a request spends waiting to be handled, during which it is awaiting service.
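Response times are usually summarized with percentiles rather than averages. A minimal sketch using the nearest-rank method; the sample response times are made up for illustration, and other percentile definitions (such as linear interpolation) give slightly different numbers.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]

# Hypothetical response times in milliseconds (service time + queueing + network).
response_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(response_ms, p)} ms")
# p50: 14 ms
# p95: 900 ms
# p99: 900 ms
```

The mean of these samples is 126.1 ms, which matches almost no actual request; the median (14 ms) and the high percentiles (900 ms) describe the typical and the worst-case experience much better.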
Tail latency amplification: even if only a small percentage of backend calls are slow, the chance of getting a slow call increases if an end-user request requires multiple backend calls, so a higher proportion of end-user requests end up being slow.
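Assuming each backend call is independently slow with probability p, an end-user request that needs n backend calls is slow if any one of them is, which happens with probability 1 - (1 - p)^n. A small sketch of that calculation; the independence assumption is a simplification of real systems.

```python
def prob_slow_request(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n independent backend calls is slow."""
    if not 0.0 <= p_slow_call <= 1.0:
        raise ValueError("p_slow_call must be a probability")
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    return 1.0 - (1.0 - p_slow_call) ** n_calls

# If 1% of backend calls are slow, fan-out quickly makes slow end-user requests common.
for n in (1, 10, 100):
    print(f"{n:>3} calls -> {prob_slow_request(0.01, n):.1%} of requests are slow")
```

With 1% slow backend calls, this gives about 1.0% slow requests for a single call, 9.6% for 10 calls, and 63.4% for 100 calls.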
Elastic: systems that automatically add computing resources when they detect a load increase.
100,000 requests per second at 1 kB each and 3 requests per minute at 2 GB each have the same data throughput (100 MB/s), yet systems designed for these two loads look very different.
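The throughput comparison above is simple arithmetic; here it is spelled out, assuming decimal units (1 kB = 10^3 bytes, 1 GB = 10^9 bytes).

```python
KB = 1_000          # decimal units: 1 kB = 10^3 bytes
GB = 1_000_000_000  # 1 GB = 10^9 bytes

def throughput_bytes_per_sec(requests: int, per_seconds: int, size_bytes: int) -> float:
    """Bytes per second for `requests` requests of `size_bytes` every `per_seconds` seconds."""
    return requests * size_bytes / per_seconds

many_small = throughput_bytes_per_sec(100_000, 1, 1 * KB)  # 100k requests/second, 1 kB each
few_large = throughput_bytes_per_sec(3, 60, 2 * GB)        # 3 requests/minute, 2 GB each

print(f"{many_small / 1e6:.0f} MB/s vs {few_large / 1e6:.0f} MB/s")  # 100 MB/s vs 100 MB/s
```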
Using abstractions to reduce complexity is a good way to write code.
The goal of the relational model was to hide the internal representation of data in the database behind a cleaner interface.
Impedance mismatch: application code works with data in object-oriented form, but the database stores it in relational tables, so an awkward translation layer is needed between the two.
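To illustrate the mismatch, here is a hypothetical user profile held as a nested object in application code, and the same data flattened into rows for two relational tables. The `User`/`Position` classes, the table layout, and the sample data are made up for this example; object-relational mapping (ORM) frameworks automate this kind of translation.

```python
from dataclasses import dataclass, field

@dataclass
class Position:
    job_title: str
    organization: str

@dataclass
class User:
    user_id: int
    name: str
    positions: list[Position] = field(default_factory=list)

def to_rows(user: User) -> tuple[tuple, list[tuple]]:
    """Flatten one nested object into a row for `users` and rows for `positions`."""
    user_row = (user.user_id, user.name)
    # Each nested item becomes its own row, linked back by a foreign key (user_id).
    position_rows = [(user.user_id, p.job_title, p.organization) for p in user.positions]
    return user_row, position_rows

def from_rows(user_row: tuple, position_rows: list[tuple]) -> User:
    """Reassemble the object from rows, as an ORM would do after a join."""
    user_id, name = user_row
    positions = [Position(title, org) for uid, title, org in position_rows if uid == user_id]
    return User(user_id, name, positions)

alice = User(1, "Alice", [Position("Engineer", "Acme Corp"), Position("Intern", "Initech")])
user_row, position_rows = to_rows(alice)
print(user_row)       # (1, 'Alice')
print(position_rows)  # [(1, 'Engineer', 'Acme Corp'), (1, 'Intern', 'Initech')]
```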
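Search fields should offer an autocompleter with a standardized list of values to avoid mistakes such as spelling variations.
Relational databases are better than document databases for data that's more interconnected (many-to-one and many-to-many relationships).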
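Many-to-many relationships are where relational joins help. A sketch with Python's built-in `sqlite3` module and an in-memory database; the tables, names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Join table: each row links one user to one organization (many-to-many).
    CREATE TABLE memberships (
        user_id INTEGER REFERENCES users(id),
        org_id  INTEGER REFERENCES organizations(id)
    );
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO organizations VALUES (10, 'Acme Corp'), (20, 'Initech');
    INSERT INTO memberships VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow the references through the join table with two joins.
rows = conn.execute("""
    SELECT o.name, u.name
    FROM organizations o
    JOIN memberships m ON m.org_id = o.id
    JOIN users u ON u.id = m.user_id
    ORDER BY o.name, u.name
""").fetchall()
for org, user in rows:
    print(org, "->", user)
```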
CSS and XSL are both declarative languages: they specify what result is wanted rather than how to achieve it, which makes them concise but more limited.
Imperative languages are more powerful, but without abstraction, code written in them is more verbose and more complex.
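The declarative/imperative contrast can be shown side by side. CSS and XSL are awkward to run in a short snippet, so SQL (another declarative language, via Python's built-in `sqlite3`) stands in here; the animal data is hypothetical.

```python
import sqlite3

animals = [("Great white", "Sharks"), ("Orca", "Dolphins"), ("Hammerhead", "Sharks")]

# Imperative: spell out *how* to get the result, step by step.
def get_sharks_imperative(rows):
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks

# Declarative: state *what* result is wanted; the database decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)
sharks_declarative = [name for (name,) in conn.execute(
    "SELECT name FROM animals WHERE family = 'Sharks' ORDER BY rowid")]

print(get_sharks_imperative(animals))  # ['Great white', 'Hammerhead']
print(sharks_declarative)              # ['Great white', 'Hammerhead']
```

Because the SQL version does not fix an execution strategy, the database is free to use an index or a different scan order without the query changing.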
MongoDB can run JavaScript code inside queries, such as the map and reduce functions of its map-reduce feature.
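In map-reduce, a map function emits key/value pairs for each document and a reduce function combines all values emitted for one key. The following is not MongoDB's API but a plain-Python sketch of that model; the `map_reduce` helper and the observation documents are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

def map_reduce(docs: Iterable[dict],
               mapper: Callable[[dict], Iterator[tuple]],
               reducer: Callable[[object, list], object]) -> dict:
    """Run mapper on every document, group emitted values by key, then reduce each group."""
    groups: dict = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical observation documents.
observations = [
    {"date": "1995-12-25", "family": "Sharks", "num_animals": 3},
    {"date": "1995-12-12", "family": "Sharks", "num_animals": 4},
    {"date": "1996-01-03", "family": "Sharks", "num_animals": 2},
    {"date": "1996-01-05", "family": "Dolphins", "num_animals": 7},
]

def emit_month(doc):
    # Map step: emit (year-month, count) for shark sightings only.
    if doc["family"] == "Sharks":
        yield doc["date"][:7], doc["num_animals"]

def sum_counts(key, values):
    # Reduce step: combine all values emitted for one key.
    return sum(values)

print(map_reduce(observations, emit_month, sum_counts))
# {'1995-12': 7, '1996-01': 2}
```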
Graph-structured data: two connected relational tables, one representing vertices and one representing edges. No schema restricts which vertices can be connected, but reaching the data from a query requires recursion, because the number of edges to follow is not known in advance.
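The vertices/edges layout and the recursive query can be sketched with `sqlite3`, which supports recursive common table expressions (`WITH RECURSIVE`). The table columns and the location data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE edges (
        tail_vertex INTEGER REFERENCES vertices(vertex_id),
        head_vertex INTEGER REFERENCES vertices(vertex_id),
        label TEXT
    );
    INSERT INTO vertices VALUES
        (1, 'Boise', 'city'), (2, 'Idaho', 'state'),
        (3, 'United States', 'country'), (4, 'North America', 'continent');
    INSERT INTO edges VALUES
        (1, 2, 'within'), (2, 3, 'within'), (3, 4, 'within');
""")

# The recursive CTE follows 'within' edges any number of hops away from Boise.
# UNION (not UNION ALL) discards duplicates, so a cycle in the graph cannot loop forever.
rows = conn.execute("""
    WITH RECURSIVE within_boise(vertex_id) AS (
        SELECT head_vertex FROM edges WHERE tail_vertex = 1 AND label = 'within'
        UNION
        SELECT e.head_vertex
        FROM edges e JOIN within_boise w ON e.tail_vertex = w.vertex_id
        WHERE e.label = 'within'
    )
    SELECT v.name FROM vertices v JOIN within_boise w ON v.vertex_id = w.vertex_id
    ORDER BY v.vertex_id
""").fetchall()
print([name for (name,) in rows])  # ['Idaho', 'United States', 'North America']
```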
Triple-stores: all information is stored as (subject, predicate, object) triples, e.g. (Jim, likes, bananas), and can be queried with SPARQL.
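A toy in-memory triple-store makes the (subject, predicate, object) idea concrete. This is a hypothetical Python sketch of pattern matching, not SPARQL itself; `None` plays the role of a SPARQL variable.

```python
from typing import Optional

Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("Jim", "likes", "bananas"),
    ("Jim", "lives_in", "London"),
    ("Lucy", "likes", "bananas"),
    ("London", "within", "England"),
}

def match(store: set[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# "Who likes bananas?" roughly corresponds to the SPARQL pattern ?person :likes :bananas
print([s for s, _, _ in match(triples, predicate="likes", obj="bananas")])  # ['Jim', 'Lucy']
```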
Three main data models: Document, Relational, and Graph