Designing Data-Intensive Applications I

The main goals when designing a data-intensive application:

1. Reliability: Tolerating hardware and software faults, human error
2. Scalability: Measuring load & performance, latency percentiles, throughput
3. Maintainability: Operability, simplicity and evolvability

Databases: storing data
Caches: remembering the result of an expensive operation
Search indexes: allow users to search data by keywords
Stream processing: send a message to another process, to be handled asynchronously
Batch processing: periodically crunch a large amount of accumulated data

Redis: a datastore that is also used as a message queue
Kafka: message queues with database-like durability guarantees

Systems that anticipate faults and cope with them are called fault-tolerant or resilient.

Ways to keep a system reliable despite human error:

1. Design systems in a way that minimizes opportunities for error.
2. Decouple the places where people make the most mistakes from the places where they can cause failures.
3. Test thoroughly, from unit tests to whole-system integration tests.
4. Minimize the impact in the case of failure.
5. Set up monitoring (referred to as telemetry).

Response time is what the client sees: besides the actual time to process the request (the service time), it includes network delays and queueing delays.
Latency is the duration that a request spends waiting to be handled, awaiting service.
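
Since response times are usually reported as percentiles rather than averages, here is a small sketch of how a median and a 99th percentile could be computed with the nearest-rank method. The sample values and the helper name `percentile` are invented for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds; one request is a slow outlier.
response_times_ms = [30, 42, 35, 1200, 38, 41, 33, 36, 39, 45]

median = percentile(response_times_ms, 50)
p99 = percentile(response_times_ms, 99)
mean = sum(response_times_ms) / len(response_times_ms)
print(median, p99, mean)
```

With these numbers the mean is pulled far above the median by a single outlier, which is why the median and the high percentiles describe the typical and worst-case user experience better.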

Tail latency amplification: even if only a small percentage of backend calls are slow, the chance of hitting a slow call increases when an end-user request requires multiple backend calls.
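
A quick way to see the effect is to compute the chance that at least one backend call is slow. This sketch assumes the calls are independent and each has the same probability of being slow; the 1% figure and the function name are assumptions for the example.

```python
def prob_slow_request(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n independent backend calls is slow."""
    return 1.0 - (1.0 - p_slow_call) ** n_calls


# Assume 1% of backend calls are slow and vary the fan-out per user request.
for n_calls in (1, 10, 100):
    print(n_calls, round(prob_slow_request(0.01, n_calls), 3))
```

Under these assumptions, a request that fans out to 100 backend calls is slow more often than not, even though each individual call is slow only 1% of the time.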

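Elastic: systems that automatically add computing resources when they detect a load increase.
100k requests per second at 1 kB each and 3 requests per minute at 2 GB each have the same throughput (100 MB/s), yet a system designed for one looks very different from a system designed for the other.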

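Abstraction is one of the best tools for reducing complexity: a good abstraction hides implementation detail behind a clean, simple interface.
The goal of the relational model was to hide implementation details (how the data is represented internally) behind a cleaner interface.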

Impedance mismatch: application code works with data as objects, but the database stores it in relational tables, so a translation layer is needed between the two.

Search fields should offer an autocompleter to avoid input mistakes.

Relational databases are a better fit than document databases for data that is more interconnected (many-to-one and many-to-many relationships).

CSS and XSL are both declarative languages: you specify what result you want rather than how to achieve it, which makes them concise but more limited.
Imperative languages are more powerful, but without that abstraction the code is more verbose and more complex.

MongoDB can run JavaScript functions inside a query, e.g. the map and reduce functions of a MapReduce query.
Graph-structured data: can be represented with two relational tables, one for vertices and one for edges. The schema does not restrict which vertices can be connected, but a query that follows a variable number of edges needs recursion.
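
To make the two-table layout and the need for recursion concrete, here is a small sketch using Python's built-in `sqlite3` module and a recursive common table expression. The table and column names, the sample places, and the helper `locations_containing` are all assumptions for the example, not a fixed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE edges (
    tail_vertex INTEGER REFERENCES vertices (vertex_id),
    head_vertex INTEGER REFERENCES vertices (vertex_id),
    label       TEXT
);
INSERT INTO vertices VALUES
    (1, 'Idaho'), (2, 'United States'), (3, 'North America'), (4, 'Lucy');
INSERT INTO edges VALUES
    (1, 2, 'within'), (2, 3, 'within'), (4, 1, 'born_in');
""")


def locations_containing(conn: sqlite3.Connection, start_name: str) -> list[str]:
    """Follow 'within' edges any number of hops, starting from the named vertex."""
    rows = conn.execute(
        """
        WITH RECURSIVE containing(vertex_id) AS (
            SELECT vertex_id FROM vertices WHERE name = ?
            UNION
            SELECT e.head_vertex
            FROM edges e
            JOIN containing c ON e.tail_vertex = c.vertex_id
            WHERE e.label = 'within'
        )
        SELECT v.name
        FROM containing c
        JOIN vertices v ON v.vertex_id = c.vertex_id
        ORDER BY v.vertex_id
        """,
        (start_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(locations_containing(conn, "Idaho"))
```

The number of `within` hops is not known in advance, which is why a plain join is not enough and the query has to be recursive.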

Triple-stores: all data is stored as (subject, predicate, object) statements, e.g. (Jim, likes, bananas). SPARQL is a query language for triple-stores.
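
The idea of querying triples by pattern can be sketched in plain Python. This is not SPARQL, only an illustration of matching on (subject, predicate, object); the extra triples and the `match` helper are made up for the example.

```python
Triple = tuple[str, str, str]

# Hypothetical data in (subject, predicate, object) form.
triples: set[Triple] = {
    ("Jim", "likes", "bananas"),
    ("Jim", "age", "33"),
    ("Lucy", "likes", "bananas"),
}


def match(store: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who likes bananas?
print(match(triples, p="likes", o="bananas"))
```

A real triple-store adds indexes and a query language on top, but the pattern with wildcards is the core of how such queries are expressed.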
Three main data models: Document, Relational, and Graph
