Skip to main content

Compilers Principles, Techniques, and Tools Part 1

Chapter 1 Intro to Compiling:

A compiler take the "source" language and translate to another language "target" language. It is important for the compiler to report to the user any errors in the source program.

There are two parts of compilation: analysis and synthesis.
Analysis: breaks down the source program into a tree structure, called syntax tree.
Synthesis: constructs the desired target program from the intermediate representation.

source program -> compiler -> assembler -> loader/link-editor

Lexical Analysis: linear analysis or scanning, how computer reads the source program, reads the characters in source program and group them into a stream of tokens, it identifies keywords,  expressions and statements.

Syntax Analysis: parsing, grouping the tokens of the source program into grammatical phrases to synthesize output.

Semantic Analysis: Checks for semantic errors then gathers type information for the subsequent code-generation phase.

lexical analyzer -> syntax analyzer -> semantic analyzer -> intermediate code generator -> code optimizer -> code generator -> target program

during each of the phases above the program may encounter errors, compiler may try to deal with the  errors and keep going while other errors most stops the program.

Last step of the compiler generates machine code or assembly code. memory are allocated for the program.

Preprocessors: produce input to compilers, macro processing, file inclusion, ration preprocessors, language extension.

Assemblers: assembly code is mnemonic version of machine code, which names are used instead of binary codes for operation, names are given memory addresses. Some assembly takes two passes at the input. First identify and denote storage location, second translate operation into bits.

Chapter 2 A simple one-pass compiler:

Syntax: what the language looks like
Semantics: what the language mean

character stream -> lexical analyzer -> token stream -> syntax directed translator -> intermediate representation

syntax definition:

if (expression) statement else statement
keyword(terminals) token(nonterminals)

some operands has a higher precedence 9+5*2

Depth-First Traversals: visit an unvisited child of a node whenever it can, tries to visit nodes as far away from the root as quickly as it can.

9-5+2
95-2+

9 and 5 are the bottom of the tree as leaves, one level up is - then 2 then lastly +
Tree is constructed by precedence level, operators at the same precedence level are evaluated left to right. Where the left most operators has less precedence than right operators.

Evaluation is then done on the stack.

expression -> expression + term = infinite loop
no changes to the input takes between recursive calls

The current token being scanned in the input is frequently referred as the lookahead symbol.

count = count + increment;
id = id + id;

input -> lexical analyzer -> parser

A data structure called a symbol table is generally used to store information about various source language constructs.

FrontEnd constructs the intermediate representation of the source program.
BackEnd generates the target program.

Stack manipulation:

push v: push v onto the stack
rvalue l: push contents of data location l
lvalue l: push address of data location l
pop: throw away value of top of the stack
:=  the r-value on top is placed in the l-value below it and both are popped
copy: push a copy of the top value on the stack

a+b

rvalue a
rvalue b
+

Control Flow:

label l: target of jumps to l, no other effect
goto l: next instruction is taken from statement with label l
gofalse l: pop the top value; jump if it is zero
gotrue l: pop the top value; jump if it nonzero
halt: stop execution

Lexical Analysis: Reads the the code strip out white space, identify symbols and tokens
Parser: Check for syntax error, constructs functions.

to be continued... page 72 


Comments

Popular posts from this blog

Mastering Ethereum Part II

Digital signature prove knowledge of a secret without revealing it. Account address are derived directly from private keys. Public key cryptography also known as asymmetric cryptography. Public key can be derived from private keys. A digital signature is code that is produced with private key and the transaction details (the message). A private key is a number between 1 and 2^256 The public key is on a point on an elliptic curve, with an X and Y value. PubK = PrivKey * G a constant generator point Ethereum cryptographic Hash function: Keccak-256 (not the finalized SHA-3 different output) Addresses are hex numbers dervied from the last 20 bytes of the public key. Inter exchange client address protocol (ICAP) checksum prevent wrongly input address. Wallet: serves as the primary user interface for the user to access money, managing keys, address, creating and signing transaction. But at it's core it's a data structure that acts as a container to store the private ke...

Mastering Ethereum Part I

Account contains: -Address (rightmost 160 bits of Keccak(SHA-3) hash of public key) -Balance -Nonce -Storage and code Assert(false) compiles to 0xfe, uses up all gas and reverts all changes Big-endian: most significant digits first Little-endian: least significant digits first BIPS: bitcoin improvement proposals Byte code: numeric format virtual machine executable Contract account: An account containing code that executes when receiving a transaction from another account. Contract creation transaction: Special transaction with "zero address" as recipient that register a contact and recode it on the Ethereum blockchain Digital signature: Using a private key which the user produces a short string that can prove with the corresponding public key plus the signature(short string) combine to verify that document(transaction) was created by the user who owns the private key. Gas: The computational cost of an execution on a smart contract. Turing Complete: Gener...

Designing Data-Intensive Applications I

The main goals of designing a data-Intensive applications: 1. Reliability: Tolerating hardware and software faults, human error 2. Scalability: Measuring load & performance, latency percentiles, throughput 3. Maintainability: Operability, simplicity and evolvability Databases: storing data Caches: remembering expensive operation Search Indexes: allow users to search data by keywords Stream processing: send message from one process to another Batch processing: periodically crunch a large amount of accumulated data Redis: datastore used as message queses Kafka: message queues with database-like durability guarantees Systems that anticipate faults and cope with them are called fault-tolerant or resilient. 1. Design systems in a way that minimizes opportunities for error. 2. Decouple the places where people make the most mistakes. 3. Test thoroughly from unit tests to whole-system integration tests. 4. Minimize impact in the case of failure 5. Setup monitoring refe...