Disclaimer: you might become a code-vegan.
Language and Code
In order to grok how a compiler reads code, it is helpful to think of the language you’re using to read this article: English. We’ve all encountered the glaring red SyntaxError in our development consoles, but as we’ve scratched our heads, searching for the missing semicolon, we’ve probably never stopped to think about Noam Chomsky. Chomsky defines syntax as:
“the study of principles and processes by which sentences are constructed in particular languages.”
We’ll call our “built-in” simplify() function on Noam Chomsky’s definition.
// Result: Languages order their words differently.
There are three areas of linguistic study that we can observe in relation to compilers: lexical units, syntax, and semantics. In other words: the study of the meaning of words and their relations, the study of the arrangement of words, and the study of sentence meanings (we’ve limited the definition of semantics to suit our purpose).
Take this sentence: We ate beef.
Notice how each word in the sentence can be broken down into units of lexical meaning: We/ate/beef
That basic sentence syntactically follows Subject/Verb/Object order. Let us assume that this is how every English sentence must be constructed. Why? Because compilers must work according to strict guidelines in order to detect syntax errors. So, Beef we ate, though understandable, would be incorrect in our oversimplified English.
Semantically, the sentence has proper meaning: we know that multiple people ate beef at some point in the past. We can strip it of meaning by rewriting the sentence as We beef ate.
let sentence = "We ate beef";
Expressions can be broken down into lexemes: let/sentence/=/"We ate beef"/;
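As a sketch, we can mimic that lexeme split with a regular expression (illustrative only; real engines use hand-written scanners, and the pattern below only covers this one statement):

```javascript
// Split our statement into lexemes with a naive pattern:
// string literals, words, and the = and ; punctuators.
const code = 'let sentence = "We ate beef";';
const lexemes = code.match(/"[^"]*"|[A-Za-z_]\w*|[=;]/g);

console.log(lexemes);
// → [ 'let', 'sentence', '=', '"We ate beef"', ';' ]
```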
Semantically, our code has meaning that our machines will eventually understand via the compiler. In order to achieve semantic meaning from code, the compiler must read code. We’ll delve into that in the next section.
Note: Context differs from scope. Explaining further would go beyond the “scope” of this article.
We read from left to right. The compiler, on the other hand, reads in both directions. How? With Left-Hand-Side (LHS) look-ups and Right-Hand-Side (RHS) look-ups. Let’s break them down.
LHS look-ups focus on the “left-hand side” of an assignment. What this really means is that they are responsible for finding the target of the assignment. We should conceptualize the target rather than its position, because an LHS look-up’s target can vary in position. Also, assignment here does not refer only to the assignment operator.
Check out the example below for clarification:
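The original example appears to have gone missing, so here is a minimal reconstruction based on the description that follows (the function name square is my own placeholder):

```javascript
function square(a) { // declaring the parameter a creates the LHS target
  return a * a;      // RHS look-ups retrieve the value of a
}

square(5); // passing 5 implicitly assigns it to a (an LHS look-up)
```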
The function call triggers an LHS look-up for a. Why? Because passing 5 as an argument implicitly assigns a value to a. Notice how the target can’t be determined by position at first glance and must be inferred.
Conversely, RHS look-ups focus on the values themselves. So if we go back to our previous example, an RHS look-up will find the value of a in the expression a * a;
It is important to keep in mind that these look-ups occur in the last phase of compilation, the code-generation phase. We’ll elaborate further once we get to that stage. For now, let’s explore the compiler.
Think of the compiler as a meat-processing plant with several mechanisms that grind the code into a package that our computer deems edible, or rather executable. In this example, we will be processing the expression from earlier.
First, the tokenizer dissects code into units called tokens.
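Here is a sketch of what a tokenizer might emit for our expression. The token type names are illustrative, loosely modeled on ESTree-style labels, and the pattern only covers this one statement; no real engine tokenizes this way:

```javascript
// A minimal tokenizer sketch: classify each lexeme into a token.
function tokenize(code) {
  const keywords = new Set(["let", "const", "var"]);
  const matches = code.match(/"[^"]*"|[A-Za-z_$][\w$]*|[=;]/g) || [];
  return matches.map((value) => {
    if (keywords.has(value)) return { type: "Keyword", value };
    if (value.startsWith('"')) return { type: "String", value };
    if (value === "=" || value === ";") return { type: "Punctuator", value };
    return { type: "Identifier", value };
  });
}

console.log(tokenize('let sentence = "We ate beef";'));
```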
Note: If this same system is also able to make associations between tokens and group them together (much like a parser does), it is considered a lexer.
There is an intermediary step where the source code is transformed into intermediate code — usually bytecode — by an interpreter, statement by statement. The bytecode is then executed within a virtual machine.
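To make “bytecode executed in a virtual machine” concrete, here is a toy stack-machine sketch. The instruction set is invented for illustration and resembles no real engine’s bytecode:

```javascript
// Each instruction is [opcode, operand]; this "program" computes 2 + 3.
const bytecode = [
  ["PUSH", 2],
  ["PUSH", 3],
  ["ADD"], // pops 2 and 3, pushes their sum
];

// A tiny virtual machine: walk the instructions against a value stack.
function run(program) {
  const stack = [];
  for (const [op, arg] of program) {
    if (op === "PUSH") stack.push(arg);
    else if (op === "ADD") stack.push(stack.pop() + stack.pop());
  }
  return stack.pop();
}

console.log(run(bytecode)); // → 5
```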
Afterwards, the code is optimized. This involves the removal of white space, dead code, and redundant code, among many other optimization processes.
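A hand-written before/after sketch of what such optimizations might do (real optimizers transform intermediate code rather than source, and the function here is just an example):

```javascript
// Before optimization: dead and redundant code included.
function area(r) {
  const pi = 3.14159;
  const unused = "never read";           // dead code: value never used
  const radius = r;                      // redundant copy of r
  if (false) console.log("unreachable"); // dead branch: never taken
  return pi * radius * radius;
}

// After optimization: dead code removed, copies propagated.
function areaOptimized(r) {
  return 3.14159 * r * r;
}
```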
Once the code is optimized, the code-generator’s job is to take the intermediate code and turn it into a low-level assembly language that a machine can readily understand. At this juncture, the generator is responsible for:
(1) making sure that the low level code retains the same instructions as the source code
(2) mapping bytecode to the target machine
(3) deciding whether values should be stored in a register or in memory, and from where values should be retrieved. Here is where the code-generator performs LHS and RHS look-ups. Simply put, an LHS look-up writes the target’s value to memory, and an RHS look-up reads a value from memory. If a value is stored in both the cache and a register, the generator will optimize by taking the value from the register; fetching values from main memory is the slowest option and should be the least preferred method.
(4) deciding the order in which instructions should be executed.
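In JavaScript terms, the write/read split in (3) looks like this (a sketch; a real code-generator operates on registers and memory addresses, not named variables):

```javascript
function double(n) {    // calling double(5) performs an LHS look-up: 5 is assigned to the target n
  const result = n * 2; // an RHS look-up reads n; an LHS look-up writes the product to result
  return result;        // an RHS look-up reads result back out
}

console.log(double(5)); // → 10
```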
Beyond just flipping images and colorizing them, your brain can fill in blank spaces based on its ability to recognize patterns, like a compiler’s ability to read values from cached memory.
So if we write, please give us a round of ______, you should easily be able to execute that code.