# Programming Language Theory

The Church-Turing Thesis equates a vaguely-defined set of "computable" functions with the partial recursive functions. Several systems have been proven equivalent to the partial recursives in power, such as Turing machines and the λ-calculus (practical programming languages generally provide further syntaxes and semantics, but a Real Programmer is perfectly happy with combinatory logic and a beer). Peter J. Landin's 1965 ACM report, "A correspondence between ALGOL 60 and Church's Lambda-notation", is blamed by most textbooks for jump-starting the study of programming language theory, ensuring computer scientists fifty years later would still be exploring topoi, sheaves, Curry-Howard correspondences and fixed-point combinators.

There may, indeed, be other applications of the system than its use as a logic.-- Alonzo Church, 1932

## Contents

## Functions and Declarative Programming

Expressions in functional programs compose functions (as opposed to "values"). Backus proposed three tiers of complexity in his Turing Award lecture:

- Simply functional language (
`fp`): No state, limited names, finitely many functional forms, simple substitution semantics, algebraic laws - Formal functional system (
`ffp`): Extensible functional forms, functions represented by objects, translation of object representation to applicable form, formal semantics - Applicative state transition system (
`ast`):`ffp`plus mutable state and coarse-grained operations thereupon

*Higher-order functions* map one or more functions to a function. First-order support of functions provides composition of functions at runtime.

### Combinatory Logic

Built from only variables, a set of primitive combinators (closed λ-expressions with their own reduction rules), and function application, combinatory logic provides no *abstraction* mechanism. A Turing-complete kernel of primitive combinators is *complete*. The SKI calculus requires only the K and S combinators; Chris Barker's Iota and Jot require only one each.

#### SKI Calculus

`I ≡ λx. x`(identity)`K ≡ λx, y. x`(generation of constant functions)`S ≡ λx, y, z. (x z (y z))`(generalized function application)

#### Grammar

C-term ::= C-app | C-prim | var C-app ::= C-term C-term

#### Divergent Combinators

The *omega* combinator is divergent. It can be reduced, but has no normal form:

`omega ≡ (λx. x x) (λx. x x)`

A combinator which is in normal form, but not reduced to a value, is said to be *stuck*. One of the major objectives of static typing systems is to detect possible "stuck" reductions.

#### Fixed-Point Combinators

For a function F, its set of *fixed points* are those inputs which map to themselves. Provided algebraic functions, for instance, fixed points of a function F(x) would be found by setting *F(x) = x* and solving for the roots. Provided an ordered, finite domain, looping through the domain will find all solutions. For an infinite ordered domain, looping is a semi-decision: if there is a least fixed point, it will be found, but the loop never terminates otherwise. The *fixed-point combinators* compute fixed points of their functional inputs. Curry's *Y-combinator* was the first (it always computes the least fixed point):

`Y ≡ λf. (λx. f (x x)) (λx. f (x x))`(untyped λ-calculus)`Y ≡ S (K (S I I)) (S (S (K S) K) (K (S I I)))`(SKI calculus)

Divergence-free evaluation of the Y-combinator requires call-by-name semantics. Call-by-value semantics can make use of the Θ_{v} (Turing) or Z-combinators:

`Θ`_{v}≡ (λx. λy. (y (λz. x x y z))) (λx. λy. (y (λz. x x y z)))`Z ≡ λf. (λx. f (λy. x x y)) (λx. f (λy. x x y))`(via η-expansion on Y)

The infinitely many fixed-point combinators of untyped λ-calculus are recursively enumerable.

### Untyped λ-calculus

Two operators (function *definition* and *application*) upon one operand type (λ-expression).

- Function definition:
`(λ`*boundparam*. body) - Function application:
`function(`*actualparam*)

The body is made up of *free* and *bound* variables. Those not present in the λ's list of bound variables are free. A λ-expression with no free variables is *closed* (closed expressions are equivalent in power to combinatory logic). A λ-expression enclosed in scopes of its free variables is *meaningful*. Distinct free variables specify a λ-expression's distinct substitution classes.

#### Grammar

λ-term ::= λ-app | λ-abst | var λ-app ::= λ-term λ-term λ-abst ::= 'λ'var'.' λ-term

This abstract grammar can be augmented with associativity rules and grouping syntax (parentheses) to provide a concrete grammar. If verbosity is no issue, no associativity rules need be specified for the following grammar:

λ-term ::= λ-app | λ-abst | var λ-app ::= '(' λ-term λ-term ')' λ-abst ::= '(' 'λ'var'.' λ-term ')'

#### Evaluation via Substitution

We'll use a substitution syntax to rewrite λ-calculus strings. Heavyweight de Bruijn indexing is one substitution semantic, as is the *Barendregt convention*. The following simple semantic is due Rugaber, possibly through Pierce:

*{N / X}M* substitutes expression N for free instances of X in expression M:

- If M is a variable ({N / X}V), replace M with X if and only if M is N.
- If M is a λ-app ({N / X}(A B)), {N / X}M = (({N / X}A) ({N / X}B))
- Otherwise, M is a λ-abst ({N / X}(λV. B)).
- If X is V, M is unchanged (X is bound in M).
- If X does not appear in B, M is unchanged (no free X in M)
- If V occurs free in N, substitute N for X throughout B.
- Otherwise, the substitution in this form is invalid (but it can be made valid...)

With this syntax, we define evaluation rules. Application of the a rule preserves its *equivalence*.:

*ɑ-conversion*renames bound variables with any variable y which is not free, preserving*ɑ-equivalence*:- λx. m →
_{ɑ}λy. {y / x}m

- λx. m →
*β-reduction*replaces a formal parameter with an actual parameter, preserving*β-equivalence*:- (λx. m) n →
_{β}{n / x}m - An expression is in
*β-normal form*if no β-reductions are possible. It is otherwise a*β-redex*.

- (λx. m) n →
*η-reduction*eliminates bindings unused within a scope, preserving*η-equivalence*:- (λx. (f x)) →
_{η}f - An expression is in
*β-η-normal form*("Beta-eta-normal form") if neither β- nor η-reductions are possible- If a η-reduction is applicable, the expression is a
*η-redex*.

- If a η-reduction is applicable, the expression is a

- (λx. (f x)) →

According to the Church-Rosser Theorem, all terminating evaluations will compute the same function. Certain application orders might not terminate for a given expression, though, despite termination via other orders (proofs under arbitrary reduction ordering are said to operate under *full beta reduction*). This gives rise to the core difference between call-by-value and call-by-name semantics:

- leftmost-innermost application evaluates arguments as soon as possible, and is equivalent to leftmost call-by-value ("applicative order")
- only the outermost redexes are reduced, and the right hand side must be in β-η-normal form before reducing the left
- this is
*strict*evaluation (arguments are always evaluated, whether used or not)

- leftmost-outermost application substitutes prior to evaluation, and is equivalent to lazy call-by-name sans memoization ("normal order")

Call-by-name terminates if any order does, but might require more total reductions. Call-by-value only β-reduces abstractions, not applications. Call-by-need memoizes the results of evaluated functions, and is operationally equivalent to call-by-name if side-effects are disallowed (it can be more efficient, but also more complicated: reduction is performed on abstract syntax graphs rather than abstract syntax trees). Call-by-name is text substitution augmented by capture avoidance. Wikipedia's evaluation strategy page is pretty thorough.

#### Numerics

The integers (or any countably infinite set) can be represented via the Church encoding (or Mogensen-Scott, or others):

`0 ≡ λf. λx. x`(referred to as`c`, and ɑ-equivalent to_{0}`false`as defined below)`1 ≡ λf. λx. f x``2 ≡ λf. λx. f (f x)``3 ≡ λf. λx. f (f (f x))``n ≡ λf. λx. f`(...up thorugh^{n}x`c`)_{n}

Some basic arithmetic operations:

`plus ≡ λm. λn. λf. λx. m f (n f x)`(from*f*)^{(m + n)}(x) = f^{m}(f^{n}(x))`succ ≡ λn. λf. λx. f (n f x)`(β-equivalent to`(plus 1)`for our`1`from above)`mult ≡ λm. λn. λf. n (m f)`(from*f*)^{(m * n)}(x) = (f^{m}(x))^{n}- alternatively,
`mult ≡ λm. λn. m (plus n) c`_{0}

- alternatively,
`pow ≡ λm. λn. n m``pred ≡ λn. λf. λx. n (λg. λh. h (g f)) (λu. x) (λu. u)`(returns 0 when applied to 0)`sub ≡ λm. λn. (n pred) m`(using`pred`from above)

#### Pairs

`pair ≡ λf. λs. λb. b f s``first ≡ λp. p true``second ≡ λp. p false`

#### Logic

The Church booleans evaluate to one of their two arguments:

`true ≡ λa. λb. a``false ≡ λa. λb. b`(note that`false`is ɑ-equivalent to`c`!)_{0}

Using them, we implement basic conditional logic:

`test ≡ λa. λb. λc. a b c`(using`true`/`false`for a,`test`reduces to`b`on`true`, and`c`on`false`)`not ≡ λt. t false true`(β-equivalent to`test(A, false, true)`)`iszro ≡ λm. m (λx. false) true`

This definition of `test` will evaluate both arguments in a strict application, which is undesirable. Instead, define `if` as a function taking one argument (`true/false`), which evaluates to either the function `first` or `second`.

#### Syntactic sugar

- Left-associative application as implicit parentheses
- Use of definitions (allowing identifiers to stand in as λ-expressions)
- Currying:
`(λx, y. x + y)`rather than`(λx. (λy. x + y))` - Numeric literals rather than Church encoding
- Function definition: binding a functional abstraction to an identifier
- Pattern matching (piecewise function definition based on argument structure)
- The de Bruijn notation (not to be confused with de Bruijn indexing as a substitution scheme) is another, probably superior, syntax. It has an
*abstractor wagon*, which I whole-heartedly approve of.

### Typed Functional Programming

Two agenda were proposed for adding types to the λ-calculus:

- Church suggested explicit type annotations on all terms (simply-typed λ-calculus (λ
_{→}), "simple" due to single type constructor →) - Curry suggested inferred types (now implemented via techniques such as Hindley-Milner)

Either way, we lose some properties of the untyped λ-calculus (there's no type for the Y-combinator, for instance!). Nonetheless, the benefits of typing (especially in the presence of type inference) lead to most functional programming languages making use of a strict typing system. More complicated systems (*second-* and higher-*order λ-calculi*) such as System F (aka Girard-Reynolds polymorphic λ-calculus) and λ_{ω} (aka "simply-typed λ-calculus with type operators") exist; they can be explored along Barendregt's λ-cube, should one feel so inclined (unravel the mysteries of "F_{<:}^{ω}"!).

### Logic Programming

Non-procedural statements of relationships built atop a *unification* algorithm and database.

#### Prolog

Language elements include facts, queries, variables, rules and data structures. Queries are made relative to a base of facts expressed as *functors* on *atoms*. Anonymous variables are supported via '_'.

- All satisfactions of a query are returned, as they're discovered.
- Several queries can be conjoined via ','; unification of such a system might require
*backtracking*. - Rules are dependencies among facts, expressed as unions of intersections (multiple right-hand-sides, each with arbitrarily many conjunctions). For
`a :- b, c`:- Backward chaining -- seeking truth of
`a`via proof of`b, c` - Forward chaining -- concluding truth of
`a`via discovery of`b, c`

- Backward chaining -- seeking truth of
- Lists are formed via square brackets, and '|' divides the car and cdr of a list

Two semantics: declarative (first-order predicate logic) and procedural (DFS of the database).

- The procedural semantics involve terms (constants, atoms, functors), ground terms (a term containing no variables), and the resolvent (set of all outstanding subgoals).
- Queries with no variables can trivially be decided as true or false
- The result is otherwise the set of all query-satisfying bindings between variables and ground terms
- Resolvent grows when a RHS is substituted for a subgoal
- Resolvent shrinks when a fact matches a subgoal
*Logically-incomplete*: DFS might fail on left-recursive rules. BFS takes up a great deal of space.

- The declarative semantics reduce the resolvent processing to modus ponens:
- Any Prolog program can be represented as a set of Horn clauses...which can be transformed into clausal predicate logic...which can be transformed into predicate logic
- A Horn clause is a set of terms, exactly one of which is positive
- A clause with no negative terms corresponds to a Prolog fact
- The single positive term corresponds to the LHS of a Prolog rule
- The zero or more negative terms correspond to clauses on the RHS:

- The
*cut*operator '!' controls searching: "No further options for the current rule ought be considered"- The declarative semantics are unaffected
- Whether or not procedural semantics generate solutions might change, but not the solutions generated

Prolog statement | Logical interpretation | Horn clause (Implicitly-quantified CNF) |
---|---|---|

A :- B | B ⇒ A | ~B ∨ A |

A :- B, C, D | B ∧ C ∧ D ⇒ A ≡ ~(B ∧ C ∧ D) ∨ A | ~B ∨ ~C ∨ ~D ∨ A |

### Constraint Programming

**FIXME**

### Nondeterminism

**FIXME** impossible in purely declarative programming!

## Typing

Weak- vs strong-typing refers to the degree to which the language's typing system can be subverted, and whether this is regularly necessary. Typing insecurities include:

- Untyped pointers
- Subrange narrowing / downcasting
- Untagged variant records ("type punning")
- Procedure parameters

Static- vs dynamic-typing refers to (among other possible definitions) whether type is determined at compile-time or run-time. Among dynamic typing, *duck typing* types based on (dynamically-bound) method availability; variants of static typing include *structural typing* (relations based on form) and *nominative typing* (relations based on names). Nominative typing has (predictable) problems with anonymous types, and also name shadowing. Structural typing has issues with recursive declarations, along with ambiguities regarding constituent ordering and nomenclature; it is used primarily in ML and research, with nominative typing dominating the mainstream. The benefits of static typing include:

- Detection of many types of errors at compile-time
- "Safety = Progress + Preservation" (Harper 1994) suggests well-typed expressions ought not get stuck, and that reductions of well-typed expressions are themselves well-typed.

- Improved performance, reduced resource requirements
- Secure data abstractions can be built atop the safety of strong typing systems

Properties of a typing system include:

- Support for
*polymorphism*(dispatch based on types of actual arguments)- Parametric polymorphism: Generation of a version for necessary types (Ada generics, C++ templates, SML type parameters)

- Support for
*subtyping*, refining T as T_{0}(defining an ordering relation on types: T*subsumes*T_{0}, x ∈ T_{0}⇒ x ∈ T)**NB**:*Subclassing*(inheritance) is independent of subtyping!- Subsumption properties can't be arbitrarily enforced (decided) when subclasses override the implementation
*Soundness*refers to data structuring operation support for subtyping, such as annotation of type alterations:*Covariant*modifications retain the original ordering*Contravariant*modifications invert the original ordering*Invariant*modifications dissolve the subsumption relation

- Support for
*minimal*and*maximal*types (also known as Bottom and Top)- Bottom (no values) + Top (all values) + subtyping ⇒
*bounded type lattice*

- Bottom (no values) + Top (all values) + subtyping ⇒

### Conversion

Treatment of a value of a given type as if it had another type, possibly involving modification of the underlying representation, or storage/alignment requirements thereof. When done explicitly, conversion is referred to as *casting*; when done implicitly, it is called *coercion*.

## Objects

**FIXME** object calculus woo-hah...

## Concurrency

**FIXME**π-calculus, CSS, CSP, petri nets, dataflow variables...- Shared state
- Message passing
- Data parallel processing
- FIXME expand each

## Programming Language Design

### Syntax

Pierce describes three means for describing language syntax, using *terms* to refer to language definitions involving at least one terminal:

- Induction on successive terms (of which EBNF is a compact notation)
- Inference from axioms (terminals) and successive inference terms
- Concrete definition (generative grammar) placing successive sets of alternate terms in bijection with the ordinals

### Semantics

Pierce describes three techniques of formalizing semantics:

*Operational*semantics define one or more formal abstract machines, and attempt to exhaustively describe the language constructs in these machines' terms. Emphasis is on proof of various abstract machines' equivalence (and correspondence to actual implementations), and thus proof of an implementation's correctness. When finite state machines are used (this is only possible for simple languages), this is Plotkin's*small-step*or*structural*operational semantics; more powerful languages might make use of Kahn's*big-step*or*natural*semantics.*Denotational*semantics map between different*semantic domains*(the investigation of which constitutes domain theory) using*interpretation functions*. Domains are interesting if results have been proven for their properties, such as termination and safety results. Denotational semantics can thus lead to proof schemas for a language, but meaningful interpretation functions seem to largely require domain-driven language design.- While operational and denotational semantics define the behaviors of programs, and then prove laws about them,
*axiomatic*semantics define the behavior of the language in terms of these laws themselves. As Pierce says: "The meaning of a term is just what can be proved about it."

Van Roy and Hiridi also mention *logical* semantics, which define a statement as a model of logical theory. This is claimed to work well for declarative and logical programming models.

## Sources

- Professor Spencer Rugaber's CS6390 "Theory and Design of Programming Languages". Georgia Tech, Fall 2009.
- Hindley and Seldin.
*Lambda-Calculus and Combinators: An Introduction*, Second Edition. Cambridge University Press, 2008. - Goldberg. "On the Recursive Enumerability of Fixed-Point Combinators". BRICS Report RS-05-1, January 2005.
- Van Roy and Haridi.
*Concepts, Techniques and Models of Computer Programming*. MIT Press, 2004. - Pierce.
*Types and Programming Languages*. MIT Press, 2002. - Culler and Singh.
*Parallel Computer Architecture: A Hardware/Software Approach*. Morgan Kaufman, 1999. - Felleisen. "A Lecture on the
*Why*of*Y*". 1997. - Winskel.
*Formal Semantics of Programming Languages*. MIT Press, 1993. - Hennessy.
*Semantics of Programming Languages*. Wiley, 1990. - Barendregt.
*The Lambda Calculus: Its Syntax and Semantics*, Second Edition. North Holland, 1985. - Böhm and Berarducci. "Automatic Synthesis of Typed Lambda-Programs on Term Algebras".
*Theoretical Computer Science*39, August 1985. - Church.
*The Calculi of Lambda Conversion*. Princeton University Press, 1941.