SML: Part I

Introduction to SML

SML is a strongly-typed functional programming language.

An example

To begin, here is a simple SML declaration:

fun fact n = if n<2 then 1 else n * fact(n-1)
This declaration defines a function named fact which computes the factorial of any nonnegative integer.
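
As a small illustration, the sketch below repeats the declaration and applies fact to a few arguments. The names f0, f1, and f5 are chosen here for illustration only.

```sml
(* The declaration from the text. *)
fun fact n = if n<2 then 1 else n * fact(n-1)

(* Applying fact to a few arguments. *)
val f0 = fact 0    (* 1 *)
val f1 = fact 1    (* 1 *)
val f5 = fact 5    (* 120 *)
```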

Introduction to Part I

In Part I a very simple subset of the SML language is presented in BNF (Backus-Naur Form) notation. This description is useful because it gives prominence to the essence of the language and its most important features, like anonymous functions and let expressions. We also give examples of integers and tuples. Many of the rules of the grammar are simplified.

A slightly more detailed description of SML can be found in another document, SML: Part II. There we introduce the interaction with the user and discuss patterns and user-defined datatypes. Other topics are reserved for later still.

Lexical issues

Reserved words

Comments

Anything between properly nested occurrences of the delimiters (* and *) is considered a comment in SML, and is ignored.
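
The following sketch shows what "properly nested" means in practice: an inner comment does not end the outer one. The bindings x and y are illustrative only.

```sml
(* An ordinary comment. *)
val x = 1 + 2   (* a comment may follow code on the same line *)

(* Comments nest:
   (* this inner comment is closed here *)
   so this line is still part of the outer comment. *)
val y = x * 2
```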

Important parts of the SML grammar

The three most important syntactic categories of the SML language are expressions, declarations, and types. Expressions denote objects like integers, pairs, and functions. Declarations bind objects to names. Types divide expressions into different kinds. Unlike in most languages, the programmer does not need to write types as part of a declaration, but a correct program will only apply functions to arguments of the appropriate types.
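
A minimal sketch of what this means: no types are written below, yet each declaration receives one. The function names double and isSmall are hypothetical, and the types in the comments are what a compiler would be expected to infer.

```sml
fun double n = n + n         (* inferred type: int -> int *)
fun isSmall n = n < 10       (* inferred type: int -> bool *)

val d = double 21            (* 42 *)
val s = isSmall d            (* false, since 42 is not less than 10 *)

(* Rejected by the type checker if uncommented, because double
   expects an int and true is a bool:
   val bad = double true
*)
```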

Expression

The syntactic category expression is the category of all things that denote values in SML.
<expression> ::= <identifier>                             (object name)
<expression> ::= <literal>
<expression> ::= <expression> <expression>                (function application)
<expression> ::= <expression> <identifier> <expression>   (infix application)
<expression> ::= fn <identifier> => <expression>          (anonymous function)
<expression> ::=
    if <expression> then <expression> else <expression>  (conditional expression)
<expression> ::= ( <expression> , ... , <expression> )    (tuple)
<expression> ::= let <declaration> in <expression> end   (let expression)
<expression> ::= ( <expression> )                        (parenthesized)

Declaration

The syntactic category declaration is the category of all things that bind values to names in SML. It is possible to bind names to objects, in particular, functions. In addition, it is possible to bind datatypes (see section II), and exceptions (see section III). The scope of these bindings can be controlled using the local declaration.

<declaration> ::= val <identifier> = <expression>                (value binding)
<declaration> ::= fun <identifier> <identifier> = <expression>   (function binding)
<declaration> ::= local <declaration> in <declaration> end       (local declaration)

Type

Simple unstructured types in SML include int and bool, the types of integer numbers and boolean values, respectively. Structured types include the types of functions and tuples.

<type> ::= int
<type> ::= bool
<type> ::= <type> * ... * <type>
<type> ::= <type> -> <type>
For example, (2,true) has type int*bool. Some other examples follow:
239                 (* has type "int"  *)
3*45                (* has type "int"  *)
not true            (* has type "bool" *)
(3,false,27 div 5)  (* has type "int*bool*int" *)
fn n => n+2         (* has type "int->int" *)

Expressions

Object Name

Identifiers can be used to name objects. For example, if the identifier a has been given the value 3, then the expression a denotes that value. The identifier a may have been bound to the value in a let construct:
let
    val a = 1+2;
in
    a+a  (* "a" denotes the value 3 here *)
end

Another way an identifier can refer to an object is as the formal parameter of a function. For example, in the declaration fun f x = x+x; the x in the body stands for the value 3 when the function f is applied to the expression 1+2.
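
The formal-parameter case can be sketched as follows. The name r is introduced here only to hold the result.

```sml
fun f x = x+x

(* In this application the formal parameter x stands for 3,
   so the body x+x evaluates to 6. *)
val r = f (1+2)
```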

Literal

The value of some objects is based solely on their lexical structure.
<literal> ::= <integer literal>
Boolean literals true and false are predefined identifiers, so we do not consider them literals. Strings and real numbers are discussed elsewhere. We do not include tuples and functions in this syntactic category, as they are important enough to get their own syntactic category.

Integer literals are the same in SML as in most programming languages. Here are some examples:

1
2
34543
1073741823   (* largest int *)
~1073741824  (* smallest int *)
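These integers are limited by the size of the machine word. The limits shown above are those of an implementation with 31-bit integers, where the maximum integer is 1073741823, one less than 2 to the 30th power; other implementations have different limits. Negative integers begin with the tilde symbol.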
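
A short sketch of the tilde notation follows, together with one way to ask an implementation for its own limit. Int.maxInt belongs to the Standard ML Basis Library, which is not covered in Part I, and its value differs between implementations, so no particular result is shown for it.

```sml
val a = ~5 + 3        (* ~2: the tilde marks a negative literal *)
val b = 3 - 5         (* ~2: the minus sign is the subtraction operator *)

(* Int.maxInt is SOME n on fixed-precision implementations and NONE
   on implementations with arbitrary-precision integers. *)
val limit = Int.maxInt
```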

Anonymous Function

The syntax of a function definition is:

fn <identifier> => <expression>
We call the function anonymous because it does not have a name. For example, the integer successor function is:
fn n => n+1
The identifier n is the name we give to the argument to the function. The function itself does not have a name. We say n+1 is the body of the function. This is the expression that is evaluated when the function is applied to an actual argument.

Here is another example:

fn n => n<0

Most of the time we want the functions we define to have names. A name can be bound to any object using the value binding construct.

val succ = fn n => n+1
val neg = fn n => n<0
This combination of a value binding and a function definition is so common it has its own special syntax (see function bindings).
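
As a brief sketch, the two forms below define the same function. The name succ' is hypothetical and is used only so that both versions can coexist.

```sml
val succ  = fn n => n+1     (* value binding of an anonymous function *)
fun succ' n = n+1           (* the same function as a function binding *)

val a = succ 41             (* 42 *)
val b = succ' 41            (* 42 *)
```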

We define functions in order to apply them to arguments. This is the subject of the next section.

Function Application

Function application is not indicated by any markers in the syntax, unlike some languages in which a function call can be spotted by looking for parentheses. Parentheses are often required in SML in order to group elements in the manner desired by the programmer, but they are not part of the syntax of function application. The simple juxtaposition of two expressions signifies function application.
<expression> <expression>
If fact is a name of a function that has already been defined, then the following are examples of function application.
fact 3
fact (2+5)
fact (if 4>0 then 4 else 0)
The expression in the function position does not have to be an identifier, i.e., the name of a function; it may be the function itself, i.e., an anonymous function.
(fn x => x+1) 4
(fn n => n<2) 3
Actually, there are many different kinds of expressions that denote functions.
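
Two such kinds can be sketched using only constructs from Part I: a conditional expression and a let expression, each of which evaluates to a function that is then applied. The names a, b, and k are illustrative.

```sml
(* A conditional expression whose value is a function. *)
val a = (if 4>0 then (fn x => x+1) else (fn x => x-1)) 10    (* 11 *)

(* A let expression whose value is a function. *)
val b = (let val k = 3 in fn x => x*k end) 5                 (* 15 *)
```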

Infix Application

Some identifiers (especially several built-in identifiers) are known by the SML parser to have infix status. These identifiers all denote binary functions, i.e., functions of type X*Y->Z. This means their argument is a pair in which the first element has type X and the second element has type Y. Binary functions that have infix status are applied to their two arguments by writing the name of the function in between the arguments.

<expression> <identifier> <expression>
In this way, SML resembles the customary mathematical notation.
4 mod 27
5 div 7
Symbolic identifiers are treated no differently:
4+1
4*5
5>4
(4+5)<=3*1
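
To make the "argument is a pair" reading concrete, the sketch below applies two infix identifiers both ways. It relies on the op keyword, which is part of SML but not of the Part I subset; op temporarily removes an identifier's infix status so that it can be applied to an explicit pair.

```sml
val a = 4 + 1            (* 5, infix application *)
val b = op + (4, 1)      (* 5, the same function applied to a pair *)

val c = 27 mod 4         (* 3 *)
val d = op mod (27, 4)   (* 3 *)
```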

Conditional Expression

The syntax of the conditional expression is:

if <expression> then <expression> else <expression>
The first expression must evaluate to a boolean value. The other two expressions must have the same type. The else part is not optional.

Some functions return a boolean value and hence can appear in the test of a conditional expression. The functions =, <, <=, >, >=, and <> are all built-in functions returning either true or false.

if n=2  then 1 else 0            (* if n is equal to 2 ... *)
if n<2  then 1 else 0            (* if n is less than 2 ... *)
if n<=2 then 1 else 0            (* if n is less than or equal to 2 ... *)
if n>2  then 1 else 0            (* if n is greater than 2 ... *)
if n>=2 then 1 else 0            (* if n is greater than or equal to 2 ... *)
if n<>2 then 1 else 0            (* if n is not equal to 2 ... *)
Here are some slightly more complex conditional expressions:
if (n - 3) = 0 then 1 else 2*n
if (n mod 3) = 0 then n div 3 else n-1
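
As a worked sketch, the two expressions above can be evaluated once n is bound. The value 9 is an arbitrary choice, and the commented-out line illustrates the rule that both branches must have the same type.

```sml
val n = 9

val a = if (n - 3) = 0 then 1 else 2*n            (* 18 *)
val b = if (n mod 3) = 0 then n div 3 else n-1    (* 3 *)

(* Rejected by the type checker if uncommented: the branches have
   types int and bool, which differ.
   val bad = if n<2 then 1 else false
*)
```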

let Expression

The syntax of the let expressions has the following form:

let <declaration> in <expression> end
The scope of the declaration is between the in and the end. Often the declaration is a value binding. For example, the expression
let val a = 5 in a+6 end
evaluates to 11, since a has the value 5 in the expression a+6. The binding of a to 5 is not visible outside the let construct.

At first, the let expression can appear useless. In fact, it is important for several reasons. It makes programs more readable by introducing names for objects, just as we use the name pi for a certain real number. It helps program organization, because it limits the scope in which the programmer, a reader, or a computer must retain some context. It also allows a complex computation to be performed once and its result used several times. Here is such an example:
let val p = 9973 * 9967 in (p, p+2, p+4) end
The multiplication is done only once.
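
The scope rule can be sketched as follows, assuming an outer binding of a that the let expression temporarily hides. The names b and c are illustrative.

```sml
val a = 1

(* Inside the let, a refers to the inner binding, 5. *)
val b = let val a = 5 in a+6 end       (* 11 *)

(* After end, the inner binding is gone and a is 1 again. *)
val c = a + 6                          (* 7 *)
```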

Tuple

A tuple is a heterogeneous group of objects treated as a unit. Syntactically, a tuple is constructed using parentheses, with the elements separated by commas.
( <expression> , ... , <expression> )
There must be at least two elements in a tuple. Here are some examples:
(2, true)
(0+1, 1+1, 2*1+1)
(1, 2, 3, 4, 5, 6)
An expression with parentheses around it
(1)
is not a 1-tuple, but just a parenthesized expression.
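
A brief sketch of the distinction follows. The names pair, triple, and three are chosen here for illustration.

```sml
val pair   = (2, true)               (* type int*bool *)
val triple = (0+1, 1+1, 2*1+1)       (* type int*int*int, value (1,2,3) *)

(* (1) is just the integer 1, so it can be used in arithmetic. *)
val three = (1) + 2                  (* 3 *)
```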

Parenthesized expressions

Any expression in SML may be parenthesized.
( <expression> )
Operations in SML are grouped according to the usual rules of precedence and associativity. Parentheses can make this grouping explicit or can be used to group operations in a different manner. Thus, the expression
(2+3)*4
evaluates to 20. But
2+3*4
evaluates to 14.
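
The grouping rules can be checked with a few bindings. This is a small sketch; the names a through d are illustrative.

```sml
val a = (2+3)*4       (* 20 *)
val b = 2+3*4         (* 14: multiplication binds tighter than addition *)
val c = 10-4-3        (* 3: subtraction groups to the left *)
val d = 10-(4-3)      (* 9 *)
```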

Declarations

Value binding

Values are given names with a value binding.

<declaration> ::= val <identifier> = <expression>
A value binding is like a constant declaration in other programming languages.

val x = 2;
val y = (2+3)*4;
val z = (x+y, x*y);
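
The comparison with constants can be illustrated by rebinding a name. In the sketch below, a later binding of x introduces a new constant rather than changing the old one, so a value already computed from the earlier x is unaffected. The name z' is hypothetical.

```sml
val x = 2
val y = (2+3)*4          (* 20 *)
val z = (x+y, x*y)       (* (22, 40) *)

(* A later binding of x does not alter z. *)
val x = 100
val z' = z               (* still (22, 40) *)
```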

Function binding

A function binding is used to define a function and give it a name. Its syntax is similar to a value binding.

<declaration> ::= fun <identifier> <identifier> = <expression>
fun succ n = n+1
fun fact n = if n<2 then 1 else n * fact(n-1)
All functions in SML take one argument, but that argument may be a tuple.
fun f (x,y) = 2*x+y
fun g (x,y,z) = (if x<y then (y-x) else (x-y)) * z
fun h (x,y) = (x*y,x+y)
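
As a usage sketch, the three functions can be applied to tuples as follows. The argument values and the names a, b, c, p, and d are arbitrary choices made for illustration.

```sml
fun f (x,y) = 2*x+y
fun g (x,y,z) = (if x<y then (y-x) else (x-y)) * z
fun h (x,y) = (x*y,x+y)

val a = f (3,4)          (* 10 *)
val b = g (2,5,3)        (* 9 *)
val c = h (3,4)          (* (12,7) *)

(* The argument can be any expression of the right tuple type. *)
val p = (3,4)
val d = f p              (* 10 *)
```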

Local Declaration

A local declaration has a syntax that is much like the let construct:
<declaration> ::= local <declaration> in <declaration> end
Both constructions introduce a scope in which a private declaration may be used. The difference is that the local construct is itself a declaration, i.e., it introduces bindings, whereas the let construct is an expression, i.e., it evaluates to a value like an integer or a string. Here is an example of a local declaration:
local
   val c = 299792458;     (* velocity of light in a vacuum (m/sec) *)
in
   fun energy (mass) = mass * c * c;
end;
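
The visibility rule can be sketched with smaller numbers. The names k, triple, and t are hypothetical; the private binding k is usable between local and end, while only triple remains visible afterwards.

```sml
local
   val k = 3                    (* private: visible only up to end *)
in
   fun triple n = k * n         (* visible after end *)
end

val t = triple 7                (* 21 *)

(* Rejected if uncommented, because k is not in scope here:
   val bad = k + 1
*)
```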

Exercises

Which of the following are syntactically legal expressions or declarations?
if n<2 then 1 else n-1
(fn b => if b then 3)
(fn x => x+x) (2*8)
(fn x => x) (2,3,4)
5
5 mod 15
5 mod 15 div 3
let two = 2 in two * 45 end
What is the type of each of the following?
(fn x => x+37)
let val x = 2 in x*x end
(1+1)
let val x = 2 in fn y => y+x*x end
Ryan Stansifer <ryan@cs.fit.edu>
Last modified: Thu Apr 11 09:13:16 EST 1996