Values
In Stroscot values are defined to be expressions that are in normal form (strongly reduced), i.e. they evaluate to themselves. WHNF is not sufficient to ensure a value, e.g. [1,undefined]
reduces to undefined
hence is not a value. Traditionally a function is only defined on values, but lazy evaluation allows functions to produce useful behavior for non-values as well.
Values are immutable (as Rich Hickey says) and have notions of equality, hashing, literal syntax, and deconstruction. In terms of memory management values can be copied freely, and discarded if they are no longer needed.
For convenience, “value” really describes the equivalence class of expressions that reduce to a value. In particular here we describes values by their concise syntactic form. These syntactic forms typically are reducible terms, rather than normal forms - usually they reduce to a value of the compiler’s preferred data structure implementation. Programs may define the syntax here to other values.
Symbols
Symbols or identifiers are unique values. Bare symbols can be either ordinary or extended, vaguely like Raku. However symbols defined in modules are qualified to that module, hence more specific.
symbol
underscore_symbol
unícσdє-symβol
(++)
@"a long symbol"
Ordinary
An ordinary identifier matches the pattern alphanum (alphanum|((apostrophe|hyphen)alphanum))*
, i.e. a sequence of digits/letters/underscores and isolated/embedded apostrophes or hyphens.
“Alphabetic” means Unicode General Category value Letter (L), and the underscore _. “Numeric” includes characters with the Unicode General Categories value Number and Decimal Digit (Nd).
Case is significant, thus foo
, Foo
, and FOO
are distinct identifiers.
Extended
Extended symbols use an escape sequence to allow more freedom. They have the form @"sym str"
(from Zig), which allows spaces etc. following the syntax for a string. An extended symbol matches a corresponding ordinary symbol if that form exists, i.e. null == @"null"
.
In addition parsing rules for operator symbols may be suppressed with parentheses. For example (+)
is equivalent to @"+"
. There is also the empty parentheses symbol ()
, called “unit”.
Qualified
A qualified symbol is a bare symbol together with a module reference. A literal looks like qsym <module> sym123
. Normally you use the period operator .
and write module.sym
, but the period operator is a function not a constructor.
Examples of qualified symbols include the predefined symbols of the prelude, like null
, true
, and false
- the latter two forming the boolean type.
It is key to support side-by-side execution of multiple versions of the same functionality. So the module part of a qualified name includes a version number and a cryptographic hash.
Namespacing
Identifiers can be qualified by periods: a.b.c
. .
is an infix left-associative operator that binds tighter than juxtaposition. This gets resolved to a nondeterministic set of qualified names.
Term
A term is a symbol applied to other values.
tree (combine (leaf 1 2) (leaf 1 2))
(++++) a b
(*) (some (weird thing)) 12
Note that if there is an appropriate syntax rule the second example could also be written as a ++++ b
, but it would be an equivalent value.
Terms subsume algebraic data types, since an ADT value is also a symbol applied to other values. An ADT is a “free” term that has no reduction rules defined, or conversely a term is an “untyped” ADT data constructor with no restrictions on its arguments.
Term application also includes the other function syntax - keyword arguments, implicit arguments, and so on. This argument syntax is not usually reduced away, so is part of the term’s value.
Lists
A list represents an ordered sequence of values that is empty or finite, but not infinite (only terms can be infinite).
Basic list syntax is the usual syntactic sugar for list values.
[] // empty list
arr = [a, b, c]
arr
translates to a Lisp-like term list a b c
, making use of the term syntax’s capability to take variadic arguments. Heterogeneous lists are possible, list 1 "x" (int32 3)
.
Haskell’s cons operator is abandoned in favor of concatenation:
xs ++ ys // concatenation
[x] ++ xs // list with head element ``x`` and tail list ``xs``, like cons
++
is just a symbol, so you can write [1] ++ 2
for example, i.e. the term (++) (list 1) 2
.
A list has push and pop operations from head and tail so can be used as a stack, queue, or double-ended queue (dequeue).
Tuple
In Stroscot tuple is synonymous with list - they’re both immutable. There’s not a different type with different semantics like in Python or Pure. You can use the tuple syntax (a,b)
in place of list syntax [a,b]
whenever convenient.
Arrays
(Immutable) arrays are lists together with an indexing scheme. The indexing scheme specifies the length of the list and how index values map to integer indexes of the list. For example array (range_inclusive 1 3) [1,2,3]
defines a 1-based array where arr[i] = i
. Maybe there is also an element type, typed_array int32 (range_inclusive 1 3) [1,2,3]
Mutable arrays are a reference pointing to an immutable array. Operations are optimized by the memory system, so it does in-place operations where possible but can still resize the array. Conceptually you are doing (read arr)[0]
to get the first element, i.e. taking an immutable snapshot and then reading/modifying it. This is hidden normally because arr[0]
and arr[0] := 1
are overloaded to read/write mutable arrays.
There is also an array of mutable cells (bytes), similar to C pointers / arrays. You can do something like readOffset Int 0 ptr
. You can read a different type than you wrote, and it doesn’t have to be aligned (although aligned accesses may be faster depending on architecture). This type is useful for low-level munging but mutable arrays are probably safer.
[TS85] page 73 says that “allowing the size of arrays to be decided at run time […] introduces considerable implementation problems and interferes with compile-time error checking. This feature may be of only limited value in certain applications areas.” But Stroscot is based on an interpeter model - so the only time the size of an array could be decided is at run-time. Most languages these days have dynamically-sized arrays.
Tensors
Tensors are just nested lists, e.g. here is a (3,2,5)-sized rank 3 tensor:
[[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]]
If you want to save a bit of bracket typing you can use reshape
on a flat list:
reshape (3,2,5)
[0, 1, 2, 3, 4,
5, 6, 7, 8, 9,
10, 11, 12, 13, 14,
15, 16, 17, 18, 19,
20, 21, 22, 23, 24,
25, 26, 27, 28, 29]
Or similarly use a 3D array:
array (range 0 3, range 0 2, range 0 5) [0,1,2,...,29]
There is also a matrix
DSL which turns semicolons into rows.
matrix [1,2;3,4]
# [[1,2],[3,4]]
Strings
A string is a sequence of bytes of a given length. Subtypes include null-terminated strings like C and UTF-8 encoded strings.
"Hello world!\n"
``Hello user ${id}``
[Enclosed text]
'string'
""" multiline
string"""
Double and single quotes are both supported, as well as a multi-line syntax. Escape sequences are defined:
\newline Backslash and newline ignored
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\v ASCII Vertical Tab (VT)
\0 null byte
\ooo Byte with octal value ooo
\xhh Byte with hex value hh
\N{name} Codepoint with name, abbreviation or alias 'name' in the Unicode database
\nnnn Codepoint with decimal value nnnn. The maximum value of a codepoint is 1114111.
\uxxxx Codepoint with hex value xxxx. The maximum value is hexadecimal 10ffff.
\& Backslash and ampersand ignored. The purpose of this escape sequence is to make it possible to write a numeric escape followed immediately by a regular ASCII digit.
\^[@A-Z[\\]^_] caret control code notation (does anyone use?)
There is also a binary/hex literal syntax to abbreviate \xAA\xBB\xCC
as 0xAABBCC
: We allow various base prefixes - 0x
(hexadecimal), 0o
(octal), 0d
(decimal) and 0b
(binary). The decimal base expands to the shortest binary string that can contain that decimal. So for example 0d6 = 0b110 = bits [1,1,0]
.
base = 0[a-z]
digit = [0-9a-fA-F_]
data = base digit+
Characters
There is no explicit syntax for characters, instead a character is a Unicode string containing exactly one grapheme cluster. Unicode provides an algorithm for identifying grapheme clusters in UAX #29. The main notable feature of the algorithm is that a grapheme cluster / character is not just a single Unicode code point and may be arbitrarily long due to the use of combining characters/accents and ZWJs. For example, “G” + grave-accent is a character represented by two Unicode code points, and emojis similarly have lots of code points, as does Zalgo text. Hence a character is in general an arbitrary length sequence of codepoints and it is simplest and most correct to define a character as a type of string.
Bitvectors
Most data in a computer simply sits in storage and has no easily accessible interpretation. It is simply a sequence of bits. As such Stroscot provides bitvector values to represent binary data.
The normal form of a bitvector is just the symbol bits
applied to a list of bits, bits [1,0,1]
. The symbol marks that the list should be stored compactly.
A more compact way to write a bitvector is via a string bits "abcd\x0F"
. This syntax uses UTF-8 characters and hexadecimal escapes, but is limited to expressing bitvectors whose length is a multiple of 8.
Date/time
Date/time values are written using symbols applied to strings, lists, or records using ISO 8601 style formats, e.g. instant "2011-12-03T10:15:30.999999999Z"
, gregorianDate [2010,12,03]
, or time { hour = 10, minute = 10, second = 12.3 }
. This hides all internal representation details. Internally there is a more compact form, e.g. a 128-bit number.
Records
Records are like C structs or Python dictionaries. The order of the fields is remembered, so this data type is a list of key-value pairs.
rec = {a = 1, b = 2, c = 3}
rec.a # 1
rec[a] # 1
{a = x} = rec # x = 1
{a,b} = rec # a = 1, b = 2
# record update
rec // {b=4, d = 4}
# {a = 1, b = 4, c = 3, d = 5}
Once you get to four values, it is best to make a record with named entries instead of using a tuple.
Maps
Maps are the same as records except the fields are not ordered (set of pairs).
map {a = 1, b = 2, c = 3}
Multimap
A multimap is a map where the values are nonempty bags.
multimap {a = 1, a = 1, b = 2, c = 2, c = 3}
-- same as
map {a = bag [1,1], b = bag [2], c = bag [2,3]}
Sets
Sets are the mathematical definition, i.e. a function isElemOf : Any -> {Present|Absent}
. They may be specified by logical formulas. Finite sets may be specified as lists with no repeated values, similar to a map whose values are all the symbol Present
.
universalSet = set (\_ -> Present)
a = set [1,2,3]
-- equivalent to
b = map { 1 = Present, 2 = Present, 3 = Present }
a = set (\x -> lookup {default=Absent} b x)
More notation for sets is discussed on the Sets page.
Bags
Bags are unordered multisets, similar to a map whose values are nonnegative integers.
bag [1,1,2,3]
Priority queue
This is a bag plus an ordering operation.
Lambdas
Lambdas are first-class and hence values. Equality is determined by alpha beta eta equality (i.e., beta reduce to normal form, eta reduce, and compare modulo alpha equivalence).
Modules
Modules are also first class, they are discussed in their own page.
Infinite values
Sometimes it is useful to deal with values that are solutions to a system of equations, like let x=cons 1 x in x
. These are also values. For terms with no reduction rules, there is a way to compute the (unique) minimal graph representation, where loops in the graph represent infinite cyclic repetitions. There are also infinitely reducible expressions, e.g. let fib x y = cons x (fib y (x+y)) in fib 0 1
requires infinitely many reductions to reduce to an infinite normal form (infinite list value).
Rewriting system
A rewriting system consists of a set of rewrite rules. They are defined over a fixed abstract rewriting system called the “substitution calculus” consisting of the proofs from Stroscot’s core logic, where reduction is cut elimination. Free variables etc. are incorporated by extending the Use
rule. Terms are representatives of equivalence classes of proofs under <->*
of the substitution calculus. Contexts are similarly representatives of precontexts.
A (conditional) rewrite rule has the form l -> r | C1, ..., Cn
where l
and r
are both terms. The conditions take the form of predicates Pi(x1, ..., xm, ->)
, where the xi
are the free variables of l
and r
, and ->
is the rewrite relation of the system. An unconditional rewrite rule l -> r
is one where the conditions Ci
are always true. Example predicates are:
type predicates, term must be of a certain form
a
joins with, rewrites to, or is convertible tob
A term M
rewrites to a term N
by a rewrite rule l -> r | Ci
if, for some context C
with one hole, and substitution σ
, the propositions M <->* C[l /. σ]
, C[r /. σ] <->* N
, and Ci /. σ
all hold, where C[l]
means C
with the hole substituted by l
, and <->*
is the relation of the substitution calculus.
Pointers
Pointers are just a wrapper for particular bit patterns (integers), like pointer 0xdeadbeef
. You can do integer arithmetic and turn it into a pointer, but at least on x86-64 not all 64-bit integers are valid pointers.
References
References are like pointers but use symbols instead of integers, we’ll go with Ref r123
for syntax where r123
is a symbol. The main difference from a pointer is that you can’t do arithmetic on symbols. Most symbols are autogenerated inside the reference creation operation ref
, but you can also write reference values directly. This is mainly for convenience in debugging at the REPL, since fixed symbols are tantamount to global variables and hence are bad programming practice.
Postfix ++ and – are statements
Data Structures
Arrays
Array
Bit array
Bit field
Bitboard
Bitmap
Circular buffer
Control table
Image
Dope vector
Dynamic array
Gap buffer
Hashed array tree
Lookup table
Matrix
Parallel array
Sorted array
Sparse matrix
Iliffe vector
Variable-length array
Lists
Singly/Circular/Doubly Linked list
Array list
Association list
Self-organizing list
Skip list
Unrolled linked list
VList
Conc-tree list
Xor linked list
Zipper
Doubly connected edge list also known as half-edge
Difference list
Free list
Trees
Binary trees
AA tree
AVL tree
Binary search tree
Binary tree
Cartesian tree
Conc-tree list
Left-child right-sibling binary tree
Order statistic tree
Pagoda
Randomized binary search tree
Red–black tree
Rope
Scapegoat tree
Self-balancing binary search tree
Splay tree
T-tree
Tango tree
Threaded binary tree
Top tree
Treap
WAVL tree
Weight-balanced tree
B-trees
B-tree
B+ tree
B*-tree
Dancing tree
2–3 tree
2–3–4 tree
Queap
Fusion tree
Bx-tree
Heaps
Heap
Binary heap
B-heap
Weak heap
Binomial heap
Fibonacci heap
AF-heap
Leonardo heap
2–3 heap
Soft heap
Pairing heap
Leftist heap
Treap
Beap
Skew heap
Ternary heap
D-ary heap
Brodal queue
Bit-slice trees - each tree node compares a bit slice of key values.
Radix tree (compressed trie), Patricia tree
Bitwise trie with bitmap
Suffix tree
Suffix array
Compressed suffix array
FM-index
Generalised suffix tree
B-tree
Judy array
X-fast trie
Y-fast trie
Merkle tree
Multi-way trees
Ternary tree
Ternary search tree
K-ary tree
And–or tree
(a,b)-tree
Link/cut tree
SPQR-tree
Spaghetti stack
Disjoint-set data structure (Union-find data structure)
Fusion tree
Enfilade
Exponential tree
Fenwick tree
Van Emde Boas tree
Rose tree
Space-partitioning trees
Segment tree
Interval tree
Range tree
Bin
K-d tree
Implicit k-d tree
Min/max k-d tree
Relaxed k-d tree
Adaptive k-d tree
Quadtree
Octree
Linear octree
Z-order
UB-tree
R-tree
R+ tree
R* tree
Hilbert R-tree
X-tree
Metric tree
Cover tree
M-tree
VP-tree
BK-tree
Bounding interval hierarchy
Bounding volume hierarchy
BSP tree
Rapidly exploring random tree
Application-specific trees
Abstract syntax tree
Parse tree
Decision tree
Alternating decision tree
Minimax tree
Expectiminimax tree
Finger tree
Expression tree
Log-structured merge-tree
Hash-based structures
Bloom filter
Count–min sketch
Distributed hash table
Double hashing
Dynamic perfect hash table
Hash array mapped trie
Hash list
Hash table
Hash tree
Hash trie
Koorde
Prefix hash tree
Rolling hash
MinHash
Quotient filter
Ctrie
Graphs
Graph
Adjacency list
Adjacency matrix
Graph-structured stack
Scene graph
Decision tree
Binary decision diagram
Zero-suppressed decision diagram
And-inverter graph
Directed graph
Directed acyclic graph
Propositional directed acyclic graph
Multigraph
Hypergraph
Other
Lightmap
Winged edge
Quad-edge
Routing table
Symbol table
Piece table