Syntax
Unicode
The decoding algorithm is as follows: Start with bytes. Decode them as UTF-8, replacing invalid bytes/characters with Unicode’s REPLACEMENT CHARACTER U+FFFD. Normalize the input to NFC, warning if the input isn’t already normalized. Then apply the following algorithms, in order, to perform lexical analysis:
line-breaking (specifically, lines end at hard / mandatory breaks)
word-breaking to split up lines into tokens
extended grapheme clustering to split up tokens into characters
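The first stages of this pipeline can be sketched in Python. This is only an illustration under stated assumptions, not Stroscot's implementation: the name `decode_source` is hypothetical, and the set of hard breaks is taken to be the UAX #14 classes BK, CR, LF and NL, with CR LF counted as one break. Word-breaking and extended grapheme clustering (UAX #29) are not in the Python standard library, so they are left out here.

```python
import re
import unicodedata
import warnings

# Assumption: "hard / mandatory breaks" means UAX #14 classes BK, CR, LF, NL.
# CR LF is listed first so that it is consumed as a single break.
_HARD_BREAK = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def decode_source(data: bytes) -> list[str]:
    """Decode bytes, normalize to NFC, and split into lines (hypothetical helper)."""
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error.
    text = data.decode("utf-8", errors="replace")
    # Warn if the input is not already in NFC, then normalize it.
    if not unicodedata.is_normalized("NFC", text):
        warnings.warn("input is not NFC-normalized")
        text = unicodedata.normalize("NFC", text)
    # Split at hard breaks only; the break characters themselves are dropped.
    return _HARD_BREAK.split(text)
```

For example, `decode_source(b"a\xffb\r\nc")` returns `["a\ufffdb", "c"]`: the invalid byte is replaced and the CR LF pair ends the first line. A real lexer would then run word-breaking on each line and grapheme clustering on each token.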
Stroscot is case-sensitive: it performs no case transformations on identifiers and compares graphemes literally. However, the grammar makes no syntactic distinctions based on case.
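As a rough illustration of literal comparison, the sketch below compares two identifiers after NFC normalization and with no case folding. The function name `same_identifier` is hypothetical, and the sketch assumes that comparing NFC-normalized strings code point by code point is equivalent to comparing their graphemes literally.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare identifiers literally after NFC normalization (hypothetical helper)."""
    # No case folding is applied, so "Foo" and "foo" stay distinct.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Under these assumptions, `same_identifier("Foo", "foo")` is false, while `same_identifier("e\u0301", "\u00e9")` is true because both spellings normalize to the same precomposed character.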