Ha.nnes.dev

Why ASCII is worse and when to use it

🔤

Also, announcing roc-ascii

2024-09-04

I recently released an ASCII library for the Roc language, but why? Isn’t ASCII worse than UTF-8?

Why ASCII is worse than UTF-8

Why you might want to use ASCII

Except for the control characters

The first 32 ASCII characters are mostly non-printable characters like the “end of transmission block” character or the “bell” character (which used to ring a physical bell on teleprinters). The control characters most commonly found today are the “null” character, which terminates strings in languages like C; the “horizontal tab” character (\t), which is displayed as horizontal whitespace; the “line feed” character (\n), which starts a new line; and the “carriage return” character (\r), which on UNIX systems returns the cursor to the start of the line and on Windows is combined with the line feed character (\r\n) to separate lines of text. All the control characters can appear in both UTF-8 and ASCII strings, and they can undermine some of the benefits of ASCII mentioned earlier. For example, when the ASCII string abc␇␇␇ is rendered in a terminal, it looks like it contains three characters, but the three bell characters bring the total length to six.
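The bell example can be checked with a few lines of code. The sketch below uses Python rather than Roc, and the helper names `is_control` and `visible_length` are made up for illustration; they are not part of roc-ascii. It treats codes 0 to 31 plus DEL (127) as control characters, which is only a rough stand-in for what a terminal actually draws, since tabs and newlines do affect layout.

```python
def is_control(ch: str) -> bool:
    """True for ASCII control characters: codes 0-31 and DEL (127)."""
    code = ord(ch)
    return code < 32 or code == 127


def visible_length(text: str) -> int:
    """Count the characters that are not ASCII control characters."""
    return sum(1 for ch in text if not is_control(ch))


# "abc" followed by three bell characters (code 7)
bell_string = "abc" + "\x07" * 3

print(bell_string.isascii())        # True: every character is in the ASCII range
print(len(bell_string))             # 6: the bells count towards the length
print(visible_length(bell_string))  # 3: only "abc" is drawn
```

A string can therefore pass an "is this ASCII?" check and still have a length that disagrees with what appears on screen, so a program that relies on visual width may want to reject or strip control characters as a separate step.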

TLDR: If you know that your data will only ever contain characters in the ASCII range, then using ASCII will probably be simpler. However, using ASCII doesn’t remove all complexity from string handling, so you still need to be careful.