Update CHANGELOG
josevalim committed Jul 2, 2024
1 parent 692b13d commit c08a9b3
Showing 3 changed files with 14 additions and 7 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -8,12 +8,23 @@ This release no longer supports WERL (a graphical user interface for the Erlang

#### Elixir

* [CLI] Add experimental PowerShell scripts for `elixir`, `elixirc`, and `mix` on Windows. These provide a safer entry point for running Elixir from other platforms
* [Enumerable] Add `Enum.product_by/2` and `Enum.sum_by/2`
* [Exception] Add `MissingApplicationsError` exception to denote missing applications
* [Kernel] Update source code parsing to match the latest recommendations of [UTS #55](https://www.unicode.org/reports/tr55/). In particular, mixed scripts are allowed in identifiers as long as they are separated by underscores (`_`), such as `http_сервер`. Previously allowed highly restrictive identifiers, which mixed Latin and other scripts (such as the Japanese word for t-shirt, `Tシャツ`), now require the underscore as well
* [Kernel] Warn on bidirectional confusability in identifiers
* [Macro] Improve `dbg` handling of `if/2`, `unless/2`, and code blocks
* [Process] Handle arbitrarily high integer values in `Process.sleep/1`
* [String] Inspect special whitespace and zero-width characters using their Unicode representation
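
A minimal sketch of how the new `Enum` functions are called, assuming Elixir 1.18 or later (the first release that ships them). Both take an enumerable plus a mapper function; the `line_items` data is made up for the example.

```elixir
# Hypothetical order lines; the field names are illustrative only.
line_items = [
  %{name: "pen", price: 2, quantity: 3},
  %{name: "notebook", price: 5, quantity: 2}
]

# Enum.sum_by/2 applies the mapper to each element and sums the results,
# equivalent to Enum.map/2 followed by Enum.sum/1 without building an
# intermediate list.
total = Enum.sum_by(line_items, fn item -> item.price * item.quantity end)

# Enum.product_by/2 does the same but multiplies the mapped values.
quantity_product = Enum.product_by(line_items, & &1.quantity)

IO.puts("total=#{total} quantity_product=#{quantity_product}")
```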

#### ExUnit

* [ExUnit] Support parameterized tests on `ExUnit.Case`
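
A minimal sketch of a parameterized test module, assuming Elixir 1.18+ and the `:parameterize` option of `use ExUnit.Case`, which takes a list of maps whose keys are merged into each test's context. The `GreetingTest` module and its `greet/1` helper are hypothetical. The script starts ExUnit with `autorun: false` so it can call `ExUnit.run/0` explicitly.

```elixir
# Start ExUnit without autorun so the script can trigger the run itself.
ExUnit.start(autorun: false)

defmodule GreetingTest do
  # Each map in :parameterize is merged into the test context, so every
  # test in this module runs once per map.
  use ExUnit.Case,
    async: true,
    parameterize: [
      %{lang: :en, expected: "hello"},
      %{lang: :pt, expected: "olá"}
    ]

  # Hypothetical function under test, defined inline for the example.
  defp greet(:en), do: "hello"
  defp greet(:pt), do: "olá"

  test "greets in the configured language", %{lang: lang, expected: expected} do
    assert greet(lang) == expected
  end
end

# ExUnit.run/0 returns a summary map with :total, :failures, :skipped and :excluded.
results = ExUnit.run()
```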

#### IEx

* [IEx] Add `:dot_iex` support to `IEx.configure/1`

### 2. Bug fixes

### 3. Soft deprecations (no warnings emitted)
8 changes: 1 addition & 7 deletions lib/elixir/pages/references/unicode-syntax.md
@@ -17,7 +17,7 @@ The characters allowed in identifiers are the ones specified by Unicode. General

Elixir imposes many restrictions on identifiers for security purposes. For example, the word "josé" can be written in two ways in Unicode: as the combination of the characters `j o s é` and as a combination of the characters `j o s e ́ `, where the accent is its own character. The former is called the NFC form and the latter the NFD form. Elixir normalizes all characters to the NFC form.
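
The two spellings of "josé" can be compared directly with `String.normalize/2` from the standard library. This is only an illustration of NFC versus NFD, not of the tokenizer's own normalization code.

```elixir
# NFC: "josé" with a single precomposed code point, é (U+00E9).
nfc = "jos\u00E9"
# NFD: "jose" followed by a combining acute accent (U+0301).
nfd = "jose\u0301"

# Both render as "josé", but their bytes differ...
same_bytes = nfc == nfd
# ...until the NFD string is normalized to NFC.
same_after_nfc = String.normalize(nfd, :nfc) == nfc

IO.inspect(%{
  same_bytes: same_bytes,
  same_after_nfc: same_after_nfc,
  byte_sizes: {byte_size(nfc), byte_size(nfd)},
  graphemes: {String.length(nfc), String.length(nfd)}
})
```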

Elixir also disallows mixed-scripts in most scenarios. For example, it is not possible to name a variable `аdmin`, where `а` is in Cyrillic and the remaining characters are in Latin. Doing so will raise the following error:
Elixir also disallows mixed-scripts which are not explicitly separated by `_`. For example, it is not possible to name a variable `аdmin`, where `а` is in Cyrillic and the remaining characters are in Latin. Doing so will raise the following error:

```text
** (SyntaxError) invalid mixed-script identifier found: аdmin
@@ -34,12 +34,6 @@ Make sure all characters in the identifier resolve to a single script or a highly
restrictive script. See https://hexdocs.pm/elixir/unicode-syntax.html for more information.
```
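
One way to reproduce the error above without evaluating any code is `Code.string_to_quoted/1`, which returns `{:error, reason}` instead of raising. The exact shape of `reason` varies between Elixir versions, so this sketch only matches on the `:error` tag.

```elixir
# "\u0430" is CYRILLIC SMALL LETTER A; "dmin" is Latin, so this single
# identifier mixes two scripts with no `_` between them.
source = "\u0430dmin = 1"

# Tokenize and parse only; nothing is evaluated.
result = Code.string_to_quoted(source)

case result do
  {:ok, _ast} -> IO.puts("parsed")
  {:error, reason} -> IO.puts("rejected: " <> inspect(reason))
end
```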

The characters must be either all Cyrillic or all Latin. The only mixed scripts that Elixir allows, according to the Highly Restrictive Unicode recommendations, are:

* Latin and Han with Bopomofo
* Latin and Japanese
* Latin and Korean

Finally, Elixir will also warn on confusable identifiers in the same file. For example, Elixir will emit a warning if you use both variables `а` (Cyrillic) and `а` (Latin) in your code.

That's the overall introduction of how Unicode is used in Elixir identifiers. In a nutshell, its goal is to support different writing systems in use today while keeping the Elixir language itself clear and secure.
2 changes: 2 additions & 0 deletions lib/elixir/unicode/tokenizer.ex
@@ -509,6 +509,8 @@ defmodule String.Tokenizer do
end
end

# Support script mixing via chunked identifiers (UTS 55-5's strong recommendation).
# Each chunk in an ident like foo_bar_baz should pass checks.
defp chunks_single?(acc),
do: chunks_single?(acc, @top)
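
The comment above says each `_`-separated chunk of an identifier must pass the script checks on its own. A toy model of that idea follows. `ChunkedScriptCheck` is hypothetical, and its classifier knows only ASCII letters, ASCII digits, and the Cyrillic block (U+0400 to U+04FF), treating everything else as one catch-all `:other` script. The real `String.Tokenizer` relies on Unicode script data and its own per-chunk rules, which this sketch does not reproduce.

```elixir
defmodule ChunkedScriptCheck do
  # Hypothetical, deliberately tiny script classifier (not Unicode data).
  defp script(c) when c in ?0..?9, do: :common
  defp script(c) when c in ?a..?z or c in ?A..?Z, do: :latin
  defp script(c) when c in 0x0400..0x04FF, do: :cyrillic
  defp script(_c), do: :other

  # An identifier passes when every `_`-separated chunk uses at most one
  # script; digits (:common) never count against a chunk.
  def valid?(identifier) when is_binary(identifier) do
    identifier
    |> String.split("_", trim: true)
    |> Enum.all?(&single_script?/1)
  end

  defp single_script?(chunk) do
    scripts =
      chunk
      |> String.to_charlist()
      |> Enum.map(&script/1)
      |> Enum.reject(&(&1 == :common))
      |> Enum.uniq()

    length(scripts) <= 1
  end
end

for ident <- ["foo_bar_baz", "http_\u0441\u0435\u0440\u0432\u0435\u0440", "\u0430dmin", "T_シャツ", "Tシャツ"] do
  IO.puts("#{ident}: #{ChunkedScriptCheck.valid?(ident)}")
end
```

With this toy classifier, `аdmin` and `Tシャツ` are rejected because a single chunk mixes scripts, while `http_сервер` and `T_シャツ` pass because each chunk stays within one script.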
