Description
Currently a symbol (e.g. in Var(symbol v)) is printed as 2 x, meaning it's the symbol table ID 2, variable "x" from that table. For example, (Var 2 x).
This complicates parsing, because a symbol requires parsing two items instead of just one. It also looks inconsistent: is it a symbol (owning) or a pointer to a symbol (non-owning)?
The best idea proposed so far to fix this is to introduce:
symref = SymbolRef(int symtab_id, identifier symnym)
expr =
...
Var(symref v)
And then the current (Var 2 x) becomes (Var (SymbolRef 2 x)).
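To illustrate the parsing point, here is a minimal sketch of a generic S-expression reader in Python. It is not the actual ASR parser; the helper names tokenize and parse are hypothetical. It only shows that the current form gives Var two arguments for a single symbol field, while the proposed form gives it exactly one.

```python
def tokenize(text):
    # Pad parentheses with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Recursive-descent reader: a parenthesized form becomes a nested list,
    # a run of digits becomes an int, anything else stays a string.
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return int(tok) if tok.isdigit() else tok


# Current printout: the single field `v` is spread over two items.
old = parse(tokenize("(Var 2 x)"))

# Proposed printout: the single field `v` is one nested item.
new = parse(tokenize("(Var (SymbolRef 2 x))"))
```

With the proposed form, a reader can map each field of a node to exactly one parsed item, without special-casing symbol.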
See #1492 (comment) for more details.
This cleans up the abstract representation of ASR: now symbol is always owning, and SymbolRef is always non-owning.
Important: we need to preserve the current speed. In C++, SymbolRef would therefore always be replaced with just a pointer to the symbol, so the generated C++ code would be equivalent to what it is today, with no slowdown. Conceptually, though, the pointer to symbol x is the same as (SymbolRef 2 x). In settings that do not have pointers, such as printing or serializing (or languages like Clojure), one can use (SymbolRef 2 x) directly.
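The equivalence between an in-memory pointer and a printed SymbolRef can be sketched as follows. This is an illustrative Python model, not the generated C++ code: the classes Symbol and Var, the symtab_id field on Symbol, and the helpers serialize and resolve are all assumptions made for the example, with a Python object reference standing in for the C++ pointer.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Symbol:
    name: str
    symtab_id: int  # assumption: a symbol knows the ID of its owning table


@dataclass(eq=False)
class Var:
    v: Symbol  # non-owning reference, playing the role of the C++ pointer


def serialize(var):
    # Without pointers, the reference is written out as (SymbolRef id name).
    return f"(Var (SymbolRef {var.v.symtab_id} {var.v.name}))"


def resolve(tables, symtab_id, name):
    # Loading turns (SymbolRef id name) back into a direct reference.
    return Var(tables[symtab_id][name])


# Symbol table 2 owns the symbol "x".
tables = {2: {"x": Symbol("x", 2)}}

var = Var(tables[2]["x"])        # in memory: just a reference to the symbol
text = serialize(var)            # printed form: "(Var (SymbolRef 2 x))"
loaded = resolve(tables, 2, "x")  # back to a reference to the same object
```

In this model the lookup through the symbol table happens only when loading or printing. Accessing var.v in memory never goes through the table, which mirrors the intent of keeping the C++ representation a plain pointer.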
Other ideas, just for printing:
(Var (2 x))
(Var {2 x})
However, our experience suggests that ASR printouts are better served by longer, descriptive names than by shortcuts, so that it is more obvious to newcomers what the printout represents. The Clojure-like ASR printout is NOT used in performance-critical parts of the code. For those we use a binary representation, where we can eventually implement every trick available to keep it as compact as possible, or alternatively to deserialize it as quickly as possible.
Related issues: