Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird behavior in accessors after deleting an instvar #122

Closed
MariusDoe opened this issue Jul 10, 2024 · 12 comments
Closed

Weird behavior in accessors after deleting an instvar #122

MariusDoe opened this issue Jul 10, 2024 · 12 comments
Labels
base system [SCOPE] Squeak's basic (language) concerns such as Kernel, Collections, Graphics, Network ignored [RATING] No longer valid or necessary. programmer [SCOPE] It is for the (maybe professional) programmer

Comments

@MariusDoe
Copy link

I have noticed a weird behavior that occurs when you delete an instvar from a class, but keep the accessors for it and call them.

Steps to reproduce

  1. Create a class Dummy
  2. Add an instvar myVar
  3. Add accessors myVar and myVar (^ myVar and myVar := anObject, respectively)
  4. Delete the instvar myVar
  5. Open a workspace
  6. Execute Dummy new myVar: 42.
  7. Print Dummy new myVar (note: this is a new instance, do not keep the instance from the previous step)

Expected behavior

One of the following:

  • An error is shown in step 4 (something like The instVar myVar is still referenced in these 2 methods: ...)
  • An error is shown in step 6 (something like The instVar myVar does not exist)
  • nil is printed in step 7

Actual behavior

42 is printed in step 7

Analysis

The problem occurs because after step 4, the accessors use the bytecodes pushLitVar: and popIntoLit:, with the referenced literal being the global Undeclared associationAt: #myVar. This results in the myVar: call storing the 42 into the Association's value, which the myVar call on an unrelated instance will read, because the method references the same (global) Association in its literals.

I have traced the reason for the use of these bytecodes:

  1. Class>>compileAllFrom: is called during step 4
  2. During parsing, Parser>>variable is called
  3. It doesn't find the variable myVar in scope
  4. It calls Parser>>correctVariable:interval:
  5. Because the recompilation of all methods is not interactive, Encoder>>undeclared: is called
  6. Encoder>>global:name: is called with Undeclared associationAt: #myVar
  7. A LiteralVariableNode with the association as its key is returned
  8. Afterwards, either an AssignmentNode (for the setter) or a ReturnNode (for the getter) is created
  9. After parsing, MethodNode>>generate: is called
  10. In the setter case, AssignmentNode>>emitCodeForEffect:encoder: calls LiteralVariableNode>>emitCodeForStorePop:encoder:, which calls Encoder>>genStorePopLiteralVar:
  11. In the getter case, ReturnNode>>emitCodeForValue:encoder: (indirectly) calls LiteralVariableNode>>emitCodeForValue:encoder:, which calls Encoder>>genPushLiteralVar:
  12. Both reference the global Association, which was previously written into the literals of the Encoder
@LinqLover
Copy link
Contributor

Thanks for the report and the deep-dive analysis @MariusDoe! I wonder however whether this could be accepted behavior ...

Expected behavior

One of the following:

  • An error is shown in step 4 (something like The instVar myVar is still referenced in these 2 methods: ...)

This might be a helpful enhancement to our tooling (similar to #123), but we can or should probably not raise an error right from the Class/ClassBuilder. Such incidents probably occur often during automatic actions such as loading packages, installing updates, etc., and I think there needs to be some automatic fallback. In some situation, the ill-defined state might even be transient until another class definition is loaded/updated.

  • An error is shown in step 6 (something like The instVar myVar does not exist)

For the same reason as above, I don't think this would be a good solution (though probably fun to implement by inserting some extra error sends through the encoder). IMHO it also sounds a bit too much like a ticking time bomb that you cannot see by looking at the source code.

  • nil is printed in step 7

I actually had assumed this would be the default behavior. :o But then again, I remember a situation where this wasn't the case but I was glad about it ...

Once I added an extension method to a class and wanted to store some extra state related to the receiver object. Because I was inadvertent/too lazy to define an extra class that holds this state (usually through a WeakIdentityKeyDictionary), I just added an instance variable to the class. Of course, when storing this extension method in a package or change set, the changed class definition is not applied, so checking out this patch in another image leads to an undeclared instance variable. Thanks to the current behavior, the extension method can however still manage the state of the object ... as long as it's a singleton. Otherwise, things might get very confusing, including possible memory leaks. But for singletons and class variables (which IIRC use the same mechanism), it's a helpful fallback.

Still, one might argue that this is too dangerous as your example with unrelated instances shows, and might prevent programmers from defining their state properly in the first place. I just realize that this would even lead to shared state between instances of different classes. Oh no.

Another alternative might be to maintain per-instance values in the Undeclared table, something like (Undeclared associationAt: 'myVar') at: anObject or (Undeclared associationAt: {'myVar'. anObject}. But this would further increase the hidden complexity.

Other opinions from the room? :-)

@squeak-smalltalk-bot
Copy link

squeak-smalltalk-bot commented Jul 10, 2024 via email

@LinqLover
Copy link
Contributor

But that would not stop us from mapping the undeclared variables references to nil as Marius suggested. That is, ^undeclaredInstVar returns nil, and undeclaredInstVar := 42 becomes a no-op.
Is the current behavior of global state actually desired, or was it just easier to imlpement?

Best,
Christoph

@squeak-smalltalk-bot
Copy link

squeak-smalltalk-bot commented Jul 10, 2024 via email

@squeak-smalltalk-bot
Copy link

squeak-smalltalk-bot commented Jul 10, 2024 via email

@LinqLover
Copy link
Contributor

(Note: It seems that @squeak-smalltalk-bot is still unable to mirror the entire conversation, check out https://lists.squeakfoundation.org/archives/list/[email protected]/thread/7Q5QUNQDFNOIWNZNYB7IG2Q4OJTVKOXI/ to read all messages)

@codefrau
Copy link
Member

codefrau commented Jul 11, 2024

I would very much suggest that this discussion is ill-suited for a bug tracker (and IMHO squeak-smalltalk-bot should not be allowed to add comments since it blurs the lines between mailing list and bug tracker).

The behavior OP described is not a bug, it works exactly as designed. This issue should be closed.

One piece of information was buried in the comments above: The Transcript will indeed print a warning about each method that had a variable which became undeclared.

Everything beyond that is a design discussion that should happen on the mailing list. There could be a new ticket but again bug trackers are not for discussions.

@marceltaeumel
Copy link
Member

@MariusDoe Please make a new issue to suggest GUI/Tool improvements to better inform users/programmers about this behavior. Our tool support around undeclared things is quite poor at this point.

@marceltaeumel marceltaeumel added programmer [SCOPE] It is for the (maybe professional) programmer base system [SCOPE] Squeak's basic (language) concerns such as Kernel, Collections, Graphics, Network ignored [RATING] No longer valid or necessary. labels Jul 12, 2024
@LinqLover
Copy link
Contributor

Hi Vanessa, Marcel, all,

TBH I don't agree with this perspective on bugs.squeak.org. We don't use this as a real bug tracker in the sense of properly maintaining and tracking issues. From my impression, we only use GitHub Issues 1) to reduce the hurdle for people to report bugs by not requiring them to subscribe to the mailing list and 2) perhaps to gain some more visibility in the GitHub network. Thus, it functions (or ought to function) as an exclave or alternative interface to the mailing list: few people have subscribed to GitHub Issues, and many one-time reporters have not subscribed to the mailing list. If we had not merged both communication platforms, they would diverge. Either some people would miss information (such as about a bug having been fixed) or other people would have to write and read redundant messages. Also, there would be just another separate place that one had to search in order to find historical information about design decisions, implementation considerations, etc. Today, it is quite easy to find most of them in one place using Squeak Inbox Talk. So, I don't think the arguments of Fogel et al. can be applied here. This forum IS a part of the mailing list. :-)

IMHO squeak-smalltalk-bot should not be allowed to add comments

I implemented this behavior, which closely matches that of the OpenSmalltalkVM-Bot for vm-dev, earlier this year in agreement with Marcel on behalf of the previous board for the reasons mentioned above, and I gave the board a prior notice before the bot went live. I am surprised that it now faces existential critism after its prototype has been in place for a couple of years. Of course we can change anything, but I think we should at least give it a fair discussion.

The behavior OP described is not a bug, it works exactly as designed.

Does it? So far, I have only read "it is like that" and a good explanation of how it is implemented in this thread. I am not aware of any test that expects this behavior nor any relevant statement in the Blue Book or the ANSI standard, and personally I have not been convinced why it must be like that. IMHO Marius has made an important point about the confusion emerging from this behavior, and we can see that it introduces hidden shared state, which is arguably not better than eliminating the state in the first place but even might introduce memory leaks.

On another note. Personally, I don't like closing issues before achieving a consensus or at least encountering all arguments on the table. I would also be reluctant to put the burden of brainstorming solutions to random bug reporters. Beyond that, given that we do not really take advantage of any structure of the bug tracker, I can see little sense in closing one issue in favor of another one. Renaming seems easier. But if we do so, as a rule of thumb, I would never close an issue before either a fix has been implemented or another issue has been created. Let's not throw away the advantage of bug tickets -- perhaps the most used one on bugs.squeak.org -- of traceability by their open/closed state. Besides, the classification "Closed as completed" applied to this issue is wrong (it should be "closed as not planned", and btw this is perhaps redundant with the "ignored" label). :-) Generally speaking, I am grateful if any Squeak user of whatever level of expertise donates some time to us for sharing an observation about the system. We cannot (or should not) expect them to track an observation down to the originating cause or even suggest new design decisions (though we always welcome this kind of contributions).

+1 for brainstorming better tooling for undeclared instance variables (as we started above).
-0.5 for keeping the current behavior of the compiler in place (at least we should properly finish our discussion about that first).
-100 for using the bugs.squeak.org tool like this. :-)

Best,
Christoph

@squeak-smalltalk-bot
Copy link

squeak-smalltalk-bot commented Jul 14, 2024 via email

@LinqLover
Copy link
Contributor

But this is only an appeal to tradition, right? Given that this is only a recovery mechanism, I hope no one will seriously depend on this behavior, or are you concerned about compatibility?

And really, just how would one do much different, especially with the available performance back then - remember current machines are at least a million times faster and have about a million times more memory.

I outlined two possible behaviors above: 1) Ignoring all assignments to undeclared variables and treating them as nil, 2) Discriminating the value of an undeclared variable for each instance of the class. To my understanding, 1) should be as performant as the current behavior, 2) would increase memory consumption if you have many instance for the sake of correctness but be similar fast to the current state. I'm not sure why you are mentioning performance here. 😅

@squeak-smalltalk-bot
Copy link

squeak-smalltalk-bot commented Jul 15, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
base system [SCOPE] Squeak's basic (language) concerns such as Kernel, Collections, Graphics, Network ignored [RATING] No longer valid or necessary. programmer [SCOPE] It is for the (maybe professional) programmer
Projects
None yet
Development

No branches or pull requests

5 participants