feat: top level imports #3779

Open · wants to merge 8 commits into base: main


dmadisetti
Collaborator

📝 Summary

Follow-up to #3755 for #2293, allowing "top-level imports".

To complete #2293, I think UI changes are needed to enable this behavior. Notably:

  • Indicate when in function mode (maybe top level import too)
  • Provide hints when pushed out of function mode
  • Maybe allow the user to opt out of function mode?

+ docs

This also increases security risk, since code is run outside of the marimo runtime. This was always possible, but now marimo can save in a format that could skip the marimo runtime altogether on restart.

There are opportunities here. marimo could lean into this and treat the fact that external code runs as a chance to hook in (almost a plugin system for free).

But there are also issues: a missing dependency could stop the notebook from running at all (which goes against the "batteries included" ethos). This can be mitigated with static analysis instead of a plain import (markdown does this, for instance), or marimo can re-serialize the notebook in the "safe" form if it runs into issues on import.
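A minimal sketch of the static-analysis option, assuming the only goal is to detect missing dependencies before any notebook code runs. `missing_top_level_imports` is a hypothetical helper, not marimo API; it uses the standard library's `ast` to collect module-level imports and `importlib.util.find_spec` to look up each top-level package without importing it.

```python
import ast
import importlib.util


def missing_top_level_imports(source: str) -> set[str]:
    """Return top-level package names imported by `source` that cannot be found.

    Nothing is executed: the file is only parsed, and find_spec() locates a
    top-level module without importing it.
    """
    tree = ast.parse(source)
    names: set[str] = set()
    for node in tree.body:  # module-level statements only
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped.
            names.add(node.module.split(".")[0])
    return {name for name in names if importlib.util.find_spec(name) is None}
```

Only module-level statements are inspected, so imports nested under an `if` (or inside functions) are not checked; a real implementation would need to decide which of those count.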

🔍 Description of Changes

This includes a bit of a refactor to codegen, since there were a fair number of changes.
It allows top-level imports from "import-only" cells. Their contents are pasted at the top of the file, with some care taken not to break header extraction.

```python
# Normal headers are retained
# Use a notice to denote where generated imports start
# Notice maybe needs some copy edit

# 👋 This file was generated by marimo. You can edit it, and tweak
# things; just be conscious that some changes may be overwritten if opened in
# the editor. For instance, top-level imports are derived from a cell, and not
# the top of the script. This notice signifies the beginning of the generated
# import section.

# Could also make this app.imports? But maybe increasing surface area for no reason
import numpy
# Note, import cells intentionally do not have a `return`
# for static analysis feature below

import marimo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)


@app.cell
def import_cell():
    # Could also make this app.imports? But maybe increasing surface area for no reason
    import numpy
    # Note, import cells intentionally do not have a `return`
    # for static analysis feature below
```

Top-level refs (this includes `@app.function`s) are omitted from cell signatures. For example:

```python
import marimo as mo

# ...

@app.cell
def md_cell():
    mo.md("Hi")
    return
```
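A rough sketch of the signature rule, under two assumptions that belong to this sketch only: "top level" means names bound by module-level imports plus functions decorated with a bare `@app.function`, and a cell's parameters are its refs minus those names. `toplevel_names` and `cell_parameters` are hypothetical helpers, not marimo's codegen.

```python
import ast


def toplevel_names(module_source: str) -> set[str]:
    """Names bound at module level by imports and `@app.function` definitions."""
    names: set[str] = set()
    for node in ast.parse(module_source).body:
        if isinstance(node, ast.Import):
            names.update(a.asname or a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            names.update(a.asname or a.name for a in node.names)
        elif isinstance(node, ast.FunctionDef) and any(
            # Matches a bare `@app.function`; `@app.cell` functions are skipped.
            isinstance(d, ast.Attribute) and d.attr == "function"
            for d in node.decorator_list
        ):
            names.add(node.name)
    return names


def cell_parameters(refs: set[str], toplevel: set[str]) -> list[str]:
    """A cell's arguments: its refs that are not already available at top level."""
    return sorted(refs - toplevel)
```

For the `md_cell` above, `mo` is bound at the top level, so the cell takes no parameters.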

Since I was already in there, I also added static analysis so that cells no longer return dangling defs (defs that no other cell uses).

```python
@app.cell
def cell_with_dangling_def():
    a = 1
    b = 2
    return (a,)  # No longer returns b, since no other cell uses it; this lets linters like ruff flag it.

@app.cell
def ref_cell(a):
    a + 1
    return
```
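The dangling-def rule can be sketched as follows, assuming each cell's defs and refs are already known. `CellInfo` and `returned_defs` are hypothetical stand-ins, not marimo's internal types; the test data mirrors the two cells above.

```python
from dataclasses import dataclass, field


@dataclass
class CellInfo:
    """Hypothetical stand-in for one cell's static-analysis results."""

    name: str
    defs: set[str]
    refs: set[str] = field(default_factory=set)


def returned_defs(cells: list[CellInfo]) -> dict[str, tuple[str, ...]]:
    """For each cell, keep only the defs that some *other* cell references."""
    result: dict[str, tuple[str, ...]] = {}
    for cell in cells:
        refs_elsewhere: set[str] = set()
        for other in cells:
            if other is not cell:
                refs_elsewhere |= other.refs
        # Sorted for a stable, deterministic `return (...)` tuple.
        result[cell.name] = tuple(sorted(cell.defs & refs_elsewhere))
    return result
```

The nested loop is quadratic in the number of cells, which is fine for a sketch; counting how many cells reference each name up front would avoid it.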

LMK if this is too far-reaching and we can break it up or refactor. It's a bit more opinionated than the last PR.

The tests border more on smoke tests than unit tests, but they hit the key issues I was worried about. I can break them down more granularly if needed. Also, LMK if you can think of more edge cases.

📜 Reviewers

@akshayka OR @mscolnick


vercel bot commented Feb 13, 2025

The latest updates on your projects:

| Name | Status | Updated (UTC) |
|---|---|---|
| marimo-docs | 🛑 Canceled | Feb 14, 2025 0:31am |
| marimo-storybook | ✅ Ready | Feb 14, 2025 0:31am |

@leventov

@dmadisetti what about hashing top-level functions in `hash.py`, the `Hasher` class, and the surrounding logic? Will they be treated as "normal" top-level/"pure" functions whose code is hashed in the `module_hash` calculation, with no special treatment needed? Or maybe `serialize_and_dequeue_content_refs()` could check whether the function is an `app.fn` as a fast track, instead of calling `is_pure_function()`? Or should `is_pure_function()` itself be changed to add such a fast track? Should/could the code hash of `app.fn`s be saved in a field of the corresponding `Cell` so that it doesn't need to be recomputed?
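One way to picture the "save the code hash on the cell" idea: compute it once on first access and store it on the instance with `functools.cached_property`. `FunctionCell` is a hypothetical class (not marimo's `Cell`), and hashing the raw source with SHA-256 is an assumption of this sketch; it ignores anything the function closes over.

```python
import hashlib
from functools import cached_property


class FunctionCell:
    """Hypothetical wrapper for a top-level function's source; not marimo's Cell."""

    def __init__(self, name: str, code: str) -> None:
        self.name = name
        self.code = code

    @cached_property
    def code_hash(self) -> str:
        # Computed on first access, then stored in the instance __dict__.
        return hashlib.sha256(self.code.encode("utf-8")).hexdigest()
```

The trade-off is staleness: if `code` changes, the cached value must be invalidated explicitly (for example with `del cell.code_hash`), otherwise the old hash is returned.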

@dmadisetti
Collaborator Author

@leventov Caching depends on the app's runtime. This PR is more about exposing cells as functions that can be imported from other modules, plus some tweaks to make notebooks look more "pythonic" to linters. When marimo first loads this file, no runtime has been initialized.

Cell-level caching will be more tightly coupled with changes in `_runtime.executor`.

@dmadisetti
Collaborator Author

Eh. Just noticed that ruff doesn't like import cells without a `return` (the imports look unused). That was a bit of a last-minute choice to try to clean up the whitespace; I'll put it back in.

Contributor

@akshayka left a comment

Nice, getting closer to the ideal of reusable code!

Most of what follows is discussion; it doesn't need to be addressed immediately, but should be addressed before top-level functions are enabled.

> But also issues, since a missing dep could stop the notebook from running at all (goes against the "batteries included" ethos).

Yea, this is an issue for sure. As long as the user has marimo installed, `marimo edit nb.py` should always work, even if top-level imports are missing.

> This can be mitigated with static analysis over just an import (markdown does this for instance), or marimo can re-serialize the notebook in the "safe" form, if it comes across issues in import.

The former option sounds better. I wonder if we should define the file format to consist of three sections:

  1. A user-defined section, containing arbitrary text (the "header"), except for perhaps a special delimiter token.
  2. A generated section containing top-level imports, if they are missing from the user-defined section, followed by a special delimiter.
  3. Today's generated section:
```python
import marimo

__generated_with = ...
app = marimo.App()

@app.function
def foo():
    ...

@app.cell
def bar():
    ...
```

In this way, marimo's Python file reader would simply skip sections (1) and (2) (based on the presence of the delimiter token) and programmatically read section (3) as it does today. If the delimiter were missing (because the user edited the file or wrote it from scratch), marimo would try to read the file programmatically as it does today. This is just one proposal, and maybe it's similar to what you've implemented, but I do think it's worth writing a concrete specification for this and documenting it in the codebase.
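A minimal sketch of the reader side of this proposal. The delimiter text and the `split_sections` name are hypothetical placeholders; the only behavior shown is the one described above: skip everything up to and including the delimiter when present, otherwise hand the whole file to the existing reader.

```python
# Hypothetical delimiter token; the real spelling would be part of the spec.
DELIMITER = "# --- marimo: end of generated imports ---"


def split_sections(source: str) -> tuple[str, str]:
    """Return (skipped, body): `body` is what the programmatic reader parses.

    With the delimiter present, sections (1) and (2) are skipped. Without it,
    nothing is skipped and the whole file is read as it is today.
    """
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == DELIMITER:
            return "".join(lines[: i + 1]), "".join(lines[i + 1 :])
    return "", source
```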

I think we should also very clearly define and document what is okay for the user to edit, and how, and what is not okay. One proposal: section 1 is fine to edit arbitrarily (except for a special delimiter?); section 2 should not be edited; section 3's cell and function definitions can be edited, cells and functions can be added, and cells and functions can be removed.

Comment on lines 291 to 302
```python
if cell.import_workspace.is_import_block:
    # maybe a bug, but import_workspace.imported_defs does not
    # contain the information we need.
    toplevel_imports |= cell.defs
    if toplevel_fn:
        # TODO: Consider fn="imports" for @app.imports?
        # Distinguish that something is special about the block
        # Also remove the "return" in this case.
        definitions[idx] = to_general_functiondef(cell, names[idx])
    else:
        definitions[idx] = to_functiondef(cell, names[idx])
    import_blocks.append(code.strip())
```
Contributor

If only import-only blocks are hoisted, then in the example below, `foo` won't be saved as a function (its import lives in a cell that does more than import). I can see this being a bit confusing for users. I'm wondering if imports could be saved at the top level even if they weren't in import-only blocks.

One cell:

```python
import random
...
```

Another cell:

```python
def foo():
    return random.randint(0, 43)
```

Collaborator Author

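Yes, they can, but I think restricting this to import-only blocks makes sense. Consider the following block, where hoisting the import to the top of the file would run an expensive, side-effecting import unconditionally instead of waiting for the button: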

```python
@app.cell
def _(run_button):
    # Halt the cell until the button has been clicked.
    mo.stop(not run_button.value)
    import something_very_expensive_with_side_effects
```
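For reference, a rough standalone approximation of the "import-only" test, assuming a cell qualifies when every top-level statement is an import (comments vanish during `ast.parse`). The codegen excerpt earlier in this thread relies on marimo's own `cell.import_workspace.is_import_block`; `is_import_only` here is a hypothetical helper. The `mo.stop` cell above fails the test, so its import would stay inside the cell.

```python
import ast


def is_import_only(code: str) -> bool:
    """True if the cell is non-empty and every top-level statement is an import."""
    tree = ast.parse(code)
    return bool(tree.body) and all(
        isinstance(stmt, (ast.Import, ast.ImportFrom)) for stmt in tree.body
    )
```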

```python
# notice to separate the imports from the rest of the code.
filecontents = [NOTICE, ""]

filecontents.append("\n\n".join(import_blocks))
```
Contributor

Should imports be added unconditionally, as you've written, or should imports only be added if they are used in top-level functions?
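To make the "only if used" option concrete, here is a sketch that keeps an import statement only when one of the names it binds appears in a top-level function's code. `imports_used_by` and `imported_names` are hypothetical helpers; matching is purely by name, so shadowing and `from x import *` are not handled, and a multi-name import is kept whole if any of its names is used.

```python
import ast


def imported_names(stmt: ast.stmt) -> set[str]:
    """Names bound by an import statement (e.g. `import numpy as np` binds `np`)."""
    if isinstance(stmt, ast.Import):
        return {a.asname or a.name.split(".")[0] for a in stmt.names}
    if isinstance(stmt, ast.ImportFrom):
        return {a.asname or a.name for a in stmt.names}
    return set()


def imports_used_by(import_code: str, function_code: str) -> list[str]:
    """Return the import statements whose bound names appear in `function_code`."""
    used = {
        node.id
        for node in ast.walk(ast.parse(function_code))
        if isinstance(node, ast.Name)
    }
    return [
        ast.unparse(stmt)  # Python 3.9+
        for stmt in ast.parse(import_code).body
        if imported_names(stmt) & used
    ]
```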

One thought: if imports are added to the top of the file unconditionally, perhaps we should remove their corresponding defs from cell signatures, so that code completion in editors works better. However, maybe the right thing to do is just bite the bullet and write editor plugins or an LSP-like tool that handles completions for marimo notebook files, in which case my suggestion here is moot.

Contributor

Also, can we ruff format the import section?

Collaborator Author

> perhaps we should remove their corresponding defs from cell signatures, so that code completion in editors works better

Yep, this PR already does this.

> Also, can we ruff format the import section?

Yes, I'm leaning towards removing the statement block, stripping comments and formatting the imports.
Thoughts?
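A stdlib-only approximation of "strip comments and format the imports", assuming Python 3.9+ for `ast.unparse`. This is not `ruff format` or isort, and `normalize_imports` is a hypothetical helper: parsing drops the comments, unparsing prints each import in a canonical form, and duplicate statements collapse.

```python
import ast


def normalize_imports(import_block: str) -> str:
    """Strip comments, drop duplicate statements, and sort the imports."""
    statements = {
        ast.unparse(stmt)
        for stmt in ast.parse(import_block).body
        if isinstance(stmt, (ast.Import, ast.ImportFrom))
    }
    # Plain `import x` lines first, then `from x import y`, each alphabetically.
    return "\n".join(sorted(statements, key=lambda s: (s.startswith("from "), s)))
```

The ordering rule here is a simplification; isort-style grouping (stdlib, third-party, first-party) would need more information than the source text alone.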

Contributor

I think that sounds good. Also, see my response to your import guard idea.

@dmadisetti
Collaborator Author

> I think we should also very clearly define and document what is okay for the user to edit, and how, and what is not okay. One proposal: section 1 is fine to edit arbitrarily (except for a special delimiter?); section 2 should not be edited; section 3's cell and function definitions can be edited, cells and functions can be added, and cells and functions can be removed.

I was struggling with this because having many imports mixed with comments seemed to make the notebook feel a little messy, and more confusing for the intro user. I also think that, for the most part, the current serialization is great.

I wonder if part of the UI could be a "library mode" flag that must be turned on before this is activated. That way we don't have to communicate this information to the casual user, and users looking for exports, reuse, and linting will take the time to understand "library mode".

But also, here's another potential serialization that makes these "sections" a bit more evident:

```python
# Header comments
"""Doc strings allowed too"""

import marimo

if marimo.import_guard():
    # Note these imports reflect the cell content below.
    # Editing this block will not change the notebook imports.
    import io
    import textwrap
    import typing
    from pathlib import Path

    import marimo as mo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)

...
```

This also mitigates potential breakage, since `marimo.import_guard()` could always return False and still keep linters happy.
I'm sold on reformatting the imports and stripping comments before serialization.

@akshayka
Contributor

Yea, appreciate your attention to the intro user.

Hmm, I'd prefer not to introduce a library mode if possible, but can consider it. As an alternative, I think the `import_guard()` idea is interesting. But it would need to return True sometimes, right? For example, given

```python
# Header comments
"""Doc strings allowed too"""

import marimo

if marimo.import_guard():
    import numpy as np


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)

@app.function
def my_function():
    return np.random.randn(10, 10)
...
```

for

```python
from my_notebook import my_function
```

to work, `import_guard()` would need to evaluate to True. Maybe `import_guard` would be True by default, but when reading notebook files in marimo, we'd use a context manager:

```python
with marimo._ast.block_imports():  # makes import_guard() evaluate to False.
    ...  # load the notebook ...
```

Not sure yet if this is a good idea. Just brainstorming ...
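The context-manager idea above could be sketched with the standard library's `contextvars`. Everything here is hypothetical and standalone (neither function is assumed to exist in marimo under these semantics): `import_guard()` returns True by default, so a normal `from my_notebook import my_function` runs the guarded imports, while code executed inside `block_imports()` sees False and skips them.

```python
import contextlib
import contextvars

# Hypothetical flag; False means "imports allowed" (the default).
_imports_blocked = contextvars.ContextVar("imports_blocked", default=False)


def import_guard() -> bool:
    """True unless called inside block_imports()."""
    return not _imports_blocked.get()


@contextlib.contextmanager
def block_imports():
    """Make import_guard() evaluate to False while the notebook is being loaded."""
    token = _imports_blocked.set(True)
    try:
        yield
    finally:
        # Restore the previous value even if loading raises.
        _imports_blocked.reset(token)
```

Using a `ContextVar` rather than a module-level flag keeps the override from leaking into other threads or asyncio tasks that are importing the same notebook normally.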

@dmadisetti
Collaborator Author

`import_guard` was relatively easy to put in, and we can strip it out. I have `import_guard` return True for now, but there are a few cases where False might make sense.

4 participants