Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vine: serverless import issue #3567

Merged
merged 13 commits into from
Nov 27, 2023
Merged

vine: serverless import issue #3567

merged 13 commits into from
Nov 27, 2023

Conversation

JinZhou5042
Copy link
Member

@JinZhou5042 JinZhou5042 commented Nov 9, 2023

Proposed changes

The previous practice to install a library in which a function uses numpy was like:

def func():
    import numpy as np
    # ......

It is unnecessary to import packages inside that function if we have import statements inserted into the preamble of the serverless library.

To achieve this, the following function was modified by adding an additional argument imports:

def create_library_from_functions(self, name, *function_list, poncho_env, init_command, add_env, imports)

The input imports argument should be formatted as:

imports = {
    'numpy': [],                                  # import numpy
    'pytorch': 'torch'                            # import pytorch as torch
    'tensorflow': ['*']                           # from tensorflow import *
    'sys': ['path', 'argv'],                      # from sys import path, argv
    'os': {'environ': 'env', 'path': 'os_path'},  # from os import environ as env, path as os_path
}

so that it can handle various import statements.

The related test examples in cctools/taskvine/test/vine_python_serverless.py and cctools/taskvine/src/examples/vine_example_function_call.py are modified accordingly.

Post-change actions

Put an 'x' in the boxes that describe post-change actions that you have done.
The more 'x' ticked, the faster your changes are accepted by maintainers.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update Did you update the manual to reflect your changes, if appropriate? This action should be done after your changes are approved but not merged.
  • Type Labels Select github labels for the type of this change: bug, enhancement, etc.
  • Product Labels Select github labels for the product affected: TaskVine, Makeflow, etc.
  • PR RTM Mark your PR as ready to merge.

Additional comments

This section is dedicated to changes that are ambitious or complex and require substantial discussions. Feel free to start the ball rolling.

@dthain
Copy link
Member

dthain commented Nov 9, 2023

Note that the imports field should be optional.

@btovar
Copy link
Member

btovar commented Nov 9, 2023

what about?

import tensorflow
import os.path

def create_library_from_functions(self, name, *function_list, poncho_env, init_command, add_env, [tensorflow, os.path])

Using the strings that then get interpreted as modules seems a little clunky to me. create_library_from_function can then inspect the vars names passed and construct the corresponding import statements.

@JinZhou5042
Copy link
Member Author

JinZhou5042 commented Nov 9, 2023

How about the new feature:

# format it as a dictionary supporting comprehensive usage
imports = { 
    'math': '',                        # import math 
    'numpy': 'np',                     # import numpy as np 
    'random': {'uniform': ''},         # from random import uniform
    'time': {'sleep': 'time_sleep'}    # from time import sleep as time_sleep 
}

# format it as a list or a string for simple usage
imports = ['os.path', 'sys', 'time']   # import os.path
                                       # import sys
                                       # import time

imports = 'tensorflow'                 # import tensorflow

Also, we support optional usage of the imports argument.

@JinZhou5042
Copy link
Member Author

JinZhou5042 commented Nov 9, 2023

Sounds good. I suggest to simplify it, and only accept the list form as:

[ ("tensorflow", "tf"), "math" ]

This is because entries such as "math": "" give the idea that we are not importing anything from that module, and using the string may lead to bugs when we interpret the argument as a sequence.

For using os.path directly I was looking at inpsect.getmodule(), but in this case it returns posixpath. It works in terms of efficiency, but the functions themselves would need import os.path to get the correct name, which is ugly.

@JinZhou5042
Copy link
Member Author

JinZhou5042 commented Nov 10, 2023

Sounds good. I suggest to simplify it, and only accept the list form as:

[ ("tensorflow", "tf"), "math" ]

This is because entries such as "math": "" give the idea that we are not importing anything from that module, and using the string may lead to bugs when we interpret the argument as a sequence.

For using os.path directly I was looking at inpsect.getmodule(), but in this case it returns posixpath. It works in terms of efficiency, but the functions themselves would need import os.path to get the correct name, which is ugly.

Thanks! This format looks more beautiful and makes more sense.
By using this structure, we only support the usage of import xxx or import xxx as xxx, we failed to present from xxx import xxx or from xxx import xxx as xxx in the library.

So whenever we want to use FromImport statements like:

from concurrent.futures import Executor, Future as F
from configuration import training_config, generator_config, discriminator_config

We should perform the argument as:

["concurrent.futures.Executor", ("concurrent.futures.Future", "F"), \
"configuration.training_config", "configuration.generator_config", \
"configuration.discriminator_config"]

And the parsed statements would be:

import concurrent.futures.Executor
import concurrent.futures.Future as F
import configuration.training_config
import configuration.generator_config
import configuration.discriminator_config

I think we can definitely do that if it makes sense to you.

If we want to keep the from xxx import statements, I recommend two approaches:

  1. use a combination of dict, list and tuple:
imports = {
    'torch': [],                                           # import torch
    'torch.nn': [('Conv2d', 'Conv'), 'MaxPool2d'],         # from torch.nn import Conv2d as Conv, MaxPool2d
    'torch.nn.functional': ['relu', ('sigmoid', 'sig')]    # from torch.nn.functional import relu, sigmoid as sig
}
  1. use the original import statements:
["import torch", \
"from torch.nn import Conv2d as Conv, MaxPool2d", \
"from torch.nn.functional import relu, sigmoid as sig"]

How do you think @btovar

@JinZhou5042 JinZhou5042 reopened this Nov 10, 2023
@JinZhou5042
Copy link
Member Author

@btovar How do you recommend?

@btovar
Copy link
Member

btovar commented Nov 13, 2023

Definitely we do not want the original import statements, as that means we are executing arbitrary code.

I suggest reducing the scope of what can be imported, e.g., this facility should be used only to import modules, and not specific objects of functions. With that, from x import y as z can always be replaced with import x.y as z. This only works if y is a module, though. To make this clear, I'd rename modules= to import_modules=.

At the end, this is an optimization, so the function should work even when not running with the library. (That is, I'd expect the function to still have a statement like import tensorflow as tf). If the function needs the import statements when creating the library, then it will be very hard to debug locally.

@dthain
Copy link
Member

dthain commented Nov 13, 2023

This is getting way too complicated. Just translate imports=[x,y,z] into import x; import y; import z; in the library preamble. The function can do something more complicated than that afterwards, if it wants to.

@JinZhou5042
Copy link
Member Author

Sounds great! Then let's use this format:

imports=["time", "os.path", ("tensorflow", "tf")]

We will translate imports into:

import time
import os.path
import tensorflow as tf

@JinZhou5042
Copy link
Member Author

New argument import_modules added to function create_library_from_functions and all related functions.

New format of argument import_modules:

import_modules = ["sys", "os.path", ("numpy", "np")]

This will be translated to:

import sys
import os.path
import numpy as np

FromImport statements should be explicitly specified inside of functions.

The doc was updated accordingly.

@dthain
Copy link
Member

dthain commented Nov 17, 2023

@JinZhou5042 please review the build failures on osx and conda.

@JinZhou5042
Copy link
Member Author

Sure, how do you think about the errors, I'm confused because it works just well in my environment.

@dthain
Copy link
Member

dthain commented Nov 20, 2023

Ok, the OSX test, we can see that package_serverize is looking for the condo-meta directory in the conda installation while it tries to add the python environment:
https://github.com/cooperative-computing-lab/cctools/actions/runs/6857988904/job/18648044731?pr=3567#step:5:1687

But Conda isn't used at all in this OSX build, and we don't want it packing a conda environment at all for the fast test.

The problem is that create_library_from_functions should be called with add_env=False inside of these tests in order to keep things running smoothly:
https://cctools.readthedocs.io/en/stable/api/html/classndcctools_1_1taskvine_1_1manager_1_1Manager.html#a94f6627704d5ebfff389833aff077967

I think you have a similar problem in the build-conda platform.

@JinZhou5042
Copy link
Member Author

Thanks for the information!

@JinZhou5042 JinZhou5042 requested a review from btovar November 20, 2023 21:05
Copy link
Member

@btovar btovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor typo, but overall looks good!

@dthain
Copy link
Member

dthain commented Nov 27, 2023

RTM?

@JinZhou5042
Copy link
Member Author

RTM!

@dthain dthain removed the request for review from BarrySlyDelgado November 27, 2023 18:48
@JinZhou5042 JinZhou5042 requested a review from btovar November 27, 2023 20:51
@dthain dthain merged commit 2c5068c into cooperative-computing-lab:master Nov 27, 2023
@JinZhou5042 JinZhou5042 deleted the import_issue branch November 30, 2023 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants