Fp8 model init factory #880

sudhakarsingh27 · 2024-05-30T02:23:37Z

Description

This PR tries to bake fp8_model_init into layer initialization.

Currently, the user has to add and manage the fp8_model_init context manager themselves. Baking it into TE layer initialization would additionally let the same behavior be requested with just a constructor argument.

When integrating with larger code bases like Megatron or HF Accelerate, we would then only need to pass the argument; otherwise we have to figure out where to add this context manager. So, in theory, this would result in fewer code changes.

One thing I'm not sure about is whether calling fp8_model_init once per layer is fine.
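Whether per-layer calls are safe likely hinges on whether the context manager restores the previous flag when it exits. The following is a minimal sketch using hypothetical stand-ins (`ToyState`, `toy_fp8_model_init`, `ToyLinear`), not Transformer Engine's actual code. It assumes a save/restore context manager whose flag is read once in the layer constructor, and shows that entering it per layer, or nesting it inside an outer context, leaves no stale state behind.

```python
from contextlib import contextmanager


class ToyState:
    """Hypothetical stand-in for a global FP8 state manager."""
    fp8_parameters = False


@contextmanager
def toy_fp8_model_init(enabled: bool = True):
    # Save the previous flag so per-layer or nested use restores it on exit.
    previous = ToyState.fp8_parameters
    ToyState.fp8_parameters = enabled
    try:
        yield
    finally:
        ToyState.fp8_parameters = previous


class ToyLinear:
    def __init__(self, name: str):
        self.name = name
        # The flag is read once, at construction time.
        self.fp8_weight = ToyState.fp8_parameters


def build_layer(name: str, fp8: bool) -> ToyLinear:
    # Enter the context manager once per layer, as a layer-level kwarg would.
    with toy_fp8_model_init(enabled=fp8):
        return ToyLinear(name)


layers = [build_layer("qkv", True), build_layer("proj", False)]
with toy_fp8_model_init(enabled=True):
    inner = build_layer("fc1", False)  # nested: the inner setting wins
    outer = ToyLinear("fc2")           # outer setting is restored afterwards

print([(layer.name, layer.fp8_weight) for layer in layers + [inner, outer]])
print(ToyState.fp8_parameters)  # global flag is back to its default
```

With this save/restore design, per-layer use behaves the same as wrapping each constructor call individually; a context manager that did not restore the previous value would leak its setting into later layers.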

(Another thought: could this also allow selectively choosing which layers get FP8 weights, and would that be helpful?)
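A per-layer argument would make selective control straightforward, since the decision becomes ordinary data passed to each constructor rather than an enclosing scope. A minimal sketch, assuming a hypothetical `ToyLinear` that only records the flag and a hypothetical `build_stack` helper; the policy of keeping the first and last layers out of FP8 is illustrative only, not a recommendation.

```python
from typing import Callable, List


class ToyLinear:
    """Hypothetical layer that records whether it was built with FP8 weights."""

    def __init__(self, in_features: int, out_features: int, fp8_weight: bool = False):
        self.shape = (out_features, in_features)
        self.fp8_weight = fp8_weight


def build_stack(num_layers: int, hidden: int,
                use_fp8: Callable[[int], bool]) -> List[ToyLinear]:
    # The per-layer decision is an ordinary argument, so a policy function
    # can choose it independently for every layer index.
    return [ToyLinear(hidden, hidden, fp8_weight=use_fp8(i)) for i in range(num_layers)]


# Hypothetical policy: FP8 weights everywhere except the first and last layer.
n = 4
stack = build_stack(n, 1024, use_fp8=lambda i: 0 < i < n - 1)
print([layer.fp8_weight for layer in stack])  # [False, True, True, False]
```

A single context manager around the whole build would instead apply one setting to every layer constructed inside it.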

@ptrendx @timmoon10 @ksivaman, do you think this makes sense?

ksivaman · 2024-05-30T03:59:15Z

Could you outline the motivation behind this? Currently we have the fp8_model_init user API, which works as a context manager, and IIRC this PR effectively exposes the same functionality as a parameter. Why?

sudhakarsingh27 · 2024-05-30T16:08:16Z

With the context manager (CM) API, the user has to add and manage the context themselves. This change would additionally allow the same behavior to be requested with just an argument.

When integrating with larger code bases like Megatron or HF Accelerate, we would only need to pass the argument; otherwise we have to figure out where to add this context manager. So, in theory, this would result in fewer code changes, but I'm not sure whether calling fp8_model_init once per layer is fine. (Could this also allow selectively choosing which layers get FP8 weights, and would that be helpful?)

timmoon10 · 2024-05-31T02:29:03Z

I think initializing FP8 weights with a constructor kwarg makes a lot of sense. In effect, the fp8_model_init context is an indirect way of passing a boolean arg to the module constructors (although it has the advantage/disadvantage of setting all modules to the same value). For backward compatibility, how about an API like:

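```python
from typing import Optional

# Import locations assumed from the Transformer Engine PyTorch package layout.
from transformer_engine.pytorch.fp8 import FP8GlobalStateManager
from transformer_engine.pytorch.module.base import TransformerEngineBaseModule


class Linear(TransformerEngineBaseModule):

    def __init__(
        self,
        # ... existing constructor arguments ...
        with_fp8_weight: Optional[bool] = None,
    ):
        # Fall back to the fp8_model_init context when the kwarg is not given.
        if with_fp8_weight is None:
            with_fp8_weight = FP8GlobalStateManager.with_fp8_parameters()

        ...
```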
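The `None` default in this proposal gives a simple precedence rule: an explicit kwarg wins; otherwise the layer inherits whatever the context manager set. A self-contained sketch of that rule, using hypothetical stand-ins (`ToyGlobalStateManager`, `toy_fp8_model_init`, `ToyLinear`) in place of `FP8GlobalStateManager`, `fp8_model_init`, and `Linear`; it does not import Transformer Engine.

```python
from contextlib import contextmanager
from typing import Optional


class ToyGlobalStateManager:
    """Hypothetical stand-in for FP8GlobalStateManager in this sketch."""

    _fp8_parameters = False

    @classmethod
    def with_fp8_parameters(cls) -> bool:
        return cls._fp8_parameters


@contextmanager
def toy_fp8_model_init(enabled: bool = True):
    # Set the global flag for the duration of the block, then restore it.
    previous = ToyGlobalStateManager._fp8_parameters
    ToyGlobalStateManager._fp8_parameters = enabled
    try:
        yield
    finally:
        ToyGlobalStateManager._fp8_parameters = previous


class ToyLinear:
    def __init__(self, in_features: int, out_features: int,
                 with_fp8_weight: Optional[bool] = None):
        # None means "not specified": fall back to the context manager's
        # global setting, so existing callers keep their current behavior.
        if with_fp8_weight is None:
            with_fp8_weight = ToyGlobalStateManager.with_fp8_parameters()
        self.with_fp8_weight = with_fp8_weight


a = ToyLinear(8, 8)                              # no kwarg, no context
with toy_fp8_model_init(enabled=True):
    b = ToyLinear(8, 8)                          # inherits the context
    c = ToyLinear(8, 8, with_fp8_weight=False)   # explicit kwarg wins
d = ToyLinear(8, 8, with_fp8_weight=True)        # explicit kwarg, no context
print(a.with_fp8_weight, b.with_fp8_weight, c.with_fp8_weight, d.with_fp8_weight)
# False True False True
```

Code that only uses the context manager never passes the kwarg, so under this rule it sees no change in behavior.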

sudhakarsingh27 added 2 commits May 30, 2024 02:07

bake fp8_model_init into the layer

5bdfa6e

bake fp8_model_init into module init

5530a8a
