ggml : implement GEGLU_ERF and GEGLU_QUICK ops #14445

Open
wants to merge 6 commits into
base: master

Collaborator

@CISC CISC commented Jun 29, 2025

Complementary to the other GLU ops; used in mtmd.

Implemented for all backends that currently support GLU ops, except GEGLU_ERF in Vulkan, which is left out because erf is missing there.
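
For readers who want the math behind the two ops, here is a minimal CPU sketch, not the code from this PR. It assumes each input row holds 2*n values, with the first half passed through the activation and the second half used as the multiplicative gate; the actual ggml layout may differ or be swapped. The helper names `geglu_row`, `geglu_erf_row`, and `geglu_quick_row` are hypothetical and exist only for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Exact GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu_erf(float x) {
    return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

// "Quick" GELU: x * sigmoid(1.702 * x).
inline float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}

// Hypothetical reference helper (not a ggml function).
// Assumption: row = [x_0 .. x_{n-1}, g_0 .. g_{n-1}], output_i = act(x_i) * g_i.
template <typename Act>
std::vector<float> geglu_row(const std::vector<float> & row, Act act) {
    const std::size_t n = row.size() / 2;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = act(row[i]) * row[n + i];
    }
    return out;
}

// GEGLU_ERF-style row: gate applied to the erf-based GELU.
inline std::vector<float> geglu_erf_row(const std::vector<float> & row) {
    return geglu_row(row, gelu_erf);
}

// GEGLU_QUICK-style row: gate applied to the sigmoid-based GELU.
inline std::vector<float> geglu_quick_row(const std::vector<float> & row) {
    return geglu_row(row, gelu_quick);
}
```

Under these assumptions the output row has half the width of the input row, and the two variants differ only in which GELU formula feeds the gate.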

@CISC CISC requested a review from ggerganov June 29, 2025 14:47
@github-actions github-actions bot added the Nvidia GPU, Vulkan, examples, ggml, SYCL, and Apple Metal labels Jun 29, 2025
Collaborator

@qnixsynapse qnixsynapse left a comment

SYCL code LGTM :)

Collaborator

@jeffbolznv jeffbolznv left a comment

Vulkan code looks fine. I wonder how much precision erf needs and if we could just handcode something in the Vulkan shader.

Collaborator Author

CISC commented Jun 29, 2025

> Vulkan code looks fine. I wonder how much precision erf needs and if we could just handcode something in the Vulkan shader.

The following seems to work for Metal, so should be good enough for Vulkan:

    // based on Abramowitz and Stegun formula 7.1.26 or similar Hastings' approximation
    // ref: https://www.johndcook.com/blog/python_erf/
    constant float p_erf  = 0.3275911f;
    constant float a1_erf = 0.254829592f;
    constant float a2_erf = -0.284496736f;
    constant float a3_erf = 1.421413741f;
    constant float a4_erf = -1.453152027f;
    constant float a5_erf = 1.061405429f;

    template<typename T>
    T erf_approx(T x) {
        T sign_x = sign(x);
        x = fabs(x);
        T t = 1.0f / (1.0f + p_erf * x);
        T y = 1.0f - (((((a5_erf * t + a4_erf) * t) + a3_erf) * t + a2_erf) * t + a1_erf) * t * exp(-x * x);
        return sign_x * y;
    }

Edit: I'll give it a try.
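
On the precision question, one way to get a rough answer is to port the approximation to the CPU and compare it against `std::erf`. The sketch below does that in plain C++; it is not the Metal or Vulkan code from this PR, and the helper `max_abs_erf_error` is a hypothetical name introduced here. The published bound for formula 7.1.26 is an absolute error of about 1.5e-7 in exact arithmetic, so in float32 the measured figure will also include rounding error.

```cpp
#include <algorithm>
#include <cmath>

// CPU port of the approximation quoted above (Abramowitz and Stegun 7.1.26).
inline float erf_approx(float x) {
    const float p  = 0.3275911f;
    const float a1 = 0.254829592f;
    const float a2 = -0.284496736f;
    const float a3 = 1.421413741f;
    const float a4 = -1.453152027f;
    const float a5 = 1.061405429f;

    // Same convention as the shader's sign(): -1, 0, or +1.
    const float sign_x = x < 0.0f ? -1.0f : (x > 0.0f ? 1.0f : 0.0f);
    x = std::fabs(x);

    const float t = 1.0f / (1.0f + p * x);
    // Horner evaluation of the degree-5 polynomial in t, scaled by exp(-x^2).
    const float y = 1.0f - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * std::exp(-x * x);
    return sign_x * y;
}

// Hypothetical helper: largest absolute difference from std::erf
// over evenly spaced samples in [lo, hi].
inline float max_abs_erf_error(float lo, float hi, int steps) {
    float worst = 0.0f;
    for (int i = 0; i <= steps; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(steps);
        worst = std::max(worst, std::fabs(erf_approx(x) - std::erf(x)));
    }
    return worst;
}
```

Calling `max_abs_erf_error(-6.0f, 6.0f, 12000)` gives a single number to weigh against whatever tolerance the backend tests use for GEGLU_ERF; this is only a sampling check, not a proof of a bound.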

@CISC CISC requested a review from jeffbolznv June 29, 2025 20:28
@0cc4m 0cc4m self-requested a review June 30, 2025 05:40
Collaborator

@0cc4m 0cc4m left a comment

The Vulkan code works on all of my devices.

Collaborator Author

CISC commented Jul 1, 2025

@lhez I'll add OpenCL too, pending results of #14476

@CISC CISC requested a review from max-krasnyansky July 1, 2025 12:26
Contributor

lhez commented Jul 2, 2025

@CISC OpenCL looks good and works for me.

5 participants