Contributor

@XiWeiGu XiWeiGu commented Sep 26, 2023

Add LASX DTRSM kernels: dtrsm_kernel_LN_16x4_lasx.S, dtrsm_kernel_LT_16x4_lasx.S, dtrsm_kernel_RN_16x4_lasx.S and dtrsm_kernel_RT_16x4_lasx.S.
The performance improvement on the 3A5000 is shown below. Because cblas_dgemm is already highly optimized, the improvement is limited.
[Performance plots: four images, each extracted as image-20230905114838969]
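
For readers less familiar with the routine: DTRSM is the double-precision BLAS level-3 triangular solve. The sketch below is a plain C reference for one case only (left side, lower triangular, no transpose, non-unit diagonal, column-major storage). It is not the code added in this PR, whose kernels are LoongArch LASX assembly, and the function name `ref_dtrsm_left_lower` is hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* Reference (unoptimized) solve of A * X = alpha * B for X.
 * A is m x m, lower triangular with a non-unit diagonal; B is m x n.
 * Both are column-major, as in the BLAS; the result X overwrites B.
 * ref_dtrsm_left_lower is a hypothetical name, not an OpenBLAS symbol. */
static void ref_dtrsm_left_lower(size_t m, size_t n, double alpha,
                                 const double *a, size_t lda,
                                 double *b, size_t ldb)
{
    for (size_t j = 0; j < n; j++) {
        double *bj = b + j * ldb;          /* column j of B */

        /* Scale the right-hand side by alpha. */
        for (size_t i = 0; i < m; i++)
            bj[i] *= alpha;

        /* Forward substitution: finish x[k], then remove its
         * contribution from the rows below it. */
        for (size_t k = 0; k < m; k++) {
            bj[k] /= a[k + k * lda];
            for (size_t i = k + 1; i < m; i++)
                bj[i] -= bj[k] * a[i + k * lda];
        }
    }
}
```

Each column of B is solved independently, and the inner update loop is a rank-1 style update of the remaining rows. Optimized implementations typically process blocks of rows and columns at a time rather than single elements.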

@martin-frbg martin-frbg added this to the 0.3.25 milestone Sep 26, 2023
@martin-frbg martin-frbg merged commit e2ca22f into OpenMathLib:develop Sep 26, 2023