Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The reason Spring Data JPA generates a count(*) query when executing the existsById() method. #3814

Closed
hky035 opened this issue Mar 14, 2025 · 8 comments

Comments

@hky035
Copy link

hky035 commented Mar 14, 2025

Hello.

I am studying Spring Data JPA and I have a question about the generated query when executing the existsById().

Despite of the Id is a unique column, is there a specific reason for using a count(*) query?

Out of curiosity, I tested a method like existsByName() for a regular column and found that it generates a query using LIMIT, which actually performed better.

I am curious why a count(*) query is used when checking existsById(ID id) in SimpleJpaRepository.

// SimpleJpaRepository
    public boolean existsById(ID id) {
        Assert.notNull(id, "The given id must not be null");
        if (this.entityInformation.getIdAttribute() == null) {
            return this.findById(id).isPresent();
        } else {
            String placeholder = this.provider.getCountQueryPlaceholder();
            String entityName = this.entityInformation.getEntityName();
            Iterable<String> idAttributeNames = this.entityInformation.getIdAttributeNames();
            String existsQuery = QueryUtils.getExistsQueryString(entityName, placeholder, idAttributeNames);
       ...
// PersistenceProvider
    public String getCountQueryPlaceholder() {
        return "x";
    }
// QueryUtils
public static String getExistsQueryString(String entityName, String countQueryPlaceHolder, Iterable<String> idAttributes) {
        String whereClause = (String)Streamable.of(idAttributes).stream().map((idAttribute) -> {
            return String.format("%s.%s = :%s", "x", idAttribute, idAttribute);
        }).collect(Collectors.joining(" AND ", " WHERE ", ""));
        String var10000 = String.format("select count(%s) from %s x", countQueryPlaceHolder, entityName);
        return var10000 + whereClause;
    }

From my perspective, it seems that the query generation method only for existsById() could be defined within QueryUtils.

I am curious about this matter.

I apologize if my poor English skills made it difficult for you to read.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Mar 14, 2025
@mp911de
Copy link
Member

mp911de commented Mar 14, 2025

There are several aspects. COUNT(*) is the ideal form for counting as it allows the database to optimize the query. We do not want to count identifiers as their value could be null and we don't want to make any nullability assumptions here. COUNT does not count null values.

Also, Hibernate is a bit more lenient on the JPQL standard. Standard JPQL COUNT accepts identification variables or path expressions but not *. I'm not sure about the impact of LIMIT with count queries though.

@schauder
Copy link
Contributor

I think you misunderstood the LIMIT thing.

It is not about applying a limit to the count query. It's about replacing it.

Instead of select count(*) from x we could do select 1 from x limit 1.

@hky035
Copy link
Author

hky035 commented Mar 15, 2025

@mp911de
Thank you for your comment.

I have one more question.

I know Identifiers cannot be null.
The identifier field annotated with @Id cannot be null, so isn't there no possibility of null at all?

Have a great day!

@hky035
Copy link
Author

hky035 commented Mar 15, 2025

@schauder
Thank you for your comment.

I think I may have asked the question incorrectly due to my limited English skills.

Once again, I appreciate your kind answer!

@mp911de
Copy link
Member

mp911de commented Mar 17, 2025

A COUNT(*) query using a primary key index is probably no different from SELECT 1 FROM … LIMIT 1 WHERE …. Care to create a benchmark using JMH (we have already some benchmarks so you can use these as inspiration) and submitting a pull request that changes the exists query (also the one in PartTreeJpaQuery) so that we can explore the changed behavior?

I think I may have asked the question incorrectly due to my limited English skills.

No worries. Your English is fine.

@schauder
Copy link
Contributor

A count(*) typically still requires a full index scan, while the limit variant just needs to access a single block.

But a benchmark will tell.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Mar 17, 2025
@spring-projects-issues
Copy link

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Mar 24, 2025
@spring-projects-issues
Copy link

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@spring-projects-issues spring-projects-issues closed this as not planned Won't fix, can't repro, duplicate, stale Mar 31, 2025
@spring-projects-issues spring-projects-issues removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged labels Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants