Skip to content

Refactored task manifest persistence to use JPA #3614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

mminella
Copy link
Contributor

@mminella mminella commented Nov 4, 2019

This commit refactors the persistence mechanism of the task execution
manifest to use JPA instead of raw JDBC. This was done to be consistent
with the rest of the Spring Cloud Data Flow entities.

Resolves #3560

This commit refactors the persistence mechanism of the task execution
manifest to use JPA instead of raw JDBC.  This was done to be consistent
with the rest of the Spring Cloud Data Flow entities.

Resolves spring-cloud#3560
public String toString() {
return new ToStringCreator(this)
.append("manifest", this.manifest)
.append("platformName", this.manifest.getPlatformName())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the taskDeploymentRequest as well here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we are trying to return the String representation of the TaskExecutionManifest here. Now that I think of it again, the manifest would simply return the toString() of Manifest as it doesn't have an explicit one. Also, it would be nice we call this.manifest.toString() and have the toString() implementation inside the Manifest subclass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to have a toString() representation of the entire Manifest, we need to have invoked the sanitization methods to mask the underlying properties as well.

Copy link
Contributor Author

@mminella mminella Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, even if we use the taskDeploymentRequest, we'd still need to sanitize. If you are ok with it, I'd rather just use the whole manifest and sanitize the values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather just use the whole manifest and sanitize the values.

Yes, this is a better option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just removed the toString method all together. I wasn't using it. The reason it was there was the PoC that created that class just used that to do the serialization. Since this class didn't have visibility to the TaskSerializer class (and I didn't want to move it), I just removed the method.

return new ToStringCreator(this)
.append("manifest", this.manifest)
.append("platformName", this.manifest.getPlatformName())
.append("subTaskDeploymentRequests", this.manifest.getSubTaskDeploymentRequests())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check if the list is empty before appending?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilayaperumalg In your CTR testing, did you see if this field we even used? If it wasn't, we can remove it completely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed noticing that the subTaskDeploymentRequests are not used at all :-) All we need here is to use the original appDeploymentRequest as TaskDeploymentRequest and the CTR will take care launching the corresponding apps. This means, the entire subTaskDeploymentRequests can be removed from the TaskExecutionManifest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was done in another commit that I'll pick up when I rebase...

@@ -28,10 +28,26 @@
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-deployer-spi</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try not to move things from registry to core package as we're been trying to keep deps tidy in core package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to move that stuff around...but in order for the JSON serialization to work I didn't have a choice. I'm open to other ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why serialization wouldn't work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ObjectMapper needs to have visibility to the classes that are being serialzied/deserialized. In this case, embedded within the manifest are the Resource for the app. That means we need access to those classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the objectMapper is customized in spring-cloud-dataflow-server-core, ResourceDeserializer could also be there instead in spring-cloud-dataflow-core where it sees everything. Thus you don't need to move this stuff around. Unless I missed something else wouldn't that work then.

@@ -42,13 +42,15 @@ public TaskExecution sanitizeTaskExecutionArguments(TaskExecution taskExecution)
return taskExecution;
}

public TaskManifest sanitizeTaskManifest(TaskManifest taskManifest) {
public TaskExecutionManifest sanitizeTaskManifest(TaskExecutionManifest taskManifest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename this method as well (to have TaskExecutionManifest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason the methods in this class are not static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (although still wondering why these methods are not static)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why this can't be static. I think we are ok both ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it would be nice to not have instances of this class floating around everywhere it's needed if we can make them static. If you're ok with it, I'll make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Though it is only used by the task configuration beans, it makes sense to have a static method for this. Please go ahead!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

private ObjectMapper objectMapper;

public TaskManifestConverter() {
this.objectMapper = new ObjectMapper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been trying to use a shared single object mapper which gets configured in a boot way and given to us from a boot. This way things around it are not scattered in different places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a JPA converter a bean so that you can use DI? If so, I'd be interested in learning how.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it just work as bean per github.com/spring-projects/spring-framework/issues/20852

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer (never knew that)! I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* @since 2.3
*/
public interface TaskManifestRepository extends PagingAndSortingRepository<TaskExecutionManifest, Long> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskManifestRepository -> TaskExecutionManifestRepository ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jvalkeal
Copy link
Contributor

jvalkeal commented Nov 5, 2019

There's a lot of merge conflicts in DefaultTaskExecutionService due to changes what just got merged in to master.

numberOfDeletedTaskManifestRows += this.taskManifestRepository.deleteTaskExecutionManifestByTaskExecutionId(taskExecutionId);
}

this.entityManager.flush();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this is necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete below done via dataflowTaskExecutionDao would not even be in a same transaction, afaik, as JdbcTemplate and hibernate are probably working via different jdbc connections. I don't think you can do delete operations like this.

Copy link
Contributor Author

@mminella mminella Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvalkeal The JpaTransactionManager (which SCDF is using) supports the same transaction across the two. You can read more about it in the documentation for it here: https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/orm/jpa/JpaTransactionManager.html

@ilayaperumalg The reason it is required is because the EntityManager won't execute the query until the flush happens. Because of this, without the flush, the following call to delete the task execution records fails due to the constraint between the manifest table and the task execution table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from TaskExecutionsDocumentation.taskExecutionRemoveAndTaskDataRemove() which seem to fail all the time when I commented out that flush there's no test failures. It might work better if you try to delete the actual entity instead of trying to create a query for it. There has to be better way that hacking into EntityManager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a skipper's Release entity we used these in a field referencing to other table:

@OneToOne(cascade = { CascadeType.ALL })
@JoinColumn(foreignKey = @ForeignKey(name = "fk_release_manifest"))
private Manifest manifest;

Not sure if it helps but we need something to tell hibernate that it should not preoptimize db calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. What do you mean by "delete the actual entity". That's what I'm doing. Unless you're proposing I do a find for the entity, then a delete but that will result in the same issue. For the skipper example you are sharing, those are all entitites, correct? Which would mean that JPA would handle it all. The TaskExecution that the TaskExecutionManifest is not a JPA entity which is why this issue arises in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With deleteTaskExecutionManifestByTaskExecutionId you're essentially creating a query which work differently if you get access to a full entity object and then try to delete by passing that into taskExecutionManifestRepository delete method. Just a theory when things might work better. It'd difficult to try different ways to make this work better as no tests fail if that flush is removed. We should not have these kind of workaround in place which we cannot test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I was able to work without explicit flush by adding explicit new propagation to TaskExecutionManifestRepository:

@Transactional(propagation = Propagation.REQUIRES_NEW)
int deleteTaskExecutionManifestByTaskExecutionId(long taskExecutionid);

See if that work for you as well!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some notes at https://docs.jboss.org/hibernate/core/3.3/reference/en/html/objectstate.html#objectstate-flushing so had an idea that if we wrap this in a new transaction hibernate then sees that boundary and flushes. It's still a hack but we'd be working with a system, not around it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I can add a test to validate the transaction rollback for the method (We don't do this anywhere else that I'm aware of and this PR should be functionally the same as the code before which is why I didn't add it).
  2. Changing the transaction propagation would be incorrect since the entire service method needs to be in a single transaction, committed or rolled back as one. The current transaction as configured should work that way as coded. Changing the propagation would prevent that I think.
  3. I don't see how adding an explicit call to flush is working around it. That documentation you point out explicitly states that a call to flush is a way to make this work.

All of this being said, this is one line of code. I brought this up in the slack channel before adding it and @markpollack seemed ok with it at the time. Any of these other methods seem like overkill for such a simple thing IMHO.

@ilayaperumalg
Copy link
Contributor

For consistency, we can rename all the places of TaskManifest to TaskExecutionManifest.

* @since 2.3
*/
@Entity
@Table(name = "TaskExecutionMetadata")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good time to change the name of the table to TaskExecutionManifest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
@Entity
@Table(name = "TaskExecutionMetadata")
@EntityListeners(AuditingEntityListener.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use any of the auditing listeners. If we don't use it, we can remove @EntityListeners(AuditingEntityListener.class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ilayaperumalg
Copy link
Contributor

LGTM

@sabbyanandan
Copy link
Contributor

We will issue a new PR in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert JdbcDataflowTaskExecutionMetadataDao to use Spring Data repositories
4 participants