Wire preservation workflow to Archival Packaging Tool (APT) #1465

Open: jazairi wants to merge 2 commits into main from etd-669-apt-integration

@jazairi (Contributor) commented Jul 11, 2025

Why these changes are being introduced:

DataEng has developed APT as middleware between ETD and Archivematica. This new application handles the BagIt logic, including creating bags in an S3 bucket connected to Archivematica. Thus, much of the SIP logic in ETD is no longer required.

Relevant ticket(s):

  • ETD-669 (https://mitlibraries.atlassian.net/browse/ETD-669)

How this addresses that need:

This adds an Archivematica Payload model that effectively replaces the SIP model. The new model constructs the payload JSON expected by APT. Instances of the model generate and persist this JSON on create, along with the metadata CSV as an ActiveStorage attachment.
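
As a rough illustration of the "build the JSON once, at creation time" idea, here is a minimal plain-Ruby sketch. It is not the model from this PR: the class name `PayloadSketch`, the thesis hash shape, and every JSON key (`bag_name`, `metadata`, `input_files`, `uri`, `filepath`) are assumptions made for the example, and the real payload format is whatever APT expects. The constructor stands in for an ActiveRecord create callback, and the metadata CSV attachment is left out.

```ruby
require "json"

# Hypothetical stand-in for the Archivematica Payload model.
# All key names below are illustrative assumptions, not APT's documented schema.
class PayloadSketch
  attr_reader :payload_json

  # In a Rails model this work would likely happen in a create callback;
  # here the constructor plays that role.
  def initialize(thesis)
    @thesis = thesis
    @payload_json = JSON.generate(build_payload)
  end

  private

  # Assemble a plain hash first so it is easy to inspect before serializing.
  def build_payload
    {
      "bag_name" => "thesis-#{@thesis.fetch(:id)}",
      "metadata" => { "title" => @thesis.fetch(:title) },
      "input_files" => @thesis.fetch(:files).map do |file|
        { "uri" => file.fetch(:uri), "filepath" => file.fetch(:name) }
      end
    }
  end
end
```

Generating the JSON once and storing it means the exact payload sent to APT can be inspected later, even if the thesis record changes afterwards.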

The other significant change is in the Preservation Submission Job. Previously, this job invoked the Submission Information Package Zipper model to stream a serialized bag to S3. Now, it's responsible for POSTing the JSON data to APT and handling the response.
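
The new job flow (POST the JSON, then act on the response) could look roughly like the sketch below. This is an assumption-laden illustration rather than the job in this PR: `submit_to_apt`, `AptSubmissionError`, `DEFAULT_POSTER`, and the shape of the response body are hypothetical. The HTTP call is injected as a lambda so that the function can be exercised without touching a real APT endpoint, in the same spirit as the stubbed responses used in the tests.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical error raised when APT does not return a 2xx response.
class AptSubmissionError < StandardError; end

# Default transport: a real HTTP POST with a JSON content type.
DEFAULT_POSTER = lambda do |url, body|
  Net::HTTP.post(URI(url), body, "Content-Type" => "application/json")
end

# POSTs the payload JSON and returns the parsed response body.
# Raises AptSubmissionError on any non-2xx status so the caller
# (for example, a background job) can decide how to record the failure.
def submit_to_apt(payload_json, url:, poster: DEFAULT_POSTER)
  response = poster.call(url, payload_json)
  code = response.code.to_i # Net::HTTP exposes the status code as a String
  raise AptSubmissionError, "APT returned HTTP #{code}" unless code.between?(200, 299)

  JSON.parse(response.body)
end
```

In a test, `poster:` can be replaced with a lambda that returns any object responding to `code` and `body`.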

Side effects of this change:

  • The tests that call APT use webmock and stubbed responses. We would normally use VCR for external API calls, but in this case it doesn't seem prudent to pollute the APT S3 bucket, as it's possible the current test bucket will become the bucket we use.
  • The SIP model is retained for historical purposes. This is not ideal in terms of maintainability, but it feels important to retain that data, at least for the time being.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod. Note: APT bucket temporarily set to ETD bucket until infrastructure is finalized.
  • ANDI or Wave has been run in accordance with our guide and all issues introduced by these changes have been resolved or opened as new issues (link to those issues in the Pull Request details above). Note: no UI changes.
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

YES

Includes new or updated dependencies?

YES

Why these changes are being introduced:

DataEng has developed [APT](https://github.com/MITLibraries/archival-packaging-tool/)
as middleware between ETD and Archivematica. This new application
handles the BagIt logic, including creating bags in an S3 bucket
connected to Archivematica. Thus, much of the SIP logic in ETD is no
longer required.

Relevant ticket(s):

* [ETD-669](https://mitlibraries.atlassian.net/browse/ETD-669)

How this addresses that need:

This adds an Archivematica Payload model that effectively replaces
the SIP model. The new model constructs the payload JSON expected
by APT. Instances of the model generate and persist this JSON
on create, along with the metadata CSV as an ActiveStorage
attachment.

The other significant change is in the Preservation Submission Job.
Previously, this job invoked the Submission Information Package
Zipper model to stream a serialized bag to S3. Now, it's
responsible for POSTing the JSON data to APT and handling the
response.

Side effects of this change:

* The tests that call APT use webmock and stubbed responses. We
would normally use VCR for external API calls, but in this case
it doesn't seem prudent to pollute the APT S3 bucket, as it's
possible the current test bucket will become the bucket we use.
* The SIP model is retained for historical purposes. This is not
ideal in terms of maintainability, but it feels important to
retain that data, at least for the time being.

@jazairi force-pushed the etd-669-apt-integration branch from a83339c to 8a8ed01 on July 11, 2025 21:56

@JPrevost self-assigned this on Jul 14, 2025

@JPrevost (Member) left a comment

I've left a few comments. I'm not sure any of them require changes, but I wanted to submit my initial thoughts so you can decide whether to make any changes before we do a test in dev1 APT.

@jazairi requested a review from JPrevost on July 15, 2025 17:53

@JPrevost (Member) left a comment

The latest changes look good.

Let's figure out how to test this in Dev1 while CB is on vacation to confirm it works as expected, so that when he is back we are ready to merge/promote.
