Wire preservation workflow to Archival Packaging Tool (APT) #1465
Open: jazairi wants to merge 3 commits into main from etd-669-apt-integration
+388
−137
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# == Schema Information | ||
# | ||
# Table name: archivematica_payloads | ||
# | ||
# id :integer not null, primary key | ||
# preservation_status :integer default("unpreserved"), not null | ||
# payload_json :text | ||
# preserved_at :datetime | ||
# thesis_id :integer not null | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
# This class assembles a payload to send to the Archival Packaging Tool (APT), which then creates a bag for | ||
# preservation. It includes the thesis files, metadata, and checksums. The payload is then serialized to JSON | ||
# for transmission. | ||
# | ||
# Instances of this class are invalid without an associated thesis that has a DSpace handle, a copyright, and | ||
# at least one attached file with no duplicate filenames. | ||
# | ||
# There is some intentional duplication between this and the SubmissionInformationPackage model. The | ||
# SubmissionInformationPackage is the legacy model that was used to create the bag, but it is not | ||
# used in the current APT workflow. We are retaining it for historical purposes. | ||
class ArchivematicaPayload < ApplicationRecord | ||
include Checksums | ||
include Baggable | ||
|
||
has_paper_trail | ||
belongs_to :thesis | ||
has_one_attached :metadata_csv | ||
|
||
validates :baggable?, presence: true | ||
|
||
before_create :set_metadata_csv, :set_payload_json | ||
|
||
enum preservation_status: %i[unpreserved preserved] | ||
|
||
private | ||
|
||
# verbose and compress_zip are cast to booleans because values read from ENV are strings. APT strictly | ||
# requires a boolean for compress_zip. | ||
def build_payload | ||
{ | ||
action: 'create-bagit-zip', | ||
challenge_secret: ENV.fetch('APT_CHALLENGE_SECRET', nil), | ||
verbose: ActiveModel::Type::Boolean.new.cast(ENV.fetch('APT_VERBOSE', false)), | ||
input_files: build_input_files, | ||
checksums_to_generate: ENV.fetch('APT_CHECKSUMS_TO_GENERATE', ['md5']), | ||
output_zip_s3_uri: bag_output_uri, | ||
compress_zip: ActiveModel::Type::Boolean.new.cast(ENV.fetch('APT_COMPRESS_ZIP', true)) | ||
} | ||
end | ||
|
||
# Build input_files array from thesis files and attached metadata CSV | ||
def build_input_files | ||
files = thesis.files.map { |file| build_file_entry(file) } | ||
files << build_file_entry(metadata_csv) # Metadata CSV is the only file that is generated in this model | ||
files | ||
end | ||
|
||
# Build a file entry for each file, including the metadata CSV. | ||
def build_file_entry(file) | ||
{ | ||
uri: ["s3://#{ENV.fetch('AWS_S3_BUCKET')}", file.blob.key].join('/'), | ||
filepath: set_filepath(file), | ||
checksums: { | ||
md5: base64_to_hex(file.blob.checksum) | ||
} | ||
} | ||
end | ||
|
||
def set_filepath(file) | ||
file == metadata_csv ? 'metadata/metadata.csv' : file.filename.to_s | ||
end | ||
|
||
# The bag_name has to be unique because we use it as the basis of an ActiveStorage key. We chose not to use | ||
# a UUID because the target system adds its own UUID to the file on arrival, and a filename with two | ||
# embedded UUIDs was unwieldy, so we simply increment an integer. | ||
def bag_name | ||
safe_handle = thesis.dspace_handle.gsub('/', '_') | ||
"#{safe_handle}-thesis-#{thesis.submission_information_packages.count + 1}" | ||
end | ||
|
||
# The bag_output_uri key is constructed to match the expected format for Archivematica. | ||
def bag_output_uri | ||
key = "etdsip/#{thesis.graduation_year}/#{thesis.graduation_month}-#{thesis.accession_number}/#{bag_name}.zip" | ||
[ENV.fetch('APT_S3_BUCKET'), key].join('/') | ||
end | ||
|
||
def baggable? | ||
baggable_thesis?(thesis) | ||
end | ||
|
||
def set_metadata_csv | ||
csv_data = ArchivematicaMetadata.new(thesis).to_csv | ||
metadata_csv.attach(io: StringIO.new(csv_data), filename: 'metadata.csv', content_type: 'text/csv') | ||
end | ||
|
||
def set_payload_json | ||
self.payload_json = build_payload.to_json | ||
end | ||
end |
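Two details of the payload above are easy to miss, so here is a minimal plain-Ruby sketch of them. It is not code from this PR. `env_boolean` is a hypothetical, simplified stand-in for `ActiveModel::Type::Boolean#cast` (it does not reproduce Rails' exact rules), and `base64_to_hex_sketch` is an assumption about what the `base64_to_hex` helper from the `Checksums` concern does, since that concern is not shown in this diff. The bucket name, blob key, and filename are made-up values.

```ruby
require 'digest'
require 'json'

# Hypothetical, simplified stand-in for ActiveModel::Type::Boolean#cast.
# Strings that should be read as false; everything else is treated as true.
FALSE_STRINGS = %w[0 f false off].freeze

# Reads a key from a Hash standing in for ENV and returns a real boolean.
def env_boolean(env, key, default)
  raw = env.fetch(key, default)
  return raw if [true, false].include?(raw)

  !FALSE_STRINGS.include?(raw.to_s.strip.downcase)
end

# Assumed behaviour of base64_to_hex: decode the base64 digest to raw bytes,
# then re-encode those bytes as lowercase hex.
def base64_to_hex_sketch(base64_digest)
  base64_digest.unpack1('m').unpack1('H*')
end

# Mirrors the shape of build_file_entry, using plain values instead of a blob.
def file_entry_sketch(bucket:, key:, filepath:, base64_md5:)
  {
    uri: ["s3://#{bucket}", key].join('/'),
    filepath: filepath,
    checksums: { md5: base64_to_hex_sketch(base64_md5) }
  }
end

env = { 'APT_COMPRESS_ZIP' => 'false' } # stand-in for ENV; values are always strings
entry = file_entry_sketch(
  bucket: 'example-bucket',                      # made-up bucket name
  key: 'abc123',                                 # made-up blob key
  filepath: 'thesis.pdf',                        # made-up filename
  base64_md5: Digest::MD5.base64digest('hello')  # same encoding as a blob checksum
)
payload = {
  input_files: [entry],
  compress_zip: env_boolean(env, 'APT_COMPRESS_ZIP', true)
}

puts JSON.pretty_generate(payload)
```

Without the cast, the string `'false'` would be serialized as the JSON string `"false"` rather than the JSON boolean `false`, which is the situation the comment on `build_payload` guards against.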
This file was deleted.
13 changes: 13 additions & 0 deletions
db/migrate/20250624182142_create_archivematica_payloads.rb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
class CreateArchivematicaPayloads < ActiveRecord::Migration[7.1] | ||
def change | ||
create_table :archivematica_payloads do |t| | ||
t.integer :preservation_status, null: false, default: 0 | ||
t.text :payload_json | ||
t.datetime :preserved_at | ||
|
||
t.references :thesis, null: false, foreign_key: true | ||
|
||
t.timestamps | ||
end | ||
end | ||
end |
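The migration stores `preservation_status` as an integer with `default: 0`, while the model's schema annotation reports the default as `"unpreserved"`. The two agree because an array-style Rails enum assigns integers by position. A minimal plain-Ruby sketch of that mapping follows; `STATUSES`, `status_to_integer`, and `integer_to_status` are illustrative names, not part of this PR.

```ruby
# Same symbols, in the same order, as the enum declared on the model.
STATUSES = %i[unpreserved preserved].freeze

# An array-style enum maps each name to its index in the array.
status_to_integer = STATUSES.each_with_index.to_h
integer_to_status = status_to_integer.invert

puts status_to_integer.inspect
puts integer_to_status.fetch(0) # the name behind the column default of 0
```

One consequence of positional mapping is that reordering or inserting values in the array changes what the stored integers mean, so new statuses are safest appended at the end.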