adds information about input files #1

Open: wants to merge 1 commit into base: main

8 changes: 8 additions & 0 deletions README.md
@@ -10,6 +10,10 @@ A list of duplicate resources are identified from Academic Commons (AC) — the

Starting from the list of duplicate parent items, this script searches the repository for their children (assets). The resulting list is exported as 2 CSV files. On Hyacinth, the backend digital object management platform, the exported list is used to merge the statistics of duplicates before they are removed. On DataCite, the duplicates are redirected to the appropriate resources.

**Inputs:**
- 1 CSV file with the full AC corpus from Hyacinth (published and unpublished, assets and items)
- 1 CSV file listing the defined duplicates, with the following column titles: 'delete--DOI', 'delete--PID', 'OR Digital Object Type > String Key', 'OR Title 1 > Sort Portion', 'keep--PID', 'keep--DOI' (note: you can amend the code if your file does not have all of these column titles)

**Outputs:**
- 1 CSV file for Hyacinth
- 1 CSV file for DataCite
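
Because the duplicates CSV is expected to carry specific column titles, a quick header check before running the script can show which titles are absent and therefore which parts of the code would need amending. The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the sample row values are hypothetical and not part of the script in this repository.

```python
import csv
import io

# Column titles the README lists for the duplicates CSV.
EXPECTED_COLUMNS = [
    "delete--DOI",
    "delete--PID",
    "OR Digital Object Type > String Key",
    "OR Title 1 > Sort Portion",
    "keep--PID",
    "keep--DOI",
]


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column titles that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in expected if name not in header]


# Illustrative in-memory file with made-up placeholder values; a real run
# would pass something like open("duplicates.csv", newline="") instead.
sample = io.StringIO(
    "delete--DOI,delete--PID,keep--PID,keep--DOI\n"
    "10.1/a,ac:1,ac:2,10.1/b\n"
)
print(missing_columns(sample))
# → ['OR Digital Object Type > String Key', 'OR Title 1 > Sort Portion']
```

An empty result means every listed title is present; any names returned point to the columns the code would have to do without.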
@@ -24,6 +28,10 @@ From the duplicate list of parent items, this script searches for their children

A new part in [21] adds a child-level mapping from each duplicate asset to its equivalent kept asset. This mapping lets Hyacinth merge usage statistics before the duplicates are removed. The mapping skips any unpublished duplicates and metadata XML.

**Inputs:**
- 1 CSV file with the full AC corpus from Hyacinth (published and unpublished, assets and items)
- 1 CSV file listing the defined duplicates, with the following column titles: 'delete--DOI', 'delete--PID', 'OR Digital Object Type > String Key', 'OR Title 1 > Sort Portion', 'keep--PID', 'keep--DOI' (note: you can amend the code if your file does not have all of these column titles)

**Outputs:**
- 1 additional CSV file for Hyacinth
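
The child-level mapping described above could be sketched roughly as follows. This is not the code in [21]: the record keys `pid`, `parent_pid`, `filename`, and `published` are hypothetical stand-ins for whatever the Hyacinth export provides, and two further assumptions are made for illustration only, namely that equivalent assets share a filename and that metadata XML can be recognised by an `.xml` extension.

```python
def map_duplicate_assets(assets, parent_pairs):
    """Pair each duplicate asset with the kept asset that has the same filename.

    assets: list of dicts with hypothetical keys 'pid', 'parent_pid',
            'filename', and 'published' (bool).
    parent_pairs: dict mapping a 'delete--PID' parent to its 'keep--PID' parent.
    """
    # Index every asset by (parent PID, filename) so kept assets can be looked up.
    by_parent_and_name = {(a["parent_pid"], a["filename"]): a["pid"] for a in assets}
    mapping = []
    for asset in assets:
        keep_parent = parent_pairs.get(asset["parent_pid"])
        if keep_parent is None:
            continue  # not a child of a duplicate parent
        if not asset["published"]:
            continue  # skip unpublished duplicates
        if asset["filename"].lower().endswith(".xml"):
            continue  # assumed way to recognise metadata XML
        keep_pid = by_parent_and_name.get((keep_parent, asset["filename"]))
        if keep_pid is not None:
            mapping.append((asset["pid"], keep_pid))
    return mapping


# Made-up records: parent ac:1 is the duplicate, parent ac:2 is kept.
assets = [
    {"pid": "ac:11", "parent_pid": "ac:1", "filename": "paper.pdf", "published": True},
    {"pid": "ac:12", "parent_pid": "ac:1", "filename": "mods.xml", "published": True},
    {"pid": "ac:13", "parent_pid": "ac:1", "filename": "draft.pdf", "published": False},
    {"pid": "ac:21", "parent_pid": "ac:2", "filename": "paper.pdf", "published": True},
    {"pid": "ac:23", "parent_pid": "ac:2", "filename": "draft.pdf", "published": True},
]
print(map_duplicate_assets(assets, {"ac:1": "ac:2"}))
# → [('ac:11', 'ac:21')]
```

Each resulting pair would become one row of the additional Hyacinth CSV; duplicate assets with no matching kept asset are simply left out in this sketch.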
