From 21686cbe30e7596cb44522b537548ea436245c01 Mon Sep 17 00:00:00 2001
From: Esther Jackson <64559180+ej2432@users.noreply.github.com>
Date: Wed, 15 Feb 2023 11:56:49 -0500
Subject: [PATCH] adds information about input files

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index 6b87581..d5d72de 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,10 @@ A list of duplicate resources are identified from Academic Commons (AC) — the
 
 From the duplicate list of parent items, this script searches for their children (assets) from the repository. The resulting list will be exported as 2 CSV files. On Hyacinth — the backend digital object management platform — the exported list will be used to merge stats of duplicates before removal. On DataCite, the duplicates will be redirected to appropriate resources.
 
+**Inputs:**
+- 1 CSV file with the full AC corpus from Hyacinth (published and unpublished, assets and items)
+- 1 CSV file with defined duplicates, with the following column titles: 'delete--DOI', 'delete--PID', 'OR Digital Object Type > String Key', 'OR Title 1 > Sort Portion', 'keep--PID', 'keep--DOI' (note: you can amend the code, if you do not have all of these column titles)
+
 **Outputs:**
 - 1 CSV file for Hyacinth
 - 1 CSV file for DataCite
@@ -24,6 +28,10 @@ From the duplicate list of parent items, this script searches for their children
 
 Adding a new part in [21] to do a child-level mapping from the duplicate asset to its equivalent keeping asset. This mapping facilitates Hyacinth to merge usage statistics before removing the duplicates. The mapping will skip any unpublished duplicates and metadata XML.
 
+**Inputs:**
+- 1 CSV file with the full AC corpus from Hyacinth (published and unpublished, assets and items)
+- 1 CSV file with defined duplicates, with the following column titles: 'delete--DOI', 'delete--PID', 'OR Digital Object Type > String Key', 'OR Title 1 > Sort Portion', 'keep--PID', 'keep--DOI' (note: you can amend the code, if you do not have all of these column titles)
+
 **Outputs:**
 - 1 additional CSV file for Hyacinth