Add SRU opener / open-sru #510

Open
TobiasNx opened this issue Nov 23, 2023 · 15 comments · May be fixed by #682

@TobiasNx
Contributor

TobiasNx commented Nov 23, 2023

Like the OAI-PMH opener, Metafacture should be able to retrieve all records of an SRU request. Like OAI-PMH, SRU is an important library standard; see https://www.loc.gov/standards/sru/index.html


Besides SRU being an important library standard, we have two use cases:


A new module should be configurable with the specific SRU keywords.

The opener should be similar to gist.github.com/jorol/989ea5dd464fb85bd99b4733710d2890#oai-pmh and https://github.com/LibreCat/Catmandu-SRU/tree/dev

In Flux something like:

| open-sru(base="https://services.dnb.de/sru/zdb", recordSchema="MARC21-xml", query="tit = soil biology")

(We do not need an extra parser, since the response is parsed in the next step of the Flux workflow.)


Possible other Java sources/repositories that could help develop an SRU opener for Metafacture:
https://github.com/HSG-Library/alma-sru-client
https://github.com/opacapp/opacclient/blob/master/opacclient/libopac/src/main/java/de/geeksfactory/opacclient/apis/SRU.java
https://github.com/indexdata/yaz4j
https://github.com/rism-international/sru-downloader
https://github.com/indexdata/cql-java
https://www.loc.gov/standards/sru/resources/products.html
https://github.com/kitodo/kitodo-production/blob/main/Kitodo/src/main/java/org/kitodo/production/services/data/ImportService.java

Inspiration from a command-line tool for SRU: https://github.com/ubleipzig/srufetch

@dr0i dr0i moved this to Selected in Metafacture Nov 18, 2024
@TobiasNx TobiasNx changed the title Add SRU opener for metafacture Add SRU opener /open-sru Nov 19, 2024
@TobiasNx TobiasNx changed the title Add SRU opener /open-sru Add SRU opener / open-sru Nov 19, 2024
@TobiasNx
Contributor Author

I cleaned up the comments to make this issue more compact.

@TobiasNx
Contributor Author

TobiasNx commented Feb 17, 2025

Now I have the first use case:

I want to fetch all records from this SRU endpoint that match the query dnb.isil=DE-Sol1:

https://services.dnb.de/sru/zdb?version=1.1&operation=searchRetrieve&query=dnb.isil%3DDE-Sol1&recordSchema=MARC21plus-xml

I need this to build a production workflow that collects all records: hbz/lobid-extra-holdings#8

@TobiasNx
Contributor Author

TobiasNx commented Mar 18, 2025

Parameters:

general:

userAgent to provide credentials

sru specific:

query
total
maximumRecords (Number of records per request)
recordSchema

operation (default operation=searchRetrieve)
version
startRecord

Optional?
sortKey
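
The SRU-specific parameters above map onto the query string of a single request. A minimal Python sketch of that mapping, not the Metafacture implementation: the function name `build_sru_url` and the defaults for `version` and `start_record` are assumptions made for illustration. `total` is left out because it limits the harvest as a whole rather than one request, `userAgent` because it would be sent as an HTTP header, and `sortKey` because it is still marked as an open question.

```python
from urllib.parse import urlencode


def build_sru_url(base, query, record_schema, maximum_records,
                  start_record=1, version="1.1",
                  operation="searchRetrieve"):
    """Assemble one SRU request URL (hypothetical helper, for illustration only)."""
    params = {
        "version": version,
        "operation": operation,          # default taken from the list above
        "query": query,                  # urlencode percent-encodes '=' and spaces
        "recordSchema": record_schema,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)


url = build_sru_url(
    base="https://services.dnb.de/sru/zdb",
    query="dnb.isil=DE-Sol1",
    record_schema="MARC21plus-xml",
    maximum_records=1,
    start_record=4,
)
print(url)
```

The first four query parameters of the result are the same as in the use-case URL above; `startRecord` and `maximumRecords` are appended for paging.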

dr0i added a commit that referenced this issue Mar 28, 2025
@dr0i dr0i linked a pull request Mar 28, 2025 that will close this issue
@TobiasNx
Contributor Author

@dr0i why did you assign me here? I gave my +1 at #682.

@dr0i
Member

dr0i commented Mar 31, 2025

I've assigned you because in our meeting today you asked whether and why the SruOpener behaves differently from the OaipmhOpener when piping records to a file writer, since the XML becomes invalid (header/footer). You wanted to check whether this is the case, why, and whether we want to change the behaviour. (Feel free to discuss this at #682.)

@TobiasNx
Contributor Author

TobiasNx commented Mar 31, 2025

https://github.com/TobiasNx/metafacture_workflows/tree/master/compareOaiAndSru: It seems that the OAI-PMH opener outputs the header <?xml version="1.0" encoding="UTF-8"?> only once, while the SRU opener outputs it multiple times. Also, the OAI-PMH output has an encapsulating <harvest> element, which the SRU output lacks.

The output file of the OAI-PMH opener can be processed. The SRU output file breaks like this:

Exception in thread "main" org.metafacture.framework.MetafactureException: org.xml.sax.SAXParseException; lineNumber: 11562; columnNumber: 6; Verarbeitungsanweisungsziel, das "[xX][mM][lL]" entspricht, ist nicht zulässig.
        at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:92)
        at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:44)
        at org.metafacture.io.FileOpener.process(FileOpener.java:158)
        at org.metafacture.io.FileOpener.process(FileOpener.java:41)
        at org.metafacture.flux.parser.StringSender.process(StringSender.java:43)
        at org.metafacture.flux.parser.Flow.start(Flow.java:118)
        at org.metafacture.flux.parser.FluxProgramm.start(FluxProgramm.java:168)
        at org.metafacture.runner.Flux.main(Flux.java:87)
Caused by: org.xml.sax.SAXParseException; lineNumber: 11562; columnNumber: 6; Verarbeitungsanweisungsziel, das "[xX][mM][lL]" entspricht, ist nicht zulässig.
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1251)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:637)
        at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:89)
        ... 7 more

(The German parser message reads in English: The processing instruction target matching "[xX][mM][lL]" is not allowed.)

Interestingly, at least something is written to the YAML file, probably because of the streaming.
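
The failure can be reproduced outside Metafacture. A minimal Python sketch, assuming two tiny stand-in pages instead of real SRU responses (the helper name `parses` is made up for illustration): one page with an XML declaration parses, but two such pages concatenated into one file do not, because the second declaration no longer sits at the start of the document.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'

# Stand-in for one page of SRU output; real pages carry records.
page = XML_DECLARATION + "<searchRetrieveResponse/>"


def parses(text):
    """Return True if `text` is one well-formed XML document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


single_ok = parses(page)                  # one declaration, one root element
concatenated_ok = parses(page + page)     # second declaration in mid-document

print(single_ok, concatenated_ok)
```

A streaming parser emits everything it has read up to the second declaration before it fails, which would match the partially written output file.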

@TobiasNx TobiasNx removed their assignment Mar 31, 2025
@dr0i
Member

dr0i commented Apr 1, 2025

OaipmhOpener serves one XML document, while SruOpener pages through the answers and serves multiple XML documents. I am going to adapt the latter to also provide just one XML document by ensuring a single <?xml version="1.0" encoding="UTF-8"?> and encapsulating everything in one <searchRetrieveResponse ....
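
To illustrate what "pages through the answers" means, here is a rough Python sketch of such a paging loop. It is not the SruOpener code: `harvest_pages`, `make_fake_fetch` and the rule that the last request is shortened to the remaining `total` are assumptions for illustration, and the fake server stands in for the real HTTP lookup.

```python
import xml.etree.ElementTree as ET

SRW = "{http://www.loc.gov/zing/srw/}"  # namespace of SRU 1.1 responses


def harvest_pages(fetch, start_record, maximum_records, total):
    """Collect response pages until `total` records were seen or the hits run out."""
    pages, seen, position = [], 0, start_record
    while seen < total:
        size = min(maximum_records, total - seen)   # assumed handling of `total`
        page = fetch(position, size)
        root = ET.fromstring(page)
        records = root.findall(f"{SRW}records/{SRW}record")
        if not records:
            break                                   # start position beyond last hit
        pages.append(page)
        seen += len(records)
        next_position = root.findtext(f"{SRW}nextRecordPosition")
        if next_position is None:
            break                                   # server reports no further page
        position = int(next_position)
    return pages


def make_fake_fetch(hits, calls):
    """Build a fake lookup over `hits` records that logs each call in `calls`."""
    def fetch(start_record, maximum_records):
        calls.append((start_record, maximum_records))
        last = min(hits, start_record + maximum_records - 1)
        records = "".join(
            f"<record><recordPosition>{i}</recordPosition></record>"
            for i in range(start_record, last + 1)
        )
        following = (f"<nextRecordPosition>{last + 1}</nextRecordPosition>"
                     if last < hits else "")
        return ('<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">'
                f"<numberOfRecords>{hits}</numberOfRecords>"
                f"<records>{records}</records>{following}"
                "</searchRetrieveResponse>")
    return fetch


calls = []
pages = harvest_pages(make_fake_fetch(3031, calls),
                      start_record=4, maximum_records=1, total=2)
print(calls)
```

Each element of `pages` is a complete response document of its own, which is why writing them to one file unchanged yields several XML documents in a row.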

@dr0i
Member

dr0i commented Apr 25, 2025

After discussion: take an SRU lookup such as https://services.dnb.de/sru/zdb?query=dnb.isil%3DDE-Sol1&operation=searchRetrieve&recordSchema=MARC21plus-xml&version=1.1&maximumRecords=1&startRecord=4 with total set to 2, i.e. "make two lookups, retrieving one document per lookup until we have two documents, starting from the fourth hit of the query". (IMO total is not SRU-specific, although it is listed as SRU-specific above.) The output for this lookup, and analogously for smaller or bigger ones, would be structured like this:

<?xml version="1.0" encoding="UTF-8"?>
<harvest>
  <searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
    <version>1.1</version>
    <numberOfRecords>3031</numberOfRecords>
    <records>
      <record>
        <recordSchema>MARC21plus-xml</recordSchema>
        <recordPacking>xml</recordPacking>
        <recordData>
          <collection xmlns="http://www.loc.gov/MARC21/slim">
            <record type="Bibliographic">
              <leader>00000nas a2200000 c 4500</leader>
              <controlfield tag="001">011156392</controlfield>
...
            </record>
          </collection>
        </recordData>
        <recordPosition>4</recordPosition>
      </record>
    </records>
    <nextRecordPosition>5</nextRecordPosition>
    <echoedSearchRetrieveRequest>
      <version>1.1</version>
      <query>dnb.isil=DE-Sol1</query>
      <xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
      <startRecord>4</startRecord>
      <maximumRecords>1</maximumRecords>
      <recordSchema>MARC21plus-xml</recordSchema>
    </echoedSearchRetrieveRequest>
  </searchRetrieveResponse>
  <searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
    <version>1.1</version>
    <numberOfRecords>3031</numberOfRecords>
    <records>   
      <record>
        <recordSchema>MARC21plus-xml</recordSchema>
        <recordPacking>xml</recordPacking>
        <recordData>
          <collection xmlns="http://www.loc.gov/MARC21/slim">
            <record type="Bibliographic">
              <leader>00000nas a2200000 c 4500</leader>
              <controlfield tag="001">011159960</controlfield>
...
              <datafield ind1=" " ind2=" " tag="933">
                <subfield code="a">CC0</subfield>
              </datafield>
            </record>
          </collection>
        </recordData>
        <recordPosition>5</recordPosition>
      </record>
    </records>
    <nextRecordPosition>6</nextRecordPosition>
    <echoedSearchRetrieveRequest>
      <version>1.1</version>
      <query>dnb.isil=DE-Sol1</query>
      <xQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
      <startRecord>5</startRecord>
      <maximumRecords>1</maximumRecords>
      <recordSchema>MARC21plus-xml</recordSchema>
    </echoedSearchRetrieveRequest>
  </searchRetrieveResponse>
</harvest>

That is, the output is valid XML: everything is encapsulated in a harvest element, as in OAI-PMH, and there is only one XML declaration. Be aware that the SRU-specific XML elements can occur multiple times because they are not altered: they are emitted once per lookup, and multiple lookups are needed when paging comes into play.
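
The wrapping described here can be sketched in a few lines of Python. This is an illustration under assumptions, not the code from #682: the name `wrap_in_harvest`, the regular expression and the shortened sample pages are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

SRW = "{http://www.loc.gov/zing/srw/}"
XML_DECLARATION = re.compile(r"<\?xml\s[^>]*\?>\s*")


def wrap_in_harvest(pages):
    """Drop each page's XML declaration and wrap all pages in one <harvest>."""
    bodies = [XML_DECLARATION.sub("", page).strip() for page in pages]
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<harvest>\n'
            + "\n".join(bodies)
            + "\n</harvest>\n")


def sample_page(position):
    """Shortened stand-in for one SRU response page."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">'
            "<version>1.1</version>"
            "<numberOfRecords>3031</numberOfRecords>"
            f"<records><record><recordPosition>{position}</recordPosition></record></records>"
            f"<nextRecordPosition>{position + 1}</nextRecordPosition>"
            "</searchRetrieveResponse>")


document = wrap_in_harvest([sample_page(4), sample_page(5)])
root = ET.fromstring(document.encode("utf-8"))

# The SRU elements repeat once per lookup inside the single <harvest> root.
responses = root.findall(f"{SRW}searchRetrieveResponse")
next_positions = [r.findtext(f"{SRW}nextRecordPosition") for r in responses]
print(root.tag, len(responses), next_positions)
```

A consumer that needs only the records can ignore the repeated elements such as numberOfRecords and read the record elements of every searchRetrieveResponse in turn.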

dr0i added a commit that referenced this issue Apr 25, 2025
@dr0i dr0i assigned TobiasNx and unassigned dr0i Apr 28, 2025
@TobiasNx
Contributor Author

This looks good in my opinion, as long as it is documented. +1

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx Apr 28, 2025
Projects
Status: Working

2 participants