Distributed Searching

Note

  • in CSW mode, distributed search must be configured against remote CSW services

  • in OGC API - Records mode, distributed search must be configured against remote OGC API - Records services

Note

Your server must be able to make outgoing HTTP requests for this functionality.

CSW 2 / 3

pycsw can perform distributed searching against other CSW servers. Distributed searching is disabled by default; to enable it, the distributedsearch configuration must be set. A CSW client must then issue a GetRecords request with csw:DistributedSearch specified, along with an optional hopCount attribute (see subclause 10.8.4.13 of the CSW specification). When enabled, pycsw searches all configured catalogues and returns a unified set of search results to the client. Because each request fans out to remote servers, distributed queries take longer to process than queries against the local repository.
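As an illustration, the following Python sketch builds a CSW 2.0.2 GetRecords request body that asks for a distributed search. It assumes a brief csw:Record query with no filter; the function name build_getrecords is hypothetical and not part of pycsw. In the CSW 2.0.2 schema, csw:DistributedSearch must appear before csw:Query, which the sketch respects.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW_NS)


def build_getrecords(hop_count: int = 2) -> str:
    """Build a hypothetical GetRecords body with csw:DistributedSearch enabled."""
    root = ET.Element(
        f"{{{CSW_NS}}}GetRecords",
        {"service": "CSW", "version": "2.0.2", "resultType": "results"},
    )
    # csw:DistributedSearch precedes csw:Query; hopCount limits how far
    # the request may be forwarded between catalogues.
    ET.SubElement(root, f"{{{CSW_NS}}}DistributedSearch", {"hopCount": str(hop_count)})
    query = ET.SubElement(root, f"{{{CSW_NS}}}Query", {"typeNames": "csw:Record"})
    element_set = ET.SubElement(query, f"{{{CSW_NS}}}ElementSetName")
    element_set.text = "brief"
    return ET.tostring(root, encoding="unicode")


print(build_getrecords(hop_count=2))
```

The resulting XML would be sent as an HTTP POST body to your pycsw CSW endpoint; the exact endpoint URL depends on your deployment.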

OGC API - Records

Experimental support for distributed searching is available in pycsw’s OGC API - Records implementation, allowing searches against remote services. The implementation uses the same approach as described above, operating in OGC API - Records mode as per OGC API - Records - Part 4: Federated Search (draft).

Note

The distributedsearch.catalogues directives must point to an OGC API - Records collection endpoint (i.e. a single collection, as in the example below).

distributedsearch:
    catalogues:
        - id: fedcat01
          type: OARec
          title: Federated catalogue 1
          url: https://example.org/collections/collection1
        - id: fedcat02
          type: OARec
          title: Federated catalogue 2
          url: https://example.org/collections/collection2

With the above configured, a distributed search can be invoked as follows:

http://localhost/collections/metadata:main/items?distributedSearch=true
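A client can construct this URL programmatically. The following is a minimal Python sketch; the helper name distributed_search_url is hypothetical, and any extra query parameters (such as the keyword parameter q used in the usage example) are passed through unchanged alongside distributedSearch=true.

```python
from urllib.parse import urlencode


def distributed_search_url(base_url: str, collection: str = "metadata:main", **params: str) -> str:
    """Build an OGC API - Records items URL with distributed search enabled (hypothetical helper)."""
    # distributedSearch=true is what triggers the federated query on the server side
    query = {"distributedSearch": "true", **params}
    return f"{base_url.rstrip('/')}/collections/{collection}/items?{urlencode(query)}"


print(distributed_search_url("http://localhost"))
print(distributed_search_url("http://localhost", q="soil"))
```

The first call reproduces the URL shown above; the second adds a keyword filter that is sent along with the distributed search.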

Merging results

When distributedsearch.merge_results exists and is set to true, pycsw will merge all results in federatedSearchResults. To prevent identifier collisions, merged federated search results have their identifiers prefixed by their catalogue id (as defined in distributedsearch.catalogues[*].id). In addition, a federatedCatalogueId property containing the catalogue id is added to each feature.
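The merge rule can be sketched in Python as follows. This is an illustration of the idea, not pycsw's implementation: the ":" separator between catalogue id and record id, the placement of federatedCatalogueId inside the GeoJSON properties object, and the function name merge_federated are all assumptions.

```python
from typing import Any

Feature = dict[str, Any]


def merge_federated(local: list[Feature], federated: dict[str, list[Feature]], sep: str = ":") -> list[Feature]:
    """Merge per-catalogue results into one list (hypothetical sketch)."""
    merged = [dict(f) for f in local]
    for catalogue_id, features in federated.items():
        for feature in features:
            item = dict(feature)
            # Prefix the identifier with the catalogue id to avoid collisions
            # (the separator is an assumption)
            item["id"] = f"{catalogue_id}{sep}{feature['id']}"
            # Copy properties so the original feature is not mutated,
            # then record which catalogue the feature came from
            props = dict(item.get("properties") or {})
            props["federatedCatalogueId"] = catalogue_id
            item["properties"] = props
            merged.append(item)
    return merged


local = [{"id": "rec1", "properties": {}}]
federated = {"fedcat01": [{"id": "rec1", "properties": {}}]}
print([f["id"] for f in merge_federated(local, federated)])
```

Here two records that share the identifier rec1 remain distinguishable after merging, because the remote one is prefixed with fedcat01.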

STAC API

Experimental support for distributed searching is available in pycsw’s STAC API implementation, allowing searches against remote services. The implementation uses the same approach as described above.

Note

The distributedsearch.catalogues directives must point to a STAC API endpoint.

distributedsearch:
    catalogues:
        - id: fedcat03
          type: STAC-API
          title: Copernicus Data Space Ecosystem (CDSE) asset-level STAC catalogue
          url: https://stac.dataspace.copernicus.eu/v1
          collections:
              - daymet-annual-pr

Note

To constrain STAC API distributed search to specific collections, list one or more collection identifiers in the collections (array) directive.
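To show what such a constraint means for the remote catalogue, the following Python sketch turns one catalogue entry from the configuration into a STAC API item search URL. It assumes the constraint is expressed through the standard STAC API /search collections query parameter (a comma-separated list); how pycsw forwards the constraint internally is not described here, and the function name remote_stac_search_url is hypothetical.

```python
from urllib.parse import urlencode


def remote_stac_search_url(catalogue: dict) -> str:
    """Build a STAC API item search URL for one configured catalogue (hypothetical helper)."""
    base = catalogue["url"].rstrip("/")
    params = {}
    collections = catalogue.get("collections")
    if collections:
        # STAC API item search accepts a comma-separated collections list
        params["collections"] = ",".join(collections)
    search_url = f"{base}/search"
    return f"{search_url}?{urlencode(params, safe=',')}" if params else search_url


fedcat03 = {
    "id": "fedcat03",
    "type": "STAC-API",
    "url": "https://stac.dataspace.copernicus.eu/v1",
    "collections": ["daymet-annual-pr"],
}
print(remote_stac_search_url(fedcat03))
```

Omitting the collections directive leaves the search unconstrained, so every collection exposed by the remote STAC API is eligible.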

With the above configured, a distributed search can be invoked as follows:

http://localhost/stac/search?distributedSearch=true

Merging results

Merging behaviour is implemented in the same manner as in the OGC API - Records support.