Configuration

pycsw’s runtime configuration is defined by default.cfg. pycsw ships with a sample configuration (default-sample.cfg). Copy the file to default.cfg and edit the following:

[server]

  • home: the full filesystem path to pycsw
  • url: the URL of the resulting service
  • mimetype: the MIME type when returning HTTP responses
  • language: the ISO 639-1 language and ISO 3166-1 alpha-2 country code of the service (e.g. en-CA, fr-CA, en-US)
  • encoding: the content type encoding (e.g. ISO-8859-1, see https://docs.python.org/2/library/codecs.html#standard-encodings). Default value is ‘UTF-8’
  • maxrecords: the maximum number of records to return by default. This value is enforced when a CSW client’s maxRecords parameter is greater than server.maxrecords, to limit capacity. See MaxRecords Handling for more information
  • loglevel: the logging level (see https://docs.python.org/library/logging.html#logging-levels)
  • logfile: the full file path to the logfile
  • ogc_schemas_base: base URL of OGC XML schemas tree file structure (default is http://schemas.opengis.net)
  • federatedcatalogues: comma delimited list of CSW endpoints to be used for distributed searching, if requested by the client (see Distributed Searching)
  • pretty_print: whether to pretty print the output (true or false). Default is false
  • gzip_compresslevel: gzip compression level, lowest is 1, highest is 9. Default is off. NOTE: if gzip compression is already enabled via your web server, do not enable this directive (or else the server will try to compress the response twice, resulting in degraded performance)
  • domainquerytype: for GetDomain operations, how to output domain values. Accepted values are list and range (min/max). Default is list
  • domaincounts: for GetDomain operations, whether to provide frequency counts for values. Accepted values are true and false. Default is false
  • profiles: comma delimited list of profiles to load at runtime (default is none). See Profile Plugins
  • smtp_host: SMTP host for processing csw:ResponseHandler parameter via outgoing email requests (default is localhost)
  • spatial_ranking: whether to rank spatial query results (true or false), as per K.J. Lanfear (2006), A Spatial Overlay Ranking Method for a Geospatial Search of Text Objects
  • workers: the number of workers used by the WSGI server when launching pycsw with the provided docker/entrypoint.py. Default is 2
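As a minimal sketch, assuming the file is plain INI as shown in this section, the [server] settings can be read and type-checked with Python's standard configparser module. The values below are placeholders, and this is not necessarily how pycsw itself loads its configuration.

```python
import configparser

# Placeholder [server] section; values are illustrative only.
SAMPLE = """
[server]
home=/var/www/pycsw
url=http://localhost/pycsw/csw.py
encoding=UTF-8
maxrecords=10
pretty_print=false
gzip_compresslevel=9
domaincounts=false
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
server = config["server"]

# Typed accessors convert the raw strings; fallback covers optional keys.
maxrecords = server.getint("maxrecords")
pretty_print = server.getboolean("pretty_print", fallback=False)
gzip_level = server.getint("gzip_compresslevel", fallback=None)
workers = server.getint("workers", fallback=2)  # documented default is 2

# gzip_compresslevel ranges from 1 (lowest) to 9 (highest) when set.
if gzip_level is not None and not 1 <= gzip_level <= 9:
    raise ValueError("gzip_compresslevel must be between 1 and 9")

print(maxrecords, pretty_print, gzip_level, workers)
```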

[manager]

  • transactions: whether to enable transactions (true or false). Default is false (see Transactions)
  • allowed_ips: comma delimited list of IP addresses (e.g. 192.168.0.103), wildcards (e.g. 192.168.0.*) or CIDR notations (e.g. 192.168.100.0/24) allowed to perform transactions (see Transactions)
  • csw_harvest_pagesize: when harvesting other CSW servers, the number of records per request to page by (default is 10)
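The three forms accepted by allowed_ips (plain address, wildcard, CIDR) can be illustrated with the standard ipaddress and fnmatch modules. The function ip_allowed below is a hypothetical sketch of the matching semantics described above, not pycsw's own access check.

```python
import fnmatch
import ipaddress


def ip_allowed(client_ip, allowed_ips):
    """Hypothetical check: does client_ip match any entry of a
    comma-delimited allowed_ips string (IP, '*' wildcard, or CIDR)?"""
    for entry in (e.strip() for e in allowed_ips.split(",")):
        if not entry:
            continue
        if "/" in entry:
            # CIDR notation, e.g. 192.168.100.0/24
            network = ipaddress.ip_network(entry, strict=False)
            if ipaddress.ip_address(client_ip) in network:
                return True
        elif "*" in entry:
            # Wildcard, e.g. 192.168.0.*
            if fnmatch.fnmatchcase(client_ip, entry):
                return True
        elif client_ip == entry:
            # Exact address, e.g. 192.168.0.103
            return True
    return False


ALLOWED = "127.0.0.1, 192.168.0.*, 192.168.100.0/24"
print(ip_allowed("192.168.0.103", ALLOWED))  # wildcard match
print(ip_allowed("10.0.0.1", ALLOWED))       # no match
```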

[metadata:main]

  • identification_title: the title of the service
  • identification_abstract: some descriptive text about the service
  • identification_keywords: comma delimited list of keywords about the service
  • identification_keywords_type: keyword type as per the ISO 19115 MD_KeywordTypeCode codelist. Accepted values are discipline, temporal, place, theme, stratum
  • identification_fees: fees associated with the service
  • identification_accessconstraints: access constraints associated with the service
  • provider_name: the name of the service provider
  • provider_url: the URL of the service provider
  • contact_name: the name of the provider contact
  • contact_position: the position title of the provider contact
  • contact_address: the address of the provider contact
  • contact_city: the city of the provider contact
  • contact_stateorprovince: the province or territory of the provider contact
  • contact_postalcode: the postal code of the provider contact
  • contact_country: the country of the provider contact
  • contact_phone: the phone number of the provider contact
  • contact_fax: the facsimile number of the provider contact
  • contact_email: the email address of the provider contact
  • contact_url: the URL to more information about the provider contact
  • contact_hours: the hours of service to contact the provider
  • contact_instructions: instructions on how to contact the provider contact
  • contact_role: the role of the provider contact as per the ISO 19115 CI_RoleCode codelist. Accepted values are author, processor, publisher, custodian, pointOfContact, distributor, user, resourceProvider, originator, owner, principalInvestigator
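Two [metadata:main] settings only accept codelist values. A small pre-deployment check, sketched here with configparser under the assumption that the file is plain INI, can catch typos before they reach the Capabilities document. The function check_metadata is hypothetical and not part of pycsw.

```python
import configparser

# Accepted values as listed above for the codelist-backed settings.
KEYWORD_TYPES = {"discipline", "temporal", "place", "theme", "stratum"}
ROLE_CODES = {
    "author", "processor", "publisher", "custodian", "pointOfContact",
    "distributor", "user", "resourceProvider", "originator", "owner",
    "principalInvestigator",
}


def check_metadata(config):
    """Hypothetical helper: return a list of invalid codelist values."""
    section = config["metadata:main"]
    problems = []
    ktype = section.get("identification_keywords_type")
    if ktype is not None and ktype not in KEYWORD_TYPES:
        problems.append(f"identification_keywords_type: {ktype!r}")
    role = section.get("contact_role")
    if role is not None and role not in ROLE_CODES:
        problems.append(f"contact_role: {role!r}")
    return problems


config = configparser.ConfigParser()
config.read_string("""
[metadata:main]
identification_title=Example catalogue
identification_keywords_type=theme
contact_role=pointOfContact
""")
print(check_metadata(config))  # [] when everything is valid
```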

[repository]

Note

See Administration for connecting your metadata repository and supported information models.

MaxRecords Handling

The following table describes how pycsw determines the number of records returned for OARec items requests or CSW GetRecords requests:

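server.maxrecords | OARec limit / CSW GetRecords.maxRecords | Result
------------------+-----------------------------------------+-----------------
none set          | none passed                             | 10 (CSW default)
20                | 14                                      | 14
20                | none passed                             | 20
none set          | 100                                     | 100
20                | 200                                     | 20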
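The rule described under server.maxrecords (the server value caps a client request only when the client asks for more) can be written as a short function. This is an illustrative sketch; effective_maxrecords is a hypothetical name, not a pycsw function.

```python
CSW_DEFAULT_MAXRECORDS = 10  # CSW default when nothing is set or passed


def effective_maxrecords(server_maxrecords=None, client_maxrecords=None):
    """Hypothetical helper: records to return.
    None means 'not set' (server) or 'not passed' (client)."""
    if client_maxrecords is None:
        # No client value: use the server setting, else the CSW default.
        if server_maxrecords is not None:
            return server_maxrecords
        return CSW_DEFAULT_MAXRECORDS
    if server_maxrecords is not None and client_maxrecords > server_maxrecords:
        # server.maxrecords caps client requests that exceed it.
        return server_maxrecords
    return client_maxrecords


print(effective_maxrecords(20, 200))  # capped by server.maxrecords
```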

Using environment variables in configuration files

pycsw configuration supports system environment variables, which can be helpful when deploying into 12-factor environments, for example.

Below is an example of how to integrate system environment variables in pycsw:

[repository]
database=${PYCSW_REPOSITORY_DATABASE_URI}
table=${MY_TABLE}
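One way to get this ${VAR} behaviour with the standard library is a configparser interpolation class that runs os.path.expandvars on each value. This is a sketch only; the class name EnvInterpolation and the example environment values are assumptions, and pycsw's own implementation may differ.

```python
import configparser
import os


class EnvInterpolation(configparser.BasicInterpolation):
    """Hypothetical interpolation that also expands ${VAR} from the environment."""

    def before_get(self, parser, section, option, value, defaults):
        value = super().before_get(parser, section, option, value, defaults)
        return os.path.expandvars(value)


# For demonstration only: normally these come from the deployment environment.
os.environ["PYCSW_REPOSITORY_DATABASE_URI"] = "sqlite:////var/www/pycsw/records.db"
os.environ["MY_TABLE"] = "records"

config = configparser.ConfigParser(interpolation=EnvInterpolation())
config.read_string("""
[repository]
database=${PYCSW_REPOSITORY_DATABASE_URI}
table=${MY_TABLE}
""")
print(config["repository"]["database"])
print(config["repository"]["table"])
```

Note that os.path.expandvars leaves references to unset variables unchanged on POSIX systems, so a missing variable shows up as a literal ${...} string rather than an error.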

Alternate Configurations

By default, pycsw loads default.cfg at runtime. To load an alternate configuration, modify csw.py to point to the desired configuration. Alternatively, pycsw supports explicitly specifying a configuration by appending config=/path/to/default.cfg to the base URL of the service (e.g. http://localhost/pycsw/csw.py?config=tests/suites/default/default.cfg&service=CSW&version=2.0.2&request=GetCapabilities). When the config parameter is passed by a CSW client, pycsw overrides the default configuration location and subsequent settings with those of the specified configuration.

This also makes it possible to deploy multiple CSW servers from a single pycsw installation.
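On the client side, such URLs can be built with urllib.parse.urlencode, changing only the config= value per catalogue. The helper capabilities_url and the second configuration path are hypothetical; safe="/" keeps path separators unescaped so the result matches the example URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost/pycsw/csw.py"


def capabilities_url(config_path, base_url=BASE_URL):
    """Hypothetical helper: GetCapabilities URL selecting a config via config=."""
    params = {
        "config": config_path,
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetCapabilities",
    }
    # safe="/" keeps '/' readable instead of encoding it as %2F
    return f"{base_url}?{urlencode(params, safe='/')}"


# One installation, several catalogues: only the config= value changes.
for cfg in ("tests/suites/default/default.cfg", "/var/www/pycsw/csw-foo.cfg"):
    print(capabilities_url(cfg))
```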

Hiding the Location

Some deployments with alternate configurations prefer not to expose the config= parameter in the advertised base URL. In this case, there are several ways to select a configuration without it.

Environment Variables

pycsw supports the following environment variables:

  • PYCSW_CONFIG: specifies the filepath to a pycsw configuration
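A per-request resolver for a WSGI-style environ dictionary might look like the sketch below. The precedence used here (config= in the query string, then PYCSW_CONFIG, then default.cfg) is an assumption for illustration; this section does not state how pycsw orders the two when both are present.

```python
from urllib.parse import parse_qs

DEFAULT_CONFIG = "default.cfg"


def resolve_config(environ):
    """Hypothetical resolver. Assumed precedence: config= query
    parameter, then PYCSW_CONFIG, then default.cfg."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if "config" in query:
        return query["config"][0]
    return environ.get("PYCSW_CONFIG", DEFAULT_CONFIG)


print(resolve_config({"PYCSW_CONFIG": "/var/www/pycsw/csw-foo.cfg"}))
```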

Configuration file location

One option is using Apache’s Alias and SetEnvIf directives. For example, given the base URL http://localhost/pycsw/csw.py?config=foo.cfg, set the following in your Apache configuration:

Alias /pycsw/csw-foo.py /var/www/pycsw/csw.py
SetEnvIf Request_URI "/pycsw/csw-foo.py" PYCSW_CONFIG=/var/www/pycsw/csw-foo.cfg

Note

Apache must be restarted after changes to its configuration.

pycsw will use the configuration set in the PYCSW_CONFIG environment variable in the same manner as if it was specified in the base URL. Note that the server.url configuration value must match the Request_URI value so that pycsw’s Capabilities XML advertises the correct URL.
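The Apache setup above can be reasoned about with a small emulation: SetEnvIf matches its pattern as an unanchored regular expression against Request_URI, and server.url should advertise the same path. Both helpers below are hypothetical sketches for checking a deployment, not part of pycsw or Apache.

```python
import re
from urllib.parse import urlsplit

# Mirrors: SetEnvIf Request_URI "/pycsw/csw-foo.py" PYCSW_CONFIG=/var/www/pycsw/csw-foo.cfg
# The pattern is a regex, so the unescaped "." matches any character.
RULES = [
    (re.compile(r"/pycsw/csw-foo.py"), "/var/www/pycsw/csw-foo.cfg"),
]


def config_for(request_uri):
    """Return the PYCSW_CONFIG value a matching rule would set, else None."""
    for pattern, config_path in RULES:
        if pattern.search(request_uri):
            return config_path
    return None


def url_matches(server_url, request_uri):
    """Check that server.url advertises the same path as the request."""
    return urlsplit(server_url).path == urlsplit(request_uri).path


print(config_for("/pycsw/csw-foo.py"))
print(url_matches("http://localhost/pycsw/csw-foo.py", "/pycsw/csw-foo.py"))
```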

Wrapper Script

Another option is to write a simple wrapper (e.g. csw-foo.sh), which provides the same functionality and can be deployed without restarting Apache:

#!/bin/sh

export PYCSW_CONFIG=/var/www/pycsw/csw-foo.cfg

/var/www/pycsw/csw.py
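The wrapper works because an exported variable is inherited by every child process the script starts. The Python sketch below shows the same mechanism with subprocess; as an assumption for testing, csw.py is replaced by a one-line child that only reports the value it sees, and run_with_config is a hypothetical name.

```python
import os
import subprocess
import sys


def run_with_config(config_path, argv):
    """Run argv with PYCSW_CONFIG set, like the wrapper's export line."""
    env = dict(os.environ, PYCSW_CONFIG=config_path)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for csw.py: a child process that prints what it inherited.
child = [sys.executable, "-c", "import os; print(os.environ['PYCSW_CONFIG'])"]
result = run_with_config("/var/www/pycsw/csw-foo.cfg", child)
print(result.stdout.strip())
```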