Configuring
Now that is installed, it’s a good moment to look into its run-time configuration options. All configuration can be done through a configuration file, for which an example is available at .
Essential options
Option | Description |
b͡ind-address | The address to bind a TCP socket on. |
p͡ort | The port to bind a TCK socket on. |
b͡ase-url | The URL on which the instance will be available to the outside world. |
a͡llow-crawlers | Set to 1 to allow crawlers in the r͡obots.txt, otherwise set to 0. |
p͡roduction | Performs extra checks before starting. Enable this when running a production instance. |
l͡ive-reload | When set to 1, it reloads Python code on-the-fly. We recommend to set it to 0 when running in production. |
d͡ebug-mode | When set to 1, it will display backtraces and error messages in the web browser. When set to 0, it will only show backtraces and error messages in the web browser. |
u͡se-x-forwarded-for | When running d͡jehuty behind a reverse-proxy server, use the HTTP header X͡-Forwarded-For to log IP address information. |
d͡isable-collaboration | When set to 1, it disables the “collaborators” feature. |
a͡llowed-depositing-domains | When unset, any authenticated user may deposit data. Otherwise, this option limits the ability to deposit to users with an e-mail address of the listed domain names. |
c͡ache-root | d͡jehuty can cache query responses to lower the load on the database server. Specify the directory where to store cache files. This element takes an attribute c͡lear-on-start, and when set to 1, it will remove all cache files on start-up of d͡jehuty. |
p͡rofile-images-root | Users can upload a profile image in d͡jehuty. This option should point to a filesystem directory where these profile images can be stored. |
d͡isable-2fa | Accounts with privileges receive a code by e-mail as a second factor when logging in. Setting this option to 1 disables the second factor authentication. |
s͡andbox-message | Display a message on the top of every page. |
n͡otice-message | Display a message on the main page. |
m͡aintenance-mode | When set to 1, all HTTP requests result in the displayment of a maintenance message. Use this option while backing up the database, or when performing major updates. |
Configuring the Database
The d͡jehuty program stores its state in a SPARQL 1.1 compliant RDF store. Configuring the connection details is done in the r͡df-store node.
Option | Description |
s͡tate-graph | The graph name to store triplets in. |
s͡parql-uri | The URI at which the SPARQL 1.1 endpoint can be reached. When the s͡parql-uri begins with b͡db://, followed by a path to a filesystem directory, it will use the BerkeleyDB back-end, for which the Python package needs to be installed. |
s͡parql-update-uri | The URI at which the SPARQL 1.1 Update endpoint can be reached (in case it is different from the s͡parql-uri. |
Audit trails and database reconstruction
The d͡jehuty program can keep an audit log of all database modifications made by itself from which a database state can be reconstructed. Whether d͡jehuty keeps such an audit log can be configured with the following option:
Option | Description |
e͡nable-query-audit-log | When set to 1, it writes every SPARQL query that modifies the database in the web logs. This can be replayed to reconstruct the database at a later time. Setting this option to 0 disables this feature. This element takes an attribute t͡ransactions-directory that should specify an empty directory to which transactions can be written that are extracted from the audit log. |
Reconstructing the database from the query audit log
Each query that modifies the database state while the query audit logs are enabled can be extracted from the query audit log using the -͡-extract-transactions-from-log command-line option. A timestamp to specify the starting point to extract from can be specified as an argument. The following example displays its use:
djehuty web --config-file=config.xml --extract-transactions-from-log="YYYY-MM-DD HH:MM:SS"
This will create a file for each query in the folder specified in the t͡ransactions-directory attribute.
To replay the extracted transactions, use the a͡pply-transactions command-line option:
djehuty web --config-file=config.xml --apply-transactions
When a query cannot be executed, the command stops, allowing to fix or remove the query to-be-replayed. Invoking the -͡-apply-transactions command a second time will continue replaying where the previous run stopped.
Configuring storage
Storage locations can be configured with the s͡torage node. When configuring multiple locations, d͡jehuty attempts to find a file by looking at the first configured location, and in case it cannot find the file there, it will look at the second configured location, and so on, until it has tried each storage location.
This allows for moving files between storage systems transparently without requiring specific interactions with d͡jehuty other than having the files made available as a POSIX filesystem.
One use-case that suits this mechanism is letting uploads write to fast online storage and later move the uploaded files to a slower but less costly storage.
Option | Description |
l͡ocation | A filesystem path to where files are stored. This is a repeatable property. |
Configuring an identity provider
Ideally, d͡jehuty makes use of an external identity provider. d͡jehuty can use SAML2.0, ORCID, or an internal identity provider (for testing and development purposes only).
This section will outline the configuration options for each identity provision mechanism.
SAML2.0
For SAML 2.0, the configuration can be placed in the s͡aml section under a͡uthentication. That looks as following:
<authentication>
<saml version="2.0">
<!-- Configuration goes here. -->
</saml>
</authentication>
The options outlined in the remainder of this section should be placed where the example shows .
Option | Description |
s͡trict | When set to 1, SAML responses must be signed. Never disable ‘strict’ mode in a production environment. |
d͡ebug | Increases logging verbosity for SAML-related messages. |
a͡ttributes | In this section the attributes provided by the identity provider can be aligned to the attributes d͡jehuty expects. |
s͡ervice-provider | The d͡jehuty program fulfills the role of service provider. In this section the certificate and service provider metadata can be configured. |
i͡dentity-provider | In this section the certificate and single-sign-on URL of the identity provider can be configured. |
s͡ram | In this section, SURF Research Access Management-specific attributes can be configured. |
The a͡ttributes configuration
To create account and author records and to authenticate a user, d͡jehuty stores information provided by the identity provider. Each identity provider may provide this information using different attributes. Therefore, the translation from attributes used by d͡jehuty and attributes given by the identity provider can be configured. The following attributes must be configured.
Option | Description |
f͡irst-name | A user’s first name. |
l͡ast-name | A user’s last name. |
c͡ommon-name | A user’s full name. |
e͡mail | A user’s e-mail address. |
g͡roups | The attribute denoting groups. |
g͡roup-prefix | The prefix for each group short name. |
As an example, the attributes configuration for SURFConext looks like this:
<attributes>
<first-name>urn:mace:dir:attribute-def:givenName</first-name>
<last-name>urn:mace:dir:attribute-def:sn</last-name>
<common-name>urn:mace:dir:attribute-def:cn</common-name>
<email>urn:mace:dir:attribute-def:mail</email>
</attributes>
And for SURF Research Access Management (SRAM), the attributes configuration looks like this:
<attributes>
<first-name>urn:oid:2.5.4.42</first-name>
<last-name>urn:oid:2.5.4.4</last-name>
<common-name>urn:oid:2.5.4.3</common-name>
<email>urn:oid:0.9.2342.19200300.100.1.3</email>
<groups>urn:oid:1.3.6.1.4.1.5923.1.1.1.7</groups>
<group-prefix>urn:mace:surf.nl:sram:group:[organisation]:[service]</group-prefix>
</attributes>
The s͡ram configuration
When using SURF Research Access Management (SRAM), d͡jehuty can persuade SRAM to send an invitation to anyone inside or outside the institution to join the SRAM collaboration that provides access to the d͡jehuty instance. To do so, the following attributes must be configured.
Option | Description |
o͡rganization-api-token | An organization-level API token. |
c͡ollaboration-id | The UUID of the collaboration to invite users to. |
The s͡ervice-provider configuration
Option | Description |
x͡509-certificate | Contents of the public certificate without whitespacing. |
p͡rivate-key | Contents of the private key belonging to the x͡509-certificate to sign messages with. |
m͡etadata | This section contains metadata that may be displayed by the identity provider to users before authorizing them. |
d͡isplay-name | The name to be displayed by the identity provider when authorizing the user to the service. |
u͡rl | The URL to the service. |
d͡escription | Textual description of the service. |
o͡rganization | This section contains metadata to describe the organization behind the service. |
n͡ame | The name of the service provider’s organization. |
u͡rl | The URL to the web page of the organization. |
c͡ontact | A repeatable section to list contact persons and their roles within the organization. The role can be configured by setting the t͡ype attribute. |
f͡irst-name | The first name of the contact person. |
l͡ast-name | The last name of the contact person. |
e͡mail | The e-mail address of the contact person. Note that some identity providers prefer functional e-mail addresses (e.g. support@... instead of jdoe@...). |
ORCID
ORCID.org plays a key role in making researchers findable. Its identity provider service can be used by d͡jehuty in two ways:
-
As primary identity provider to log in and deposit data;
-
As additional identity provider to couple an author record to its ORCID record.
When another identity provider is configured in addition to ORCID, that identity provider will be used as primary and ORCID will only be used to couple author records to the author’s ORCID record.
To configure ORCID, the configuration can be placed in the o͡rcid section under a͡uthentication. That looks as following:
<authentication>
<orcid>
<!-- Configuration goes here. -->
</orcid>
</authentication>
Then the following parameters can be configured:
Option | Description |
c͡lient-id | The client ID provided by ORCID. |
c͡lient-secret | The client secret provided by ORCID. |
e͡ndpoint | The URL to the ORCID endpoint to use. |
Configuring an e-mail server
On various occassions, d͡jehuty will attempt to send an e-mail to either an author, a reviewer or an administrator. To be able to do so, an e-mail server must be configured from which the instance may send e-mails.
The configuration is done under the e͡mail node, and the following items can be configured:
Option | Description |
s͡erver | Address of the e-mail server without protocol specification. |
p͡ort | The port the e-mail server operates on. |
s͡tarttls | When 1, d͡jehuty attempts to use StartTLS. |
u͡sername | The username to authenticate with to the e-mail server. |
p͡assword | The password to authenticate with to the e-mail server. |
f͡rom | The e-mail address used to send e-mail from. |
s͡ubject-prefix | Text to prefix in the subject of all e-mails sent from the instance of d͡jehuty. This can be used to distinguish a test instance from a production instance. |
Configuring DOI registration
When publishing a dataset or collection, d͡jehuty can register a persistent identifier with DataCite. To enable this feature, configure it under the d͡atacite node. The following parameters can be configured:
Option | Description |
a͡pi-url | The URL of the API endpoint of DataCite. |
r͡epository-id | The repository identifier given by DataCite. |
p͡assword | The password to authenticate with to DataCite. |
p͡refix | The DOI prefix to use when registering a DOI. |
Configuring Handle registration
Each uploaded file can be assigned a persistent identifier using the Handle system. To enable this feature, configure it under the h͡andle node. The following parameters can be configured:
Option | Description |
u͡rl | The URL of the API endpoint of the Handle system implementor. |
c͡ertificate | Certificate to use for authenticating to the endpoint. |
p͡rivate-key | The private key paired with the certificate used to authenticate to the endpoint. |
p͡refix | The Handle prefix to use when registering a handle. |
i͡ndex | The index to use when registering a handle. |
Configuring IIIF support
When publishing images, d͡jehuty can enable the IIIF Image API for the images. It uses l͡ibvips and p͡yvips under the hood to perform image manipulation. The following parameters can be configured:
Option | Description |
e͡nable-iiif | Enable support for the IIIF image API. This requires the p͡yvips package to be available in the run-time environment. |
i͡iif-cache-root | The directory to store the output of IIIF Image API requests to avoid re-computing the image. |
Customizing looks
With the following options, the instance can be branded as necessary.
Option | Description |
s͡ite-name | Name for the instance used in the title of a browser window and as default value in the publisher field for new datasets. |
s͡ite-description | Description used as a meta-tag in the HTML output. |
s͡ite-shorttag | Used as keyword and as Git remote name. |
s͡upport-email-address | E-mail address used in e-mails sent to users in automated messages. |
c͡ustom-logo-path | Path to a PNG image file that will be used as logo on the website. |
c͡ustom-favicon-path | Path to an ICO file that will be used as favicon. |
s͡mall-footer | HTML that will be used as footer for all pages except for the main page. |
l͡arge-footer | HTML that will be used as footer on the main page. |
s͡how-portal-summary | When set to 1, it shows the repository summary of number of datasets, authors, collections, files and bytes on the main page. |
s͡how-institutions | When set to 1, it shows the list of institutions on the main page. |
s͡how-science-categories | When set to 1, it shows the subjects (categories) on the main page. |
s͡how-latest-datasets | When set to 1, it shows the list of latest published datasets on the main page. |
c͡olors | Colors used in the HTML output. See section <a href="#sec:customize-colors" data-reference-type="ref" |
data-reference="sec:customize-colors">1.10.1. |
Customizing colors
The following options can be configured in the c͡olors section.
Option | Description |
p͡rimary-color | The main background color to use. |
p͡rimary-foreground-color | The main foreground color to use. |
p͡rimary-color-hover | Color to use when hovering a link. |
p͡rimary-color-active | Color to use when a link is clicked. |
p͡rivilege-button-color | The background color of buttons for privileged actions. |
f͡ooter-background-color | Color to use in the footer. |
Configuring privileged users
By default an authenticated user may deposit data. But users can have additional roles; for example: a dataset reviewer, a technical administrator or a quota reviewer.
Such additional roles are configured in terms of privileges. The following privileges can be configured in the p͡rivileges section:
Option | Description |
m͡ay-administer | Allows access to perform maintenance tasks, view accounts and view reports on restricted and embargoed datasets. |
m͡ay-run-sparql-queries | Allows to run arbitrary SPARQL queries on the database. |
m͡ay-impersonate | Allows to log in to any account and therefore perform any action as that account. |
m͡ay-review | Allows to see which datasets are sent for review, and allows to perform reviews. |
m͡ay-review-quotas | Allows access to see requests for storage quota increases and approve or decline them. |
m͡ay-review-integrity | Allows access to an API call that provides statistics on the accessibility of files on the filesystem. |
m͡ay-process-feedback | Accounts with this privilege will receive e-mails with the information entered into the feedback form by other users. |
m͡ay-receive-email-notifications | This “privilege” can be used to disable sending any e-mails to an account by setting it to 0͡. The default is 1͡. |
To enable a privilege for an account, set the value of the desired privilege to 1͡. Privileges are disabled by default, except for m͡ay-receive-email-notifications which defaults to 1͡.
<privileges>
<account email="you@example.com" orcid="0000-0000-0000-0001">
<may-administer>1</may-administer>
<may-run-sparql-queries>1</may-run-sparql-queries>
<may-impersonate>1</may-impersonate>
<may-review>0</may-review>
<may-review-quotas>0</may-review-quotas>
<may-review-integrity>0</may-review-integrity>
<may-process-feedback>0</may-process-feedback>
<may-receive-email-notifications>1</may-receive-email-notifications>
</account>
</privileges>