DSpace metadata spring clean

The importance of clean, well-structured metadata cannot be ignored: poorly maintained metadata reduces the discoverability of your assets and can break search engine indexing. That in turn can reduce the impact of your archive, resulting in less traffic and, potentially, less funding.

There are several metadata issues you should look out for:

Duplication

Make sure the same data is not being captured more than once, either within the same metadata field or across multiple metadata fields.

This problem applies to metadata field names as well. For example, if you have a metadata field called dc.subject.heading which is capturing the same information as dc.subject, consider removing the non-compliant metadata field dc.subject.heading.
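As a rough illustration, a check like this could be scripted against an exported CSV. The sketch below assumes that the export has an id and a collection column followed by one column per metadata field, and that multiple values in a cell are separated by ||; the sample data and the function names are made up for the example.

```python
import csv
import io

SEPARATOR = "||"  # assumed multi-value separator in the exported CSV


def split_values(cell):
    """Split one CSV cell into its individual, whitespace-trimmed values."""
    return [v.strip() for v in cell.split(SEPARATOR) if v.strip()]


def find_duplicates(rows):
    """Yield (item id, field, value) for each value repeated within a field."""
    for row in rows:
        for field, cell in row.items():
            if field in ("id", "collection") or not cell:
                continue
            seen = set()
            for value in split_values(cell):
                key = value.casefold()  # compare case-insensitively
                if key in seen:
                    yield row["id"], field, value
                seen.add(key)


def overlap(rows, field_a, field_b):
    """Count rows where every value of field_b also appears in field_a."""
    count = 0
    for row in rows:
        a = {v.casefold() for v in split_values(row.get(field_a) or "")}
        b = {v.casefold() for v in split_values(row.get(field_b) or "")}
        if b and b <= a:
            count += 1
    return count


# Hypothetical sample data standing in for a real export.
sample = io.StringIO(
    "id,collection,dc.subject,dc.subject.heading\n"
    "1,123/4,Maps||maps||Geology,Maps\n"
)
rows = list(csv.DictReader(sample))
duplicates = list(find_duplicates(rows))
shared = overlap(rows, "dc.subject", "dc.subject.heading")
print(duplicates)  # [('1', 'dc.subject', 'maps')]
print(shared)      # 1
```

A high overlap count between two fields suggests that one of them is redundant, but the decision to remove a field still needs a human look at the data.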

Synonyms and Variations

Are you capturing multiple words or phrases which mean exactly the same thing? Check your controlled vocabularies and taxonomies for uniqueness.
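Spelling and punctuation variants of the same term can be surfaced automatically. The following is a minimal sketch, assuming you have already pulled the values of a field into a list; the normalisation rule (ignore case and punctuation) and the sample subjects are illustrative choices only. True synonyms, where the words themselves differ, still need to be reviewed by hand.

```python
import re
from collections import defaultdict


def normalise(value):
    """Reduce a value to a comparison key: lower case, no punctuation, single spaces."""
    key = re.sub(r"[^\w\s]", " ", value.casefold())
    return " ".join(key.split())


def group_variants(values):
    """Return {key: [variants]} for every key written in more than one way."""
    groups = defaultdict(set)
    for value in values:
        groups[normalise(value)].add(value)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Hypothetical subject values.
subjects = ["Climate change", "climate-change", "Climate Change.", "Geology"]
variants = group_variants(subjects)
print(variants)
```

Each group that comes back is a candidate for merging into a single preferred term in your controlled vocabulary.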

Language Settings

If your DSpace archive captures metadata in multiple languages, make sure the language code is set correctly on each value. For example, if you have dc.title specified in English, Spanish and Japanese, make sure the language code is set to en, es and ja respectively on the corresponding field values.

Also make sure the language codes themselves are valid; we have seen repositories with language/region codes such as en+US, which is not a valid language code.

Setting language codes correctly is very important; incorrect codes can break indexing and cause searches to return inconsistent results. Correct language code settings are a must in distributed, integrated systems such as JCar/JSolr because multilingual search relies on these settings to filter by language and to return results based on the user’s current language settings.
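Malformed codes such as en+US can be caught with a simple pattern check. The sketch below assumes the language appears in square brackets after the field name in the CSV column headers (for example dc.title[en]), and that a valid code is a two- or three-letter language, optionally followed by an underscore or hyphen and a two-letter region. Both are assumptions for the example; adjust the pattern to whatever convention your repository follows.

```python
import re

# Assumed shape: "en", "es", "ja", optionally with a region such as "en_US" or "en-US".
VALID_CODE = re.compile(r"^[a-z]{2,3}([_-][A-Z]{2})?$")
# Assumed header shape: field name, optionally followed by [language].
HEADER = re.compile(r"^(?P<field>[^\[\]]+)(\[(?P<lang>[^\]]*)\])?$")


def invalid_language_headers(headers):
    """Return the column headers whose bracketed language code looks malformed."""
    bad = []
    for header in headers:
        match = HEADER.match(header)
        if match is None:
            bad.append(header)  # header does not fit the assumed shape at all
            continue
        lang = match.group("lang")
        # Missing or empty brackets are treated as "no language set".
        if lang and not VALID_CODE.match(lang):
            bad.append(header)
    return bad


# Hypothetical headers from an export.
headers = ["id", "dc.title[en]", "dc.title[es]", "dc.title[ja]",
           "dc.subject[en+US]", "dc.subject[en_US]"]
bad = invalid_language_headers(headers)
print(bad)  # ['dc.subject[en+US]']
```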

The Metadata Clean-up Process

DSpace provides tools for exporting metadata to a comma-separated values (CSV) file for bulk clean-up. Using DSpace’s command-line tool, metadata-export, you can dump your metadata, clean it up, then re-import it using metadata-import. You can also focus on particular collections, or dump all metadata, by specifying additional flags during the export and import processes.
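The clean-up step between export and import can often be scripted. Here is a minimal sketch of one such pass, which trims stray whitespace and drops exact repeats within each cell. It assumes the exported CSV has id and collection columns that must be left untouched and that multiple values are separated by ||; the sample row is made up, and the script is not part of DSpace itself.

```python
import csv
import io

SEPARATOR = "||"  # assumed multi-value separator in the exported CSV
PROTECTED = ("id", "collection")  # columns assumed to identify the item


def clean_cell(cell):
    """Collapse whitespace and drop exact repeats, keeping the first occurrence."""
    seen, kept = set(), []
    for value in cell.split(SEPARATOR):
        value = " ".join(value.split())
        if value and value not in seen:
            seen.add(value)
            kept.append(value)
    return SEPARATOR.join(kept)


def clean_csv(source, target):
    """Copy the CSV from source to target, cleaning every metadata cell."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            field: cell if field in PROTECTED else clean_cell(cell or "")
            for field, cell in row.items()
        })


# Hypothetical export held in memory; use real files in practice.
exported = io.StringIO(
    "id,collection,dc.title[en],dc.subject[en]\n"
    "1,123/4,  Survey of   maps ,Maps||Maps|| Geology\n"
)
cleaned = io.StringIO()
clean_csv(exported, cleaned)
print(cleaned.getvalue())
```

When working with real files, open them with newline="" and an explicit encoding such as UTF-8, and always keep a copy of the original export and review the cleaned file before re-importing it.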

Cleaning up metadata does require some server administration knowledge. KnowledgeArc currently provides metadata cleaning services across both hosted and in-house DSpace repositories. Feel free to contact us if you require metadata housekeeping.
