A Metadata Application Profile for Data Repository Resources


This work was my final project for IST 681 Metadata, a course offered jointly by the University of British Columbia (Canada) and Syracuse University (United States). My project focused on creating appropriate metadata standards for open source data repository resources in order to improve their discovery, access and retrieval. Data repositories offer an ideal way for researchers to publish the data they create and share it with others. This is essential for ensuring that such data are preserved and that others can easily find them. Data repositories typically do this by archiving submitted data, assigning them a digital object identifier (DOI), and providing a web page that carefully describes each data object, including what it is about, how to cite it, and the number of times it has been cited or downloaded.

Despite the proliferation of open source repositories for facilitating data exchange, discovery and reuse, few attempts have been made to create core metadata standards to aid in their description. Creating such a standard could be important for several reasons. The primary one is that it would greatly enhance the finding, selection and retrieval of datasets by users based on their specific topics, needs or interests. A related reason is that it would enhance transparency and trust in scientific knowledge production: other scholars would be able to easily discover and work with datasets submitted by their peers, and in so doing validate and reproduce their findings. This, in turn, could accelerate the pace of scientific discovery as researchers build on their colleagues' efforts. One likely use case for such a metadata standard is a library, most probably at an academic or similar research institution, that includes metadata records for these repositories' resources in its catalog so that its members (researchers, students and similar users) can find, retrieve and reuse them.

Thus, given the level of interest in research data exchange and reuse, and the emergence of data repositories as important tools for facilitating these activities, implementing core standards for describing the resources of such repositories has become crucial. This project therefore set out to uncover the metadata elements for data repository resources that can best enhance their discoverability. I began with a domain analysis of users and use case scenarios, which is presented in the next section.