PASS Data Loaders

The PASS Data Loaders comprise three components in the Pass Support repository: the Pass Journal Loader, the Pass NIHMS Loader, and the Pass Grant Loader. Together, they load data from external sources into PASS.

Summary

All three loaders are command-line Java applications packaged as JAR files. They can run on any platform with a compatible Java runtime. They read their settings from system properties, so each run can be configured with different parameters.
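The sketch below shows how a command-line Java application can read its settings from system properties, with a default for each one. The property names (`pass.loader.mode`, `pass.loader.startDate`) and the `LoaderConfig` class are hypothetical placeholders and are not the loaders' documented configuration keys.

```java
import java.util.Properties;

// Hypothetical sketch: "pass.loader.mode" and "pass.loader.startDate" are
// illustrative property names, not the loaders' documented configuration keys.
class LoaderConfig {
    final String mode;
    final String startDate;

    LoaderConfig(String mode, String startDate) {
        this.mode = mode;
        this.startDate = startDate;
    }

    // Reads settings from the given properties and falls back to a default
    // when a property was not supplied on the command line.
    static LoaderConfig fromProperties(Properties props) {
        String mode = props.getProperty("pass.loader.mode", "load");
        String startDate = props.getProperty("pass.loader.startDate", "");
        return new LoaderConfig(mode, startDate);
    }

    public static void main(String[] args) {
        // System.getProperties() includes every -Dkey=value passed to the JVM.
        LoaderConfig config = fromProperties(System.getProperties());
        System.out.println("mode=" + config.mode + ", startDate=" + config.startDate);
    }
}
```

An example launch, with a placeholder JAR name: `java -Dpass.loader.mode=load -jar loader.jar`. JVM `-D` options must come before `-jar`. Anything after the JAR name is passed to the application as program arguments instead.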

The Grant Loader automates the ingestion and processing of grant data from various sources into PASS. It loads grants using predefined configurations and mappings, so that grant data is correctly represented and associated. PASS requires a predefined set of fields to represent a grant, but institutions may represent grants differently. Supporting another institution may therefore require a specific implementation. The Grant Loader's design accommodates this by allowing connectors to be developed for different data sources.
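The connector idea can be illustrated with a small interface. In this sketch, each source-specific connector maps its own format into a common set of grant fields. `GrantRecord`, `GrantConnector`, `MapRowConnector`, and the column names are all hypothetical. They are not classes or fields from the Pass Support repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pluggable connector design; none of these names
// come from the Pass Support repository.
record GrantRecord(String awardNumber, String title, String piEmail) {}

interface GrantConnector {
    // Each institution-specific connector maps its own source format
    // into the common GrantRecord fields.
    List<GrantRecord> retrieve();
}

// Example connector over in-memory rows keyed by assumed column names.
class MapRowConnector implements GrantConnector {
    private final List<Map<String, String>> rows;

    MapRowConnector(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    @Override
    public List<GrantRecord> retrieve() {
        List<GrantRecord> grants = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // Skip rows without an award number, treated as required in this sketch.
            String award = row.get("AWARD_NUMBER");
            if (award == null || award.isBlank()) {
                continue;
            }
            grants.add(new GrantRecord(
                award,
                row.getOrDefault("TITLE", ""),
                row.getOrDefault("PI_EMAIL", "")));
        }
        return grants;
    }
}

class GrantLoaderDemo {
    public static void main(String[] args) {
        GrantConnector connector = new MapRowConnector(List.of(
            Map.of("AWARD_NUMBER", "R01-0001", "TITLE", "Example grant")));
        connector.retrieve().forEach(System.out::println);
    }
}
```

Supporting a new institution in this design means writing another `GrantConnector` implementation. The code that consumes `GrantRecord` values stays unchanged.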

The Journal Loader automates the loading of journal data, particularly from sources such as PubMed and PubMed Central. This streamlines the process of updating and maintaining journal entries in PASS.
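One way to keep journal entries current across repeated loads is to insert or update each entry by a stable key. This sketch assumes ISSN as that key. `JournalEntry`, `JournalStore`, and the matching rule are hypothetical and do not reflect the Journal Loader's actual data model.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: JournalEntry, JournalStore and matching on ISSN are
// illustrative assumptions, not the Journal Loader's actual data model.
record JournalEntry(String issn, String name, boolean pmcParticipating) {}

class JournalStore {
    private final Map<String, JournalEntry> byIssn = new HashMap<>();

    // Normalizes the key so "1234-567x" and "1234-567X" match
    // (an ISSN check digit may be the letter X).
    private static String key(String issn) {
        return issn.trim().toUpperCase(Locale.ROOT);
    }

    // Inserts a new entry or replaces the existing one with the same ISSN.
    // Returns true only when the entry was newly added.
    boolean upsert(JournalEntry entry) {
        return byIssn.put(key(entry.issn()), entry) == null;
    }

    JournalEntry get(String issn) {
        return byIssn.get(key(issn));
    }

    int size() {
        return byIssn.size();
    }
}
```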

The NIHMS Loader loads manuscript submission data from the NLM’s Public Access Compliance Monitor (PACM) and transforms it for PASS. This allows publications in PASS to be updated with their corresponding information in PubMed Central. The PACM user guide explains the background of the PACM system and its data. It also describes how to set up the accounts needed to access the PACM system and its API.
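The transformation step can be pictured as comparing a PACM record with the matching PASS publication and producing an update only when PubMed Central information is new. In this sketch, `PacmRow`, `Publication`, and `NihmsTransform` are hypothetical. The fields (PMID, PMCID) are assumed for illustration and are not the PACM export format or the PASS data model.

```java
import java.util.Optional;

// Hypothetical sketch: PacmRow and Publication are illustrative shapes,
// not the PACM export format or the PASS data model.
record PacmRow(String pmid, String pmcid) {}

record Publication(String pmid, String pmcid) {}

class NihmsTransform {
    // Returns an updated Publication when the PACM row carries a PMCID that
    // the PASS publication does not yet have; otherwise returns empty.
    static Optional<Publication> apply(Publication current, PacmRow row) {
        if (!current.pmid().equals(row.pmid())) {
            return Optional.empty();  // row belongs to a different publication
        }
        String pmcid = row.pmcid() == null ? "" : row.pmcid().trim();
        if (pmcid.isEmpty() || pmcid.equals(current.pmcid())) {
            return Optional.empty();  // nothing new to record
        }
        return Optional.of(new Publication(current.pmid(), pmcid));
    }
}
```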

Technologies Utilized

  • Docker for running and testing the applications.

  • Java 17+ for application development.

  • Spring Boot for the application framework.

The following pages describe the external sources used for data extraction and the infrastructure used to run the Data Loaders:

Last updated