Grant Loader
The Grant Loader ingests grant data from an institution and maps the data to the appropriate PASS Objects in the PASS Data Model.
Summary
This module comprises code for retrieving grant data from a data source and using that data to update the PASS backend. Typically, the data pull populates a data structure, which is then consumed by a loader class. While this sounds simple in theory, several considerations may add complexity to an implementation. An implementor must first determine the data requirements for PASS and then map these requirements to the data available from the data source. Additional data from other services may be needed to populate the data structures to be loaded into PASS. For data loading, the implementor may need to support different modes of ingesting data. Additional logic may also be needed in the loading process to resolve the fields in the data assembled by the pull process. For example, several systems may update PASS objects, and other services may be more authoritative for certain fields than the service providing the grant data. The JHU implementation is complex with regard to these issues.
Knowledge Needed / Skills Inventory
Development of the Grant Loader
Programming in Java
Basic understanding of your institution's grant data
Running the Grant Loader
CLI commands
Technologies Utilized
Technical Deep Dive
Configuring using Spring Boot Profiles
The grant loader uses Spring Boot Profiles to select the appropriate classes for a given institution. The property `spring.profiles.active` in `application.properties` must be set when starting the grant loader. It can also be set at runtime using the standard Spring Boot configuration functionality; see the Spring Boot Configuration documentation.
The code has been factored to ease development for multiple institutions. These are the institution-specific classes that typically need to be implemented and annotated with the `@Profile` annotation:
Connector
The Connector class connects to the data store for an institution's implementation, and operates on the data to supply, in as standard a form as possible, the data to be consumed by the Updater.
Updater
The Updater class takes the data supplied by the Connector and creates or updates the corresponding objects in the PASS repository. There is a default class whose subclasses may override certain methods if local policies require it.
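The division of labor between the two classes can be sketched in plain Java. This is only an illustration under stated assumptions, not the project's code: `SketchConnector`, `DefaultSketchUpdater`, `KeepExistingTitleUpdater`, and the `shouldUpdateTitle` hook are hypothetical names, an in-memory map stands in for the PASS repository, and the Spring `@Profile` annotations are left out so the example needs only the standard library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an institution-specific connector:
// it returns grant rows as simple column-name -> value maps.
interface SketchConnector {
    List<Map<String, String>> retrieveUpdates();
}

// Hypothetical default updater; the map stands in for the PASS repository.
class DefaultSketchUpdater {
    protected final Map<String, String> titlesByGrantNumber = new HashMap<>();

    // Hook that a subclass may override when local policy differs.
    protected boolean shouldUpdateTitle(String storedTitle, String incomingTitle) {
        return !Objects.equals(storedTitle, incomingTitle);
    }

    // Creates missing grants and updates existing ones; returns the number of changes.
    int update(List<Map<String, String>> rows) {
        int changed = 0;
        for (Map<String, String> row : rows) {
            String number = row.get("GRANT_NUMBER");
            String title = row.get("GRANT_TITLE");
            String stored = titlesByGrantNumber.get(number);
            if (stored == null || shouldUpdateTitle(stored, title)) {
                titlesByGrantNumber.put(number, title);
                changed++;
            }
        }
        return changed;
    }
}

// Institution-specific subclass: never overwrites a title that is already stored.
class KeepExistingTitleUpdater extends DefaultSketchUpdater {
    @Override
    protected boolean shouldUpdateTitle(String storedTitle, String incomingTitle) {
        return false;
    }
}
```

In the real module the choice between implementations is made by the active Spring profile rather than by constructing a subclass directly.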
Profiles
JHU
The JHU implementation is used to pull data from the COEUS/FIBI database views for the purpose of performing regular updates. We identify grants which have been updated since a specific time (typically the time of the previous update), join this with user and funder information associated with the grant, and then use this information to update the data in the PASS backend. The JHU implementation also treats the COEUS/FIBI database as authoritative for all fields in the data. If a grant is being passed in for update, it is assumed that all records for that grant are included in the input.
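The selection step described above can be sketched as follows. This is a simplified illustration, not the JHU query: it assumes each row carries its own update time, and the `Row` class and `UpdatedGrantSelector` name are hypothetical. To respect the assumption that all records for a grant are included in the input, it first finds the grants with at least one row changed since the cutoff and then keeps every row for those grants.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class UpdatedGrantSelector {
    // Hypothetical row shape: one row per grant/person combination.
    static final class Row {
        final String grantNumber;
        final String piEmail;
        final Instant updated;

        Row(String grantNumber, String piEmail, Instant updated) {
            this.grantNumber = grantNumber;
            this.piEmail = piEmail;
            this.updated = updated;
        }
    }

    // Returns all rows of every grant that has at least one row updated
    // strictly after 'since', grouped by grant number in first-seen order.
    static Map<String, List<Row>> updatedSince(List<Row> rows, Instant since) {
        Set<String> changedGrants = new LinkedHashSet<>();
        for (Row row : rows) {
            if (row.updated.isAfter(since)) {
                changedGrants.add(row.grantNumber);
            }
        }
        Map<String, List<Row>> byGrant = new LinkedHashMap<>();
        for (Row row : rows) {
            if (changedGrants.contains(row.grantNumber)) {
                byGrant.computeIfAbsent(row.grantNumber, k -> new ArrayList<>()).add(row);
            }
        }
        return byGrant;
    }
}
```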
Grant Data
In order for PASS to map grant data to the associated Objects within PASS, the Grant Loader needs to ingest a CSV file with the following fields and data types:
Column | Type/Size | Required | Description |
---|---|---|---|
GRANT_NUMBER | TEXT/255 | Y | Unique identifier for the grant (institutional Grant ID) |
GRANT_TITLE | TEXT/255 | Y | The title of the grant |
AWARD_NUMBER | TEXT/255 | N | Unique identifier for the award |
AWARD_STATUS | TEXT/255 | N | The status of the award. Valid values: active, pre-award, terminated |
AWARD_DATE | DATE | N | The date the grant award was created. Format: YYYY-MM-DD, or YYYY-MM-DD HH:MM:SS.SSS if a time is required. Date/time is in UTC. |
AWARD_START | TIMESTAMP | Y | The timestamp at which the grant award begins. Format: YYYY-MM-DD, or YYYY-MM-DD HH:MM:SS.SSS if a time is required. Date/time is in UTC. |
AWARD_END | TIMESTAMP | Y | The timestamp at which the grant award ends. Format: YYYY-MM-DD, or YYYY-MM-DD HH:MM:SS.SSS if a time is required. Date/time is in UTC. |
PRIMARY_FUNDER_NAME | TEXT/255 | N | The Primary Funder Name (Funder of original source of funds). If not set, the Direct Funder will also be set as Primary Funder. |
PRIMARY_FUNDER_CODE | TEXT/255 | N | The Primary Funder unique identifier (institutional Funder ID). If not set, the Direct Funder will also be set as Primary Funder. |
DIRECT_FUNDER_NAME | TEXT/255 | Y | The Direct Funder Name (Funder from which funds are directly received) |
DIRECT_FUNDER_CODE | TEXT/255 | Y | The Direct Funder unique identifier (institutional Funder ID) |
PI_FIRST_NAME | TEXT/255 | Y | First name of PI |
PI_MIDDLE_NAME | TEXT/255 | N | Middle name of PI |
PI_LAST_NAME | TEXT/255 | Y | Last name of PI |
PI_EMAIL | TEXT/255 | Y | Email address of PI |
PI_INSTITUTIONAL_ID | TEXT/128 | Y | Institutional User ID of PI. This is typically the User ID in the institution's Identity Access Management system. This value is optional if PI_EMPLOYEE_ID exists. If User ID exists in this record, it is important that the User ID is available to PASS during the authentication process. |
PI_EMPLOYEE_ID | TEXT/128 | Y | Employee ID of PI. This value is optional if PI_INSTITUTIONAL_ID exists. If Employee ID exists in this record, it is important that the Employee ID is available to PASS during the authentication process. |
PI_ROLE | TEXT/1 | Y | Role of PI on grant (PI or Co-PI). Valid values: P, C. P=PI, C=Co-PI |
UPDATE_TIMESTAMP | TIMESTAMP | N | Last update timestamp. Format: YYYY-MM-DD, or YYYY-MM-DD HH:MM:SS.SSS if a time is required. Date/time is in UTC. |
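Before handing a CSV file to the loader, it can help to check rows against the table above. The following is a minimal sketch of such a check, not the loader's own validation: the class `GrantRowChecks` and its methods are hypothetical, a row is assumed to be available as a column-name to value map, and the two PI identifiers are treated as an either/or pair, which is one reading of their descriptions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class GrantRowChecks {
    // Columns marked "Y" in the table, except the two PI identifiers,
    // which are checked as an either/or pair below (an assumption).
    private static final List<String> REQUIRED = List.of(
        "GRANT_NUMBER", "GRANT_TITLE", "AWARD_START", "AWARD_END",
        "DIRECT_FUNDER_NAME", "DIRECT_FUNDER_CODE",
        "PI_FIRST_NAME", "PI_LAST_NAME", "PI_EMAIL", "PI_ROLE");

    private static final DateTimeFormatter DATE_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Returns a list of human-readable problems; empty when the row passes.
    static List<String> problems(Map<String, String> row) {
        List<String> problems = new ArrayList<>();
        for (String column : REQUIRED) {
            if (isBlank(row.get(column))) {
                problems.add("missing " + column);
            }
        }
        if (isBlank(row.get("PI_INSTITUTIONAL_ID")) && isBlank(row.get("PI_EMPLOYEE_ID"))) {
            problems.add("missing PI_INSTITUTIONAL_ID or PI_EMPLOYEE_ID");
        }
        String role = row.get("PI_ROLE");
        if (!isBlank(role) && !role.equals("P") && !role.equals("C")) {
            problems.add("invalid PI_ROLE");
        }
        return problems;
    }

    // If no primary funder code is given, the direct funder doubles as primary.
    static String primaryFunderCode(Map<String, String> row) {
        String primary = row.get("PRIMARY_FUNDER_CODE");
        return isBlank(primary) ? row.get("DIRECT_FUNDER_CODE") : primary;
    }

    // Accepts YYYY-MM-DD or YYYY-MM-DD HH:MM:SS.SSS and interprets it as UTC.
    static ZonedDateTime parseUtc(String value) {
        String trimmed = value.trim();
        if (trimmed.length() == 10) {
            return LocalDate.parse(trimmed).atStartOfDay(ZoneOffset.UTC);
        }
        return LocalDateTime.parse(trimmed, DATE_TIME).atZone(ZoneOffset.UTC);
    }
}
```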
Usage
Refer to the `application.properties` file to determine which properties need values set at runtime. The grant loader is a Spring Boot application, so use the standard Spring Boot configuration functionality, as described in the Spring Boot Configuration documentation.
Here is an example using Java system properties (`-D`):
Arguments
Run the above command with `-h` to display a full list of arguments for the grant loader.
In the example below, the `-a` parameter instructs the grant loader to load data from a CSV file. The `-a` (action) parameter can be set to either `pull` or `load`. Use `pull` to extract data from the grant source system and store it in a file, or `load` to ingest data from an existing file into PASS.
In another example below, `startDateTime` and `awardEndDate` are used as parameters to limit the date range of the grant data. Since no action is specified, the default is to perform a pull followed directly by a load, using the default connection source.
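The way the action value selects the work to be done can be summarized in a small sketch. It only mirrors the behavior described above (`pull`, `load`, or both when no action is given); the class `ActionPlanSketch` and its `steps` method are hypothetical and are not part of the grant loader's CLI.

```java
import java.util.List;

final class ActionPlanSketch {
    // Maps the value of the action parameter to the ordered steps to run.
    static List<String> steps(String action) {
        if (action == null || action.trim().isEmpty()) {
            // No action given: pull from the source, then load directly.
            return List.of("pull", "load");
        }
        switch (action) {
            case "pull":
                return List.of("pull");
            case "load":
                return List.of("load");
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }
}
```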
Running the Grant Loader in Docker
Run PASS Docker
Since the Grant Loader will load data from a CSV into PASS, you will need an instance of PASS running. The quickest way to accomplish this is to run PASS docker.
Start pass-docker in local mode by running with the `docker-compose.yml` and `eclipse.pass.local.yml` configurations:
Once pass-docker is up and the loader container has finished running, open a browser, go to http://localhost:8080/, and log in as `nih-user`. More details about this account can be found on the PASS docker page. Go to the Grants tab to view all the grants. This page will initially be empty; after you run the Grant Loader, it will list all the grants from the CSV file provided.
Setup Grant Loader Test Directory
Create a directory named `grantloadertest`.
Change to the directory: `cd grantloadertest`.
Create the following files:
`grant_update_timestamps` (empty)
`policy.properties` (empty)
`env.list` with content:
Copy your grant CSV file to the `grantloadertest` directory.
Open a new terminal and `cd` to the pass-docker directory.
Running Grant Loader Load
For testing purposes, we need to associate `nih-user` with a grant row in the CSV.
Modify one of your grant rows to change the user fields to:
Open a new terminal window and navigate to `grantloadertest`.
Run:
Note: Replace `1.8.0-SNAPSHOT` with the version of the grant loader you want to use.
Once done, refresh the Grants tab in the browser to see your grant loaded.
Troubleshooting:
If the grant CSV contains new funders, determine the PASS policy ID for each one and add the mapping from funder local key to policy ID to `policy.properties` (i.e. `funder_local_key=pass_policy_id`) before running the grant loader docker command.
If running on Windows, references to the current directory should use `${PWD}` for PowerShell or `%cd%` for the Windows Command Prompt.
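To illustrate the `funder_local_key=pass_policy_id` mapping mentioned above, here is a minimal sketch that reads such content with `java.util.Properties` and looks up the policy ID for a funder. Whether the grant loader reads `policy.properties` in exactly this way is an assumption; the class `PolicyMappingSketch` and the key `example_funder_42` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

final class PolicyMappingSketch {
    // Parses key=value lines in the standard Java properties format.
    static Properties parse(String content) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mapping;
    }

    // Returns the policy ID mapped to the funder local key, if any.
    static Optional<String> policyIdFor(Properties mapping, String funderLocalKey) {
        return Optional.ofNullable(mapping.getProperty(funderLocalKey));
    }
}
```

A funder with no entry yields an empty result, which is the situation the first troubleshooting item describes. Note that in the Java properties format an unescaped `:` or `=` ends the key, so a local key containing those characters would need escaping if the file is read this way.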
Grant Loader Classes & Data Flow Overview
Initialization and Configuration:
The application initializes with `GrantLoaderCLI`, which sets up the `GrantLoaderApp` with configurations from `GrantLoaderConfig`.
Spring Boot profiles are used to load institution-specific configurations.
Data Retrieval:
`GrantLoaderApp` uses the `GrantConnector` interface to retrieve data from the data source (e.g., database, CSV file).
The `CoeusConnector` implementation (e.g., for JHU) fetches the data and returns it as a list of `GrantIngestRecord` objects.
Data Processing:
The `GrantIngestRecord` objects are built by the `CoeusConnector` in its `retrieveUpdates` method.
The `CoeusConnector` implements `GrantConnector` and is specific to the COEUS database at JHU. Another institution should provide its own implementing class.
A `LocalKey` is built using utility methods from `GrantDataUtils` and is used by the `AbstractDefaultPassUpdater`.
Data Ingestion:
The processed grant data is passed to the `JhuPassUpdater`, which extends `AbstractDefaultPassUpdater`.
An institution with specific needs for updating its data should extend the `AbstractDefaultPassUpdater`. The `JhuPassUpdater` is specific to the JHU implementation.
The `JhuPassUpdater` updates PASS objects (grants, users, funders) in the PASS repository and interacts with the `PassClient` to perform the actual create and update operations.
Error Handling:
Exceptions specific to data retrieval or ingestion are handled by `GrantDataException`.
Errors are logged, and appropriate messages are reported to the CLI user via `PassCliException`.
Statistics Tracking:
The `PassUpdateStatistics` class tracks the number of grants, funders, and users created or updated.
Statistics are updated in the `PassUpdater` and can be reset or reported.
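The idea behind the statistics tracking can be sketched with a small counter class. This is not the actual `PassUpdateStatistics` implementation, whose fields and methods are not shown here; `UpdateStatisticsSketch`, its `Kind` enum, and all method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class UpdateStatisticsSketch {
    enum Kind { GRANT, FUNDER, USER }

    // Counters keyed by "<KIND>.created" or "<KIND>.updated", kept in sorted order.
    private final Map<String, Integer> counts = new TreeMap<>();

    void recordCreated(Kind kind) {
        counts.merge(kind + ".created", 1, Integer::sum);
    }

    void recordUpdated(Kind kind) {
        counts.merge(kind + ".updated", 1, Integer::sum);
    }

    // Returns the current value of a counter, or 0 if nothing was recorded.
    int count(String key) {
        return counts.getOrDefault(key, 0);
    }

    // Clears all counters, for example between two loader runs.
    void reset() {
        counts.clear();
    }

    // Renders the non-zero counters for logging or CLI output.
    String report() {
        return counts.toString();
    }
}
```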
Next Step / Institution Configuration
Institutional configuration depends heavily on where the institutional grant data comes from. At JHU, we have a Postgres database, and the data is pulled from the database using AWS Batch and ECS. There are multiple ways to set up the infrastructure, but the simplest setup is to export a CSV file to a directory where the Grant Loader can ingest it using the `-a` parameter.