DHS Data Inventory Introduction
The purpose of the DHS Data Inventory Program (DIP) is to create an inventory of datasets at the Department of Homeland Security (DHS) that is accurate, complete, timely, and useful. The DIP collects metadata about every dataset within DHS, including the DHS management function and all DHS components, builds a dataset-of-datasets called the DHS Data Inventory, and makes that inventory available for use both inside and outside DHS. The DIP is conducted under DHS Delegation 04004 of May 18, 2021, from Secretary Mayorkas to the Chief Data Officer, as authorized by the "Foundations for Evidence-Based Policymaking Act of 2018" (codified at 44 United States Code (U.S.C.) §3520) and the "Open, Public, Electronic, and Necessary Government Data Act" (OPEN Government Data Act) (codified at 44 U.S.C. §3520).
Data Inventory Record Basics
The DHS Data Inventory Program (DIP) Record is designed to capture, at minimum, enough data to identify that a dataset exists within a DHS component and, in the best case, substantial detail about the upstream and downstream systems, data quality, data characteristics, authority and ownership. The DIP Record is designed as an RDF graph - in keeping with efforts related to data.gov and anticipating the growth of the semantic web - but can be submitted via an Excel spreadsheet for ease of use (see 'Record Submission' section below). RDF records can also be submitted as well-formed JSON-LD, Turtle or RDF/XML. The full specification for the DIP Record, including example records, is provided in the sections below.
Required Fields - Minimum Viable Record
The minimum record for the DIP collects just enough data to identify that a dataset exists within a DHS component. The required fields are:
- Identifier
- Title
- Description
- Access Level
- Data Catalog Record Access Level
- Date Issued
- Component
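A check for the minimum viable record can be sketched as follows. The attribute names come from the DHS Collection Vocabulary in this document; the flat dictionary layout and every value in the example record are placeholders chosen for illustration, not the format Mobius uses internally.

```python
# Attribute names are taken from the DHS Collection Vocabulary; the flat
# dict layout of the record is an assumption made for illustration.
REQUIRED_FIELDS = (
    "dcterms:identifier",
    "dcterms:title",
    "dcterms:description",
    "usg:accessLevel",
    "dhs:dataCatalogRecordAccessLevel",
    "dcterms:issued",
    "dhs:component",
)


def missing_required_fields(record):
    """Return the required attribute names that are absent or empty."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# Hypothetical record; every value here is a placeholder.
example_record = {
    "dcterms:identifier": "XYZ-001-1",
    "dcterms:title": "Example Dataset",
    "dcterms:description": "A placeholder description.",
    "usg:accessLevel": "Non-Public",
    "dcterms:issued": "2023-01-15",
}

# Lists the two required attributes that the placeholder record omits.
print(missing_required_fields(example_record))
```

Running a check like this before upload may catch incomplete records early, but the validation performed by Mobius remains authoritative.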
Other Important Fields
As part of the upload, Mobius will add the following fields (if they are not already defined in the upload):
- Transmission Date
- Validity Period
- Point of Contact
- Primary IT Investment UII
- FISMA ID
Record Submission
Users can upload data via an HTML page in the DHS Mobius application or via the API hosted on the DHS Mobius application. Both the submission page and the API accept data in an Excel workbook or any of the RDF formats listed above. JSON-LD is the preferred format, and records are stored in JSON-LD on the back end. Records can be submitted in bulk or as single records via either the upload page or the API.
The Excel Template for data loading can be downloaded from GitHub at: https://usdhs.github.io/dcat-tool/samples/DIP_Excel_Template.xlsx. Data descriptions are provided as a tooltip for each column in the spreadsheet (be sure to enable edits to see the tooltips). Data in the spreadsheet is validated against the same rules as RDF data, so, for example, date columns must contain valid dates or the data will be rejected. Additionally, when records are submitted in bulk, a single validation failure will result in the entire bulk submission being rejected. Details of the valid formats for each field/attribute can be found in the attribute definition in the respective Namespace for each attribute. Sample files with valid data using JSON-LD are linked below.
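The all-or-nothing behaviour of bulk submissions can be illustrated with a small pre-check. This is a sketch only: it validates a single field (Date Issued), uses the two date layouts given in the Date Issued definition, and the function names and record layout are assumptions rather than part of Mobius.

```python
from datetime import datetime

# Date layouts stated in the vocabulary: ISO 8601 for JSON, MM/DD/YYYY for Excel.
DATE_FORMATS = {"json": "%Y-%m-%d", "excel": "%m/%d/%Y"}


def is_valid_date(text, source):
    """Return True if text parses as a calendar date in the layout for source."""
    try:
        datetime.strptime(text, DATE_FORMATS[source])
        return True
    except (ValueError, TypeError):
        return False


def validate_bulk(records, source):
    """Accept the batch only if every record passes; otherwise reject all.

    Returns (accepted, errors), where errors lists (row index, message).
    """
    errors = []
    for index, record in enumerate(records):
        issued = record.get("dcterms:issued", "")
        if not is_valid_date(issued, source):
            errors.append((index, f"invalid dcterms:issued: {issued!r}"))
    return (len(errors) == 0, errors)


# Hypothetical batch: the second record carries an impossible date,
# so the whole batch is reported as rejected.
batch = [
    {"dcterms:issued": "2023-01-15"},
    {"dcterms:issued": "2023-02-30"},
]
accepted, errors = validate_bulk(batch, "json")
print(accepted, errors)
```

Collecting every error before rejecting, rather than stopping at the first one, lets a submitter fix the whole batch in one pass.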
To use the Mobius API you must have a Mobius API token, which can be configured while logged into Mobius. Mobius API tokens are valid for an extended period and can be used in automated scripts that submit DIP Records. The API tab on the 'Data Inventory Upload' page in Mobius provides both the means to create an API token and an example submission statement (using cURL).
RDF and Namespaces
Since the DHS Data Inventory Record is built using an RDF graph (for the Semantic Web), it relies on attributes described in various public Namespaces. The list below provides the Namespaces used in the DHS Data Inventory Record. Details for each specific attribute can be found in the 'DHS Collection Vocabulary' section. The last Namespace is a custom extension to the DCAT-US Namespace developed specifically for this project. The details for the attributes in this custom namespace, including datatype and usage, can be found here: https://usdhs.github.io/dcat-tool/dhsnamespace.html
Common Name | Prefix | Namespace |
---|---|---|
Dublin Core Terms | dcterms | http://purl.org/dc/terms/ |
Data Catalog Vocabulary (DCAT) - Version 2 | dcat | http://www.w3.org/ns/dcat/ |
DCAT-US Schema v1.1 | usg | http://resources.data.gov/resources/dcat-us/ |
DHS DCAT Extensions | dhs | https://usdhs.github.io/dcat-tool/dhsnamespace.html |
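To show how the prefixes relate to full attribute names, the sketch below expands a prefixed name such as dcterms:title by concatenating the namespace from the table with the local name. The mapping is copied from the table as listed; the helper name `expand` is hypothetical, and the exact namespace IRIs should be confirmed against the data definition file (dhs_collect.ttl).

```python
# Prefix-to-namespace mapping copied from the table above.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat/",
    "usg": "http://resources.data.gov/resources/dcat-us/",
    "dhs": "https://usdhs.github.io/dcat-tool/dhsnamespace.html",
}


def expand(prefixed_name):
    """Expand a name such as 'dcterms:title' into a full IRI string.

    RDF prefix expansion is plain concatenation, so a namespace that does
    not end in '/' or '#' (as listed for dhs) yields an IRI with no
    separator before the local name.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


print(expand("dcterms:title"))
print(expand("usg:accessLevel"))
```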
Sample Records
The validity of an RDF record can be tested in advance by loading the data definition file (dhs_collect.ttl) and a sample record into a SHACL test tool (like https://shacl-playground.zazuko.com/), or by loading just the JSON-LD sample record into a test site (like https://json-ld.org/playground). Here is an example of a simple JSON-LD record: https://usdhs.github.io/dcat-tool/samples/DIP_small_record.json-ld and of a complete JSON-LD record: https://usdhs.github.io/dcat-tool/samples/DIP_full_record.json-ld for a DIP submission, so you can see what a valid format looks like. Please note, however, that several variations in formatting are also valid, so depending on your implementation the format may differ slightly.
DHS Collection Vocabulary
Below is a list of terms and their definitions for the DHS Data Inventory.
highLevel
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Identifier | dcterms:identifier | Similar to a serial number associated with your laptop, the identifier is a unique and specific combination of characters that identifies the resource. The combination must start with the DHS component code, is limited to 128 characters, and must be in the following format: code-xxx-x Example: | http://purl.org/dc/terms/ | Yes |
Title | dcterms:title | The official name (title) given to the resource, limited to 255 characters. Acronyms should be used sparingly and only when explained in the description. Example: | http://purl.org/dc/terms/ | Yes |
Description | dcterms:description | A summary of the resource, limited to 4,000 characters. Example: A mission and overview of the Data Inventory Program (DIP) containing instructions, references, terminology, and explanations designed for users to understand and be able to use the features it contains. | http://purl.org/dc/terms/ | Yes |
keyword | dcat:keyword | A keyword or multiple (comma delimited) keywords describing the resource. Example: City of New York, Universities, Immigration | http://www.w3.org/TR/vocab-dcat/ | No |
Publisher | dcterms:publisher | The entity responsible for making the resource available. Entity may include federal agency, sub-agency, state or local agencies, an organization, a network, a service, or a POC. Example: | http://purl.org/dc/terms/ | No |
Access Level | usg:accessLevel | The three levels that determine how publicly-available the resource is: Public: is/could be publicly-available without restrictions. Restricted Public: is available under certain restrictions. Non-Public: is not available to the public. | https://resources.data.gov/resources/dcat-us/ | Yes |
Data Catalog Record Access Level | dhs:dataCatalogRecordAccessLevel | The classification of this Data Inventory Record as either Public or Non-Public. Can this record be published in data.gov? | http://github.com/usdhs/dcat-tool/ | Yes |
Date Issued | dcterms:issued | The issued date of the dataset (if not published this may be the date the data was first made available internally). Records submitted via JSON must be in ISO 8601 format (YYYY-MM-DD). Records submitted via Excel must be in standard format (MM/DD/YYYY). | http://purl.org/dc/terms/ | Yes |
Component | dhs:component | The DHS Component (sub-agency) the data pertains to. | http://github.com/usdhs/dcat-tool/ | Yes |
Restriction Reason | dhs:restrictionReason | A reason for which the dataset is not published to data.gov (i.e., not marked as 'public' for accessLevel and not marked as 'public' for dataCatalogRecordAccessLevel). Must be provided as a comma separated list including one or more of the following: | http://github.com/usdhs/dcat-tool/ | No |
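The constraints stated in the table above (character limits, the component-code prefix on the identifier, and the three access levels) can be sketched as a pre-submission check. The exact literal spellings Mobius accepts for access levels, the hyphen after the component code, and the function name are assumptions made for this illustration.

```python
# Length limits as stated in the table above.
MAX_LENGTHS = {
    "dcterms:identifier": 128,
    "dcterms:title": 255,
    "dcterms:description": 4000,
}
# Spellings taken from the Access Level definition; the literals that the
# validator actually accepts may differ (assumption).
ACCESS_LEVELS = {"Public", "Restricted Public", "Non-Public"}


def check_high_level(record, component_code):
    """Return a list of human-readable problems found in the record."""
    problems = []
    for name, limit in MAX_LENGTHS.items():
        if len(record.get(name, "")) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    identifier = record.get("dcterms:identifier", "")
    # Assumes the 'code-xxx-x' format means the code is followed by a hyphen.
    if not identifier.startswith(component_code + "-"):
        problems.append("dcterms:identifier must start with the component code")
    if record.get("usg:accessLevel") not in ACCESS_LEVELS:
        problems.append("usg:accessLevel is not one of the three defined levels")
    return problems


# Hypothetical record with placeholder values.
record = {
    "dcterms:identifier": "XYZ-001-1",
    "dcterms:title": "Example Dataset",
    "dcterms:description": "A placeholder description.",
    "usg:accessLevel": "Public",
}
print(check_high_level(record, "XYZ"))
```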
dataGovernanceInformation
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Creator | dcterms:creator | Either a name or email (individual or group) of the individual responsible for creating the resource. Include the point of contact information with either a name or email (individual or group) when submitting via Excel or a full vCard when submitting via JSON. | http://purl.org/dc/terms/ | No |
Governance | dhs:governance | Either a name or email (individual or group) of the individual responsible for overseeing the information contained in the resource. Include the point of contact information with either a name or email (individual or group) when submitting via Excel or a full vCard when submitting via JSON. | http://github.com/usdhs/dcat-tool/ | No |
Owner | dhs:owner | Either a name or email (individual or group) of the individual responsible for the accuracy of the information contained in the resource when submitting via Excel or a full vCard when submitting via JSON. | http://github.com/usdhs/dcat-tool/ | No |
Steward | dhs:steward | Either a name or email (individual or group) of the administrator of the dataset who ensures the information is properly stored, maintained, accessible, and protected. If submitting via JSON, a full vCard. | http://github.com/usdhs/dcat-tool/ | No |
Custodian | dhs:custodian | Either a name or email (individual or group) of the individual who has physical possession of the information. If submitting via JSON, a full vCard. | http://github.com/usdhs/dcat-tool/ | No |
Primary IT Investment UII | usg:primaryITInvestmentUII | Used for linking a dataset with an IT UII. A Unique Investment Identifier (UII) is an established and unique identifier of an investment, assigned at the Component level. Example: 010-999992220 Non-Tech Example: The number/identifier on the 'Property of...' sticker on your laptop. | http://resources.data.gov/resources/dcat-us/ | No |
FISMA ID | dhs:fismaID | The FISMA ID is a unique identifier that describes various key characteristics of a specific system. If the dataset is part of a system that has a FISMA ID then the FISMA ID should be provided. Example: | http://github.com/usdhs/dcat-tool/ | No |
References | dcterms:references | A related piece of information from another source (typically linked via URL) that provides additional related information about the resource. | http://purl.org/dc/terms/ | No |
Sharing Agreements | dhs:sharingAgreements | A contract that states what and how information can be shared or utilized. Example: DHS | http://github.com/usdhs/dcat-tool/ | No |
contact point | dcat:contactPoint | The contact information (name and email preferred) of the main individual to contact with questions about the dataset. | http://www.w3.org/TR/vocab-dcat/ | No |
Collection Authority | dhs:collectionAuthority | The legislation or executive order under which the data was collected. The specific document or policy that grants you permission to collect specific data. Example: | http://github.com/usdhs/dcat-tool/ | No |
Release Authority | dhs:releaseAuthority | The document, legislation or executive order under which the data can be released. Examples: | http://github.com/usdhs/dcat-tool/ | No |
Records Schedule | dhs:recordsSchedule | The policy indicating the time period in which the records are retained. Example: | http://github.com/usdhs/dcat-tool/ | No |
System of Records | usg:systemOfRecords | If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice that relates to the dataset. The URL should be from FederalRegister.gov or point to an entry from the Federal Register. | https://resources.data.gov/resources/dcat-us/ | No |
PTA Adjudicated Date | dhs:ptaAdjudicatedDate | The date on which the Privacy Threshold Assessment (PTA) was adjudicated. | http://github.com/usdhs/dcat-tool/ | No |
standards
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Conforms To | dcterms:conformsTo | A technical standard that the dataset conforms to. Example: | http://purl.org/dc/terms/ | No |
Conforms FIPS | dhs:conformsFIPS | The Federal Information Processing Standard that the dataset conforms to, if any. Example: | http://github.com/usdhs/dcat-tool/ | No |
Conforms NIEM Percent | dhs:conformsNIEMPercent | The numerical percentage (0-100) of the dataset that is NIEM compliant. | http://github.com/usdhs/dcat-tool/ | No |
Conforms Unicode | dhs:conformsUnicode | True or False: Is the information in the resource written in Unicode (does the code given to each character start with the letter 'u')? Information containing foreign languages/characters is typically written in Unicode format (U+XXX). | http://github.com/usdhs/dcat-tool/ | No |
Identities NativeScript | dhs:identitiesNativeScript | True or False: Are the names of individuals and other entities stored in a way (e.g. Roman characters) that can be shared across systems? | http://github.com/usdhs/dcat-tool/ | No |
Transliteration Standard | dhs:transliterationStandard | The standard of converting text from one language to Roman characters. Example: | http://github.com/usdhs/dcat-tool/ | No |
provenance
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Source Datasets | dhs:sourceDatasets | An identifier for the originating dataset (the DHS Unique Identifier if possible or DOI or URL for externally developed datasets). Example: | http://github.com/usdhs/dcat-tool/ | No |
Destination Datasets | dhs:destinationDatasets | The unique identifier of downstream datasets fed by this dataset. Example: | http://github.com/usdhs/dcat-tool/ | No |
Described By | usg:describedBy | A URL link to the dictionary that defines the fields or column headings of the information. | http://resources.data.gov/resources/dcat-us/ | No |
Described By Type | usg:describedByType | The file format of the describedBy URL link. Should be a standard MIME (IANA media) type. Examples: | http://resources.data.gov/resources/dcat-us/ | No |
Is Part Of | dcterms:isPartOf | The source in which a grouping of related resources is included. This is typically the name of the system in which the dataset resides. Example: | http://purl.org/dc/terms/ | No |
Is Open Source | dhs:isOpenSource | True or False: The data in the dataset is composed of publicly available data and may be compiled from multiple sources. | http://github.com/usdhs/dcat-tool/ | No |
Is Commercial | dhs:isCommercial | True or False: The dataset is acquired from a private sector provider (may be purchased or acquired via agreement). | http://github.com/usdhs/dcat-tool/ | No |
Has Data Dictionary | dhs:hasDataDictionary | True or False: The dataset has a data dictionary (should be pointed to in the referencedBy attribute). | http://github.com/usdhs/dcat-tool/ | No |
dataset
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Accrual Periodicity | dcterms:accrualPeriodicity | The frequency with which resources or sets of information are added to the dataset. Recommended timeframes (or frequencies) can be found in the Collection Description Accrual Policy Vocabulary. | http://purl.org/dc/terms/ | No |
datasetGeospatial
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Spatial Coverage | dcterms:spatial | A named local or geographic area. Example: | http://purl.org/dc/terms/ | No |
spatial resolution (meters) | dcat:spatialResolutionInMeters | How small of an area, in meters, you can make (or zoom into) an image that will still provide a quality resolution in order to identify the content in the image. Example: | http://www.w3.org/TR/vocab-dcat/ | No |
datasetTemporal
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Temporal Coverage | dcterms:temporal | The period of time that the dataset covers. Example: | http://purl.org/dc/terms/ | No |
temporal resolution | dcat:temporalResolution | The smallest amount of time between records in the dataset. Example: | http://www.w3.org/TR/vocab-dcat/ | No |
dataQuality
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Data Quality Known | dhs:dataQualityKnown | True or False: Is the quality of the information in the resource known? | http://github.com/usdhs/dcat-tool/ | No |
Data Quality Percent | dhs:dataQualityPercent | A simple measure of the data quality, on a scale of 0-100. | http://github.com/usdhs/dcat-tool/ | No |
Data Quality Assessment | dhs:dataQualityAssessment | A written account evaluating the quality of the information contained within the resource. Example: | http://github.com/usdhs/dcat-tool/ | No |
Data Quality | usg:dataQuality | True or False: Does the information contained in the resource meet the agency's Information Quality Guidelines? | http://resources.data.gov/resources/dcat-us/ | No |
distribution
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Format | dcterms:format | The file format, physical medium, or dimension of the resource. Examples: | http://purl.org/dc/terms/ | No |
access address | dcat:accessURL | The landing page (URL) that gives access to the resource. Must be a URL. Example: https://ea.dhs.gov/mobius | http://www.w3.org/TR/vocab-dcat/ | No |
Access URL NIEM | dhs:accessURLNIEM | The landing page (URL) that provides you access to the NIEM interface. Must be a URL. | http://github.com/usdhs/dcat-tool/ | No |
Modified | usg:modified | The most recent date on which the information contained in the resource was changed, updated, or modified. | http://resources.data.gov/resources/dcat-us/ | No |
Vendor | dhs:vendor | The vendor, supplier, or set of suppliers who supplied the information contained in the resource. Example: | http://github.com/usdhs/dcat-tool/ | No |
License | dcterms:license | The legal structure that gives official permission to utilize or distribute information contained in the resource. Example: | http://purl.org/dc/terms/ | No |
Access Rights | dcterms:accessRights | Provides information regarding access or restrictions to data based on privacy, security, or other policies. Example: User must have PIV card and be granted permission by System Owner | http://purl.org/dc/terms/ | No |
theme | dcat:theme | The main category of the information contained in the resource. A resource can have multiple themes. Theme is similar to a Keyword but at a much higher level. Example: | http://www.w3.org/TR/vocab-dcat/ | No |
Functional Data Domain | dhs:functionalDataDomain | The functional subject area of the resource. List: | http://github.com/usdhs/dcat-tool/ | No |
media type | dcat:mediaType | The media type of the distribution as defined by IANA. Example: | http://www.w3.org/TR/vocab-dcat/ | No |
Is Stream | dhs:isStream | True or False: Data Set or Data Source is streaming. The data is constantly updated. | http://github.com/usdhs/dcat-tool/ | No |
Access Instructions | dhs:accessInstructions | The steps or actions an individual must take to gain access to the dataset. Can work in conjunction with accessRights (max length 5,000 characters). Example: 'Visit the (accessURL) link and then click on 'Your ArcGIS organization's URL and enter | http://github.com/usdhs/dcat-tool/ | No |
size
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Table Count | dhs:tableCount | The number of 2-dimensional tables in the data. | http://github.com/usdhs/dcat-tool/ | No |
Record Count | dhs:recordCount | The total number of records in the data. | http://github.com/usdhs/dcat-tool/ | No |
byte size | dcat:byteSize | The size of the distribution in bytes. The size in bytes can be approximated (as an integer) when the precise size is not known. | http://www.w3.org/TR/vocab-dcat/ | No |
security
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Dataset Classification | dhs:datasetClassification | The security classification of the data. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Person Level | dhs:ch-person-level | True or False: The information in the resource contains enough personal records or enough personal information in order to identify an individual. | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Financial | dhs:ch-financial | True or False: The information in the resource contains financial information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Event records | dhs:ch-event-records | True or False: The information in the resource contains information about events which are tagged with a specific place and time. | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Faces | dhs:ch-faces | True or False: The information in the resource contains facial recognition data or images of human faces. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Fingerprints | dhs:ch-fingerprints | True or False: The information in the resource contains fingerprint data or images of fingerprints. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - CUI | dhs:ch-cui | True or False: The information in the resource contains Controlled Unclassified Information (CUI). | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - PHI | dhs:ch-phi | True or False: The information in the resource contains Protected Health Information (PHI). Examples: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - PII | dhs:ch-pii | True or False: The information in the resource contains Personally Identifiable Information (PII). | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Geospatial | dhs:ch-geospatial | True or False: The information in the resource contains geospatial information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Environmental | dhs:ch-environmental | True or False: The information in the resource contains environmental information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - FISA | dhs:ch-fisa | True or False: The information in the resource contains information pertaining to the Foreign Intelligence Surveillance Act (FISA). | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - 8usc1367 | dhs:ch-8usc1367 | True or False: The information in the resource contains information covered by 8 USC 1367. Reference: https://www.dhs.gov/sites/default/files/publications/dhs_foia_instruction_section_1367_information.pdf | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - proprietary info | dhs:ch-propin | True or False: The information in the resource contains proprietary commercial information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Immigration | dhs:ch-immigration | True or False: The information in the resource contains information pertaining to immigration. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Critical Infrastructure | dhs:ch-criticalInfrastructure | True or False: The information in the resource contains critical infrastructure information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - PCII - Protected Critical Infrastructure Information | dhs:ch-pcii | True or False: The information in the resource contains protected critical infrastructure information. | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - biometrics | dhs:ch-biometrics | True or False: The information in the resource contains biometric information. Example: | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Dissemination Restrictions | dhs:ch-disseminationRestrictions | True or False: The information has dissemination restrictions. | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - LES - Law Enforcement Sensitive | dhs:ch-les | True or False: The information in the resource contains information that is deemed Law Enforcement Sensitive. | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Synthetic | dhs:ch-synthetic | True or False: The data is created manually or artificially apart from the data generated by real-world events. This may include data that is generated for the purposes of modeling and may be generated by a computer simulation. The data approximates real data, but does not necessarily reflect the real world. This includes synthetically derived data (e.g. data that is created programmatically from a set of source data, typically with some algorithm applied). | http://github.com/usdhs/dcat-tool/ | No |
Characteristics - Anonymized | dhs:ch-anonymized | True or False: The identifying information in the resource has been removed or changed so that the individual cannot be identified. | http://github.com/usdhs/dcat-tool/ | No |
location
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Hosting Location | dhs:hostingLocation | Where the data is located. Example: | http://github.com/usdhs/dcat-tool/ | No |
Hosted in Cloud | dhs:hostedInCloud | True or False: Is the dataset stored in the cloud? | http://github.com/usdhs/dcat-tool/ | No |
Easily Accessible By Creating Component | dhs:easilyAccessibleByCreatingComponent | True or False: Can the information be easily accessed by individuals within the DHS component that created it? | http://github.com/usdhs/dcat-tool/ | No |
Easily Accessible By All Components | dhs:easilyAccessibleByAllComponents | True or False: Can the information be easily accessed by individuals within all of DHS? | http://github.com/usdhs/dcat-tool/ | No |
Easily Accessible By General Public | dhs:easilyAccessibleByGeneralPublic | True or False: Can the information be easily accessed by the general public? | http://github.com/usdhs/dcat-tool/ | No |
publication
Label | Attribute Name | Definition | Namespace | Required |
---|---|---|---|---|
Encryption algorithm | usg:encryptionAlgorithm | The specific encryption algorithm used to protect data at rest. Can have multiple comma separated values. Example: | http://resources.data.gov/resources/dcat-us/ | No |
Record Transmission | dhs:recordTransmission | The date on which the information was submitted into the system. | http://github.com/usdhs/dcat-tool/ | No |
Validity Time | dhs:validityTime | The period of time the metadata record is valid for. Must be in xsd duration form. Example: | http://github.com/usdhs/dcat-tool/ | No |
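Because Validity Time must be in xsd duration form, a value can be checked before submission with a regular expression that follows the xsd:duration lexical pattern (for instance, P1Y is one year and PT30M is thirty minutes). This is a sketch: the function name is hypothetical, the sample values are illustrative rather than taken from a DIP record, and the validation performed by Mobius may be stricter or looser.

```python
import re

# Lexical pattern for xsd:duration: optional sign, 'P', date components in
# the order Y, M, D, then an optional 'T' section with H, M, S.
DURATION_RE = re.compile(
    r"-?P(?=\d|T\d)"                    # at least one component must follow 'P'
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"      # years, months, days
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"  # time section
)


def is_xsd_duration(text):
    """Return True if text matches the xsd:duration lexical form."""
    return DURATION_RE.fullmatch(text) is not None


for candidate in ("P1Y", "PT30M", "P1Y2M10DT2H30M", "P", "1 year", "P1YT"):
    print(candidate, is_xsd_duration(candidate))
```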